Friday, December 14, 2007

Semantic Ramblings

This past Tuesday I attended a talk about the Semantic Web and social networks by Sun's Henry Story. I've posted my impressions, mixed with some uninformed commentary about semweb in general, on my work blog.

Sunday, December 09, 2007

REST Beyond the Obvious

Now that the heat of the REST vs. SOA battle seems to be dissipating, we can try to shed some light into how the choice of a resource centric design affects overall enterprise architecture. We can start by following a thought experiment. Imagine a company growing so fast the HR staff is overwhelmed by the task of forwarding all the new job openings to recruiting agencies. Our job is to automate this problem away. Using REST.

The basics

As RESTful Web Services go, this one seems pretty simple. One resource per job opening, represented with a simple custom XML format; maybe a collection resource, listing all current openings; and possibly also a search resource, to look for openings with specific attributes. Something like this:

URI Template Method Representation
/job_opening/{id} GET Job Opening XML Schema
/job_openings GET Links to all openings as a custom schema or with XOXO.
/job_openings/find?{query_params} GET Custom schema or XOXO or atom feed, all w/ Opensearch elements.

So, how do we create the job opening resources? The obvious choice would be to apply the traditional RESTful collections pattern: a POST to the /job_openings collection with an entity body containing the xml representation of the opening triggers the creation of a new resource, whose URL would then be returned in the location header of the response.

Shaking things up

But there is an alternate model: let client departments simply publish job_opening resources by themselves, on their departmental web servers. There is a vast array of options to do such a thing, none of them requiring users to write XML, of course. IT could whip up a word processor macro to convert documents to our xml format, and save them to a web server. In the MsWorld, one could use Word's XML support for inserting elements from the job_opening namespace into a document and publish it to an WebDav enabled server, maybe using Sharepoint. In the OpenOffice universe, we could just as easily save a document as ODF, pass it trough an xml transformation to the job_opening format, and then publish it to an Web Server. The publishing itself can be made with a number of techniques, from FTPing to a shared directory to using an Atompub client with a compatible server to the aforementioned WebDav protocol. A completely different, and much more complex, option would be to have the department run an instance of a custom job openings application, maybe even something similar to the “obvious choice” outlined above, but I really don't see how it could be useful.

So far, all very cute and distributed, postmodern even, but does it work? I mean, the end goal is to send all the openings info to external recruiting companies, and just throwing the resources at the internal web doesn't quite accomplish that. The missing piece of the puzzle is a simple crawler; a small piece of software that scans the web by following links and looks for resources that accept being viewed as job_openings. A side effect of most publishing options mentioned in the past paragraph is that a link is created to the newly available document in some sort of listing. Our crawler needs only to know how to get at those listings and how to parse them looking for links. If you think parsing is complicated, I urge you to think about the following code snippet: /(src|href)="(.*?)"/

What If...

Since this is a thought experiment, we can get more, well, experimental. Let's see what could be gained from shunning XML altogether and using an HTML microformat for the job_opening resource representations. This would expand even more the publishing options. For instance, a plain Wordpress blog, maybe already used as a bulletin board of sorts by some department, could be repurposed as a recruiting server. Another benefit would be to have every document in the system in human-readable form, not XML “human-readable”, but really human-readable.

Now, suppose the company decided it just wasn't growing fast enough, went ahead and bought a smaller competitor. And of course, this competitor already had an in-house recruiting application. Being of a more conservative nature, which just might explain why they weren't market leaders, their IT department built it as a traditional Web front-ed / RDBMS backed application. How do we integrate that with our REST-to-the-bone job openings system? First we note that there is no need to have the data available in real-time, after all, no company needs to hire new people by the minute. Given that, the simplest solution would probably be a periodic process (someone said cron?) that extracts the data directly from the database, transforms it to our job_opening format and shoves it in some web server.

So what?

I can't say if this scenario were for real would I choose such a distributed approach. Maybe a quick Rails app running on a single server would better fit the bill. But that's not the point of our little exercise in architectural astronautics, we are here to think about the effects of REST's constraints to overall architecture. So, let's recap some of them:

  • The line between publishing content and using an application blurs. “Static” documents and active programs have equal footing as members of the system.

  • Thanks to the uniform interface constraint, the mere definition of a data format allows for unplanned integration.

  • If we add a standard media type to the mix, we can take advantage of established infrastructure, as in the case of a blogging platform reused as a “service endpoint” simply by adoption of HTML microformats.

  • REST in HTTP encourages pull-based architectures (think of our HR crawler), which aren't all that common outside of, well, HTTP web applications

  • The very idea of a service may fade away in a network of distributed connected resources. The closest thing to a service in our system is the crawler, but note that it does its work autonomously, no one ever actually calls the service.

  • Links (aka hypermidia) are a crucial enabler for loosely-coupled distributed services. In some sense, everything is a service registry.
  • One of the forgotten buzzwords of the 90's, intranet, may come to have a new meaning.

Sunday, December 02, 2007

Metaprogramming and conjuring spells

"A computational process is indeed much like a sorcerer's idea of a spirit. It cannot be seen or touched. It is not composed of matter at all. However, it is very real. It can perform intellectual work. It can answer questions. It can affect the world by disbursing money at a bank or by controlling a robot arm in a factory. The programs we use to conjure processes are like a sorcerer's spells. They are carefully composed from symbolic expressions in arcane and esoteric programming languages that prescribe the tasks we want our processes to perform."
- Abelson and Sussman — SICP

Among the plethora of metaphors that plague our field, I find computing as incantation of spells one of the least annoying. Some fellow with a long beard writes odd-looking prose that will later, through some magical process, turn lead into gold or help vanquish some inept demon. Man, that show sucked, I don't know why anyone would watch that shit, specially the reruns Tuesday to Saturday at 04AM and Monday to Friday at 12PM on channel 49. Well, anyway, what's interesting about the analogy is that it shows the dual nature of software: it is both a magical process and a bunch of scribbled lines. To mix metaphors a bit, we can say that software is, at the same time, a machine and the specification for that machine.

This is what makes metaprogramming possible and, also, what makes it unnecessary. By the way, when I write "metaprogramming", I'm specifically thinking about mutable meta-structures, changing the tires while the car is running metaprogramming, not mere introspection. Throwing away that mundane car analogy and getting back to our wizardry metaphor, metaprogramming would be like a spell that modifies itself while being casted. This begs the question, beyond fodder for bad TV Show scripts, what would this be useful for? My answer: for very little, because if the wizard programmer already has all the information needed for the metaprogram, then he might as well just program it... To make this a little less abstract, take a simple example of Ruby metaprogramming:
01  class DrunkenActress
02 attr_writer :blood_alcohol_level
03 end
05 shannen_doherty =
06 shannen_doherty.blood_alcohol_level = 0.13
By the way, I'm not a Rubyist, so please let me know if the example above is wrong. The only line with meta-stuff is line 2, where a method named "attr_writer" is being called with an argument of :blood_alcohol_level. This call will change the DrunkenActress class definition to add accessor methods for the blood_alcohol_level attribute. We can see that it worked in line 6, where we call the newly defined setter.

But the programmer obviously already knows that the DunkenActress class needs to define a blood_alcohol_level attribute, so we see that meta-stuff is only applied here to save a few keystrokes. And that is not a bad motivation in itself, more concise code often is easier to understand. Then again, there are other ways to eliminate this kind of boilerplate without recursing to runtime metaprogramming, such as macros or even built-in support for common idioms (in this case, properties support like C# or Scala).

There may be instances where the cleanest way to react to information available only in runtime is trough some metaprogramming facility, but I have yet to encounter them. Coming back to Rubyland, Active Record is often touted as a poster child for runtime metaprogramming, as it extracts metadata from the database to represent table columns as attributes in a class. But those attributes will be accessed by code that some programmer will write — and that means, again, that the information consumed by the metaprogram will need to be available in development time. And indeed it is, in the database. So ActiveRecord metaprogramming facilities are just means to delegate the definition of some static structure to an external store, with no real dynamicity involved. If it were not so, this kind of thing would be impossible. Also note that recent Rails projects probably use Migrations to specify schema info in yet another static format.

To summarize, runtime mutable metaprogramming is like that bad purple translucent special effect, it is flashy, but ultimately useless. Anyway, that's my current thinking in the matter, but I still need to read more on staging.

[EDIT: corrected a mistake relating to the code sample]