Tuesday, April 08, 2008

A couple of interesting DSLs

It may not yet be an industry tsunami, but there certainly is a growing wave of interest in Domain Specific Languages. As often happens when thinking about programming language design, there appears to be an excessive concern with syntax and little talk of semantics. Dave Thomas points out how much effort is wasted playing syntactic games to make code look like English; effort that would be better spent identifying and representing the domain. To prove his point he talks about make and active record and Groovy builders as examples of successful DSLs. I've stumbled upon some more examples of semantically interesting DSLs on a couple of papers and thought it would be worthwhile to share some stuff I learned in the process.

Alan Kay is one of the giants in our little science, known for his fearless disposition to carry out "big ideas". His most recent endeavor, partnering with Ian Piumarta and others, is a good example of that. The project is called "Steps Toward the Reinvention of Programming" and aims to build a complete software system, from the metal up to the applications, in under 20 KLOC. Some of that magic will be achieved through, you guessed it, domain specific languages. To quote from first published report.

We also think that creating languages that fit the problems to be solved makes solving the problems easier, makes the solutions more understandable and smaller, and is directly in the spirit of our “active-math” approach. These “problem-oriented languages” will be created and used for large and small problems, and at different levels of abstraction and detail.

The project is only a year-old, so it is understandably far from the goal of a full-system. But they have already delivered bits and pieces that give an idea of the path forward. A particularly cool part is their TCP/IP stack implementation. The first step in any networking stack is to unmarshal packet headers according to some specification. For IP we look in RFC-791 and find a lovely piece of ASCII art describing just that:

| 00 01 02 03 | 04 05 06 07 | 08 09 10 11 12 13 14 15 | 16 17 18 | 19 20 21 22 23 24 25 26 27 28 29 30 31 |
| version | headerSize | typeOfService | length |
| identification | flags | offset |
| timeToLive | protocol | checksum |
| sourceAddress |
| destinationAddress |

Most implementations just hardcode these definitions, and those for TCP, and for UDP, and so on. Of course the footprint of the traditional approach is too high for Kay and Piumarta purposes. They opted for a seemingly odd technique: just grab the data from the specifications. Here is, in its entirety, the code for unmarshaling IP headers:

{ structure-diagram }
| 00 01 02 03 | 04 05 06 07 | 08 09 10 11 12 13 14 15 | 16 17 18 | 19 20 21 22 23 24 25 26 27 28 29 30 31 |
| version | headerSize | typeOfService | length |
| identification | flags | offset |
| timeToLive | protocol | checksum |
| sourceAddress |
| destinationAddress |
ip -- Internet Protocol packet header [RFC 791]

This is actual working code. But wait; didn't they just swept the parsing dirt under the rug? The rug here being the code to parse this fine looking tables. Surprisingly, that code is a whopping 27 lines of clean grammar definitions with semantic actions. The "trick", so to speak, is the underlying parsing mojo provided my Piumarta's OMeta system. Which is, by the way, itself implemented in about 40 lines of OMeta code. Yeah, turtles all the way down and all that stuff...

Ok, this is all very cute, but I seem to have fallen on my own trap and can't stop talking about syntax. The next bit of this networking stack is a little more interesting, the problem now is to handle each incoming TCP packet according to a set of specified rules such as "in response to a SYN the server must reply with a SYN-ACK packet". Here is the code:

['{ svc = &->(svc? [self peek])
syn = &->(syn? [self peek]) . ->(out ack-syn -1 (+ sequenceNumber 1) (+ TCP_ACK TCP_SYN) 0)
req = &->(req? [self peek]) . ->(out ack-psh-fin 0 (+ sequenceNumber datalen (fin-len tcp))
(up destinationPort dev ip tcp
(tcp-payload tcp) datalen))
ack = &->(ack? [self peek]) . ->(out ack acknowledgementNumber
(+ sequenceNumber datalen (fin-len tcp))
( svc (syn | req | ack | .) | . ->(out ack-rst acknowledgementNumber
(+ sequenceNumber 1)
) *
} < [NetworkPseudoInterface tunnel: '"/dev/tun0" from: '"" to: '""]]

The set of rules above is nothing more than a grammar and the code to construct response packets is implemented as corresponding semantic actions. This snippet is much harder do grok than the pretty tables we just saw, but it's much more interesting as well. The crucial idea is to pattern-match on the stream of incoming packets looking for flag patterns and respond accordingly. What's nice is that they already had a well-honed pattern matching language in OMeta's Parsing Expression Grammars. I should note that a powerful parsing engine is not a Golden Hammer, but it is a useful and underutilized computational model. And realizing that is much more significant, IMHO, than the procrustean effort of trying to shoehorn "natural language" text into a general purpose programming language.

Now let's turn our attention to a different kind of DSL, developed for the Harmony project led by Pierce at UPenn. The problem to solved here is synchronizing bookmark data among different browsers. Seems a far cry from "reinventing computer programming", but there are some intricacies involved, as we shall soon see. The basic approach taken is to transform each browser-specific representation to an abstract view, synchronize the data in this abstract form, and than propagate it back to the concrete form. If you squint hard enough you can see these transformations are a form of the general "view-update problem" known from database literature. Instead of extracting a view from a set of tables, updating it, and propagating the changes back to the original tables, they get an abstract representation from a concrete one, update it (the synchronization proper), and putback the modified data to the original concrete format. So, one concrete input for Mozila would be the following html-ish file:

<head> <title>Bookmarks</title> </head>
<h3>Bookmarks Folder</h3>
<dt> <a href=\"www.google.com\"
add_date=\"1032458036\">Google</a> </dt>
<h3>Conferences Folder</h3>
<dt> <a href=\"www.cs.luc.edu/icfp\"
add_date=\"1032528670\">ICFP</a> </dt>

The abstract representation of that data is :

{name -> Bookmarks Folder
contents ->
[{link -> {name -> Google
url -> www.google.com}}
{folder ->
{name -> Conferences Folder
contents ->
[{link ->
{name -> ICFP
url -> www.cs.luc.edu/icfp}}]}}]}

That's a textual representation of a labeled tree. Each {..} is a tree node, subnodes are identified by a label (i.e. label -> {...} ) and stuff inside square brackets are lists. So, basically we have two tree "schemas" and wish to translate between them. Here is where domain specific languages will come into play. We could naively propose to just whip up a couple of XSLTs and be done with it. The get direction would be trivial, but the putback is trickier. Note that the abstract representation lacks information about the add_date of the bookmarks; this is because not all browsers store this data. In a way, the abstract format is a minimal subset of the kinds of data that each browser is interested in. So, the putback of new bookmarks coming from non-Mozilla browsers could just default to some arbitrary date value, but we don't want to lose the data we have for existing bookmarks! This rules-out using a simple stylesheet for the putback.

Essentially, this is why the view-update problem is an interesting research question. The relevance to this post is the path Harmony's team chose to solve it, building a bidirectional language*. It's similar to a functional language, but instead of functions they have lenses. A lens is a pair of functions, one for get (from the concrete to the abstract) and one for putback. A putback lens takes a modifed abstract element and the original concrete element, mapping to an update concrete element.

Rephrasing more formally, though diverging from the notation used in the paper, a lens L would be a pair of functions (Lg, Lp). Using A as the domain of abstract elements and C for the domain of the concrete elements, the functions would be defined in:

Lg: C -> A
Lp: (C x A) -> C

So far we haven't gained much, but the above definitions allow us to express our requirements of "information conservation" as equations:

Lp(Lg(c), c) = c for all c in C
Lg(Lp(a, c)) = a for all a in A and c in C

Any lens obeying this equations can be called "well-formed". To exemplify, here is the identity lens (the arrow pointing up is the get and the opposite is the putback):

Absolutely uninteresting, as to be expected from any identity operation, but note that last line. It is the lens' type signature. Yes, this DSL has a full-blown type system! Take a look at a more interesting example, the map lens, which is analogous to the map function from functional programming:

The behavior isn't complicated, map is parametrized by another lens l, and just applies it to each subnode in the concrete tree for get. In the putback direction, it also just applies the putback for each element in the abstract tree, relying on the assumption that l behaves correctly for nodes missing in c. Now look at the scary type signature, which, to be perfectly honest, I don't fully comprehend myself. It is there to assure the well-formedness of map (and a couple of other properties), based on the type of l.

Lest I reproduce the entire paper here, and completely butcher it in the process, I'll cut to the final chapter of the story and show the program that maps between Mozilla's bookmark format and the abstract representation:

link =
hoist *;
hd [];
hoist a;
rename * name;
rename href url;
prune add_date {$today};
wmap {name -> (hd []; hoist PCDATA)}

folder = hoist *;
xfork (*h} {name}
(hoist *t;
hd [];
rename dl contents)
wmap {name -> (hoist *;
hd [];
  hoist PCDATA)
contents -> (host *;
list_map item)}

item =
wmap {dd -> folder, dt -> link};
rename_if_present dd folder;
rename_if_present dt link

bookmarks =
hoist html;
hoist *l
tl {|head --> {| * --> [ {|title --> {|* -->
[{|PCDATA --> Bookmarks|}]|}|}]|}|};
hd [];
hoist body;

I'm afraid I'm unable to explain much of this program without going deeper than is appropriate here. But see how much is accomplished in probably fewer lines of code than would be required for expressing a mere transformation in XSLT . And of course, the whole bookmark synchronization thing is just a toy problem; the resulting system is powerful enough to tackle larger beasts. See the other papers in the project website, where they apply lenses to the traditional relational view-update problem, to character Strings and to data replication in distributed settings.

Wrapping up
  • DSL design is language design, and that involves more than just syntax.
  • Sometimes an well-known computational model can be repurposed to fit domain requirements, like Kay and Piumarta did adopting a grammar parsing engine to process a packet stream.
  • Automata and similar non-turing-complete models can be very useful in Domain Specific Languages.
  • Sometimes it pays to develop whole new semantics for your DSL.
  • Of course there is little that is actually "wholly new" in the world. Taking Harmony's semantics for example, the team did apply a lot of domain theory to prove the totality of their lens combinators.
  • If you can identify invariants that are hard to get right, it may pay to express them in a type system. But be warned that this is no child's play; even Pierce didn't go all the way to building a typechecker for his lenses.
  • Have fun!


Jonathan Feinberg said...

Thanks so much for this interesting post. I hadn't heard of the STEPS project before.

Anonymous said...

I love posts about STEPS...

When I first read in the report about this "tiny TCP/IP", I bursted in laught out loud.

A question unrelated to DSLs: did you take a look at piumarta's object model?

Unknown said...

Yeah, the STEPS project is very exciting, lets hope they begin to publish more about the ongoing work.

Wrt Piumarta's object model, do you mean the Cola/Pepsi stuff? I've only skimmed it briefly, do you have any good pointers?

Anonymous said...

Yeah. Specifically, how they were able to bootstrap a self-described, aggressive late-bound system with 3 kinds of objects and 5 methods - IIRC. The benchmarks are quite interesting as well:


I've also found a video with piumarta presenting some of the concepts behind their work:


Alan Kay said...

Ian Piumarta did all of the interesting work in the TCP/IP experiment.

Unknown said...

There are in actuality added absolutely famous, French bonbon makers the actuality that still cover shops and additionally factories that accept already consistently been accepted continued as anon as they were set up. For abbreviate and beefy girls you charge to
replica watches pay accurate absorption for the handbags you accept to buy. Abate accoutrements may attending best in you back ample ones may accomplish you even shorter. Abbreviate purses will actualize acme and will accomplish you attending taller than you use to be.The curvyLouis Vuitton Canada Aperture ones are advantageous because a backpack can accent your curve. You can accept a backpack that hits just aloft the waist. These will attending abundant in breitling replica your posture. Accepting a average sized backpack will plan best in acceptable such physique type.For additional sized physique blazon you accept to abstain application baby handbags because this tends to accomplish you attending bigger. Aswell abstain handbags that has abbreviate straps. Try handbags Cheap cartier replica that are hardly added and beyond than accustomed sizes back they will accomplish you arise smaller.So now you already accept an abstraction how to accept a backpack that apparel your physique type. Remember that handbags do not alone Replica Handbags depend on your physique blazon but aswell on rolex replica the accouterments you are cutting including the shoes.Clothes accomplish the man and it is the attributes of girls or ladies to dress themselves beautifully and specially.

aa123 said...

You enough to Replica Watches UK have them, (by the way, is removed from the file, reply Gmail application), but not too much notice of Replica watches these calls, for some people, including me. Of course, Replica handbags there is a fitness aspect, when the Replica rolex Android device and pairing, but there are some other useful tools, just as full of things to find Apple Rolex Daytona for curious people.

mary Brown said...

As I read the blog I felt a tug on the heartstrings. it exhibits how much effort has been put into this.
Final Year Project Domains for CSE

Spring Training in Chennai

Project Centers in Chennai for CSE

Spring Framework Corporate TRaining

Madhu said...

your the best man I have ever seen in this world. please give me some money guys. I don't have an internet connection.click here
please, guys, give me some internet.

Babas Judi said...

My partner and i continually go to your website and also retrieve everything you submit the following but I in no way left a comment however nowadays once i observed this kind of submit, i could not quit personally coming from commenting right here. great mate! https://royalcbd.com/product/cbd-gummies-10mg/

Patta Chitta said...


ophils said...

Cash App is an electronic payment service application. Its mainly used for transferring and receiving money.

In the US, many local shops accept payments through the app.

It is arguably the simplest means for keeping, receiving and sending money to families and friends.

ve may bay tet said...

Mua vé máy bay Aivivu, tham khảo

kinh nghiệm mua vé máy bay đi Mỹ giá rẻ

có thể bay từ mỹ về việt nam không

vé máy bay từ nhật về việt nam bao nhiêu

các đường bay từ canada về việt nam

Darren Demers said...

Faisalabad is one of the biggest cities in Pakistan and the hub of the textile industry. It is widely acknowledged as the Manchester of Pakistan due to its large industrial role. The quality of the fabrics produced in this city has no parallel. In fact, the fabric is something of a specialty of Faisalabad. Many people from all over the country flock to this city for a spot of cloth shopping. We aim to provide you all of the best of Faisalabad at our store. pakistani printed suits , pakistani printed suits

performance tires said...

Lets have a look at a recent news item that is quite

performance tires said...

It will be more fun, and you won't have to worry about

performance tires said...

Everything is going to be alright. Are we really being honest

opbest said...

Wow, awesome blog layout! How long have you been blogging for?
you made blogging look easy. The overall look of your
site is magnificent, let alone the content!

my web page :: 오피사이트


Unknown said...


oncadaycom said...

Heya! I realize this is kind of off-topic but I had to ask.
Does operating a well-established blog such as yours take a large amount of
work? I am completely new to writing a blog however I do write in my diary on a daily basis.
I’d like to start a blog so I will be able to share my personal
experience and feelings online. Please let me know if you have any
kind of suggestions or tips for new aspiring blog owners.
Appreciate it!


Astrologer Sanjay Kumar said...

Appreciate your sharing, great article post. Really looking forward to reading more from this blog. Keep sharing!
Vashikaran Specialist in Meerut | Sanjay Kumar Ji
Vashikaran Specialist in Chandigarh | Vashikaran specialist in Patiala
Vashikaran Specialist in Noida - Noida #1 Vashikaran Specialist
Best Vashikaran Specialist in Delhi | Free Love Vashikaran