Rafael rambling: functional programming

Tuesday, June 08, 2010

Where scala left me wanting

In the past few years, I've grown to enjoy programming in a functional style more and more. Even the Java code I write nowadays is mostly free of mutable state. But before that, I spent a good chunk of time learning about object orientation and found some great ideas there. Ideas that can carry over to the functional world. One of them is Domain Driven Design, where we strive to have the symbols in the code represent concepts in the domain. Of course, I'm oversimplifying DDD, but the ideal of a bijection between the reality our software models and constructs in a programming language is one I believe we should maintain in many, perhaps most, applications.

A natural consequence is that our systems architecture tends to resemble an onion, with the domain model in the center and code to make it interact with the rest of the world surrounding it:

I've labeled the code that interacts with the rest of the world "Adapters", as it should do little more than adapt external representations into concepts the domain model can understand. This architecture is not in any way novel. In fact, it is a version of Alistair Cockburn's Hexagonal Architecture. In the common case where the only interactions the domain model has with the rest of the world are through a Database on one end and a User Interface on the other, we've just described the 3-layer architecture that was so popular in the 90s. Some examples of adapters are (to increase our shot at buzzword bingo) Web MVC Controllers, Repositories or DAOs, message endpoints, GUI listeners, etc. You get the picture.

Since Scala is a truly hybrid language, marrying OO and FP quite elegantly, it would appear to be the perfect vehicle to write this kind of software. I've found this premise to be correct, for the most part. As you might have guessed, there is an exception (if there weren't, I wouldn't be writing this post, would I?). The problem is in the code for the adapters. As mentioned, they tend to be simple shims, translating some external representation into domain objects. Coding such translations every single time for every adapter instance in every application is very dull. We desperately need our old friend, lady abstraction, to help us out here. In the Java world, she lends us a hand through dynamic reflection, allowing the construction of objects and invocation of methods to be done generically. Unfortunately, Scala has no reflection API, so we have no alternative but to resort to Java reflection, in a sense reverse-engineering the Scala compilation process.

This isn't a major gripe, as Scala constructs tend to map to Java constructs in a straightforward fashion. Also, rumor has it that a Scala reflection API is being developed for 2.8.1. But that's only half the story. Powerful as it is, dynamic reflection alone is not enough to solve the Adapter problem once and for all. We often need to parameterize the translation in some way. For instance, when translating objects to database rows, we must discover the Column names corresponding to object properties. In many cases we can trust convention, for instance we could expect the columns to match exactly the names of the properties. But in other cases there is no option but to somehow configure the translation with an explicit mapping.
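To make the convention-versus-configuration point concrete, here is a minimal sketch of such an adapter. The names `ColumnMapping`, `Customer` and `toRow` are made up for illustration, and the code assumes the Scala compiler emits one JVM field per case class parameter, which is exactly the kind of reverse-engineering mentioned above.

```scala
object ColumnMapping {
  // Hypothetical domain object, used only for this illustration.
  final case class Customer(id: Long, fullName: String)

  // Convention: the column name equals the property name.
  // Configuration: `overrides` supplies an explicit mapping where the convention fails.
  // Assumption: each case class parameter is compiled to a JVM field of the same name.
  def toRow(obj: AnyRef, overrides: Map[String, String] = Map.empty): Map[String, Any] =
    obj.getClass.getDeclaredFields.toList
      .filterNot(_.isSynthetic) // skip compiler-generated fields
      .map { field =>
        field.setAccessible(true) // the generated fields are private
        overrides.getOrElse(field.getName, field.getName) -> field.get(obj)
      }
      .toMap
}
```

Calling `ColumnMapping.toRow(customer)` trusts the convention, while passing `Map("fullName" -> "full_name")` overrides it for a single property.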

In the Java world this kind of thing used to be done with verbose and annoying XML configuration files. As the language evolved, annotations were introduced, and they are now the main way to configure such translations.

Annotations are a great improvement over external XML files, but they can't be the end of the story. There is the relatively minor issue that annotations pollute the domain model we strive so much to keep clean and organized, especially when the same domain objects will be active in many adapters. Beyond that, there is the larger problem that annotations are just metadata. Sometimes we want to parameterize our adapters in richer ways. Take, for instance, the proposal for a typesafe API for database queries to be added to the new version of the Java Persistence API. It requires a special pre-processor to generate a metamodel that can be used to parameterize a database adapter.

A better and more generic approach would be to have the language itself provide this metamodel: a kind of static reflection. I would love to see something like this in Scala. Some time ago I toyed with the idea of writing a compiler plugin to provide a static meta-model of Scala classes, but apparently compiler plugins have issues with non-transparent code generation.

Postscript. As with all matters regarding programming languages on the web, we must tread lightly. I am not ranting against Scala, in fact I rather enjoy it, as can be gleaned from some of the previous posts. I am only relating a very specific domain where I believe the language can be much improved.

Saturday, October 10, 2009

Type-safe printf in scala ‽

All the excitement surrounding Scala's next big release - 2.8.0 - reminds me of another big release a few years ago. Java 5 was poised to be a huge leg up for developers, long feature lists were all over the java blogosphere: generics, enums, enhanced for-loops, autoboxing, varargs, util.concurrent, static imports, and so on; in sum, a whole heap of goodness. But there were a few odd ducks hidden in the feature lists, most notably "an interpreter for printf-style format strings".

Anyone who has been around curly-brace-land long enough knows that printf (or java.util.Formatter) serves one main purpose: it's a poor substitute for built-in variable interpolation in the language. Unfortunately, Scala also lacks variable interpolation and there isn't much we can do about it: our BDFL has ruled. But there is another use for printf, as a shortcut for applying special formatting to some value types. We can specify a currency format using the string "%.2f USD", or a short American date format with "%tD". We can even go wild and use the two together:

In Java:

String.format("%.2f USD at %tD", 133.7, Calendar.getInstance());

In Scala:

"%.2f USD at %tD" format (133.7, Calendar.getInstance)

This snippet is saying that 133.7 should be formatted as a decimal with two digits after the point, and Calendar.getInstance() - the horrific Javaism for "right now" - should be formatted as a date, not a time. What always trips me up is that the order of the values must exactly correspond to the order of the format specifiers. It's a simple task, but my tiny little brain keeps messing it up. Let's see if our good friend the Compiler can help.
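To see the problem in action, here is a small sketch (the object and value names are mine) that swaps the two arguments. The compiler accepts both calls; the mistake only surfaces when the format string is interpreted at runtime.

```scala
import java.util.Calendar
import scala.util.Try

object RuntimeFailure {
  // Arguments in the order the format string expects: succeeds.
  val correct = Try("%.2f USD at %tD".format(133.7, Calendar.getInstance))

  // Arguments swapped: still compiles, because format takes Any*,
  // but java.util.Formatter rejects a Calendar for the %.2f conversion at runtime.
  val swapped = Try("%.2f USD at %tD".format(Calendar.getInstance, 133.7))
}
```

Here `correct` is a `Success` and `swapped` is a `Failure` wrapping a `java.util.IllegalFormatException`.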

Formatters
The first step is to have our logic leave its String hideout and show itself. So, instead of "%.2f", we'll say F(2)^*, and instead of "%tD" we'll say T(D). T and F are now Formatters:

trait Formatter[E] {
 def formatElement(e: E):String
}
 
case class F(precision: Int) extends Formatter[Double] {
 def formatElement(number: Double) = ("%."+precision+"f") format number
}
 
import java.util.Calendar
abstract class DateOrTime
object T extends DateOrTime
object D extends DateOrTime

case class T(dateOrTime: DateOrTime) extends Formatter[Calendar] {
 def formatElement(calendar: Calendar) = dateOrTime match {
  case T => "%tR" format calendar
  case D => "%tD" format calendar
 }
}

Chains
Next we tackle the issue of how to chain formats together. The best bet here is to use a composite, very similar to scala.List ^***. We even have a :: method to get right-associative syntax. This is what it looks like:

val fmt = F(2) :: T(D) :: FNil

And this is the actual, quite unremarkable, code:

trait FChain {
  def ::(formatter: Formatter) =
    new FCons(formatter, this)

  def format(elements: List[Any]):String
}

object FNil extends FChain {
  def format(elements: List[Any]):String = ""
}

case class FCons(formatter: Formatter, tail: FChain) extends FChain { 
  def format(elements: List[Any]):String = 
    formatter.formatElement(elements.head) + tail.format(elements.tail)
}

There is still a missing piece. Remember "%.2f USD at %tD"? We have no way of chaining the " USD at" to our formatters. This is what we want to be able to write:

val fmt = F(2) :: " USD at " :: T(D) :: FNil

The solution is simple, we overload :: in FChain:

trait FChain {
  ...
  def ::(constant: String) =
    new FConstant(constant, this)
  ...
}

and create a new type of format chain that appends the string constant:

case class FConstant(constant:String, tail: FChain) extends FChain { 
  def format(elements: List[Any]):String = 
    constant + tail.format(elements)
}

Cool, that works. But wait; what have we gained so far? The problem was to match the types of the formatters with the types of the values, and we aren't really using types at all. The remedy, of course, is to keep track of them.

Cue the Types
We want to check the types of the values passed to the FChain.format(), but this method currently takes a List[Any], a List of anything at all. We could try to parameterize it somehow and make it take a List[T], a list of some type T, instead. But, if we take a List[T], it means all values must be of the same type T, and that's not what we want. For instance, in our running example we want a list of a Double and a Calendar and nothing more.

So, List doesn't cut it. Fortunately, the great Jesper Nordenberg created an awesome library, metascala, that contains an awesomest class: HList. It is kind of like a regular List, with a set of similar operations. But it differs in an important way: HLists "remember" the types of all members of the list. That's what the H stands for, Heterogeneous. Jesper explains how it works here.

We'll change FChain to remember the required type of the elements in a type member, and to require this type in the format() method:

trait FChain {
  type Elements <: HList
  ...
  def format(elements: Elements):String
}

FNil is pretty trivial, it can only handle empty HLists (HNil):

object FNil extends FChain {
   type Elements = HNil

   def format(elements: Elements):String = "" 
}

FCons is somewhat more complicated, it is parameterized on the type of the head element, and on the type of the rest of the chain:

case class FCons[E, TL <: HList](formatter: Formatter[E], tail: FChain { type Elements = TL }) extends FChain { 
  type Elements = HCons[E, TL]

  def format(elements: Elements):String = 
    formatter.formatElement(elements.head) + tail.format(elements.tail)
}

We also had to tighten up the types of the constructor parameters: formatter is now Formatter[E] ^**, so it can format elements of type E, and tail is now FChain{type Elements=TL}, so it can format the rest of the values. The Elements member is where we build up our list of types. It is an HCons: a cons pair of a head type - E - and another HList type - TL. We changed the FCons constructor parameters, so we also need to change the point where we instantiate it, in FChain:

trait FChain {
  type Elements <: HList
   ...
  def ::[E](formatter: Formatter[E]) =
    new FCons[E, Elements](formatter, this)
  ...
}

This just passes the type of the formatter, and of the elements so far, along to FCons. FConstant has to be changed in an analogous way. This is it, now format() only accepts a list whose values are of the right type. Check out an interpreter session:

scala> (F(2) :: " USD at " :: T(D) :: FNil) format (133.7 :: Calendar.getInstance :: HNil)
res5: java.lang.String = 133.70 USD at 10/10/09

scala> (F(2) :: " USD at " :: T(D) :: FNil) format (Calendar.getInstance :: 133.7 :: HNil)
:25: error: no type parameters for method ::: (v: T)metascala.HLists.HCons[T,metascala.HLists.HCons[Double,metascala.HLists.HNil]] exist so that it can be applied to arguments (java.util.Calendar)
 --- because ---
no unique instantiation of type variable T could be found
       (F(2) :: " USD at " :: T(D) :: FNil) format (Calendar.getInstance :: 133.7 :: HNil)
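For readers who want to try this without metascala, here is a self-contained sketch of the same idea. The tiny `HList` below is a simplified stand-in I wrote for this illustration, not metascala's actual implementation, and the `I` formatter is a hypothetical addition that keeps the example free of dates. Formatting uses `Locale.ROOT` so the result does not depend on the machine's locale.

```scala
import java.util.Locale

object TypedPrintf {
  // Simplified heterogeneous list (an assumption standing in for metascala's HList).
  sealed trait HList
  final case class HCons[H, T <: HList](head: H, tail: T) extends HList {
    def ::[A](a: A): HCons[A, HCons[H, T]] = HCons(a, this)
  }
  sealed trait HNil extends HList {
    def ::[A](a: A): HCons[A, HNil] = HCons(a, this)
  }
  case object HNil extends HNil

  trait Formatter[E] { def formatElement(e: E): String }

  final case class F(precision: Int) extends Formatter[Double] {
    def formatElement(n: Double): String = ("%." + precision + "f").formatLocal(Locale.ROOT, n)
  }
  // Hypothetical extra formatter: zero-padded integers.
  final case class I(width: Int) extends Formatter[Int] {
    def formatElement(n: Int): String = ("%0" + width + "d").formatLocal(Locale.ROOT, n)
  }

  trait FChain {
    type Elements <: HList // the types of the values this chain expects
    def ::(constant: String): FConstant[Elements] = FConstant[Elements](constant, this)
    def ::[E](formatter: Formatter[E]): FCons[E, Elements] = FCons[E, Elements](formatter, this)
    def format(elements: Elements): String
  }

  object FNil extends FChain {
    type Elements = HNil
    def format(elements: Elements): String = ""
  }

  final case class FConstant[ES <: HList](constant: String, tail: FChain { type Elements = ES }) extends FChain {
    type Elements = ES // a constant consumes no value
    def format(elements: Elements): String = constant + tail.format(elements)
  }

  final case class FCons[E, TL <: HList](formatter: Formatter[E], tail: FChain { type Elements = TL }) extends FChain {
    type Elements = HCons[E, TL] // one value of type E, then whatever the tail expects
    def format(elements: Elements): String =
      formatter.formatElement(elements.head) + tail.format(elements.tail)
  }

  val example: String = (F(2) :: " USD x " :: I(3) :: FNil).format(133.7 :: 7 :: HNil)
  // Swapping the values, as in `7 :: 133.7 :: HNil`, is rejected by the compiler.
}
```

With these definitions `TypedPrintf.example` evaluates to `"133.70 USD x 007"`.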

Some random remarks

The interpreter session above nicely showcases the Achilles heel of most techniques for compile-time invariant verification: the error messages are basically impenetrable.
A related issue with this kind of metaprogramming is that it's just plain hard. The code in this post looks pretty simple (compared with, say, Jim McBeath's beautiful builders), but it took me days of fiddling around with metascala to find an adequate implementation.
Take the above two points together and it's clear that we are talking about a niche technique. Powerful, but not for everyday coding.
Since I've mentioned the word coding, the FChain structure shown here looks like a partial encoding of a stack automaton. Stack automata can represent context-free grammars, if I remember my college classes correctly. That said, I don't see any particular use for this tidbit of information.
Since this is a random remarks section, I'll randomly remark that we can implement the whole thing without type members and refinements. Just good old type parameters in action.
Since it is possible to use nothing but type parameters, and Java has type parameters, would it be possible to implement our type-safe Printf in Java? Quite possibly, a good way to start would be with Rúnar Óli's HLists in Java. Just take care not to get cut wading through all those pointy brackets.
A powerful type system without type inference is useless. Quoting Benjamin Pierce: "The more interesting your types get, the less fun it is to write them down".

The whole code:

import metascala._
import HLists._

object Printf {
 trait FChain {
  type Elements <: HList

  def ::(constant: String) =
   new FConstant[Elements](constant, this)
  
  def ::[E](formatter: Formatter[E]) =
   new FCons[E, Elements](formatter, this)

  def format(elements: Elements):String
 }

 case class FConstant[ES <: HList](constant:String, tail: FChain { type Elements = ES }) extends FChain { 
  type Elements = ES

  def format(elements: Elements):String = 
   constant + tail.format(elements)
 
 }
 
 object FNil extends FChain {
  type Elements = HNil
  def format(elements: Elements):String = ""
 }

 case class FCons[E, TL <: HList](formatter: Formatter[E], tail: FChain { type Elements = TL }) extends FChain { 
  type Elements = HCons[E, TL]

  def format(elements: Elements):String = 
   formatter.formatElement(elements.head) + tail.format(elements.tail)
 
 }
 
  trait Formatter[E] {
   def formatElement(e: E):String
  }
   
  case class F(precision: Int) extends Formatter[Double] {
   def formatElement(number: Double) = ("%."+precision+"f") format number
  }
   
  import java.util.Calendar
  abstract class DateOrTime
  object T extends DateOrTime
  object D extends DateOrTime
  
  case class T(dateOrTime: DateOrTime) extends Formatter[Calendar] {
   def formatElement(calendar: Calendar) = dateOrTime match {
     case T => "%tR" format calendar
    case D => "%tD" format calendar
   }
  }
 }

* Or F(precision=2) thanks to Scala 2.8's awesome named parameter support.

** In fact, the previous untyped version didn't even compile, as Formatter has always been generic. Sorry for misleading you guys.

*** If you are unfamiliar with how Scala lists are built, check out this article.

Thursday, July 17, 2008

Comments on Comments on the Previous post

Henry Ware suggested a modification to the builder with abstract members removing a lot of the boilerplate. Incidentally, this is a nice illustration of how nested types can be put to a good use in Scala.
Justin ported the code to Haskell, which was very cool.
A couple of commenters suggested that languages with support for default parameter values (like Python and Groovy) don't need elaborate constructs such as the builder pattern. There are two ways to respond. One is to point out that the intent of the pattern, especially as originally described in the GoF book, has little to do with optional data. The other is to acknowledge that I probably put too much emphasis on this issue and forgot to mention a very common idiom for building objects in Scala: just declare mandatory "parameters" as abstract vals and optional ones as concrete vals with default values, like so:
```
abstract class OrderOfScotch {
  val brand:String
  val mode:Preparation
  val isDouble:Boolean 
  val glass:Option[Glass] = None
}
```
And to instantiate:
```
val myDose = new OrderOfScotch {val brand = "Bobby Runner"; val mode = OnTheRocks; val isDouble = false}
```
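Here is a self-contained sketch of the same idiom that also overrides the optional value. The enclosing object and the trimmed-down `Preparation` and `Glass` hierarchies are only there to make the snippet compile on its own.

```scala
object AbstractVals {
  sealed abstract class Preparation
  case object Neat extends Preparation

  sealed abstract class Glass
  case object Tall extends Glass

  abstract class OrderOfScotch {
    val brand: String                // mandatory: abstract val
    val mode: Preparation            // mandatory: abstract val
    val isDouble: Boolean            // mandatory: abstract val
    val glass: Option[Glass] = None  // optional: concrete val with a default
  }

  // Only the mandatory members are supplied; glass keeps its default.
  val plain = new OrderOfScotch { val brand = "Bobby Runner"; val mode = Neat; val isDouble = false }

  // The optional member is overridden explicitly.
  val fancy = new OrderOfScotch {
    val brand = "Glenfoobar"; val mode = Neat; val isDouble = true
    override val glass = Some(Tall)
  }
}
```

Leaving out one of the abstract vals makes the anonymous class fail to compile, so the mandatory members are checked statically.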
I guess that's it. Thanks y'all.

Wednesday, July 09, 2008

Type-safe Builder Pattern in Scala

The Builder Pattern is an increasingly popular idiom for object creation. Traditionally, one of its shortcomings in relation to simple constructors is that clients can try to build incomplete objects by omitting mandatory parameters, and that error will only show up at runtime. I'll show how to perform this verification statically in Scala.

So, let's say you want to order a shot of scotch. You'll need to ask for a few things: the brand of the whiskey, how it should be prepared (neat, on the rocks or with water) and if you want it doubled. Unless, of course, you are a pretentious snob, in which case you'll probably also ask for a specific kind of glass, brand and temperature of the water and who knows what else. Limiting the snobbery to the kind of glass, here is one way to represent the order in Scala.

sealed abstract class Preparation  /* This is one way of coding enum-like things in scala */
case object Neat extends Preparation
case object OnTheRocks extends Preparation
case object WithWater extends Preparation

sealed abstract class Glass
case object Short extends Glass
case object Tall extends Glass
case object Tulip extends Glass

case class OrderOfScotch(val brand:String, val mode:Preparation, val isDouble:Boolean, val glass:Option[Glass])

A client can instantiate their orders like this:

val normal = new OrderOfScotch("Bobby Runner", OnTheRocks, false, None)
val snooty = new OrderOfScotch("Glenfoobar", WithWater, false, Option(Tulip));

Note that if the client doesn't want to specify the glass he can pass None as an argument, since the parameter was declared as Option[Glass]. This isn't so bad, but it can get annoying to remember the position of each argument, especially if many are optional. There are two traditional ways to circumvent this problem (define telescoping constructors or set the values post-instantiation with accessors) but both idioms have their shortcomings. Recently, in Java circles, it has become popular to use a variant of the GoF Builder pattern. So popular that it is Item 2 in the second edition of Joshua Bloch's Effective Java. A Java-ish implementation in Scala would be something like this:

class ScotchBuilder {
  private var theBrand:Option[String] = None
  private var theMode:Option[Preparation] = None
  private var theDoubleStatus:Option[Boolean] = None
  private var theGlass:Option[Glass] = None

  def withBrand(b:String) = {theBrand = Some(b); this} /* returning this to enable method chaining. */
  def withMode(p:Preparation) = {theMode = Some(p); this}
  def isDouble(b:Boolean) = {theDoubleStatus = Some(b); this}
  def withGlass(g:Glass) = {theGlass = Some(g); this}

  def build() = new OrderOfScotch(theBrand.get, theMode.get, theDoubleStatus.get, theGlass);
}

This is almost self-explanatory, the only caveat is that verifying the presence of non-optional parameters (everything but the glass) is done by the Option.get method. If a field is still None, an exception will be thrown. Keep this in mind, we'll come back to it later.
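A trimmed-down sketch of that failure mode follows. The two-field `Order` and the names in it are simplifications of mine, not the builder from the post.

```scala
import scala.util.Try

object RuntimeCheckedBuilder {
  final case class Order(brand: String, isDouble: Boolean)

  class Builder {
    private var theBrand: Option[String] = None
    private var theDoubleStatus: Option[Boolean] = None

    def withBrand(b: String): Builder = { theBrand = Some(b); this }
    def isDouble(d: Boolean): Builder = { theDoubleStatus = Some(d); this }

    // Option.get is the only "validation": it throws if a field was never set.
    def build(): Order = Order(theBrand.get, theDoubleStatus.get)
  }

  // Compiles fine even though isDouble was never called.
  val incomplete = Try(new Builder().withBrand("Bobby Runner").build())
}
```

The compiler is perfectly happy with `incomplete`; evaluating it yields a `Failure` wrapping the `NoSuchElementException` thrown by `None.get`.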

The var keyword prefixing the fields means that they are mutable references. Indeed, we mutate them in each of the building methods. We can make it more functional in the traditional way:

object BuilderPattern {
  class ScotchBuilder(theBrand:Option[String], theMode:Option[Preparation], theDoubleStatus:Option[Boolean], theGlass:Option[Glass]) {
    def withBrand(b:String) = new ScotchBuilder(Some(b), theMode, theDoubleStatus, theGlass)
    def withMode(p:Preparation) = new ScotchBuilder(theBrand, Some(p), theDoubleStatus, theGlass)
    def isDouble(b:Boolean) = new ScotchBuilder(theBrand, theMode, Some(b), theGlass)
    def withGlass(g:Glass) = new ScotchBuilder(theBrand, theMode, theDoubleStatus, Some(g))

    def build() = new OrderOfScotch(theBrand.get, theMode.get, theDoubleStatus.get, theGlass);
  }

  def builder = new ScotchBuilder(None, None, None, None)
}

The scotch builder is now enclosed in an object, this is standard practice in Scala to isolate modules. In this enclosing object we also find a factory method for the builder, which should be called like so:

import BuilderPattern._

val order =  builder withBrand("Takes") isDouble(true) withGlass(Tall)  withMode(OnTheRocks) build()

Looking back at the ScotchBuilder class and its implementation, it might seem that we just moved the huge constructor mess from one place (clients) to another (the builder). And yes, that is exactly what we did. I guess that is the very definition of encapsulation, sweeping the dirt under the rug and keeping the rug well hidden. On the other hand, we haven't gained all that much from this "functionalization" of our builder; the main failure mode is still present. That is, having clients forget to set mandatory information, which is a particular concern since we obviously can't fully trust the sobriety of said clients*. Ideally the type system would prevent this problem, refusing to typecheck a call to build() when any of the non-optional fields aren't set. That's what we are going to do now.

One technique, which is very common in Java fluent interfaces, would be to write an interface for each intermediate state containing only applicable methods. So we would begin with an interface VoidBuilder having all our withFoo() methods but no build() method, and a call to, say, withMode() would return another interface (maybe BuilderWithMode), and so on, until we call the last withBar() for a mandatory Bar, which would return an interface that finally has the build() method. This technique works, but it requires a metric buttload of code — for n mandatory fields 2ⁿ interfaces should be created. This could be automated via code generation, but there is no need for such heroic efforts, we can make the typesystem work in our favor by applying some generics magic. First, we define two abstract classes:

abstract class TRUE
abstract class FALSE
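As an aside, and for comparison, the interface-per-state technique described above can be sketched for just two mandatory fields, which already takes 2² = 4 interfaces. All names here are hypothetical, and a single private class implements every state.

```scala
object StagedBuilder {
  final case class Order(brand: String, isDouble: Boolean)

  // One interface per combination of mandatory fields already set.
  trait Empty {
    def withBrand(b: String): HasBrand
    def isDouble(d: Boolean): HasDouble
  }
  trait HasBrand  { def isDouble(d: Boolean): Complete }
  trait HasDouble { def withBrand(b: String): Complete }
  trait Complete  { def build(): Order } // only the final state can build

  // A single implementation backs all states; the interfaces restrict what clients see.
  private final class Impl(brand: Option[String], dbl: Option[Boolean])
      extends Empty with HasBrand with HasDouble with Complete {
    def withBrand(b: String): Impl = new Impl(Some(b), dbl)
    def isDouble(d: Boolean): Impl = new Impl(brand, Some(d))
    def build(): Order = Order(brand.get, dbl.get)
  }

  def builder: Empty = new Impl(None, None)
}
```

With this in place `StagedBuilder.builder.withBrand("Takes").build()` does not compile, because `HasBrand` has no `build()` method. Adding a third mandatory field would double the number of interfaces again, which is what the generic parameters below avoid.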

Then, for each mandatory field, we add to our builder a generic parameter:

class ScotchBuilder[HB, HM, HD](val theBrand:Option[String], val theMode:Option[Preparation], val theDoubleStatus:Option[Boolean], val theGlass:Option[Glass]) {

  /* ... body of the scotch builder .... */

}

Next, have each withFoo method pass ScotchBuilder's type parameters as type arguments to the builders they return. But, and here is where the magic happens, there is a twist on the methods for mandatory parameters: they should, for their respective generic parameters, pass instead TRUE:

class ScotchBuilder[HB, HM, HD](val theBrand:Option[String], val theMode:Option[Preparation], val theDoubleStatus:Option[Boolean], val theGlass:Option[Glass]) {
  def withBrand(b:String) = 
      new ScotchBuilder[TRUE, HM, HD](Some(b), theMode, theDoubleStatus, theGlass)

  def withMode(p:Preparation) = 
    new ScotchBuilder[HB, TRUE, HD](theBrand, Some(p), theDoubleStatus, theGlass)

  def isDouble(b:Boolean) = 
    new ScotchBuilder[HB, HM, TRUE](theBrand, theMode, Some(b), theGlass)

  def withGlass(g:Glass) = 
    new ScotchBuilder[HB, HM, HD](theBrand, theMode, theDoubleStatus, Some(g))
}

The second part of the magic act is to apply the world famous pimp-my-library idiom and move the build() method to an implicitly created class, which will be anonymous for the sake of simplicity:

implicit def enableBuild(builder:ScotchBuilder[TRUE, TRUE, TRUE]) = new {
  def build() = 
    new OrderOfScotch(builder.theBrand.get, builder.theMode.get, builder.theDoubleStatus.get, builder.theGlass);
}

Note the type of the parameter for this implicit method: ScotchBuilder[TRUE, TRUE, TRUE]. This is the point where we "declare" that we can only build an object if all the mandatory parameters are specified. And it really works:

scala> builder withBrand("hi") isDouble(false) withGlass(Tall) withMode(Neat) build()
res5: BuilderPattern.OrderOfScotch = OrderOfScotch(hi,Neat,false,Some(Tall))

scala> builder withBrand("hi") isDouble(false) withGlass(Tall)  build()                
<console>:9: error: value build is not a member of BuilderPattern.ScotchBuilder[BuilderPattern.TRUE,BuilderPattern.FALSE,BuilderPattern.TRUE]
       builder withBrand("hi") isDouble(false) withGlass(Tall)  build()

So, we achieved our goal (see the full listing below). If you are worried about the enormous parameter lists inside the builder, I've posted here an alternative implementation with abstract members instead. It is more verbose, but also cleaner.

Now, remember those abstract classes TRUE and FALSE? We never did subclass or instantiate them at any point. If I'm not mistaken, this is an idiom named Phantom Types, commonly used in the ML family of programming languages. Even though this application of phantom types is fairly trivial, we can glimpse the power of the mechanism. We have, in fact, codified all 2ⁿ states (one for each combination of mandatory fields) as types. ScotchBuilder's subtyping relation forms a lattice structure and the enableBuild() implicit method requires the supremum of the poset (namely, ScotchBuilder[TRUE, TRUE, TRUE]). If the domain requires, we could specify any other point in the lattice: say we can dole out a dose of any cheap whiskey if the brand is not given, this point is represented by ScotchBuilder[_, TRUE, TRUE]. And we can even escape the lattice structure by using Scala inheritance. Of course, I didn't invent any of this; the idea came to me in this article by Matthew Fluet and Riccardo Pucella, where they use phantom types to encode subtyping in a language that lacks it.
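The "any cheap whiskey" variation might look like the sketch below. It is a simplified variant, not the listing from this post: the glass is dropped, `Order` and `Buildable` are names I made up, the default brand is arbitrary, and an implicit class (available in later Scala versions) replaces the anonymous-class view.

```scala
object LatticePoint {
  sealed abstract class Preparation
  case object Neat extends Preparation
  case object OnTheRocks extends Preparation

  final case class Order(brand: String, mode: Preparation, isDouble: Boolean)

  // Phantom types: never instantiated, they only mark which fields were set.
  abstract class TRUE
  abstract class FALSE

  class ScotchBuilder[HB, HM, HD](val brand: Option[String],
                                  val mode: Option[Preparation],
                                  val dbl: Option[Boolean]) {
    def withBrand(b: String) = new ScotchBuilder[TRUE, HM, HD](Some(b), mode, dbl)
    def withMode(p: Preparation) = new ScotchBuilder[HB, TRUE, HD](brand, Some(p), dbl)
    def isDouble(d: Boolean) = new ScotchBuilder[HB, HM, TRUE](brand, mode, Some(d))
  }

  // build() is available whenever mode and double status are set,
  // whatever the state of the brand (the wildcard in first position).
  implicit class Buildable(b: ScotchBuilder[_, TRUE, TRUE]) {
    def build(): Order = Order(b.brand.getOrElse("house whiskey"), b.mode.get, b.dbl.get)
  }

  def builder = new ScotchBuilder[FALSE, FALSE, FALSE](None, None, None)
}
```

After `import LatticePoint._`, the call `builder.withMode(Neat).isDouble(true).build()` typechecks and falls back to the default brand, while `builder.withBrand("hi").isDouble(true).build()` is still rejected because the mode is missing.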

object BuilderPattern {
  sealed abstract class Preparation
  case object Neat extends Preparation
  case object OnTheRocks extends Preparation
  case object WithWater extends Preparation

  sealed abstract class Glass
  case object Short extends Glass
  case object Tall extends Glass
  case object Tulip extends Glass

  case class OrderOfScotch(val brand:String, val mode:Preparation, val isDouble:Boolean, val glass:Option[Glass])

  abstract class TRUE
  abstract class FALSE

  class ScotchBuilder
  [HB, HM, HD]
  (val theBrand:Option[String], val theMode:Option[Preparation], val theDoubleStatus:Option[Boolean], val theGlass:Option[Glass]) {
    def withBrand(b:String) = 
      new ScotchBuilder[TRUE, HM, HD](Some(b), theMode, theDoubleStatus, theGlass)

    def withMode(p:Preparation) = 
      new ScotchBuilder[HB, TRUE, HD](theBrand, Some(p), theDoubleStatus, theGlass)

    def isDouble(b:Boolean) = 
      new ScotchBuilder[HB, HM, TRUE](theBrand, theMode, Some(b), theGlass)

    def withGlass(g:Glass) = new ScotchBuilder[HB, HM, HD](theBrand, theMode, theDoubleStatus, Some(g))
  }

  implicit def enableBuild(builder:ScotchBuilder[TRUE, TRUE, TRUE]) = new {
    def build() = 
      new OrderOfScotch(builder.theBrand.get, builder.theMode.get, builder.theDoubleStatus.get, builder.theGlass);
  }

  def builder = new ScotchBuilder[FALSE, FALSE, FALSE](None, None, None, None)
}

* Did you hear that noise? It's the sound of my metaphor shattering into a million pieces

EDIT 2008-07-09 at 19h00min: Added introductory paragraph.

Tuesday, April 08, 2008

A couple of interesting DSLs

It may not yet be an industry tsunami, but there certainly is a growing wave of interest in Domain Specific Languages. As often happens when thinking about programming language design, there appears to be an excessive concern with syntax and little talk of semantics. Dave Thomas points out how much effort is wasted playing syntactic games to make code look like English; effort that would be better spent identifying and representing the domain. To prove his point he talks about make, Active Record and Groovy builders as examples of successful DSLs. I've stumbled upon some more examples of semantically interesting DSLs in a couple of papers and thought it would be worthwhile to share some stuff I learned in the process.

Kay
Alan Kay is one of the giants in our little science, known for his fearless disposition to carry out "big ideas". His most recent endeavor, partnering with Ian Piumarta and others, is a good example of that. The project is called "Steps Toward the Reinvention of Programming" and aims to build a complete software system, from the metal up to the applications, in under 20 KLOC. Some of that magic will be achieved through, you guessed it, domain specific languages. To quote from the first published report:

We also think that creating languages that fit the problems to be solved makes solving the problems easier, makes the solutions more understandable and smaller, and is directly in the spirit of our “active-math” approach. These “problem-oriented languages” will be created and used for large and small problems, and at different levels of abstraction and detail.

The project is only a year old, so it is understandably far from the goal of a full system. But they have already delivered bits and pieces that give an idea of the path forward. A particularly cool part is their TCP/IP stack implementation. The first step in any networking stack is to unmarshal packet headers according to some specification. For IP we look in RFC-791 and find a lovely piece of ASCII art describing just that:


+-------------+-------------+-------------------------+----------+----------------------------------------+
| 00 01 02 03 | 04 05 06 07 | 08 09 10 11 12 13 14 15 | 16 17 18 | 19 20 21 22 23 24 25 26 27 28 29 30 31 |
+-------------+-------------+-------------------------+----------+----------------------------------------+
|   version   | headerSize |       typeOfService      |                     length                        |
+-------------+-------------+-------------------------+----------+----------------------------------------+
|                     identification                  | flags    |                  offset                |
+---------------------------+-------------------------+----------+----------------------------------------+
|       timeToLive          |         protocol        |                    checksum                       |
+---------------------------+-------------------------+---------------------------------------------------+
|                                               sourceAddress                                             |
+---------------------------------------------------------------------------------------------------------+
|                                             destinationAddress                                          |
+---------------------------------------------------------------------------------------------------------+

Most implementations just hardcode these definitions, and those for TCP, and for UDP, and so on. Of course the footprint of the traditional approach is too high for Kay and Piumarta's purposes. They opted for a seemingly odd technique: just grab the data from the specifications. Here is, in its entirety, the code for unmarshaling IP headers:


                                           { structure-diagram }
+-------------+-------------+-------------------------+----------+----------------------------------------+
| 00 01 02 03 | 04 05 06 07 | 08 09 10 11 12 13 14 15 | 16 17 18 | 19 20 21 22 23 24 25 26 27 28 29 30 31 |
+-------------+-------------+-------------------------+----------+----------------------------------------+
|   version   | headerSize |       typeOfService      |                     length                        |
+-------------+-------------+-------------------------+----------+----------------------------------------+
|                     identification                  | flags    |                  offset                |
+---------------------------+-------------------------+----------+----------------------------------------+
|       timeToLive          |         protocol        |                    checksum                       |
+---------------------------+-------------------------+---------------------------------------------------+
|                                               sourceAddress                                             |
+---------------------------------------------------------------------------------------------------------+
|                                             destinationAddress                                          |
+---------------------------------------------------------------------------------------------------------+
                           ip -- Internet Protocol packet header [RFC 791]

This is actual working code. But wait; didn't they just sweep the parsing dirt under the rug? The rug here being the code to parse these fine-looking tables. Surprisingly, that code is a whopping 27 lines of clean grammar definitions with semantic actions. The "trick", so to speak, is the underlying parsing mojo provided by Piumarta's OMeta system. Which is, by the way, itself implemented in about 40 lines of OMeta code. Yeah, turtles all the way down and all that stuff...
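To make the idea of a layout-driven decoder concrete, here is a minimal sketch in present-day Scala (2.13 or 3). It is not how Kay and Piumarta do it: they parse the ASCII diagram itself, while this sketch assumes the field widths have already been read off the bit ruler and written down as data. The names Field, ipHeaderLayout and decode are hypothetical, chosen only for illustration.

```scala
// Hypothetical names throughout; the widths are copied by hand from the diagram's bit ruler.
final case class Field(name: String, bits: Int)

object HeaderLayout {
  // The five 32-bit rows of the IP header diagram, in order.
  val ipHeaderLayout: List[Field] = List(
    Field("version", 4), Field("headerSize", 4), Field("typeOfService", 8), Field("length", 16),
    Field("identification", 16), Field("flags", 3), Field("offset", 13),
    Field("timeToLive", 8), Field("protocol", 8), Field("checksum", 16),
    Field("sourceAddress", 32),
    Field("destinationAddress", 32)
  )

  // Reads `bits` bits starting at bit position `start`, most significant bit first.
  private def readBits(bytes: Array[Byte], start: Int, bits: Int): Long =
    (start until start + bits).foldLeft(0L) { (acc, i) =>
      val bit = (bytes(i / 8) >> (7 - i % 8)) & 1
      (acc << 1) | bit
    }

  // Walks the layout, accumulating bit offsets, and returns one value per named field.
  def decode(layout: List[Field], bytes: Array[Byte]): Map[String, Long] = {
    val offsets = layout.scanLeft(0)(_ + _.bits)
    layout.zip(offsets).map { case (f, off) => f.name -> readBits(bytes, off, f.bits) }.toMap
  }
}
```

Calling HeaderLayout.decode(HeaderLayout.ipHeaderLayout, packetBytes) on the first 20 bytes of a packet yields a map from field name to value. The decoder knows nothing about IP, so supporting another header is a matter of supplying another list of fields.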

Ok, this is all very cute, but I seem to have fallen into my own trap and can't stop talking about syntax. The next bit of this networking stack is a little more interesting. The problem now is to handle each incoming TCP packet according to a set of specified rules such as "in response to a SYN the server must reply with a SYN-ACK packet". Here is the code:


['{ svc     = &->(svc? [self peek])
    syn     = &->(syn? [self peek]) .   ->(out ack-syn    -1 (+ sequenceNumber 1) (+ TCP_ACK TCP_SYN) 0)
    req     = &->(req? [self peek]) .   ->(out ack-psh-fin 0 (+ sequenceNumber datalen (fin-len tcp))
                                                             (+ TCP_ACK TCP_PSH TCP_FIN)
                                                             (up destinationPort dev ip tcp
                                                                 (tcp-payload tcp) datalen))
    ack     = &->(ack? [self peek]) .   ->(out ack           acknowledgementNumber
                                                             (+ sequenceNumber datalen (fin-len tcp))
                                                             TCP_ACK 0)
            ;
    ( svc (syn | req | ack | .) | .     ->(out ack-rst       acknowledgementNumber
                                                             (+ sequenceNumber 1)
                                                             (+ TCP_ACK TCP_RST) 0)
    ) *
  } < [NetworkPseudoInterface tunnel: '"/dev/tun0" from: '"10.0.0.1" to: '"10.0.0.2"]]

The set of rules above is nothing more than a grammar, and the code to construct response packets is implemented as the corresponding semantic actions. This snippet is much harder to grok than the pretty tables we just saw, but it's much more interesting as well. The crucial idea is to pattern-match on the stream of incoming packets looking for flag patterns and respond accordingly. What's nice is that they already had a well-honed pattern matching language in OMeta's Parsing Expression Grammars. I should note that a powerful parsing engine is not a Golden Hammer, but it is a useful and underutilized computational model. And realizing that is much more significant, IMHO, than the procrustean effort of trying to shoehorn "natural language" text into a general purpose programming language.
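For readers more at home with Scala than with OMeta, the flavour of "match on flags, emit a reply" can be approximated with ordinary pattern matching. This is only a rough sketch under loud assumptions: it matches one segment at a time rather than running a grammar over the packet stream, the Segment and Reply types are hypothetical, and the flag tests are simplified guesses at what syn? and ack? check in the original.

```scala
object TcpRules {
  // Hypothetical, heavily simplified view of an incoming TCP segment.
  final case class Segment(syn: Boolean, ack: Boolean, psh: Boolean, seq: Long, dataLen: Int)

  // Hypothetical reply descriptions; a real stack would build full packets here.
  sealed trait Reply
  final case class SynAck(ackNo: Long) extends Reply
  final case class Ack(ackNo: Long) extends Reply
  final case class Reset(ackNo: Long) extends Reply

  // Each case plays the role of one grammar rule plus its semantic action.
  def respond(s: Segment): Reply = s match {
    case Segment(true, false, _, seq, _)   => SynAck(seq + 1)  // a SYN is answered with SYN-ACK
    case Segment(false, true, _, seq, len) => Ack(seq + len)   // data or a bare ACK is acknowledged
    case Segment(_, _, _, seq, _)          => Reset(seq + 1)   // anything else is reset
  }

  // Applies the rules to a whole sequence of incoming segments.
  def respondAll(in: List[Segment]): List[Reply] = in.map(respond)
}
```

What this sketch cannot express, and the grammar can, is rules that span several packets in sequence; that is where a parsing engine earns its keep.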

Pierce
Now let's turn our attention to a different kind of DSL, developed for the Harmony project led by Pierce at UPenn. The problem to be solved here is synchronizing bookmark data among different browsers. Seems a far cry from "reinventing computer programming", but there are some intricacies involved, as we shall soon see. The basic approach taken is to transform each browser-specific representation to an abstract view, synchronize the data in this abstract form, and then propagate it back to the concrete form. If you squint hard enough you can see these transformations are a form of the general "view-update problem" known from database literature. Instead of extracting a view from a set of tables, updating it, and propagating the changes back to the original tables, they get an abstract representation from a concrete one, update it (the synchronization proper), and put the modified data back into the original concrete format. So, one concrete input for Mozilla would be the following html-ish file:


<html>
 <head> <title>Bookmarks</title> </head>
 <body>
  <h3>Bookmarks Folder</h3>
  <dl>
   <dt> <a href=\"www.google.com\"
           add_date=\"1032458036\">Google</a> </dt>
   <dd>
    <h3>Conferences Folder</h3>
    <dl>
     <dt> <a href=\"www.cs.luc.edu/icfp\"
             add_date=\"1032528670\">ICFP</a> </dt>
    </dl>
   </dd>
  </dl>
 </body>
</html>

The abstract representation of that data is:


{name -> Bookmarks Folder
 contents ->
  [{link -> {name -> Google
              url -> www.google.com}}
   {folder ->
     {name -> Conferences Folder
      contents ->
       [{link ->
         {name -> ICFP
          url -> www.cs.luc.edu/icfp}}]}}]}

That's a textual representation of a labeled tree. Each {..} is a tree node, subnodes are identified by a label (i.e. label -> {...} ) and anything inside square brackets is a list. So, basically we have two tree "schemas" and wish to translate between them. Here is where domain specific languages come into play. We could naively propose to just whip up a couple of XSLTs and be done with it. The get direction would be trivial, but the putback is trickier. Note that the abstract representation lacks information about the add_date of the bookmarks; this is because not all browsers store this data. In a way, the abstract format is a minimal subset of the kinds of data that each browser is interested in. So, the putback of new bookmarks coming from non-Mozilla browsers could just default to some arbitrary date value, but we don't want to lose the data we have for existing bookmarks! This rules out using a simple stylesheet for the putback.

Essentially, this is why the view-update problem is an interesting research question. The relevance to this post is the path Harmony's team chose to solve it: building a bidirectional language*. It's similar to a functional language, but instead of functions they have lenses. A lens is a pair of functions, one for get (from the concrete to the abstract) and one for putback. The putback function takes a modified abstract element and the original concrete element, and maps them to an updated concrete element.

Rephrasing more formally, though diverging from the notation used in the paper, a lens L would be a pair of functions (Lg, Lp). Using A as the domain of abstract elements and C for the domain of the concrete elements, the functions would be defined in:

Lg: C -> A
Lp: (A x C) -> C

So far we haven't gained much, but the above definitions allow us to express our requirements of "information conservation" as equations:

Lp(Lg(c), c) = c for all c in C
Lg(Lp(a, c)) = a for all a in A and c in C
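
As a sanity check on these definitions, here is a minimal Scala sketch (present-day Scala 2.13 or 3). It is not Harmony's implementation; the Lens case class, the two link types and the helper names getPut and putGet are all hypothetical, picked to mirror Lg, Lp and the two equations above, with the add_date case from the bookmark example as the running instance.

```scala
object LensSketch {
  // get plays the role of Lg: C -> A, putback the role of Lp: (A x C) -> C.
  final case class Lens[C, A](get: C => A, putback: (A, C) => C)

  // Hypothetical concrete (Mozilla-like) and abstract link representations.
  final case class ConcreteLink(name: String, url: String, addDate: Long)
  final case class AbstractLink(name: String, url: String)

  // The abstract view drops add_date; putback restores it from the original concrete link.
  val link: Lens[ConcreteLink, AbstractLink] = Lens[ConcreteLink, AbstractLink](
    get = c => AbstractLink(c.name, c.url),
    putback = (a, c) => ConcreteLink(a.name, a.url, c.addDate)
  )

  // First equation: putting back an unmodified view changes nothing.
  def getPut[C, A](l: Lens[C, A], c: C): Boolean = l.putback(l.get(c), c) == c

  // Second equation: what was put back is exactly what get returns afterwards.
  def putGet[C, A](l: Lens[C, A], a: A, c: C): Boolean = l.get(l.putback(a, c)) == a
}
```

Checking a couple of sample values this way proves nothing, of course; the appeal of the paper's approach is that well-formedness is established once per combinator rather than tested per program.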

Any lens obeying these equations can be called "well-formed". To exemplify, here is the identity lens (the arrow pointing up is the get and the opposite is the putback):

Absolutely uninteresting, as to be expected from any identity operation, but note that last line. It is the lens' type signature. Yes, this DSL has a full-blown type system! Take a look at a more interesting example, the map lens, which is analogous to the map function from functional programming:

The behavior isn't complicated, map is parametrized by another lens l, and just applies it to each subnode in the concrete tree for get. In the putback direction, it also just applies the putback for each element in the abstract tree, relying on the assumption that l behaves correctly for nodes missing in c. Now look at the scary type signature, which, to be perfectly honest, I don't fully comprehend myself. It is there to assure the well-formedness of map (and a couple of other properties), based on the type of l.
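
The behavior of map can be approximated in Scala as follows. This is a simplified sketch, not the paper's definition: a Map from labels to children stands in for a tree node, the type signature is plain Scala rather than Harmony's, and putback receives an Option so that "node missing in c" can be written as None. The names mapLens, Dated and urlOnly, and the default date of 0, are hypothetical.

```scala
object MapLensSketch {
  // putback gets None when the concrete side has no counterpart for the abstract value.
  final case class Lens[C, A](get: C => A, putback: (A, Option[C]) => C)

  // Lifts a lens l on children to a lens on whole nodes, label by label.
  def mapLens[C, A](l: Lens[C, A]): Lens[Map[String, C], Map[String, A]] =
    Lens[Map[String, C], Map[String, A]](
      get = c => c.map { case (label, child) => label -> l.get(child) },
      putback = (a, c) => a.map { case (label, child) =>
        // Look up the old concrete child under the same label, if there is one.
        label -> l.putback(child, c.flatMap(_.get(label)))
      }
    )

  // Hypothetical child lens: the abstract side keeps only the URL.
  final case class Dated(url: String, addDate: Long)
  val urlOnly: Lens[Dated, String] = Lens[Dated, String](
    get = _.url,
    putback = (u, old) => Dated(u, old.map(_.addDate).getOrElse(0L)) // 0 is an arbitrary default
  )
}
```

With mapLens(urlOnly), an existing bookmark keeps its add_date through a putback, while a bookmark that is new on the abstract side gets the default, which is exactly the burden the text says map places on l.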

Lest I reproduce the entire paper here, and completely butcher it in the process, I'll cut to the final chapter of the story and show the program that maps between Mozilla's bookmark format and the abstract representation:


link =
  hoist *;
  hd [];
  hoist a;
  rename * name;
  rename href url;
  prune add_date {$today};
  wmap {name -> (hd []; hoist PCDATA)}

folder =
  hoist *;
  xfork {*h} {name}
    (hoist *t;
     hd [];
     rename dl contents)
  wmap {name -> (hoist *;
                 hd [];
                 hoist PCDATA)
        contents -> (hoist *;
                     list_map item)}

item =
  wmap {dd -> folder, dt -> link};
  rename_if_present dd folder;
  rename_if_present dt link

bookmarks =
  hoist html;
  hoist *;
  tl {|head --> {| * --> [ {|title --> {|* -->
          [{|PCDATA --> Bookmarks|}]|}|}]|}|};
  hd [];
  hoist body;
  folder

I'm afraid I'm unable to explain much of this program without going deeper than is appropriate here. But see how much is accomplished in probably fewer lines of code than would be required for expressing a mere one-way transformation in XSLT. And of course, the whole bookmark synchronization thing is just a toy problem; the resulting system is powerful enough to tackle larger beasts. See the other papers on the project website, where they apply lenses to the traditional relational view-update problem, to character strings and to data replication in distributed settings.

Wrapping up

DSL design is language design, and that involves more than just syntax.
Sometimes a well-known computational model can be repurposed to fit domain requirements, like Kay and Piumarta did when adopting a grammar parsing engine to process a packet stream.
Automata and similar non-Turing-complete models can be very useful in Domain Specific Languages.
Sometimes it pays to develop whole new semantics for your DSL.
Of course there is little that is actually "wholly new" in the world. Taking Harmony's semantics for example, the team did apply a lot of domain theory to prove the totality of their lens combinators.
If you can identify invariants that are hard to get right, it may pay to express them in a type system. But be warned that this is no child's play; even Pierce didn't go all the way to building a typechecker for his lenses.
Have fun!

Wednesday, August 30, 2006

A taste of Scala

I've recently been trying to learn Scala, a programming language developed at a Swiss university. It has many (many!) cool features, such as seamless interoperability with Java (a result of being compiled to JVM bytecode), strong support for functional programming, sophisticated object-oriented characteristics and a strong static type system. Rather than continue listing the language's capabilities I will, instead, share a personal use case.

OK, so one of the first things I looked at was the unit testing framework, SUnit. It comes bundled in the standard library in the scala.testing.SUnit package and is dead simple. I hope the following snippet is self-explanatory enough:

object StackTest {
  import scala.testing.SUnit._

  // Runs both test cases and prints any failures to the console.
  def main(args:Array[String]): Unit = {
    val tr = new TestResult
    new TestSuite(
      new Test01,
      new Test02
    ).run(tr)

    for(val f <- tr.failures())
      Console println f
  }

  // One inner class per test case; the constructor argument is the description.
  class Test01 extends TestCase("pushing an element onto an empty stack") {
    override def runTest() = {
      val stack = new Stack()
      val element = "asdf"
      stack push element
      assertEquals(stack.peek(), element)
    }
  }

  class Test02 extends TestCase("popping an element from a stack") {
    override def runTest() = {
      val stack = new Stack()
      val element = "asdf"
      stack push element
      stack.pop()
      assertEquals(stack.isEmpty, true)
    }
  }
}

It really is pretty simple; the whole testing framework sits in a single 200-line file. But it is also a bit verbose, isn't it? All those inner classes, cluttering the code... I tried to simplify things a bit. Here is what I came up with:

import sorg.testing._;

object StackTests extends Tests with ConsoleDriver {
  test ("pushing an element onto an empty stack") {
    val stack = new Stack()
    val element = "asdf"
    stack push element
    assertEquals(stack.peek(), element)
  }

  test ("popping an element from a stack") {
    val stack = new Stack()
    val element = "asdf"
    stack push element
    stack.pop()
    assertEquals(stack.isEmpty, true)
  }
}

I think it looks better. Sort of like those DSLs that are so fashionable these days... But the really cool thing is that it only took a couple dozen lines of code and a couple of hours to extend SUnit. Mind you that someone really proficient in Scala could probably do it much more quickly. See the whole unit testing domain specific language: (the name is almost larger than the code itself :)

package sorg.testing;
import scala.testing.SUnit._;

abstract class Tests extends Test with Assert {
  type TestExp = () => Unit;
  var tests = List[Pair[String,TestExp]]();

  def test(desc: String)(t: => Unit) : Unit = {tests = Pair(desc,()=>t) :: tests};

  override def run(tr: TestResult) = {
    for (val Pair(desc, expression) <- tests) new TestCase(desc) {
       override def runTest() = {Console println "running (" + desc + ")"; expression()}
    }.run(tr)
  }
}

trait ConsoleDriver extends Test {
  def main(args:Array[String]): Unit = {
    val results = new TestResult
    Console println "running tests..."

    this.run( results )

    if (!results.failures.hasNext)
      Console println "Success!";
    else {
      Console println "The following tests failed:";
      for(val each:TestFailure <- results.failures)
        Console println (each.toString + ":\n" + each.trace);
    }
  }
}
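
The piece doing the heavy lifting above is the by-name parameter t: => Unit, which lets test store the block instead of running it. Since scala.testing.SUnit is no longer shipped with the standard library, here is a self-contained sketch of the same trick in present-day Scala (2.13 or 3). The names MiniTests, test and runAll are hypothetical and this is an illustration of the technique, not a port of the code above; unlike the original, which prepends to a list, it runs tests in registration order.

```scala
object MiniTests {
  import scala.collection.mutable.ListBuffer
  import scala.util.Try

  // Each registered test is a description plus a thunk wrapping the by-name body.
  private val tests = ListBuffer.empty[(String, () => Unit)]

  // `body` is passed by name, so the block after test("...") is stored, not run, at registration.
  def test(desc: String)(body: => Unit): Unit =
    tests += (desc -> (() => body))

  // Runs every test in registration order and returns the descriptions of those that threw.
  def runAll(): List[String] =
    tests.toList.collect { case (desc, run) if Try(run()).isFailure => desc }
}
```

After import MiniTests._, a call such as test("adds") { assert(1 + 1 == 2) } registers a test, and runAll() returns the descriptions of the tests whose bodies threw.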

Expressiveness and power, what more can one ask of a programming language?

Monday, July 10, 2006

add1

One was missing.

Is The Little Schemer a good book?

Yes, it is.

For what?

For learning functional programming.

How does it teach functional programming?

The reader sees a question, thinks a little, compares with the book's answer, and then does the same thing for the next question.

After finishing the book, can you go off and start programming in Lisp or Scheme?

No, you can't.

OK, the book doesn't focus on practice. But is it strong on theory?

Not very. Concepts such as continuations and closures are treated informally. Lambda calculus isn't even mentioned.

Hmm. Then why is it a good book?

For learning functional programming.

Wednesday, May 10, 2006

Language advocacy

Lisp: the author makes a successful effort to show Lisp's virtues to the uninitiated. This tends not to be an easy task, since the sources of Lisp's superiority are the very same things that make it look so alien to programmers versed in ALGOL-descended languages.
Scheme: Great language, naive writing. Scheme deserves better.
Smalltalk: This article highlights very well a lot of the points that struck me personally when I first came into contact with Smalltalk.
More smalltalk: "Smalltalk: Requiem or Resurgence?" I'm not very optimistic... But, who knows?
More lisp: When you get bored with reading about computer programming languages, you can relax, sit back, and enjoy this video about computer programming languages.