Saturday, October 10, 2009

Type-safe printf in scala ‽

All the excitement surrounding Scala's next big release - 2.8.0 - reminds me of another big release a few years ago. Java 5 was poised to be a huge leg up for developers, long feature lists were all over the java blogosphere: generics, enums, enhanced for-loops, autoboxing, varargs, util.concurrent, static imports, and so on; in sum, a whole heap of goodness. But there were a few odd ducks hidden in the feature lists, most notably "an interpreter for printf-style format strings".

Anyone who has been around curly-brace-land long enough knows that printf (or java.util.Formatter) serves one main purpose: it's a poor substitute for built-in variable interpolation in the language. Unfortunatly Scala also lacks variable interpolation and there isn't much we can do about it: our BDFL has ruled. But, there is another use for printf, as a shortcut for applying special formatting for some value types. We can specify a currency format using the string "%.2f USD", or a short american date format with "%tD". We can even go wild and use the two together:

In Java:
String.format("%.2f USD at %tD", 133.7, Calendar.getInstance());

In Scala:
"%.2f USD at %tD" format (133.7, Calendar.getInstance)

This snippet is saying that 133.7 should be formatted as a decimal with two digits after the point, and Calendar.getInstance() - the horrific Javaism for "right now" - should be formatted as a date, not a time. What always trips me up is that the order of the values must exactly correspond to the order of the format specifiers. It's a simple task, but my tiny little brain keeps messing it up. Let's see if our good friend the Compiler can help.

Formatters
The first step is to have our logic leave it's String hideout and show itself. So, instead of "%.2f", we'll say F(2)*, and instead of "%tD" we'll say D(T). D and F are now Formatters:

trait Formatter[E] {
 def formatElement(e: E):String
}
 
case class F(precision: Int) extends Formatter[Double] {
 def formatElement(number: Double) = ("%."+precision+"f") format number
}
 
import java.util.Calendar
abstract class DateOrTime
object T extends DateOrTime
object D extends DateOrTime

case class T(dateOrTime: DateOrTime) extends Formatter[Calendar] {
 def formatElement(calendar: Calendar) = dateOrTime match {
   case Time => "%tR" format calendar
  case Date => "%tD" format calendar
 }
}

Chains
Next we tackle the issue of how to chain formats together. The best bet here is to use a composite, very similar to scala.List ***. We even have a :: method to get right-associative syntax. This is how it looks like:

val fmt = F(2) :: T(D) :: FNil

And this is the actual, quite unremarkable, code:

trait FChain {
  def :(formatter: Formatter) =
    new FCons(formatter, this)

  def format(elements: List[Any]):String
}

object FNil extends FChain {
  def format(elements: List[Any]):String = ""
}

case class FCons(formatter: Formatter, tail: FChain) extends FChain { 
  def format(elements: List[Any]):String = 
    formatter.formatElement(elements.head) + tail.format(elements.tail)
}

There is still a missing piece. Remember "%.2f USD at %tD"? We have no way of chaining the " USD at" to our formatters. This is what we want to be able to write:

val fmt = F(2) :: " USD at " :: T(D) :: FNil

The solution is simple, we overload :: in FChain:

trait FChain {
  ...
  def ::(constant: String) =
    new FConstant(constant, this)
  ...
}

and create a new type of format chain that appends the string constant:

case class FConstant(constant:String, tail: FChain) extends FChain { 
  def format(elements: List[Any]):String = 
    constant + tail.format(elements)
}

Cool, that works. But wait; what have we gained so far? The problem was to match the types of the formatters with the types of the values, and we aren't really using types at all. The remedy, of course, is to keep track of them.

Cue the Types
We want to check the types of the values passed to the FChain.format(), but this method currently takes a List[Any], a List of anything at all. We could try to parameterize it somehow and make it take a List[T], a list of some type T, instead. But, if we take a List[T], it means all values must be of the same type T, and that's not what we want. For instance, in our running example we want a list of a Double and a Calendar and nothing more.

So, List doesn't cut it. Fortunately, the great Jesper Nordenberg created an awesome library, metascala, that contains an awesomest class: HList. It is kind of like a regular List, with a set of similar operations. But it differs in an important way: HLists "remember" the types of all members of the list. That's what the H stands for, Heterogeneous. Jesper explains how it works here.

We'll change FChain to remember the required type of the elements in a type member, and to require this type in the format() method:

trait FChain {
  type Elements <: HList
  ...
  def format(elements: Elements):String
}
FNil is pretty trivial, it can only handle empty HLists (HNil):
object FNil extends FChain {
   type Elements = HNil

   def format(elements: Elements):String = "" 
}
FCons is somewhat more complicated, it is parameterized on the type of the head element, and on the type of the rest of the chain:
case class FCons[E, TL <: HList](formatter: Formatter[E], tail: FChain { type Elements = TL }) extends FChain { 
  type Elements = HCons[E, TL]

  def format(elements: Elements):String = 
    formatter.formatElement(elements.head) + tail.format(elements.tail)
}
We also had to tighten-up the types of the constructor parameters: formatter is now Formatter[E] ** — so it can format elements of type E, and tail is now FChain{type Elements=TL} — so it can format the rest of the values. The Elements member is where we build up our list of types. It is an HCons: a cons pair of a head type - E - and another HList type - TL. We changed how to FCons constructor parameters, so we also need to change the point where we instantiate it, in FChain:
trait FChain {
  type Elements <: HList
   ...
  def ::[E](formatter: Formatter[E]) =
    new FCons[E, Elements](formatter, this)
  ...
}
Just passing along the type of the formatter and of the elements so far to FCons. FConstant has to be changed in an analogous way. This is it, now format() only accepts a list whose values are of the right type. Check out an interpreter session:

scala> (F(2) :: " USD at " :: T(D) :: FNil) format (133.7 :: Calendar.getInstance :: HNil)
res5: java.lang.String = 133.70 USD at 10/10/09

scala> (F(2) :: " USD at " :: T(D) :: FNil) format (Calendar.getInstance :: 133.7 :: HNil)
:25: error: no type parameters for method ::: (v: T)metascala.HLists.HCons[T,metascala.HLists.HCons[Double,metascala.HLists.HNil]] exist so that it can be applied to arguments (java.util.Calendar)
 --- because ---
no unique instantiation of type variable T could be found
       (F(2) :: " USD at " :: T(D) :: FNil) format (Calendar.getInstance :: 133.7 :: HNil)

Some random remarks
  • The interpreter session above nicely showcases the Achilles heel of most techniques for compile-time invariant verification: the error messages are basically impenetrable.
  • A related issue with this kind of metaprogramming is that it's just plain hard. The code in this post looks pretty simple (compared with, say, JimMcBeath's beautiful builders), but it took me days of fiddling around with metascala to find an adequate implementation.
  • Take the above two points together and it's clear that we are talking about a niche technique. Powerful, but not for everyday coding.
  • Since I've mentioned the word coding, the FChain structure showed here looks like a partial encoding of a stack-automaton. Stack automata can represent context-free grammars, if I remember my college classes correctly. That said, I don't see any particular use for this tidbit of information.
  • Since this is random remarks section, I'll randomly remark that we can implement the whole thing without type members and refinements. Just good old type parameters in action.
  • Since it is possible to use nothing but type parameters, and Java has type parameters, would it be possible to implement our type-safe Printf in Java? Quite possibly, a good way to start would be with Rúnar Óli's HLists in Java. Just take care not to get cut wading through all those pointy brackets. 
  • A powerful type system without type inference is useless. Quoting  Benjamin Pierce: "The more interesting your types get, the less fun it is to write them down".


The whole code:
import metascala._
import HLists._

object Printf {
 trait FChain {
  type Elements <: HList

  def ::(constant: String) =
   new FConstant[Elements](constant, this)
  
  def ::[E](formatter: Formatter[E]) =
   new FCons[E, Elements](formatter, this)

  def format(elements: Elements):String
 }

 case class FConstant[ES <: HList](constant:String, tail: FChain { type Elements = ES }) extends FChain { 
  type Elements = ES

  def format(elements: Elements):String = 
   constant + tail.format(elements)
 
 }
 
 object FNil extends FChain {
  type Elements = HNil
  def format(elements: Elements):String = ""
 }

 case class FCons[E, TL <: HList](formatter: Formatter[E], tail: FChain { type Elements = TL }) extends FChain { 
  type Elements = HCons[E, TL]

  def format(elements: Elements):String = 
   formatter.formatElement(elements.head) + tail.format(elements.tail)
 
 }
 
  trait Formatter[E] {
   def formatElement(e: E):String
  }
   
  case class F(precision: Int) extends Formatter[Double] {
   def formatElement(number: Double) = ("%."+precision+"f") format number
  }
   
  import java.util.Calendar
  abstract class DateOrTime
  object T extends DateOrTime
  object D extends DateOrTime
  
  case class T(dateOrTime: DateOrTime) extends Formatter[Calendar] {
   def formatElement(calendar: Calendar) = dateOrTime match {
     case T => "%tR" format calendar
    case D => "%tD" format calendar
   }
  }
 }

* Or F(precision=2) thanks to Scala 2.8 awesome named parameter support.
** In fact, the previous untyped version didn't even compile, as Formatter has always been generic. Sorry for misleading you guys.
*** If you are unfamiliar with how Scala lists are built, check out this article.

2 comments:

Quintesse said...

I'm not sure where the value is in this. One of the advantages of the original method, a single string with formatters, is that you can easily translate it for example. You just give the entire text to a translator and either tell them to leave the formatters alone, only moving them to their correct grammatical position in the sentence, or adjust the formatters to the local customs if necessary.
With your solution that possibility is completely gone.

I must also say I find it:
a) hardly readable, and
b) I don't see the advantage over something like this:

(133.7).format(0, 2) + " USD at " Calendar.getInstance().format(T)

(this is non-working code of course, but I'm sure something like this could be made)

Cheers

Rafael de F. Ferreira said...

Hi Quintesse. Yes, it should be possible to write the code in the manner you propose, applying the pimp-my-library pattern to the Int and Calendar classes to add a format method to them. The repetition of ".format(...)" for each argument would be ugly, but not that much.

But, to be honest, practicality wasn't my main goal in this exercise. Foremost, because most of time I have to use printf (or .format) is in order to concatenate .toString()'s without any special formatting. The most elegant solution to this case seems to be plain variable interpolation, which sadly Scala lacks.