Wednesday, August 29, 2012

A Practitioner's requests for Programming Language Research

I have been interested in programming languages and programming language theory for some time, CTM is my favorite technical book, and I even managed to wade through the first chapters in Pierce's TAPL (some day I'll get through the rest, some day...). But it's not often that I can connect that interest with by job as practicing programmer. This post is an attempt to forget about my particular research interests and try to list my daily pain points and wants as a user of programming languages.

If this is to make any sense, I have to start outlining the context: mostly back-end-related web development, currently with a content-based website, with past consulting stints at more and less enterprisey environments. I won't presume to represent any particular segment of the programming population, but I would be surprised if my peeves are unique.

Semi-automatic mapping between data representations.

Most programs receive data in one or more external forms — database rows, JSON documents, ASN.1 records, form data and many others — transform them to internal representations for computation and write them out again. Reflection enables generic marshaling libraries so we can escape the drudge of writing and maintaining all that conversion code by hand. There is a whole  research area for doing about the same in typed functional programming — polytypic programming — but I know next to nothing about it, I'm afraid.

All this is great, but in practice a completely generic mapping algorithm is rarely sufficient. It forces one of the two representations to bend to conform to the other: who hasn't seen horrible schema-generated classes or ugly auto-generated xml? On the other hand, when we try to sophisticate the mapping to give developers more flexibility, we get ever more cumbersome configuration options cluttering our code. We need something better and I hope PLT can help; it could be an elegantly terse language for writing the mappings, it could be an ingenious abstraction mechanism for composing generic mapping with custom instructions, or it could be something else entirely.

Library versioning

The issues around library versioning and dependency management are well known and have been for a long time, generation after generation of technology creating it's own version dependency hell. The situation is so dire, just enabling the coexistence of many versions of a library on a single machine is hailed as a major breakthrough.

A language based approach can afford to be much more ambitious. Most module systems I'm familiar with handle compatibility by having client-supplied rules preside over version numbers defined by the library provider. The provider, in turn, must decide wholesale for his entire library what level of compatibility he thinks each release can maintain. In practice there is a lot of guessing going on both on the client and the provider side, mediated by the very lossy medium that is the version number.

Perhaps things need not be so complicated if the language gets in on the action. Imagine if we could indicate to our compiler "this little change in this method here is supposed to be compatible, while this other change over there is not" and the information would trickle down from caller to caller all the way to the public API surface. The compiler would of course flag a call from a compatible method to a non-compatible method, and perhaps a theorem prover/counterexample generator could pinpoint places where the developer unwittingly broke compatibility. If our modules are first-class, we can even contemplate having several versions of them interoperating in runtime to satisfy transitive dependencies differences.

Data evolution

If the behaviour evolution problem described on the item above is solved, the next issue to tackle is the evolution of data. In a way it's a harder problem to face, as it's not enough to track what changed, we need to know why it changed (simple example: the User record has a surname String field on version one; on version two this field is gone and a new last_name field is there; should it be considered a rename?).

In the context of server-side web applications I don't see a particular need for in-memory application data evolution (clusters alleviate the need for hot-deploys). But migrating externally stored data is a real necessity. As far as I know, the state of the art now amounts to filtering and sorting manually written database scripts to run at deploy time, perhaps with some syntactic sugar sprinkled on top.

But if we note that on the moment the data declaration code is being changed, the developer knows why it's changing, we just need a way to record this intent and tie it to the mapping and change structures from the previous two items, and voilĂ . Things sure sound easy on blog posts, don't them?

Tooling

I've written about this before, so I'll just briefly lament what a shame it is we are trapped between monstrous cockpit-like IDEs and dumb-as-a-rock text editors, when the obvious path of a language-aware editor seems to interest almost no one. By the way, I wish the Light Table guys all the luck.

Non-issues

From my biased perspective as a practitioner on my particular domain, there are some problems tackled by programming language research that I don't see as particularly pressing.
  • Sequential performance: Cache is king.
  • Parallel performance: On the server, multicore parallelism is very well exploited by request-level concurrency and virtualization.
  • Concurrency: STM and Actors are nice but a very large portion of the concurrency issues in practice are simply delegated to the database (and I haven't the foggiest idea what concurrency support those guys crave for, but I bet many are content with locks, monitors and semaphores).
  • Correctness: We have bugs, of course, but they seldom present themselves as classical broken invariants.  What are they then? Faults of omission, unintended interactions between separate systems, API usage misunderstandings. I'm skeptical on language help for these, but I'd be glad to be proven wrong.
Seems like a good time to repeat that on this post I'm trying to connect my particular needs as a developer on my current domain to what programming language research might accomplish. I'd expect the issues and non-issues to be different for other domains, or even for other developers in a similar domain but with a different background.

Cake

Who doesn't love cake? No, not this cake, this cake. Hmm.

3 comments:

Andrei said...

The problem is the difference between what would be more interesting to the daily practice of developers, and what is valued in research communities. Versioning and tooling are two aspects that aren't particular popular between researchers. Making good tools is a lot of work but it's not considered good research, so few researchers will dedicate the time to something that won't produce many papers.

On the other hand, there are "hot" topics in research that often are less important in practice but generate lots of papers.

Rafael Ferreira said...

@Andrei

I don't want to come across as if I believe research goals should be directed by industry needs. It's a complicated relationship, and I don't see it getting any simpler.

My intent with this post was more towards hightlighting some real needs that could benefit from research and aren't usually cited when academics talk about the industry impact of their research.

I know you didn't said this but the post may have left the impression that I think research should be more engineering-oriented. On the contrary, I think the points I mentioned all could benefit from a lot of theoretical work.

Thanks for commenting, Andrei.

Cynthia Burns said...

Then, while programming can be left alone to do its thing after completion, it still needs human intervention. As you have said, there are auto-generated xml that are ugly, thus needing the logical thinking processes of a human being.