Sunday, October 21, 2012

Security and Objects

Mobile code is one of the great challenges for software security. Let's say you are writing an email application. The idea that people could send little apps to each other in email messages might seem like a potentially interesting feature: users could build polls, schedule meetings, play games, share interactive documents. Kind of cool.

And if the platform you are building upon supports reflectively evaluating code, it could be as easy as something like this (in OO pseudocode):

    define load_message(message)
        ...
        eval(message.code)

Of course it can't be that easy. What if the code in the message does something like:

    new stdlib.io.File("/").delete()

The standard way to avoid the vulnerability is to put the code in a so-called sandbox. It sounds very secure, but in practice this usually amounts to gathering up a list of "dangerous" call sites and inserting in each some code to check if the caller has permission to proceed. So the implementation of delete would include code along the lines of:

    define delete()
        if VM.callStackContainsEvilCode?()
            raise YouShallNotPassException()

        ...

This is fraught with problems. It requires runtime support for inspecting the call stack and a system for declaring that certain code has some level of authorization while some other code has a lower level. Not to mention the busywork of going through the code and peppering that little snippet over every suspect call site. Miss one (say, a method that returns the addresses of all the contacts in your email application) and you have a security bug on your hands.
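To make the failure mode concrete, here is a rough Python sketch of the stack-inspection approach. Everything in it is made up for illustration: the names `delete_file`, `list_contacts`, and `load_message`, and the trick of marking untrusted code by running it with a special `__name__`. Walking frames this way also leans on CPython's frame objects, so treat it as a toy, not as how any real sandbox works.

```python
import inspect


class YouShallNotPass(Exception):
    pass


def call_stack_contains_untrusted_code():
    # Walk up the call stack looking for a frame that belongs to message code.
    # Assumption: untrusted code runs with __name__ set to "untrusted_message".
    frame = inspect.currentframe()
    while frame is not None:
        if frame.f_globals.get("__name__") == "untrusted_message":
            return True
        frame = frame.f_back
    return False


FILES = {"/notes.txt": "hello"}      # toy stand-in for a file system
CONTACTS = ["alice@example.com"]     # toy stand-in for the address book


def delete_file(path):
    # A "dangerous" call site that somebody remembered to guard.
    if call_stack_contains_untrusted_code():
        raise YouShallNotPass("delete_file")
    del FILES[path]


def list_contacts():
    # A call site that nobody remembered to guard.
    return list(CONTACTS)


def load_message(code):
    # The message can reach everything the application can reach;
    # only the guarded call sites push back.
    message_globals = {
        "__name__": "untrusted_message",
        "delete_file": delete_file,
        "list_contacts": list_contacts,
    }
    exec(code, message_globals)
    return message_globals
```

A message containing `delete_file('/notes.txt')` is stopped by the check, but one containing `stolen = list_contacts()` walks away with the address book, because that call site was never on the list.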

A better way?

Perhaps there is a better way. Take another look at the offending line: new stdlib.io.File("/").delete(). It is only able to call the dangerous delete() method because it has a reference to a file object pointing to the root of the filesystem. And it only has that reference because it could reach for the File class in a global namespace. What if there were no global namespace?

It might seem weird, but it's not that hard to imagine a programming system lacking a global namespace. Many object-oriented languages, following Smalltalk's lead, have a notion of a metaclass, an object that represents a class. Many of them (also following Smalltalk) get by without a "new" operator. Objects are created by calling a method, usually named new(), on the metaclass object.

We are very close now. The last step, unfortunately not taken by most common languages, is to avoid anchoring the metaclass object onto a global namespace. The result is that code can only create objects of the classes it holds a reference to. And it only has a reference if it is given one via a method or constructor parameter.

Proceeding recursively, we end up with a stratified program. There is an entry point that receives a reference to the entire standard library, and each call site decides how much authority to grant each callee. In our example, when we evaluate external code we can grant very little authority, meaning we pass the evaluated code just a handful of references. Care must be taken so that none of them directly or indirectly provides a way to create a File. In a way, object design becomes security policy.
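The shape of such a program can be sketched in Python, with a big caveat: Python does have a global namespace (builtins, imports, introspection), so nothing below is enforced by the language. It only illustrates the discipline of passing references explicitly. `FileSystem`, `Poll`, `message_code`, and `main` are hypothetical names invented for this sketch.

```python
class FileSystem:
    """Toy in-memory file system. Holding a reference to it IS the authority to delete."""

    def __init__(self, files):
        self.files = dict(files)

    def delete(self, path):
        del self.files[path]


class Poll:
    """A harmless object: all it lets the holder do is record votes."""

    def __init__(self):
        self.votes = []

    def vote(self, choice):
        self.votes.append(choice)


def message_code(poll):
    # Stand-in for code that arrived in an email.
    # It can only act on the references it was handed.
    poll.vote("pizza")


def load_message(code, poll):
    # The call site decides how much authority to grant: here, just the poll.
    code(poll)


def main(fs):
    # Entry point: the one place that holds the full set of references.
    # fs is deliberately not passed down to the message.
    poll = Poll()
    load_message(message_code, poll)
    return poll
```

In a language without a global namespace, `message_code` would have no way at all to name `FileSystem`; here we merely refrain from handing it one.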

And we get very fine-grained control over such policy. We could, for instance, grant loaded code authority to write to a designated directory just by passing it a reference to the Directory object for that directory. Our choices get even more interesting when we realize we can pass references to proxies instead of real objects in order to attenuate authority. Continuing with our example, hoping it doesn't get too contrived, we could build a proxy for the Directory that checks if callers exceed a given quota of disk space.
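A quota-checking proxy along those lines might look roughly like this in Python. `Directory`, `QuotaDirectory`, and `QuotaExceeded` are hypothetical classes made up for the sketch, and the quota only accounts for writes that go through the proxy. Note also that Python's leading-underscore attributes are a convention, not real protection, so a capability language would be needed to actually hide the wrapped object.

```python
class Directory:
    """Toy in-memory directory."""

    def __init__(self):
        self.files = {}

    def write(self, name, data):
        self.files[name] = data


class QuotaExceeded(Exception):
    pass


class QuotaDirectory:
    """Proxy with the same write() method that refuses to go over a byte quota."""

    def __init__(self, directory, quota_bytes):
        self._directory = directory
        self._quota = quota_bytes
        self._sizes = {}  # bytes written through this proxy, per file name

    def write(self, name, data):
        # Overwriting a file releases the space its old contents used.
        used = sum(size for other, size in self._sizes.items() if other != name)
        if used + len(data) > self._quota:
            raise QuotaExceeded(name)
        self._directory.write(name, data)
        self._sizes[name] = len(data)
```

The loaded code would be handed `QuotaDirectory(real_directory, quota_bytes=...)` in place of the real object. Since both expose the same `write()` method, the code cannot tell the difference, and it never sees the unattenuated reference.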

Research

I mentioned above that most common languages don't fit this post's description. But there are languages that do; a prime example is E. In fact, there is a whole area of research for dealing with security in this manner. It's called "object capability security".

I'm not really a security guy; I got interested in the area due to its implications for language and system design. If you are interested too, for any reason, please check out Mark Miller's work. He is the creator of the E language and the JavaScript-based Caja project. His thesis is very readable.