Thursday, April 27, 2006

(RDBMS) v (¬RDBMS)

My previous post was a response to Keith's writing on the possibility of escaping object-relational mapping problems by avoiding RDBMSs altogether. Where and how relational databases should be used has turned out to be a hot topic on the blogosphere* these days, mostly fueled by Tim O'Reilly's nine-part saga on data management in Web 2.0* companies. The three most salient points I took from Tim's posts were: there is a preference for clustering by data partitioning rather than by general replication strategies; predictably, there is also a preference for open source relational database systems (MySQL being the front-runner); and there is a belief that read-mostly data is better served from flat files (sometimes with sophisticated replication mechanisms) than from RDBMSs. There is a lot of interesting info there; I recommend reading the whole thing.

Speaking of read-mostly data, I found this cool project via Joe Gregorio: a relational database system being built at MIT, optimized for this kind of data. Another interesting system for dealing with simple read-mostly data is Google's BigTable, which seeks to achieve high throughput by adopting a very simple data model, without integrity constraints or transactional capabilities, structured in a highly distributed and fault-tolerant fashion (as would be expected of Google). It also features native data versioning support.
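As a toy illustration of that kind of data model (nothing like Google's actual implementation; the VersionedStore class and its methods are entirely my invention), here is a Java sketch of a cell store where every write is timestamped and reads default to the newest version:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy sketch of a BigTable-style versioned cell store: each key maps to
// several timestamped values, and plain reads return the latest version.
public class VersionedStore {
    // key (e.g. "row/column") -> timestamp -> value, newest timestamp first
    private final Map<String, TreeMap<Long, String>> cells =
        new HashMap<String, TreeMap<Long, String>>();

    public void put(String key, long timestamp, String value) {
        TreeMap<Long, String> versions = cells.get(key);
        if (versions == null) {
            // Reverse ordering so the newest timestamp is always first.
            versions = new TreeMap<Long, String>(Collections.<Long>reverseOrder());
            cells.put(key, versions);
        }
        versions.put(timestamp, value);
    }

    // Latest version, or null if the cell was never written.
    public String get(String key) {
        TreeMap<Long, String> versions = cells.get(key);
        return versions == null ? null : versions.get(versions.firstKey());
    }

    // A specific historical version, or null if absent.
    public String getAt(String key, long timestamp) {
        TreeMap<Long, String> versions = cells.get(key);
        return versions == null ? null : versions.get(timestamp);
    }
}
```

Notice how much the absence of constraints and transactions simplifies things: the whole "engine" is a couple of maps, and versioning comes almost for free.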

PS: Yes, this post was written in English. At least in something approaching what one might call "English"... The short reason is that I just felt like it. The longer one is that I need to practice my English at least as much as I need to practice my Portuguese, and writing practice is one of the primary reasons for this blog's existence (yes, I'm a selfish bastard).

* buzzword count: 2. I'll try to do better next time.

Thursday, April 20, 2006

Don't forget to use your brain

Keith Braithwaite, one of the postmodern programming guys, wrote this interesting blog post. I tried to comment on the post page but, sadly, my wise and deep words didn't find their way onto the world wide web. So, in the immortal words of the prophet, I'm just gonna have to wing it here.

A fair number of bytes and brain cells have been lost, over the years, on the object-relational mapping battleground. Keith's argument is that many of those casualties could be prevented if developers let go of the RDBMS dogma and adopted simpler solutions when possible. If the requirements don't indicate a heavy load of concurrent updates, then there is no need for powerful (and expensive) transactional capabilities. He gives an online shopping web site as an example. Usually, additions and changes to the product catalog don't need to be reflected to users immediately. The possibility of simultaneous modifications can also be negligible here. He proposes* that, in cases like this, the data can be made available in a simple format locally, on the front-end servers. Updates can be pushed periodically from a central database to the front-ends.
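A minimal sketch of that setup in Java (the LocalCatalog class and the tab-separated snapshot format are my own invention, not anything from Keith's post): the front end serves product data from a local snapshot file, which a periodic job overwrites with a fresh export from the central database.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Front-end view of a read-mostly product catalog: data lives in a local
// flat file, refreshed periodically by a push from the central database.
public class LocalCatalog {
    private final Map<String, String> products = new HashMap<String, String>();

    // Snapshot format (invented for this sketch): one product per line,
    // "id<TAB>description".
    void load(File snapshot) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(snapshot));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab > 0) {
                    products.put(line.substring(0, tab), line.substring(tab + 1));
                }
            }
        } finally {
            in.close();
        }
    }

    // No locks, no transactions: reads are just hash lookups.
    String describe(String id) {
        return products.get(id);
    }
}
```

Staleness between pushes is the price paid, and the whole point is that for this kind of data it is an acceptable one.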

The only nit I have to pick with his post is when he talks about queries:
"For instance, if you want to get hold of an object that (you have to know) happens to be backed by the underlying database then what you do is obtain an entry point to the data, and search over it for the object you want. Unlike the navigation that you might expect to do in an object graph, instead you build...queries. (...)"
Yes, building a whole query just to get hold of one object reference is too much trouble and a violation of DRY (parenthetically: this sort of thing somewhat eases that pain, minus all the factories and spurious abstractions). But I think he somewhat overlooks the fact that often we really do need to do a query. It's frequently part of the problem domain, not the solution domain, to put it in other (more pompous) words.

I view queries as inherently declarative operations (given X, get me Y). As such, they are better expressed through declarative means. So, aCollection select: aBlock in Smalltalk is better than the equivalent for loop in Java. Still in OO land, Evans and Fowler's Specification pattern is even better for more complex cases. Taking this to its logical conclusion, a language specifically designed for searching would be better still. Unfortunately, SQL falls short of this goal in practice, because of the mess that is integrating it with the rest of the application. Microsoft's LINQ project is an intriguing technology in this space.
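For illustration, a minimal Java sketch of the Specification idea (the interface and class names here are my own simplification, not Evans and Fowler's exact formulation): the criterion becomes an object, and the imperative loop is written once, hidden behind a declarative-looking select.

```java
import java.util.ArrayList;
import java.util.List;

// A specification encapsulates a selection criterion as a first-class object.
interface Specification<T> {
    boolean isSatisfiedBy(T candidate);
}

// Illustrative domain class, invented for this sketch.
class Product {
    private final String name;
    private final double price;
    Product(String name, double price) { this.name = name; this.price = price; }
    String name() { return name; }
    double price() { return price; }
}

// One concrete criterion; complex queries compose several of these.
class PriceBelow implements Specification<Product> {
    private final double limit;
    PriceBelow(double limit) { this.limit = limit; }
    public boolean isSatisfiedBy(Product p) { return p.price() < limit; }
}

public class SpecificationDemo {
    // The only for loop in the system: everything else just states criteria,
    // much like aCollection select: aBlock in Smalltalk.
    static <T> List<T> select(List<T> items, Specification<T> spec) {
        List<T> result = new ArrayList<T>();
        for (T item : items) {
            if (spec.isSatisfiedBy(item)) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Product> catalog = new ArrayList<Product>();
        catalog.add(new Product("book", 25.0));
        catalog.add(new Product("dvd", 60.0));
        List<Product> cheap = select(catalog, new PriceBelow(50.0));
        System.out.println(cheap.size()); // prints 1
    }
}
```

The caller says what it wants, not how to find it, which is exactly the declarative quality I'm after.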

Anyway, what I wanted to point out is that there is no one-size-fits-all software architecture, and every project needs to be thought out** by a team that knows what it's doing and isn't afraid to think outside the vendor-supplied box.


* barring possible misunderstandings on my part.
** This is not a defense of BDUF.

Monday, April 10, 2006

Friday, April 07, 2006

Code Contest

This story began here. The second edition was proposed on Daniel Quirino's blog. I thought it was a good chance to play with Eric Evans's Time and Money library. Here is my submission:

package cc;

import java.io.File;
import java.util.TimeZone;

import com.domainlanguage.time.*;
import com.domainlanguage.timeutil.Clock;

public class DeleteOldFiles {
    public static void main(String[] args) {
        Clock.setDefaultTimeZone(TimeZone.getDefault());

        // The interval from seven days ago through today.
        CalendarInterval lastWeek =
            Duration.weeks(1).subtractedFrom(Clock.today()).through(Clock.today());

        for (File f : new File("D:\\tmp\\backup").listFiles()) {
            TimePoint lastModified = TimePoint.from(f.lastModified());

            if (lastWeek.includes(lastModified))
                f.delete();
        }
    }
}

Just don't try to run it, because it won't work. I found a little bug in the library and sent in a patch; the version in CVS should be updated in a while.

Tuesday, April 04, 2006

Only 3h32min3s to go

A message from Richard Gabriel to the Hillside group, via Grady Booch:
I got this from a colleague: "As you may have noted, on Wednesday, at two minutes and three seconds after 1:00 in the morning, the time and date will be 01:02:03 04/05/06. Unless you are very young, or live a very long time, this is probably your one chance to observe this date. Whoop it up."

Monday, April 03, 2006

Holism

Sometimes a good word gets conscripted to defend bad ideas. I'm thinking of the word "holism," a fixture on the lips of charlatans and swindlers. Despite that, the idea, going back to Aristotle, that a system can be better understood as a whole than as the sum of its parts is an illuminating one.

A few months ago I decided to find out what is so special about Extreme Programming and bought the canonical book: Extreme Programming Explained, by Kent Beck, the father of the child. It is a very persuasive book, full of incisive arguments. One that most caught my attention was the idea that optimizing each stage of a production process does not necessarily optimize the whole process. This argument is applied to software engineering to support the thesis that a methodological approach based on short cycles producing useful but "incomplete" results is superior to the traditional view of the process as a succession of long phases, each generating piles of "artifacts." We can see the XP crowd's point of view as an application of the principle of holism to software development.

Holism can also be seen in another member of the dysfunctional family that is computer science. The end-to-end principle, which I have discussed here before, is clearly a defense of global system optimization in the realm of software architecture.

I started thinking about all this nonsense while discussing the efficiency of web applications with a friend. He claimed that extreme techniques are worth using to maximize a web application's performance, since the more you get out of the hardware, the lower the cost of serving the same number of users (or "the more users can be served by the same hardware," if you are the kind of person who likes to understand things backwards). My friend went as far as defending writing web sites as ISAPI DLLs in C++... His reasoning is formally correct, but the economic considerations are broader than a simple cost-function minimization. In a competitive situation, the ability to react quickly to competitors' moves is vital. In internal software development, the competitive pressure is indirect, manifesting itself in the famous mutating requirements that force the constant, never-ending evolution of application functionality.

Bubble versus market.

Read pcalçado's excellent post.