The relational database is dead. Long live the document database!
OK, maybe that statement is far too gloomy and patently untrue. Relational databases have their place and their uses. But as a general-purpose data store for web applications, they are not always the best tool for the job. There are many use cases where an RDBMS is not an effective storage engine.
Enter the ‘document database’, otherwise known as the big and fancy hash map. The concept is gaining momentum in the industry, particularly for online applications built around structured data and documents: social sites, search engines, blogging applications, etc. It is an extension of data de-normalization. By de-normalizing data, you can safely distribute and parallelize its storage and representation. You can do this with an RDBMS as well, but at its heart you are still using an RDBMS.
Document databases are, at their core, basic key/value storage systems. They are distinctly different from relational databases in that queries can only be performed efficiently on a key. Documents should also be de-normalized, minimizing external references so that a complete view can be obtained without falling back to relational semantics (primarily JOINs).
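To make the key/value idea concrete, here is a minimal sketch that uses a plain Python dict as a stand-in for the store. The key format, the field names, and both helper functions are assumptions made up for illustration; they do not come from any particular database.

```python
# A toy key/value "document store": a plain dict keyed by document id.
store = {}

# A de-normalized blog post: the author details and the comments are
# embedded in the document instead of living in separate tables.
store["post:1"] = {
    "title": "Hello, documents",
    "author": {"name": "alice", "email": "alice@example.com"},
    "comments": [
        {"by": "bob", "text": "Nice post"},
        {"by": "carol", "text": "Agreed"},
    ],
}


def get_post_view(store, key):
    """Build a complete view of a post from a single key lookup (no JOINs)."""
    doc = store[key]
    return {
        "title": doc["title"],
        "author_name": doc["author"]["name"],
        "comment_count": len(doc["comments"]),
    }


def find_by_author(store, name):
    """Query on a non-key field: this has to scan every document."""
    return [key for key, doc in store.items() if doc["author"]["name"] == name]


view = get_post_view(store, "post:1")
matches = find_by_author(store, "alice")
```

The lookup by key touches one document, while the lookup by author walks the whole store. That is the trade-off: anything you want to fetch cheaply has to be reachable by key or embedded in a document you already have.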
Projects such as Hadoop/HBase, CouchDB, and even Google’s BigTable are great examples of emerging (and successful) document-oriented databases. The problem is that they all have non-standard access modules (if any), and what they do expose is a “bare bones” access model.
What Python needs is a specification and implementation of a document database access API. On top of this could be a layer similar in ideology to SQLAlchemy, in which Python classes could represent documents and any links amongst them.
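As a rough sketch of what those two layers might look like, the snippet below defines a tiny key-based access interface and a class-mapping layer on top of it. Every name here (`DocumentStore`, `MemoryStore`, `Document`, `Post`, `Author`, the `fields` tuple) is hypothetical and exists only to illustrate the shape of the idea; it is not a proposed specification.

```python
from abc import ABC, abstractmethod


class DocumentStore(ABC):
    """Hypothetical low-level access API: every operation is addressed by key."""

    @abstractmethod
    def get(self, key):
        """Return the document stored under key, or raise KeyError."""

    @abstractmethod
    def put(self, key, doc):
        """Store doc (a dict) under key, replacing any existing document."""

    @abstractmethod
    def delete(self, key):
        """Remove the document stored under key."""


class MemoryStore(DocumentStore):
    """In-memory backend, standing in for a real database driver."""

    def __init__(self):
        self._docs = {}

    def get(self, key):
        return self._docs[key]

    def put(self, key, doc):
        self._docs[key] = dict(doc)

    def delete(self, key):
        del self._docs[key]


class Document:
    """Hypothetical mapping layer: subclasses declare their field names."""

    fields = ()

    def __init__(self, key, **values):
        self.key = key
        for name in self.fields:
            setattr(self, name, values.get(name))

    def save(self, store):
        store.put(self.key, {name: getattr(self, name) for name in self.fields})

    @classmethod
    def load(cls, store, key):
        return cls(key, **store.get(key))


class Author(Document):
    fields = ("name",)


class Post(Document):
    # author_key is a link to another document, resolved by a second lookup.
    fields = ("title", "body", "author_key")


store = MemoryStore()
Author("author:alice", name="alice").save(store)
Post("post:1", title="Hello", body="First post", author_key="author:alice").save(store)

post = Post.load(store, "post:1")
author = Author.load(store, post.author_key)
```

Because the mapping layer only talks to `get`, `put`, and `delete`, swapping `MemoryStore` for an HBase or CouchDB backend would not, in principle, require changes to the document classes.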
I am working on an implementation of the lower-level access layers for HBase and CouchDB (leveraging the excellent CouchDB Python module). In addition, there will be a “DB-API” adapter and anydbm adapter for developer prototyping.
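For a feel of what a prototyping adapter over a dbm file could look like, here is a small sketch. It assumes Python 3, where the `anydbm` module was renamed to `dbm`, and it assumes documents are serialized as JSON. The class name `DbmDocumentStore` and its methods are invented for this example and are not the actual adapter.

```python
import dbm
import json
import os
import tempfile


class DbmDocumentStore:
    """Hypothetical prototyping adapter: JSON documents in a local dbm file."""

    def __init__(self, path):
        # "c" opens the database for reading and writing, creating it if needed.
        self._db = dbm.open(path, "c")

    def put(self, key, doc):
        self._db[key.encode("utf-8")] = json.dumps(doc).encode("utf-8")

    def get(self, key):
        # Raises KeyError if the key is missing, like a dict.
        raw = self._db[key.encode("utf-8")]
        return json.loads(raw.decode("utf-8"))

    def close(self):
        self._db.close()


# Write a document, close the file, then reopen it to show that it persists.
path = os.path.join(tempfile.mkdtemp(), "prototype")

store = DbmDocumentStore(path)
store.put("post:1", {"title": "Hello", "tags": ["python", "storage"]})
store.close()

store = DbmDocumentStore(path)
doc = store.get("post:1")
store.close()
```

A file-backed store like this needs no server, which is what makes it handy for prototyping before pointing the same code at a real document database.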
It's not specification-worthy at this point in time, but it will hopefully foster growth in this emerging field.
And since every project needs a nifty name, I dub thee tome.