Yann’s Blog

July 11, 2008

A Python API for Document Databases? Introducing tome

Filed under: Python, Software, Tome — Yann @ 11:24 pm

The relational database is dead. Long live the document database!

OK, maybe that statement is far too gloomy, and it is patently untrue. Relational databases have their place and their uses. But as a general-purpose data store for web applications, they are not always the best tool for the job; there are many use cases where an RDBMS is not an effective storage engine.

Enter the ‘document database’, otherwise known as the big and fancy hash map. The concept is gaining momentum in the industry, particularly for online applications built around structured data and documents: social sites, search engines, blogging applications, etc. It is an extension of data de-normalization. By de-normalizing data, you can safely distribute and parallelize its storage and representation. You can do this with an RDBMS as well, but underneath you are still working within the relational model.

Document databases are basic key/value storage systems. They are distinctly different from relational databases in that queries can only be performed (efficiently) on a key. Documents should also be de-normalized, minimizing external references so a complete view can be obtained without having to fall back to relational semantics (primarily JOINs).
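
To make that concrete, here is a minimal sketch of the idea using nothing but a Python dict and JSON. The `put`/`get` helpers and the key naming scheme are my own illustration, not the API of any of the systems mentioned in this post. The point is only that the whole post, author and comments included, comes back from a single lookup by key, with no JOIN.

```python
import json

# A toy "document database": one dict mapping keys to serialized documents.
store = {}


def put(key, document):
    """Serialize the document and store it under its key."""
    store[key] = json.dumps(document)


def get(key):
    """Fetch by key, the one lookup a key/value store makes cheap."""
    return json.loads(store[key])


# A de-normalized blog post: the author and the comments are embedded in the
# document instead of living in separate tables that would need a JOIN.
post = {
    "title": "Introducing tome",
    "author": {"name": "Yann"},
    "comments": [{"by": "Rich", "text": "Nice idea."}],
}

put("post:introducing-tome", post)
doc = get("post:introducing-tome")
print(doc["author"]["name"], len(doc["comments"]))
```

The trade-off is visible here too: finding every post by a given author would mean scanning every document, because only the key is indexed.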

Projects such as Hadoop/HBase, CouchDB, and even Google’s BigTable are great examples of emerging (and successful) document-oriented databases. The problem is that they all have non-standard access modules (if any) and offer only a “bare bones” access model.

What Python needs is a specification and implementation of a document database access API. On top of this could be a layer similar in ideology to SQLAlchemy, in which Python classes could represent documents and any links amongst them.
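
As a rough sketch of what those two layers might look like: a common access interface that every backend implements, and a thin class layer on top that maps Python objects to documents. Every name here (`DocumentStore`, `MemoryStore`, `Document`, `Post`) is hypothetical and made up for illustration; none of it is a finished design, and the in-memory backend only stands in for a real adapter.

```python
from abc import ABC, abstractmethod


class DocumentStore(ABC):
    """Hypothetical common interface each backend adapter would implement."""

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def put(self, key, document): ...

    @abstractmethod
    def delete(self, key): ...


class MemoryStore(DocumentStore):
    """In-memory backend, standing in for a real HBase or CouchDB adapter."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Shallow copy, so callers cannot change top-level fields in place.
        return dict(self._data[key])

    def put(self, key, document):
        self._data[key] = dict(document)

    def delete(self, key):
        del self._data[key]


class Document:
    """Hypothetical higher layer: a Python class that maps to one document."""

    fields = ()

    def __init__(self, key, **values):
        self.key = key
        for name in self.fields:
            setattr(self, name, values.get(name))

    def save(self, store):
        store.put(self.key, {name: getattr(self, name) for name in self.fields})

    @classmethod
    def load(cls, store, key):
        return cls(key, **store.get(key))


class Post(Document):
    fields = ("title", "author")


store = MemoryStore()
Post("post:1", title="Introducing tome", author="Yann").save(store)
loaded = Post.load(store, "post:1")
print(loaded.title, loaded.author)
```

Because `Document` only ever talks to the `DocumentStore` interface, swapping the in-memory backend for another adapter should not require touching the document classes.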

I am working on an implementation of the lower-level access layers for HBase and CouchDB (leveraging the excellent CouchDB Python module). In addition, there will be a “DB-API” adapter and anydbm adapter for developer prototyping.
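
For a feel of the prototyping adapter, here is a small sketch that keeps JSON documents in a dbm file. It is written for Python 3, where the `anydbm` module has been renamed `dbm`; I use `dbm.dumb` because it is pure Python and always available, while `dbm.open` would be the closer analogue of `anydbm`. The `DbmStore` class and its methods are hypothetical names, not tome's actual interface.

```python
import dbm.dumb
import json
import os
import tempfile


class DbmStore:
    """Hypothetical prototyping adapter: JSON documents in a dbm file."""

    def __init__(self, path):
        # "c" opens the database for reading and writing, creating it if needed.
        self._db = dbm.dumb.open(path, "c")

    def put(self, key, document):
        self._db[key.encode("utf-8")] = json.dumps(document).encode("utf-8")

    def get(self, key):
        return json.loads(self._db[key.encode("utf-8")].decode("utf-8"))

    def close(self):
        self._db.close()


# Store one document in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "prototype")
db = DbmStore(path)
db.put("user:yann", {"name": "Yann", "projects": ["tome"]})
db.close()

# Reopen the file to show the document was written to disk.
db = DbmStore(path)
print(db.get("user:yann")["projects"])
db.close()
```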

It’s not specification-worthy at this point, but it will hopefully foster growth in this emerging field.

And since every project needs a nifty name, I dub thee tome.

4 Comments »

  1. Nice idea.
    Are you familiar with the AppEngine DataStore API? It’s a nice and simple Django-ish ORM without the RM.

    Comment by Rich — July 12, 2008 @ 1:26 am

  2. A good idea; is the eventual plan something like the standard Python database API, where lots of modules have the same interface, or an actual one like SQLAlchemy, which is a nice high-level representation? Or possibly both?

    I would look at the website, but it fails to resolve here :)

    Comment by Andrew — July 12, 2008 @ 3:58 am

  3. I was thinking of both models: the access layer represented by a common interface, and the high-level interface (document objects) being much like SQLAlchemy. And yes, I have been looking at the GAE API, and I am drawing some pointers from its design.

    I’ve broken my DNS resolving of the new site, but don’t worry, you aren’t missing much beyond a boilerplate page of no content (yet anyway :)).

    Comment by Yann — July 12, 2008 @ 11:16 am

  4. Great idea, Yann. This would help me loads at work. When are you hoping to launch this? We would definitely use something like this.

    Comment by Bethany Storager — August 20, 2008 @ 3:22 am
