Thursday, January 7, 2010

First steps towards a Query backend interface

Our post about how to use the Django non-relational port and its supported features is already in the pipeline, but in the meantime I'd like to post a little update for those who want to contribute. In case you haven't heard of it yet: the non-relational port makes Django's ORM (Model, QuerySet, etc.) work on non-relational databases, so you can use the same code on App Engine, other cloud platforms, and SQL databases.

You can find a short introduction to the internal architecture on our project wiki. Please read that before you continue.

Yesterday I finally moved all the hacked-in App Engine code out of Django and into our djangoappengine package. This means it's now possible to write real non-relational database backends for our Django port. In the long run it would be great to also have SimpleDB, MongoDB, and CouchDB support. Of course, the API is not set in stone yet, so expect changes, especially in QueryData's internal format. If you want to implement a backend and get involved with the port, fork the djangoappengine package and adapt the backends to your platform. I think you could write a simple backend in a few days. Beware, though: the djangoappengine code isn't clean yet.

The new backend API

The backend's DatabaseOperations object (accessible via connection.ops) now provides a query_class() method, which returns the backend's QueryBackend class. QuerySet and Model use this method, and normally you shouldn't override it. To specify your backend's QueryBackend class, just add a query_backend attribute to your DatabaseOperations class:

from django.db.backends import BaseDatabaseOperations

class DatabaseOperations(BaseDatabaseOperations):
    query_backend = 'djangoappengine.db.backend.QueryBackend'
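The post doesn't show how query_class() turns the dotted query_backend string into a class. Here is a minimal sketch, assuming it imports the class lazily with importlib and caches the result. The base class is a simplified stand-in rather than Django's actual BaseDatabaseOperations, the `_query_class_cache` attribute is hypothetical, and a standard-library class plays the role of a QueryBackend so the example runs on its own.

```python
import importlib


class BaseDatabaseOperations(object):
    """Simplified stand-in for a backend's DatabaseOperations base class."""

    # Dotted path to the backend's QueryBackend class; subclasses override it.
    query_backend = None

    def query_class(self):
        """Import and return the class named by query_backend (cached)."""
        cached = getattr(self, '_query_class_cache', None)
        if cached is None:
            # Split 'package.module.ClassName' into module path and class name.
            module_path, class_name = self.query_backend.rsplit('.', 1)
            module = importlib.import_module(module_path)
            cached = getattr(module, class_name)
            self._query_class_cache = cached  # hypothetical cache attribute
        return cached


# Usage: a standard-library class stands in for a real QueryBackend.
class DemoOperations(BaseDatabaseOperations):
    query_backend = 'collections.OrderedDict'


ops = DemoOperations()
print(ops.query_class().__name__)  # OrderedDict
```

Resolving the class lazily means a backend module is only imported when a query actually needs it, which keeps backends that aren't in use from being loaded at startup.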

The backend itself then has to implement the database operations on top of QueryData. The BaseQueryBackend class provides a default constructor that takes the querydata and connection instances and stores them in self.querydata and self.connection, respectively; use them in your QueryBackend implementation. For example, the results_iter() method should convert self.querydata into the actual database query expression, execute it, and return the results. I won't go into the details here because the API is still not finished. Just look at the source if you want to know more.
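To make the convert, execute, return cycle of results_iter() concrete, here is a toy sketch against an in-memory store. Because QueryData's real internal format is still changing, the `QueryData` tuple (table, equality filters, one ordering field), `FakeConnection`, and `InMemoryQueryBackend` are all hypothetical simplifications, not the port's actual classes.

```python
from collections import namedtuple

# Hypothetical, heavily simplified QueryData: a table name, a list of
# (field, value) equality filters, and an optional ordering field.
QueryData = namedtuple('QueryData', ['table', 'filters', 'ordering'])


class BaseQueryBackend(object):
    def __init__(self, querydata, connection):
        # Mirrors the default constructor described in the post.
        self.querydata = querydata
        self.connection = connection

    def results_iter(self):
        raise NotImplementedError


class InMemoryQueryBackend(BaseQueryBackend):
    """Toy backend whose 'database' is a dict of lists of dicts."""

    def results_iter(self):
        # 1. Convert QueryData into a native query (here: a predicate).
        filters = self.querydata.filters

        def matches(row):
            return all(row.get(field) == value for field, value in filters)

        # 2. Execute the query against the store.
        rows = [row for row in self.connection.tables[self.querydata.table]
                if matches(row)]
        if self.querydata.ordering:
            rows.sort(key=lambda row: row[self.querydata.ordering])
        # 3. Return the results one at a time.
        for row in rows:
            yield row


class FakeConnection(object):
    """Hypothetical connection holding the in-memory tables."""

    def __init__(self, tables):
        self.tables = tables


conn = FakeConnection({'person': [
    {'name': 'Bob', 'age': 30, 'city': 'Berlin'},
    {'name': 'Alice', 'age': 25, 'city': 'Berlin'},
    {'name': 'Carol', 'age': 35, 'city': 'Paris'},
]})
query = QueryData(table='person', filters=[('city', 'Berlin')], ordering='age')
names = [row['name'] for row in InMemoryQueryBackend(query, conn).results_iter()]
print(names)  # ['Alice', 'Bob']
```

A real backend would replace the predicate with the datastore's own query objects, but the shape of the method stays the same.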

The next step is to complete the low-level QueryBackend interface (for example, QuerySet.update() isn't supported yet) and to port Django's SQL layer to our backend and QueryData API. Yes, SQL support is turned off right now. The incomplete SQL backend lives in the django.db.models.sql.backend module. Once SQL is working again, the Django team can comment on our QueryData solution and hopefully add non-relational backend support as a must-have feature for Django 1.3.
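One way a backend could support QuerySet.update() on a store without a bulk "UPDATE ... WHERE" statement is a read-modify-write loop over the matching entities. The sketch below assumes a hypothetical key-to-dict store and a hypothetical `EmulatedUpdateBackend` class; it returns the number of matched entities, as Django's QuerySet.update() returns the number of rows matched.

```python
class EmulatedUpdateBackend(object):
    """Sketch: emulate QuerySet.update() on a store without bulk updates."""

    def __init__(self, store):
        # Hypothetical store: maps primary keys to entity dicts.
        self.store = store

    def update(self, filters, values):
        """Read-modify-write every matching entity; return the match count."""
        count = 0
        # Iterate over a snapshot so writes don't disturb the iteration.
        for key, entity in list(self.store.items()):
            if all(entity.get(field) == value for field, value in filters.items()):
                updated = dict(entity)
                updated.update(values)
                self.store[key] = updated  # one write per matching entity
                count += 1
        return count


store = {1: {'title': 'Draft A', 'published': False},
         2: {'title': 'Draft B', 'published': False},
         3: {'title': 'Post C', 'published': True}}
backend = EmulatedUpdateBackend(store)
print(backend.update({'published': False}, {'published': True}))  # 2
```

Unlike an SQL UPDATE, this loop is not atomic: another request could modify an entity between the read and the write, so a production backend would need transactions or some other safeguard where the store offers one.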

Since QueryData stores QuerySet function calls almost unmodified (except where preprocessing saves work for the backend), translating it to sql.Query shouldn't be very difficult, but it does require understanding what's going on in the sql.Query class. QueryData copied a lot of code directly from that class but changed the intermediate representation of the query expression to be more portable. That representation must be easy for the sql.Query class to translate and, at the same time, independent enough of SQL that you can write non-relational backends with it. If the representation is too simple, we make the backends unnecessarily complicated. If it is too SQL-specific, we can't build non-relational backends on top of it. That's the nut we have to crack, and we're not sure yet whether our current representation hits the sweet spot.
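To illustrate the trade-off, here is a sketch of a portable filter tree that two very different backends can consume: one compiles it into a parameterized SQL WHERE fragment, the other into a Python predicate of the kind a non-relational backend might evaluate. The tree format (`('AND'|'OR', [children])` nodes with `(field, lookup, value)` leaves) and both translator functions are hypothetical and are not QueryData's actual representation; SQL identifier quoting is omitted for brevity.

```python
# Hypothetical portable representation: ('AND' | 'OR', [children]) nodes
# with (field, lookup, value) leaves. Not the actual QueryData format.
SQL_OPERATORS = {'exact': '=', 'gt': '>', 'lt': '<'}
PY_OPERATORS = {
    'exact': lambda a, b: a == b,
    'gt': lambda a, b: a > b,
    'lt': lambda a, b: a < b,
}


def to_sql(node, params):
    """Translate the tree into a WHERE fragment, appending values to params."""
    if len(node) == 2:  # connector node
        connector, children = node
        parts = [to_sql(child, params) for child in children]
        return '(' + (' %s ' % connector).join(parts) + ')'
    field, lookup, value = node  # leaf
    params.append(value)
    return '%s %s %%s' % (field, SQL_OPERATORS[lookup])


def to_predicate(node):
    """Translate the same tree into a Python predicate over dict rows."""
    if len(node) == 2:
        connector, children = node
        preds = [to_predicate(child) for child in children]
        combine = all if connector == 'AND' else any
        return lambda row: combine(pred(row) for pred in preds)
    field, lookup, value = node
    compare = PY_OPERATORS[lookup]
    return lambda row: compare(row[field], value)


where = ('AND', [('city', 'exact', 'Berlin'),
                 ('OR', [('age', 'gt', 60), ('age', 'lt', 18)])])
params = []
sql = to_sql(where, params)
print(sql, params)  # (city = %s AND (age > %s OR age < %s)) ['Berlin', 60, 18]

rows = [{'city': 'Berlin', 'age': 70}, {'city': 'Berlin', 'age': 30},
        {'city': 'Paris', 'age': 10}]
matching = [row['age'] for row in rows if to_predicate(where)(row)]
print(matching)  # [70]
```

Both translators stay short because the tree keeps the logical structure but nothing SQL-specific, such as joins or table aliases; a representation that baked those in would make the second translator much harder to write.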

Finally, there are still a few TODOs in the Django source (search for "GAE"). For example, Django expects that when you delete an entity, all related entities get deleted, too. This can take too much time, so the backend must be able to defer the process, for example to a background task. Also, multi-table inheritance isn't supported on non-relational backends. Currently, Django just raises an exception if you try to save or query such a model on one of these backends, but it could alternatively implement the PolyModel concept to emulate multi-table inheritance. For now, we've only extended the backend API so that it can turn those features off for certain connections, but ideally they should still be emulated somehow.
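Deferring the cascade could look roughly like the sketch below: the requested entity is deleted immediately, and its dependents are deleted later by a worker draining a queue. The `DeferredDeleteBackend` class, its key-based relations map, and the in-process deque standing in for a real background task queue (such as App Engine's task queue) are all hypothetical.

```python
from collections import deque


class DeferredDeleteBackend(object):
    """Sketch: delete an entity now, delete its dependents in a later task."""

    def __init__(self, entities, relations):
        # entities: key -> data; relations: key -> list of dependent keys.
        self.entities = entities
        self.relations = relations
        self.task_queue = deque()  # stands in for a real background task queue

    def delete(self, key):
        self.entities.pop(key, None)
        # Defer the potentially slow cascade instead of doing it inline.
        self.task_queue.append(key)

    def run_pending_tasks(self):
        """What a background worker would do: process queued cascades."""
        while self.task_queue:
            parent = self.task_queue.popleft()
            # pop() ensures each parent's dependents are handled only once.
            for child in self.relations.pop(parent, []):
                self.delete(child)  # may enqueue further cascades


backend = DeferredDeleteBackend(
    entities={'author:1': {}, 'book:1': {}, 'book:2': {}, 'review:1': {}},
    relations={'author:1': ['book:1', 'book:2'], 'book:1': ['review:1']})
backend.delete('author:1')
print(sorted(backend.entities))  # ['book:1', 'book:2', 'review:1']
backend.run_pending_tasks()
print(sorted(backend.entities))  # []
```

The price of deferring is a window in which dependents still point at an entity that no longer exists, so code reading them has to tolerate dangling references until the task has run.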

If you want to see more and faster progress, please help us: fork our repositories (djangoappengine and django-nonrel-multidb), join our discussion group, tell us what you're working on, and feel free to ask for feedback.
