Every month or so I find myself searching the same thing. Let me filter properties. If only Django let me filter properties then my life would be easy. A few days ago, a fortune cookie contained the old adage from the ever-wise Lao Tsu: a journey of a thousand miles begins with a single step. So at 3am on a Saturday morning, I said fuck it and started the journey.

Annotations 101

The beautiful thing about Django is its dead simple ORM. So many of your underlying RDBMS’ capabilities are abstracted away from you in a way that makes it really hard to mess up. One of these abstractions is the annotate API. The annotate API is a simple, sleek and tactful expression of SQL statements in a Pythonic way. The API is centred around a few core classes/concepts. These are the Q and F classes, aggregation classes like Sum, Count, Min and Max, the ExpressionWrapper class, the Case/When class and the Func class. Each of these expresses an SQL operation using Python objects and syntax which makes it really easy to pick up and meet the majority of your DB requirements.

For example, summing two columns can be expressed as:

 >>> something = Something.objects.all().annotate(total=Sum(F("column_one"), F("column_two")))

This is great because Django automagically adds an attribute on your class called “total” and it’s there for you to access in the normal property access syntax. What you can also do, however, is run aggregations on these fields.

For example, I can now get the grand total of all the totals:

 >>> something.aggregate(grand_total=Sum(F("column_one"))

This is super powerful as it:

  1. Allows the RDBMS to perform these aggregations (which are fucking fast).
  2. Reduces the need for extraneous SQL calls in models with joins.
  3. Adds the data onto your model to act as if it were a field

You can even specify a custom method on your QuerySet to automatically add this annotation onto all your future objects.

More annotations than I can handle

In my day-to-day, I deal with a lot of data where some values we need to aggregate aren’t stored on a field in the model. We did the normal thing and implemented an “add” function to our custom QuerySet and used that to add annotations for particular things when we needed them. The API looks something like this.

Something.objects.filter(foo="bar").add("baz", "bop", "bingo")

This would give us the baz, bop and bingo on the model with the annotations being stored in a dictionary for easy look up in the add function. This soon got unwieldy as our annotations became more and more complex as we converted more and more for loops into SQL expressions for speed. The dict in our add function was moved to another file where it just sat as a massive 300-line tribute to our efforts of speeding up our app. We wanted a cleaner way to annotate our objects and, more importantly, make it so that each object would consistently and declaratively have these annotations in every single instance.

So with that, we decided to build django-annotations-ng with 3 goals in mind.

  1. We wanted annotations to feel like fields. That is, we wanted annotations to be added to code declaratively, similar to how fields are added; and we wanted querying annotations to feel like querying normal Django fields.
  2. We wanted annotated data to be automatically available to related models so you could traverse relations and automatically have each relation’s annotations automatically run when referenced.
  3. We wanted to be able to lazy-load these annotations if they weren’t required.

Ambitious goals, huh?

Challenge #1
To be able to declare annotations like fields, we would need to understand the inner workings of Django’s low-level ORM. We would need to be able to hook into when the model is initialised and allow the attribute to be replaced when the annotation is performed. In addition to that, we would need to be able to detect when the user has elected not to proceed with the annotation and replace it with a function call that would allow the calculation to run on-demand.

Challenge #2
To be able to traverse classes, we would need to understand the inner workings of Django’s relationships API. We would need to be able resolve these relationships and pre-emptively annotate related models with required attributes if they are accessed through another class.

Challenge #3
We wanted to lazy-load annotations, but not a greater computational/latency cost than if we were just using standard Python. We wanted to only make a single extra SQL call if possible.

Barton Ip

Addicted to programming. Follow me for posts on Python, Django, Elasticsearch, React, AWS, photography and more.