Lazy loading relationships in Django (and other MVCs/ORMs) - django

Interested in knowing how lazy loading is achieved in frameworks like Django. When is the decision made to perform the join? And is there a way to force eager loading in Django? Are there times when you would need to force Django to eager load?

The general answer is that Django makes the decision to perform the query when you actually ask for some records. Most commonly this means iterating over the queryset (for record in queryset:) or using the list() built-in function to convert the queryset to a list.
See When QuerySets are evaluated for more specifics from the official docs.
It accomplishes this by defining a class, called QuerySet in django/db/models/query.py, where the special methods like __repr__, __getitem__ and __iter__ are coded to do the right thing.
If you need to force eager loading just run the built-in Python list function on the queryset, like:
qs = SomeModel.objects.all()
ql = list(qs)
This call to list() will perform the DB query and load all of the objects into memory. It should be pretty rare that you need to do this, but one case is when you need to use the query results in more than one place in your templates. Converting to list and passing the list in your template context will perform the query only once instead of once for every place in your template you iterate.

Related

What is the difference between annotations and regular lookups using Django's JSONField?

You can query Django's JSONField, either by direct lookup, or by using annotations. Now I realize if you annotate a field, you can all sorts of complex queries, but for the very basic query, which one is actually the preferred method?
Example: Lets say I have model like so
class Document(models.Model):
data = JSONField()
And then I store an object using the following command:
>>> Document.objects.create(data={'name': 'Foo', 'age': 24})
Now, the query I want is the most basic: Find all documents where data__name is 'Foo'. I can do this 2 ways, one using annotation, and one without, like so:
>>> from django.db.models.expressions import RawSQL
>>> Document.objects.filter(data__name='Foo')
>>> Document.objects.annotate(name = RawSQL("(data->>'name')::text", [])).filter(name='Foo')
So what exactly is the difference? And if I can make basic queries, why do I need to annotate? Provided of course I am not going to make complex queries.
There is no reason whatsoever to use raw SQL for queries where you can use ORM syntax. For someone who is conversant in SQL but less experienced with Django's ORM, RawSQL might provide an easier path to a certain result than the ORM, which has its own learning curve.
There might be more complex queries where the ORM runs into problems or where it might not give you the exact SQL query that you need. It is in these cases that RawSQL comes in handy – although the ORM is getting more feature-complete with every iteration, with
Cast (since 1.10),
Window functions (since 2.0),
a constantly growing array of wrappers for database functions
the ability to define custom wrappers for database functions with Func expressions (since 1.8) etc.
They are interchangable so it's matter of taste. I think Document.objects.filter(data__name='Foo') is better because:
It's easier to read
In the future, MariaDB or MySql can support JSON fields and your code will be able to run on both PostgreSQL and MariaDB.
Don't use RawSQL as a general rule. You can create security holes in your app.

Django: Filtering a queryset locally from cache

If I perform a prefetch_related('toppings') for a queryset, and I want to later filter(spicy=True) by fields in the related table, Django ignores the cached info and does a database query. I found that this is documented (under the Note box) and seems to happen for all forms of caching (select_related(), already evaluated querysets, etc.) when another filter() is performed.
However, is there some sort of super secret hidden time-saving shortcut to filter locally (using the cache and not hitting the database) without having to write the python code to loop the queryset (using list/dict comprehension, etc.)? Maybe something like a filter_locally(spicy=True)?
EDIT:
One of the reasons why a list/comprehension doesn't work well for me is because a list/dict does not have the queryset methods. In my case, the first level M2M field, toppings, isn't the end goal for me and I need to check a 2nd related M2M field (which I have already pre-fetched as well). While this is also possible using list comprehension, it's just much simpler to have something such as filter_locally(spicy=True, origin__country='Spain') because:
it allows accessing many levels of related fields with minimal effort
it allows chaining other queryset methods
it's easier to read because it's consistent with the familiar filter()
it's easier to modify existing code using filter() without prefetch to add this optimization in without much changes.
But from the responses, Django has no such support :(
You have to write the python code to loop through the queryset (a list/dict comprehension is ideal). All the filter() code knows how to do is add filtering language to the SQL sent to the database. Filtering locally is a totally different problem than filtering remotely, so the solutions to those two separate problems won't be able to share any logic.
A list comprehension one-liner would be pretty straightforward, though; the syntax might not be much more complex than with filter().
If you're filtering on a boolean doing the list comprehension is pretty easy. You can also swap out the topping.spicy==True for a string comparison or whatever.
I would do something like:
qs = Pizza.objects.all().prefetch_related('toppings')
res = list(qs)
def get_spicy(qs):
res = list(qs)
return [pizza for pizza in res if any(topping.spicy==True for
topping in pizza.toppings.all())]
That is if you want to return the pizza object if any of its toppings is spicy. You can also replace the any() with all() to check for all, and do a lot of pretty powerful queries with this syntax. I'm somewhat surprised that there is no easy way to do this in django. It seems like a lot of these simple queries should be easy to implement in a generic manner.
The above code assumes a many2many. It should be easy to modify to work with a simple FK relationship such as a one2one or one2many.
Hope this was helpful.

packing objects as json with django?

I've run into a snag in my views.
Here "filtered_posts" is array of Django objects coming back from the model.
I am having a little trouble figuring out how to get as text data that I can
later pack into json instead of using serializers.serialize...
What results is that the data comes double-escaped (escaped once by serializers.serialize and a second time by json.dumps).
I can't figure out how to return the data from the db in the same way that it would come back if I were using the MySQLdb lib directly, in other words, as strings, instead of references to objects. As it stands if I take out the serializers.serialize, I get a list of these django objects, and it doesn't even list them all (abbreviates them with '...(remaining elements truncated)...'.
I don't think I should, but should I be using the __unicode__() method for this? (and if so, how should I be evoking it?)
JSONtoReturn = json.dumps({
'lowest_id': user_posts[limit - 1].id,
'user_posts': serializers.serialize("json", list(filtered_posts)),
})
The Django Rest Framework looks pretty neat. I've used Tastypie before, too.
I've also done RESTful APIs that don't include a framework. When I do that, I define toJSON methods on my objects, that return dictionaries, and cascade the call to related elements. Then I call json.dumps() on that. It's a lot of work, which is why the frameworks are worth looking at.
What you're looking for is Django Rest Framework. It handles related objects in exactly thew way you're expecting it to (you can include a nested object, like in your example, or simply have it output the PK of the related object for the key).

Using Django ORM to serialize data and related members

I'm currently working on a REST api, using django. I started using the nice djangorestframework, which I loved to use the "View" class.
But, I'm facing with the serialization problem.
I do not like the Serialization using the Serializer classes.
The main goal is to prepare a sort of giant dict, with all the infos, and give it to a renderer class which translate it into an xml, json, yaml, depending on the "Accept:" HTTP header. The goal is classy, but 60% of the CPU time is spend on creating the "GIANT DICT".
This dict can be created using django Models, but I think using on the fly instanciated classes and object is VERY un-efficient ? I'm trying to use some QuerySet methods to filter which models member I want to have, and getting a simple dict : the ::values() method, but unfornately, I can't access the m2m and foreignkey from my models.
Did you already tried this ? Any though ?
You could use the QuerySet's iterator method:
... For a QuerySet which returns a large number of objects that you only
need to access once, this can results in better performance and a
significant reduction in memory.
Your code should looks like:
for obj in SomeModel.objects.values_list('id', 'name').iterator():
# do something

Proper way to call a database function from Django?

i'm triyng to make a full text search with postgresql and django So I've created a function search_client(text) which returns a list of clients. To call it from the DB i use something like this:
SELECT * FROM search_client('something')
and i'm not really sure how to call it from django. i know i could do something like
cursor = connection.cursor()
cursor.execute("SELECT * FROM search_client('something')")
result = cursor.fetchall()
but that will only return a list of values, and i'd like to have a list of objects, like when i use the "filter()" method.
Any ideas?? thanks for your time!
If your goal is a full-featured search engine, have a look at django-haystack. It rocks.
As for your question, the new (Django 1.2) raw method might work:
qs = MyModel.objects.raw("SELECT * FROM search_client('something')")
If you're using Django 1.2, you can use the raw() ORM method to execute custom SQL but get back Django models. If you're not, you can still execute the SQL via the extra() method on default QuerySet, and pump it into a custom method to either then go pull the real ORM records, or make new, temporary, objects
First, you probably don't want to do this. Do you have proof that your database function is actually faster?
Implement this in Python first. When you can prove that your Python implementation really is the slowest part of your transaction, then you can try a stored procedure.
Second, you have the extra method available in Django.
http://docs.djangoproject.com/en/1.2/ref/models/querysets/#django.db.models.QuerySet.extra
Note that compute-intensive database procedures are often slow.