Optimized Django Queryset

I have the following function to determine who downloaded a certain book:
@cached_property
def get_downloader_info(self):
    return self.downloaders.select_related('user').values(
        'user__username', 'user__full_name')
Since I'm only using two fields, does it make sense to use .defer() on the remaining fields?
I tried to use .only(), but I get an error that some fields are not JSON serializable.
I'm open to all suggestions, if any, for optimizing this queryset.
Thank you!

Before you try every possible optimization, you should get your hands on the SQL query generated by the ORM (you can print it to stdout or use something like django-debug-toolbar) and see what is slow about it. After that, I suggest you run that query with EXPLAIN ANALYZE and find out what is slow about it. If the query is slow because a lot of data has to be transferred, then it makes a lot of sense to use only or defer. Using only and defer (or values) gives you better performance only if you need to retrieve a lot of data; it does not make your database's job much easier (unless you really have to read a lot of data, of course).
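For example, a quick sketch of how to inspect the generated SQL; here `book` stands in for whatever object owns the downloaders relation:
# `book` is an assumed instance exposing the `downloaders` relation
qs = book.downloaders.select_related('user').values(
    'user__username', 'user__full_name')
print(qs.query)  # the SQL string the ORM will send to the database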
Since you are using Django and PostgreSQL, you can open a psql session with manage.py dbshell and get query timings with \timing

What is the difference between annotations and regular lookups using Django's JSONField?

You can query Django's JSONField either by direct lookup or by using annotations. Now, I realize that if you annotate a field, you can do all sorts of complex queries, but for the very basic query, which one is actually the preferred method?
Example: Let's say I have a model like so:
class Document(models.Model):
    data = JSONField()
And then I store an object using the following command:
>>> Document.objects.create(data={'name': 'Foo', 'age': 24})
Now, the query I want is the most basic: find all documents where data__name is 'Foo'. I can do this in two ways, one using annotation and one without, like so:
>>> from django.db.models.expressions import RawSQL
>>> Document.objects.filter(data__name='Foo')
>>> Document.objects.annotate(name=RawSQL("(data->>'name')::text", [])).filter(name='Foo')
So what exactly is the difference? And if I can make basic queries, why do I need to annotate? Provided of course I am not going to make complex queries.
There is no reason whatsoever to use raw SQL for queries where you can use ORM syntax. That said, for someone who is conversant in SQL but less experienced with Django's ORM, RawSQL might provide an easier path to a certain result than the ORM, which has its own learning curve.
There might be more complex queries where the ORM runs into problems or where it might not give you the exact SQL query that you need. It is in these cases that RawSQL comes in handy, although the ORM is getting more feature-complete with every iteration, with:
Cast (since 1.10),
Window functions (since 2.0),
a constantly growing array of wrappers for database functions, and
the ability to define custom wrappers for database functions with Func expressions (since 1.8), etc.
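As an illustration, the RawSQL annotation from the question can be expressed with those building blocks instead. A sketch, assuming the contrib.postgres JSONField of that era (on Django 3.1+ KeyTextTransform lives in django.db.models.fields.json instead):
from django.contrib.postgres.fields.jsonb import KeyTextTransform
from django.db.models import TextField
from django.db.models.functions import Cast

# equivalent to RawSQL("(data->>'name')::text", []) without raw SQL
docs = Document.objects.annotate(
    name=Cast(KeyTextTransform('name', 'data'), TextField()),
).filter(name='Foo')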
They are interchangeable, so it's a matter of taste. I think Document.objects.filter(data__name='Foo') is better because:
It's easier to read.
In the future, MariaDB or MySQL may support JSON fields, and your code will run on both PostgreSQL and MariaDB.
As a general rule, don't use RawSQL. You can create security holes in your app.

Django: how do I create a model dynamically

How do I create a model dynamically upon uploading a CSV file? I have done the part where it reads the CSV file.
This doc explains very well how to dynamically create models at runtime in Django. It also links to an example of doing so.
However, as you will see after looking at the document, it is quite complex and cumbersome to do this. I would not recommend doing this and believe it is quite likely you can determine a model ahead of time that is flexible enough to handle the CSV. This would be much better practice since dynamically changing the schema of your database as your application is running is a recipe for a ton of bugs in your code.
I understand that you want to create new schemas on the fly based on the fields in a CSV. While that's a valid use case and could be the absolute right call, I doubt it: it lends itself to a data model for a single-tenant SaaS application that could have goofy performance and migration issues.
I'd try using Mongo or some other NoSQL solution, as others have mentioned. But a simpler approach may be a modified star schema implemented in SQL. In this case you create a dimension table that stores each header, then create an instance of each data element that has a foreign key to its dimension and records the value of that dimension, as sketched below.
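A minimal sketch of the two tables that star schema implies (model and field names are illustrative):
from django.db import models

class Dimension(models.Model):
    # one row per CSV header
    name = models.CharField(max_length=255, unique=True)

class DimensionRecord(models.Model):
    # one row per cell, pointing back at the header it belongs to
    dimension = models.ForeignKey(Dimension, on_delete=models.CASCADE)
    value = models.TextField()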
If you read the CSV, the pseudocode would look something like this:
from csv import DictReader

for row in DictReader(file):
    for k in row.keys():
        try:
            dim = Dimension.objects.get(name=k)
        except Dimension.DoesNotExist:
            dim = Dimension(name=k)
            dim.save()
        DimensionRecord.objects.create(dimension=dim, value=row[k])
Obviously you could better handle reading the headers and trapping errors when dimensions already exist, but this would be an example of how you could dynamically load variable-headered CSVs into a SQL database.

Django: Filtering a queryset locally from cache

If I perform a prefetch_related('toppings') for a queryset, and I want to later filter(spicy=True) by fields in the related table, Django ignores the cached info and does a database query. I found that this is documented (under the Note box) and seems to happen for all forms of caching (select_related(), already evaluated querysets, etc.) when another filter() is performed.
However, is there some sort of super secret hidden time-saving shortcut to filter locally (using the cache and not hitting the database) without having to write the Python code to loop through the queryset (using list/dict comprehensions, etc.)? Maybe something like a filter_locally(spicy=True)?
EDIT:
One of the reasons why a list/dict comprehension doesn't work well for me is that a list/dict does not have the queryset methods. In my case, the first-level M2M field, toppings, isn't the end goal; I need to check a second related M2M field (which I have prefetched as well). While this is also possible using a list comprehension, it's just much simpler to have something such as filter_locally(spicy=True, origin__country='Spain') because:
it allows accessing many levels of related fields with minimal effort
it allows chaining other queryset methods
it's easier to read because it's consistent with the familiar filter()
it's easier to modify existing code that uses filter() without prefetching, adding this optimization without many changes.
But from the responses, Django has no such support :(
You have to write the Python code to loop through the queryset (a list/dict comprehension is ideal). All the filter() code knows how to do is add filtering language to the SQL sent to the database. Filtering locally is a totally different problem from filtering remotely, so the solutions to those two separate problems won't be able to share any logic.
A list comprehension one-liner would be pretty straightforward, though; the syntax might not be much more complex than with filter().
If you're filtering on a boolean, doing the list comprehension is pretty easy. You can also swap out the topping.spicy check for a string comparison or whatever.
I would do something like:
qs = Pizza.objects.all().prefetch_related('toppings')

def get_spicy(qs):
    # iterating pizza.toppings.all() hits the prefetch cache, not the database
    return [pizza for pizza in qs
            if any(topping.spicy for topping in pizza.toppings.all())]
That is, if you want to return the pizza object when any of its toppings is spicy. You can also replace the any() with all() to check for all of them, and do a lot of pretty powerful queries with this syntax. I'm somewhat surprised that there is no easy way to do this in Django. It seems like a lot of these simple queries should be easy to implement in a generic manner.
The above code assumes a many-to-many relation. It should be easy to modify it to work with a simple FK relationship such as a one-to-one or one-to-many.
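Extending the same pattern to the two-level lookup from the question's edit, a sketch assuming each topping also has a prefetched origins M2M with a country field:
pizzas = Pizza.objects.prefetch_related('toppings__origins')
# all filtering happens in Python against the prefetch cache
spanish_spicy = [
    pizza for pizza in pizzas
    if any(topping.spicy and
           any(origin.country == 'Spain' for origin in topping.origins.all())
           for topping in pizza.toppings.all())
]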
Hope this was helpful.

Slow page generation in Django with 50+ SQL queries per page

In my Django app I noticed that pages with a big number of SQL queries load considerably slower than other pages. It's not my first day in web dev, and I've mainly dealt with such a resource hog as Drupal, but even Drupal with its 150 - 200 SQL queries per page generates a page in 0.5 - 0.7 sec.
Django, on the other hand, performs really badly with a more or less average number of queries per page. For example, one of my pages generates 60 queries like this:
SELECT `gamenode_gamenode`.`id`, `gamenode_gamenode`.`title`, `gamenode_gamenode`.`short_desc`, `gamenode_gamenode`.`full_desc`, `gamenode_gamenode`.`slug`, `gamenode_gamenode`.`type`, `gamenode_gamenode`.`source_gameid`, `gamenode_gamenode`.`created`, `gamenode_gamenode`.`updated`, `gamenode_gamenode`.`status`, `gamenode_gamenode`.`promote`, `gamenode_gamenode`.`sticky`, `gamenode_gamenode`.`hit_count`, `gamenode_gamenode`.`game_rank`, `gamenode_gamenode`.`share_count`, `gamenode_gamenode`.`like_count`, `gamenode_gamenode`.`comment_count` FROM `gamenode_gamenode` WHERE `gamenode_gamenode`.`id` = 1058
and outputs the data as a simple string, and it takes 1200 ms to generate the page! I did this just as a test to generate many fairly simple queries. If I lower the number of queries to 10 - 15, page generation time comes back to a more or less acceptable number.
So I have a question: why is Django so slow when there are many SQL queries on a page? I did similar comparisons using Rails, Symfony and Drupal, and all these "resource hogs" performed way better than Django. Am I doing something wrong, or is there some "secret" setting to make things faster in Django? Or maybe Djangonauts consider such times normal and just strive to write code which produces as few queries as possible? Please help me figure this out.
Yes, Django's ORM is pretty slow. You have three choices for dealing with this:
Complain about it.
Switch to another web application framework.
Make some effort to understand why your application is generating so many database queries, and learn how to use Django's ORM effectively so as to reduce the number of queries.
(1) might be psychologically satisfying but won't solve your problem; (2) is off-topic here at Stack Overflow (but you might look at Wikipedia's Comparison of web application frameworks).
We can help you with (3), but only if you show us some more of your code. The query you quoted looks like a typical query that Django would generate for a call to get():
GameNode.objects.get(id=1058)
You shouldn't be running more than a couple of queries like this on a page: if you want to get many GameNodes you need to get them in a single query:
GameNode.objects.filter(<criteria>)
Or if the GameNode objects are related to some other object by a foreign key on another model that you are querying, then you could fetch all the related GameNode objects by using Django's select_related() method.
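For instance, a sketch under the assumption of a hypothetical Review model holding a ForeignKey to GameNode:
# Review is an assumed model with a FK field named `gamenode`
reviews = Review.objects.select_related('gamenode')
for review in reviews:
    print(review.gamenode.title)  # no extra query per row; the JOIN fetched it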
There's almost always a way to speed things up (see this testimonial) but we need to know the details before we can say how to do it.

Proper way to call a database function from Django?

I'm trying to do a full-text search with PostgreSQL and Django, so I've created a function search_client(text) which returns a list of clients. To call it from the DB I use something like this:
SELECT * FROM search_client('something')
and I'm not really sure how to call it from Django. I know I could do something like:
from django.db import connection

cursor = connection.cursor()
cursor.execute("SELECT * FROM search_client('something')")
result = cursor.fetchall()
but that will only return a list of values, and I'd like to have a list of objects, like when I use the filter() method.
Any ideas? Thanks for your time!
If your goal is a full-featured search engine, have a look at django-haystack. It rocks.
As for your question, the new (Django 1.2) raw method might work:
qs = MyModel.objects.raw("SELECT * FROM search_client('something')")
If you're using Django 1.2, you can use the raw() ORM method to execute custom SQL but get back Django models. If you're not, you can still execute the SQL via the extra() method on the default QuerySet and pump the results into a custom method to either go pull the real ORM records or build new, temporary objects.
First, you probably don't want to do this. Do you have proof that your database function is actually faster?
Implement this in Python first. When you can prove that your Python implementation really is the slowest part of your transaction, then you can try a stored procedure.
Second, you have the extra method available in Django.
http://docs.djangoproject.com/en/1.2/ref/models/querysets/#django.db.models.QuerySet.extra
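A sketch of what that could look like, assuming search_client() returns rows whose first column is the Client primary key (the model name is illustrative):
# pushes the function call into the WHERE clause, returns real Client objects
qs = Client.objects.extra(
    where=["id IN (SELECT id FROM search_client(%s))"],
    params=['something'],
)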
Note that compute-intensive database procedures are often slow.