Slow page generation in Django with 50+ SQL queries per page

In my Django app I noticed that pages with a big number of SQL queries load considerably slower than other pages. This is not my first day in web development, and I mainly deal with such a resource hog as Drupal, but even Drupal, with its 150-200 SQL queries per page, generates a page in 0.5-0.7 sec.
Django, on the other hand, performs really badly with a more or less average number of queries per page. For example, one of my pages generates 60 queries like this:
SELECT `gamenode_gamenode`.`id`, `gamenode_gamenode`.`title`, `gamenode_gamenode`.`short_desc`, `gamenode_gamenode`.`full_desc`, `gamenode_gamenode`.`slug`, `gamenode_gamenode`.`type`, `gamenode_gamenode`.`source_gameid`, `gamenode_gamenode`.`created`, `gamenode_gamenode`.`updated`, `gamenode_gamenode`.`status`, `gamenode_gamenode`.`promote`, `gamenode_gamenode`.`sticky`, `gamenode_gamenode`.`hit_count`, `gamenode_gamenode`.`game_rank`, `gamenode_gamenode`.`share_count`, `gamenode_gamenode`.`like_count`, `gamenode_gamenode`.`comment_count` FROM `gamenode_gamenode` WHERE `gamenode_gamenode`.`id` = 1058
and outputs the data as a simple string, and it takes 1200 ms to generate the page! I did this just as a test, to generate many fairly simple queries. If I lower the number of queries to 10-15, page generation time comes back to a more or less acceptable figure.
So I have a question: why is Django so slow when there are many SQL queries on the page? I did similar comparisons using Rails, Symfony, and Drupal, and all of these "resource hogs" performed way better than Django. Am I doing something wrong, is there some "secret" setting to make things faster in Django, or do Djangonauts consider such times normal and just strive to write code that produces as few queries as possible? Please help me figure this out.

Yes, Django's ORM is pretty slow. You have three choices for dealing with this:
1. Complain about it.
2. Switch to another web application framework.
3. Make some effort to understand why your application is generating so many database queries, and learn how to use Django's ORM effectively so as to reduce the number of queries.
(1) might be psychologically satisfying but won't solve your problem; (2) is off-topic here at Stack Overflow (but you might look at Wikipedia's Comparison of web application frameworks).
We can help you with (3), but only if you show us some more of your code. The query you quoted looks like a typical query that Django would generate for a call to get():
GameNode.objects.get(id=1058)
You shouldn't be running more than a couple of queries like this on a page: if you want to get many GameNodes, you need to get them in a single query:
GameNode.objects.filter(<criteria>)
Or, if the GameNode objects are related to some other object by a foreign key on another model that you are querying, you could fetch all the related GameNode objects in the same query by using Django's select_related() method, as in the sketch below.
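A minimal sketch of both approaches (the id list and the Review model are hypothetical; only GameNode comes from the question):

# One query for many nodes instead of one get() per node.
node_ids = [1058, 1059, 1060]  # hypothetical: ids gathered elsewhere
nodes = GameNode.objects.filter(id__in=node_ids)

# If another model points at GameNode through a foreign key, select_related()
# pulls the node in with a SQL join instead of issuing one query per row.
reviews = Review.objects.select_related('gamenode').all()  # Review is hypothetical
for review in reviews:
    print(review.gamenode.title)  # no extra query here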
There's almost always a way to speed things up (see this testimonial) but we need to know the details before we can say how to do it.

Related

Django QuerySet Limit hits database too many times?

I've recently stumbled upon a massive bottleneck on a production website, but only after updating from Django 1.11 to 2.1.
Here is my simple slice of code:
pages = Page.objects.filter(cat="news_item").order_by('-created')[:2]
This, in turn, creates around 30 queries, roughly as many as there are pages under that specific filter.
I have now implemented a somewhat hacky way to resolve the 32 queries, which I'm not satisfied with.
pages = [Page.objects.filter(cat='news_item').order_by('-created')[i] for i in range(0, 2)]
Speed is notably affected; a few other chunks of code used this method, which caused >400 queries per page load. I have since adapted these to use a combination of the above code and Model.objects.raw.
Did something change in Django 2.0/2.1 that I missed, or does the [:2] limit not work correctly?
Weirdest issue/bug/confusion I've ever seen.
Doing the following only queries once:
pages = Page.objects.filter(cat="news_item").order_by('-created')[:2:1]
I noted in the Django documentation here that it states:
https://docs.djangoproject.com/en/dev/topics/db/queries/#limiting-querysets
Generally, slicing a QuerySet returns a new QuerySet – it doesn’t evaluate the query. An exception is if you use the “step” parameter of Python slice syntax. For example, this would actually execute the query in order to return a list of every second object of the first 10:
Entry.objects.all()[:10:2]
So, using the weird trick above forces this basic piece of code to evaluate and query the database immediately, causing only one database query instead of 30+.
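An equivalent, more explicit way to get a single query (a sketch, not tied to any particular Django version) is to materialize the slice with list(), since converting a queryset to a list forces exactly one evaluation:

# Evaluate the sliced queryset once; iterating `pages` afterwards
# reuses the in-memory list and never hits the database again.
pages = list(Page.objects.filter(cat="news_item").order_by('-created')[:2])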

RESTomancy: Getting products from category in Prestashop 1.5

Disclaimers:
This is oriented towards Prestashop 1.5, but if the answer is "this is fixed in version 1.x", then I'll raise a petition to update our shop.
I'm also tagging it as REST because I think I explained it thoroughly enough that you don't need actual experience with Prestashop to understand it.
So in Prestashop we have these Web Services, which lack support for use cases as simple as searching by category.
1. Let's say you want to get all the products from categories 1, 3 and 17:
So what is the solution?
Well, you can do something along the lines of this answer: https://stackoverflow.com/a/34061229/4209853
where you get all the product IDs from categories 1, 3 and 17 and then make another call for products, filtering by those IDs.
'filter[id]' => '['.implode('|',$productIDsArrayIGotBefore).']',
It's ugly and very 20th century, but well... if it gets the job done... except it doesn't.
You see, this is a call for getting a resource, and someone somewhere decided:
Hey, we have all this nice HTTP action verbs, so let's use them for REST CRUD interfaces: POST for C, GET for R, PUT for U and DELETE for D. Neat.
And that's nice and all, but combined with the lack of expressive power of Prestashop's Web Services, it means it's stupidly easy to run into, you guessed it: yes, 414.
Error HTTP 414 Request URI too long
and we all know that modifying Apache so it accepts longer request URIs is not a neat, scalable solution.
So we could try to split the array and make multiple calls, which is just conceptually ugh. Not just because of the performance hit of making multiple queries, but also because we would need to take into account the number of characters of all the IDs concatenated to calculate how many we can (safely) ask for in one call. And all that would have its own caveats, like:
2. What if we also want to filter them, e.g. active=1?
Now we're in for a ride, because we can't know beforehand how many calls we will need to make.
Let's define:
N are the IDs I got from the categories
n is the number of IDs I can safely ask for in one call
T is the number of (filtered) products I want
t are the (filtered) products I already have
k are the (filtered) products we receive from the call
So we would end up with something like:
do {
    $n0 = min($T - count($t), $n);    // how many we still need, capped by the safe batch size
    $ids = array_splice($N, 0, $n0);  // consume the next batch of IDs
    $k = get('products', $ids);       // call the web service with this batch
    $t = array_merge($t, $k);         // accumulate the filtered products
} while (count($k) != 0 && count($t) < $T && !empty($N));
...which is just... bonkers.
The only elegant solution I can come up with is creating a new Prestashop Web Service that acts as a wrapper, receiving the request through POST and forwarding it to the Prestashop service.
But, before that... do you have a better solution, using some kind of RESTomancy I may be missing?

Optimized Django Queryset

I have the following function to determine who downloaded a certain book:
@cached_property
def get_downloader_info(self):
    return self.downloaders.select_related('user').values(
        'user__username', 'user__full_name')
Since I'm only using two fields, does it make sense to use .defer() on the remaining fields?
I tried to use .only(), but I get an error that some fields are not JSON serializable.
I'm open to all suggestions, if any, for optimizing this queryset.
Thank you!
Before you try every possible optimization, you should get your hands on the SQL query generated by the ORM (you can print it to stdout or use something like django-debug-toolbar) and see what is slow about it. After that, I suggest you run that query with EXPLAIN ANALYZE and find out what is slow about it at the database level. If the query is slow because a lot of data has to be transferred, then it makes a lot of sense to use only or defer. Using only and defer (or values) gives you better performance only if you need to retrieve a lot of data; it does not make your database's job much easier (unless you really have to read a lot of data, of course).
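For instance, a minimal sketch of inspecting what the ORM actually runs (book is a hypothetical instance; connection.queries is only populated when DEBUG = True):

from django.db import connection

qs = book.downloaders.select_related('user').values(
    'user__username', 'user__full_name')

print(qs.query)  # an approximate rendering of the SQL Django will send

list(qs)  # force the queryset to execute
for q in connection.queries:
    print(q['time'], q['sql'])  # timing and SQL of every executed query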
Since you are using Django and PostgreSQL, you can get a psql session with manage.py dbshell and get query timings with \timing.

ZF2 Doctrine2 App is very slow because of doctrine method calls

The request time for the homepage of my app is about 5 seconds, although there are only 6 database queries. So I decided to install Xdebug with Webgrind on my local server to profile my app. There I can see that I have a huge number of Doctrine method calls, but I don't really know how to interpret this in order to reduce the number of those calls. Maybe someone could give me a hint.
RestaurantRepository
public function findByCity(City $city) {
    $queryBuilder = $this->createQueryBuilder('restaurant');
    $queryBuilder->addSelect('cuisines')
        ->addSelect('openingHours')
        ->addSelect('address')
        ->addSelect('zipCode')
        ->addSelect('city')
        ->leftJoin('restaurant.cuisines', 'cuisines')
        ->leftJoin('restaurant.openingHours', 'openingHours')
        ->leftJoin('restaurant.meals', 'meals')
        ->innerJoin('restaurant.address', 'address')
        ->innerJoin('address.zipCode', 'zipCode')
        ->innerJoin('zipCode.city', 'city')
        ->where('zipCode.city = :city')
        ->andWhere('restaurant.state <= :state')
        ->setParameter('city', $city)
        ->setParameter('state', Restaurant::STATE_ENABLED)
        ->orderBy('restaurant.state', 'ASC')
        ->addOrderBy('restaurant.name', 'ASC');
    return $queryBuilder->getQuery()->getResult();
}
You are probably loading all the associations of some of your entities. It is hard to say where the problem is exactly without more information about your entity definitions and the queries you are executing.
In the Doctrine documentation there are some suggestions for improving performance (one of them is about lazy-loading associations) that might help you get on your way.
Install and enable the ZendDeveloperToolbar module. There you will have the possibility to check how many DB calls you are making with each action.
As the profiler output shows, there's a lot of hydration going on under the hood. There are a lot of tutorials on the net about how NOT to use Doctrine. I can't tell anything without looking at what you are doing with your entities.
Also make sure you have caching enabled in production mode so Doctrine doesn't have to parse mapping information on each request, which is very heavy. You are probably using the Annotation Driver, which is the slowest one.
I can also see you are using the Zend autoloader, which is inefficient compared to Composer. Simply add your modules' src directories to the autoload section of the composer.json file and let Composer do the autoloading.

Datastore NDB best practices when querying and extracting thousands of rows

I'm using the High Replication Datastore along with ndb. I have a kind with over 27,000 entities, which isn't that much. Supposedly the Datastore is efficient at querying and extracting large amounts of data, but whenever I query over that kind, queries take a long time to finish (I've even gotten DeadlineExceededErrors).
I have a model where I store keywords and URLs I want to index in Google:
class Keywords(ndb.Model):
    keyword = ndb.StringProperty(indexed=True)
    url = ndb.StringProperty(indexed=True)
    number_articles = ndb.IntegerProperty(indexed=True)
    # Some other attributes... All attributes are indexed
My current use cases are building my sitemap and fetching my top 20 keywords to link to from my home page.
When I fetch many entities, I usually do:
Keywords.query().fetch() # For the sitemap, as I want all of the urls
Keywords.query(Keywords.number_articles > 5).fetch() # For the homepage, I want to link to keywords with more than 5 articles
Is there a better way to extract data?
I've tried indexing the data into the Search API, and I've seen huge speed gains. Even though this works, I don't think it's ideal to replicate data from the Datastore into the Search API with basically the same fields.
Thanks in advance!
I would split this functionality.
For the home page you can use your second query, but add, as advised by Bruyere, the limit=20 parameter. Such a request should run very fast if you have the right index.
The sitemap is a bigger issue. Usually, to process a large number of entities, you use MapReduce.
It's probably a good idea, but only if you don't get too many requests for the sitemap. It can also be the only solution if you update Keywords entities often and want the sitemap to be as up to date as possible.
Another option is to generate the sitemap in a task, save it as a blob, and serve that blob in the request. That is really quick. If your updates to the Keywords entities are not very frequent, you can run this task after any update. If you have many updates, you can schedule the task to run periodically with cron. Since you've had success using the Search API, this is probably the best option for you.
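A minimal sketch of that idea, assuming the Keywords model above (CachedSitemap and render_sitemap are hypothetical names):

# Rebuild the sitemap in a background task and store it in one entity,
# so the request handler reads a single row instead of 27,000 entities.
class CachedSitemap(ndb.Model):
    xml = ndb.TextProperty()

def rebuild_sitemap():
    # A projection query returns just the url property, not full entities.
    keywords = Keywords.query().fetch(projection=[Keywords.url])
    xml = render_sitemap(k.url for k in keywords)  # hypothetical helper
    CachedSitemap(id='sitemap', xml=xml).put()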
Generally speaking, I don't think it's a good idea to use the Datastore to retrieve large amounts of data. I recommend looking at least at the Datastore comparison with traditional databases. It's designed to handle large databases, but not necessarily large result sets. I would say the Datastore is designed to handle large numbers of small requests.
DB speed is related to the number of results returned, not the number of records in the DB. You say:
to build my Sitemap, and to fetch my top 20 keywords
If that's the case, add limit=20 to both fetches. If you do it that way, then use run() instead, as per the docs:
https://developers.google.com/appengine/docs/python/datastore/queryclass#Query_fetch
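A minimal sketch of the capped home-page fetch, assuming the Keywords model above (the ordering property is an assumption):

# Only ever pull the 20 entities the page links to; the limit is
# applied by the datastore, not after the fact in Python.
top_keywords = Keywords.query(Keywords.number_articles > 5) \
                       .order(-Keywords.number_articles) \
                       .fetch(20)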