So I have been developing a Django server application that mostly works as an API endpoint for a mobile app, with about 30 different models, almost all of which have FKs to each other or are in M2M relationships. Now that we're going into production (but do not yet have many users), I have noticed (using Silk) that the most complex query, which fetches a bunch of objects as JSON, issues about 500 SQL queries (those objects each have about 5 FKs and 2 M2Ms, all of which are fetched as object fields in the JSON). The numbers don't seem too huge (50k qps seems to be a normal figure for Postgres, which is our DBMS), but I am getting worried about the future. Are those numbers normal in early production? What is the normal distribution of database requests per view for an API like the one I described? We are not currently using DRF, but I am looking towards it. Does it solve this problem?
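For context, a query count that scales with the number of objects usually points at per-object FK/M2M lookups, and the standard Django answer is select_related for FKs and prefetch_related for M2Ms. Below is a minimal sketch of the difference; the model and field names (Item, author, category, tags, liked_by) are hypothetical and not taken from this project:

from myapp.models import Item  # hypothetical app/model for illustration

# Naive: one query for the list, then roughly 7 extra queries per object
# (5 FKs + 2 M2Ms) when the serializer touches the related fields.
items = Item.objects.all()

# Eager loading: FK rows are joined into the main query, and each M2M is
# fetched with one additional query for the whole result set.
items = (
    Item.objects
    .select_related("author", "category")   # FK fields
    .prefetch_related("tags", "liked_by")   # M2M fields
)

payload = [
    {
        "id": item.pk,
        "author": item.author.name,                       # no extra query
        "tags": [t.name for t in item.tags.all()],         # served from the prefetch cache
    }
    for item in items
]

DRF does not do this for you automatically; you still add select_related/prefetch_related to the queryset your serializer consumes.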
Related
I'm new to Django. I have an application where each of my views runs an average of 6 queries. Is that OK, or should I optimise my database further?
Instagram mainly uses Django as their backend framework, and as we know they have a huge number of users. Their solution to your problem is to create multiple data centers to process requests, effectively running it as a distributed system.
In your case, if it is running as a private project and the queries are not hitting a huge database, I think it is okay.
Running a Django application on App Engine, we need to make a query that returns approx. 450 rows per request, including joins via M2M prefetch_related and select_related.
When we make many concurrent requests, the query time for each request goes up in such a way that all requests finish simultaneously.
Running the same concurrent requests on a non-App Engine Django installation, or on an App Engine instance with threading set to false, does not show this behavior.
There is also a slight improvement when the requests are separated to different appengine instances.
Has anyone seen this before?
Sounds like your database backend is too heavily loaded by your query. Have you tried upgrading to a higher tier?
The basic tier only handles 25 concurrent queries. You said "many" in your question, so if "many" > 25 that's the source of your problem:
https://developers.google.com/cloud-sql/pricing
I am currently trying to figure out the best practice for designing my web services between a Django-administered database (+ images) and a mobile app. My main concern is how to separate a bulk update (sending every record in the database and all the files on the server) from a lighter, smaller update with only the new and/or modified objects (images or data).
I have had access to a working codebase that uses a cronjob and a state for each data field (new, modified, up to date) to generate either a reference data file or an update file. I find it very redundant and somewhat inelegant, in contradiction with the DRY spirit of Django (there are tons of lines of code, making it nearly unmaintainable).
I find it very surprising that this aspect is almost undocumented, since web traffic is a crucial matter in mobile development. Fetching all the served data every time quickly becomes unsustainable as the database grows.
I would be very grateful for any lead or advice you could give me :-) Thx in advance !
Just have a last_modified DateTimeField in your table, and in your user's profile a last_synchronized DateTimeField. When the mobile app wants to synchronize, send the data which was modified after the last synchronization run, and update the last_synchronized field in the user's profile.
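A minimal sketch of that approach, assuming hypothetical Item and Profile models (the field and helper names here are for illustration only, not taken from the question):

from django.conf import settings
from django.db import models
from django.http import JsonResponse
from django.utils import timezone


class Item(models.Model):
    name = models.CharField(max_length=100)
    # auto_now refreshes the timestamp on every save()
    last_modified = models.DateTimeField(auto_now=True)


class Profile(models.Model):
    user = models.OneToOneField(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    last_synchronized = models.DateTimeField(null=True, blank=True)


def sync(request):
    profile = request.user.profile
    qs = Item.objects.all()
    if profile.last_synchronized is not None:
        # Only send objects changed since the client's last sync
        qs = qs.filter(last_modified__gt=profile.last_synchronized)
    data = list(qs.values("id", "name", "last_modified"))
    profile.last_synchronized = timezone.now()
    profile.save(update_fields=["last_synchronized"])
    return JsonResponse({"items": data})

The first call (last_synchronized is NULL) behaves as the bulk update; every later call only returns the delta.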
My Django backend is always dynamic. It serves an iOS app similar to that of Instagram and Vine where users upload photos/videos and their followers can comment and like the content. Just for the sake of this question, imagine my backend serves an iOS app that is exactly like Instagram.
Many sources claim that using memcached can improve performance because it decreases the number of hits made to the database.
My question is: for a backend that is already dynamic in nature (always changing, since users are uploading new pictures, commenting, liking, following new users, etc.), what can I possibly cache?
It's a problem I've been thinking about for quite some time. I could cache the user profile data, but other than that, I don't know where else memcached would be useful.
Other sources mentioned using it everywhere in the backend where a 'GET' call is made, but then I would need to set a suitable expiry time on the cache since the app is always dynamic. What are your solutions and suggestions for getting around this problem?
You would cache whatever is most frequently accessed from your database. Make a list of the most frequent requests for data from the database and cache the data in that order of priority, for example:
Cache the most frequent requests based on category of the pictures
Cache based on users - power users go into cache (those which do a lot of data access)
Cache the most recent inserts (in case you have a page which shows the recently added posts/pictures)
I am sure you can come up with more scenarios. I am positive memcached (or any other caching) will help, even though your app is very 'dynamic'.
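As a concrete illustration of a couple of the scenarios above, here is a rough sketch using Django's cache framework (which can be backed by memcached); Post, build_profile_dict and the TTL values are assumptions for the example, not part of the question:

from django.core.cache import cache

PROFILE_TTL = 60 * 15   # profiles change rarely; 15 minutes of staleness is fine
RECENT_TTL = 30         # the recent-posts list is hot but can be ~30s stale


def get_profile_data(user_id):
    key = f"profile:{user_id}"
    data = cache.get(key)
    if data is None:
        data = build_profile_dict(user_id)   # hypothetical helper that hits the DB
        cache.set(key, data, PROFILE_TTL)
    return data


def get_recent_posts():
    posts = cache.get("recent_posts")
    if posts is None:
        # Post is a hypothetical model standing in for uploaded pictures/videos
        posts = list(
            Post.objects.order_by("-created")[:50].values("id", "caption", "image_url")
        )
        cache.set("recent_posts", posts, RECENT_TTL)
    return posts


def invalidate_profile(user_id):
    # Call this when the user edits their profile so readers never see stale data
    cache.delete(f"profile:{user_id}")

The dynamic nature of the app is handled either by short TTLs (recent posts) or by explicit invalidation on write (profiles).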
If you are trying to diagnose slow queries in your MySQL backend and are using a Django frontend, how do you tie the slow queries reported by the backend back to specific querysets in the Django frontend code?
I think you have no alternative besides logging every Django query for the suspicious querysets.
See this answer on how to access the actual query for a given queryset.
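The usual ways to get at the SQL are str(queryset.query) and, with DEBUG enabled, django.db.connection.queries. A small sketch (Article is a hypothetical model used only for illustration):

import logging

from django.db import connection, reset_queries

logger = logging.getLogger(__name__)

# 1. Print the SQL Django will generate for a suspicious queryset
qs = Article.objects.filter(published=True).select_related("author")
logger.debug("SQL: %s", str(qs.query))

# 2. With DEBUG = True, inspect every query executed so far in this process
reset_queries()
list(qs)  # force evaluation
for q in connection.queries:
    logger.debug("%ss  %s", q["time"], q["sql"])

Matching str(qs.query) against the statements in MySQL's slow query log is how you tie a logged query back to the queryset that produced it.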
If you install django-devserver, it will show you the queries that are being run and the time they take in your shell when using runserver.
Another alternative is django-debug-toolbar, which will do the same in a side panel-overlay on your site.
Either way, you'll need to test it out in your development environment. However, neither really solves the issue of pinpointing the offending queries directly; they work on a per-request basis. As a result, you'll have to think about which of your views use the database most heavily and/or deal with exceptionally large amounts of data, but by cherry-picking likely candidate views and inspecting how long the queries on those pages take, you should be able to get a handle on which particular queries are the worst.
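If you would rather not install either package, a tiny middleware along these lines gives you the same per-request view of query counts and timings; this is my own sketch (it only works with DEBUG = True, since that is when Django records connection.queries), not part of django-devserver or django-debug-toolbar:

import logging

from django.db import connection

logger = logging.getLogger(__name__)


class QueryCountMiddleware:
    """Log how many queries each request ran and how long they took (DEBUG only)."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        before = len(connection.queries)
        response = self.get_response(request)
        queries = connection.queries[before:]
        total = sum(float(q["time"]) for q in queries)
        logger.info("%s: %d queries, %.3fs", request.path, len(queries), total)
        return response

Requests that log an unusually high count or total time are the ones worth inspecting query by query.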