I use matches.query.__format__('') to print the raw SQL that a Django query will execute.
If I execute that query directly in psql it takes 5-10 ms, while the Django query as timed below can take up to 100 ms the first time it's executed.
Losing 100 ms is a lot (I'll have to run a second query too, so that's 2 x 100 ms; add in latency and users will easily notice). Is this normal? Am I missing something?
import time

from django.http import JsonResponse


def api(request):
    tag = request.GET.get('q', '')
    matches = Relationship.objects.filter(keyword=tag, count__gte=3).order_by('-count')[:30]
    print(matches.query.__format__(''))  # get raw SQL query here
    start_time = time.time()
    print(matches)  # lazy queryset is executed here
    print("Time elapsed {0:0.1f}ms".format((time.time() - start_time) * 1000))
    mydict = serialize_matches(matches, tag)
    return JsonResponse(mydict)
UPDATE:
Thanks for the tips below. Django seems fine; it was the database that was slow after all. Some of my psql queries were only fast because their results were already cached, and there seems to be some caching even after psql is restarted, which can confuse performance tests.
So when you are testing the performance of your database, make sure the queries aren't cached.
In the end it was not necessary to use raw SQL queries, as the Django ORM is just fine in terms of performance.
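One way to separate database time from Django time (and to see whether a run was served from PostgreSQL's cache) is to ask PostgreSQL directly. A minimal sketch, assuming Django 2.1+ for QuerySet.explain() and DEBUG=True so connection.queries is populated:

from django.db import connection, reset_queries

tag = 'example'  # stand-in for request.GET['q']
matches = (Relationship.objects
           .filter(keyword=tag, count__gte=3)
           .order_by('-count')[:30])

# EXPLAIN (ANALYZE, BUFFERS): "shared hit" blocks came from cache, "read"
# blocks came from disk; "Execution Time" is measured inside PostgreSQL.
print(matches.explain(analyze=True, buffers=True))

# connection.queries records the time Django measured around each executed
# query (DEBUG=True only), for comparison with the psql timing.
reset_queries()
list(matches)  # force evaluation of the lazy queryset
print(connection.queries[-1]['time'])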
Related
I see the time of a query QUERY = 'SELECT COUNT_BIG(*) AS [__count] FROM ... in the Django Debug Toolbar. Is that the pure performance of the database, or 'dirty' time that also includes handling of the query by Django and third-party libraries?
As seen on the docs:
SQL
class debug_toolbar.panels.sql.SQLPanel
SQL queries including time to execute and links to EXPLAIN each query.
This means it does not include Django's own processing time.
https://django-debug-toolbar.readthedocs.io/en/latest/panels.html#sql
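If you want to cross-check what the panel reports, you can compare it with what Django itself records per query. A minimal sketch, assuming DEBUG=True (connection.queries is only populated in debug mode) and a hypothetical MyModel:

from django.db import connection, reset_queries

reset_queries()
qs = MyModel.objects.filter(active=True)  # hypothetical model and filter
print(qs.count())  # runs the COUNT query

# Each entry holds the SQL and the time Django measured around
# cursor.execute(); it does not include serializer, template or other
# Python processing time.
for q in connection.queries:
    print(q['time'], q['sql'][:80])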
I have a query in my Django app that I needed to hand-optimize. But getting the query to run fast means I need to be able to tell Postgres "don't use parallelism on this".
What I thought would work was:
from django.db import connection
cursor = connection.cursor()
# start a transaction so that PGBouncer runs the next statements
# on the same connection
cursor.execute("begin")
# turn off parallelism for this next query
cursor.execute("set max_parallel_workers_per_gather = 0")
# run my query
cursor.execute("HAND-TUNED SELECT QUERY GOES HERE")
# process the cursor results
# Put this connection back in the PGBouncer pool, and reset
# max_parallel_workers_per_gather.
cursor.execute("rollback")
But it does not seem to be working. My query continues to show up in my "slow query" logs when I run it through the Django site, and the performance remains lousy (4+ seconds with parallelism, 0.5 seconds without).
Is there a way to do what I need to do?
First, you should use SET LOCAL so that the effect is limited to the transaction.
Then I recommend that you use auto_explain to find out the actual plan that was used for the query. Maybe there is a different reason for the slowdown.
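A minimal sketch of that suggestion, wrapping the statements in an explicit transaction so SET LOCAL is scoped to it and is reset automatically at commit or rollback (everything except the GUC name is illustrative):

from django.db import connection, transaction

def run_hand_tuned_query():
    # atomic() opens a transaction, so with PgBouncer in transaction pooling
    # mode every statement below goes to the same server connection.
    with transaction.atomic():
        with connection.cursor() as cursor:
            # SET LOCAL lasts only until the end of the current transaction,
            # so there is nothing to reset afterwards.
            cursor.execute("SET LOCAL max_parallel_workers_per_gather = 0")
            cursor.execute("HAND-TUNED SELECT QUERY GOES HERE")
            return cursor.fetchall()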
I'm currently in the process of optimizing my Django app, which is acting as an API for my front-end with the Django Rest Framework. While running my server in debug mode, I've noticed that every time a queryset gets executed, there's a query run right before it that always looks like this:
SELECT COUNT('*') AS "__count" FROM "table_name" WHERE ...
The ... part always mirrors the query that returns the objects I want. I'm unsure whether this only runs in debug mode, is something the QuerySet object does innately, or is an error in my code. I'd appreciate some insight into why this is happening and whether it's something I need to worry about.
This occurs in Django Rest Framework when you are using paging on a list view:
One query to fetch the data for your current page.
A second query to calculate the total number of records for the same queryset.
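The count comes from the paginator: DRF's page-number pagination hands the queryset to Django's Paginator, whose count property runs queryset.count(). If the exact total is expensive and you can live without it, one option is to swap in a paginator with a cheaper count. A minimal sketch (the class names and the constant estimate are illustrative, not a DRF feature):

from django.core.paginator import Paginator
from django.utils.functional import cached_property
from rest_framework.pagination import PageNumberPagination

class NoCountPaginator(Paginator):
    @cached_property
    def count(self):
        # Return whatever cheap estimate you can accept instead of the exact
        # COUNT(*); a large constant simply sacrifices an accurate last page.
        return 9999999

class NoCountPagination(PageNumberPagination):
    # DRF builds its pages through this class.
    django_paginator_class = NoCountPaginator

Point DEFAULT_PAGINATION_CLASS (or the view's pagination_class) at NoCountPagination and the extra COUNT query goes away; if pagination is disabled for the view, it should disappear as well.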
I was wondering if there is a way to cache querysets in memcached on a site that has both authenticated and unauthenticated users.
Basically I just need to cache queries from one table.
Any ideas would be great.
Thanks
Check out johnny-cache. It worked great for us until we did so much writing (updating records) that we were invalidating the cache constantly. At that point we just started using memcache directly, like this:
cache.set("some_unique_key", my_queryset, 3600)
cache_object = cache.get(cache_key)
If you're dealing with large querysets or objects, you might want to pickle them first.
cache.set("some_unique_key", zlib.compress(cPickle.dumps(cache_object), 1), 3600)
zipped_cache_object = cache.get(cache_key)
if zipped_cache_object:
cache_object = cPickle.loads(zlib.decompress(zipped_cache_object))
Django's Caching Docs
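If the same table should be cached separately for logged-in and anonymous visitors, the usual trick is simply to vary the cache key on the authentication state (or the user id). A minimal sketch with an illustrative Keyword model:

from django.core.cache import cache

def get_keywords(request):
    # Vary the key by auth state so anonymous and authenticated users
    # never share each other's cached results.
    key = "keywords:%s" % ("auth" if request.user.is_authenticated else "anon")
    rows = cache.get(key)
    if rows is None:
        # list() forces evaluation so plain data is cached, not a lazy queryset.
        rows = list(Keyword.objects.filter(count__gte=3).order_by("-count")[:100])
        cache.set(key, rows, 3600)
    return rows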
Hi, I have a problem that has been bothering me for a week. I am running Selenium test scripts on my dev machine, and in my tests I call a simple script to delete accounts by their subdomain name:
for a in Account.objects.filter(domain=sub_domain):
    a.delete()
The problem is that the query to find all such accounts does not return correct results after the first time it is run (I use this query to clean up the database before each test). When I set a breakpoint at this point, I can see the query return 0 records, even though the database has one record. I also set up the MySQL query log to see the actual query Django sends to MySQL; the query looks good and returns the correct result if I copy and paste it into the mysql command shell.
What am I missing? Why does the Django model query not give me the correct result? MySQL is using the InnoDB engine, in case that makes any difference.
Transactions. Do a COMMIT in the shell.
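Expanding on that: two transaction-related causes produce exactly this symptom, either the session that inserted the row never committed, or the Django connection is reading from an InnoDB REPEATABLE READ snapshot taken before the row existed. A hedged sketch of both fixes (the settings values are illustrative, and the isolation_level option needs a reasonably recent Django):

# In the mysql shell that created the record, make the row visible to others:
#   mysql> COMMIT;

# On the Django side, drop the current connection so the next ORM query
# starts a brand-new transaction with a fresh snapshot:
from django.db import connection
connection.close()
accounts = Account.objects.filter(domain=sub_domain)

# Or lower the isolation level so every statement sees the latest committed
# data (settings.py, MySQL backend):
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mydb",
        "OPTIONS": {"isolation_level": "read committed"},
    }
}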
This is a recurring problem, so I'm doing a shameless plug with a question in which I described the details of the problem:
How do I deal with this race condition in django?