Turn off Postgres parallel query for a single Django query - django

I have a query in my Django app that I had to hand-optimize. Getting it to run fast means I need to be able to tell Postgres "don't use parallelism on this".
What I thought would work was:
from django.db import connection
cursor = connection.cursor()
# start a transaction so that PGBouncer runs the next statements
# on the same connection
cursor.execute("begin")
# turn off parallelism for this next query
cursor.execute("set max_parallel_workers_per_gather = 0")
# run my query
cursor.execute("HAND-TUNED SELECT QUERY GOES HERE")
# process the cursor results
# Put this connection back in the PGBouncer pool, and reset
# max_parallel_workers_per_gather.
cursor.execute("rollback")
But it does not seem to be working. My query continues to show up in my "slow query" logs when I run it through the Django site, and the performance remains lousy (4+ seconds with parallelism, 0.5 seconds without).
Is there a way to do what I need to do?

First, you should use SET LOCAL so that the effect is limited to the transaction.
Then I recommend that you use auto_explain to find out the actual plan that was used for the query. Maybe there is a different reason for the slowdown.
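A minimal sketch of the SET LOCAL approach through Django's connection (the placeholder string stands in for the hand-tuned query from the question):

from django.db import connection, transaction

def run_hand_tuned_query(params):
    # transaction.atomic() guarantees both statements run inside one
    # transaction on one connection, even behind PgBouncer.
    with transaction.atomic():
        with connection.cursor() as cursor:
            # SET LOCAL lasts only until the end of this transaction, so
            # parallelism is re-enabled automatically when the block exits
            # and the connection returns to the pool unchanged.
            cursor.execute("SET LOCAL max_parallel_workers_per_gather = 0")
            cursor.execute("HAND-TUNED SELECT QUERY GOES HERE", params)
            return cursor.fetchall()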

Related

Django: reset a specific field of a model every hour

I have a field on one of my Django models that I want to reset every hour
(i.e. at the top of each hour its value becomes zero).
How can I do this? Can I schedule a function in Django?
As you know, we can define EVENTs and TRIGGERs in MySQL and other database backends, and I am also familiar with Django signals, but neither fits my needs: a database event lives outside of Django and brings its own problems, and with signals this seems impossible.
You could use schedule; it's very easy to apply to your problem.
import schedule
import time

def job():
    print("I'm working...")

schedule.every().hour.do(job)

while True:
    schedule.run_pending()
    time.sleep(1)
Here is a thread that shows how to execute a task periodically. You could then add conditions to fit your scenario, as in the sketch below.
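For the hourly reset itself, a minimal sketch of the job body, assuming a model named Counter with an IntegerField called hits (both names are made up for the example) and that the loop runs in a process where Django is already configured, e.g. a management command:

import schedule
import time
from myapp.models import Counter  # hypothetical app and model

def reset_hits():
    # A single UPDATE statement resets the column for every row.
    Counter.objects.update(hits=0)

schedule.every().hour.do(reset_hits)

while True:
    schedule.run_pending()
    time.sleep(1)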

Postgresql Could not serialize access due to concurrent update, how to find out why

The project uses Django and PostgreSQL 9.5. Sometimes I see this error in a Celery task.
When an object needs to change a particular column, it does so through a Celery task.
The task writes the object's change history to a separate table and updates the column (not with raw SQL, but through the Django ORM).
The history is written through the FDW extension into a foreign table.
Thrown exception:
Remote SQL command: COMMIT TRANSACTION
SQL statement "SELECT 1 FROM ONLY "public"."incident_incident" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
I can't understand why it raises the exception. The task is very simple.
Screenshot of the logs (maybe it helps):
In Celery, when you are doing a transaction, you can wrap the database work in a transaction.atomic block.
For example:
from django.db import transaction

@app.task(bind=True)
def do_task(self):
    try:
        with transaction.atomic():
            # Do DB OP
            ...
    except (SomeException, Exception) as exc:
        # SomeException stands in for whatever errors you expect to retry on
        raise self.retry(exc=exc)
There are other approaches as well. You can add a new field to the model that tracks changes to the object; there is an article on Medium about that approach. Hope it helps!
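If the failure really is two transactions touching the same incident row at once, one common mitigation (a sketch only, assuming an Incident model behind the incident_incident table from the error, a hypothetical status column, and the same Celery app object as above) is to lock the row before updating it:

from django.db import transaction
from incident.models import Incident  # assumed location of the model

@app.task(bind=True)
def update_incident(self, incident_id, new_status):
    try:
        with transaction.atomic():
            # select_for_update() makes a concurrent task block here
            # instead of failing later with a serialization error.
            incident = Incident.objects.select_for_update().get(pk=incident_id)
            incident.status = new_status
            incident.save(update_fields=["status"])
    except Exception as exc:
        raise self.retry(exc=exc)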

Django query slow relative to raw SQL

I use matches.query.__format__('') to print the raw SQL that a Django query will execute.
If I execute that query directly in psql it takes 5-10 ms, while the Django query as timed below can take up to 100 ms the first time it's executed.
Losing 100 ms is a lot (I will have to run a second query, so that's 2 x 100 ms; add in latency and users easily notice). Is this normal? Am I missing something?
def api(request):
    tag = request.GET.get('q', '')
    matches = Relationship.objects.filter(keyword=tag, count__gte=3).order_by('-count')[:30]
    print(matches.query.__format__(''))  # get raw SQL query here
    start_time = time.time()
    print(matches)  # lazy query executed here
    print("Time elapsed {0:0.1f}ms".format((time.time() - start_time) * 1000))
    mydict = serialize_matches(matches, tag)
    return JsonResponse(mydict)
UPDATE:
Thanks for the tips below. Django turned out to be fine; it was the database that was slow after all (see my answer below).
I found that Django is fine and that it was my database that was slow. Some psql queries were fast only because their results were already cached; note that there seems to be some caching even after psql is restarted, which can confuse performance tests.
So when you test the performance of your database, make sure the queries aren't being served from cache, as in the sketch below.
It was not necessary to use raw SQL in the end, as the Django ORM is just fine in terms of performance.
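One way to see whether a timing is flattered by the cache is to run the statement with EXPLAIN (ANALYZE, BUFFERS) and compare shared hit (pages served from the buffer cache) against read (pages fetched from disk). A sketch through Django's cursor; the table and column names in the example call are placeholders:

from django.db import connection

def explain_with_buffers(sql, params=None):
    # Prefixing the query with EXPLAIN (ANALYZE, BUFFERS) makes Postgres
    # execute it and report the actual plan plus buffer usage.
    with connection.cursor() as cursor:
        cursor.execute("EXPLAIN (ANALYZE, BUFFERS) " + sql, params)
        for (line,) in cursor.fetchall():
            print(line)

explain_with_buffers(
    "SELECT * FROM myapp_relationship WHERE keyword = %s ORDER BY count DESC LIMIT 30",
    ["django"],
)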

Django SELECT COUNT(*) as "__count" for every query

I'm currently in the process of optimizing my Django app, which is acting as an API for my front-end with the Django Rest Framework. While running my server in debug mode, I've noticed that every time a queryset gets executed, there's a query run right before it that always looks like this:
SELECT COUNT('*') AS "__count" FROM "table_name" WHERE ...
The ... part always mirrors the query that returns the objects I want. I'm unsure whether this only runs in debug mode, is something the QuerySet object does innately, or is an error in my code. I would appreciate some insight into why this is happening and whether it's something I need to worry about.
This occurs in Django Rest Framework when you are using paging on a list view:
One query to fetch the data for your current page.
A second query to calculate the total number of records for the same queryset.
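For illustration, pagination is usually switched on via DEFAULT_PAGINATION_CLASS in the REST_FRAMEWORK settings, and the count comes from Django's Paginator. If the total is not needed, one possible workaround (a sketch, not built-in DRF behaviour) is a pagination class whose paginator skips the count:

from django.core.paginator import Paginator
from django.utils.functional import cached_property
from rest_framework.pagination import PageNumberPagination

class NoCountPaginator(Paginator):
    @cached_property
    def count(self):
        # Report a huge row count so Django never issues SELECT COUNT(*);
        # the trade-off is that the API no longer knows the real total
        # or the true last page number.
        return 10 ** 9

class NoCountPagination(PageNumberPagination):
    django_paginator_class = NoCountPaginator
    page_size = 30

Set it per view with pagination_class = NoCountPagination, or globally via DEFAULT_PAGINATION_CLASS.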

How can I limit database query time during web requests?

We've got a pretty typical django app running on postgresql 9.0. We've recently discovered some db queries that have run for over 4 hours, due to inefficient searches in the admin interface. While we plan to fix these queries, as a safeguard we'd like to artificially constrain database query time to 15 seconds--but only in the context of a web request; batch jobs and celery tasks should not be bounded by this constraint.
How can we do that? Or is it a terrible idea?
The best way to do this would be to set up a role/user that is only used to run the web requests, then set the statement_timeout on that role.
ALTER ROLE role_name SET statement_timeout = 15000;  -- value in milliseconds, i.e. 15 seconds
All other roles will use the global setting of statement_timeout (which is disabled in a stock install).
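To make only the web processes run under that limit, a sketch of the Django side, assuming the restricted role is called webapp (the role name, database name, and password are placeholders). Either point the web-only settings at that role, or pass the timeout as a per-connection option:

# settings module used only by the web (WSGI) processes,
# not by celery workers or batch jobs
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "mydb",        # placeholder
        "USER": "webapp",      # the role carrying statement_timeout = 15000
        "PASSWORD": "...",     # placeholder
        "HOST": "localhost",
        "PORT": "5432",
        # Alternative to ALTER ROLE: ask libpq to set the timeout on
        # every connection opened from these settings.
        "OPTIONS": {
            "options": "-c statement_timeout=15000",
        },
    }
}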
You will need to handle this manually: check for the 15-second rule and kill the queries that violate it.
Query pg_stat_activity to find the violators and issue calls to pg_terminate_backend(procpid) to kill the offenders.
Something like this, run in a loop:
SELECT pg_terminate_backend(pg_stat_activity.procpid)
FROM pg_stat_activity
WHERE pg_stat_activity.datname = 'TARGET_DB'
AND usename = 'WEBUSERNAME'
AND (now()-query_start) > '00:00:15';
As far as the timing goes, you could pass all of your queries through a class which, on instantiation, spawns two threads: one for the query and one for a timer. If the timer reaches 15 seconds, then kill the thread running the query.
As far as figuring out whether the query originates from a web request, I don't know enough about Django to help you there. Simplistically, the class that handles your database calls could take an optional constructor parameter such as context, set to "http" for a web request and "" for anything else.
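A minimal sketch of that two-thread idea, assuming a raw psycopg2 connection (a Python thread cannot be killed directly, so the timer asks the server to cancel the query via connection.cancel() instead):

import threading
import psycopg2

def run_with_timeout(dsn, sql, params=None, timeout=15.0):
    conn = psycopg2.connect(dsn)
    try:
        # If the query is still running when the timer fires, ask the
        # server to cancel it; the execute() call then raises an error.
        timer = threading.Timer(timeout, conn.cancel)
        timer.start()
        try:
            with conn.cursor() as cur:
                cur.execute(sql, params)
                return cur.fetchall()
        finally:
            timer.cancel()  # query finished in time, disarm the timer
    finally:
        conn.close()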