I have a problem with a Django application. Queries on the model Scope are extremely slow and after some debugging I still have no clue where the problem lies.
When I query the DB like scope = Scope.objects.get(pk='Esoterik I'), it takes 5 to 10 seconds. The database has fewer than 10 entries and an index on the primary key, so it is way too slow. When executing an equivalent query on the DB, like SELECT * FROM scope WHERE title='Esoterik I';, everything is OK and it takes only about 50ms.
The same problem happens if I query a set of results like scope_list = Scope.objects.filter(members=some_user) and then call print(scope_list) or iterate over the list elements. The query itself takes only a few ms, but printing or iterating over the elements again takes 5 to 10 seconds, even though the set has only two entries.
The database backend is PostgreSQL. The same problem occurs on the local development server and under Apache.
Here is the code of the model:
class Scope(models.Model):
    title = models.CharField(primary_key=True, max_length=30)
    ## the semester the scope is linked with
    assoc_semester = models.ForeignKey(Semester, null=True)
    ## the grade of the scope. can be Null if the scope is not a class
    assoc_grade = models.ForeignKey(Grade, null=True)
    ## the timetable of the scope. can be null if the scope is not directly associated with a class
    assoc_timetable = models.ForeignKey(Timetable, null=True)
    ## the associated subject of the scope
    assoc_subject = models.ForeignKey(Subject)
    ## the calendar of the scope
    assoc_calendar = models.ForeignKey(Calendar)
    ## the usergroup of the scope
    assoc_usergroup = models.ForeignKey(Group)
    members = models.ManyToManyField(User)
    unread_count = None
Update:
Here is the output of the Python profiler. It seems that query.py was called 1.6 million times - a little too much.
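A profile like this can be collected from manage.py shell along these lines (a sketch; someapp stands in for the real app name):

import cProfile
from someapp.models import Scope  # placeholder import path

# Profile the single-object lookup and sort by cumulative time
cProfile.runctx("Scope.objects.get(pk='Esoterik I')", globals(), locals(), sort='cumtime')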
You should first try to isolate the problem. Open manage.py shell and run the following:
scope = Scope.objects.get(pk='Esoterik I')
print(scope)
Now, Django queries are not executed until they absolutely have to be. That is to say, if you're experiencing slowness after the first line, the problem is somewhere in the creation of the query, which would suggest a problem with the object manager. The next step would be to try to execute raw SQL through Django and make sure the problem is really with the manager and not a bug in Django in general.
If you're experiencing slowness with the second line, the problem is either with the actual execution of the query or with the display/printing of the data. You can force-execute the query without printing it (check the documentation) to find out which one it is.
That's as far as I understand it, but I think the best way to solve this is to break the process down into separate parts and find out which one is causing the slowness.
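For example, a minimal sketch of forcing execution without printing (some_user stands in for an actual User instance):

scope_list = Scope.objects.filter(members=some_user)
objs = list(scope_list)  # forces the query to run, nothing is printed
n = len(objs)            # the list is already materialized; no extra query

If list() is fast but print(objs) is slow, the problem is in rendering the objects (for example a __unicode__/__str__ method that runs extra queries), not in the query itself.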
To be sure about the database execution time, it is better to test the queries Django generates, since a Django-generated query may not be a simple SELECT * FROM blah blah.
To see the Django-generated query:
somedata = Scope.objects.filter(pk='Esoterik I')  # you must use filter() here
print(somedata.query)
This will show you the complete query generated by Django. Copy it, open a PostgreSQL console, and use PostgreSQL's analysis tools:
EXPLAIN ANALYZE <your django query here>;
like:
EXPLAIN ANALYZE SELECT * FROM someapp_scope WHERE title = 'Esoterik I';
EXPLAIN shows the estimated query plan, while adding ANALYZE actually executes the query and also reports the real execution times.
You can also see in those results whether PostgreSQL uses any index during query execution.
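You can cross-check from the Python side as well by looking at what Django recorded for each executed query (a sketch; this requires DEBUG=True in settings):

from django.db import connection, reset_queries

reset_queries()
list(Scope.objects.filter(pk='Esoterik I'))  # force execution
print(connection.queries)  # each entry holds the raw 'sql' and its 'time'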
I use Django 2.2 linked to PostgreSQL and would like to optimise my database queries.
Given the following simplified model:
class Person(models.Model):
    name = models.CharField()
    age = models.IntegerField()
on which I have to run, say, the following query:
Person.objects.filter(age__gt=20, age__lt=30).order_by('name')
What would be the best way to define the index in the model Meta field so as to optimise the query?
Which of these four options would be best?
class Meta:
    indexes = [models.Index(fields=['age', 'name']),
               models.Index(fields=['name', 'age']),
               models.Index(fields=['name']),
               models.Index(fields=['age'])]
Is it, for example, possible to prevent sorting when the query is done? Thank you.
This is really a postgres question, as much as a Django question, right?
I think there is a good chance that creating an index on your sort field will help with performance. But there are a lot of caveats, and if it's really important to you, you might want to do some testing focused on Postgres (i.e., just run some queries in psql and see what happens). Some caveats include:
it might depend on which type of index is created for you by Django
Postgres, of course, does not always use an index when running a query, but it should if you've got the right index and the right query (and if there is enough data in the table to justify loading the index)
it might matter how your SELECT is formatted by Django
I suggest you create your model and specify that you want the index. Then use Django Debug Toolbar to find out what SELECT query is really getting run. Then open a dbshell with manage.py dbshell (aka psql) and run EXPLAIN ANALYZE with that same SELECT. Assuming you can interpret the output, you will see for yourself whether your index comes into play. Paste the EXPLAIN ANALYZE output here, if you like.
According to this Postgres documentation, ORDER BY can be assisted by a b-tree index, which is the type of index Django creates for you by default.
So, why don't you try this:
class Meta:
    indexes = [models.Index(fields=['age', 'name'])]
Then go run an EXPLAIN ANALYZE in dbshell and see whether it worked.
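Since you are on Django 2.2, you can also get the plan without leaving Python: QuerySet.explain() runs EXPLAIN on the generated SQL, and on PostgreSQL analyze=True turns it into EXPLAIN ANALYZE:

# Prints the PostgreSQL plan, including actual execution times
print(
    Person.objects.filter(age__gt=20, age__lt=30)
                  .order_by('name')
                  .explain(analyze=True)
)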
# You should apply indexing on age, because you are searching for 'age' column data
indexes = [
    models.Index(fields=['age'])
]
I have a concern with Django subqueries in the Django ORM. When fetching a queryset or performing a DB operation, I have the option of bypassing any assumptions Django might make about which database to use by forcing use of the specific database I want.
b_det = Book.objects.using('some_db').filter(book_name = 'Mark')
The above disregards any database routers I might have set and goes straight to 'some_db'.
But if my models look approximately like this:
class Author(models.Model):
    author_name = models.CharField(max_length=255)
    author_address = models.CharField(max_length=255)

class Book(models.Model):
    book_name = models.CharField(max_length=255)
    author = models.ForeignKey(Author, null=True)
And I fetch a QuerySet representing all books that are called Mark like so:
b_det = Book.objects.using('some_db').filter(book_name = 'Mark')
Then later, if somewhere in the code I trigger a subquery by doing something like:
if b_det:
    auth_address = b_det[0].author.author_address
Then this does not make use of the original database 'some_db' that I had specified early on for the main query. This again goes through the routers and picks up (possibly) the incorrect database.
Why does Django do this? IMHO, if I forced the use of a database for the original query, then the same database should be used for the subquery as well. Why must the database routers come into the picture at all?
This is not a subquery in the strict SQL sense of the word. What you are actually doing here is executing one query and using its result to find related items.
You can chain filters and do lots of other operations on a queryset, but it will not be executed until you evaluate it (for example by iterating it, calling list() on it, or taking an indexed slice), and here you are actually taking a slice:
auth_address = b_det[0].author.author_address
So you have a materialized query, and you are now trying to find the address of the related author. That requires another query, but you are not using using() for it, so Django is free to choose which database to use. You can overcome this with select_related.
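A minimal sketch of that fix, using the models from the question:

# select_related('author') pulls the author in via a JOIN in the same
# query, so the related object also comes from 'some_db' and the
# routers are never consulted for a second query.
b_det = Book.objects.using('some_db').select_related('author').filter(book_name='Mark')
if b_det:
    auth_address = b_det[0].author.author_address  # no additional query here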
I need your kind support.
I have a big project in my Redmine with a lot of subprojects in it.
More than 300 issues were moved from this project to other subprojects by mistake, and I have no way to rescue them by hand directly from Redmine. But I have a database dump that was made before this accident.
So, my question is: can I compare the issues table from the correct database with the damaged database and move the issues back? Or maybe there are tools or methods to move the issues back to the right project?
Redmine version is 2.0.4. Database: PostgreSQL.
Thank you in advance.
Plan A:
You can try to analyze the issues table and find all issues that were moved wrongly.
You know the new project_id and the approximate timestamps of the changes, so write an SQL query (or use the Rails console) to undo the action.
For example (code NOT tested!):
new_project_id = Project.find(ID).id # note that ID is the project identifier, not the id of the record!
timestamp = DateTime.parse('2013-10-30 12:20:45')
issues = Issue.where(project_id: new_project_id).where('updated_at > ? AND updated_at < ?', timestamp - 1.minute, timestamp + 1.minute)
# check that all selected issues must be updated!!!
issues.update_all(project_id: old_project_id) # note that old_project_id is the correct id (integer value) of the record in the DB
Plan B:
You can find all issue ids that have the correct project_id in the correct DB, and then apply an SQL query on the corrupted DB to update project_id to the correct value for all issues where id IN (issue_ids).
# load correct DATABASE and start rails console
project = Project.find(OLD_ID) # note that OLD_ID is the project identifier, not the id of the record!
issue_ids = project.issue_ids
# save somewhere issue_ids
# load corrupted database and start rails console
issue_ids = [saved_array_of_ids_from_previous_step]
Issue.where(id: issue_ids).update_all(project_id: correct_project_id) # note that correct_project_id is the correct id (integer value) of the record in the DB
I have come across a strange query performance issue that I am struggling to understand.
The following is a simplified version of the model structure I have, hopefully it will be enough to illustrate the issue:
class Note(models.Model):
    ...
    name = models.CharField(max_length=50)
    parentNote = models.ForeignKey('self', null=True)
    form = models.ForeignKey('NoteForm', null=True)
    ...

class Event(Note):
    ...
    startDate = models.DateField()
    ...

class Activity(Event):
    ...
The Activity model is the source of the issue I am facing. It has an extensive inheritance hierarchy, none of which is abstract. I do not know if this contributes to the issue. Activity has ~280000 records and, obviously, its parents have at least that many, if not more.
The NoteForm model is not described above - it is only necessary to know that it is external to the Activity model's hierarchy and contains fewer than 100 records.
I am using Django version 1.3.
The problem occurs when querying for the latest "child" Activity of some parent Activity. The query filters by the parentNote field, orders by the 'startDate' field (descending) and uses Python's index notation to select the first result (which, by my understanding, simply adds LIMIT 1 to the generated SQL). See below for the code.
This query runs unexpectedly slowly when no results are found - 10+ seconds. If results are found, it runs as expected - well under 1 second.
Further investigation revealed the following:
It is the limit causing the issue. Just doing the filter, without limiting to the first result, is not slow - whether results are found or not.
Ordering is partially a culprit. Removing the ordering removes the issue.
The parentNote filter is partially a culprit. Changing the filter to use the form or name field removes the issue.
In code:
# Original - SLOW
try:
    latest = Activity.objects.filter(
        parentNote=activity.pk
    ).order_by('-startDate')[0]
except IndexError:
    latest = None

# FAST
# No limit
Activity.objects.filter(
    parentNote=activity.pk
).order_by('-startDate')

# No ordering
try:
    latest = Activity.objects.filter(
        parentNote=activity.pk
    )[0]
except IndexError:
    latest = None

# Different filter
try:
    latest = Activity.objects.filter(
        form=activity.pk
    ).order_by('-startDate')[0]
except IndexError:
    latest = None

# Different filter
try:
    latest = Activity.objects.filter(
        name=activity.pk
    ).order_by('-startDate')[0]
except IndexError:
    latest = None
If the issue is at the database level, I can't see it. I've run the "Original" and "No Limit" examples from above in the django-debug-toolbar's debugsqlshell. The "Original" took 16 seconds and "No Limit" took 59ms. I copied both queries printed by the debugsqlshell and ran them in pgAdmin. "Original" took 1375ms and "No Limit" took 94ms. So it is slower, but not by the amount I'm seeing using the ORM. EXPLAIN ANALYZE definitely shows the query analyzer taking different paths, which I completely understand. But I cannot reproduce the 16 second query using SQL directly.
So, in summary:
I am seeing LIMIT queries running far slower than identical queries without the LIMIT, but only when no results are found.
Queries that return results do not run slowly - and they are identical apart from the values of the filters.
It appears to be a function of which fields are included in the filters, and whether or not the queryset is ordered.
It does NOT appear to be a database level issue as running the SQL directly does not run slowly.
Update:
While trying suggestions made in the comments, the above examples suddenly ceased suffering from this issue - before I found any evidence as to the cause, let alone implemented a fix. I still have no idea what the problem was, but now I do not have a means to reproduce it in order to further investigate.
First of all, sorry if this isn't an appropriate question for StackOverflow. I've tried to make it as generalisable as possible.
I want to create a database (MySQL, site running Django) that has users, who can be allocated a certain number of points for various types of action - it's a collaborative game. My requirements are to obtain:
the number of points a user has
the user's ranking compared to all other users
and the overall leaderboard (i.e. all users ranked in order of points)
This is what I have so far, in my Django models.py file:
class SiteUser(models.Model):
    name = models.CharField(max_length=250)
    email = models.EmailField(max_length=250)
    date_added = models.DateTimeField(auto_now_add=True)

    def points_total(self):
        points_added = PointsAdded.objects.filter(user=self)
        points_total = 0
        for point in points_added:
            points_total += point.points
        return points_total

class PointsAdded(models.Model):
    user = models.ForeignKey('SiteUser')
    action = models.ForeignKey('Action')
    date_added = models.DateTimeField(auto_now_add=True)

    def points(self):
        points = Action.objects.filter(action=self.action)
        return points

class Action(models.Model):
    points = models.IntegerField()
    action = models.CharField(max_length=36)
However it's rapidly becoming clear to me that it's actually quite complex (in Django query terms at least) to figure out the user's ranking and return the leaderboard of users. At least, I'm finding it tough. Is there a more elegant way to do something like this?
This question seems to suggest that I shouldn't even have a separate points table - what do people think? It feels more robust to have separate tables, but I don't have much experience of database design.
This is old, but I'm not sure exactly why you have two separate tables (PointsAdded and Action). It's late, so maybe my mind isn't ticking, but it seems like you just split one table into two for some reason. It doesn't seem like you get any benefit out of it. It's not like there's a one-to-many relationship in there, right?
So first of all, I would combine those two tables. Secondly, you are probably better off storing points_total as a field on your site_user table. This is what I think Demitry is trying to allude to, but didn't say explicitly. This way, instead of running the whole additional query (pulling everything a user has done in their history on the site is expensive) plus the loop over the results (going through them is even more expensive), you can just pull the total as one field. It's denormalizing the data for the greater good.
Just be sure to update the value every time you add something that has points. You can use Django's post_save signal to do that; a sketch follows below.
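A minimal sketch of that signal, assuming SiteUser gains a points_total = models.IntegerField(default=0) field:

from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=PointsAdded)
def update_points_total(sender, instance, created, **kwargs):
    # Only bump the cached total when a new PointsAdded row is created
    if created:
        instance.user.points_total += instance.action.points  # Action.points holds the value
        instance.user.save(update_fields=['points_total'])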
It's a bit more difficult to have points saved in the same table, but it's totally worth it. You can do very simple ordering/filtering operations if you have the computed points total on the user model. And you only have to recompute totals when something changes (not every time you want to show them). Just put some validation logic into post_save signals, make sure to cover this logic with tests, and you're good.
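With the cached total on the user model, the three requirements from the question reduce to simple queries. A sketch, assuming the points_total field described above:

# Overall leaderboard: all users ordered by points
leaderboard = SiteUser.objects.order_by('-points_total')

# A single user's rank: count users with strictly more points
rank = SiteUser.objects.filter(points_total__gt=user.points_total).count() + 1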
P.S. See denormalization on Wikipedia.