Speed up Django query

I am working with Django to create a dashboard that presents many kinds of data. My problem is that the page loads slowly even though I only hit the database (PostgreSQL) once. The tables are loaded with new data every 10 minutes, so they currently contain millions of records. When I run a query with the Django ORM, the data comes back slowly (1.4 seconds according to the Django Debug Toolbar). I know that is not very much, but it is nearly half of the total loading time (3.1 seconds), so if I could reduce the query time, the page would load faster and the user experience would be better. The query fetches ~2800 rows. Is there any way to speed it up? I don't know whether I am doing something wrong or whether this time is normal for this amount of data. I have attached my query and model. Thank you in advance for your help.
My query (here I fetch a 6-hour time interval):
my_query = MyTable.objects.filter(time_stamp__range=(before_now, now)).values('time_stamp', 'value1', 'value2')
Here I tried to use .iterator(), but the query wasn't any faster.
My model:
class MyTable(models.Model):
    time_stamp = models.DateTimeField()
    value1 = models.FloatField(blank=True, null=True)
    value2 = models.FloatField(blank=True, null=True)

Add an index:
class MyTable(models.Model):
    time_stamp = models.DateTimeField()
    value1 = models.FloatField(blank=True, null=True)
    value2 = models.FloatField(blank=True, null=True)

    class Meta:
        indexes = [
            models.Index(fields=['time_stamp']),
        ]
Don't forget to run manage.py makemigrations and manage.py migrate after this.

Related

Django queryset: annotate with calculated value

I am making a very simple notification system for my website, powered by a Django REST Framework API. It's for sending website updates and things to all users, everyone gets the same notifications, and they can then mark it as read / archive it. I have come up with the following model:
class Notification(models.Model):
    title = models.CharField(max_length=255)
    text = models.TextField()
    type = models.CharField(max_length=255, blank=True)
    read_by = models.ManyToManyField(User, blank=True, related_name="read_notifications")
    archived_by = models.ManyToManyField(User, blank=True, related_name="archived_notifications")
    created_at = models.DateTimeField(auto_now_add=True, db_index=True)
    updated_at = models.DateTimeField(auto_now=True)
So there is no receiver field or something like that, as all users get all notifications anyway.
Now I am trying to write the view logic, notably the following 2 things: only fetch non-archived notifications made after the user was created, and add a calculated "is_read" field to it, in a way that doesn't do extra queries for every single notification / user combination.
The query looks like this now:
queryset = (
    Notification.objects
    .order_by("-created_at")
    .filter(created_at__gt=self.request.user.created_at)
    .exclude(archived_by=self.request.user)
)
This does indeed filter out archived notifications as expected, and I think it does so without an extra query per notification:
SELECT "notifications_notification"."id", "notifications_notification"."title", "notifications_notification"."text", "notifications_notification"."type", "notifications_notification"."created_at", "notifications_notification"."updated_at" FROM "notifications_notification" WHERE ("notifications_notification"."created_at" > 2022-09-26 12:44:04.771961+00:00 AND NOT (EXISTS(SELECT 1 AS "a" FROM "notifications_notification_archived_by" U1 WHERE (U1."user_id" = 1 AND U1."notification_id" = ("notifications_notification"."id")) LIMIT 1))) ORDER BY "notifications_notification"."created_at" DESC
So far so good! But I still need to add an "is_read" value (or "is_unread" if easier) to the query somehow, which I am not able to work out how to do.
How can I finish the query and make it performant as well?
After trial and error I came up with this:
from django.db.models import Exists, OuterRef

queryset = (
    Notification.objects
    .order_by("-created_at")
    .filter(created_at__gt=self.request.user.created_at)
    .exclude(archived_by=self.request.user)
    .annotate(is_read=Exists(Notification.objects.filter(pk=OuterRef("id"), read_by=self.request.user)))
)
And that works, although it does do 2 subqueries and I wonder if this is going to become a bottleneck later on?
SELECT "notifications_notification"."id", "notifications_notification"."title", "notifications_notification"."text", "notifications_notification"."type", "notifications_notification"."created_at", "notifications_notification"."updated_at", EXISTS(SELECT 1 AS "a" FROM "notifications_notification" U0 INNER JOIN "notifications_notification_read_by" U1 ON (U0."id" = U1."notification_id") WHERE (U0."id" = ("notifications_notification"."id") AND U1."user_id" = 1) LIMIT 1) AS "is_read" FROM "notifications_notification" WHERE ("notifications_notification"."created_at" > 2022-09-26 14:40:29.368043+00:00 AND NOT (EXISTS(SELECT 1 AS "a" FROM "notifications_notification_archived_by" U1 WHERE (U1."user_id" = 1 AND U1."notification_id" = ("notifications_notification"."id")) LIMIT 1))) ORDER BY "notifications_notification"."created_at" DESC

How could you write this really, really complicated raw SQL query with Django's ORM?

Good day, everyone. Hope you're doing well. I'm a Django newbie, trying to learn the basics of RESTful development while helping with a small app project. Currently there's a really difficult query that I must write to create a calculated field that updates my students' status according to the time interval their classes fall in. First, let me explain the models:
class StudentReport(models.Model):
    student = models.ForeignKey(Student, on_delete=models.CASCADE)
    headroom_teacher = models.ForeignKey(Teacher, on_delete=models.CASCADE)
    upload = models.ForeignKey(Upload, on_delete=models.CASCADE, related_name='reports', blank=True, null=True)
    exams_date = models.DateTimeField(null=True, blank=True)
    # Other fields that don't matter

class ExamCycle(models.Model):
    student = models.ForeignKey(Student, on_delete=models.CASCADE)
    headroom_teacher = models.ForeignKey(Teacher, on_delete=models.CASCADE)
    # Other fields that don't matter

class RecommendedClasses(models.Model):
    report = models.ForeignKey(StudentReport, on_delete=models.CASCADE)
    range_start = models.DateField(null=True)
    range_end = models.DateField(null=True)
    # Other fields that don't matter

class StudentStatus(models.TextChoices):
    enrolled = 'enrolled'  # started class
    anxious_for_exams = 'anxious_for_exams'
    sticked_with_it = 'sticked_with_it'  # already passed one cycle
So this app will help manage a cram school. We first create an initial report of the student and their best/worst subjects in StudentReport. Then a RecommendedClasses object is created that tells them which classes they should enroll in. Finally, we have a cycle of exams (say, 4 times a year). After the student completes each exam, another report is created, and they can be recommended a new class or moved on to the next level of their previous class.
I'll use the choices in StudentStatus to compute an annotated field, which I will call status, on my RecommendedClasses model. I'm having issues with the sticked_with_it status because it is a query that runs after one cycle is completed and two reports have been made (two because this query must be done in StudentStatus after the 2nd report is created). A 'sticked_with_it' student has a report created after the exams_date of the report the RecommendedClasses was based on, and that future exams_date value falls between 30 days before range_start and 60 days after range_end of the recommendation (don't question this, it's just the way the higher-ups want the status).
I have already come up with two ways to do it, but one is a raw SQL query and the other is way too complicated and slow. Here it is:
SELECT rec.id AS rec_id
FROM school_recommendedclasses rec
LEFT JOIN school_report original_report
    ON rec.report_id = original_report.id
    AND rec.teacher_id = original_report.teacher_id
JOIN reports_report2 future_report
    ON future_report.exams_date > original_report.exams_date
    AND future_report.student_id = original_report.student_id
    AND future_report.`exams_date` > (rec.`range_start` - INTERVAL 30 DAY)
    AND future_report.`exams_date` < (rec.`range_end` + INTERVAL 60 DAY)
    AND original_report.student_id = future_report.student_id
How can I translate this into a proper Django ORM query that isn't so painfully unoptimized? I'll show you the other way in the comments.
FWIW, I find this easier to read, but there's very little wrong with your query.
Transforming this to your ORM should be straightforward, and any further optimisations are down to indexes...
SELECT r.id rec_id
FROM reports_recommendation r
JOIN reports_report2 o
ON o.id = r.report_id
AND o.provider_id = r.provider_id
JOIN reports_report2 f
ON f.initial_exam_date > o.initial_exam_date
AND f.patient_id = o.patient_id
AND f.initial_exam_date > r.range_start - INTERVAL 30 DAY
AND f.initial_exam_date < r.range_end + INTERVAL 60 DAY
AND f.provider_id = o.provider_id

bulk add m2m relationship for multiple instances

This code hits the DB a lot. Is there a way to reduce the number of DB hits by grouping them together? If not in Django, is it possible with SQL? My development machine uses SQLite and production uses PostgreSQL. If it is possible with SQL, please give me a few hints on where to get started.
class Sensor(models.Model):
    Name = models.CharField(max_length=200)
    Value = models.FloatField()

class DataPoint(BaseModel):
    Taken_datetime = models.DateTimeField(blank=True, null=True)
    Sensors = models.ManyToManyField(Sensor, blank=True)

for row in rows:
    dp = DataPoint.objects.get(Taken_datetime=row['date'])
    sensorToAdd = []
    for sensor in sensors:
        s = Sensor.objects.get(Name=sensor.name, Value=sensor.value)
        sensorToAdd.append(s)
    dp.Sensors.add(*sensorToAdd)
All the data is stored in a CSV file, so I know all of it at the start.
For each row, the code hits the DB to load the DataPoint, load the Sensors, and attach the sensors to the DataPoint. I'm looking for something like bulk_create, but for the m2m field. All the solutions I've found use the same method I'm using above. The problem I'm running into is that there are a lot of DataPoints over time, and I'm hitting the DB many individual times. I'd like to group these together into a few DB calls.
If there is a better way to model the data without making the DB larger, I'd be open to that.

Creating a query with foreign keys and grouping by some data in Django

I have thought about my problem for days and I need a fresh view on it.
I am building a small application for a client for his deliveries.
# models.py - Clients app
class ClientPR(models.Model):
    title = models.CharField(max_length=5,
                             choices=TITLE_LIST,
                             default='mr')
    last_name = models.CharField(max_length=65)
    first_name = models.CharField(max_length=65, verbose_name='Prénom')
    frequency = WeekdayField(default=[])  # CommaSeparatedIntegerField from 0 for Monday to 6 for Sunday
    [...]

# models.py - Delivery app
class Truck(models.Model):
    name = models.CharField(max_length=40, verbose_name='Nom')
    description = models.CharField(max_length=250, blank=True)
    color = models.CharField(max_length=10,
                             choices=COLORS,
                             default='green',
                             unique=True,
                             verbose_name='Couleur Associée')

class Order(models.Model):
    delivery = models.ForeignKey(OrderDelivery, verbose_name='Delivery')
    client = models.ForeignKey(ClientPR)
    order = models.PositiveSmallIntegerField()

class OrderDelivery(models.Model):
    date = models.DateField(default=d.today)  # pass the callable, not d.today(), or the default is frozen at import time
    truck = models.ForeignKey(Truck, verbose_name='Camion', unique_for_date="date")
So I was trying to write a query and I came up with this one:
(ClientPR.objects.today()
    .filter(order__delivery__date=date.today())
    .order_by('order__delivery__truck', 'order__order'))
But it does not do what I really want.
I want a list of Client objects (querysets) grouped by truck and ordered by today's delivery order!
The thing is, I want EVERY client for the day, even those not on the delivery list, and with filter that cannot happen.
I can query the OrderDelivery model instead, but then I will only get the clients with a delivery, not all of them for the day...
Maybe I need a Q object? Or even raw SQL?
Maybe I have built my model relationships the wrong way? Or I need to scale back what I want to do... Well, for now, I need your help to see the problem with new eyes!
Thanks to those who will take some time to help me.
After some tests, I decided to go with two queries for one table.
One on the OrderDelivery queryset to get the list of clients grouped by truck, and another on the ClientPR queryset for all the clients without a delivery scheduled for them.
That way, no problem!

Slow iteration over django queryset

I am iterating over a Django queryset that contains anywhere from 500-1000 objects. The corresponding model/table has 7 fields as well. The problem is that it takes about 3 seconds to iterate over, which seems way too long considering all the other data processing that needs to be done in my application.
EDIT:
Here is my model:
class Node(models.Model):
    node_id = models.CharField(null=True, blank=True, max_length=30)
    jobs = models.TextField(null=True, blank=True)
    available_mem = models.CharField(null=True, blank=True, max_length=30)
    assigned_mem = models.CharField(null=True, blank=True, max_length=30)
    available_ncpus = models.PositiveIntegerField(null=True, blank=True)
    assigned_ncpus = models.PositiveIntegerField(null=True, blank=True)
    cluster = models.CharField(null=True, blank=True, max_length=30)
    datetime = models.DateTimeField(auto_now_add=False)
This is my initial query, which is very fast:
timestamp = models.Node.objects.order_by('-pk').filter(cluster=cluster)[0]
self.nodes = models.Node.objects.filter(datetime=timestamp.datetime)
But then, I go to iterate and it takes 3 seconds, I've tried two ways as seen below:
def jobs_by_node(self):
    """returns a dictionary containing keys that
    are strings of node ids and values that
    are lists of the jobs running on that node."""
    jobs_by_node = {}
    # iterate over nodes and populate jobs_by_node dictionary
    tstart = time.time()
    for node in self.nodes:
        pass  # I have omitted the code because the slowdown is simply iteration
    tend = time.time()
    tfinal = tend - tstart
    return jobs_by_node
Other method:
all_nodes = self.nodes.values('node_id')
tstart = time.time()
for node in all_nodes:
    pass
tend = time.time()
tfinal = tend - tstart
I tried the second method by referring to this post, but it still has not sped up my iteration one bit. I've scoured the web to no avail. Any help optimizing this process will be greatly appreciated. Thank you.
Note: I'm using Django version 1.5 and Python 2.7.3
Check the issued SQL query. You can use a print statement:
print self.nodes.query # in general: print queryset.query
That should give you something like:
SELECT id, jobs, ... FROM app_node
Then run EXPLAIN SELECT id, jobs, ... FROM app_node and you'll know what exactly is wrong.
Assuming that you know what the problem is after running EXPLAIN, and that simple solutions like adding indexes aren't enough, you can think about e.g. fetching the relevant rows into a separate table every X minutes (in a cron job or Celery task) and using that separate table in your application.
If you are using PostgreSQL you can also use materialized views and "wrap" them in an unmanaged Django model.