Simplifying Django Query Annotation


I have 3 models and a function that is called many times, but it generates 200-300 SQL queries and I'd like to reduce this number.
I have the following layout:
class Info(models.Model):
    user = models.ForeignKey(User)
    ...

class Forum(models.Model):
    info = models.ForeignKey(Info)
    datum = models.DateTimeField()
    ...

class InfoViewed(models.Model):
    user = models.ForeignKey(User)
    infokom = models.ForeignKey(Info)
    last_seen = models.DateTimeField()
    ...
I need the total number of new Forum messages, so just a single number. At the moment I iterate over all the related Infos and sum up all Forums whose datum is later than the related InfoViewed's last_seen field.
This works, but it results in ~200 queries for 100 Infos.
Is there any way to fetch the same number in a few queries? I tried playing with annotate and django.db.models.Count, but I failed.
Django: 1.11.16
Currently I'm using this:
infos = Info.objects.filter(user_id=x)
return sum(i.number_of_new_forums(self.user) for i in infos)
and the number_of_new_forums looks like this:
info_viewed = self.info_viewed_set.filter(user=user)
return len([f.id for f in self.get_forums()
            .filter(datum__gt=info_viewed[0].last_seen)])

I managed to come up with a solution from a different perspective:
Forum.objects.filter(info_id__in=info_ids,
                     info__info_viewed__user=request.user,
                     datum__gt=F('info__info_viewed__last_seen')).count()
where info_ids is a list of the ids of all related Infos. However, I'm not sure it's a 100% correct solution.
If anyone has a different approach, I'd welcome it.
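For intuition about why the Forum-side query needs only one round trip, here is roughly the SQL it boils down to, run against an in-memory database with Python's built-in sqlite3 module. The table and column names (forum, infoviewed, infokom_id) are assumptions modeled on the classes above, not necessarily what Django generates. Because both info__info_viewed__... conditions sit in the same filter() call, Django applies them to the same joined InfoViewed row, which is what the single JOIN below expresses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables mirroring Forum and InfoViewed (ISO date strings compare correctly).
conn.executescript("""
CREATE TABLE forum (id INTEGER PRIMARY KEY, info_id INTEGER, datum TEXT);
CREATE TABLE infoviewed (id INTEGER PRIMARY KEY, user_id INTEGER,
                         infokom_id INTEGER, last_seen TEXT);
""")
conn.executemany("INSERT INTO forum (info_id, datum) VALUES (?, ?)", [
    (1, "2018-01-01"), (1, "2018-03-01"), (1, "2018-04-01"),
    (2, "2018-02-01"), (2, "2018-05-01"),
])
conn.executemany(
    "INSERT INTO infoviewed (user_id, infokom_id, last_seen) VALUES (?, ?, ?)", [
        (10, 1, "2018-02-15"),  # user 10 has read Info 1 up to mid-February
        (10, 2, "2018-04-15"),  # user 10 has read Info 2 up to mid-April
        (99, 1, "2017-01-01"),  # another user's marker must not affect user 10
    ])


def count_new_forums(user_id, info_ids):
    """One query: count Forums newer than the same user's last_seen for their Info."""
    placeholders = ",".join("?" * len(info_ids))
    sql = f"""
        SELECT COUNT(*)
        FROM forum f
        JOIN infoviewed v ON v.infokom_id = f.info_id
        WHERE f.info_id IN ({placeholders})
          AND v.user_id = ?
          AND f.datum > v.last_seen
    """
    return conn.execute(sql, [*info_ids, user_id]).fetchone()[0]
```

For example, count_new_forums(10, [1, 2]) counts two new Forums on Info 1 and one on Info 2. Infos for which the user has no InfoViewed row contribute nothing here, because the join is an inner join; the original loop would instead fail on info_viewed[0] for such Infos.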

Related

Django filter exact value in list

Good day again SO. I was hoping you could help me with some of the logic.
Based on this SO answer, I can filter the search with a list, which works perfectly. However, I want only the jobs whose conditions EXACTLY match the list, instead of jobs that match at least one of them.
models:
class Condition(models.Model):
    condition_name = models.CharField(....)

class Jobs(models.Model):
    jobs = models.CharField(...)

class JobsConditions(models.Model):
    account = models.ForeignKey(Account...)
    job_item = models.ForeignKey(Jobs...)
    condition = models.ForeignKey(Condition...)
So if I try to search for Jobs with Conditions, I do the following:
cond_array = [1,2,4,5] # Append to array based on request.
condition_obj = Condition.objects.filter(id__in=cond_array)
Then I compare condition_obj against the JobsConditions model. How can I use this so that I only get the jobs with exactly these conditions? No more, no less.

I think you're wanting something like this:
Filter JobsConditions by condition__id and get the associated job_item__jobs as a list:
jobs_list = (JobsConditions.objects
             .filter(condition__id__in=cond_array)
             .values_list('job_item__jobs', flat=True))
Filter Jobs by that jobs_list:
jobs = Jobs.objects.filter(jobs__in=jobs_list)
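The answer above returns jobs that have at least one of the requested conditions. If "no more, no less" means a job's full set of conditions must equal cond_array, one alternative is to group the JobsConditions rows by job and compare two distinct counts: all of the job's conditions, and only the requested ones. The sketch below shows that SQL with sqlite3 and a hypothetical jobsconditions table (the account column is left out). In the ORM, the same idea would roughly be annotating Jobs with two Count(..., distinct=True) expressions, one of them using filter=Q(...) (Django 2.0+), and filtering both against len(set(cond_array)); treat that as an untested outline.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobsconditions (job_item_id INTEGER, condition_id INTEGER)")
conn.executemany("INSERT INTO jobsconditions VALUES (?, ?)", [
    (1, 1), (1, 2), (1, 4), (1, 5),          # exactly the requested set
    (2, 1), (2, 2), (2, 4), (2, 5), (2, 6),  # superset: one extra condition
    (3, 1), (3, 2),                          # subset: missing conditions
])


def jobs_with_exact_conditions(cond_array):
    """Return job ids whose set of conditions equals set(cond_array)."""
    wanted = sorted(set(cond_array))
    placeholders = ",".join("?" * len(wanted))
    sql = f"""
        SELECT job_item_id
        FROM jobsconditions
        GROUP BY job_item_id
        -- the job has exactly len(wanted) distinct conditions ...
        HAVING COUNT(DISTINCT condition_id) = ?
        -- ... and every one of them is in the requested list
           AND COUNT(DISTINCT CASE WHEN condition_id IN ({placeholders})
                                   THEN condition_id END) = ?
        ORDER BY job_item_id
    """
    params = [len(wanted), *wanted, len(wanted)]
    return [row[0] for row in conn.execute(sql, params)]
```

With cond_array = [1, 2, 4, 5], only job 1 qualifies: job 2 has an extra condition and job 3 is missing two.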

Modern methods for filtering a Django annotation?

I'd like to filter an annotation using the Django ORM. A lot of the articles I've found here at SO are fairly dated, targeting Django back in the 1.2 to 1.4 days:
Filtering only on Annotations in Django - This question from 2010 suggests using an extra clause, which isn't recommended by the official Django docs
Django annotation with nested filter - Similar suggestions are provided in this question from 2011.
Django 1.8 adds conditional aggregation, which seems like what I might want, but I can't quite figure out the syntax that I'll eventually need. Here are my models and the scenario I'm trying to reach (I've simplified the models for brevity's sake):
class Project(models.Model):
    name = models.CharField()
    ... snip ...

class Milestone_meta(models.Model):
    name = models.CharField()
    is_cycle = models.BooleanField()

class Milestone(models.Model):
    project = models.ForeignKey('Project')
    meta = models.ForeignKey('Milestone_meta')
    entry_date = models.DateField()
I want to get each Project (with all its fields), along with the Max(entry_date) and Min(entry_date) for each associated Milestone, but only for those Milestone records whose associated Milestone_meta has the is_cycle flag set to True. In other words:
For every Project record, give me the maximum and minimum Milestone entry_dates, but only when the associated Milestone_meta has a given flag set to True.
At the moment, I'm getting a list of projects, then getting the Max and Min Milestones in a loop, resulting in N+1 database hits (which gets slow, as you'd expect):
from django.db.models import Max, Min

pqs = Project.objects.all()
for p in pqs:
    (theMin, theMax) = getMilestoneBounds(p)
    # Use values from p and theMin and theMax
    ...

def getMilestoneBounds(pid):
    # restrict to this project's milestones whose meta has is_cycle set
    mqs = Milestone.objects.filter(project=pid, meta__is_cycle=True)
    theData = mqs.aggregate(min_entry=Min('entry_date'), max_entry=Max('entry_date'))
    return (theData['min_entry'], theData['max_entry'])
How can I reduce this to one or two queries?

As far as I know, you cannot get all the required project objects in one query.
However, if you don't need the objects and can work with just their ids, one way would be:
(Milestone.objects.filter(meta__is_cycle=True)
    .values('project')
    .annotate(min_entry=Min('entry_date'), max_entry=Max('entry_date')))
This gives a list of dicts, one per distinct project, where the 'project' key holds the project's id; you can then use those ids to look up the Project objects when needed.
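Keeping the full Project rows in one query is possible if the is_cycle condition is applied inside the join (or inside the aggregate) instead of as a filter that drops projects. The sketch below shows that SQL shape with Python's built-in sqlite3 module; the table names are hypothetical stand-ins for whatever Django generates. On the ORM side, Django 2.0+ aggregates accept a filter argument, so something like Project.objects.annotate(min_entry=Min('milestone__entry_date', filter=Q(milestone__meta__is_cycle=True)), max_entry=Max(...)) should express the same idea; on 1.8 to 1.11 a Case/When expression inside Min/Max plays that role. Treat both as untested outlines. Calling .filter(milestone__meta__is_cycle=True) before .annotate() also restricts the aggregated rows, but it drops projects that have no cycle milestones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE milestone_meta (id INTEGER PRIMARY KEY, name TEXT, is_cycle INTEGER);
CREATE TABLE milestone (id INTEGER PRIMARY KEY, project_id INTEGER,
                        meta_id INTEGER, entry_date TEXT);
INSERT INTO project VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
INSERT INTO milestone_meta VALUES (1, 'cycle', 1), (2, 'other', 0);
INSERT INTO milestone (project_id, meta_id, entry_date) VALUES
    (1, 1, '2015-01-10'), (1, 1, '2015-03-05'), (1, 2, '2014-12-01'),
    (2, 2, '2015-02-02');
""")

# The is_cycle condition lives in the ON clause, so every project survives the
# LEFT JOIN; projects without cycle milestones simply get NULL bounds.
BOUNDS_SQL = """
SELECT p.id, p.name, MIN(m.entry_date) AS min_entry, MAX(m.entry_date) AS max_entry
FROM project p
LEFT JOIN milestone m
       ON m.project_id = p.id
      AND m.meta_id IN (SELECT id FROM milestone_meta WHERE is_cycle = 1)
GROUP BY p.id, p.name
ORDER BY p.id
"""


def project_bounds():
    """One row per project: (id, name, min_entry, max_entry)."""
    return conn.execute(BOUNDS_SQL).fetchall()
```

Project 1's non-cycle milestone from 2014 is ignored, and projects 2 and 3 come back with None bounds rather than disappearing, which is the difference between an ON-clause condition and a WHERE condition.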

Django ORM - LEFT JOIN with WHERE clause

I made a previous post related to this problem here, but because this is a related but new problem I thought it would be best to make a separate post for it.
I'm using Django 1.8
I have a User model and a UserAction model. A user has a type. UserAction has a time, which indicates how long the action took as well as a start_time which indicates when the action began. They look like this:
class User(models.Model):
    user_type = models.IntegerField()

class UserAction(models.Model):
    user = models.ForeignKey(User)
    time = models.IntegerField()
    start_time = models.DateTimeField()
Now what I want to do is get all users of a given type and the sum of time of their actions, optionally filtered by the start_time.
What I am doing is something like this:
from datetime import datetime, timedelta
from django.db.models import Sum
from django.db.models.functions import Coalesce

# stubbing in a start time to filter by
start_time = datetime.now() - timedelta(days=2)
# stubbing in a type
type = 2
# this gives me the users and the sum of the time of their actions, or 0 if no
# actions exist
q = User.objects.filter(user_type=type).values('id').annotate(total_time=Coalesce(Sum('useraction__time'), 0))
# now I try to add the filter for start_time of the actions to be greater than or
# equal to start_time
q = q.filter(useraction__start_time__gte=start_time)
What this does, of course, is an INNER JOIN on UserAction, which removes all the users without actions. What I really want is the equivalent of my LEFT JOIN with a WHERE clause, but for the life of me I can't find how to do that. I've looked at the docs and at the source but haven't found an answer. I'm (pretty) sure this can be done; I'm just not seeing how. Could anyone point me in the right direction? Any help would be very much appreciated. Thanks!

I'm having the same kind of problem as you. I haven't found a proper way of solving it yet, but I've found a few workarounds.
One way is to loop through all the users:
q = User.objects.filter(user_type=type)
for u in q:
    # one aggregate query per user
    u.time_sum = UserAction.objects.filter(user=u, start_time__gte=start_time).aggregate(time_sum=Sum('time'))['time_sum']
However, this method runs one database query per user. It might do the trick if you don't have many users, but it can get very time-consuming with a large database.
Another way of solving the problem is to use the extra() method of the QuerySet API, which is described in this blog post by Timmy O'Mahony.
valid_actions = UserAction.objects.filter(start_time__gte=start_time)
valid_ids = ",".join(str(action.id) for action in valid_actions)
q = User.objects.filter(user_type=type).extra(select={
    "time_sum": """
        SELECT SUM(time)
        FROM userAction
        WHERE userAction.user_id = user.id
        AND userAction.id IN (%s)
    """ % valid_ids
})
However, this method relies on hard-coding the SQL table names, which is very un-Django: if you change the db_table of one of your models or the db_column of one of their columns, this code will no longer work. On the other hand, it only requires 2 queries: the first gets the list of valid UserActions and the second sums them for each matching user.
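The result the question asks for comes from putting the start_time condition into the ON clause of the LEFT JOIN rather than into WHERE: users without matching actions then survive the join, and COALESCE turns their empty sum into 0. The sketch below demonstrates this with sqlite3 and hypothetical table names (app_user, app_useraction). In Django 2.0 and later, Coalesce(Sum('useraction__time', filter=Q(useraction__start_time__gte=start_time)), 0) should give the same single-query result, because the condition is applied inside the aggregate rather than as a WHERE filter; on 1.8 a Case/When expression inside Sum is the usual substitute. Neither ORM form is tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_user (id INTEGER PRIMARY KEY, user_type INTEGER);
CREATE TABLE app_useraction (id INTEGER PRIMARY KEY, user_id INTEGER,
                             time INTEGER, start_time TEXT);
INSERT INTO app_user VALUES (1, 2), (2, 2), (3, 2), (4, 1);
INSERT INTO app_useraction (user_id, time, start_time) VALUES
    (1, 30, '2015-06-10'), (1, 15, '2015-06-12'),
    (2, 99, '2015-05-01'),
    (4, 50, '2015-06-11');
""")


def time_per_user(user_type, start_time):
    """(user_id, total_time) for every user of the type, 0 if no recent actions."""
    sql = """
        SELECT u.id, COALESCE(SUM(a.time), 0) AS total_time
        FROM app_user u
        LEFT JOIN app_useraction a
               ON a.user_id = u.id
              AND a.start_time >= ?   -- filter actions here, not in WHERE
        WHERE u.user_type = ?
        GROUP BY u.id
        ORDER BY u.id
    """
    return conn.execute(sql, (start_time, user_type)).fetchall()
```

With start_time '2015-06-09', user 1 gets 45, while user 2 (only an older action) and user 3 (no actions) are kept with 0 instead of being dropped.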

django annotate question

I have the following model:
class Pick(models.Model):
league = models.ForeignKey(League)
user = models.ForeignKey(User)
team = models.ForeignKey(Team)
week = models.IntegerField()
result = models.IntegerField(default=3, help_text='loss=0, win=1, tie=2, not started=3, in progress=4')
I'm trying to generate a standings table based on the results, but I'm unsure how to do it in a single query. For each user in a particular league, I'm interested in a count of the results equal to 1 (win), 0 (loss) and 2 (tie). The only thing I can think of is to do 3 separate queries where I filter the results and then annotate, like so:
Pick.objects.filter(league=2, result=1).annotate(wins=Count('result'))
Pick.objects.filter(league=2, result=0).annotate(losses=Count('result'))
Pick.objects.filter(league=2, result=2).annotate(ties=Count('result'))
Is there a more efficient way to achieve this?
Thanks!

The trick is to use the values() method to select just the fields you want to group by, and then annotate() rather than aggregate(), so you get one count per group:
Pick.objects.filter(league=2).values('user', 'result').annotate(count=Count('id'))
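If you want one row per user with wins, losses and ties as separate columns, conditional aggregation gets all three counts from a single query. The sketch below shows the SQL with sqlite3 and a hypothetical pick table reduced to the columns used. The rough ORM counterparts are .values('user').annotate(wins=Count('pk', filter=Q(result=1)), ...) on Django 2.0+, or Sum(Case(When(result=1, then=1), default=0, output_field=IntegerField())) on 1.8+; these are untested outlines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pick (league_id INTEGER, user_id INTEGER, week INTEGER, result INTEGER)")
conn.executemany("INSERT INTO pick VALUES (?, ?, ?, ?)", [
    (2, 1, 1, 1), (2, 1, 2, 1), (2, 1, 3, 0),
    (2, 2, 1, 2), (2, 2, 2, 0), (2, 2, 3, 3),  # result 3 = not started, not counted
    (5, 1, 1, 1),                              # different league, filtered out
])

# Each SUM(CASE ...) counts only the rows with one particular result value.
STANDINGS_SQL = """
SELECT user_id,
       SUM(CASE WHEN result = 1 THEN 1 ELSE 0 END) AS wins,
       SUM(CASE WHEN result = 0 THEN 1 ELSE 0 END) AS losses,
       SUM(CASE WHEN result = 2 THEN 1 ELSE 0 END) AS ties
FROM pick
WHERE league_id = ?
GROUP BY user_id
ORDER BY wins DESC, ties DESC, user_id
"""


def standings(league_id):
    """(user_id, wins, losses, ties) per user in the league, best record first."""
    return conn.execute(STANDINGS_SQL, (league_id,)).fetchall()
```

standings(2) yields user 1 with 2 wins and 1 loss, then user 2 with 1 loss and 1 tie.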

Django complex query without using loop

I have three models:
class Employer(models.Model):
    name = models.CharField(max_length=1000, null=False, blank=False)
    eminence = models.IntegerField(null=False, default=4)

class JobTitle(models.Model):
    name = models.CharField(max_length=1000, null=False, blank=False)
    employer = models.ForeignKey(Employer, unique=False, null=False)

class People(models.Model):
    name = models.CharField(max_length=1000, null=False, blank=False)
    jobtitle = models.ForeignKey(JobTitle, unique=False, null=False)
I would like to list 5 random employers and one job title for each employer. The job title should be picked from the employer's 10 job titles with the most people.
One approach could be
employers = Employer.objects.filter(isActive=True).filter(eminence__lt=4).order_by('?')[:5]
for emp in employers:
    jobtitle = JobTitle.objects.filter(employer=emp)... and so on.
However, looping through the selected employers may be inefficient. Is there any way to do it in one query?
Thanks

There is! Check out: https://docs.djangoproject.com/en/dev/ref/models/querysets/#select-related
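select_related() tells Django to follow foreign key relationships using JOINs. This results in one larger query instead of many small ones, which in most cases is what you want. The QuerySet you get is pre-populated, and Django won't have to lazy-load the related objects from the database. Note that select_related() only works for forward ForeignKey and OneToOne relations; for reverse relations such as employer.jobtitle_set, the counterpart is prefetch_related().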
I've used select_related() in the past to solve almost this exact problem.

I have written the following code and it works. Although I loop over the employers, I assumed that because I used select_related('jobtitle'), the loop doesn't hit the database and runs faster.
import random
from django.db.models import Count
employers = random.sample(list(Employer.objects.select_related('jobtitle')
    .filter(eminence__lt=4, status=EmployerStatus.ACTIVE)
    .annotate(jtt_count=Count('jobtitle')).filter(jtt_count__gt=0)), 3)
jtList = []
for emp in employers:
    # pick one of the (up to) 10 active job titles with the most people
    jt = random.choice(emp.jobtitle_set.filter(isActive=True).annotate(people_count=Count('people'))
                       .filter(people_count__gt=0).order_by('-people_count')[:10])
    jtList.append(jt)
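select_related() follows forward ForeignKey relations only, so in the loop above emp.jobtitle_set.filter(...) still issues one query per employer. A different approach from the ones in these answers is to fetch the candidate job titles for all sampled employers in a single second query and do the top-10 cut and the random pick in Python, so the total stays at two queries however many employers are sampled. The sketch below shows this with sqlite3 and hypothetical table names, leaving out filters such as isActive for brevity. In the ORM, the second query would roughly be JobTitle.objects.filter(employer__in=...) annotated with Count('people') and ordered by employer and descending count; that is an untested outline.

```python
import random
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employer (id INTEGER PRIMARY KEY, name TEXT, eminence INTEGER);
CREATE TABLE jobtitle (id INTEGER PRIMARY KEY, name TEXT, employer_id INTEGER);
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, jobtitle_id INTEGER);
INSERT INTO employer VALUES (1, 'A', 1), (2, 'B', 2), (3, 'C', 5);
INSERT INTO jobtitle VALUES (1, 'dev', 1), (2, 'ops', 1), (3, 'qa', 2), (4, 'pm', 3);
""")
# people per job title: dev=3, ops=1, qa=2, pm=5
conn.executemany(
    "INSERT INTO people (name, jobtitle_id) VALUES (?, ?)",
    [("person", jt) for jt, n in [(1, 3), (2, 1), (3, 2), (4, 5)] for _ in range(n)],
)


def pick_job_titles(sample_size, top_n=10, rng=random):
    """Return {employer_id: jobtitle_id} using two queries in total."""
    # Query 1: random employers (the SQL side of order_by('?')[:sample_size]).
    employer_ids = [row[0] for row in conn.execute(
        "SELECT id FROM employer WHERE eminence < 4 ORDER BY RANDOM() LIMIT ?",
        (sample_size,),
    )]
    if not employer_ids:
        return {}
    # Query 2: job titles with at least one person, for all chosen employers at once,
    # sorted so each employer's most-staffed titles come first.
    placeholders = ",".join("?" * len(employer_ids))
    rows = conn.execute(f"""
        SELECT j.employer_id, j.id, COUNT(p.id) AS people_count
        FROM jobtitle j
        JOIN people p ON p.jobtitle_id = j.id
        WHERE j.employer_id IN ({placeholders})
        GROUP BY j.employer_id, j.id
        ORDER BY j.employer_id, people_count DESC, j.id
    """, employer_ids).fetchall()
    by_employer = defaultdict(list)
    for employer_id, jobtitle_id, _people_count in rows:
        by_employer[employer_id].append(jobtitle_id)
    # Pick one title at random from each employer's top_n most-staffed titles.
    return {emp: rng.choice(titles[:top_n]) for emp, titles in by_employer.items()}
```

Employers with no staffed job titles are simply absent from the returned dict, which mirrors the jtt_count__gt=0 and people_count__gt=0 filters in the answer above.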
