django annotate question - django

I have the following model:
class Pick(models.Model):
    league = models.ForeignKey(League)
    user = models.ForeignKey(User)
    team = models.ForeignKey(Team)
    week = models.IntegerField()
    result = models.IntegerField(default=3, help_text='loss=0, win=1, tie=2, not started=3, in progress=4')
I'm trying to generate a standings table based on the results, but I'm unsure how to do it in a single query. For each user in a particular league, I want a count of the results equal to 1 (wins), 0 (losses) and 2 (ties). The only thing I can think of is to run 3 separate queries, each filtering on the result and then annotating, like so:
Pick.objects.filter(league=2, result=1).annotate(wins=Count('result'))
Pick.objects.filter(league=2, result=0).annotate(losses=Count('result'))
Pick.objects.filter(league=2, result=2).annotate(ties=Count('result'))
Is there a more efficient way to achieve this?
Thanks!

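The trick is to use the values method to select just the fields you want to group by, and then annotate each group with a count (aggregate would collapse everything into a single total):
Pick.objects.filter(league=2).values('user', 'result').annotate(total=Count('result'))
This returns one row per (user, result) pair, so the rows with result 1, 0 and 2 hold each user's wins, losses and ties.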
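As a rough sketch of how grouped rows could be turned into a standings table: the snippet below assumes the query yields dicts with `user`, `result` and `total` keys (the shape a `values(...).annotate(...)` call produces). The sample rows and the `build_standings` helper are hypothetical, not part of the original answer.

```python
from collections import defaultdict

# Result codes taken from the Pick model's help_text; codes 3 and 4
# (not started / in progress) are ignored for the standings.
RESULT_LABELS = {0: "losses", 1: "wins", 2: "ties"}


def build_standings(rows):
    """Pivot (user, result, total) rows into one wins/losses/ties dict per user."""
    standings = defaultdict(lambda: {"wins": 0, "losses": 0, "ties": 0})
    for row in rows:
        label = RESULT_LABELS.get(row["result"])
        if label is not None:
            standings[row["user"]][label] += row["total"]
    return dict(standings)


# Hypothetical rows standing in for the grouped queryset output.
rows = [
    {"user": 1, "result": 1, "total": 5},
    {"user": 1, "result": 0, "total": 2},
    {"user": 2, "result": 2, "total": 1},
    {"user": 2, "result": 3, "total": 4},
]
standings = build_standings(rows)
```

The pivot happens in Python, so the database is still queried only once.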

Related

Simplifying Django Query Annotation

I've 3 models and a function is called many times, but it generates 200-300 sql queries, and I'd like to reduce this number.
I've the following layout:
class Info(models.Model):
    user = models.ForeignKey(User)
    ...
class Forum(models.Model):
    info = models.ForeignKey(Info)
    datum = models.DateTimeField()
    ...
class InfoViewed(models.Model):
    user = models.ForeignKey(User)
    infokom = models.ForeignKey(Info)
    last_seen = models.DateTimeField()
    ...
I need the total number of new Forum messages, so just a single number. At the moment I iterate over all the related Infos and sum up all Forums whose datum is later than the last_seen field of the related InfoViewed.
This works, but it results in ~200 queries for 100 Infos.
Is there any way to fetch the same number with just a few queries? I tried to play with annotate and django.db.models.Count, but I failed.
Django: 1.11.16
Currently I'm using this:
infos = Info.objects.filter(user_id=x)
return sum(i.number_of_new_forums(self.user) for i in infos)
and the number_of_new_forums looks like this:
info_viewed = self.info_viewed_set.filter(user=user)
return len([f.id for f in self.get_forums().filter(datum__gt=info_viewed[0].last_seen)])
I managed to come up with some solution from a different perspective:
Forum.objects.filter(info_id__in=info_ids,
                     info__info_viewed__user=request.user,
                     datum__gt=F('info__info_viewed__last_seen')).count()
where info_ids is a list of the ids of all related Infos. However, I'm unsure whether it's a 100% correct solution.
If someone might have a different approach I'd welcome it.
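One way to sanity-check that a single joined count matches the per-Info loop is to run the same logic against a small in-memory SQLite database. This is only a sketch: the table and column names (`info`, `forum`, `info_viewed`), the sample rows and the `count_new_forums` helper are assumptions for illustration, and the SQL is not necessarily what Django would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE info (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE forum (id INTEGER PRIMARY KEY, info_id INTEGER, datum TEXT);
CREATE TABLE info_viewed (
    id INTEGER PRIMARY KEY, user_id INTEGER, info_id INTEGER, last_seen TEXT
);
INSERT INTO info VALUES (1, 7), (2, 7);
INSERT INTO forum VALUES
    (1, 1, '2020-01-05'), (2, 1, '2020-01-20'),
    (3, 2, '2020-01-02'), (4, 2, '2020-01-03');
INSERT INTO info_viewed VALUES
    (1, 7, 1, '2020-01-10'), (2, 7, 2, '2020-01-01'), (3, 8, 1, '2019-12-01');
""")


def count_new_forums(conn, user_id):
    """Count forum rows newer than the user's last_seen for the same info."""
    row = conn.execute(
        """
        SELECT COUNT(*)
        FROM forum f
        JOIN info i ON i.id = f.info_id
        JOIN info_viewed v ON v.info_id = i.id
        WHERE i.user_id = ? AND v.user_id = ? AND f.datum > v.last_seen
        """,
        (user_id, user_id),
    ).fetchone()
    return row[0]


new_count = count_new_forums(conn, 7)
```

Restricting `info_viewed` to the same user matters here: without that condition, other users' last_seen rows would inflate the count.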

How to add weeks to a datetime column, depending on a django model/dictionary?

Context
There is a dataframe of customer invoices and their due dates (identified by customer code).
Week(s) need to be added to each due date depending on the customer code.
A model was created to persist the list of customers and the week(s) to be added.
What is done so far:
Models.py
class BpShift(models.Model):
    bp_name = models.CharField(max_length=50, default='')
    bp_code = models.CharField(max_length=15, primary_key=True, default='')
    weeks = models.IntegerField(default=0)
helper.py
from .models import BpShift
# used in views later
def week_shift(self, df):
    df['DueDateRange'] = df['DueDate'] + datetime.timedelta(
        weeks=BpShift.objects.get(pk=df['BpCode']).weeks)
I realised my understanding of Dataframes is seriously flawed.
df['A'] and df['B'] would return Series. Of course, timedelta wouldn't work when given a lookup on a whole Series like this: weeks=BpShift.objects.get(pk=df['BpCode']).weeks.
Dataframe
d = {'BpCode':['customer1','customer2'],'DueDate':['2020-05-30','2020-04-30']}
df = pd.DataFrame(data=d)
Customer List csv
BP Name,BP Code,Week(s)
Customer1,CA0023MY,1
Customer2,CA0064SG,1
Error
BpShift matching query does not exist.
Commentary
I used these methods in the hope that I would be able to change the dataframe at once, instead of using df.iterrows(). I have recently been avoiding for loops like the plague and I'm wondering if this is the "correct" mentality. Is there any recommended way of doing this? Thanks in advance for any guidance!
The question "Python & Pandas: series to timedelta" will help take you from a Series to a timedelta. And although
pandas.Series(
    BpShift.objects.filter(
        pk__in=df['BpCode'].tolist()
    ).values_list('weeks', flat=True)
)
will give you a Series of integers, I doubt the order is the same as in df['BpCode'], because it depends on the Django model and the database backend.
So you might be better off explicitly creating not a Series but a DataFrame with pk and weeks columns, so you can use df.join. Something like this
pandas.DataFrame(
    BpShift.objects.filter(
        pk__in=df['BpCode'].tolist()
    ).values_list('pk', 'weeks'),
    columns=['BpCode', 'weeks'],
)
should give you a DataFrame that you can join with.
So combined this should be the gist of your code:
django_response = [('customer1', 1), ('customer2', '2')]
d = {'BpCode':['customer1','customer2'],'DueDate':['2020-05-30','2020-04-30']}
df = pd.DataFrame(data=d).set_index('BpCode').join(
    pd.DataFrame(django_response, columns=['BpCode', 'weeks']).set_index('BpCode')
)
df['DueDate'] = pd.to_datetime(df['DueDate'])
df['weeks'] = pd.to_numeric(df['weeks'])
df['new_duedate'] = df['DueDate'] + df['weeks'] * pd.Timedelta('1W')
print(df)
             DueDate  weeks new_duedate
BpCode
customer1 2020-05-30      1  2020-06-06
customer2 2020-04-30      2  2020-05-14
You were right to want to avoid looping. This approach gets all the data from your Django model in one SQL query by using filter, then does a left join with the DataFrame you already have. It casts the dates and weeks to the right types and then computes the new due date on whole columns instead of looping over them.
NB the left join will give NaN and NaT for customers that don't exist in your Django database. You can either avoid those rows by passing how='inner' to df.join or handle them whatever way you like.

How to get Cartesian product of two tables in Django Queryset?

Is there a way to do the equivalent of a full outer join in Django (I think I've read that full outer joins are not supported).
My scenario is that I have three tables:
Staff / WeekList / WeeksCompleted
The relevant fields I'm trying to work with are:
Staff table - Staff Number.
WeekList table - Week Start date.
WeeksCompleted table - Week Start date and Staff Number.
Basically, everyone should have an entry in the WeeksCompleted table (if they're still active, that is, but that's not pertinent for this question). The queryset I would like to produce is a list of Staff who have missing weeks in the WeeksCompleted table.
I can get the result I want using SQL queries but it involves a full outer join on the Staff and WeekList tables. I was wondering if anyone knows of a way to do this using the queryset functions?
The only other way I can think to do the equivalent of the full join is to create a list using a nested loop of Staff Numbers against each week, which might have a sizeable processing overhead?
EDIT: if it helps, here are the three simplified models.
models.py
class Staff(models.Model):
    staff_number = models.CharField(max_length=9, null=True)
class WeekList(models.Model):
    week_start = models.DateField(null=True)
class WeeksCompleted(models.Model):
    staff = models.ForeignKey(to='weekscompleted.Staff', null=True, on_delete=models.PROTECT)
    week_list = models.ForeignKey(to='weekscompleted.WeekList', null=True, on_delete=models.PROTECT)
EDIT 2: The join I think I need is:
SELECT staff_number, week_start
FROM Staff, WeekList
GROUP BY staff_number, week_start
This will give a list of the expected weeks completed for staff:
week_start staff_number
17/10/2020 12345
17/10/2020 54321
I can then compare this to the WeeksCompleted table:
week_start staff_number
17/10/2020 12345
to find which staff are missing for a week using this query (keep in mind that this is a query I produced in a database):
SELECT qryShouldBeCompleted.week_start, qryShouldBeCompleted.staff_number
FROM qryShouldBeCompleted
LEFT JOIN WeeksCompleted
    ON qryShouldBeCompleted.staff_number = WeeksCompleted.staff_number
    AND qryShouldBeCompleted.week_start = WeeksCompleted.week_start
WHERE WeeksCompleted.staff_number IS NULL
This would then produce the result I need:
week_start staff_number
17/10/2020 54321
Edit 3:
I just found an article on FilteredRelation that gets me partway there:
Staff.objects.annotate(missing=FilteredRelation('weekscompleted', condition=Q(weekscompleted__week_start='some date'))).values('staff_number', 'missing__staff__staff_number', 'missing__week_start')
which gets me this:
{'staff_number': '54321', 'missing__staff__staff_number': None, 'missing__week_start': None}
The only thing with this is that it only appears to work for one week at a time - using __lte in the condition doesn't return any 'None' values so I'd have to loop through each week...
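One way to avoid the full outer join, sketched under the assumption that the staff numbers, the week start dates and the completed (week, staff) pairs have each been fetched as plain Python values (for instance with three values_list queries): build the Cartesian product in memory and keep only the pairs that have no WeeksCompleted row. The sample data and the `missing_weeks` helper are hypothetical.

```python
from datetime import date
from itertools import product


def missing_weeks(staff_numbers, week_starts, completed):
    """Return (week_start, staff_number) pairs that have no completed entry.

    `completed` is a set of (week_start, staff_number) tuples.
    """
    expected = product(week_starts, staff_numbers)  # Cartesian product
    return sorted(pair for pair in expected if pair not in completed)


# Hypothetical stand-ins for the three query results.
staff_numbers = ["12345", "54321"]
week_starts = [date(2020, 10, 17)]
completed = {(date(2020, 10, 17), "12345")}

missing = missing_weeks(staff_numbers, week_starts, completed)
```

This trades the per-week loop of queries for a fixed number of queries plus a set lookup per pair, which should stay cheap unless the staff and week lists are very large.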

Django filter by number of ForeignKey and less than a month in DateField

I have a model like this:
class MovieHistory(models.Model):
    watched_by = models.ForeignKey(User)
    time = models.DateTimeField(auto_now_add=True)
    movie = models.ForeignKey(Movie)
I want to get up to 15 movies that were watched the most in the last 30 days. So far I have this:
Movie.objects.filter(time__gte=datetime.now()-timedelta(days=30))
How do you filter again, and order them by movie count? I know that I can filter the first 15 results like this: [:15], but I don't know how to order by the amount of movies in that model, and only pick one of each (so I don't have repeated MovieHistories with the same movies on each one).
Thanks.
Annotation is likely the best approach:
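from datetime import datetime, timedelta
from django.db.models import Count
cutoff = datetime.now() - timedelta(days=30)
most_watched = Movie.objects.filter(moviehistory__time__gte=cutoff).annotate(num_watched=Count('moviehistory')).order_by('-num_watched')[:15]
I haven't tested this, but I believe it is on the way to the answer. Please let me know if it works! The count has to go through the reverse relation from Movie to MovieHistory, since watched_by and time are fields of MovieHistory; 'moviehistory' is the default lookup name, so replace it with the related_name if you set one on the movie foreign key.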
Hope this helps!
For more on using these annotations: https://docs.djangoproject.com/en/dev/topics/db/aggregation/#cheat-sheet
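To make the intended grouping concrete, here is a plain-Python sketch of the same logic: keep only the history entries from the last 30 days, count them per movie, and take the 15 most common. The `most_watched` helper and the sample history are hypothetical, and this only mirrors what the annotated query is meant to compute.

```python
from collections import Counter
from datetime import datetime, timedelta


def most_watched(history, now, days=30, limit=15):
    """Return (movie, count) pairs for the most-watched movies in the window.

    `history` is an iterable of (movie, watched_at) tuples.
    """
    cutoff = now - timedelta(days=days)
    counts = Counter(movie for movie, watched_at in history if watched_at >= cutoff)
    return counts.most_common(limit)


# Hypothetical watch history.
now = datetime(2024, 1, 31)
history = [
    ("Alien", datetime(2024, 1, 30)),
    ("Alien", datetime(2024, 1, 10)),
    ("Heat", datetime(2024, 1, 15)),
    ("Heat", datetime(2023, 11, 1)),
    ("Up", datetime(2023, 12, 1)),
]
top = most_watched(history, now)
```

Each movie appears once in the result, which is the "only pick one of each" behaviour the question asks for.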

Django complex query without using loop

I have two models such that
class Employer(models.Model):
    name = models.CharField(max_length=1000, null=False, blank=False)
    eminence = models.IntegerField(null=False, default=4)
class JobTitle(models.Model):
    name = models.CharField(max_length=1000, null=False, blank=False)
    employer = models.ForeignKey(Employer, unique=False, null=False)
class People(models.Model):
    name = models.CharField(max_length=1000, null=False, blank=False)
    jobtitle = models.ForeignKey(JobTitle, unique=False, null=False)
I would like to list 5 random employers and one job title for each employer. However, the job title should be picked from the employer's 10 job titles that have the most people.
One approach could be
employers = Employer.objects.filter(isActive=True).filter(eminence__lt=4 ).order_by('?')[:5]
for emp in employers:
    jobtitle = JobTitle.objects.filter(employer=emp)... and so on.
However, looping through the selected employers may be inefficient. Is there any way to do it in one query?
Thanks
There is! Check out: https://docs.djangoproject.com/en/dev/ref/models/querysets/#select-related
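select_related() tells Django to follow foreign key relationships using JOINs. This results in one larger query instead of many small ones, which in most cases is what you want. The QuerySet you get is pre-populated, so Django won't have to lazy-load the related objects from the database. Note that select_related() only follows forward ForeignKey and one-to-one relations; for reverse relations, such as the job titles of an employer, prefetch_related() is the counterpart.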
I've used select_related() in the past to solve almost this exact problem.
I wrote the following code block and it works. Although I still loop over the employers, I used select_related('jobtitle'), so I believe the loop doesn't hit the database and runs faster.
employers = random.sample(Employer.objects.select_related('jobtitle').filter(eminence__lt=4,status=EmployerStatus.ACTIVE).annotate(jtt_count=Count('jobtitle')).filter(jtt_count__gt=0),3)
jtList = []
for emp in employers:
    jt = random.choice(emp.jobtitle_set.filter(isActive=True).annotate(people_count=Count('people')).filter(people_count__gt=0)[:10])
    jtList.append(jt)