Django aggregation / annotation

I know this is probably a really simple question, but I've been trying to solve it for ages now and keep failing!
I have two models:
class Subject(models.Model):
    name = models.CharField(max_length=200)
class Pupil(models.Model):
    first_name = models.CharField(max_length=200)
    last_name = models.CharField(max_length=200)
    active = models.BooleanField(default=0, db_index=True)
    takes_subject = models.ManyToManyField(Subject)
Each pupil can take many subjects. Finding out how many pupils take each subject is easy,
but I want to find out how many pupils take a given number of subjects. Something like:
Subjects taken | Number of pupils
===============|=================
4              | 20
3              | 15
2              | 7
1              | 38
That way I can know that say 15 pupils are taking 3 subjects while 38 pupils are taking 1 subject.
How do I achieve this?
Thanks in advance,
Alex

from collections import Counter
from django.db.models import Count
Counter(Pupil.objects.annotate(
    count=Count('takes_subject')).values_list('count', flat=True))
The query annotates each pupil with the number of subjects they take and returns those numbers as a flat list, e.g. [4, 5, 4, 4, 6, ...].
Counter() then counts how often each number occurs and returns a dict-like object, e.g. {4: 3, 5: 1, 6: 1, ...}.
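To see the Counter step on its own, here is a minimal sketch that uses a hard-coded list in place of the queryset result. The numbers are made up for illustration and `subjects_per_pupil` is a hypothetical name.

```python
from collections import Counter

# Stand-in for the values_list('count', flat=True) result:
# one entry per pupil, giving the number of subjects taken (made-up data).
subjects_per_pupil = [4, 5, 4, 4, 6]

# Count how many pupils share each "subjects taken" value.
histogram = Counter(subjects_per_pupil)

# Print as "subjects taken | number of pupils", most subjects first.
for subjects_taken, pupils in sorted(histogram.items(), reverse=True):
    print(f"{subjects_taken} | {pupils}")
```

Sorting the items in reverse order gives the same layout as the table in the question.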

Related

Weird behavior in Django queryset union of values

I want to join the sum of related values from users with the users that do not have those values.
Here's a simplified version of my model structure:
class Answer(models.Model):
    person = models.ForeignKey(Person)
    points = models.PositiveIntegerField(default=100)
    correct = models.BooleanField(default=False)
class Person(models.Model):
    # irrelevant model fields
Sample dataset:
Person | Answer.Points
------ | ------
3 | 50
3 | 100
2 | 100
2 | 90
Person 4 has no answers and therefore no points.
With the query below, I can achieve the sum of points for each person:
people_with_points = Person.objects.\
    filter(answer__correct=True).\
    annotate(points=Sum('answer__points')).\
    values('pk', 'points')
<QuerySet [{'pk': 2, 'points': 190}, {'pk': 3, 'points': 150}]>
However, some people might not have any related Answer entries, so they should have 0 points. With the query below I use Coalesce to "fake" their points:
people_without_points = Person.objects.\
    exclude(pk__in=people_with_points.values_list('pk')).\
    annotate(points=Coalesce(Sum('answer__points'), 0)).\
    values('pk', 'points')
<QuerySet [{'pk': 4, 'points': 0}]>
Both of these work as intended but I want to have them in the same queryset so I use the union operator | to join them:
everyone = people_with_points | people_without_points
Now, for the problem:
After this, the people without points have their points value turned into None instead of 0.
<QuerySet [{'pk': 2, 'points': 190}, {'pk': 3, 'points': 150}, {'pk': 4, 'points': None}]>
Does anyone have any idea why this happens?
Thanks!
I should mention that I can fix that by annotating the queryset again and coalescing the null values to 0, like this:
everyone.\
    annotate(real_points=Concat(Coalesce(F('points'), 0), Value(''))).\
    values('pk', 'real_points')
<QuerySet [{'pk': 2, 'real_points': 190}, {'pk': 3, 'real_points': 150}, {'pk': 4, 'real_points': 0}]>
But I wish to understand why the union does not work as I expected in my original question.
EDIT:
I think I got it. A friend suggested using django-debug-toolbar to inspect the SQL queries, and I found out the following:
Since it's a union of two queries, the annotation of the second query is somehow ignored, so its COALESCE to 0 is never applied. Moving the Coalesce into the first query propagates it to the combined result, and I get the expected output.
Basically, I changed the following:
# Moved the "Coalesce" to the initial query
people_with_points = Person.objects.\
    filter(answer__correct=True).\
    annotate(points=Coalesce(Sum('answer__points'), 0)).\
    values('pk', 'points')
# Second query does not have it anymore
people_without_points = Person.objects.\
    exclude(pk__in=people_with_points.values_list('pk')).\
    values('pk', 'points')
# We will have the values with 0 here!
everyone = people_with_points | people_without_points
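For comparison, the per-person totals can also be computed in one grouped query instead of combining two querysets. The sketch below does not use the Django ORM. It uses Python's sqlite3 module with hypothetical table names (`person`, `answer`) and the sample data from the question, purely to illustrate the SQL shape: a LEFT JOIN restricted to correct answers, with COALESCE(SUM(...), 0) so that people without answers get 0. In Django 2.0 and later, a similar result should be reachable with the `filter` argument of `Sum`, but that variant is not shown or tested here.

```python
import sqlite3

# In-memory database with hypothetical table names mirroring the models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY);
    CREATE TABLE answer (
        id INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES person(id),
        points INTEGER,
        correct INTEGER
    );
    INSERT INTO person (id) VALUES (2), (3), (4);
    INSERT INTO answer (person_id, points, correct) VALUES
        (3, 50, 1), (3, 100, 1), (2, 100, 1), (2, 90, 1);
""")

# One query: every person appears once; people with no correct
# answers get NULL from SUM, which COALESCE turns into 0.
rows = conn.execute("""
    SELECT p.id, COALESCE(SUM(a.points), 0) AS points
    FROM person p
    LEFT JOIN answer a ON a.person_id = p.id AND a.correct = 1
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

print(rows)  # [(2, 190), (3, 150), (4, 0)]
```

Putting the `correct = 1` condition in the join rather than in a WHERE clause is what keeps person 4 in the result.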

Filtering on annotations with max date in Django

I have 3 models in Django-project:
from datetime import date
class Hardware(models.Model):
    inventory_number = models.IntegerField(unique=True)
class Subdivision(models.Model):
    name = models.CharField(max_length=50)
class Relocation(models.Model):
    hardware = models.ForeignKey('Hardware')
    subdivision = models.ForeignKey('Subdivision')
    # Pass the callable date.today (not date.today()) so the default is evaluated on each save
    relocation_date = models.DateField(verbose_name='Relocation Date', default=date.today)
Example rows in the table 'Hardware_Relocation':
id  hardware  subdivision  relocation_date
1   1         1            01.01.2009
2   1         2            01.01.2010
3   1         1            01.01.2011
4   1         3            01.01.2012
5   1         3            01.01.2013
6   1         3            01.01.2014
7   1         3            01.01.2015   # hardware 1 is now in subdivision 3, because this relocation_date is the latest
I would like to write a filter that finds the hardware currently located in a given subdivision.
This is my attempt:
subdivision = Subdivision.objects.get(pk=1)
hardware_list = Hardware.objects.annotate(relocation__relocation_date=Max('relocation__relocation_date')).filter(relocation__subdivision=subdivision)
Now hardware_list contains hardware 1, which is wrong, because hardware 1 is currently in subdivision 3.
hardware_list should be empty in this example.
The following code also gives the wrong result (hardware_list contains hardware 1 for subdivision 1):
limit_date = datetime.datetime.now()
q1 = Hardware.objects.filter(relocation__subdivision=subdivision, relocation__relocation_date__lte=limit_date)
q2 = q1.exclude(~Q(relocation__relocation_date__gt=F('relocation__relocation_date')), ~Q(relocation__subdivision=subdivision))
hardware_list = q2.distinct()
Would it be better to use raw SQL?
This might work:
from django.db.models import F, Q
# Parentheses are needed so the chained calls can span several lines
(
    Hardware.objects
    .filter(relocation__subdivision=target_subdivision, relocation__relocation_date__lte=limit_date)
    .exclude(~Q(relocation__subdivision=target_subdivision), relocation__relocation_date__gt=F('relocation__relocation_date'))
    .distinct()
)
The idea is: give me all hardware that was relocated to the target subdivision before the limit date and that has NOT been relocated to another subdivision after that.
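The intended rule (a piece of hardware belongs to the subdivision of its latest relocation on or before the limit date) can be spelled out in plain Python. This is only a sketch of the logic on the example rows, not an ORM query; `hardware_in_subdivision` is a hypothetical helper name.

```python
from datetime import date

# (hardware_id, subdivision_id, relocation_date) rows, mirroring the example table.
relocations = [
    (1, 1, date(2009, 1, 1)),
    (1, 2, date(2010, 1, 1)),
    (1, 1, date(2011, 1, 1)),
    (1, 3, date(2012, 1, 1)),
    (1, 3, date(2013, 1, 1)),
    (1, 3, date(2014, 1, 1)),
    (1, 3, date(2015, 1, 1)),
]

def hardware_in_subdivision(rows, subdivision_id, limit_date):
    """Return ids of hardware whose latest relocation up to limit_date is to subdivision_id."""
    latest = {}  # hardware_id -> (relocation_date, subdivision_id)
    for hardware_id, sub_id, moved_on in rows:
        if moved_on > limit_date:
            continue  # ignore relocations after the limit date
        if hardware_id not in latest or moved_on > latest[hardware_id][0]:
            latest[hardware_id] = (moved_on, sub_id)
    return sorted(h for h, (_, sub) in latest.items() if sub == subdivision_id)

print(hardware_in_subdivision(relocations, 1, date(2016, 1, 1)))  # []
print(hardware_in_subdivision(relocations, 3, date(2016, 1, 1)))  # [1]
```

With an earlier limit date such as 1 June 2011, the same helper reports hardware 1 in subdivision 1, which matches the table.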

Linear Programming: How to implement with multiple constraints?

I’m trying to solve a linear programming model and need some help. I’m not a programming expert, but I can draw up the problem conceptually and am hoping for some help implementing it.
I’m looking into an asset allocation problem for an investment portfolio from a theoretical perspective, but for simplicity of this post I’m going to use generic terms.
I have a list of 500+ choices that all have an assigned cost and value add. My goal is to maximize the sum of the value add, given a constraint on how much I can spend. These 500 choices are divided into 5 categories and there are restrictions on how many choices I can have from each category.
Category 1 = 1
Category 2 = 1
Category 3 = 2 or 3
Category 4 = 1 or 2
Category 5 = 2
Category 3 + Category 4 = 4
I figure I’ll need a binary variable x attached to each choice, where 1 means I’m picking that choice and 0 means I’m not. In the end, 8 variables should have the value 1 and the rest 0, giving the maximum value add subject to the cost and category constraints.
I ultimately hope to be able to run and say for example “what is the nth highest value” so instead of getting the maximum value add I can get the second highest value add and so on.
Is this possible and what software/language would be best to do it? Thanks for your help!
Just to simplify writing everything down, let's assume you had 15 assets, with value added v_1, v_2, ..., v_15 and costs c_1, c_2, ..., c_15. Let's assume assets 1, 2, and 3 are in category 1, assets 4, 5, and 6 are in category 2, assets 7, 8, and 9 are in category 3, assets 10, 11, and 12 are in category 4, and assets 13, 14, and 15 are in category 5. Finally, let's assume a budget B.
We would create binary variables x_1, x_2, ..., x_15 to indicate whether we bought each asset. Now, the objective function of our integer program is:
max v_1*x_1 + v_2*x_2 + ... + v_15*x_15
Our budget constraint is:
c_1*x_1 + c_2*x_2 + ... + c_15*x_15 <= B
Exactly one choice from category 1:
x_1 + x_2 + x_3 = 1
Exactly one choice from category 2:
x_4 + x_5 + x_6 = 1
Either 2 or 3 choices from category 3:
x_7 + x_8 + x_9 >= 2
x_7 + x_8 + x_9 <= 3
Either 1 or 2 choices from category 4:
x_10 + x_11 + x_12 >= 1
x_10 + x_11 + x_12 <= 2
Exactly 2 choices from category 5:
x_13 + x_14 + x_15 = 2
Exactly 4 choices from categories 3 and 4 combined:
x_7 + x_8 + x_9 + x_10 + x_11 + x_12 = 4
Finally, you would specify all variables to be binary.
Note that the only adjustment needed for your actual problem is to replace the variables in each of these constraints with the variables belonging to the corresponding category.
All that remains would be to implement the model. There are a myriad of linear programming packages in all major languages; check out this survey for details. Since Stack Overflow is not a software recommendation site and you haven't really given any details about your situation (e.g. free vs. non-free solvers or the programming language you're using), I will refrain from suggesting a particular package.
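Without committing to any solver, the 15-asset toy model above can be checked by brute force, since 2^15 assignments are few enough to enumerate. The sketch below is only an illustration: the values, costs, and budget are made-up numbers, and the helper names (`feasible`, `objective`, `nth_best`) are hypothetical. For 500+ choices enumeration is not practical and an integer programming solver would be needed.

```python
from itertools import product

# Hypothetical values v_i, costs c_i and budget B (illustrative numbers only).
values = [5, 7, 3, 6, 2, 8, 4, 9, 1, 7, 3, 5, 6, 2, 4]
costs = [4, 6, 2, 5, 1, 7, 3, 8, 1, 6, 2, 4, 5, 1, 3]
budget = 30

# 0-based asset indices per category, matching the grouping in the answer.
categories = {1: range(0, 3), 2: range(3, 6), 3: range(6, 9),
              4: range(9, 12), 5: range(12, 15)}

def count(x, k):
    """Number of chosen assets in category k."""
    return sum(x[i] for i in categories[k])

def feasible(x):
    """Check the budget constraint and all category constraints."""
    total_cost = sum(c * xi for c, xi in zip(costs, x))
    return (total_cost <= budget
            and count(x, 1) == 1
            and count(x, 2) == 1
            and 2 <= count(x, 3) <= 3
            and 1 <= count(x, 4) <= 2
            and count(x, 5) == 2
            and count(x, 3) + count(x, 4) == 4)

def objective(x):
    """Total value add of the chosen assets."""
    return sum(v * xi for v, xi in zip(values, x))

# Enumerate every 0/1 assignment, keep the feasible ones, best first.
solutions = sorted((x for x in product((0, 1), repeat=15) if feasible(x)),
                   key=objective, reverse=True)

def nth_best(n):
    """Return the n-th best feasible assignment (1-based)."""
    return solutions[n - 1]

best = nth_best(1)
print(objective(best), best)
```

Keeping the whole sorted list is what makes the "nth highest value" question easy here. Note that `nth_best` ranks assignments, so ties in objective value occupy separate ranks.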

Does any one know how I can add people into groups in django?

In django-admin I'm given a table with a list of people in it. The fields are for example:
firstname  lastname  occupation  group
The first three columns are already filled out, but the fourth (group) has to be filled in by me.
I would like to write an action that puts people into groups of, say, 3,
so the result would be
firstname  lastname  occupation  group
mike       jones     doctor      1
tracy      jackson   lawyer      1
Mack       Bean      Actor       1
Steward    Griffin   Baby        2
Candice    Green     Cashier     2
Does anyone know how I can do this? I didn't add code because I don't know where to start.
Try this:
from django.db.models import Max
maxId = People.objects.all().aggregate(Max('id'))['id__max']
# Integer division, rounded up: ids 1-3 -> group 1, ids 4-6 -> group 2, ...
newgroupid = maxId // 3
if maxId % 3 != 0:
    newgroupid = newgroupid + 1
Now use this newgroupid when inserting the record.
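The id-based calculation only yields even groups if ids are contiguous and start at 1. A position-based variant avoids that assumption. The following is a minimal plain-Python sketch of that idea, using the first names from the question; `assign_groups` is a hypothetical helper, and saving the group number back to each record is left out.

```python
people = ["mike", "tracy", "Mack", "Steward", "Candice"]
GROUP_SIZE = 3

def assign_groups(items, group_size):
    """Pair each item with a 1-based group number, filling groups in order."""
    return [(item, index // group_size + 1) for index, item in enumerate(items)]

groups = assign_groups(people, GROUP_SIZE)
for name, group in groups:
    print(name, group)
```

The first three people land in group 1 and the remaining two in group 2, as in the table in the question.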

Get list of occurrences + count in a model Django?

Imagine I have the following model:
class Person(models.Model):
    ...other stuff...
    optional_first_name = models.CharField(max_length=50, blank=True)
How would I go about writing a query that returns a list of the most popular names, in decreasing order of occurrence, with their counts, while ignoring empty names?
For example, for a database with 13 Leslies, 8 Andys, 3 Aprils, 1 Ron and 18 people who haven't specified their name, the output would be:
[('leslie', 13), ('andy', 8), ('april', 3), ('ron', 1)]
The closest I can get is by doing the following:
q = Person.objects.all()
q.query.group_by = ['optional_first_name']
q.query.add_count_column()
q.values_list('optional_first_name', flat=True)
But it's still not quite what I want.
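After some digging, I finally found a solution (the exclude() call drops the empty names):
from django.db.models import Count
Person.objects.exclude(optional_first_name='').values('optional_first_name').annotate(c=Count('optional_first_name')).order_by('-c')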
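To illustrate the expected result shape without a database, the same grouping can be sketched in plain Python with collections.Counter. The `names` list below is made-up data matching the counts in the question, with empty strings standing in for people who gave no name.

```python
from collections import Counter

# 13 Leslies, 8 Andys, 3 Aprils, 1 Ron and 18 unspecified names.
names = ["leslie"] * 13 + ["andy"] * 8 + ["april"] * 3 + ["ron"] + [""] * 18

# Skip empty names, then list (name, count) pairs, most common first.
popular = Counter(name for name in names if name).most_common()

print(popular)  # [('leslie', 13), ('andy', 8), ('april', 3), ('ron', 1)]
```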