Django annotate fails when querying unique values

Django annotate fails when querying unique values - django

for a Django project I need to combine two parts lists into one.
models.py:
class UserBuild(models.Model):
project = models.ForeignKey(Project)
created = models.DateTimeField(auto_now_add=True)
updated = models.DateTimeField(auto_now=True)
part = models.ForeignKey(Parts)
part_quantity = models.IntegerField(max_length=5, null=True, blank=True)
suggested_quantity = models.IntegerField(
max_length=5, null=True, blank=True
)
views.py:
def CombineProjects(request, template_name='combined_projects.html'):
...
build_set = UserBuild.objects.values(
#'pk',
#'project__pk',
'part__number',
'part__part_type__name',
'part__price',
'part__description',
'part__category__name'
).filter(
project__in=projects
).order_by('part__category', 'part__part_type').annotate(total=Sum('part_quantity'))
Basically here I want to group all parts which are the same and sum their quantity. As above works but if I uncomment either the pk or project__pk arguments, then the parts are no longer grouped (I assume because they are variable even when the part is the same).
Is there some way that I can keep the grouping but also include the pk and project__pk values?

The problem is that your list of values is what you group and sum by; so you really want to group by project and part number only.
I would try:
build_set = UserBuild.objects.values(
'project__pk',
'part__number',
).filter(
project__in=projects
).order_by('part__category', 'part__part_type').annotate(total=Sum('part_quantity'))
then merge the other attributes into this list later on. I'm not sure this will work however. You might need to do:
build_set = UserBuild.objects.values(
'project_id',
'part_id',
).filter(
project__in=projects
).order_by('part__category', 'part__part_type').annotate(total=Sum('part_quantity'))
And do more work by hand afterwards.

Related

Django Annotation Count with Subquery & OuterRef

I'm trying to create a high score statistic table/list for a quiz, where the table/list is supposed to be showing the percentage of (or total) correct guesses on a person which was to be guessed on. To elaborate further, these are the models which are used.
The Quiz model:
class Quiz(models.Model):
participants = models.ManyToManyField(
User,
through="Participant",
through_fields=("quiz", "correct_user"),
blank=True,
related_name="related_quiz",
)
fake_users = models.ManyToManyField(User, related_name="quiz_fakes")
user_quizzed = models.ForeignKey(
User, related_name="user_taking_quiz", on_delete=models.CASCADE, null=True
)
time_started = models.DateTimeField(default=timezone.now)
time_end = models.DateTimeField(blank=True, null=True)
final_score = models.IntegerField(blank=True, default=0)
This model does also have some properties; I deem them to be unrelated to the problem at hand.
The Participant model:
class Participant(models.Model): # QuizAnswer FK -> QUIZ
guessed_user = models.ForeignKey(
User, on_delete=models.CASCADE, related_name="clicked_in_quiz", null=True
)
correct_user = models.ForeignKey(
User, on_delete=models.CASCADE, related_name="solution_in_quiz", null=True
)
quiz = models.ForeignKey(
Quiz, on_delete=models.CASCADE, related_name="participants_in_quiz"
)
#property
def correct(self):
return self.guessed_user == self.correct_user
To iterate through what I am trying to do, I'll try to explain how I'm thinking this should work:
For a User in User.objects.all(), find all participant objects where the user.id equals correct_user(from participant model)
For each participantobject, evaluate if correct_user==guessed_user
Sum each participant object where the above comparison is True for the User, represented by a field sum_of_correct_guesses
Return a queryset including all users with parameters [User, sum_of_correct_guesses]
^Now ideally this should be percentage_of_correct_guesses, but that is an afterthought which should be easy enough to change by doing sum_of_correct_guesses / sum n times of that person being a guess.
Now I've even made some pseudocode for a single person to illustrate to myself roughly how it should work using python arithmetics
# PYTHON PSEUDO QUERY ---------------------
person = get_object_or_404(User, pk=3) # Example-person
y = Participant.objects.filter(
correct_user=person
) # Find participant-objects where person is used as guess
y_corr = [] # empty list to act as "queryset" in for-loop
for el in y: # for each participant object
if el.correct: # if correct_user == guessed_user
y_corr.append(el) # add to queryset
y_percentage_corr = len(y_corr) / len(y) # do arithmetic division
print("Percentage correct: ", y_percentage_corr) # debug-display
# ---------------------------------------------
What I've tried (with no success so far), is to use an ExtensionWrapper with Count() and Q object:
percentage_correct_guesses = ExpressionWrapper(
Count("pk", filter=Q(clicked_in_quiz=F("id")), distinct=True)
/ Count("solution_in_quiz"),
output_field=fields.DecimalField())
all_users = (
User.objects.all().annotate(score=percentage_correct_guesses).order_by("score"))
Any help or directions to resources on how to do this is greatly appreciated :))

I found an answer while looking around for related problems:
Django 1.11 Annotating a Subquery Aggregate
What I've done is:
Create a filter with an OuterRef() which points to a User and checks if Useris the same as correct_person and also a comparison between guessed_person and correct_person, outputs a value correct_user in a queryset for all elements which the filter accepts.
Do an annotated count for how many occurrences there are of a correct_user in the filtered queryset.
Annotate User based on the annotated-count, this is the annotation that really drives the whole operation. Notice how OuterRef() and Subquery are used to tell the filter which user is supposed to be correct_user.
Below is the code snippet which I made it work with, it looks very similar to the answer-post in the above linked question:
from django.db.models import Count, OuterRef, Subquery, F, Q
crit1 = Q(correct_user=OuterRef('pk'))
crit2 = Q(correct_user=F('guessed_user'))
compare_participants = Participant.objects.filter(crit1 & crit2).order_by().values('correct_user')
count_occurrences = compare_participants.annotate(c=Count('*')).values('c')
most_correctly_guessed_on = (
User.objects.annotate(correct_clicks=Subquery(count_occurrences))
.values('first_name', 'correct_clicks')
.order_by('-correct_clicks')
)
return most_correctly_guessed_on
This works wonderfully, thanks to Oli.

Django annotation on compoundish primary key with filter ignoring primary key resutling in too many annotated items

Please see EDIT1 below, as well.
Using Django 3.0.6 and python3.8, given following models
class Plants(models.Model):
plantid = models.TextField(primary_key=True, unique=True)
class Pollutions(models.Model):
pollutionsid = models.IntegerField(unique=True, primary_key=True)
year = models.IntegerField()
plantid = models.ForeignKey(Plants, models.DO_NOTHING, db_column='plantid')
pollutant = models.TextField()
releasesto = models.TextField(blank=True, null=True)
amount = models.FloatField(db_column="amount", blank=True, null=True)
class Meta:
managed = False
db_table = 'pollutions'
unique_together = (('plantid', 'releasesto', 'pollutant', 'year'))
class Monthp(models.Model):
monthpid = models.IntegerField(unique=True, primary_key=True)
year = models.IntegerField()
month = models.IntegerField()
plantid = models.ForeignKey(Plants, models.DO_NOTHING, db_column='plantid')
power = models.IntegerField(null=False)
class Meta:
managed = False
db_table = 'monthp'
unique_together = ('plantid', 'year', 'month')
I'd like to annotate - based on a foreign key relationship and a fiter a value, particulary - to each plant the amount of co2 and the Sum of its power for a given year. For sake of debugging having replaced Sum by Count using the following query:
annotated = tmp.all().annotate(
energy=Count('monthp__power', filter=Q(monthp__year=YEAR)),
co2=Count('pollutions__amount', filter=Q(pollutions__year=YEAR, pollutions__pollutant="CO2", pollutions__releasesto="Air")))
However this returns too many items (a wrong number using Sum, respectively)
annotated.first().co2 # 60, but it should be 1
annotated.first().energy # 252, but it should be 1
although my database guarantees - as denoted, that (plantid, year, month) and (plantid, releasesto, pollutant, year) are unique together, which can easily be demonstrated:
pl = annotated.first().plantid
testplant = Plants.objects.get(pk=pl) # plant object
pco2 = Pollutions.objects.filter(plantid=testplant, year=YEAR, pollutant="CO2", releasesto="Air")
len(pco2) # 1, as expected
Why does django return to many results and how can I tell django to limit the elements to annotate to the 'current primary key' in other words to only annotate the elements where the foreign key matches the primary key?
I can achieve what I intend to do by using distinct and Max:
energy=Sum('yearly__power', distinct=True, filter=Q(yearly__year=YEAR)),
co2=Max('pollutions__amount', ...
However the performance is inacceptable.
I have tested to use model_to_dict and appending the wanted values "by hand" to the dict, which works for the values itself, but not for sorting the resulted dict (e.g. by energy) and it is acutally faster than the workaround directly above.
It conceptually strikes to me that the manual approach is faster than letting the database do, what it is intended to do.
Is this a feature limitation of django's orm or am I missing something?
EDIT1:
The behaviour is known as bug since 11 years.
Even others "spent a whole day on this".
I am now trying it with subqueries. However the forein key I am using is not a primary key of its table. So the kind of "usual" approach to use "pk=''" does not work. More clearly, trying:
tmp = Plants.objects.filter(somefilter)
subq1 = Subquery(Yearly.objects.filter(pk=OuterRef('plantid'), year=YEAR)) tmp1 = tmp.all().annotate(
energy=Count(Subquery(subq1))
)
returns
OperationalError at /xyz
no such column: U0.yid
Which definitely makes sense because Plants has no clue what a yid is, it only knows plantids. How do I adjust the subquery to that?

Query intermediate through fields in django

I have a simple Relation model, where a user can follow a tag just like stackoverflow.
class Relation(models.Model):
user = AutoOneToOneField(User)
follows_tag = models.ManyToManyField(Tag, blank=True, null=True, through='TagRelation')
class TagRelation(models.Model):
user = models.ForeignKey(Relation, on_delete=models.CASCADE)
following_tag = models.ForeignKey(Tag, on_delete=models.CASCADE)
pub_date = models.DateTimeField(default=timezone.now)
class Meta:
unique_together = ['user', 'following_tag']
Now, to get the results of all the tags a user is following:
kakar = CustomUser.objects.get(email="kakar#gmail.com")
tags_following = kakar.relation.follows_tag.all()
This is fine.
But, to access intermediate fields I have to go through a big list of other queries. Suppose I want to display when the user started following a tag, I will have to do something like this:
kakar = CustomUser.objects.get(email="kakar#gmail.com")
kakar_relation = Relation.objects.get(user=kakar)
t1 = kakar.relation.follows_tag.all()[0]
kakar_t1_relation = TagRelation.objects.get(user=kakar_relation, following_tag=t1)
kakar_t1_relation.pub_date
As you can see, just to get the date I have to go through so much query. Is this the only way to get intermediate values, or this can be optimized? Also, I am not sure if this model design is the way to go, so if you have any recomendation or advice I would be very grateful. Thank you.

You need to use Double underscore i.e. ( __ ) for ForeignKey lookup,
Like this :
user_tags = TagRelation.objects.filter(user__user__email="kakar#gmail.com").values("following_tag__name", "pub_date")
If you need the name of the tag, you can use following_tag__name in the query and if you need id you can use following_tag__id.
And for that you need to iterate through the result of above query set, like this:
for items in user_tags:
print items['following_tag__name']
print items['pub_date']
One more thing,The key word values will return a list of dictionaries and you can iterate it through above method and if you are using values_list in the place of values, it will return a list of tuples. Read further from here .

How to return all records but exclude the last item

I'm having problem filtering in django-models.
I want to return all records of a particular animal but excluding the last item based on the latest created_at value and sorted in a descending order.
I have this model.
class Heat(models.Model):
# Fields
performer = models.CharField(max_length=25)
is_bred = models.BooleanField(default=False)
note = models.TextField(max_length=250, blank=True, null=True)
result = models.BooleanField(default=False)
# Relationship Fields
animal = models.ForeignKey(Animal, related_name='heats', on_delete=models.CASCADE)
created_at = models.DateTimeField(auto_now_add=True, editable=False)
last_updated = models.DateTimeField(auto_now=True, editable=False)
I was able to achieved the desired result by this raw sql script. But I want a django approach.
SELECT
*
FROM
heat
WHERE
heat.created_at != (SELECT MAX((heat.created_at)) FROM heat)
AND heat.animal_id = '2' ORDER BY heat.created_at DESC;
Please help.

It will be
Heat.objects.order_by("-created_at")[1:]
For a particular animal it will then be:
Heat.objects.filter(animal_id=2).order_by("-created_at")[1:]
where [1:] on a queryset has a regular python slice syntax and generates the correct SQL code. (In this case simply removes the first / most recently created element)
Upd: as #schwobaseggl mentioned, in the comments, slices with negative index don't work on django querysets. Therefore the objects are reverse ordered first.

I just converted your SQL query to Django ORM code.
First, fetch the max created_at value using aggregation and do an exclude.
from django.db.models import Max
heat_objects = Heat.objects.filter(
animal_id=2
).exclude(
created_at=Heat.objects.all().aggregate(Max('created_at'))['created_at__max']
)

Get last record:
obj= Heat.objects.all().order_by('-id')[0]
Make query:
query = Heat.objects.filter(animal_id=2).exclude(id=obj['id']).all()

The query would be :
Heat.objects.all().order_by('id')[1:]
You could also put any filter you require by replacing all()

Django queryset - Adding HAVING constraint

I have been using Django for a couple of years now but I am struggling today with adding a HAVING constraint to a GROUP BY.
My queryset is the following:
crm_models.Contact.objects\
.filter(dealercontact__dealer__pk__in=(265,),
dealercontact__activity='gardening',
date_data_collected__gte=datetime.date(2012,10,1),
date_data_collected__lt=datetime.date(2013,10,1))\
.annotate(nb_rels=Count('dealercontact'))
which gives me the following MySQL query:
SELECT *
FROM `contact`
LEFT OUTER JOIN `dealer_contact` ON (`contact`.`id_contact` = `dealer_contact`.`id_contact`)
WHERE (`dealer_contact`.`active` = True
AND `dealer_contact`.`activity` = 'gardening'
AND `contact`.`date_data_collected` >= '2012-10-01'
AND `contact`.`date_data_collected` < '2013-10-01'
AND `dealer_contact`.`id_dealer` IN (265))
GROUP BY `contact`.`id_contact`
ORDER BY NULL;
I would get exactly what I need with this HAVING constraint:
HAVING SUM(IF(`dealer_contact`.`type`='customer', 1, 0)) = 0
How can I get this fixed with a Django Queryset? I need a queryset in this instance.
Here I am using annotate only in order to get the GROUP BY on contact.id_contact.
Edit: My goal is to get the Contacts who have no "customer" relation in dealercontact but have "ref" relation(s) (according to the WHERE clause of course).
Models
class Contact(models.Model):
id_contact = models.AutoField(primary_key=True)
title = models.CharField(max_length=255L, blank=True, choices=choices_custom_sort(TITLE_CHOICES))
last_name = models.CharField(max_length=255L, blank=True)
first_name = models.CharField(max_length=255L, blank=True)
[...]
date_data_collected = models.DateField(null=True, db_index=True)
class Dealer(models.Model):
id_dealer = models.AutoField(primary_key=True)
address1 = models.CharField(max_length=45L, blank=True)
[...]
class DealerContact(Auditable):
id_dealer_contact = models.AutoField(primary_key=True)
contact = models.ForeignKey(Contact, db_column='id_contact')
dealer = models.ForeignKey(Dealer, db_column='id_dealer')
activity = models.CharField(max_length=32, choices=choices_custom_sort(ACTIVITIES), db_index=True)
type = models.CharField(max_length=32, choices=choices_custom_sort(DEALER_CONTACT_TYPE), db_index=True)

I figured this out by adding two binary fields in DealerContact: is_ref and is_customer.
If type='ref' then is_ref=1 and is_customer=0.
Else if type='customer' then is_ref=0 and is_customer=1.
Thus, I am now able to use annotate(nb_customers=Sum('is_customer')) and then use filter(nb_customers=0).
The final queryset consists in:
Contact.objects.filter(dealercontact__dealer__pk__in=(265,),
dealercontact__activity='gardening',
date_data_collected__gte=datetime.date(2012,10,1),
date_data_collected__lt=datetime.date(2013,10,1))\
.annotate(nb_customers=Sum('dealercontact__is_customer'))\
.filter(nb_customers=0)

Actually there is a way you can add your own custom HAVING and GROUP BY clauses if you need.
Just use my example with caution - if Django ORM code/paths will change in future Django versions, you will have to update your code too.
Image you have Book and Edition models, where for each book there can be multiple editions and you want to select first US edition date within Book queryset.
Adding custom HAVING and GROUP BY clauses in Django 1.5+:
from django.db.models import Min
from django.db.models.sql.where import ExtraWhere, AND
qs = Book.objects.all()
# Standard annotate
qs = qs.annotate(first_edition_date=Min("edition__date"))
# Custom HAVING clause, to limit annotation by US country only
qs.query.having.add(ExtraWhere(['"app_edition"."country"=%s'], ["US"]), AND)
# Custom GROUP BY clause will be needed too
qs.query.group_by.append(("app_edition", "country"))
ExtraWhere can contain not just fields, but any raw sql conditions and functions too.

Are you not using raw query just because you want orm object? Using Contact.objects.raw() generate instances similar filter. Refer to https://docs.djangoproject.com/en/dev/topics/db/sql/ for more help.

My goal is to get the Contacts who have no "customer" relation in
dealercontact but have "ref" relation(s) (according to the WHERE
clause of course).
This simple query fulfills this requirement:
Contact.objects.filter(dealercontact__type="ref").exclude(dealercontact__type="customer")
Is this enough, or do you need it to do something more?
UPDATE: if your requirement is
Contacts that have a "ref" relations, but do not have "customer"
relations with the same dealer
you can do this:
from django.db.models import Q
Contact.objects.filter(Q(dealercontact__type="ref") & ~Q(dealercontact__type="customer"))

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Django annotate fails when querying unique values - django

Related

Django Annotation Count with Subquery & OuterRef

Django annotation on compoundish primary key with filter ignoring primary key resutling in too many annotated items

Query intermediate through fields in django

How to return all records but exclude the last item

Django queryset - Adding HAVING constraint

Categories

Resources