django: aggregation of aggregation with subquery: cannot compute - Avg is an aggregate

django: aggregation of aggregation with subquery: cannot compute - Avg is an aggregate - django

I've been scouting & testing for quite some time now and I'm unable to get anything working with MariaDB.
My models:
class SequenceCompletion(SequenceMixin, models.Model):
pass
class Play(SequenceMixin, models.Model):
puzzle = models.ForeignKey(
'puzzles.Puzzle', on_delete=models.CASCADE, related_name='plays')
rating = models.IntegerField(default=-1)
completion = models.ForeignKey(SequenceCompletion, blank=True,
null=True, on_delete=models.SET_NULL, related_name='plays')
I have a subquery:
rating_average_by_puzzle_from_completion_plays = Play.objects.filter(completion_id=OuterRef('id')).values('puzzle_id').order_by('puzzle_id').annotate(puzzle_rating_average=Avg('rating')).values('puzzle_rating_average')
where the ratings of all plays corresponding to a given completion are averaged per puzzle_id
Afterwards, I'm trying, for each completion, to average the puzzle_rating_average corresponding to each puzzle_id:
if I do annotate, I end up with :
SequenceCompletion.objects.annotate(rating_avg_from_completion_plays=Subquery(rating_average_by_puzzle_from_completion_plays.annotate(result=Avg('puzzle_rating_average')).order_by().values('result')))
django.core.exceptions.FieldError: Cannot compute Avg('puzzle_rating_average'): 'puzzle_rating_average' is an aggregate
if I do aggregate, I end up with :
SequenceCompletion.objects.annotate(rating_avg_from_completion_plays=Subquery(rating_average_by_puzzle_from_completion_plays.aggregate(result=Avg('puzzle_rating_average')).order_by().values('result')))
ValueError: This queryset contains a reference to an outer query and may only be used in a subquery.
the same if I add completion_id to the subquery:
rating_average_by_puzzle_from_completion_plays = Play.objects.filter(completion_id=OuterRef('id')).values('puzzle_id').order_by('puzzle_id').annotate(puzzle_rating_average=Avg('rating')).values('puzzle_rating_average','completion_id')
I tried as well:
SequenceCompletion.objects.annotate(rating_avg_from_completion_plays=Avg(Subquery(rating_average_by_puzzle_from_completion_plays)))
and I end up with:
MySQLdb._exceptions.OperationalError: (1242, 'Subquery returns more than 1 row'
Nothing I could find on the documentation or in any post would work, so I'll appreciate any help

Related

Django Alternative To Inner Join When Annotating

I'm having some trouble generating an annotation for the following models:
class ResultCode(GenericSteamDataModel):
id = models.IntegerField(db_column='PID')
result_code = models.IntegerField(db_column='resultcode', primary_key=True)
campaign = models.OneToOneField(SteamCampaign, db_column='campagnePID', on_delete=models.CASCADE)
sale = models.BooleanField(db_column='ishit')
factor = models.DecimalField(db_column='factor', max_digits=5, decimal_places=2)
class Meta:
managed = False
constraints = [
models.UniqueConstraint(fields=['result_code', 'campaign'], name='result_code per campaign unique')
]
class CallStatistics(GenericShardedDataModel, GenericSteamDataModel):
objects = CallStatisticsManager()
project = models.OneToOneField(SteamProject, primary_key=True, db_column='projectpid', on_delete=models.CASCADE)
result_code = models.ForeignKey(ResultCode, db_column='resultcode', on_delete=models.CASCADE)
class Meta:
managed = False
The goal is to find the sum of factors based on the result_code field in the ResultCode and CallStatistics model, when sale=True.
Note that:
Result codes are not unique by themselves (described in model). A Project has a relation to a Campaign
The following annotation generates the result that is desired (possible solution):
result = CallStatistics.objects.all().values('project').annotate(
sales_factored=models.Sum(
models.Case(
models.When(
models.Q(sale=True) & models.Q(project__campaign=models.F('result_code__campaign')),
then=models.F('result_code__factor')
)
)
)
)
The problem is that the generated query performs a Inner Join on result_code between the 2 models.
Trying to add another field in the same annotation (that should not be joined with Resultcode), for example:
sales=models.Sum(Cast('sale', models.IntegerField())),
results in a wrong summation.
The Questions is if there is an alternative to the automatic Inner Join that Django generates. So that it is possible to retrieve the following fields (and others similar) in 1 annotation:
...
sales=models.Sum(Cast('sale', models.IntegerField())),
sales_factored= [sum of factores, without Inner Join]
...
Thanks in advance for taking your time for this.

Django annotation on compoundish primary key with filter ignoring primary key resutling in too many annotated items

Please see EDIT1 below, as well.
Using Django 3.0.6 and python3.8, given following models
class Plants(models.Model):
plantid = models.TextField(primary_key=True, unique=True)
class Pollutions(models.Model):
pollutionsid = models.IntegerField(unique=True, primary_key=True)
year = models.IntegerField()
plantid = models.ForeignKey(Plants, models.DO_NOTHING, db_column='plantid')
pollutant = models.TextField()
releasesto = models.TextField(blank=True, null=True)
amount = models.FloatField(db_column="amount", blank=True, null=True)
class Meta:
managed = False
db_table = 'pollutions'
unique_together = (('plantid', 'releasesto', 'pollutant', 'year'))
class Monthp(models.Model):
monthpid = models.IntegerField(unique=True, primary_key=True)
year = models.IntegerField()
month = models.IntegerField()
plantid = models.ForeignKey(Plants, models.DO_NOTHING, db_column='plantid')
power = models.IntegerField(null=False)
class Meta:
managed = False
db_table = 'monthp'
unique_together = ('plantid', 'year', 'month')
I'd like to annotate - based on a foreign key relationship and a fiter a value, particulary - to each plant the amount of co2 and the Sum of its power for a given year. For sake of debugging having replaced Sum by Count using the following query:
annotated = tmp.all().annotate(
energy=Count('monthp__power', filter=Q(monthp__year=YEAR)),
co2=Count('pollutions__amount', filter=Q(pollutions__year=YEAR, pollutions__pollutant="CO2", pollutions__releasesto="Air")))
However this returns too many items (a wrong number using Sum, respectively)
annotated.first().co2 # 60, but it should be 1
annotated.first().energy # 252, but it should be 1
although my database guarantees - as denoted, that (plantid, year, month) and (plantid, releasesto, pollutant, year) are unique together, which can easily be demonstrated:
pl = annotated.first().plantid
testplant = Plants.objects.get(pk=pl) # plant object
pco2 = Pollutions.objects.filter(plantid=testplant, year=YEAR, pollutant="CO2", releasesto="Air")
len(pco2) # 1, as expected
Why does django return to many results and how can I tell django to limit the elements to annotate to the 'current primary key' in other words to only annotate the elements where the foreign key matches the primary key?
I can achieve what I intend to do by using distinct and Max:
energy=Sum('yearly__power', distinct=True, filter=Q(yearly__year=YEAR)),
co2=Max('pollutions__amount', ...
However the performance is inacceptable.
I have tested to use model_to_dict and appending the wanted values "by hand" to the dict, which works for the values itself, but not for sorting the resulted dict (e.g. by energy) and it is acutally faster than the workaround directly above.
It conceptually strikes to me that the manual approach is faster than letting the database do, what it is intended to do.
Is this a feature limitation of django's orm or am I missing something?
EDIT1:
The behaviour is known as bug since 11 years.
Even others "spent a whole day on this".
I am now trying it with subqueries. However the forein key I am using is not a primary key of its table. So the kind of "usual" approach to use "pk=''" does not work. More clearly, trying:
tmp = Plants.objects.filter(somefilter)
subq1 = Subquery(Yearly.objects.filter(pk=OuterRef('plantid'), year=YEAR)) tmp1 = tmp.all().annotate(
energy=Count(Subquery(subq1))
)
returns
OperationalError at /xyz
no such column: U0.yid
Which definitely makes sense because Plants has no clue what a yid is, it only knows plantids. How do I adjust the subquery to that?

How to return all records but exclude the last item

I'm having problem filtering in django-models.
I want to return all records of a particular animal but excluding the last item based on the latest created_at value and sorted in a descending order.
I have this model.
class Heat(models.Model):
# Fields
performer = models.CharField(max_length=25)
is_bred = models.BooleanField(default=False)
note = models.TextField(max_length=250, blank=True, null=True)
result = models.BooleanField(default=False)
# Relationship Fields
animal = models.ForeignKey(Animal, related_name='heats', on_delete=models.CASCADE)
created_at = models.DateTimeField(auto_now_add=True, editable=False)
last_updated = models.DateTimeField(auto_now=True, editable=False)
I was able to achieved the desired result by this raw sql script. But I want a django approach.
SELECT
*
FROM
heat
WHERE
heat.created_at != (SELECT MAX((heat.created_at)) FROM heat)
AND heat.animal_id = '2' ORDER BY heat.created_at DESC;
Please help.

It will be
Heat.objects.order_by("-created_at")[1:]
For a particular animal it will then be:
Heat.objects.filter(animal_id=2).order_by("-created_at")[1:]
where [1:] on a queryset has a regular python slice syntax and generates the correct SQL code. (In this case simply removes the first / most recently created element)
Upd: as #schwobaseggl mentioned, in the comments, slices with negative index don't work on django querysets. Therefore the objects are reverse ordered first.

I just converted your SQL query to Django ORM code.
First, fetch the max created_at value using aggregation and do an exclude.
from django.db.models import Max
heat_objects = Heat.objects.filter(
animal_id=2
).exclude(
created_at=Heat.objects.all().aggregate(Max('created_at'))['created_at__max']
)

Get last record:
obj= Heat.objects.all().order_by('-id')[0]
Make query:
query = Heat.objects.filter(animal_id=2).exclude(id=obj['id']).all()

The query would be :
Heat.objects.all().order_by('id')[1:]
You could also put any filter you require by replacing all()

order_by on Many-to-Many field results in duplicate entries in queryset

I am attempting to perform an order_by based a m2m field, but it ends up creating duplicate entries in my queryset. I have been searching through the django documentation and related questions on stack exchange, but I haven't been able to come up with any solutions.
Models:
class WorkOrder(models.Model):
...
appointment = models.ManyToManyField(Appointment, null=True, blank=True, related_name = 'appointment_from_schedule')
...
class Appointment(models.Model):
title = models.CharField(max_length=1000, blank=True)
allDay = models.BooleanField(default=False)
start = models.DateTimeField()
end = models.DateTimeField(null=True, blank=True)
url = models.URLField(blank=True, null=True)
Query:
qs = WorkOrder.objects.filter(work_order_status="complete").order_by("-appointment__start")
Results:
[<WorkOrder: 45: Davis>, <WorkOrder: 45: Davis>]
In interactive mode:
>>>qs[0] == a[1]
True
>>>qs[0].pk
45
>>>qs[1].pk
45
If I remove the order_by then I get only a single result, but adding it later puts the duplicate entry back in.
>>>qs = WorkOrder.objects.filter(work_order_status="complete")
>>>qs
[<WorkOrder: 45: Davis>]
>>>qs.order_by('appointment__start')
[<WorkOrder: 45: Davis>, <WorkOrder: 45: Davis>]
I have tried adding .distinct() and .distinct('pk'), but the former has no effect and the latter results in an error:
ProgrammingError: SELECT DISTINCT ON expressions must match initial ORDER BY expressions

I took suggestions provided by sfletche about using annotate and discussed the problem in freenode.net irc channel #django.
Users FunkyBob and jtiai were able to help me getting it working.
Since there can be many appointments for each work order, when we ask it to order by appointments, it will return a row for every instance of appointment since it doesn't know which appointment I was intending for it to order by.
from django.db.models import Max
WorkOrder.objects.annotate(max_date=Max('appointment__start')).filter(work_order_status="complete").order_by('max_date')
So, we were on the right path it was just about getting the syntax correct.
Thank you for the help sfletche, FunkyBob and jtiai.

You might try using annotate with values:
qs = WorkOrder.objects.filter(work_order_status="complete").values("appointment").annotate(status="work_order_status").order_by("-appointment__start")

Django queryset - Adding HAVING constraint

I have been using Django for a couple of years now but I am struggling today with adding a HAVING constraint to a GROUP BY.
My queryset is the following:
crm_models.Contact.objects\
.filter(dealercontact__dealer__pk__in=(265,),
dealercontact__activity='gardening',
date_data_collected__gte=datetime.date(2012,10,1),
date_data_collected__lt=datetime.date(2013,10,1))\
.annotate(nb_rels=Count('dealercontact'))
which gives me the following MySQL query:
SELECT *
FROM `contact`
LEFT OUTER JOIN `dealer_contact` ON (`contact`.`id_contact` = `dealer_contact`.`id_contact`)
WHERE (`dealer_contact`.`active` = True
AND `dealer_contact`.`activity` = 'gardening'
AND `contact`.`date_data_collected` >= '2012-10-01'
AND `contact`.`date_data_collected` < '2013-10-01'
AND `dealer_contact`.`id_dealer` IN (265))
GROUP BY `contact`.`id_contact`
ORDER BY NULL;
I would get exactly what I need with this HAVING constraint:
HAVING SUM(IF(`dealer_contact`.`type`='customer', 1, 0)) = 0
How can I get this fixed with a Django Queryset? I need a queryset in this instance.
Here I am using annotate only in order to get the GROUP BY on contact.id_contact.
Edit: My goal is to get the Contacts who have no "customer" relation in dealercontact but have "ref" relation(s) (according to the WHERE clause of course).
Models
class Contact(models.Model):
id_contact = models.AutoField(primary_key=True)
title = models.CharField(max_length=255L, blank=True, choices=choices_custom_sort(TITLE_CHOICES))
last_name = models.CharField(max_length=255L, blank=True)
first_name = models.CharField(max_length=255L, blank=True)
[...]
date_data_collected = models.DateField(null=True, db_index=True)
class Dealer(models.Model):
id_dealer = models.AutoField(primary_key=True)
address1 = models.CharField(max_length=45L, blank=True)
[...]
class DealerContact(Auditable):
id_dealer_contact = models.AutoField(primary_key=True)
contact = models.ForeignKey(Contact, db_column='id_contact')
dealer = models.ForeignKey(Dealer, db_column='id_dealer')
activity = models.CharField(max_length=32, choices=choices_custom_sort(ACTIVITIES), db_index=True)
type = models.CharField(max_length=32, choices=choices_custom_sort(DEALER_CONTACT_TYPE), db_index=True)

I figured this out by adding two binary fields in DealerContact: is_ref and is_customer.
If type='ref' then is_ref=1 and is_customer=0.
Else if type='customer' then is_ref=0 and is_customer=1.
Thus, I am now able to use annotate(nb_customers=Sum('is_customer')) and then use filter(nb_customers=0).
The final queryset consists in:
Contact.objects.filter(dealercontact__dealer__pk__in=(265,),
dealercontact__activity='gardening',
date_data_collected__gte=datetime.date(2012,10,1),
date_data_collected__lt=datetime.date(2013,10,1))\
.annotate(nb_customers=Sum('dealercontact__is_customer'))\
.filter(nb_customers=0)

Actually there is a way you can add your own custom HAVING and GROUP BY clauses if you need.
Just use my example with caution - if Django ORM code/paths will change in future Django versions, you will have to update your code too.
Image you have Book and Edition models, where for each book there can be multiple editions and you want to select first US edition date within Book queryset.
Adding custom HAVING and GROUP BY clauses in Django 1.5+:
from django.db.models import Min
from django.db.models.sql.where import ExtraWhere, AND
qs = Book.objects.all()
# Standard annotate
qs = qs.annotate(first_edition_date=Min("edition__date"))
# Custom HAVING clause, to limit annotation by US country only
qs.query.having.add(ExtraWhere(['"app_edition"."country"=%s'], ["US"]), AND)
# Custom GROUP BY clause will be needed too
qs.query.group_by.append(("app_edition", "country"))
ExtraWhere can contain not just fields, but any raw sql conditions and functions too.

Are you not using raw query just because you want orm object? Using Contact.objects.raw() generate instances similar filter. Refer to https://docs.djangoproject.com/en/dev/topics/db/sql/ for more help.

My goal is to get the Contacts who have no "customer" relation in
dealercontact but have "ref" relation(s) (according to the WHERE
clause of course).
This simple query fulfills this requirement:
Contact.objects.filter(dealercontact__type="ref").exclude(dealercontact__type="customer")
Is this enough, or do you need it to do something more?
UPDATE: if your requirement is
Contacts that have a "ref" relations, but do not have "customer"
relations with the same dealer
you can do this:
from django.db.models import Q
Contact.objects.filter(Q(dealercontact__type="ref") & ~Q(dealercontact__type="customer"))

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

django: aggregation of aggregation with subquery: cannot compute - Avg is an aggregate - django

Related

Django Alternative To Inner Join When Annotating

Django annotation on compoundish primary key with filter ignoring primary key resutling in too many annotated items

How to return all records but exclude the last item

order_by on Many-to-Many field results in duplicate entries in queryset

Django queryset - Adding HAVING constraint

Categories

Resources