Django Alternative To Inner Join When Annotating - django

I'm having some trouble generating an annotation for the following models:
class ResultCode(GenericSteamDataModel):
id = models.IntegerField(db_column='PID')
result_code = models.IntegerField(db_column='resultcode', primary_key=True)
campaign = models.OneToOneField(SteamCampaign, db_column='campagnePID', on_delete=models.CASCADE)
sale = models.BooleanField(db_column='ishit')
factor = models.DecimalField(db_column='factor', max_digits=5, decimal_places=2)
class Meta:
managed = False
constraints = [
models.UniqueConstraint(fields=['result_code', 'campaign'], name='result_code per campaign unique')
]
class CallStatistics(GenericShardedDataModel, GenericSteamDataModel):
objects = CallStatisticsManager()
project = models.OneToOneField(SteamProject, primary_key=True, db_column='projectpid', on_delete=models.CASCADE)
result_code = models.ForeignKey(ResultCode, db_column='resultcode', on_delete=models.CASCADE)
class Meta:
managed = False
The goal is to find the sum of factors based on the result_code field in the ResultCode and CallStatistics model, when sale=True.
Note that:
Result codes are not unique by themselves (described in model). A Project has a relation to a Campaign
The following annotation generates the result that is desired (possible solution):
result = CallStatistics.objects.all().values('project').annotate(
sales_factored=models.Sum(
models.Case(
models.When(
models.Q(sale=True) & models.Q(project__campaign=models.F('result_code__campaign')),
then=models.F('result_code__factor')
)
)
)
)
The problem is that the generated query performs a Inner Join on result_code between the 2 models.
Trying to add another field in the same annotation (that should not be joined with Resultcode), for example:
sales=models.Sum(Cast('sale', models.IntegerField())),
results in a wrong summation.
The Questions is if there is an alternative to the automatic Inner Join that Django generates. So that it is possible to retrieve the following fields (and others similar) in 1 annotation:
...
sales=models.Sum(Cast('sale', models.IntegerField())),
sales_factored= [sum of factores, without Inner Join]
...
Thanks in advance for taking your time for this.

Related

django: aggregation of aggregation with subquery: cannot compute - Avg is an aggregate

I've been scouting & testing for quite some time now and I'm unable to get anything working with MariaDB.
My models:
class SequenceCompletion(SequenceMixin, models.Model):
pass
class Play(SequenceMixin, models.Model):
puzzle = models.ForeignKey(
'puzzles.Puzzle', on_delete=models.CASCADE, related_name='plays')
rating = models.IntegerField(default=-1)
completion = models.ForeignKey(SequenceCompletion, blank=True,
null=True, on_delete=models.SET_NULL, related_name='plays')
I have a subquery:
rating_average_by_puzzle_from_completion_plays = Play.objects.filter(completion_id=OuterRef('id')).values('puzzle_id').order_by('puzzle_id').annotate(puzzle_rating_average=Avg('rating')).values('puzzle_rating_average')
where the ratings of all plays corresponding to a given completion are averaged per puzzle_id
Afterwards, I'm trying, for each completion, to average the puzzle_rating_average corresponding to each puzzle_id:
if I do annotate, I end up with :
SequenceCompletion.objects.annotate(rating_avg_from_completion_plays=Subquery(rating_average_by_puzzle_from_completion_plays.annotate(result=Avg('puzzle_rating_average')).order_by().values('result')))
django.core.exceptions.FieldError: Cannot compute Avg('puzzle_rating_average'): 'puzzle_rating_average' is an aggregate
if I do aggregate, I end up with :
SequenceCompletion.objects.annotate(rating_avg_from_completion_plays=Subquery(rating_average_by_puzzle_from_completion_plays.aggregate(result=Avg('puzzle_rating_average')).order_by().values('result')))
ValueError: This queryset contains a reference to an outer query and may only be used in a subquery.
the same if I add completion_id to the subquery:
rating_average_by_puzzle_from_completion_plays = Play.objects.filter(completion_id=OuterRef('id')).values('puzzle_id').order_by('puzzle_id').annotate(puzzle_rating_average=Avg('rating')).values('puzzle_rating_average','completion_id')
I tried as well:
SequenceCompletion.objects.annotate(rating_avg_from_completion_plays=Avg(Subquery(rating_average_by_puzzle_from_completion_plays)))
and I end up with:
MySQLdb._exceptions.OperationalError: (1242, 'Subquery returns more than 1 row'
Nothing I could find on the documentation or in any post would work, so I'll appreciate any help

Django annotation on compoundish primary key with filter ignoring primary key resutling in too many annotated items

Please see EDIT1 below, as well.
Using Django 3.0.6 and python3.8, given following models
class Plants(models.Model):
plantid = models.TextField(primary_key=True, unique=True)
class Pollutions(models.Model):
pollutionsid = models.IntegerField(unique=True, primary_key=True)
year = models.IntegerField()
plantid = models.ForeignKey(Plants, models.DO_NOTHING, db_column='plantid')
pollutant = models.TextField()
releasesto = models.TextField(blank=True, null=True)
amount = models.FloatField(db_column="amount", blank=True, null=True)
class Meta:
managed = False
db_table = 'pollutions'
unique_together = (('plantid', 'releasesto', 'pollutant', 'year'))
class Monthp(models.Model):
monthpid = models.IntegerField(unique=True, primary_key=True)
year = models.IntegerField()
month = models.IntegerField()
plantid = models.ForeignKey(Plants, models.DO_NOTHING, db_column='plantid')
power = models.IntegerField(null=False)
class Meta:
managed = False
db_table = 'monthp'
unique_together = ('plantid', 'year', 'month')
I'd like to annotate - based on a foreign key relationship and a fiter a value, particulary - to each plant the amount of co2 and the Sum of its power for a given year. For sake of debugging having replaced Sum by Count using the following query:
annotated = tmp.all().annotate(
energy=Count('monthp__power', filter=Q(monthp__year=YEAR)),
co2=Count('pollutions__amount', filter=Q(pollutions__year=YEAR, pollutions__pollutant="CO2", pollutions__releasesto="Air")))
However this returns too many items (a wrong number using Sum, respectively)
annotated.first().co2 # 60, but it should be 1
annotated.first().energy # 252, but it should be 1
although my database guarantees - as denoted, that (plantid, year, month) and (plantid, releasesto, pollutant, year) are unique together, which can easily be demonstrated:
pl = annotated.first().plantid
testplant = Plants.objects.get(pk=pl) # plant object
pco2 = Pollutions.objects.filter(plantid=testplant, year=YEAR, pollutant="CO2", releasesto="Air")
len(pco2) # 1, as expected
Why does django return to many results and how can I tell django to limit the elements to annotate to the 'current primary key' in other words to only annotate the elements where the foreign key matches the primary key?
I can achieve what I intend to do by using distinct and Max:
energy=Sum('yearly__power', distinct=True, filter=Q(yearly__year=YEAR)),
co2=Max('pollutions__amount', ...
However the performance is inacceptable.
I have tested to use model_to_dict and appending the wanted values "by hand" to the dict, which works for the values itself, but not for sorting the resulted dict (e.g. by energy) and it is acutally faster than the workaround directly above.
It conceptually strikes to me that the manual approach is faster than letting the database do, what it is intended to do.
Is this a feature limitation of django's orm or am I missing something?
EDIT1:
The behaviour is known as bug since 11 years.
Even others "spent a whole day on this".
I am now trying it with subqueries. However the forein key I am using is not a primary key of its table. So the kind of "usual" approach to use "pk=''" does not work. More clearly, trying:
tmp = Plants.objects.filter(somefilter)
subq1 = Subquery(Yearly.objects.filter(pk=OuterRef('plantid'), year=YEAR)) tmp1 = tmp.all().annotate(
energy=Count(Subquery(subq1))
)
returns
OperationalError at /xyz
no such column: U0.yid
Which definitely makes sense because Plants has no clue what a yid is, it only knows plantids. How do I adjust the subquery to that?

Annotate QuerySet using raw SQL

In case I am asking the wrong question, let me first state the end goal: I need to allow users to filter a ListView by a field that's not in the primary model (Salesleadquote), but instead the field comes from a model (Salesleadbusinessgroup) with a FK to a related model (Saleslead).
The way I am trying to approach this is by annotating a field on Salesleadquote.
The models:
class Salesleadquote(models.Model):
salesleadquoteid = models.AutoField(db_column='SalesLeadQuoteId', primary_key=True)
salesleadid = models.ForeignKey(Saleslead, models.DO_NOTHING, db_column='SalesLeadId')
...
class Saleslead(models.Model):
salesleadid = models.AutoField(db_column='SalesLeadId', primary_key=True)
...
class Salesleadbusinessgroup(models.Model):
salesleadbusinessgroupid = models.AutoField(db_column='SalesLeadBusinessGroupId', primary_key=True)
salesleadid = models.ForeignKey(Saleslead, models.DO_NOTHING, db_column='SalesLeadId')
businessgroupid = models.ForeignKey(Businessgroup, models.DO_NOTHING, db_column='BusinessGroupId')
The desired result (queryset), in SQL:
SELECT slq.*, slbg.BusinessGroupId FROM crm.SalesLeadQuote slq
LEFT JOIN
(SELECT SalesLeadId, BusinessGroupId
FROM crm.SalesLeadBusinessGroup ) slbg
ON slbg.SalesLeadId = slq.SalesLeadId
WHERE slbg.BusinessGroupId IN (5,21)
I know I can get a RawQuerySet by doing something like
Salesleadquote.objects.raw("SELECT salesleadquote.*, \
salesleadbusinessgroup.businessgroupid \
FROM salesleadquote \
LEFT JOIN salesleadbusinessgroup \
ON salesleadquote.salesleadid = salesleadbusinessgroup.salesleadid \
WHERE salesleadbusinessgroup.businessgroupid IN (5,21)")
But I need the functionality of a QuerySet, so my idea was to annotate the desired field (businessgroupid) in Salesleadquote, but I've been struggling with how to accomplish this.
I implemented a work-around that doesn't address my original question but works for my use case. I created a view at the database level (called SalesLeadQuoteBG) using the SQL I had posted and then tied that to a model to use with Django's ORM.
class Salesleadquotebg(models.Model):
"""
This model represents a database view that extends the Salesleadquote table with a business group id column.
There can be multiple business groups per quote, resulting in duplicate quotes, but this is handled at the view and template layer
via filtering (users are required to select a business group).
"""
salesleadquoteid = models.IntegerField(db_column='SalesLeadQuoteId', primary_key=True) # Field name made lowercase.
salesleadid = models.ForeignKey(Saleslead, models.DO_NOTHING, db_column='SalesLeadId') # Field name made lowercase.
...
businessgroupid = models.ForeignKey(Businessgroup, models.DO_NOTHING, db_column='BusinessGroupId')
I am using django-filters for the filtering.
filters.py:
BG_CHOICES = (
(5, 'Machine Vision'),
(21, 'Process Systems'),
)
class BGFilter(django_filters.FilterSet):
businessgroupid = django_filters.ChoiceFilter(choices=BG_CHOICES)
class Meta:
model = Salesleadquotebg
fields = ['businessgroupid', ]
This can be done with pure Django ORM. Relationships can be followed backwards (filtering on salesleadbusinessgroup) and the double underscore syntax can be used to query attributes of the related model and also follow more relationships
Salesleadquote.objects.filter(
salesleadbusinessgroup__businessgroupid__in=(5,21)
)

Django FULL OUTER JOIN

I have these three tables
class IdentificationAddress(models.Model):
id_ident_address = models.AutoField(primary_key=True)
ident = models.ForeignKey('Ident', models.DO_NOTHING, db_column='ident')
address = models.TextField()
time = models.DateTimeField()
class Meta:
managed = False
db_table = 'identification_address'
class IdentC(models.Model):
id_ident = models.AutoField(primary_key=True)
ident = models.TextField(unique=True)
name = models.TextField()
class Meta:
managed = False
db_table = 'ident_c'
class location(models.Model):
id_ident_loc = models.AutoField(primary_key=True)
ident = models.ForeignKey('IdentC', models.DO_NOTHING, db_column='ident')
loc_name = models.TextField()
class Meta:
managed = False
db_table = 'location
I want to get the last
address field (It could be zero) from IdentificationAddress model, the last _loc_name_ field (it matches at least one) from location model, name field (Only one) from IdentC model and ident field. The search is base on ident field.
I have been reading about many_to_many relationships and prefetch_related. But, they don't seem to be the best way to get these information.
If a use SQL syntax, this instruction does the job:
SELECT ident_c.name, ident_c.ident, identification_address.address, location.loc_name FROM identn_c FULL OUTER JOIN location ON ident_c.ident=location.ident FULL OUTER JOIN identification_address ON ident_c.ident=identification_address.ident;
or for this case
SELECT ident_c.name, ident_c.ident, identification_address.address, location.loc_name FROM identn_c LEFT JOIN location ON ident_c.ident=location.ident LEFT JOIN identification_address ON ident_c.ident=identification_address.ident;
Based on my little understanding of Django, JOIN instructions cannot be implemented. Hope I am wrong.
Django ORM take care of it if you set relationship between models.
for example,
models.py
class Aexample(models.Model):
name = models.CharField(max_length=20)
class Bexample(models.Model):
name = models.CharField(max_length=20)
fkexample = models.ForeignKey(Aexample)
shell
examplequery = Bexample.objects.filter(fkexample__name="hellothere")
SQL query
SELECT
"yourtable_bexample"."id",
"yourtable_bexample"."name",
"yourtable_bexample"."fkexample_id"
FROM "yourtable_bexample"
INNER JOIN "yourtable_aexample"
ON ("yourtable_bexample"."fkexample_id" = "yourtable_aexample"."id")
WHERE "yourtable_aexample"."name" = hellothere
you want to make query in Django like below
SELECT ident_c.name, ident_c.ident, identification_address.address, location.loc_name
FROM identn_c
LEFT JOIN location ON ident_c.ident=location.ident
LEFT JOIN identification_address ON ident_c.ident=identification_address.ident;
It means you want all rows from identn_c, right?. If you make proper relationship between your tables for your purpose, Django ORM takes care of it.
class IntentC(model.Model):
exampleA = models.ForeignKey(ExampleA)
exampleB = models.ForeignKey(ExampleB)
this command make query with JOIN Clause.
identn_instance = IdentC.objects.get(id=somenumber)
identn_instance.exampleA
identn_instance.exampleB
you can show every IntentC rows and relating rows in different tables.
for in in IntentC.objects.all(): #you can all rows in IntentC
print(in.exampleA.name)
#show name column in exampleA table
#JOIN ... ON intenctctable.example_id = exampleatable.id
print(in.exampleB.name) #show name column in exampleB table / JOIN ... ON

django querset filter foreign key select first record

I have a History model like below
class History(models.Model):
class Meta:
app_label = 'subscription'
ordering = ['-start_datetime']
subscription = models.ForeignKey(Subscription, related_name='history')
FREE = 'free'
Premium = 'premium'
SUBSCRIPTION_TYPE_CHOICES = ((FREE, 'Free'), (Premium, 'Premium'),)
name = models.CharField(max_length=32, choices=SUBSCRIPTION_TYPE_CHOICES, default=FREE)
start_datetime = models.DateTimeField(db_index=True)
end_datetime = models.DateTimeField(db_index=True, blank=True, null=True)
cancelled_datetime = models.DateTimeField(blank=True, null=True)
Now i have a queryset filtering like below
users = get_user_model().objects.all()
queryset = users.exclude(subscription__history__end_datetime__lt=timezone.now())
The issue is that in the exclude above it is checking end_datetime for all the rows for a particular history object. But i only want to compare it with first row of history object.
Below is how a particular history object looks like. So i want to write a queryset filter which can do datetime comparison on first row only.
You could use a Model Manager method for this. The documentation isn't all that descriptive, but you could do something along the lines of:
class SubscriptionManager(models.Manager):
def my_filter(self):
# You'd want to make this a smaller query most likely
subscriptions = Subscription.objects.all()
results = []
for subscription in subscriptions:
sub_history = subscription.history_set.first()
if sub_history.end_datetime > timezone.now:
results.append(subscription)
return results
class History(models.Model):
subscription = models.ForeignKey(Subscription)
end_datetime = models.DateTimeField(db_index=True, blank=True, null=True)
objects = SubscriptionManager()
Then: queryset = Subscription.objects().my_filter()
Not a copy-pastable answer, but shows the use of Managers. Given the specificity of what you're looking for, I don't think there's a way to get it just via the plain filter() and exclude().
Without knowing what your end goal here is, it's hard to say whether this is feasible, but have you considered adding a property to the subscription model that indicates whatever you're looking for? For example, if you're trying to get everyone who has a subscription that's ending:
class Subscription(models.Model):
#property
def ending(self):
if self.end_datetime > timezone.now:
return True
else:
return False
Then in your code: queryset = users.filter(subscription_ending=True)
I have tried django's all king of expressions(aggregate, query, conditional) but was unable to solve the problem so i went with RawSQL and it solved the problem.
I have used the below SQL to select the first row and then compare the end_datetime
SELECT (end_datetime > %s OR end_datetime IS NULL) AS result
FROM subscription_history
ORDER BY start_datetime DESC
LIMIT 1;
I will select my answer as accepted if not found a solution with queryset filter chaining in next 2 days.