adding .annotate foreign key in objects does not give distinct result - django

When i run following query its working fine
query= UserOrder.objects.filter(order_date__range=(
Request['from'], Request['to'])).values('user_id__username','user_id__email').annotate(
total_no_of_order=Count("pk")).annotate(total=Sum('total_cost'))
But i also want to know the number of services related to order
.annotate(total_no_of_services=Count("userorderservice__pk"))
by adding above code in query total_no_of_order and total_cost getting changed too and by putting distinct=True in total_cost its only considering for single unique total_cost value
Modles.py
class UserOrder(models.Model):
order_id = models.AutoField(primary_key=True)
user_id = models.ForeignKey(AppUser, on_delete=models.CASCADE)
total_cost = models.FloatField()
...
class UserOrderService(models.Model):
order_id = models.ForeignKey(UserOrder, on_delete=models.CASCADE)
...

A UserOrder can have many UserOrderService instances.
Therefore, the correct way to COUNT is:
.annotate(total_no_of_services=Count("userorderservice_set"))
Please have a look at reverse relations

Related

Annotate based on related field, with filters

class IncomeStream(models.Model):
product = models.ForeignKey(Product, related_name="income_streams")
from_date = models.DateTimeField(blank=True, null=True)
to_date = models.DateTimeField(blank=True, null=True)
value = MoneyField(max_digits=14, decimal_places=2, default_currency='USD')
class Product(models.Model):
...
class Sale(models.Model):
product = models.ForeignKey(Product, related_name="sales")
created_at = models.DateTimeField(auto_now_add=True)
...
With the above model, suppose I want to add a value to some Sales using .annotate.
This value is called cpa (cost per action): cpa is the value of the IncomeStream whose from_date and to_date include the Sale created_at in their range.
Furthermore, from_date and to_date are both nullable, in which case we assume they mean infinity.
For example:
<IncomeStream: from 2021-10-10 to NULL, value 10$, product TEST>
<IncomeStream: from NULL to 2021-10-09, value 5$, product TEST>
<Sale: product TEST, created_at 2019-01-01, [cpa should be 5$]>
<Sale: product TEST, created_at 2021-11-01, [cpa should be 10$]>
My question is: is it possible to write all these conditions using only the Django ORM and annotate? If yes, how?
I know F objects can traverse relationships like this:
Sale.objects.annotate(cpa=F('product__income_streams__value'))
But then where exactly can I write all the logic to determine which specific income_stream it should pick the value from?
Please suppose no income stream have overlapping dates for the same product, so the above mentioned specs never result in conflicts.
Something like this should get you started
subquery = (
IncomeStream
.objects
.values('product') # group by product primary key i.e. product_id
.filter(product=OuterRef('product'))
.filter(from_date__gte=OuterRef('created_at'))
.filter(to_date__lte=OuterRef('created_at'))
.annotate(total_value=Sum('value'))
)
Then with the subquery
Sale
.objects
.annotate(
cpa=Subquery(
subquery.values('total_value')
) # subquery should return only one row so
# so just need the total_value column
)
Without the opportunity to play around with this in the shell myself I not 100%. It should be close though anyway.

Django annotation on compoundish primary key with filter ignoring primary key resutling in too many annotated items

Please see EDIT1 below, as well.
Using Django 3.0.6 and python3.8, given following models
class Plants(models.Model):
plantid = models.TextField(primary_key=True, unique=True)
class Pollutions(models.Model):
pollutionsid = models.IntegerField(unique=True, primary_key=True)
year = models.IntegerField()
plantid = models.ForeignKey(Plants, models.DO_NOTHING, db_column='plantid')
pollutant = models.TextField()
releasesto = models.TextField(blank=True, null=True)
amount = models.FloatField(db_column="amount", blank=True, null=True)
class Meta:
managed = False
db_table = 'pollutions'
unique_together = (('plantid', 'releasesto', 'pollutant', 'year'))
class Monthp(models.Model):
monthpid = models.IntegerField(unique=True, primary_key=True)
year = models.IntegerField()
month = models.IntegerField()
plantid = models.ForeignKey(Plants, models.DO_NOTHING, db_column='plantid')
power = models.IntegerField(null=False)
class Meta:
managed = False
db_table = 'monthp'
unique_together = ('plantid', 'year', 'month')
I'd like to annotate - based on a foreign key relationship and a fiter a value, particulary - to each plant the amount of co2 and the Sum of its power for a given year. For sake of debugging having replaced Sum by Count using the following query:
annotated = tmp.all().annotate(
energy=Count('monthp__power', filter=Q(monthp__year=YEAR)),
co2=Count('pollutions__amount', filter=Q(pollutions__year=YEAR, pollutions__pollutant="CO2", pollutions__releasesto="Air")))
However this returns too many items (a wrong number using Sum, respectively)
annotated.first().co2 # 60, but it should be 1
annotated.first().energy # 252, but it should be 1
although my database guarantees - as denoted, that (plantid, year, month) and (plantid, releasesto, pollutant, year) are unique together, which can easily be demonstrated:
pl = annotated.first().plantid
testplant = Plants.objects.get(pk=pl) # plant object
pco2 = Pollutions.objects.filter(plantid=testplant, year=YEAR, pollutant="CO2", releasesto="Air")
len(pco2) # 1, as expected
Why does django return to many results and how can I tell django to limit the elements to annotate to the 'current primary key' in other words to only annotate the elements where the foreign key matches the primary key?
I can achieve what I intend to do by using distinct and Max:
energy=Sum('yearly__power', distinct=True, filter=Q(yearly__year=YEAR)),
co2=Max('pollutions__amount', ...
However the performance is inacceptable.
I have tested to use model_to_dict and appending the wanted values "by hand" to the dict, which works for the values itself, but not for sorting the resulted dict (e.g. by energy) and it is acutally faster than the workaround directly above.
It conceptually strikes to me that the manual approach is faster than letting the database do, what it is intended to do.
Is this a feature limitation of django's orm or am I missing something?
EDIT1:
The behaviour is known as bug since 11 years.
Even others "spent a whole day on this".
I am now trying it with subqueries. However the forein key I am using is not a primary key of its table. So the kind of "usual" approach to use "pk=''" does not work. More clearly, trying:
tmp = Plants.objects.filter(somefilter)
subq1 = Subquery(Yearly.objects.filter(pk=OuterRef('plantid'), year=YEAR)) tmp1 = tmp.all().annotate(
energy=Count(Subquery(subq1))
)
returns
OperationalError at /xyz
no such column: U0.yid
Which definitely makes sense because Plants has no clue what a yid is, it only knows plantids. How do I adjust the subquery to that?

Using Multiple Annotations in Django

Here is my data base:
class User(models.Model):
Name = models.TextField(null=True, blank=True, default=None)
class Salary(models.Model):
value = models.PositiveIntegerField(default=1)
name = models.ForeignKey(User, related_name='salarys', on_delete=models.CASCADE)
class Expense(models.Model):
value = models.PositiveIntegerField(default=1)
name = models.ForeignKey(User, related_name='expenses', on_delete=models.CASCADE)
I want to add all the salary and expenses of a user.
queryset = User.objects.annotate(total_salary=Sum('salarys__value', distinct=True) ,total_expense=Sum('expenses__value', distinct=True))
Here is my data:
User table
id=1; name= ram
Salary table
id =1; value = 12000
id = 2; value = 8000
Expense table
id =1; value=5000
id=2; value = 3000
expected output : total_Salary = 20000; total_expense=8000 output
obtained : total_salary= 40000; total_expense = 16000
Every output is multiplied the number of times the data in another table. Can anyone help me through this
looks like it is not possible if you are trying to combine like aggregates. Only works for Count with distinct. For closure, please refer to docs: https://docs.djangoproject.com/en/4.0/topics/db/aggregation/#combining-multiple-aggregations
Since annotate returns a queryset instead of a dict(in case of aggregate), you can try chaining multiple like values().annotate("field1").annotate("field2"). Worked for me, though please check sql from your queryset to debug in case you cannot run your query as desired.
Can you try this ?
queryset = User.objects.all().annotate(total_salary=Sum('salarys__value') ,total_expense=Sum('expenses__value'))

How to return all records but exclude the last item

I'm having problem filtering in django-models.
I want to return all records of a particular animal but excluding the last item based on the latest created_at value and sorted in a descending order.
I have this model.
class Heat(models.Model):
# Fields
performer = models.CharField(max_length=25)
is_bred = models.BooleanField(default=False)
note = models.TextField(max_length=250, blank=True, null=True)
result = models.BooleanField(default=False)
# Relationship Fields
animal = models.ForeignKey(Animal, related_name='heats', on_delete=models.CASCADE)
created_at = models.DateTimeField(auto_now_add=True, editable=False)
last_updated = models.DateTimeField(auto_now=True, editable=False)
I was able to achieved the desired result by this raw sql script. But I want a django approach.
SELECT
*
FROM
heat
WHERE
heat.created_at != (SELECT MAX((heat.created_at)) FROM heat)
AND heat.animal_id = '2' ORDER BY heat.created_at DESC;
Please help.
It will be
Heat.objects.order_by("-created_at")[1:]
For a particular animal it will then be:
Heat.objects.filter(animal_id=2).order_by("-created_at")[1:]
where [1:] on a queryset has a regular python slice syntax and generates the correct SQL code. (In this case simply removes the first / most recently created element)
Upd: as #schwobaseggl mentioned, in the comments, slices with negative index don't work on django querysets. Therefore the objects are reverse ordered first.
I just converted your SQL query to Django ORM code.
First, fetch the max created_at value using aggregation and do an exclude.
from django.db.models import Max
heat_objects = Heat.objects.filter(
animal_id=2
).exclude(
created_at=Heat.objects.all().aggregate(Max('created_at'))['created_at__max']
)
Get last record:
obj= Heat.objects.all().order_by('-id')[0]
Make query:
query = Heat.objects.filter(animal_id=2).exclude(id=obj['id']).all()
The query would be :
Heat.objects.all().order_by('id')[1:]
You could also put any filter you require by replacing all()

django querset filter foreign key select first record

I have a History model like below
class History(models.Model):
class Meta:
app_label = 'subscription'
ordering = ['-start_datetime']
subscription = models.ForeignKey(Subscription, related_name='history')
FREE = 'free'
Premium = 'premium'
SUBSCRIPTION_TYPE_CHOICES = ((FREE, 'Free'), (Premium, 'Premium'),)
name = models.CharField(max_length=32, choices=SUBSCRIPTION_TYPE_CHOICES, default=FREE)
start_datetime = models.DateTimeField(db_index=True)
end_datetime = models.DateTimeField(db_index=True, blank=True, null=True)
cancelled_datetime = models.DateTimeField(blank=True, null=True)
Now i have a queryset filtering like below
users = get_user_model().objects.all()
queryset = users.exclude(subscription__history__end_datetime__lt=timezone.now())
The issue is that in the exclude above it is checking end_datetime for all the rows for a particular history object. But i only want to compare it with first row of history object.
Below is how a particular history object looks like. So i want to write a queryset filter which can do datetime comparison on first row only.
You could use a Model Manager method for this. The documentation isn't all that descriptive, but you could do something along the lines of:
class SubscriptionManager(models.Manager):
def my_filter(self):
# You'd want to make this a smaller query most likely
subscriptions = Subscription.objects.all()
results = []
for subscription in subscriptions:
sub_history = subscription.history_set.first()
if sub_history.end_datetime > timezone.now:
results.append(subscription)
return results
class History(models.Model):
subscription = models.ForeignKey(Subscription)
end_datetime = models.DateTimeField(db_index=True, blank=True, null=True)
objects = SubscriptionManager()
Then: queryset = Subscription.objects().my_filter()
Not a copy-pastable answer, but shows the use of Managers. Given the specificity of what you're looking for, I don't think there's a way to get it just via the plain filter() and exclude().
Without knowing what your end goal here is, it's hard to say whether this is feasible, but have you considered adding a property to the subscription model that indicates whatever you're looking for? For example, if you're trying to get everyone who has a subscription that's ending:
class Subscription(models.Model):
#property
def ending(self):
if self.end_datetime > timezone.now:
return True
else:
return False
Then in your code: queryset = users.filter(subscription_ending=True)
I have tried django's all king of expressions(aggregate, query, conditional) but was unable to solve the problem so i went with RawSQL and it solved the problem.
I have used the below SQL to select the first row and then compare the end_datetime
SELECT (end_datetime > %s OR end_datetime IS NULL) AS result
FROM subscription_history
ORDER BY start_datetime DESC
LIMIT 1;
I will select my answer as accepted if not found a solution with queryset filter chaining in next 2 days.