Annotate number of identical items in manager - django

Let's say I have following simplified model:
class CurrentInvoices(models.Manager):
    def get_queryset(self):
        qs = super(CurrentInvoices, self).get_queryset()
        current_invoices = qs.order_by('person', '-created_on').distinct('person').values('pk')
        return qs.annotate(invoice_count=models.Count('number')).filter(id__in=current_invoices).order_by('person__last_name')

class Invoice(models.Model):
    created_on = models.DateField()
    person = models.ForeignKey(Person)
    total_amount = models.DecimalField()
    number = models.PositiveSmallIntegerField()
    objects = models.Manager()
    current_invoices = CurrentInvoices()
A Person can have an Invoice with the same number if for some reason the previously generated invoice was wrong. The latest one (highest created_on) is the one that counts.
The trick with .filter(id__in) in the manager is needed to get the results ordered by the person's last name; this cannot be removed.
Now I'd like to annotate each current invoice with the total count of invoices that share its number.
My attempt, annotate(invoice_count=models.Count('number')), always returns 1 even though there are multiple.
What am I doing wrong? Any pointers on how to properly achieve this without hacking around too much and without hitting the DB for every invoice?

It seems your problem is distinct('person'), which removes duplicates by the person field.
Update
To complete your task, you should select and filter on number instead of pk:
current_invoices = qs.order_by('person', '-created_on').distinct('person').values('number')
return qs.annotate(invoice_count=models.Count('number')).filter(number__in=current_invoices).order_by('person__last_name')
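As a rough illustration of the intended result, the plain-Python sketch below picks the latest invoice per person and attaches the number of invoices that share its number. It is not ORM code: the function name and the tuple layout are hypothetical, and it assumes the count is wanted per person and number. It also hints at why Count('number') on the Invoice queryset yields 1: the annotation is grouped per invoice row, so each row only counts its own number.

```python
from collections import Counter
from datetime import date


def current_invoices_with_counts(invoices):
    """Map each person to (number, count) for their latest invoice.

    `invoices` is an iterable of (person, number, created_on) tuples;
    this layout is an assumption made for the sketch.
    """
    invoices = list(invoices)  # allow generators; we iterate twice
    # How many invoices exist per (person, number) pair.
    counts = Counter((person, number) for person, number, _ in invoices)
    # Latest invoice per person (highest created_on wins).
    latest = {}
    for person, number, created_on in invoices:
        if person not in latest or created_on > latest[person][1]:
            latest[person] = (number, created_on)
    return {
        person: (number, counts[(person, number)])
        for person, (number, _) in latest.items()
    }
```

Grouping by (person, number) rather than by invoice row is the part the per-row annotation cannot express on its own.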

Related

Django annotation on compoundish primary key with filter ignoring primary key resulting in too many annotated items

Please see EDIT1 below, as well.
Using Django 3.0.6 and Python 3.8, given the following models:
class Plants(models.Model):
    plantid = models.TextField(primary_key=True, unique=True)

class Pollutions(models.Model):
    pollutionsid = models.IntegerField(unique=True, primary_key=True)
    year = models.IntegerField()
    plantid = models.ForeignKey(Plants, models.DO_NOTHING, db_column='plantid')
    pollutant = models.TextField()
    releasesto = models.TextField(blank=True, null=True)
    amount = models.FloatField(db_column="amount", blank=True, null=True)

    class Meta:
        managed = False
        db_table = 'pollutions'
        unique_together = (('plantid', 'releasesto', 'pollutant', 'year'))

class Monthp(models.Model):
    monthpid = models.IntegerField(unique=True, primary_key=True)
    year = models.IntegerField()
    month = models.IntegerField()
    plantid = models.ForeignKey(Plants, models.DO_NOTHING, db_column='plantid')
    power = models.IntegerField(null=False)

    class Meta:
        managed = False
        db_table = 'monthp'
        unique_together = ('plantid', 'year', 'month')
Based on a foreign key relationship and a filter, I'd like to annotate each plant with its amount of CO2 and the sum of its power for a given year. For the sake of debugging, I replaced Sum with Count in the following query:
annotated = tmp.all().annotate(
    energy=Count('monthp__power', filter=Q(monthp__year=YEAR)),
    co2=Count('pollutions__amount', filter=Q(pollutions__year=YEAR, pollutions__pollutant="CO2", pollutions__releasesto="Air")))
However, this returns too many items (or a wrong number when using Sum):
annotated.first().co2 # 60, but it should be 1
annotated.first().energy # 252, but it should be 1
although my database guarantees, as declared in the models, that (plantid, year, month) and (plantid, releasesto, pollutant, year) are unique together, which can easily be demonstrated:
pl = annotated.first().plantid
testplant = Plants.objects.get(pk=pl) # plant object
pco2 = Pollutions.objects.filter(plantid=testplant, year=YEAR, pollutant="CO2", releasesto="Air")
len(pco2) # 1, as expected
Why does Django return too many results, and how can I tell Django to limit the annotated elements to the 'current primary key', in other words to annotate only the elements whose foreign key matches the primary key?
I can achieve what I intend to do by using distinct and Max:
energy=Sum('yearly__power', distinct=True, filter=Q(yearly__year=YEAR)),
co2=Max('pollutions__amount', ...
However, the performance is unacceptable.
I have tested using model_to_dict and appending the wanted values "by hand" to the dict. This works for the values themselves, but not for sorting the resulting dict (e.g. by energy), and it is actually faster than the workaround directly above.
It strikes me as conceptually odd that the manual approach is faster than letting the database do what it is intended to do.
Is this a feature limitation of Django's ORM or am I missing something?
EDIT1:
The behaviour has been a known bug for 11 years.
Even others "spent a whole day on this".
I am now trying it with subqueries. However, the foreign key I am using is not a primary key of its table, so the "usual" approach of using "pk=''" does not work. More clearly, trying:
tmp = Plants.objects.filter(somefilter)
subq1 = Subquery(Yearly.objects.filter(pk=OuterRef('plantid'), year=YEAR))
tmp1 = tmp.all().annotate(
    energy=Count(Subquery(subq1))
)
returns
OperationalError at /xyz
no such column: U0.yid
Which definitely makes sense because Plants has no clue what a yid is, it only knows plantids. How do I adjust the subquery to that?
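To illustrate why the counts multiply, here is a small sketch using Python's sqlite3 module rather than the ORM. The table layout loosely mirrors the models above, but the data (one plant, 12 monthly rows, 3 pollutants), the YEAR value and the function names are invented for the demo. Joining both relations in one query pairs every monthp row with every pollutions row, whereas correlated subqueries count each relation on its own.

```python
import sqlite3

YEAR = 2019  # hypothetical year used only for this demo


def build_demo_db():
    """In-memory database: one plant, 12 monthly rows, 3 pollutant rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE plants (plantid TEXT PRIMARY KEY);
        CREATE TABLE monthp (
            monthpid INTEGER PRIMARY KEY, plantid TEXT, year INTEGER,
            month INTEGER, power INTEGER);
        CREATE TABLE pollutions (
            pollutionsid INTEGER PRIMARY KEY, plantid TEXT, year INTEGER,
            pollutant TEXT, releasesto TEXT, amount REAL);
        INSERT INTO plants VALUES ('p1');
        """
    )
    conn.executemany(
        "INSERT INTO monthp (plantid, year, month, power) VALUES ('p1', ?, ?, 100)",
        [(YEAR, month) for month in range(1, 13)],
    )
    conn.executemany(
        "INSERT INTO pollutions (plantid, year, pollutant, releasesto, amount) "
        "VALUES ('p1', ?, ?, 'Air', 1.5)",
        [(YEAR, pollutant) for pollutant in ("CO2", "NOx", "SO2")],
    )
    return conn


def counts_via_joins(conn):
    """Both relations joined at once: 12 x 3 = 36 joined rows per plant."""
    return conn.execute(
        """
        SELECT COUNT(CASE WHEN m.year = ? THEN m.power END),
               COUNT(CASE WHEN po.year = ? AND po.pollutant = 'CO2'
                           AND po.releasesto = 'Air' THEN po.amount END)
        FROM plants p
        LEFT JOIN monthp m ON m.plantid = p.plantid
        LEFT JOIN pollutions po ON po.plantid = p.plantid
        GROUP BY p.plantid
        """,
        (YEAR, YEAR),
    ).fetchone()


def counts_via_subqueries(conn):
    """Each relation counted in its own correlated subquery."""
    return conn.execute(
        """
        SELECT (SELECT COUNT(m.power) FROM monthp m
                WHERE m.plantid = p.plantid AND m.year = ?),
               (SELECT COUNT(po.amount) FROM pollutions po
                WHERE po.plantid = p.plantid AND po.year = ?
                  AND po.pollutant = 'CO2' AND po.releasesto = 'Air')
        FROM plants p
        """,
        (YEAR, YEAR),
    ).fetchone()
```

In ORM terms, the correlated form filters the related model on its foreign key, for example plantid=OuterRef('pk'), rather than pk=OuterRef('plantid'). Treat that as a direction to try, not as a verified fix for the models above.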

Django-orm Queryset for Find object by count a particular field

Let's say I have two models:
class Testmodel1():
    amount = models.IntegerField(null=True)
    contact = models.ForeignKey(Testmodel2)
    entry_time = models.DateTimeField()
    stage = choicesfield

class Testmodel2():
    name = models.CharField()
    mobile_no = models.CharField()
I want to find the Testmodel1 objects with a contact count greater than 3 that were created in the last 24 hours, where last = arrow.utcnow().shift(hours=-24).date().
I am applying this query:
n1=Testmodel1.objects.filter(entry_time__gte=last, stage=1).annotate(t_count=Count('contact')).filter(t_count__gt=3)
But it does not seem to work, because I am getting an empty queryset.
Any help would be appreciated.
Only a partial answer. Sorry! Your code looks fine to me, so I'm just trying to find a solution by approaching it from a different direction.
Here's how I structure (sort of) similar code on one of my projects.
from datetime import timedelta, date
....
base_date = date.today()
start_date = base_date + timedelta(days=30)
end_date = base_date
possible_holidays = Holiday.objects.filter(
    start_date__lte=start_date, end_date__gte=end_date)
From there, could you just do something like:
if possible_holidays.contact_set.count() > 3:
    pass
Does that work?
The problem is that your many-to-one relationship is inverted. This is a parent-child relationship, where a parent can have multiple children, but a child can only have one parent. In the database this relationship is stored as a ForeignKey field on the child that points to the child's parent.
In your case Testmodel1 is the parent and Testmodel2 is the child (Testmodel1 can have multiple contacts represented by Testmodel2). This means that the ForeignKey field should belong to Testmodel2, not Testmodel1.
class Testmodel1():
    amount = models.IntegerField(null=True)
    entry_time = models.DateTimeField()
    stage = choicesfield

class Testmodel2():
    name = models.CharField()
    mobile_no = models.CharField()
    parent = models.ForeignKey(Testmodel1,
        related_name='contacts',
    )
With this model structure you can reference Testmodel1's contacts as testmodel1.contacts.all(). Your query then should look like this:
n1 = (Testmodel1.objects
    .filter(entry_time__gte=last, stage=1)
    .annotate(t_count=Count('contacts'))
    .filter(t_count__gt=3)
)
docs reference
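For the original model layout (ForeignKey on Testmodel1), an alternative reading of the question is "which contacts have more than 3 recent entries". The plain-Python sketch below shows that per-contact grouping; the function name and tuple layout are hypothetical. In the ORM, grouping of this kind is usually obtained by calling .values('contact') before .annotate(...), since annotating each Testmodel1 row directly counts its single contact once.

```python
from collections import Counter
from datetime import datetime, timedelta


def contacts_with_many_entries(entries, since, stage=1, minimum=3):
    """Return ids of contacts with more than `minimum` matching entries.

    `entries` is an iterable of (contact_id, entry_time, stage) tuples;
    this layout is an assumption made for the sketch.
    """
    counts = Counter(
        contact_id
        for contact_id, entry_time, entry_stage in entries
        if entry_time >= since and entry_stage == stage
    )
    return {contact_id for contact_id, n in counts.items() if n > minimum}
```

Counting per contact instead of per entry row is the step that makes a threshold like "more than 3" meaningful.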

Django, should I be paranoid to use the ORM's `first()` method for certain scenarios?

I have the following models:
class Merchant(Model):
    name = CharField()

class OrderItem(Model):
    merchant = ForeignKey(Merchant)
    freight_item = ForeignKey(FreightItem)

class FreightItem(Model):
    amount = DecimalField()
Let's say it's a business rule that all order items that point to a freight item belong to the same merchant; in other words, a merchant will only generate one freight item.
In my existing FreightItem class, to get the merchant, I am using a property that returns the first order item and accesses its merchant, like this:
class FreightItem(Model):
    amount = DecimalField()

    @property
    def merchant(self):
        return self.orderitem_set.first().merchant
When I read this code, something in my mind is yelling that it's not right. But the other option is to add a merchant field to the FreightItem model:
class FreightItem(Model):
    merchant = ForeignKey(Merchant)
    amount = DecimalField()
But this seems redundant (denormalized) in the table.
Which way would you guys prefer?
The .first() call will return None if you have a FreightItem that has never been part of an OrderItem, so accessing .merchant on the result will raise an AttributeError.
Why not do something like:
class OrderItem(Model):
    freight_item = ForeignKey(FreightItem)

    @property
    def merchant(self):
        return self.freight_item.merchant

class FreightItem(Model):
    amount = DecimalField()
    merchant = ForeignKey(Merchant)
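If the property-based version is kept anyway, the None case has to be handled explicitly. The sketch below uses plain Python stand-ins instead of Django models to show the guard; the OrderItem dataclass, first() helper and merchant_of() function are hypothetical and only mimic the behaviour of QuerySet.first(), which returns None when there is no match.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class OrderItem:
    """Stand-in for the Django model; only the merchant matters here."""
    merchant: str


def first(items: Sequence[OrderItem]) -> Optional[OrderItem]:
    """Mimic QuerySet.first(): the first item, or None if there is none."""
    return items[0] if items else None


def merchant_of(order_items: Sequence[OrderItem]) -> Optional[str]:
    """Return the merchant of the first order item, or None if there is none."""
    item = first(order_items)
    # Guard against the empty case instead of letting .merchant raise.
    return item.merchant if item is not None else None
```

Whether returning None is acceptable, or whether a missing order item should be treated as an error, depends on the business rule.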

Count number of distinct records by date in Django

Following up on this question:
Count number of records by date in Django
class Review(models.Model):
    venue = models.ForeignKey(Venue, db_index=True)
    review = models.TextField()
    datetime_visited = models.DateTimeField(default=datetime.now)
It is true that the following query solves the problem of counting the number of records by date:
(Review.objects.filter()
    .extra({'date_visited': "date(datetime_visited)"})
    .values('date_visited')
    .annotate(visited_count=Count('id')))
However, say I would like to have a distinct count, that is, I would like to avoid Review objects from the same id on the same day, what can I do?
I tried:
(Review.objects.filter()
    .extra({'date_visited': "date(datetime_visited)"})
    .values('date_visited', 'id')
    .distinct()
    .annotate(Count('id')))
but it does not seem to work.
Your problem is that you're including id in your values(), which is making all records unique, defeating distinct(). Try this instead:
(Review.objects.filter()
    .extra({'date_visited': "date(datetime_visited)"})
    .values('date_visited')
    .distinct()
    .annotate(Count('date_visited')))
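The question leaves open what "distinct" should be measured on. Assuming it means distinct venues per day, the SQL-level idea is COUNT(DISTINCT ...) grouped by date, sketched here with Python's sqlite3 module. The table layout and function name are made up for the demo. In the ORM, Count accepts a distinct=True argument for the same purpose.

```python
import sqlite3


def distinct_venues_per_day(rows):
    """Count distinct venues per visit date.

    `rows` is an iterable of (venue_id, datetime_visited) tuples, with the
    timestamp as an ISO-formatted string; this layout is an assumption.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE review ("
        "id INTEGER PRIMARY KEY, venue_id INTEGER, datetime_visited TEXT)"
    )
    conn.executemany(
        "INSERT INTO review (venue_id, datetime_visited) VALUES (?, ?)", rows
    )
    result = conn.execute(
        "SELECT date(datetime_visited), COUNT(DISTINCT venue_id) "
        "FROM review "
        "GROUP BY date(datetime_visited) "
        "ORDER BY date(datetime_visited)"
    ).fetchall()
    conn.close()
    return result
```

Two reviews of the same venue on the same day are counted once, while the same venue on another day is counted again.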

Sorting products after dateinterval and weight

What I want is to be able to get this week's/this month's/this year's etc. hottest products. So I have a model named ProductStatistic that logs each hit and each purchase on a day-to-day basis. These are the models I have to work with:
class Product(models.Model):
    name = models.CharField(_("Name"), max_length=200)
    slug = models.SlugField()
    description = models.TextField(_("Description"))
    picture = models.ImageField(upload_to=product_upload_path, blank=True)
    category = models.ForeignKey(ProductCategory)
    prices = models.ManyToManyField(Store, through='Pricing')
    objects = ProductManager()

    class Meta:
        ordering = ('name', )

    def __unicode__(self):
        return self.name

class ProductStatistic(models.Model):
    # There is only 1 `date` each day. `date` is
    # set by datetime.today().date()
    date = models.DateTimeField(default=datetime.now)
    hits = models.PositiveIntegerField(default=0)
    purchases = models.PositiveIntegerField(default=0)
    product = models.ForeignKey(Product)

    class Meta:
        ordering = ('product', 'date', 'purchases', 'hits', )

    def __unicode__(self):
        return u'%s: %s - %s hits, %s purchases' % (self.product.name, str(self.date).split(' ')[0], self.hits, self.purchases)
How would you go about sorting the Products by, say, (hits + (purchases * 2)) over the latest week?
This structure isn't set in stone either, so if you would structure the models in any other way, please tell!
First idea:
In the view you could query for today's ProductStatistic, then loop over the queryset, add a ranking variable to every object, and add that object to a list. Then just sort by ranking and pass the list to your template.
Second idea:
Create a field ranking (hidden from the admin) and write the result of your formula to it each time the object is saved to the database, using a pre_save signal. Now you can do ProductStatistic.objects.filter(date=today()).order_by('ranking')
Both ideas have pros and cons, but I like the second idea more.
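The first idea, extended from a single day to a date window, can be sketched in plain Python as follows. The function name, the tuple layout and the default weight of 2 for purchases (taken from the formula in the question) are assumptions for illustration; in a real view the rows would come from a ProductStatistic queryset.

```python
from collections import defaultdict
from datetime import date, timedelta


def rank_products(stats, today, days=7, purchase_weight=2):
    """Rank products by hits + purchases * purchase_weight over a window.

    `stats` is an iterable of (product_name, day, hits, purchases) tuples;
    this layout is an assumption made for the sketch.
    """
    start = today - timedelta(days=days)
    scores = defaultdict(int)
    for product, day, hits, purchases in stats:
        # Only statistics inside the window contribute to the score.
        if start <= day <= today:
            scores[product] += hits + purchases * purchase_weight
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Changing `days` to 30 or 365 gives the monthly or yearly ranking with the same code.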
edit as response to the comment
Use Idea 2
Write a view where you filter like this: ProductStatistic.objects.filter(product=aProductObject, date__gte=startdate, date__lte=enddate)
Loop over the queryset and do something like aProductObject.ranking += qs_obj.ranking
Pass a sorted list of the queryset to the template
Basically a combination of both ideas
edit to your own answer
Your solution isn't far from what I suggested, just done in SQL space.
But here is another solution:
Make a Hit model:
class Hit(models.Model):
    date = models.DateTimeField(auto_now=True)
    product = models.ForeignKey(Product)
    purchased = models.BooleanField(default=False)
    session = models.CharField(max_length=40)
In your view for displaying a product, you check whether there is already a Hit object for the session and product. If not, you save one:
Hit(product=product,
    date=datetime.datetime.now(),
    session=request.session.session_key).save()
In your purchase view you get the Hit object and set purchased=True.
Now in your templates/DB tools you can do real statistics.
Of course this can generate a lot of DB objects over time, so you should think about a good deletion strategy (like summing the data after 3 months into another model, MonthlyHitArchive).
If you think that displaying these statistics would generate too much DB traffic, you should consider using some caching.
I solved this the way I didn't want to solve it. I added week_rank, month_rank and overall_rank to Product and then I just added the following to my ProductStatistic model.
def calculate_rank(self, days_ago=7, overall=False):
    if overall:
        return self._default_manager.all().extra(
            select = {'rank': 'SUM(hits + (clicks * 2))'}
        ).values()[0]['rank']
    else:
        return self._default_manager.filter(
            date__gte = datetime.today()-timedelta(days_ago),
            date__lte = datetime.today()
        ).extra(
            select = {'rank': 'SUM(hits + (clicks * 2))'}
        ).values()[0]['rank']

def save(self, *args, **kwargs):
    super(ProductStatistic, self).save(*args, **kwargs)
    t = Product.objects.get(pk=self.product.id)
    t.week_rank = self.calculate_rank()
    t.month_rank = self.calculate_rank(30)
    t.overall_rank = self.calculate_rank(overall=True)
    t.save()
I'll leave the question open in case there is a better solution.