Proper way to annotate a rank field for a queryset - django

Assume models like this:
class Person(models.Model):
name = models.CharField(max_length=20)
class Session(models.Model):
start_time = models.TimeField(auto_now_add=True)
end_time = models.TimeField(blank=True, null=True)
person = models.ForeignKey(Person)
class GameSession(models.Model):
game_type = models.CharField(max_length=2)
score = models.PositiveIntegerField(default=0, blank=True)
session = models.ForeignKey(Session)
I want to have a queryset function to return total score of each person which is addition of all his games score and all times he has spent in all his sessions alongside with a rank that a person has relative to all persons. Something like below:
class DenseRank(Func):
function = 'DENSE_RANK'
template = '%(function)s() Over(Order by %(expressions)s desc)'
class PersonQuerySet(models.query.QuerySet):
def total_scores(self):
return self.annotate(total_score=some_fcn_for_calculate).annotate(rank=DenseRank('total_score'))
I could find a way to calculate total score, but dense rank is not what I want, because it just calculates rank based on persons in current queryset but I want to calculate rank of a person relative to all persons.
I use django 1.11 and postgres 10.5, please suggest me a proper way to find rank of each person in a queryset because I want to able to add another filter before or after calculating total_score and rank.

Sadly, it is not a possible operation since (to me) the postgresql WHERE operation (filter/exclude) narrows the rows before the aggregation functions can work on them.
The only solution I found is to simply compute the ranking for all Person with a separate queryset and then, to annotate your queryset with these results.
This answer (see the improved method) explains how to "annotate a queryset with externally prepared data in a dict".
Here is the implementation I made for your models:
class PersonQuerySet(models.QuerySet):
def total_scores(self):
# compute the global ranking
ranks = (Person.objects
.annotate(total_score=models.Sum('session__gamesession__score'))
.annotate(rank=models.Window(expression=DenseRank(),
order_by=models.F('total_score').decs()))
.values('pk', 'rank'))
# extract and put ranks in a dict
rank_dict = dict((e['pk'], e['rank']) for e in ranks)
# create `WHEN` conditions for mapping filtered Persons to their Rank
whens = [models.When(pk=pk, then=rank) for pk, rank in rank_dict.items()]
# build the query
return (self.annotate(rank=models.Case(*whens, default=0,
output_field=models.IntegerField()))
.annotate(total_score=models.Sum('session__gamesession__score')))
I tested it with Django 2.1.3 and Postgresql 10.5, so the code may lightly change for you.
Feel free to share a version compatible with Django 1.11!

Related

Cyclical operations in Django Rest Framework

In my frontend application I have a panel which shows 6 the most popular products on my site. Searching for the most popular products each request by the number of views can be costly. I think that a good way to improve it will be to create a table in my data base which will store 6 the most popular products and the table will be refreshed for example every one minute. What should I looking for to do so cyclical operation on my django backend?
You can create separate model
models.py
class ProductRecord(models.Model):
product = models.OneToOneField(
'Product'
related_name='stats', on_delete=models.CASCADE)
# Data used for generating a score
num_views = models.PositiveIntegerField(default=0)
num_basket_additions = models.PositiveIntegerField(
default=0)
num_purchases = models.PositiveIntegerField(
default=0)
# Product score - used within search
score = models.FloatField(default=0.00)
def __str__(self):
return _("Record for '%s'") % self.product
rule_engine.py
class Calculator:
weights = {
'num_views': 1,
'num_basket_additions': 3,
'num_purchases': 5
}
def calculate_scores(self):
total_weight = float(sum(self.weights.values()))
weighted_fields = [
self.weights[name] * F(name) for name in self.weights.keys()]
ProductRecord.objects.update(
score=sum(weighted_fields) / total_weight)
receivers.py
Here you'll have multiple signals to receive addition to basket, views, favoriting, and so on..
This is an example
def receive_product_view(sender, product, user, **kwargs):
UserProductView.objects.create(product=product, user=user)
## yyou can do much advanced staff here using signals
Another solution
You can use Redis to save such these data like counts and purchases
Or make aws lambda function for receiving analytical data, all of these solutions depend on your case
This is a pseudo solution. hope it's been beneficial to you

Django annotation on compoundish primary key with filter ignoring primary key resutling in too many annotated items

Please see EDIT1 below, as well.
Using Django 3.0.6 and python3.8, given following models
class Plants(models.Model):
plantid = models.TextField(primary_key=True, unique=True)
class Pollutions(models.Model):
pollutionsid = models.IntegerField(unique=True, primary_key=True)
year = models.IntegerField()
plantid = models.ForeignKey(Plants, models.DO_NOTHING, db_column='plantid')
pollutant = models.TextField()
releasesto = models.TextField(blank=True, null=True)
amount = models.FloatField(db_column="amount", blank=True, null=True)
class Meta:
managed = False
db_table = 'pollutions'
unique_together = (('plantid', 'releasesto', 'pollutant', 'year'))
class Monthp(models.Model):
monthpid = models.IntegerField(unique=True, primary_key=True)
year = models.IntegerField()
month = models.IntegerField()
plantid = models.ForeignKey(Plants, models.DO_NOTHING, db_column='plantid')
power = models.IntegerField(null=False)
class Meta:
managed = False
db_table = 'monthp'
unique_together = ('plantid', 'year', 'month')
I'd like to annotate - based on a foreign key relationship and a fiter a value, particulary - to each plant the amount of co2 and the Sum of its power for a given year. For sake of debugging having replaced Sum by Count using the following query:
annotated = tmp.all().annotate(
energy=Count('monthp__power', filter=Q(monthp__year=YEAR)),
co2=Count('pollutions__amount', filter=Q(pollutions__year=YEAR, pollutions__pollutant="CO2", pollutions__releasesto="Air")))
However this returns too many items (a wrong number using Sum, respectively)
annotated.first().co2 # 60, but it should be 1
annotated.first().energy # 252, but it should be 1
although my database guarantees - as denoted, that (plantid, year, month) and (plantid, releasesto, pollutant, year) are unique together, which can easily be demonstrated:
pl = annotated.first().plantid
testplant = Plants.objects.get(pk=pl) # plant object
pco2 = Pollutions.objects.filter(plantid=testplant, year=YEAR, pollutant="CO2", releasesto="Air")
len(pco2) # 1, as expected
Why does django return to many results and how can I tell django to limit the elements to annotate to the 'current primary key' in other words to only annotate the elements where the foreign key matches the primary key?
I can achieve what I intend to do by using distinct and Max:
energy=Sum('yearly__power', distinct=True, filter=Q(yearly__year=YEAR)),
co2=Max('pollutions__amount', ...
However the performance is inacceptable.
I have tested to use model_to_dict and appending the wanted values "by hand" to the dict, which works for the values itself, but not for sorting the resulted dict (e.g. by energy) and it is acutally faster than the workaround directly above.
It conceptually strikes to me that the manual approach is faster than letting the database do, what it is intended to do.
Is this a feature limitation of django's orm or am I missing something?
EDIT1:
The behaviour is known as bug since 11 years.
Even others "spent a whole day on this".
I am now trying it with subqueries. However the forein key I am using is not a primary key of its table. So the kind of "usual" approach to use "pk=''" does not work. More clearly, trying:
tmp = Plants.objects.filter(somefilter)
subq1 = Subquery(Yearly.objects.filter(pk=OuterRef('plantid'), year=YEAR)) tmp1 = tmp.all().annotate(
energy=Count(Subquery(subq1))
)
returns
OperationalError at /xyz
no such column: U0.yid
Which definitely makes sense because Plants has no clue what a yid is, it only knows plantids. How do I adjust the subquery to that?

Bin a queryset using Django?

Let's say we have the following simplistic models:
class Category(models.Model):
name = models.CharField(max_length=264)
def __str__(self):
return self.name
class Meta:
verbose_name_plural = "categories"
class Status(models.Model):
name = models.CharField(max_length=264)
def __str__(self):
return self.name
class Meta:
verbose_name_plural = "status"
class Product(models.Model):
title = models.CharField(max_length=264)
description = models.CharField(max_length=264)
category = models.ForeignKey(Category, on_delete=models.CASCADE)
price = models.DecimalField(max_digits=10)
status = models.ForeignKey(Status, on_delete=models.CASCADE)
My aim is to get some statistics, like total products, total sales, average sales etc, based on which price bin each product belongs to.
So, the price bins could be something like 0-100, 100-500, 500-1000, etc.
I know how to use pandas to do something like that:
Binning column with python pandas
I am searching for a way to do this with the Django ORM.
One of my thoughts is to convert the queryset into a list and apply a function to get the apropriate price bin and then do the statistics.
Another thought which I am not sure how to impliment, is the same as the one above but just apply the bin function to the field in the queryset I am interested in.
There are three pathways I can see.
First is composing the SQL you want to use directly and putting it to your database with a modification of your models manager class. .objects.raw("[sql goes here]"). This answer shows how to define group with a simple function on the content - something like that could work?
SELECT FLOOR(grade/5.00)*5 As Grade,
COUNT(*) AS [Grade Count]
FROM TableName
GROUP BY FLOOR(Grade/5.00)*5
ORDER BY 1
Second is that there is no reason you can't move the queryset (with .values() or .values_list()) into a pandas dataframe or similar and then bin it, as you mentioned. There is probably a bit of an efficiency loss in terms of getting the queryset into a dataframe and then processing it, but I am not sure that it would certainly or always be bad. If its easier to compose and maintain, that might be fine.
The third way I would try (which I think is what you really want) is chaining .annotate() to label points with the bin they belong in, and the aggregate count function to count how many are in each bin. This is more advanced ORM work than I've done, but I think you'd start looking at something like the docs section on conditional aggregation. I've adapted this slightly to create the 'price_class' column first, with annotate.
Product.objects.annotate(price_class=floor(F('price')/100).aggregate(
class_zero=Count('pk', filter=Q(price_class=0)),
class_one=Count('pk', filter=Q(price_class=1)),
class_two=Count('pk', filter=Q(price_class=2)), # etc etc
)
I'm not sure if that 'floor' is going to work, and you may need 'expression wrapper' to ensure the push price_class into the write type of output_field. All the best.

Django: Getting last value by date from related model in admin. Best way?

Say I have a model like
class Product(models.Model):
name = models.CharField(max_length=64)
[...]
and another like
class Price(models.Model):
product = models.ForeignKey('Product')
date = models.DateField()
value = models.FloatField()
[...]
and I want to display products in a modelAdmin, with the last price registered in a column.
So far, the only way I have found is to add the following method to the product object:
#property
def last_price(self):
price = Price.objects.filter(product=self.pk).order_by("-date")
return price[0].value if price else None
and then adding last_price to the list_display in Product's ModelAdmin. I don't like this approach because it ends up doing a query for each row displayed.
Is there a better way to do this in code?. Or do I have to create a column in the Product table and cache the last price there?.
Thanks!
to reduce the the queries for each entry use the following:
Price.objects.filter(product=self.pk).order_by("-date").select_related('product')
this will decrease the product query at each object, hope it is helpful, vote up please
A cleaner version of what you have would be:
def last_price(self):
latest_price = self.price_set.latest('date')
return latest_price.value if latest_price else None
But this still involves queries for each item.
You if you want to avoid this I would suggest adding a latest_price column to Product. Then you could set up a post_save signal for Price that then updates the related Product latest_price (This could be a ForiegnKey or the value itself.)
Update
Here is a receiver that would update the products latest price value when you save a Price. Obviously this assumes that you are saving Price models in chronological order so the lastest one saved is the latest_value
#receiver(post_save, sender=Price)
def update_product_latest_value(sender, instance, created, **kwargs):
if created:
instance.product.latest_value = instance.value
instance.product.save()

Sorting products after dateinterval and weight

What I want is to be able to get this weeks/this months/this years etc. hotest products. So I have a model named ProductStatistics that will log each hit and each purchase on a day-to-day basis. This is the models I have got to work with:
class Product(models.Model):
name = models.CharField(_("Name"), max_length=200)
slug = models.SlugField()
description = models.TextField(_("Description"))
picture = models.ImageField(upload_to=product_upload_path, blank=True)
category = models.ForeignKey(ProductCategory)
prices = models.ManyToManyField(Store, through='Pricing')
objects = ProductManager()
class Meta:
ordering = ('name', )
def __unicode__(self):
return self.name
class ProductStatistic(models.Model):
# There is only 1 `date` each day. `date` is
# set by datetime.today().date()
date = models.DateTimeField(default=datetime.now)
hits = models.PositiveIntegerField(default=0)
purchases = models.PositiveIntegerField(default=0)
product = models.ForeignKey(Product)
class Meta:
ordering = ('product', 'date', 'purchases', 'hits', )
def __unicode__(self):
return u'%s: %s - %s hits, %s purchases' % (self.product.name, str(self.date).split(' ')[0], self.hits, self.purchases)
How would you go about sorting the Products after say (hits+(purchases*2)) the latest week?
This structure isn't set in stone either, so if you would structure the models in any other way, please tell!
first idea:
in the view you could query for today's ProductStatistic, than loop over the the queryset and add a variable ranking to every object and add that object to a list. Then just sort after ranking and pass the list to ur template.
second idea:
create a filed ranking (hidden for admin) and write the solution of ur formula each time the object is saved to the database by using a pre_save-signal. Now you can do ProductStatistic.objects.filter(date=today()).order_by('ranking')
Both ideas have pros&cons, but I like second idea more
edit as response to the comment
Use Idea 2
Write a view, where you filter like this: ProductStatistic.objects.filter(product= aProductObject, date__gte=startdate, date__lte=enddate)
loop over the queryset and do somthing like aProductObject.ranking+= qs_obj.ranking
pass a sorted list of the queryset to the template
Basically a combination of both ideas
edit to your own answer
Your solution isn't far away from what I suggested — but in sql-space.
But another solution:
Make a Hit-Model:
class Hit(models.Model):
date = models.DateTimeFiles(auto_now=True)
product = models.ForeignKey(Product)
purchased= models.BooleanField(default=False)
session = models.CharField(max_length=40)
in your view for displaying a product you check, if there is a Hit-object with the session, and object. if not, you save it
Hit(product=product,
date=datetime.datetime.now(),
session=request.session.session_key).save()
in your purchase view you get the Hit-object and set purchased=True
Now in your templates/DB-Tools you can do real statistics.
Of course it can generate a lot of DB-Objects over the time, so you should think about a good deletion-strategy (like sum the data after 3 month into another model MonthlyHitArchive)
If you think, that displaying this statistics would generate to much DB-Traffic, you should consider using some caching.
I solved this the way I didn't want to solve it. I added week_rank, month_rank and overall_rank to Product and then I just added the following to my ProductStatistic model.
def calculate_rank(self, days_ago=7, overall=False):
if overall:
return self._default_manager.all().extra(
select = {'rank': 'SUM(hits + (clicks * 2))'}
).values()[0]['rank']
else:
return self._default_manager.filter(
date__gte = datetime.today()-timedelta(days_ago),
date__lte = datetime.today()
).extra(
select = {'rank': 'SUM(hits + (clicks * 2))'}
).values()[0]['rank']
def save(self, *args, **kwargs):
super(ProductStatistic, self).save(*args, **kwargs)
t = Product.objects.get(pk=self.product.id)
t.week_rank = self.calculate_rank()
t.month_rank = self.calculate_rank(30)
t.overall_rank = self.calculate_rank(overall=True)
t.save()
I'll leave it unsolved if there is a better solution.