Bin a queryset using Django? - django

Let's say we have the following simplistic models:
class Category(models.Model):
name = models.CharField(max_length=264)
def __str__(self):
return self.name
class Meta:
verbose_name_plural = "categories"
class Status(models.Model):
name = models.CharField(max_length=264)
def __str__(self):
return self.name
class Meta:
verbose_name_plural = "status"
class Product(models.Model):
title = models.CharField(max_length=264)
description = models.CharField(max_length=264)
category = models.ForeignKey(Category, on_delete=models.CASCADE)
price = models.DecimalField(max_digits=10)
status = models.ForeignKey(Status, on_delete=models.CASCADE)
My aim is to get some statistics, like total products, total sales, average sales etc, based on which price bin each product belongs to.
So, the price bins could be something like 0-100, 100-500, 500-1000, etc.
I know how to use pandas to do something like that:
Binning column with python pandas
I am searching for a way to do this with the Django ORM.
One of my thoughts is to convert the queryset into a list and apply a function to get the apropriate price bin and then do the statistics.
Another thought which I am not sure how to impliment, is the same as the one above but just apply the bin function to the field in the queryset I am interested in.

There are three pathways I can see.
First is composing the SQL you want to use directly and putting it to your database with a modification of your models manager class. .objects.raw("[sql goes here]"). This answer shows how to define group with a simple function on the content - something like that could work?
SELECT FLOOR(grade/5.00)*5 As Grade,
COUNT(*) AS [Grade Count]
FROM TableName
GROUP BY FLOOR(Grade/5.00)*5
ORDER BY 1
Second is that there is no reason you can't move the queryset (with .values() or .values_list()) into a pandas dataframe or similar and then bin it, as you mentioned. There is probably a bit of an efficiency loss in terms of getting the queryset into a dataframe and then processing it, but I am not sure that it would certainly or always be bad. If its easier to compose and maintain, that might be fine.
The third way I would try (which I think is what you really want) is chaining .annotate() to label points with the bin they belong in, and the aggregate count function to count how many are in each bin. This is more advanced ORM work than I've done, but I think you'd start looking at something like the docs section on conditional aggregation. I've adapted this slightly to create the 'price_class' column first, with annotate.
Product.objects.annotate(price_class=floor(F('price')/100).aggregate(
class_zero=Count('pk', filter=Q(price_class=0)),
class_one=Count('pk', filter=Q(price_class=1)),
class_two=Count('pk', filter=Q(price_class=2)), # etc etc
)
I'm not sure if that 'floor' is going to work, and you may need 'expression wrapper' to ensure the push price_class into the write type of output_field. All the best.

Related

Order Objects using an associated model and timedelta

I have a puzzle on my hands. As an exercise, I am trying to write a queryset that helps me visualize which of my professional contacts I should prioritize corresponding with.
To this end I have a couple of models:
class Person(models.Model):
name = models.CharField(max_length=256)
email = models.EmailField(blank=True, null=True)
target_contact_interval = models.IntegerField(default=45)
class ContactInstance(models.Model):
person = models.ForeignKey(Person, on_delete=models.CASCADE, related_name='contacts')
date = models.DateField()
notes = models.TextField(blank=True, null=True)
The column target_contact_interval on the Person model generally specifies the maximum amount of days that should pass before I reach out to this person again.
A ContactInstance reflects a single point of contact with a Person. A Person can have a reverse relationship with many ContactInstance objects.
So, the first Person in the queryset should own the greatest difference between the date of the most recent ContactInstance related to it and its own target_contact_interval
So my dream function would look something like:
Person.objects.order_by(contact__latest__date__day - timedelta(days=F(target_contact_interval))
but of course that won't work for a variety of reasons.
I'm sure someone could write up some raw PostgreSQL for this, but I am really curious to know if there is a way to accomplish it using only the Django ORM.
Here are the pieces I've found so far, but I'm having trouble putting them together.
I might be able to use a Subquery to annotate the date of the most recent datapoint:
from django.db.models import OuterRef, Subquery
latest = ContactInstance.objects.filter(person=OuterRef('pk')).order_by('-date')
Person.objects.annotate(latest_contact_date=Subquery(latest.values('date')[:1]))
And I like the idea of sorting the null values at the end:
from django.db.models import F
Person.objects.order_by(F('last_contacted').desc(nulls_last=True))
But I don't know where to go from here. I've been trying to put everything into order_by(), but I can't discern if it is possible to use F() with annotated values or with timedelta in my case.
UPDATE:
I have changed the target_contact_interval model to a DurationField as suggested. Here is the query I am attempting to use:
ci = ContactInstance.objects.filter(
person=OuterRef('pk')
).order_by('-date')
Person.objects.annotate(
latest_contact_date=Subquery(ci.values('date'[:1])
).order_by((
(datetime.today().date() - F('latest_contact_date')) -
F('target_contact_interval')
).desc(nulls_last=True))
It seems to me that this should work, however, the queryset is still not ordering correctly.

How to sort by the sum of a related field in Django using class based views?

If I have a model of an Agent that looks like this:
class Agent(models.Model):
name = models.CharField(max_length=100)
and a related model that looks like this:
class Deal(models.Model):
agent = models.ForeignKey(Agent, on_delete=models.CASCADE)
price = models.IntegerField()
and a view that looked like this:
from django.views.generic import ListView
class AgentListView(ListView):
model = Agent
I know that I can adjust the sort order of the agents in the queryset and I even know how to sort the agents by the number of deals they have like so:
queryset = Agent.objects.all().annotate(uc_count=Count('deal')).order_by('-uc_count')
However, I cannot figure out how to sort the deals by the sum of the price of the deals for each agent.
Given you already know how to annotate and sort by those annotations, you're 90% of the way there. You just need to use the Sum aggregate and follow the relationship backwards.
The Django docs give this example:
Author.objects.annotate(total_pages=Sum('book__pages'))
You should be able to do something similar:
queryset = Agent.objects.all().annotate(deal_total=Sum('deal__price')).order_by('-deal_total')
My spidy sense is telling me you may need to add a distinct=True to the Sum aggregation, but I'm not sure without testing.
Building off of the answer that Greg Kaleka and the question you asked under his response, this is likely the solution you are looking for:
from django.db.models import Case, IntegerField, When
queryset = Agent.objects.all().annotate(
deal_total=Sum('deal__price'),
o=Case(
When(deal_total__isnull=True, then=0),
default=1,
output_field=IntegerField()
)
).order_by('-o', '-deal_total')
Explanation:
What's happening is that the deal_total field is adding up the price of the deals object but if the Agent has no deals to begin with, the sum of the prices is None. The When object is able to assign a value of 0 to the deal_totals that would have otherwise been given the value of None

Proper way to annotate a rank field for a queryset

Assume models like this:
class Person(models.Model):
name = models.CharField(max_length=20)
class Session(models.Model):
start_time = models.TimeField(auto_now_add=True)
end_time = models.TimeField(blank=True, null=True)
person = models.ForeignKey(Person)
class GameSession(models.Model):
game_type = models.CharField(max_length=2)
score = models.PositiveIntegerField(default=0, blank=True)
session = models.ForeignKey(Session)
I want to have a queryset function to return total score of each person which is addition of all his games score and all times he has spent in all his sessions alongside with a rank that a person has relative to all persons. Something like below:
class DenseRank(Func):
function = 'DENSE_RANK'
template = '%(function)s() Over(Order by %(expressions)s desc)'
class PersonQuerySet(models.query.QuerySet):
def total_scores(self):
return self.annotate(total_score=some_fcn_for_calculate).annotate(rank=DenseRank('total_score'))
I could find a way to calculate total score, but dense rank is not what I want, because it just calculates rank based on persons in current queryset but I want to calculate rank of a person relative to all persons.
I use django 1.11 and postgres 10.5, please suggest me a proper way to find rank of each person in a queryset because I want to able to add another filter before or after calculating total_score and rank.
Sadly, it is not a possible operation since (to me) the postgresql WHERE operation (filter/exclude) narrows the rows before the aggregation functions can work on them.
The only solution I found is to simply compute the ranking for all Person with a separate queryset and then, to annotate your queryset with these results.
This answer (see the improved method) explains how to "annotate a queryset with externally prepared data in a dict".
Here is the implementation I made for your models:
class PersonQuerySet(models.QuerySet):
def total_scores(self):
# compute the global ranking
ranks = (Person.objects
.annotate(total_score=models.Sum('session__gamesession__score'))
.annotate(rank=models.Window(expression=DenseRank(),
order_by=models.F('total_score').decs()))
.values('pk', 'rank'))
# extract and put ranks in a dict
rank_dict = dict((e['pk'], e['rank']) for e in ranks)
# create `WHEN` conditions for mapping filtered Persons to their Rank
whens = [models.When(pk=pk, then=rank) for pk, rank in rank_dict.items()]
# build the query
return (self.annotate(rank=models.Case(*whens, default=0,
output_field=models.IntegerField()))
.annotate(total_score=models.Sum('session__gamesession__score')))
I tested it with Django 2.1.3 and Postgresql 10.5, so the code may lightly change for you.
Feel free to share a version compatible with Django 1.11!

Django: Getting last value by date from related model in admin. Best way?

Say I have a model like
class Product(models.Model):
name = models.CharField(max_length=64)
[...]
and another like
class Price(models.Model):
product = models.ForeignKey('Product')
date = models.DateField()
value = models.FloatField()
[...]
and I want to display products in a modelAdmin, with the last price registered in a column.
So far, the only way I have found is to add the following method to the product object:
#property
def last_price(self):
price = Price.objects.filter(product=self.pk).order_by("-date")
return price[0].value if price else None
and then adding last_price to the list_display in Product's ModelAdmin. I don't like this approach because it ends up doing a query for each row displayed.
Is there a better way to do this in code?. Or do I have to create a column in the Product table and cache the last price there?.
Thanks!
to reduce the the queries for each entry use the following:
Price.objects.filter(product=self.pk).order_by("-date").select_related('product')
this will decrease the product query at each object, hope it is helpful, vote up please
A cleaner version of what you have would be:
def last_price(self):
latest_price = self.price_set.latest('date')
return latest_price.value if latest_price else None
But this still involves queries for each item.
You if you want to avoid this I would suggest adding a latest_price column to Product. Then you could set up a post_save signal for Price that then updates the related Product latest_price (This could be a ForiegnKey or the value itself.)
Update
Here is a receiver that would update the products latest price value when you save a Price. Obviously this assumes that you are saving Price models in chronological order so the lastest one saved is the latest_value
#receiver(post_save, sender=Price)
def update_product_latest_value(sender, instance, created, **kwargs):
if created:
instance.product.latest_value = instance.value
instance.product.save()

Django model aggregation

I have a simple hierarchic model whit a Person and RunningScore as child.
this model store data about running score of many user, simplified something like:
class Person(models.Model):
firstName = models.CharField(max_length=200)
lastName = models.CharField(max_length=200)
class RunningScore(models.Model):
person = models.ForeignKey('Person', related_name="scores")
time = models.DecimalField(max_digits=6, decimal_places=2)
If I get a single Person it cames with all RunningScores associated to it, and this is standard behavior. My question is really simple: if I'd like to get a Person with only a RunningScore child (suppose the better result, aka min(time) ) how can I do?
I read the official Django documentation but have not found a
solution.
I am not 100% sure if I get what you mean, but maybe this will help:
from django.db.models import Min
Person.objects.annotate(min_running_time=Min('time'))
The queryset will fetch Person objects with min_running_time additional attribute.
You can also add a filter:
Person.objects.annotate(min_running_time=Min('time')).filter(firstName__startswith='foo')
Accessing the first object's min_running_time attribute:
first_person = Person.objects.annotate(min_running_score=Min('time'))[0]
print first_person.min_running_time
EDIT:
You can define a method or a property such as the following one to get the related object:
class Person(models.Model):
...
#property
def best_runner(self):
try:
return self.runningscore_set.order_by('time')[0]
except IndexError:
return None
If you want one RunningScore for only one Person you could use odering and limit your queryset to 1 object.
Something like this:
Person.runningscore_set.order_by('-time')[0]
Here is the doc on limiting querysets:
https://docs.djangoproject.com/en/1.3/topics/db/queries/#limiting-querysets