Django Aggregate with several models


I have these models:
class Package(models.Model):
    title = CharField(...)
class Item(models.Model):
    package = ForeignKey(Package)
    price = FloatField(...)
class UserItem(models.Model):
    user = ForeignKey(User)
    item = ForeignKey(Item)
    purchased = BooleanField()
I am trying to achieve two things with the best performance possible:
In my template I would like to calculate, for each package, the sum of the prices of all its items. (An aggregate, I assume?)
More complicated: for each user, I want to sum up the price of all the items they have purchased, i.e. where purchased = True.
For example, if one package has 10 items that each cost $10, the package sum should be $100. If the user has purchased 5 of those items, the second sum should be $50.
I can easily do simple queries with template tags, but I believe it can be done better? (Hopefully.)

To total the price for a specific package a_package, you can use this code:
from django.db.models import Sum
Item.objects.filter(package=a_package).aggregate(Sum('price'))
There is a guide on how to do these kinds of queries, and the aggregate documentation describes all the different functions.
The same kind of query also solves your second problem. Since price lives on Item, follow the item relation from UserItem:
UserItem.objects.filter(user=a_user, purchased=True).aggregate(Sum('item__price'))
You can also use annotate() to attach the aggregate (for example the sum) to each object instead of computing a single value; see the aggregation guide mentioned above.
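To make the arithmetic from the question concrete, here is a rough sketch of the SQL these aggregate calls boil down to, run against an in-memory SQLite database with Python's standard sqlite3 module. The table and column names are hypothetical simplifications, not the names Django would generate, and the queries are hand-written approximations rather than the ORM's exact output.

```python
import sqlite3

# Hypothetical tables mirroring Package, Item and UserItem (not Django's real table names).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE package (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE item (id INTEGER PRIMARY KEY, package_id INTEGER, price REAL);
    CREATE TABLE user_item (id INTEGER PRIMARY KEY, user_id INTEGER,
                            item_id INTEGER, purchased INTEGER);
""")

# One package with 10 items at $10 each; user 1 has purchased the first 5.
conn.execute("INSERT INTO package (id, title) VALUES (1, 'Starter pack')")
for item_id in range(1, 11):
    conn.execute("INSERT INTO item (id, package_id, price) VALUES (?, 1, 10.0)", (item_id,))
    conn.execute(
        "INSERT INTO user_item (user_id, item_id, purchased) VALUES (1, ?, ?)",
        (item_id, 1 if item_id <= 5 else 0),
    )

# Roughly what Item.objects.filter(package=...).aggregate(Sum('price')) computes.
package_total = conn.execute(
    "SELECT SUM(price) FROM item WHERE package_id = ?", (1,)
).fetchone()[0]

# One total per package in a single query: the GROUP BY shape annotate() produces.
totals_by_package = dict(conn.execute(
    "SELECT package_id, SUM(price) FROM item GROUP BY package_id"
).fetchall())

# Sum over purchased items only, following UserItem -> Item to reach the price.
user_total = conn.execute(
    """SELECT SUM(i.price) FROM user_item ui
       JOIN item i ON i.id = ui.item_id
       WHERE ui.user_id = ? AND ui.purchased = 1""",
    (1,),
).fetchone()[0]

print(package_total, user_total)  # 100.0 50.0
```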

The most elegant way, in my opinion, would be to define a method total on the model class and decorate it as a property. It returns the total (using the Django ORM's Sum aggregate) for a Package, and the same pattern can be written for the user total.
Example for class Package:
from django.db.models import Sum
...
class Package(models.Model):
    ...
    @property
    def total(self):
        # aggregate() returns a dict, so pull out the single summed value
        return self.item_set.aggregate(total=Sum('price'))['total']
In your template you would use total like any other model attribute, e.g.:
{{ package_instance.total }}

@Vic Smith got the solution.
But if you want "the best performance possible", I would add a price attribute to the Package model.
You would connect a post_save signal handler to Item and, when an item is created, update the related package object.
This way you can get the package price very quickly, and even sort and compare packages cheaply.
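A minimal in-memory sketch of the cached-total idea, with no database or Django involved: Package and Item are hypothetical dataclass stand-ins, and on_item_saved is a hypothetical function playing the role of a post_save receiver that receives the created flag.

```python
from dataclasses import dataclass


# Hypothetical in-memory stand-ins for the Package and Item models.
@dataclass
class Package:
    title: str
    price: float = 0.0  # denormalized running total of its items' prices


@dataclass
class Item:
    package: Package
    price: float


def on_item_saved(item, created):
    """Plays the role of a post_save receiver: bump the cached total on create."""
    if created:
        item.package.price += item.price


pkg = Package("Starter pack")
for _ in range(10):
    item = Item(pkg, 10.0)
    on_item_saved(item, created=True)  # in Django the signal would fire on save()

print(pkg.price)  # 100.0
```

As written it only handles creation. If an item's price changes or an item is deleted, the cached total drifts unless those cases are handled too (for example with a post_delete handler).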
Plus, I don't really get the purpose of the purchased attribute. You probably want a ManyToMany relationship between Item and User, with UserItem defined as the intermediate model via the through parameter.
Anyway, my experience is that you usually want a relationship between Item and a Purchase object, which is linked to User, rather than a direct link (unless you start to get performance issues...). Having Purchase as a record of the event "the user bought this and that" makes things easier to handle.

Related

Django: Joining on fields other than IDs (Using a date field in one model to pull data from a second model)

I'm attempting to use Django to build a simple website. I have a set of blog posts that have a date field attached to indicate the day they were published. I have a table that contains a list of dates and temperatures. On each post, I would like to display the temperature on the day it was published.
The two models are as follows:
class Post(models.Model):
    title = models.CharField(max_length=200)
    text = models.TextField()
    date = models.DateField()
class Temperature(models.Model):
    date = models.DateField()
    temperature = models.IntegerField()
I would like to be able to reference the temperature field from the second table using the date field from the first. Is this possible?
In SQL, this is a simple query. I would do the following:
Select temperature from Temperature t join Post p on t.date = p.date
I think I really have two questions:
Is it possible to brute-force this, even if it's not best practice? I've googled a lot and tried using raw SQL and objects.extra, but can't get them to do what I want. I'm also wary of relying on them for the long haul.
Since this seems to be a simple task, it seems likely that I'm overcomplicating it by having my models set up sub-optimally. Is there something I'm missing about how I should design my models? That is, what's the best practice for doing something like this? (I've successfully pulled the temperature into my blog post by using a foreign key in the Temperature model. But if I go that route, I don't see how I could easily make sure that my temperature dates get the correct foreign key assigned to them so that the temperature date maps to the correct post date.)

There will likely be better answers than this one, but I'll throw in my 2¢ anyway.
You could try a property inside the Post model that returns the temperature:
@property
def temperature(self):
    try:
        return Temperature.objects.values_list('temperature', flat=True).get(date=self.date)
    except Temperature.DoesNotExist:
        return None
(code not tested)

About your models:
If you will be displaying the temperature in a Post list (a list of posts with their temperatures), it may be simpler to code, and faster to query, if you just add a temperature field to your Post model.
You can keep the Temperature model. Then:
If the temperature data is already in your Temperature model when a Post instance is created, you can fill that new field in a custom save() method.
If you get the temperature data after the Post is created, you can fill in the new temperature field through a background job (perhaps triggered by crontab or similar).
Sometimes database normalization (not repeating information across tables) is not the best strategy. It's something to weigh against how often you will query the Post models and how simple you want to keep that query code.

I think this might be a basic approach to the problem:
post_dates = Post.objects.all().values('date')
result_temperature = Temperature.objects.filter(date__in=post_dates).values('temperature')

Subqueries could be your friend here. Something like the following should work:
from django.db.models import OuterRef, Subquery
temps = Temperature.objects.filter(date=OuterRef('date'))
posts = Post.objects.annotate(temperature=Subquery(temps.values('temperature')[:1]))
for post in posts:
    temperature = post.temperature
Then you can just iterate through posts and access the temperature on each post instance.
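For reference, the Subquery annotation corresponds roughly to a correlated subquery. This sketch runs that SQL against in-memory SQLite with Python's sqlite3 module; the table names and sample dates are made up. A post with no matching temperature row gets NULL (None in Python), which is also what a Subquery annotation yields when the inner query is empty.

```python
import sqlite3

# Hypothetical post/temperature tables with made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT, date TEXT);
    CREATE TABLE temperature (id INTEGER PRIMARY KEY, date TEXT, temperature INTEGER);
    INSERT INTO post (title, date) VALUES
        ('Spring walk', '2017-04-01'),
        ('Rainy day', '2017-04-02'),
        ('No reading', '2017-04-03');
    INSERT INTO temperature (date, temperature) VALUES
        ('2017-04-01', 18),
        ('2017-04-02', 12);
""")

# Correlated subquery: for each post row, look up a temperature on the same date.
# LIMIT 1 mirrors the [:1] slice on temps.values('temperature').
rows = conn.execute("""
    SELECT p.title,
           (SELECT t.temperature FROM temperature t
            WHERE t.date = p.date LIMIT 1) AS temperature
    FROM post p
    ORDER BY p.id
""").fetchall()

for title, temp in rows:
    print(title, temp)
```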

How to apply windowing function before filter in Django

I have these models:
class Customer(models.Model):
    ....
class Job(models.Model):
    customer = models.ForeignKey('Customer')
    payment_status = models.ForeignKey('PaymentStatus')
    cleaner = models.ForeignKey(settings.AUTH_USER_MODEL,...)
class PaymentStatus(models.Model):
    is_owing = models.NullBooleanField()
I need to find out, for each job, how many total owed jobs the parent customer has, but only display those jobs belonging to the current user. The queryset should be something like this:
user = self.request.user
queryset = Job.objects.select_related('customer').filter(
    payment_status__is_owing=True,
).annotate(
    num_owings=RawSQL('count(jobs_job.id) over (partition by customer_id)', ()),
).filter(cleaner=user)
I am using 'select_related' to display fields from the customer related to the job.
Firstly, I haven't found a way to do this without the window function/raw SQL.
Secondly, regardless of where I place the .filter(cleaner=user) (before or after the annotate()), the jobs that do not belong to the current user are always excluded from the total count as well. I need to exclude those jobs from the display, but not from the count in the window function.
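Window functions are evaluated after the WHERE clause, so wherever the cleaner filter is placed, it most likely ends up in the same WHERE and shrinks each partition before the count runs, which would explain the behavior described. One way around it, swapped in here instead of the window function, is to compute the count in a correlated subquery that ignores the cleaner filter. The sketch below is plain SQL on in-memory SQLite (via Python's sqlite3) with a hypothetical, simplified job table; in Django, Subquery and OuterRef can usually express the same shape, but no ORM code is shown here.

```python
import sqlite3

# Hypothetical simplified job table: is_owing is inlined instead of joining
# PaymentStatus, and cleaner_id stands in for the cleaner foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job (id INTEGER PRIMARY KEY, customer_id INTEGER,
                      cleaner_id INTEGER, is_owing INTEGER);
    INSERT INTO job (customer_id, cleaner_id, is_owing) VALUES
        (1, 10, 1),  -- owed, current cleaner
        (1, 20, 1),  -- owed, another cleaner
        (1, 20, 1),  -- owed, another cleaner
        (2, 10, 1),  -- owed, current cleaner
        (2, 10, 0);  -- paid, current cleaner
""")

current_cleaner = 10

# The count runs over *all* owed jobs of the customer inside the subquery,
# so the outer WHERE on cleaner_id only limits which rows are displayed.
rows = conn.execute("""
    SELECT j.id, j.customer_id,
           (SELECT COUNT(*) FROM job j2
            WHERE j2.customer_id = j.customer_id AND j2.is_owing = 1) AS num_owings
    FROM job j
    WHERE j.is_owing = 1 AND j.cleaner_id = ?
    ORDER BY j.id
""", (current_cleaner,)).fetchall()

print(rows)  # [(1, 1, 3), (4, 2, 1)]
```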
I could do the whole thing as raw SQL, but I was hoping there was a nicer way of doing it in Django.
Thanks!

I don't know if this helps; it really depends on how you want to display the results to your user. However, if I had a free hand with the design, I would probably split the page: the total of owed jobs for the parent customer at the top, and a separate list of the jobs that belong to the current user below. I would then split the construction of the data: a normal query, as you have, for the jobs relating to the current user, and a custom template tag to calculate the total number of owed jobs for the parent customer.
I use custom template tags quite a bit. I find they are very handy for those quick snapshot totals we all want to show our users, for example the total number of points accumulated or the number of outstanding tasks.
If you haven't looked at them before, check out the docs at https://docs.djangoproject.com/en/1.11/howto/custom-template-tags/
They are really easy to use.

Find the most recent rating for a user in a django queryset

I'm looking for a way to get the most recent rating by a specific Person for every Resource.
Currently I'm using a query like Rating.objects.filter(user=person).order_by('-timestamp'),
then passing the result through unique_everseen with a key on the resource and user attributes, and then looking the ratings up again with Rating.objects.filter(id__in=uniquelist). Is there a more elegant way to do this with the Django queryset functions?
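A plain-Python sketch of the deduplication step, assuming unique_everseen is the recipe from the itertools documentation (also shipped in more_itertools), which yields each element whose key has not been seen before. The ratings list is hypothetical sample data for a single person, so keying on the resource alone is enough here.

```python
from datetime import date

# Hypothetical ratings by one person: (id, resource_id, timestamp, rating).
ratings = [
    (1, "A", date(2017, 1, 1), 3),
    (2, "A", date(2017, 3, 1), 5),
    (3, "B", date(2017, 2, 1), 4),
    (4, "B", date(2017, 1, 15), 2),
]


def latest_per_resource(rows):
    """Keep the newest rating per resource, like unique_everseen keyed on resource."""
    seen = set()
    result = []
    # Newest first, matching order_by('-timestamp').
    for row in sorted(rows, key=lambda r: r[2], reverse=True):
        resource = row[1]
        if resource not in seen:
            seen.add(resource)
            result.append(row)
    return result


latest = latest_per_resource(ratings)
print([r[0] for r in latest])  # [2, 3]
```

Because the deduplicated rows are already the ratings you want, the second id__in lookup is only needed if a QuerySet is required afterwards.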
These are the relevant models:
class Person(models.Model):
    pass
class Resource(models.Model):
    pass
class Rating(models.Model):
    rating = models.IntegerField()
    timestamp = models.DateField()
    resource = models.ForeignKey('Resource')
    user = models.ForeignKey('Person')
I need to keep all of the old Ratings around since other functions need to be able to keep a history of how things are 'changing'.

I am not 100% clear on what you are looking for here: do you want the most recent rating by a user for each resource they have rated? If you can explain what unique_everseen actually does, it would help clarify what you are looking for.
You could rather look from a resource perspective:
resources = Resource.objects.filter(rating__user=person).order_by('-rating__timestamp')
resource_rating = [(resource, resource.rating_set.filter(user=person).latest('timestamp')) for resource in resources]
You might be able to use aggregate functions to get the most recent record per resource, or some clever use of Q objects to limit the number of SQL queries (my example may save you some queries and is more elegant, but it is not as simple as what you could write with a raw SQL query). In raw SQL you would use an inner SELECT or a well-written GROUP BY to get the most recent rating, so mimicking that would be ideal.
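A sketch of the inner SELECT approach in plain SQL, run against in-memory SQLite with Python's sqlite3; the table and column names are hypothetical. Each rating is kept only if its timestamp equals the newest timestamp for the same user/resource pair. Since timestamp is a DateField, two ratings on the same day would both be returned, so a tiebreaker (such as the highest id) may be needed.

```python
import sqlite3

# Hypothetical rating table; ISO date strings sort correctly as text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rating (id INTEGER PRIMARY KEY, rating INTEGER,
                         timestamp TEXT, resource_id INTEGER, user_id INTEGER);
    INSERT INTO rating (rating, timestamp, resource_id, user_id) VALUES
        (3, '2017-01-01', 1, 7),
        (5, '2017-03-01', 1, 7),
        (4, '2017-02-01', 2, 7),
        (2, '2017-01-15', 2, 7),
        (1, '2017-04-01', 1, 8);  -- another person, must be ignored
""")

# Inner SELECT: the newest timestamp for this row's user/resource pair.
rows = conn.execute("""
    SELECT r.resource_id, r.rating
    FROM rating r
    WHERE r.user_id = ?
      AND r.timestamp = (SELECT MAX(r2.timestamp) FROM rating r2
                         WHERE r2.user_id = r.user_id
                           AND r2.resource_id = r.resource_id)
    ORDER BY r.resource_id
""", (7,)).fetchall()

print(rows)  # [(1, 5), (2, 4)]
```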
You could also create a post_save signal hook and an 'active' or 'current' boolean field on your Rating model; the hook would iterate over the other ratings matching the user/resource pair and set their 'active' field to False. That is, the post_save hook would mark all other ratings for a user/resource as inactive using something like:
if instance.active:
    for rating in Rating.objects.filter(user=instance.user, resource=instance.resource).exclude(id=instance.id):
        rating.active = False
        rating.save()
You could then do a simple query for:
Rating.objects.filter(user=person,active=True).order_by('-timestamp')
This would be the most economical of queries (even if you make the complicated group by/inner select in raw SQL you are doing a more complicated query than necessary). Using the boolean field also means you can provide 'step forward/step backwards'/'undo/redo' behavior for a user's ratings if that is relevant.
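A sketch of the 'active' flag idea against in-memory SQLite (Python's sqlite3), with a hypothetical table layout. Here a single UPDATE replaces the loop of save() calls in the hook; add_rating is a hypothetical helper, and it assumes ratings are inserted in chronological order, so the newest insert is the current one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE rating (id INTEGER PRIMARY KEY, rating INTEGER,
                timestamp TEXT, resource_id INTEGER, user_id INTEGER,
                active INTEGER NOT NULL DEFAULT 1)""")


def add_rating(user_id, resource_id, rating, timestamp):
    """Insert a rating, then deactivate older ones for the same user/resource.

    The UPDATE stands in for the post_save hook; old rows are kept as history.
    """
    cur = conn.execute(
        "INSERT INTO rating (rating, timestamp, resource_id, user_id) VALUES (?, ?, ?, ?)",
        (rating, timestamp, resource_id, user_id),
    )
    conn.execute(
        "UPDATE rating SET active = 0 WHERE user_id = ? AND resource_id = ? AND id <> ?",
        (user_id, resource_id, cur.lastrowid),
    )


add_rating(7, 1, 3, "2017-01-01")
add_rating(7, 1, 5, "2017-03-01")
add_rating(7, 2, 4, "2017-02-01")

# The "simple query" for current ratings: just filter on the flag.
current = conn.execute(
    "SELECT resource_id, rating FROM rating WHERE user_id = ? AND active = 1 "
    "ORDER BY timestamp DESC",
    (7,),
).fetchall()
all_rows = conn.execute("SELECT COUNT(*) FROM rating").fetchone()[0]
print(current, all_rows)  # [(1, 5), (2, 4)] 3
```

In Django, a queryset .update(active=False) would likewise do this in one query, though update() does not call save() or send post_save signals.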

How to sort by annotated Count() in a related model in Django

I'm building a food logging database in Django and I've got a query related problem.
I've set up my models to include (among other things) a Food model connected to the User model through an M2M-field "consumer" via the Consumption model. The Food model describes food dishes and the Consumption model describes a user's consumption of Food (date, amount, etc).
class Food(models.Model):
    food_name = models.CharField(max_length=30)
    consumer = models.ManyToManyField("User", through="Consumption")
class Consumption(models.Model):
    food = models.ForeignKey("Food")
    user = models.ForeignKey("User")
I want to create a query that returns all Food objects ordered by the number of times that Food object appears in the Consumption table for that user (the number of times the user has consumed the food).
I'm trying something along the lines of:
Food.objects.all().annotate(consumption_times=Count('consumer')).order_by('consumption_times')
But this will of course count all Consumption objects related to the Food object, not just the ones associated with the user. Do I need to change my models or am I just missing something obvious in the queries?
This is a pretty time-critical operation (among other things, it's used to fill an autocomplete field in the frontend) and the Food table has a couple of thousand entries, so I'd rather do the sorting on the database side than brute-force it by iterating over the results and doing:
Consumption.objects.filter(food=food, user=user).count()
and then sorting them in Python. I don't think that method would scale very well as the user base grows, and I want to make the database design as future-proof as I can from the start.
Any ideas?

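Perhaps something like this? Because filter() comes before annotate(), the count only covers that user's consumptions (foods the user has never eaten are left out):
Food.objects.filter(consumer=user)\
    .annotate(consumption_times=Count('consumer'))\
    .order_by('consumption_times')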

I am having a very similar issue. Basically, I know that the SQL query you want is:
SELECT food.*, COUNT(IF(consumption.user_id=123,TRUE,NULL)) AS consumption_times
FROM food LEFT JOIN consumption ON (food.id=consumption.food_id)
GROUP BY food.id
ORDER BY consumption_times;
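The query above can be checked against in-memory SQLite with Python's sqlite3. This sketch uses a standard CASE expression instead of MySQL's IF() (so it is portable), groups explicitly so there is one row per food, and orders the most-eaten foods first; the data is hypothetical. COUNT skips NULLs, so only the given user's consumption rows are counted, while the LEFT JOIN keeps foods the user never ate with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE food (id INTEGER PRIMARY KEY, food_name TEXT);
    CREATE TABLE consumption (id INTEGER PRIMARY KEY, food_id INTEGER, user_id INTEGER);
    INSERT INTO food (id, food_name) VALUES
        (1, 'Oatmeal'), (2, 'Pizza'), (3, 'Salad'), (4, 'Soup');
    INSERT INTO consumption (food_id, user_id) VALUES
        (1, 123), (1, 123), (1, 123),  -- user 123 ate oatmeal 3 times
        (2, 123),                      -- and pizza once
        (2, 456), (2, 456), (3, 456);  -- other users' rows must not count
""")

# CASE yields NULL for other users' rows, and COUNT ignores NULLs.
rows = conn.execute("""
    SELECT f.food_name,
           COUNT(CASE WHEN c.user_id = ? THEN 1 END) AS consumption_times
    FROM food f
    LEFT JOIN consumption c ON c.food_id = f.id
    GROUP BY f.id, f.food_name
    ORDER BY consumption_times DESC, f.food_name
""", (123,)).fetchall()

print(rows)  # [('Oatmeal', 3), ('Pizza', 1), ('Salad', 0), ('Soup', 0)]
```

Newer Django versions (2.0 and later) accept a filter argument on aggregates, e.g. something like Count('consumption', filter=Q(consumption__user=user)), which expresses this kind of conditional count without a custom aggregate class.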
What I wish is that you could mix aggregate functions and F expressions, annotate F expressions without an aggregate function, have a richer set of operations/functions for F expressions, and have virtual fields that are basically an automatic F expression annotation. Then you could do:
Food.objects.annotate(consumption_times=Count(If(F('consumer')==user,True,None)))\
    .order_by('consumption_times')
Being able to add your own complex aggregate functions more easily would also be nice, but in the meantime, here's a hack that adds an aggregate function to do this.
from django.db.models import aggregates, sql
class CountIf(sql.aggregates.Count):
    sql_template = '%(function)s(IF(%(field)s=%(equals)s,TRUE,NULL))'
sql.aggregates.CountIf = CountIf
consumption_times = aggregates.Count('consumer', equals=user.id)
consumption_times.name = 'CountIf'
rows = Food.objects.annotate(consumption_times=consumption_times)\
    .order_by('consumption_times')

django aggregate aggregated fields?

I have a model called Item, with m2m relation to User ("owner").
For each item, I need to count the users who own it. That's easy enough with annotate().
But then I need to calculate the ratio between the number of owners of a specific gender and the total owner count for each item. For example, if 2 of the item's 5 owners are male, the ratio is 0.4.
What's the best way to do that?

To do this with the ORM, you need conditional aggregates, which aren't supported in Django. http://www.voteruniverse.com/Members/jlantz/blog/conditional-aggregates-in-django proposes a hacky solution that might work.
If you don't need to sort by the ratio, you can make two calls to annotate() and then compute the ratio in Python. Something like:
items = Item.objects.annotate(ucount=Count('users')).annotate(ccount=CountIf(<condition>))
for item in items:
    # conditional count / total owners; float() avoids integer division in Python 2
    item.ratio = float(item.ccount) / item.ucount if item.ucount else 0.0
If you don't want to do that, I'd recommend using the extra() method with some custom SQL to get the extra information you want. The method is documented on the Django QuerySet API reference page.
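Computed in SQL, the ratio can also be sorted on. A sketch against in-memory SQLite with Python's sqlite3, using a conditional SUM(CASE ...) for the male owners and COUNT for all owners; the table names and sample data (a "Lamp" with 2 male owners out of 5, a "Desk" with 1 out of 2) are hypothetical. The inner JOIN drops items with no owners, which avoids dividing by zero, and multiplying by 1.0 forces floating-point division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, gender TEXT);
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item_owner (item_id INTEGER, user_id INTEGER);
    INSERT INTO app_user (id, gender) VALUES
        (1, 'M'), (2, 'M'), (3, 'F'), (4, 'F'), (5, 'F');
    INSERT INTO item (id, name) VALUES (1, 'Lamp'), (2, 'Desk');
    INSERT INTO item_owner (item_id, user_id) VALUES
        (1, 1), (1, 2), (1, 3), (1, 4), (1, 5),
        (2, 1), (2, 3);
""")

# Conditional aggregate: count male owners and all owners in one pass per item.
rows = conn.execute("""
    SELECT i.name,
           COUNT(o.user_id) AS owners,
           SUM(CASE WHEN u.gender = 'M' THEN 1 ELSE 0 END) AS male_owners,
           1.0 * SUM(CASE WHEN u.gender = 'M' THEN 1 ELSE 0 END)
               / COUNT(o.user_id) AS male_ratio
    FROM item i
    JOIN item_owner o ON o.item_id = i.id
    JOIN app_user u ON u.id = o.user_id
    GROUP BY i.id, i.name
    ORDER BY male_ratio DESC
""").fetchall()

for name, owners, males, ratio in rows:
    print(name, owners, males, ratio)
```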

Off the top of my head, something like the following could work. Iterate on it to get your perfect solution if you wish:
items = Item.objects.annotate(Count('users'))
for item in items:
    total = item.users__count
    num_males = item.users.filter(gender='M').count()
    num_females = item.users.filter(gender='F').count()
    male_ratio = float(num_males) / total if total else 0.0
