django: time range based aggregate query - django

I have the following models, Art and ArtScore:
class Art(models.Model):
title = models.CharField()
class ArtScore(models.Model):
art = models.ForeignKey(Art)
date = models.DateField(auto_now_add = True)
amount = models.IntegerField()
Certain user actions results in an ArtScore entry, for instance whenever you click 'I like this art', I save a certain amount of ArtScore for that Art.
Now I'm trying to show a page for 'most popular this week', so I need a query aggregating only ArtScore amounts for that time range.
I built the below query but it's flawed...
popular = Art.objects.filter(
artscore__date__range=(weekago, today)
).annotate(
score=Sum('artscore__amount')
).order_by('-score')
... because it only excludes Art that doesn't have an ArtScore record in the date range, but does not exclude the ArtScore records outside the date range.
Any pointers how to accomplish this would be appreciated!
Thanks,
Martin

it looks like according to this:
http://docs.djangoproject.com/en/dev/topics/db/aggregation/#order-of-annotate-and-filter-clauses what you have should do what you want. is the documentation wrong? bug maybe?
"the second query will only include
good books in the annotated count."
in regards to:
>>> Publisher.objects.filter(book__rating__gt=3.0).annotate(num_books=Count('book'))
change this to your query (forget about the order for now):
Art.objects.filter(
artscore__date__range=(weekago, today)
).annotate(
score=Sum('artscore__amount')
)
now we can say
"the query will only include artscores
in the date range in the annotated
sum."

Related

Django: Joining on fields other than IDs (Using a date field in one model to pull data from a second model)

I'm attempting to use Django to build a simple website. I have a set of blog posts that have a date field attached to indicate the day they were published. I have a table that contains a list of dates and temperatures. On each post, I would like to display the temperature on the day it was published.
The two models are as follows:
class Post(models.Model):
title = models.CharField(max_length=200)
text = models.TextField()
date = models.DateField()
class Temperature(models.Model):
date = models.DateField()
temperature = models.IntegerField()
I would like to be able to reference the temperature field from the second table using the date field from the first. Is this possible?
In SQL, this is a simple query. I would do the following:
Select temperature from Temperature t join Post p on t.date = p.date
I think I really have two questions:
Is it possible to brute force this, even if it's not best practice? I've googled a lot and tried using raw sql and objects.extra, but can't get them to do what I want. I'm also wary of relying on them for the long haul.
Since this seems to be a simple task, it seems likely that I'm overcomplicating it by having my models set up sub-optimally. Is there something I'm missing about how I should design my models? That is, what's the best practice for doing something like this? (I've successfully pulled the temperature into my blog post by using a foreign key in the Temperature model. But if I go that route, I don't see how I could easily make sure that my temperature dates get the correct foreign key assigned to them so that the temperature date maps to the correct post date.)
There will likely be better answers than this one, but I'll throw in my 2ยข anyway.
You could try a property inside the Post model that returns the temperature:
#property
def temperature(self):
try:
return Temperature.objects.values_list('temperature',flat=True).get(date=self.date)
except:
return None
(code not tested)
About your Models:
If you will be displaying the temperature in a Post list (a list of Posts with their temperatures), then maybe it will be simpler to code and a faster query to just add a temperature field to your Post model.
You can keep the Temperature model. Then:
Assuming you have the temperature data already present in you Temperature model at the time of Post instance creation, you can fill that new field in a custom save method.
If you get temperature data after Post creation, you cann fill in that new temperature field through a background job (maybe triggered by crontab or similar).
Sometimes database orthogonality (not repeating info in many tables) is not the best strategy. Just something to think about, depending on how often you will be querying the Post models and how simple you want to keep that query code.
I think this might be a basic approach to solve the problem
post_dates = Post.objects.all().values('date')
result_temprature = Temperature.objects.filter(date__in = post_dates).values('temperature')
Subqueries could be your friend here. Something like the following should work:
from django.db.models import OuterRef, Subquery
temps = Temperature.objects.filter(date=OuterRef('date'))
posts = Post.objects.annotate(temperature=Subquery(temps.values('temperature')[:1]))
for post in posts:
temperature = post.temperature
Then you can just iterate through posts and access the temperature off each post instance

How to generate Sum (and other aggregates) in Django where aggregate depends on values from related tables

My model consists of a Portfolio, a Holding, and a Company. Each Portfolio has many Holdings, and each Holding is of a single Company (a Company may be connected to many Holdings).
Portfolio -< Holding >- Company
I'd like the Portfolio query to return the sum of the product of the number of Holdings in the Portfolio, and the value of the Company.
Simplified model:
class Portfolio(model):
some fields
class Company(model):
closing = models.DecimalField(max_digits=10, decimal_places=2)
class Holding(model):
portfolio = models.ForeignKey(Portfolio)
company = models.ForeignKey(Company)
num_shares = models.IntegerField(default=0)
I'd like to be able to query:
Portfolio.objects.some_function()
and have each row annotated with the value of the Portfolio, where the value is equal to the sum of the product of the related Company.closing, and Holding.num_shares. ie something like:
annotate(value=Sum('holding__num_shares * company__closing'))
I'd also like to obtain a summary row, which contains the sum of the values of all of a user's Portfolios, and a count of the number of holdings. ie something like:
aggregate(Sum('holding__num_shares * company__closing'), Count('holding__num_shares'))
I would like to do have a similar summary row for a single Portfolio, which would be the sum of the values of each holding, and a count of the total number of holdings in the portfolio.
I managed to get part of the way there using extra:
return self.extra(
select={
'value': 'select sum(h.num_shares * c.closing) from portfolio_holding h '
'inner join portfolio_company as c on h.company_id = c.id '
'where h.portfolio_id = portfolio_portfolio.id'
}).annotate(Count('holding'))
but this is pretty ugly, and extra seems to be frowned upon, for obvious reasons.
My question is: is there a more Djangoistic way to summarise and annotate queries based on multiple fields, and across related tables?
These two options seem to move in the right direction:
Portfolio.objects.annotate(Sum('holding__company__closing'))
(ie this demonstrates annotation/aggregation over a field in a related table)
Holding.objects.annotate(Sum('id', field='num_shares * id'))
(this demonstrates annotation/aggregation over the product of two fields)
but if I attempt to combine them: eg
Portfolio.objects.annotate(Sum('id', field='holding__company__closing * holding__num_shares'))
I get an error: "No such column 'holding__company__closing'.
So far I've looked at the following related questions, but none of them seem to capture this precise problem:
Annotating django QuerySet with values from related table
Product of two fields annotation
Do I just need to bite the bullet and use raw / extra? I'm hoping that Django ORM will prove the exception to the rule that ORMs really only work as designed for simple queries / models, and anything beyond the most basic ones require either seriously gnarly tap-dancing, or stepping out of the abstraction, which somewhat defeats the purpose...
Thanks in advance!

Django query aggregate upvotes in backward relation

I have two models:
Base_Activity:
some fields
User_Activity:
user = models.ForeignKey(settings.AUTH_USER_MODEL)
activity = models.ForeignKey(Base_Activity)
rating = models.IntegerField(default=0) #Will be -1, 0, or 1
Now I want to query Base_Activity, and sort the items that have the most corresponding user activities with rating=1 on top. I want to do something like the query below, but the =1 part is obviously not working.
activities = Base_Activity.objects.all().annotate(
up_votes = Count('user_activity__rating'=1),
).order_by(
'up_votes'
)
How can I solve this?
You cannot use Count like that, as the error message says:
SyntaxError: keyword can't be an expression
The argument of Count must be a simple string, like user_activity__rating.
I think a good alternative can be to use Avg and Count together:
activities = Base_Activity.objects.all().annotate(
a=Avg('user_activity__rating'), c=Count('user_activity__rating')
).order_by(
'-a', '-c'
)
The items with the most rating=1 activities should have the highest average, and among the users with the same average the ones with the most activities will be listed higher.
If you want to exclude items that have downvotes, make sure to add the appropriate filter or exclude operations after annotate, for example:
activities = Base_Activity.objects.all().annotate(
a=Avg('user_activity__rating'), c=Count('user_activity__rating')
).filter(user_activity__rating__gt=0).order_by(
'-a', '-c'
)
UPDATE
To get all the items, ordered by their upvotes, disregarding downvotes, I think the only way is to use raw queries, like this:
from django.db import connection
sql = '''
SELECT o.id, SUM(v.rating > 0) s
FROM user_activity o
JOIN rating v ON o.id = v.user_activity_id
GROUP BY o.id ORDER BY s DESC
'''
cursor = connection.cursor()
result = cursor.execute(sql_select)
rows = result.fetchall()
Note: instead of hard-coding the table names of your models, get the table names from the models, for example if your model is called Rating, then you can get its table name with Rating._meta.db_table.
I tested this query on an sqlite3 database, I'm not sure the SUM expression there works in all DBMS. Btw I had a perfect Django site to test, where I also use upvotes and downvotes. I use a very similar model for counting upvotes and downvotes, but I order them by the sum value, stackoverflow style. The site is open-source, if you're interested.

Grouping Django model entries by day using its datetime field

I'm working with an Article like model that has a DateTimeField(auto_now_add=True) to capture the publication date (pub_date). This looks something like the following:
class Article(models.Model):
text = models.TextField()
pub_date = models.DateTimeField(auto_now_add=True)
I want to do a query that counts how many article posts or entries have been added per day. In other words, I want to query the entries and group them by day (and eventually month, hour, second, etc.). This would look something like the following in the SQLite shell:
select pub_date, count(id) from "myapp_article"
where id = 1
group by strftime("%d", pub_date)
;
Which returns something like:
2012-03-07 18:08:57.456761|5
2012-03-08 18:08:57.456761|9
2012-03-09 18:08:57.456761|1
I can't seem to figure out how to get that result from a Django QuerySet. I am aware of how to get a similar result using itertools.groupby, but that isn't possible in this situation (explanation to follow).
The end result of this query will be used in a graph showing the number of posts per day. I'm attempting to use the Django Chartit package to achieve this goal. Chartit puts a constraint on the data source (DataPool). The source must be a Model, Manager, or QuerySet, so using itertools.groupby is not an option as far as I can tell.
So the question is... How do I group or aggregate the entries by day and end up with a QuerySet object?
Create an extra field that only store date data(not time) and annotate with Count:
Article.objects.extra({'published':"date(pub_date)"}).values('published').annotate(count=Count('id'))
Result will be:
published,count
2012-03-07,5
2012-03-08,9
2012-03-09,1

Query for a ManytoMany Field with Through in Django

I have a models in Django that are something like this:
class Classification(models.Model):
name = models.CharField(choices=class_choices)
...
class Activity(models.Model):
name = models.CharField(max_length=300)
fee = models.ManyToManyField(Classification, through='Fee')
...
class Fee(models.Model):
activity = models.ForeignKey(Activity)
class = models.ForeignKey(Classification)
early_fee = models.IntegerField(decimal_places=2, max_digits=10)
regular_fee = models.IntegerField(decimal_places=2, max_digits=10)
The idea being that there will be a set of fees associated with each Activity and Classification pair. Classification is like Student, Staff, etc.
I know that part works right.
Then in my application, I query for a set of Activities with:
activities = Activity.objects.filter(...)
Which returns a list of activities. I need to display in my template that list of Activities with their Fees. Something like this:
Activity Name
Student Early Price - $4
Student Regular Price - $5
Staff Early Price - $6
Staff Regular Price - $8
But I don't know of an easy way to get this info without a specific get query of the Fees object for each activity/class pair.
I hoped this would work:
activity.fee.all()
But that just returns the Classification Object. Is there a way to get the Fee Object Data for the Pair via the Activities I already queried?
Or am I doing this completely wrong?
Considering michuk's tip to rename "fee" to "classification":
Default name for Fee objects on Activity model will be fee_set. So in order to get your prices, do this:
for a in Activity.objects.all():
a.fee_set.all() #gets you all fees for activity
There's one thing though, as you can see you'll end up doing 1 SELECT on each activity object for fees, there are some apps that can help with that, for example, django-batch-select does only 2 queries in this case.
First of all I think you named your field wrong. This:
fee = models.ManyToManyField(Classification, through='Fee')
should be rather that:
classifications = models.ManyToManyField(Classification, through='Fee')
as ManyToManyField refers to a list of related objects.
In general ManyToManyField, AFAIK, is only a django shortcut to enable easy fetching of all related objects (Classification in your case), with the association table being transparent to the model. What you want is the association table (Fee in your case) not being transparent.
So what I would do is to remove the ManyToManyField field from Activity and simply get all the fees related with the activity. And thenm if you need a Classification for each fee, get the Classification separately.