Count only published videos - django

I have a Category model and a Video model:
Category:
    name = CharField()
Video:
    name = CharField()
    category = ManyToManyField(Category)
    is_live = BooleanField()
I want to get all categories with a video count, but the count should exclude videos that are not live.
This is my starting point:
Category.objects.annotate(video_count=Count('video'))
# I tried this, but I'm not sure if this is the right way
Category.objects.exclude(video__is_live=False)
Any Ideas?

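If you want to filter the rows that the field you are annotating aggregates over, you need to use raw SQL, as you can't do it through the ORM yet. (That was true when this was written; Django 2.0 later added a filter argument to aggregates, e.g. Count('video', filter=Q(video__is_live=True)).) I wrote a blog post about this: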
http://timmyomahony.com/blog/filtering-annotations-django/
Your situation is a little more complicated because you have an M2M relationship, which uses an intermediate table. You need something like the following, which joins the intermediate table to the video table and, for each category, counts only the videos marked is_live=True (this is totally untested, so you will need to play around with it):
categories = Category.objects.all().extra(select={
    "video_count": """
        SELECT COUNT(*)
        FROM myapp_video_category
        JOIN myapp_video ON myapp_video_category.video_id = myapp_video.id
        WHERE myapp_video_category.category_id = myapp_category.id
        AND myapp_video.is_live = True
    """
}).order_by("-video_count")
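To see what that correlated subquery counts, here is a minimal sketch that runs the same COUNT against an in-memory SQLite database using Python's built-in sqlite3 module. The table names (myapp_category, myapp_video, myapp_video_category) are assumptions based on Django's default naming for an app called myapp with an auto-created M2M table, and the rows are made up for illustration. SQLite stores booleans as 0/1, hence is_live = 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myapp_category (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE myapp_video (id INTEGER PRIMARY KEY, name TEXT, is_live BOOLEAN);
-- assumed name of the auto-created M2M table for Video.category
CREATE TABLE myapp_video_category (
    id INTEGER PRIMARY KEY,
    video_id INTEGER REFERENCES myapp_video(id),
    category_id INTEGER REFERENCES myapp_category(id)
);
INSERT INTO myapp_category VALUES (1, 'music'), (2, 'sports'), (3, 'news');
INSERT INTO myapp_video VALUES (1, 'a', 1), (2, 'b', 0), (3, 'c', 1), (4, 'd', 0);
-- videos 1, 2, 3 are in music; video 3 is also in sports; video 4 is in news
INSERT INTO myapp_video_category (video_id, category_id) VALUES
    (1, 1), (2, 1), (3, 1), (3, 2), (4, 3);
""")

# Same shape as the extra(select=...) subquery: for each outer category row,
# count only the live videos linked to it through the intermediate table.
rows = conn.execute("""
SELECT myapp_category.name,
       (SELECT COUNT(*)
          FROM myapp_video_category
          JOIN myapp_video ON myapp_video_category.video_id = myapp_video.id
         WHERE myapp_video_category.category_id = myapp_category.id
           AND myapp_video.is_live = 1) AS video_count
  FROM myapp_category
 ORDER BY video_count DESC, myapp_category.name
""").fetchall()
print(rows)  # [('music', 2), ('sports', 1), ('news', 0)]
```

Because the count lives in a subquery rather than in the WHERE clause of the outer query, a category with no live videos still appears, with a count of 0.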

Related

Django: Joining on fields other than IDs (Using a date field in one model to pull data from a second model)

I'm attempting to use Django to build a simple website. I have a set of blog posts that have a date field attached to indicate the day they were published. I have a table that contains a list of dates and temperatures. On each post, I would like to display the temperature on the day it was published.
The two models are as follows:
class Post(models.Model):
    title = models.CharField(max_length=200)
    text = models.TextField()
    date = models.DateField()
class Temperature(models.Model):
    date = models.DateField()
    temperature = models.IntegerField()
I would like to be able to reference the temperature field from the second table using the date field from the first. Is this possible?
In SQL, this is a simple query. I would do the following:
Select temperature from Temperature t join Post p on t.date = p.date
I think I really have two questions:
Is it possible to brute force this, even if it's not best practice? I've googled a lot and tried using raw sql and objects.extra, but can't get them to do what I want. I'm also wary of relying on them for the long haul.
Since this seems to be a simple task, it seems likely that I'm overcomplicating it by having my models set up sub-optimally. Is there something I'm missing about how I should design my models? That is, what's the best practice for doing something like this? (I've successfully pulled the temperature into my blog post by using a foreign key in the Temperature model. But if I go that route, I don't see how I could easily make sure that my temperature dates get the correct foreign key assigned to them so that the temperature date maps to the correct post date.)
There will likely be better answers than this one, but I'll throw in my 2¢ anyway.
You could try a property inside the Post model that returns the temperature:
@property
def temperature(self):
    try:
        return Temperature.objects.values_list('temperature', flat=True).get(date=self.date)
    except Temperature.DoesNotExist:
        return None
(code not tested)
About your Models:
If you will be displaying the temperature in a Post list (a list of Posts with their temperatures), then maybe it will be simpler to code and a faster query to just add a temperature field to your Post model.
You can keep the Temperature model. Then:
Assuming you already have the temperature data in your Temperature model when a Post instance is created, you can fill in that new field in a custom save method.
If you get the temperature data after the Post is created, you can fill in that new temperature field through a background job (maybe triggered by crontab or similar).
Sometimes database orthogonality (not repeating info in many tables) is not the best strategy. Just something to think about, depending on how often you will be querying the Post models and how simple you want to keep that query code.
I think this might be a basic approach to solving the problem:
post_dates = Post.objects.all().values('date')
result_temperature = Temperature.objects.filter(date__in=post_dates).values('temperature')
Subqueries could be your friend here. Something like the following should work:
from django.db.models import OuterRef, Subquery
temps = Temperature.objects.filter(date=OuterRef('date'))
posts = Post.objects.annotate(temperature=Subquery(temps.values('temperature')[:1]))
for post in posts:
    temperature = post.temperature
Then you can iterate through posts and access the temperature on each post instance.
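In SQL terms, that Subquery annotation is a correlated scalar subquery in the SELECT list. The sketch below runs roughly equivalent SQL with Python's sqlite3 module; the table names and rows are assumptions, and Django's generated SQL will differ in aliasing and quoting. A post with no matching temperature row gets NULL (None in Python) instead of disappearing, as it would with an inner join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blog_post (id INTEGER PRIMARY KEY, title TEXT, date TEXT);
CREATE TABLE blog_temperature (id INTEGER PRIMARY KEY, date TEXT, temperature INTEGER);
INSERT INTO blog_post VALUES (1, 'First', '2012-03-07'), (2, 'Second', '2012-03-08');
INSERT INTO blog_temperature VALUES (1, '2012-03-07', 12);
""")

# For each post row, look up the temperature recorded on the same date.
# LIMIT 1 plays the role of the [:1] slice on the Subquery queryset.
posts = conn.execute("""
SELECT p.title,
       (SELECT t.temperature
          FROM blog_temperature t
         WHERE t.date = p.date
         LIMIT 1) AS temperature
  FROM blog_post p
 ORDER BY p.id
""").fetchall()
print(posts)  # [('First', 12), ('Second', None)]
```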

Django - joining multiple tables (models) and filtering out based on their attribute

I'm new to Django and ORMs in general, so I have trouble coming up with a query that joins multiple tables.
I have 4 Models that need joining - Category, SubCategory, Product and Packaging, example values would be:
Category: 'male'
SubCategory: 'shoes'
Product: 'nikeXYZ'
Packaging: 'size_36: 1'
Each model has an FK to the model above it (i.e. SubCategory has a field category, etc.).
My question is - how can I filter Product given a Category (e.g. male) and only show products which have Packaging attribute available set to True? Obviously I want to minimise the hits on my database (ideally do it with 1 SQL query).
I could do something along these lines:
available = Product.objects.filter(packaging__available=True)
subcategories = SubCategory.objects.filter(category_id=<id_of_male>)
products = available.filter(subcategory_id__in=subcategories)
but then that requires at least 2 hits on the database (available, subcategories), I think. Is there a way to do it in one go?
Try this:
lookup = {'packaging__available': True, 'subcategory__category_id__in': ['ids of males']}
product_objs = Product.objects.filter(**lookup)
Try to read:
this
You can query with _set, with chained __ lookups (to follow FKs between models), or by building a list of ids.
I think this should work, but it's not tested:
Product.objects.filter(packaging__available=True, subcategory__category_id__in=[id_of_male])
Since Product has a forward FK named subcategory (as in the question's subcategory_id__in), the lookup uses that field name. A related_name, or the default lowercase model name, only matters when you follow a relation in the reverse direction, as packaging__available does here.
Probably subcategory__category_id__in=[id_of_male] can be simplified to subcategory__category_id=id_of_male.
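All of those filter() conditions collapse into a single SELECT with JOINs; the sub-categories don't need a separate query. The sketch below approximates that one query with Python's sqlite3 module; the table and column names are assumptions modelled on Django's defaults, not the exact SQL Django emits. Because Packaging points at Product, a product with several available packagings comes back once per matching row, which is why the Django version often needs .distinct() as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shop_category (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE shop_subcategory (id INTEGER PRIMARY KEY, name TEXT,
    category_id INTEGER REFERENCES shop_category(id));
CREATE TABLE shop_product (id INTEGER PRIMARY KEY, name TEXT,
    subcategory_id INTEGER REFERENCES shop_subcategory(id));
CREATE TABLE shop_packaging (id INTEGER PRIMARY KEY, name TEXT, available INTEGER,
    product_id INTEGER REFERENCES shop_product(id));
INSERT INTO shop_category VALUES (1, 'male'), (2, 'female');
INSERT INTO shop_subcategory VALUES (1, 'shoes', 1), (2, 'shirts', 1), (3, 'shoes', 2);
INSERT INTO shop_product VALUES (1, 'nikeXYZ', 1), (2, 'shirtABC', 2), (3, 'pumaQ', 3);
INSERT INTO shop_packaging VALUES
    (1, 'size_36', 1, 1), (2, 'size_37', 1, 1),  -- two available sizes for nikeXYZ
    (3, 'size_M', 0, 2),                          -- nothing available for shirtABC
    (4, 'size_38', 1, 3);                         -- pumaQ is in the 'female' category
""")

# One query: product -> subcategory (forward FK) and product <- packaging (reverse FK).
FILTER_SQL = """
SELECT {distinct} pr.id, pr.name
  FROM shop_product pr
  JOIN shop_subcategory sc ON pr.subcategory_id = sc.id
  JOIN shop_packaging pk ON pk.product_id = pr.id
 WHERE sc.category_id = ? AND pk.available = 1
 ORDER BY pr.id
"""
male_id = 1

# Without DISTINCT, nikeXYZ appears once per available packaging row.
plain = conn.execute(FILTER_SQL.format(distinct=""), (male_id,)).fetchall()
unique = conn.execute(FILTER_SQL.format(distinct="DISTINCT"), (male_id,)).fetchall()
print(plain)   # [(1, 'nikeXYZ'), (1, 'nikeXYZ')]
print(unique)  # [(1, 'nikeXYZ')]
```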

Grouping Django model entries by day using its datetime field

I'm working with an Article-like model that has a DateTimeField(auto_now_add=True) to capture the publication date (pub_date). It looks something like this:
class Article(models.Model):
    text = models.TextField()
    pub_date = models.DateTimeField(auto_now_add=True)
I want to do a query that counts how many article posts or entries have been added per day. In other words, I want to query the entries and group them by day (and eventually month, hour, second, etc.). This would look something like the following in the SQLite shell:
select pub_date, count(id) from "myapp_article"
where id = 1
group by strftime("%d", pub_date)
;
Which returns something like:
2012-03-07 18:08:57.456761|5
2012-03-08 18:08:57.456761|9
2012-03-09 18:08:57.456761|1
I can't seem to figure out how to get that result from a Django QuerySet. I am aware of how to get a similar result using itertools.groupby, but that isn't possible in this situation (explanation to follow).
The end result of this query will be used in a graph showing the number of posts per day. I'm attempting to use the Django Chartit package to achieve this goal. Chartit puts a constraint on the data source (DataPool). The source must be a Model, Manager, or QuerySet, so using itertools.groupby is not an option as far as I can tell.
So the question is... How do I group or aggregate the entries by day and end up with a QuerySet object?
Create an extra field that stores only the date (not the time) and annotate it with Count:
Article.objects.extra({'published':"date(pub_date)"}).values('published').annotate(count=Count('id'))
Result will be:
published,count
2012-03-07,5
2012-03-08,9
2012-03-09,1
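This works because SQLite's date() function strips the time part, and calling values() before annotate() makes Django GROUP BY that expression. The sketch below shows the equivalent grouping query in plain SQL with Python's sqlite3 module; the table name and timestamps are assumptions. On newer Django versions, the TruncDate function in django.db.models.functions covers the same need without extra().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myapp_article (id INTEGER PRIMARY KEY, text TEXT, pub_date TEXT)")
conn.executemany(
    "INSERT INTO myapp_article (text, pub_date) VALUES (?, ?)",
    [
        ("a", "2012-03-07 18:08:57.456761"),
        ("b", "2012-03-07 21:15:00.000000"),
        ("c", "2012-03-08 09:00:00.000000"),
    ],
)

# date(pub_date) drops the time component, so rows from the same day share a group.
per_day = conn.execute("""
SELECT date(pub_date) AS published, COUNT(id) AS count
  FROM myapp_article
 GROUP BY published
 ORDER BY published
""").fetchall()
print(per_day)  # [('2012-03-07', 2), ('2012-03-08', 1)]
```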

Django models: retrieving unique foreign key instances

I have two tables like so:
class Collection(models.Model):
    name = models.CharField()
class Image(models.Model):
    name = models.CharField()
    image = models.ImageField()
    collection = models.ForeignKey(Collection)
I'd like to retrieve the first image out of every collection. I have attempted:
image_list = Image.objects.order_by('collection.id').distinct('collection.id')
but it didn't work out the way I expected :(
Any ideas?
Thanks.
Don't use dots to separate fields that span relations in Django; the double-underscore convention is used instead -- it means "follow this relation to get to this field"
this is more correct:
image_list = Image.objects.order_by('collection__id').distinct('collection__id')
However, it probably doesn't do what you want.
The concept of "first" doesn't always apply in relational databases the way you seem to be using it. For all of the records in the image table with the same collection id, there is no record which is 'first' or 'last' -- they're all just records. You could put another field on that table to define a specific order, or you could order by id, or alphabetically by name, but none of those will happen by default.
What will probably work best for you is to get the list of collections with one query, and then get a single item per collection, in separate queries:
collection_ids = Image.objects.values_list('collection', flat=True).distinct()
image_list = [
    Image.objects.filter(collection__id=c)[0] for c in collection_ids
]
If you want to apply an order to the Images, to define which is 'first', then modify it like this:
collection_ids = Image.objects.values_list('collection', flat=True).distinct()
image_list = [
    Image.objects.filter(collection__id=c).order_by('-id')[0] for c in collection_ids
]
You could also write raw SQL -- MySQL aggregation has the interesting property that fields which are not aggregated over can still appear in the final output, and essentially take a random value from the set of matching records. Something like this might work:
Image.objects.raw("SELECT app_image.* FROM app_image GROUP BY collection_id")
This query should get you one image from each collection, but you will have no control over which one is returned.
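If you need control over which image is returned, and a query that also works where non-aggregated columns aren't allowed in a GROUP BY (e.g. PostgreSQL, or MySQL with ONLY_FULL_GROUP_BY enabled), one common alternative is to pick one id per collection with an aggregate such as MIN(id) and then select those rows. This is a swapped-in technique, not the one described above; the sketch runs it with Python's sqlite3 module against assumed app_collection/app_image tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_collection (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE app_image (id INTEGER PRIMARY KEY, name TEXT,
    collection_id INTEGER REFERENCES app_collection(id));
INSERT INTO app_collection VALUES (1, 'holiday'), (2, 'work');
INSERT INTO app_image VALUES
    (1, 'beach', 1), (2, 'work-desk', 2), (3, 'sunset', 1), (4, 'office', 2);
""")

# The inner query picks exactly one id per collection (the smallest one);
# the outer query fetches the full rows for those ids.
first_images = conn.execute("""
SELECT app_image.*
  FROM app_image
 WHERE app_image.id IN (SELECT MIN(id) FROM app_image GROUP BY collection_id)
 ORDER BY app_image.collection_id
""").fetchall()
print(first_images)  # [(1, 'beach', 1), (2, 'work-desk', 2)]
```

Swapping MIN for MAX gives the newest image per collection instead, matching order_by('-id').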
As written in my comment, you cannot use specific fields with distinct under MySQL. However, you can achieve the same result with the following:
from itertools import groupby
all_images = Image.objects.order_by('collection__id')
images_by_collection = groupby(all_images, lambda image: image.collection_id)
# each group is an iterator over one collection's images; next() takes the first
image_list = [next(group) for key, group in images_by_collection]
Unfortunately, this results in a "bigger" query to the DB (all images are retrieved).
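Two details make the groupby version work: each group is a lazy iterator, so next(group) takes its first element, and itertools.groupby only merges adjacent items, which is why the images must be ordered by collection first. The sketch below runs the same logic on plain Python objects, using a hypothetical Image dataclass as a stand-in for model instances, so it can be tried without Django.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Image:  # hypothetical stand-in for the Django model instance
    id: int
    name: str
    collection_id: int


all_images = [
    Image(3, "sunset", 1),
    Image(2, "work-desk", 2),
    Image(1, "beach", 1),
    Image(4, "office", 2),
]

# Like order_by('collection__id', 'id'): groupby only merges *adjacent* items,
# and the secondary key on id decides which image counts as "first".
ordered = sorted(all_images, key=lambda image: (image.collection_id, image.id))
images_by_collection = groupby(ordered, key=lambda image: image.collection_id)

# Take the first image of each group; each group is an iterator, so use next().
image_list = [next(group) for _, group in images_by_collection]
print([image.name for image in image_list])  # ['beach', 'work-desk']
```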
dict([(c.id, c.image_set.all()[0]) for c in Collection.objects.all()])
That will create a dictionary of the first image (by default ordering) in each collection, keyed by the collection's id. Be aware, though, that this will generate 1+N queries, where N is the total number of collection objects.
To get around that, you'll either need to wait for Django 1.4 and prefetch_related or use something like django-batch-select.
First get the distinct result, then do your filters.
I think you should try this one.
image_list = Image.objects.distinct()
image_list = image_list.order_by('collection__id')

Annotating a Django queryset with a left outer join?

Say I have a model:
class Foo(models.Model):
    ...
and another model that basically gives per-user information about Foo:
class UserFoo(models.Model):
    user = models.ForeignKey(User)
    foo = models.ForeignKey(Foo)
    ...
    class Meta:
        unique_together = ("user", "foo")
I'd like to generate a queryset of Foos but annotated with the (optional) related UserFoo based on user=request.user.
So it's effectively a LEFT OUTER JOIN on (foo.id = userfoo.foo_id AND userfoo.user_id = ...)
A solution with raw might look like
foos = Foo.objects.raw("SELECT foo.* FROM foo LEFT OUTER JOIN userfoo ON (foo.id = userfoo.foo_id AND userfoo.user_id = %s)", [request.user.id])
You'll need to modify the SELECT to include extra fields from userfoo which will be annotated to the resulting Foo instances in the queryset.
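What makes this raw query correct is that the user condition sits in the ON clause, so every Foo row survives and only the matching UserFoo (if any) is attached. Below is a minimal sketch of that join with Python's sqlite3 module, assuming plain table names foo and userfoo and a hypothetical comment column on userfoo, selected alongside foo.* as suggested above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foo (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE userfoo (id INTEGER PRIMARY KEY, user_id INTEGER,
    foo_id INTEGER REFERENCES foo(id), comment TEXT,
    UNIQUE (user_id, foo_id));
INSERT INTO foo VALUES (1, 'first'), (2, 'second'), (3, 'third');
INSERT INTO userfoo (user_id, foo_id, comment) VALUES
    (7, 1, 'mine'), (8, 1, 'someone else'), (8, 2, 'also not mine');
""")

current_user_id = 7
# userfoo.comment is added to the SELECT list; the hypothetical column ends up on each row.
rows = conn.execute("""
SELECT foo.*, userfoo.comment AS user_comment
  FROM foo
  LEFT OUTER JOIN userfoo
    ON (foo.id = userfoo.foo_id AND userfoo.user_id = ?)
 ORDER BY foo.id
""", (current_user_id,)).fetchall()
print(rows)  # [(1, 'first', 'mine'), (2, 'second', None), (3, 'third', None)]
```

The unique_together constraint guarantees at most one UserFoo per (user, foo), so the join never duplicates a Foo row.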
This answer might not be exactly what you are looking for, but since this is the first result on Google when searching for "django annotate outer join", I will post it here.
Note: tested on Django 1.7
Suppose you have the following models
class User(models.Model):
    name = models.CharField()
class EarnedPoints(models.Model):
    points = models.PositiveIntegerField()
    user = models.ForeignKey(User, related_name="earned_points")
To get total user points, you might do something like this:
User.objects.annotate(points=Sum("earned_points__points"))
This works, but it will not return users who have no points; here we need an outer join without any direct hacks or raw SQL.
You can achieve that by doing this
users_with_points = User.objects.annotate(points=Sum("earned_points__points"))
result = users_with_points | User.objects.exclude(pk__in=users_with_points)
This will be translated into a LEFT OUTER JOIN, and all users will be returned. Users who have no points will have None in their points attribute.
Hope that helps
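For reference, this is the kind of SQL involved: a LEFT OUTER JOIN keeps users without any EarnedPoints rows, and SUM over no rows yields NULL (None in Python). The sketch uses Python's sqlite3 module with assumed table names. Wrapping the sum in COALESCE turns that NULL into 0; on the Django side, the Coalesce function in django.db.models.functions does the same in newer versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE app_earnedpoints (id INTEGER PRIMARY KEY, points INTEGER,
    user_id INTEGER REFERENCES app_user(id));
INSERT INTO app_user VALUES (1, 'ann'), (2, 'bob');
INSERT INTO app_earnedpoints (points, user_id) VALUES (10, 1), (5, 1);
""")

rows = conn.execute("""
SELECT u.name,
       SUM(p.points) AS points,                  -- NULL when the user has no rows
       COALESCE(SUM(p.points), 0) AS points_or_zero
  FROM app_user u
  LEFT OUTER JOIN app_earnedpoints p ON p.user_id = u.id
 GROUP BY u.id, u.name
 ORDER BY u.id
""").fetchall()
print(rows)  # [('ann', 15, 15), ('bob', None, 0)]
```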
Notice: This method does not work in Django 1.6+. As explained in tcarobruce's comment below, the promote argument was removed as part of ticket #19849: ORM Cleanup.
Django doesn't provide an entirely built-in way to do this, but it's not necessary to construct an entirely raw query. (This method doesn't work for selecting * from UserFoo, so I'm using .comment as an example field to include from UserFoo.)
The QuerySet.extra() method allows us to add terms to the SELECT and WHERE clauses of our query. We use this to include the fields from UserFoo table in our results, and limit our UserFoo matches to the current user.
results = Foo.objects.extra(
select={"user_comment": "UserFoo.comment"},
where=["(UserFoo.user_id IS NULL OR UserFoo.user_id = %s)"],
params=[request.user.id]
)
This query still needs the UserFoo table. It would be possible to use .extra(tables=...) to get an implicit INNER JOIN, but for an OUTER JOIN we need to modify the internal query object ourselves.
connection = (
    Foo._meta.db_table, UserFoo._meta.db_table,  # JOIN these tables
    "id", "foo_id",  # on these fields (foo.id = userfoo.foo_id)
)
results.query.join(  # modify the query
    connection,  # with this table connection
    promote=True,  # as LEFT OUTER JOIN
)
We can now evaluate the results. Each instance will have a .user_comment property containing the value from UserFoo, or None if it doesn't exist.
print(results[0].user_comment)
(Credit to this blog post by Colin Copeland for showing me how to do OUTER JOINs.)
I stumbled upon this problem and was unable to solve it without resorting to raw SQL, but I did not want to rewrite the entire query.
Below is a description of how you can augment a queryset with external raw SQL, without having to care about the actual query that generates the queryset.
Here's a typical scenario: you have a reddit-like site with a LinkPost model and a UserPostVote model, like this:
class LinkPost(models.Model):
    ...  # some fields
class UserPostVote(models.Model):
    user = models.ForeignKey(User, related_name="post_votes")
    post = models.ForeignKey(LinkPost, related_name="user_votes")
    value = models.IntegerField(null=False, default=0)
where the userpostvote table collects users' votes on posts.
Now you're trying to display the front page for a user with a pagination app, but you want the arrows to be red for posts the user has voted on.
First you get the posts for the page:
post_list = LinkPost.objects.all()
paginator = Paginator(post_list,25)
posts_page = paginator.page(request.GET.get('page'))
So now you have posts_page, a Page generated by the Django paginator, whose object_list is the QuerySet that selects the posts to display. How do we now annotate each post with the user's vote before rendering it in a template?
Here's where it gets tricky, and I was unable to find a clean ORM solution. select_related won't let you fetch only the votes belonging to the logged-in user, looping over the posts would run a bunch of queries instead of one, and doing it all in raw SQL means we can't use the queryset from the pagination app.
So here's how I do it:
q1 = posts_page.object_list.query  # The query object of the queryset
q1_alias = q1.get_initial_alias()  # This forces the query object to generate its SQL
(q1str, q1param) = q1.sql_with_params()  # This gets the SQL for the query along with its
                                         # parameters, which are none in this example
We now have the SQL for the queryset; we just wrap it as a subquery, alias it, and LEFT OUTER JOIN to it:
q2_augment = ("SELECT B.value AS uservote, A.* "
              "FROM (" + q1str + ") A LEFT OUTER JOIN reddit_userpostvote B "
              "ON A.id = B.post_id AND B.user_id = %s")
q2param = (request.user.id,)
posts_augmented = LinkPost.objects.raw(q2_augment,q1param+q2param)
voila! Now we can access post.uservote for a post in the augmented queryset.
And we just hit the database with a single query.
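The wrapping trick works because any SELECT, including one carrying the paginator's LIMIT/OFFSET, can be used as a derived table. The sketch below reproduces it with Python's sqlite3 module: a hand-written stand-in for the page query (the table names reddit_linkpost and reddit_userpostvote and the rows are assumptions) is wrapped as A and left-joined to the votes, with the user id passed after the inner query's own parameters, just as q1param + q2param does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reddit_linkpost (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE reddit_userpostvote (id INTEGER PRIMARY KEY, user_id INTEGER,
    post_id INTEGER REFERENCES reddit_linkpost(id), value INTEGER NOT NULL DEFAULT 0);
INSERT INTO reddit_linkpost VALUES (1, 'p1'), (2, 'p2'), (3, 'p3'), (4, 'p4');
INSERT INTO reddit_userpostvote (user_id, post_id, value) VALUES
    (7, 3, 1), (8, 3, -1), (7, 1, 1);
""")

# Hand-written stand-in for the paginated queryset's SQL: page 2, 2 posts per page.
q1str = "SELECT id, title FROM reddit_linkpost ORDER BY id LIMIT ? OFFSET ?"
q1param = (2, 2)

q2_augment = ("SELECT B.value AS uservote, A.* "
              "FROM (" + q1str + ") A LEFT OUTER JOIN reddit_userpostvote B "
              "ON A.id = B.post_id AND B.user_id = ? "
              "ORDER BY A.id")
q2param = (7,)  # the logged-in user's id

# Placeholders are positional: the inner query's parameters come first.
rows = conn.execute(q2_augment, q1param + q2param).fetchall()
print(rows)  # [(1, 3, 'p3'), (None, 4, 'p4')]
```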
The two queries you suggest are as good as you're going to get (without using raw()); this type of query isn't representable in the ORM at present.
You could do this using simonw's django-queryset-transform to avoid hard-coding a raw SQL query - the code would look something like this:
def userfoo_retriever(qs):
    # key by foo_id (not the UserFoo's own pk) so each Foo can find its UserFoo
    userfoos = dict((i.foo_id, i) for i in UserFoo.objects.filter(foo__in=qs))
    for i in qs:
        i.userfoo = userfoos.get(i.pk, None)
for foo in Foo.objects.filter(…).transform(userfoo_retriever):
    print(foo.userfoo)
This approach has been quite successful for this need and to efficiently retrieve M2M values; your query count won't be quite as low but on certain databases (cough MySQL cough) doing two simpler queries can often be faster than one with complex JOINs and many of the cases where I've most needed it had additional complexity which would have been even harder to hack into an ORM expression.
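The transform approach boils down to two simple queries stitched together in Python: fetch the Foos, fetch the current user's matching UserFoos with one IN query, and attach them through a dict keyed by foo_id. Here is that pattern sketched with Python's sqlite3 module instead of the ORM; the table names and the comment column are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foo (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE userfoo (id INTEGER PRIMARY KEY, user_id INTEGER,
    foo_id INTEGER REFERENCES foo(id), comment TEXT);
INSERT INTO foo VALUES (1, 'first'), (2, 'second'), (3, 'third');
INSERT INTO userfoo (user_id, foo_id, comment) VALUES
    (7, 1, 'mine'), (8, 2, 'not mine'), (7, 3, 'also mine');
""")

current_user_id = 7

# Query 1: the Foo rows (stands in for Foo.objects.filter(...)).
foos = [{"id": fid, "title": title}
        for fid, title in conn.execute("SELECT id, title FROM foo ORDER BY id")]

# Query 2: every matching UserFoo row for this user, fetched in one IN (...) query.
placeholders = ", ".join("?" for _ in foos)
userfoo_rows = conn.execute(
    f"SELECT foo_id, comment FROM userfoo WHERE user_id = ? AND foo_id IN ({placeholders})",
    [current_user_id] + [foo["id"] for foo in foos],
).fetchall()

# Key by foo_id so each Foo can look up its own row; missing ones become None.
userfoos = {foo_id: comment for foo_id, comment in userfoo_rows}
for foo in foos:
    foo["userfoo_comment"] = userfoos.get(foo["id"])

print([foo["userfoo_comment"] for foo in foos])  # ['mine', None, 'also mine']
```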
As for outer joins:
Once you have a queryset qs from foo that includes a reference to columns from userfoo, you can promote the inner join to an outer join with
qs.query.promote_joins(["userfoo"])
You shouldn't have to resort to extra or raw for this.
The following should work.
from django.db.models import F, Q
Foo.objects.filter(
    Q(userfoo__user=request.user) |
    Q(userfoo=None)  # This forces the use of a LEFT OUTER JOIN.
).annotate(
    comment=F('userfoo__comment'),
    # ... annotate all the fields you'd like to see added here.
)
The only way I see to do this without using raw etc. is something like this:
from django.db.models import Case, Q, When
Foo.objects.filter(
    Q(userfoo__isnull=True) | Q(userfoo__isnull=False)
).annotate(bar=Case(
    When(userfoo__user=request.user, then='userfoo__bar')
))
The double Q trick ensures that you get your left outer join.
Unfortunately you can't set your request.user condition in the filter() since it may filter out successful joins on UserFoo instances with the wrong user, hence filtering out rows of Foo that you wanted to keep (which is why you ideally want the condition in the ON join clause instead of in the WHERE clause).
Because you can't filter out the rows that have an unwanted user value, you have to select rows from UserFoo with a CASE.
Note also that one Foo may join to many UserFoo records, so you may want to consider some way to retrieve distinct Foos from the output.
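The point about ON versus WHERE is easy to see in plain SQL. In this sketch (Python's sqlite3 module, assumed table names foo and userfoo, made-up rows), moving the user condition from the ON clause into the WHERE clause silently drops a Foo that only has a UserFoo row for a different user:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foo (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE userfoo (id INTEGER PRIMARY KEY, user_id INTEGER,
    foo_id INTEGER REFERENCES foo(id), bar TEXT);
INSERT INTO foo VALUES (1, 'mine'), (2, 'theirs'), (3, 'nobody');
INSERT INTO userfoo (user_id, foo_id, bar) VALUES (7, 1, 'x'), (8, 2, 'y');
""")
user_id = 7

# Condition in ON: every Foo is kept; bar is NULL where this user has no row.
in_on = conn.execute("""
SELECT foo.id, userfoo.bar FROM foo
  LEFT OUTER JOIN userfoo ON foo.id = userfoo.foo_id AND userfoo.user_id = ?
 ORDER BY foo.id
""", (user_id,)).fetchall()

# Same condition in WHERE (even with the IS NULL escape hatch): Foo 2 joins
# to user 8's row, fails the test, and disappears from the result.
in_where = conn.execute("""
SELECT foo.id, userfoo.bar FROM foo
  LEFT OUTER JOIN userfoo ON foo.id = userfoo.foo_id
 WHERE userfoo.user_id = ? OR userfoo.user_id IS NULL
 ORDER BY foo.id
""", (user_id,)).fetchall()

print(in_on)     # [(1, 'x'), (2, None), (3, None)]
print(in_where)  # [(1, 'x'), (3, None)]
```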
maparent's comment put me on the right track:
from django.db.models.sql.datastructures import Join
for alias in qs.query.alias_map.values():
    if isinstance(alias, Join):
        alias.nullable = True
qs.query.promote_joins(qs.query.tables)