Django queryset - Group on basis of foreign key and get count

I have 3 tables as follows:
class Bike:
    name = CharField(...)
    cc_range = IntegerField(...)
class Item:
    bike_number = CharField(...)
    bike = ForeignKey(Bike)
class Booking:
    start_time = DateTimeField(...)
    end_time = DateTimeField(...)
    item = ForeignKey(Item, related_name='bookings')
I want to get a list of all the bikes that have items not booked during a period of time (say, ["2016-01-09", "2016-01-11"]), together with the number of such items for each bike.
For example, say there are two bikes b1, b2 with items i11, i12 and i21, i22. If i21 is involved in a booking (say ["2016-01-10", "2016-01-12"]) then I want something like
{"b1": 2, "b2": 1}
I have got the relevant items by
(Item.objects
    .exclude(bookings__start_time__range=booking_period)
    .exclude(bookings__end_time__range=booking_period))
but am not able to group them.
I also tried:
(Bike.objects
    .exclude(item__bookings__start_time__range=booking_period)
    .exclude(item__bookings__end_time__range=booking_period)
    .annotate(items_count=Count('item')))
But it removes the whole bike if any of its items is booked.
I seem to be totally stuck. I would prefer doing this without a for loop. The Django documentation doesn't seem to help me out either (which is rare). Is there a problem with my model architecture for the type of problem I want to solve, or am I missing something? Any help would be appreciated.
Thanks in advance !!

from django.db.models import Q, Count, Case, When, Value, BooleanField

bikes = models.Bike.objects.annotate(
    booked=Case(
        # A booking overlaps the period if it starts on or before the period's
        # end and ends on or after the period's start.
        When(Q(item__bookings__start_time__lte=booking_period[1]) &
             Q(item__bookings__end_time__gte=booking_period[0]),
             then=Value(True)),
        default=Value(False),
        output_field=BooleanField(),
    )
).filter(booked=False).annotate(item_count=Count('item'))
Please read the documentation about conditional expressions.
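The overlap test used in the When clause can be checked outside the ORM. The following is a minimal plain-Python sketch, not the ORM query itself: a booking overlaps the period when it starts on or before the period's end and ends on or after the period's start. The data mirrors the example in the question; the overlaps and free_item_counts helpers and the tuple layout are assumptions made for illustration.

```python
from collections import Counter
from datetime import date


def overlaps(start, end, period):
    """Return True if [start, end] intersects period = (period_start, period_end)."""
    return start <= period[1] and end >= period[0]


def free_item_counts(items, bookings, period):
    """Count, per bike, the items that have no booking overlapping the period.

    items: list of (item_id, bike_id); bookings: list of (item_id, start, end).
    """
    busy = {item for item, start, end in bookings if overlaps(start, end, period)}
    return dict(Counter(bike for item, bike in items if item not in busy))


# Hypothetical data taken from the example in the question.
items = [("i11", "b1"), ("i12", "b1"), ("i21", "b2"), ("i22", "b2")]
bookings = [("i21", date(2016, 1, 10), date(2016, 1, 12))]
period = (date(2016, 1, 9), date(2016, 1, 11))

result = free_item_counts(items, bookings, period)
print(result)  # {'b1': 2, 'b2': 1}
```

Note that in this sketch a bike whose items are all booked simply does not appear in the result.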

Related

Django ORM subquery with window function

I'm trying to do this query with Django's ORM:
SELECT
    id,
    pn,
    revision,
    description
FROM (SELECT
          id,
          pn,
          revision,
          MAX(revision) OVER (PARTITION BY pn) max_rev,
          description
      FROM table) maxarts
WHERE revision = max_rev
The result needs to be a queryset. I have tried every combination of Window/OuterRef/Subquery I know, with no success.
Do I have to use a raw query?
Thanks in advance
Marco
EDIT #1:
I'll try to explain better. I have a model that looks like this:
class Article(models.Model):
    pn = models.CharField()
    revision = models.CharField()
    description = models.CharField()

    class Meta:
        unique_together = [("pn", "revision")]
The data is something like:
pn1 rev1 description
pn1 rev2 description
pn2 rev1 anotherdescription
pn1 rev3 description
pn2 rev2 anotherdescription
I need a queryset containing, for each pn, only the row with the Max("revision") value. The revision increments every time a user makes a modification to the object.
I hope that is more clear now. Thanks!
EDIT #2
As suggested, I'm writing what I've already tried:
Raw SQL using the query written in the first message, selecting only the id field and passing it to the ORM as id__in=ids. Slow as hell, unusable.
Declared a Window function to use as a filter:
Article.objects.annotate(
    max_rev=Window(expression=Max("revision"), partition_by=F("pn"))
).filter(revision=F("max_rev"))
But Django complained that I cannot use a window function in a WHERE clause (that's correct).
Then I tried to use the window as a subquery:
window_query = Article.objects.annotate(
    max_rev=Window(expression=Max("revision"), partition_by=F("pn"))
)
result = Article.objects.filter(revision=Subquery(window_query))
I also tried with OuterRef, to use the max_rev annotation as a join, with no luck.
I'm out of ideas!
I think you can get what you are after, not much different to what you had, by using FirstValue rather than Max:
>>> window_query = Article.objects.annotate(max_id=Window(
...     expression=FirstValue("id"),
...     partition_by=F("pn"),
...     order_by=F("revision").desc()
... )).values("max_id")
>>> list(Article.objects.filter(id__in=Subquery(window_query)))
[<Article: Article object (4)>, <Article: Article object (5)>]
This produces SQL like: SELECT * FROM articles_article WHERE id IN (SELECT FIRST_VALUE(id) OVER (PARTITION BY pn ORDER BY revision DESC) AS max_id FROM articles_article).
The subquery says order your window by revision descending, partitioned by pn, and take the first ID from each partition; then we use that in the parent query to fetch the relevant Articles for those IDs.
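To see what that SQL returns without a Django project, here is a small sketch using Python's built-in sqlite3 module. The table name article and the five rows (taken from the sample data in the question) are assumptions for illustration, and window functions need SQLite 3.25 or newer. Because revision is text, it is compared as a string, so a value like "rev10" would sort before "rev2".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (id INTEGER PRIMARY KEY, pn TEXT, revision TEXT)")
conn.executemany(
    "INSERT INTO article (pn, revision) VALUES (?, ?)",
    [("pn1", "rev1"), ("pn1", "rev2"), ("pn2", "rev1"), ("pn1", "rev3"), ("pn2", "rev2")],
)

# Same shape as the SQL above: for every row, the inner query yields the id of
# the highest revision within that row's pn partition.
latest = conn.execute(
    """
    SELECT id, pn, revision FROM article
    WHERE id IN (
        SELECT FIRST_VALUE(id) OVER (PARTITION BY pn ORDER BY revision DESC)
        FROM article
    )
    ORDER BY id
    """
).fetchall()
print(latest)  # [(4, 'pn1', 'rev3'), (5, 'pn2', 'rev2')]
```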
On PostgreSQL, you could also do:
>>> Article.objects.order_by('pn', '-revision').distinct('pn')
<QuerySet [<Article: Article object (4)>, <Article: Article object (5)>]>
This produces SQL like SELECT DISTINCT ON (pn) * FROM articles_article ORDER BY pn ASC, revision DESC.
So every time a revision is made against an article, a new row is created in the table?
If so, all you would need to do is perform a count query that counts all rows and groups them according to the 'pn' field. If you want to use the Max function, then I would suggest replacing the 'pn' field with an IntegerField or DecimalField rather than using a CharField. Although depending on where your application is at, that might be pretty difficult.
from django.db.models import Count
Article.objects.values('pn').annotate(maxvalues=Count('pn'))

multiple Django annotate Count over reverse relation of a foreign key with an exclude returns a strange result (18)

The strangest thing, either I'm missing something basic, or maybe a django bug
for example:
class Author(Model):
    name = CharField()
class Parent(Model):
    name = CharField()
class Subscription(Model):
    parent = ForeignKey(Parent, related_name='subscriptions')
class Book(Model):
    name = CharField()
    good_book = BooleanField()
    author = ForeignKey(Author, related_name='books')
class AggregatePerson(Model):
    author = OneToOneField(Author, related_name='+')
    parent = OneToOneField(Parent, related_name='+')
when I try:
AggregatePerson.objects.annotate(counter=Count('author__books')).order_by('counter')
everything works correctly: both the ordering and the counter field show the correct number. But if I add the following:
AggregatePerson.objects.annotate(existing_subs=Count('parent__subscriptions')).exclude(existing_subs=0).annotate(counter=Count('author__books')).order_by('counter')
then the counter and existing_subs fields both become 18.
Why 18? And what am I doing wrong?
Thanks for the help!
EDIT clarification after further research:
18 is the number of parent__subscriptions. The code breaks even without the exclude; for some reason counter also gets the value of existing_subs.
I found the answer to this issue.
Tl;dr:
You need to add distinct=True inside each Count, like this:
AggregatePerson.objects.annotate(existing_subs=Count('parent__subscriptions', distinct=True), counter=Count('author__books', distinct=True))
Longer version:
Each Count annotation over a multi-valued relation adds a LEFT OUTER JOIN behind the scenes. With two such annotations the joins multiply: every joined books row is paired with every joined subscriptions row, so both counts are computed over the combined rows and come out inflated (each equals the product of the two real counts). distinct=True makes each Count consider only distinct related rows, which restores the real numbers.
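The effect is easy to reproduce in plain SQL. Below is a minimal sketch with Python's built-in sqlite3 module; the flattened person/book/subscription schema and the numbers (3 books, 6 subscriptions) are assumptions chosen for illustration, not the schema from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (id INTEGER PRIMARY KEY);
    CREATE TABLE book (id INTEGER PRIMARY KEY, person_id INTEGER);
    CREATE TABLE subscription (id INTEGER PRIMARY KEY, person_id INTEGER);
    INSERT INTO person (id) VALUES (1);
    """
)
# One person with 3 books and 6 subscriptions (made-up numbers).
conn.executemany("INSERT INTO book (person_id) VALUES (?)", [(1,)] * 3)
conn.executemany("INSERT INTO subscription (person_id) VALUES (?)", [(1,)] * 6)

# Two joins on the same parent row produce 3 x 6 = 18 combined rows.
joins = """
    FROM person p
    LEFT OUTER JOIN book b ON b.person_id = p.id
    LEFT OUTER JOIN subscription s ON s.person_id = p.id
    GROUP BY p.id
"""
plain = conn.execute("SELECT COUNT(b.id), COUNT(s.id)" + joins).fetchone()
distinct = conn.execute(
    "SELECT COUNT(DISTINCT b.id), COUNT(DISTINCT s.id)" + joins
).fetchone()
print(plain)     # (18, 18)
print(distinct)  # (3, 6)
```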
(repeating essentials of my reply in another forum)
This looks like a Django bug. Possible workarounds:
1) Add the two annotations in one annotate() call:
...annotate(existing_subs=Count('parent__subscriptions'),counter=Count('author__books'))...
2) Replace the annotation for existing_subs and exclude(existing_subs=0) with exclude(parent__subscriptions=None).

Django filter by number of ForeignKey and less than a month in DateField

I have a model like this:
class MovieHistory(models.Model):
    watched_by = models.ForeignKey(User)
    time = models.DateTimeField(auto_now_add=True)
    movie = models.ForeignKey(Movie)
I want to get up to 15 movies that were watched the most in the last 30 days. So far I have this:
Movie.objects.filter(time__gte=datetime.now()-timedelta(days=30))
How do you filter again, and order them by movie count? I know that I can filter the first 15 results like this: [:15], but I don't know how to order by the amount of movies in that model, and only pick one of each (so I don't have repeated MovieHistories with the same movies on each one).
Thanks.
Annotation is likely the best approach:
from datetime import datetime, timedelta
from django.db.models import Count
recent = datetime.now() - timedelta(days=30)
most_watched = Movie.objects.filter(moviehistory__time__gte=recent).annotate(num_watched=Count('moviehistory')).order_by('-num_watched')[:15]
I haven't tested this, but I believe this is on the way to the answer. Please let me know if it works! The name used in the filter and in Count must be the reverse relation from Movie to MovieHistory: 'moviehistory' by default, or whatever related_name you set on the movie ForeignKey.
Hope this helps!
For more on using these annotations: https://docs.djangoproject.com/en/dev/topics/db/aggregation/#cheat-sheet
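As a rough picture of what a "filter by date, group by movie, order by count, take 15" query computes, here is a sketch in plain SQL run through Python's sqlite3 module. The single moviehistory table with a text movie column, the fixed value of now, and the sample rows are all assumptions for illustration; a real project would go through the ORM and a foreign key.

```python
import sqlite3
from datetime import datetime, timedelta

now = datetime(2024, 1, 31)  # fixed "now" so the example is reproducible
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE moviehistory (movie TEXT, watched_at TEXT)")
rows = [
    ("A", now - timedelta(days=1)),
    ("A", now - timedelta(days=2)),
    ("B", now - timedelta(days=3)),
    ("B", now - timedelta(days=40)),  # older than 30 days, ignored
    ("B", now - timedelta(days=50)),  # older than 30 days, ignored
    ("C", now - timedelta(days=45)),  # older than 30 days, so C is absent
]
conn.executemany(
    "INSERT INTO moviehistory VALUES (?, ?)",
    [(movie, ts.isoformat()) for movie, ts in rows],
)

# ISO-8601 timestamps compare correctly as strings.
cutoff = (now - timedelta(days=30)).isoformat()
most_watched = conn.execute(
    """
    SELECT movie, COUNT(*) AS num_watched
    FROM moviehistory
    WHERE watched_at >= ?
    GROUP BY movie
    ORDER BY num_watched DESC
    LIMIT 15
    """,
    (cutoff,),
).fetchall()
print(most_watched)  # [('A', 2), ('B', 1)]
```

Each movie appears once, and only views inside the 30-day window contribute to its count.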

Django query aggregate upvotes in backward relation

I have two models:
Base_Activity:
    some fields
User_Activity:
    user = models.ForeignKey(settings.AUTH_USER_MODEL)
    activity = models.ForeignKey(Base_Activity)
    rating = models.IntegerField(default=0)  # Will be -1, 0, or 1
Now I want to query Base_Activity, and sort the items that have the most corresponding user activities with rating=1 on top. I want to do something like the query below, but the =1 part is obviously not working.
activities = Base_Activity.objects.all().annotate(
    up_votes = Count('user_activity__rating'=1),
).order_by(
    'up_votes'
)
How can I solve this?
You cannot use Count like that, as the error message says:
SyntaxError: keyword can't be an expression
The argument of Count must be a simple string, like user_activity__rating.
I think a good alternative can be to use Avg and Count together:
activities = Base_Activity.objects.all().annotate(
    a=Avg('user_activity__rating'), c=Count('user_activity__rating')
).order_by(
    '-a', '-c'
)
The items with the most rating=1 activities should have the highest average, and among the items with the same average the ones with the most activities will be listed higher.
If you want to exclude items that have downvotes, make sure to add the appropriate filter or exclude operations after annotate, for example:
activities = Base_Activity.objects.all().annotate(
    a=Avg('user_activity__rating'), c=Count('user_activity__rating')
).filter(user_activity__rating__gt=0).order_by(
    '-a', '-c'
)
UPDATE
To get all the items, ordered by their upvotes, disregarding downvotes, I think the only way is to use raw queries, like this:
from django.db import connection
# Table names below are placeholders; the real ones usually carry an app prefix.
sql = '''
SELECT a.id, SUM(v.rating > 0) AS s
FROM base_activity a
JOIN user_activity v ON a.id = v.activity_id
GROUP BY a.id ORDER BY s DESC
'''
cursor = connection.cursor()
cursor.execute(sql)
rows = cursor.fetchall()
Note: instead of hard-coding the table names of your models, get the table names from the models, for example if your model is called Rating, then you can get its table name with Rating._meta.db_table.
I tested this query on an sqlite3 database, I'm not sure the SUM expression there works in all DBMS. Btw I had a perfect Django site to test, where I also use upvotes and downvotes. I use a very similar model for counting upvotes and downvotes, but I order them by the sum value, stackoverflow style. The site is open-source, if you're interested.
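A more portable way to write the same aggregate is SUM(CASE WHEN ... THEN 1 ELSE 0 END), which does not rely on a boolean being summed as an integer. The sketch below uses Python's sqlite3 module; the simplified user_activity table and the ratings are made up for illustration. It also shows why ordering by the average can differ from ordering by the number of upvotes. (In Django 2.0 and later, aggregates also accept a filter argument that expresses the same idea in the ORM; that form is not shown or tested here.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_activity (activity_id INTEGER, rating INTEGER)")
ratings = (
    [(1, 1)]                    # activity 1: a single upvote
    + [(2, 1)] * 5 + [(2, -1)]  # activity 2: five upvotes, one downvote
    + [(3, 0)]                  # activity 3: one neutral rating
)
conn.executemany("INSERT INTO user_activity VALUES (?, ?)", ratings)

rows = conn.execute(
    """
    SELECT activity_id,
           SUM(CASE WHEN rating = 1 THEN 1 ELSE 0 END) AS up_votes,
           AVG(rating) AS avg_rating
    FROM user_activity
    GROUP BY activity_id
    ORDER BY up_votes DESC
    """
).fetchall()

by_up_votes = [activity_id for activity_id, up_votes, avg_rating in rows]
by_average = [r[0] for r in sorted(rows, key=lambda r: r[2], reverse=True)]
print(by_up_votes)  # [2, 1, 3]
print(by_average)   # [1, 2, 3]
```

Activity 2 has the most upvotes, but a single downvote pulls its average below that of activity 1, so the two orderings disagree.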

Get distinct values of Queryset by field

I've got this model:
class Visit(models.Model):
    timestamp = models.DateTimeField(editable=False)
    ip_address = models.IPAddressField(editable=False)
If a user visits multiple times in one day, how can I filter for unique rows based on the ip field? (I want the unique visits for today)
today = datetime.datetime.today()
yesterday = datetime.datetime.today() - datetime.timedelta(days=1)
visits = Visit.objects.filter(timestamp__range=(yesterday, today)) #.something?
EDIT:
I see that I can use:
Visit.objects.filter(timestamp__range=(yesterday, today)).values('ip_address')
to get a ValuesQuerySet of just the ip fields. Now my QuerySet looks like this:
[{'ip_address': u'127.0.0.1'}, {'ip_address': u'127.0.0.1'}, {'ip_address':
u'127.0.0.1'}, {'ip_address': u'127.0.0.1'}, {'ip_address': u'127.0.0.1'}]
How do I filter this for uniqueness without evaluating the QuerySet and taking the db hit?
# Hope it's something like this...
values.distinct().count()
What you want is:
Visit.objects.filter(stuff).values("ip_address").annotate(n=models.Count("pk"))
What this does is get all ip_addresses and then it gets the count of primary keys (aka number of rows) for each ip address.
With Alex's answer I also get n: 1 for each item, even with a distinct() clause.
It's weird, because this returns the right number of items:
Visit.objects.filter(stuff).values("ip_address").distinct().count()
But when I iterate over Visit.objects.filter(stuff).values("ip_address").distinct() I get many more items, including some duplicates...
EDIT:
The filter clause was causing the trouble. I was filtering on a field from another table, so a SQL JOIN was made that broke the distinct.
I used this hint to see the query that was really used:
q = Visit.objects.filter(myothertable__field=x).values("ip_address").distinct()
print(q.query)
I then reversed the model on which I was making the query and the filter, so that the join doesn't rely on any Visit id.
Hope this helps.
The question is different from what the title suggests. If you want set-like behavior from the database, you need something like this.
x = Visit.objects.all().values_list('ip_address', flat=True).distinct()
It should give you something like this for x.
['1.2.3.4', '2.3.4.5', ...]
Where
len(x) == len(set(x))
Returns True
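For reference, the two approaches discussed in this thread correspond roughly to SELECT DISTINCT and GROUP BY in SQL. The sketch below runs both through Python's sqlite3 module; the visit table and its rows are assumptions for illustration, and the SQL is an approximation of what the ORM would generate rather than its exact output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visit (id INTEGER PRIMARY KEY, ip_address TEXT)")
conn.executemany(
    "INSERT INTO visit (ip_address) VALUES (?)",
    [("127.0.0.1",), ("127.0.0.1",), ("10.0.0.2",), ("127.0.0.1",), ("10.0.0.2",)],
)

# Roughly what values_list('ip_address', flat=True).distinct() asks for.
unique_ips = [
    row[0]
    for row in conn.execute("SELECT DISTINCT ip_address FROM visit ORDER BY ip_address")
]

# Roughly what values('ip_address').annotate(n=Count('pk')) asks for.
per_ip = conn.execute(
    "SELECT ip_address, COUNT(id) FROM visit GROUP BY ip_address ORDER BY ip_address"
).fetchall()

print(unique_ips)  # ['10.0.0.2', '127.0.0.1']
print(per_ip)      # [('10.0.0.2', 2), ('127.0.0.1', 3)]
```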