I have a use case where I have to count occurrences of a ManyToManyField, but it's getting more complex than I expected.
models.py:
class Tag(models.Model):
    name = models.CharField(max_length=100, unique=True)

class People(models.Model):
    tag = models.ManyToManyField(Tag, blank=True)
Here I have to come up with a list of Tags and the number of times they appear overall but only for those People who have >0 and <6 tags. Something like:
tag1 - 265338
tag2 - 4649303
tag3 - 36636
...
This is how I came up with the count initially:
q = People.objects.annotate(tag_count=Count('tag')).filter(tag_count__lte=6, tag_count__gt=0)
for tag in Tag.objects.all():
    cnt = q.filter(tag__name=tag.name).count()
    # doing something with the cnt
But I later realised that this may be inefficient since I am probably iterating through the People model many times (Records in People are way larger than those in Tag).
Intuitively I think I should be able to do one iteration of the Tag model without any iteration of the People model. So then I came up with this:
for tag in Tag.objects.all():
    cnt = tag.people_set.annotate(tag_count=Count('tag')).filter(tag_count__lte=6).count()
    # doing something with the cnt
But, first, this is not producing the expected results. Second, I think this has become more complex than it seemed to be, so perhaps I am complicating a simple thing. All ears to any advice.
Update: I got queryset.query and ran the query on the db to debug it. For some reason, the tag_count column in the resulting join shows all 1's. Can't seem to understand why.
This can be done with a reverse ManyToMany field query.
It also reduces the overhead, and shifts most of it from Python to the database server.
from some_app.models import Tag, People
from django.db.models import F, Value, Count, CharField
from django.db.models.functions import Concat

# queryset: people with tags >0 and <6, i.e. 1 to 5 tags
people_qualified = People.objects.annotate(tag_count=Count('tag'))\
    .filter(tag_count__range=(1, 5))

# query tags used with above category of people, with count
tag_usage = Tag.objects.filter(people__in=people_qualified)\
    .annotate(tag=F('name'), count=Count('people'))\
    .values('tag', 'count')
# Result: <QuerySet [{'count': 3, 'tag': u'hello'}, {'count': 2, 'tag': u'world'}]>

# similarly, if needed, the string output
tag_usage_list = Tag.objects.filter(people__in=people_qualified)\
    .annotate(tags=Concat(F('name'), Value(' - '), Count('people'),
                          output_field=CharField()))\
    .values_list('tags', flat=True)
# Result: <QuerySet [u'hello - 3', u'world - 2']>
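To sanity-check the numbers the ORM returns, the same rule can be sketched in plain Python on a small in-memory sample. The `people_tags` mapping, its contents, and the `tag_usage` helper are made up for illustration; only the "1 to 5 tags" rule comes from the question. This is a reference for the expected result, not a replacement for the database query.

```python
from collections import Counter

# Hypothetical sample data: person id -> set of tag names.
people_tags = {
    1: {"hello", "world"},
    2: {"hello"},
    3: set(),                                # no tags: excluded
    4: {"a", "b", "c", "d", "e", "hello"},   # 6 tags: excluded
    5: {"hello", "world"},
}


def tag_usage(people_tags, low=1, high=5):
    """Count tag occurrences among people holding between low and high tags."""
    counts = Counter()
    for tags in people_tags.values():
        if low <= len(tags) <= high:
            counts.update(tags)
    return dict(counts)


usage = tag_usage(people_tags)
print(usage)
```

People 3 and 4 fall outside the range, so their tags never reach the counter, which is the same effect as filtering on `people__in=people_qualified` before counting.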
There is an Article model with a many-to-many categories field. I want to filter by that field, but it doesn't work as I expected. For example:
MyModel.objects.filter(categories__id__in=[0, 1, 2])
returns models that have categories 0 and 1 even if they don't have the category with id 2.
I tried something like this:
MyModel.objects.filter(Q(categories__id=0) & Q(categories__id=1) & Q(categories__id=2))
but that doesn't work either: it returns nothing, even for a model that has all of these categories.
By the way, I don't want to chain more than one filter() call.
So, is there a solution for this?
Thanks.
P.S.: "django AND on Q not working for many to many" is the same question, but its author still hasn't received an answer.
You can count the matching categories and keep only the items where all three match:
from django.db.models import Count

MyModel.objects.filter(
    categories__id__in=[0, 1, 2]
).annotate(
    category_count=Count('categories')
).filter(category_count=3)
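In SQL terms, the filter-then-count idea is roughly a GROUP BY with a HAVING clause. Here is a minimal sketch using the standard library's sqlite3 so it runs without a Django project. The table and column names (`item_categories`, `item_id`, `category_id`) are hypothetical and are not the names Django would generate for the join table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item_categories (item_id INTEGER, category_id INTEGER)")
conn.executemany(
    "INSERT INTO item_categories VALUES (?, ?)",
    [
        (1, 0), (1, 1), (1, 2),          # item 1 has all three categories
        (2, 0), (2, 1),                  # item 2 lacks category 2
        (3, 0), (3, 1), (3, 2), (3, 7),  # item 3 has all three plus an extra
    ],
)

wanted = [0, 1, 2]
placeholders = ", ".join("?" for _ in wanted)
sql = (
    "SELECT item_id FROM item_categories "
    f"WHERE category_id IN ({placeholders}) "
    "GROUP BY item_id HAVING COUNT(DISTINCT category_id) = ? "
    "ORDER BY item_id"
)
# Keep only items whose number of matched categories equals len(wanted).
matching = [row[0] for row in conn.execute(sql, wanted + [len(wanted)])]
print(matching)
```

Item 2 is dropped because only two of its rows survive the IN filter, while the extra category on item 3 is ignored because it is filtered out before counting.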
What's an elegant way for fetching multiple objects in some custom order from a DB in django?
For example, suppose you have a few products, each with its name, and you want to fetch three of them to display in a row on your website page, in some fixed custom order. Suppose the names of the products which you want to display are, in order: ["Milk", "Chocolate", "Juice"]
One could do
unordered_products = Product.objects.filter(name__in=["Milk", "Chocolate", "Juice"])
products = [
    unordered_products.filter(name="Milk")[0],
    unordered_products.filter(name="Chocolate")[0],
    unordered_products.filter(name="Juice")[0],
]
And the post-fetch ordering part could be improved to use a name-indexed dictionary instead:
ordered_product_names = ["Milk", "Chocolate", "Juice"]
products_by_name = dict((x.name, x) for x in unordered_products)
products = [products_by_name[name] for name in ordered_product_names]
But is there a more elegant way? e.g., convey the desired order to the DB layer somehow, or return the products grouped by their name (aggregation seems to be similar to what I want, but I want the actual objects, not statistics about them).
You can order your products in a custom order with a single ORM call (executing only one SQL query):
from django.db.models import Case, IntegerField, Value, When

ordered_products = Product.objects.filter(
    name__in=['Milk', 'Chocolate', 'Juice']
).annotate(
    order=Case(
        When(name='Milk', then=Value(0)),
        When(name='Chocolate', then=Value(1)),
        When(name='Juice', then=Value(2)),
        output_field=IntegerField(),
    )
).order_by('order')
Update
Note
Speaking of an "elegant way" (and best practice), I think the extra method (proposed by @Satendra) should absolutely be avoided.
The official Django documentation says this about extra:
Warning
You should be very careful whenever you use extra(). Every time you
use it, you should escape any parameters that the user can control by
using params in order to protect against SQL injection attacks.
Please read more about SQL injection protection.
Optimized version
If you want to handle more items with only one query, you can change my first query and use the Django ORM's flexibility, as suggested by @Shubhanshu in his answer:
products = ['Milk', 'Chocolate', 'Juice']
ordered_products = Product.objects.filter(
    name__in=products
).order_by(Case(
    *[When(name=n, then=i) for i, n in enumerate(products)],
    output_field=IntegerField(),
))
The output of this command will be similar to this:
<QuerySet [<Product: Milk >, <Product: Chocolate>, <Product: Juice>]>
And the SQL generated by the ORM will be like this:
SELECT "id", "name"
FROM "products"
WHERE "name" IN ('Milk', 'Chocolate', 'Juice')
ORDER BY CASE
WHEN "name" = 'Milk' THEN 0
WHEN "name" = 'Chocolate' THEN 1
WHEN "name" = 'Juice' THEN 2
ELSE NULL
END ASC
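To see that an ORDER BY CASE of this shape really returns rows in list order, here is a small sketch that builds the same kind of statement by hand against an in-memory SQLite database. The `products` table contents and the `fetch_in_order` helper are assumptions made for the example. The names and positions are passed as bound parameters rather than formatted into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO products (name) VALUES (?)",
    [("Juice",), ("Bread",), ("Milk",), ("Chocolate",)],
)


def fetch_in_order(conn, names):
    """Return product names ordered by their position in `names`."""
    in_marks = ", ".join("?" for _ in names)
    whens = " ".join("WHEN name = ? THEN ?" for _ in names)
    sql = (
        f"SELECT name FROM products WHERE name IN ({in_marks}) "
        f"ORDER BY CASE {whens} END"
    )
    # Parameters: the IN list first, then (name, position) pairs for the CASE.
    params = list(names)
    for position, name in enumerate(names):
        params.extend([name, position])
    return [row[0] for row in conn.execute(sql, params)]


ordered = fetch_in_order(conn, ["Milk", "Chocolate", "Juice"])
print(ordered)
```

The insertion order in the table (Juice, Bread, Milk, Chocolate) differs from the requested order on purpose, so the result depends only on the CASE expression.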
When there is no relation between the objects that you are fetching and you still wish to fetch (or arrange) them in certain (custom) order, you may try doing this:
unordered_products = Product.objects.filter(name__in=["Milk", "Chocolate", "Juice"])
product_order = ["Milk", "Chocolate", "Juice"]
preserved = Case(*[When(name=name, then=pos) for pos, name in enumerate(product_order)])
ordered_products = unordered_products.order_by(preserved)
Hope it helps!
Try adding this to the model's Meta class:
class Meta:
    ordering = ('name', 'related__name', )
This orders your records by the specified fields.
For example, chocolate, chocolate blue, chocolate white, juice green, juice XXX, milk, milky, milk YYYY should keep that order when you fetch them.
Creating a QuerySet from a list while preserving order
This means the order of output QuerySet will be same as the order of list used to filter it.
The solution is more or less the same as @PaoloMelchiorre's answer.
But if there are many more items, say 1000 products in product_names, you don't have to worry about adding more conditions to Case; you can use the extra method of QuerySet:
product_names = ["Milk", "Chocolate", "Juice", ...]
# Placeholders let the database driver quote and escape each name.
clauses = ' '.join(['WHEN name=%s THEN %s'] * len(product_names))
params = [value for i, name in enumerate(product_names) for value in (name, i)]
ordering = 'CASE %s END' % clauses
queryset = Product.objects.filter(name__in=product_names).extra(
    select={'ordering': ordering}, select_params=params, order_by=('ordering',))
# Output: <QuerySet [<Product: Milk >, <Product: Chocolate>, <Product: Juice>,...]>
I have 3 tables as follows:
class Bike:
    name = CharField(...)
    cc_range = IntegerField(...)

class Item:
    bike_number = CharField(...)
    bike = ForeignKey(Bike)

class Booking:
    start_time = DateTimeField(...)
    end_time = DateTimeField(...)
    item = ForeignKey(Item, related_name='bookings')
I want to get a list of all the bikes which are not booked during a period of time (say, ["2016-01-09", "2016-01-11"]) with an item count with them.
For example, say there are two bikes b1, b2 with items i11, i12 and i21, i22. If i21 is involved in a booking (say ["2016-01-10", "2016-01-12"]) then I want something like
{"b1": 2, "b2": 1}
I have got the relevant items by
(Item.objects
    .exclude(bookings__start_time__range=booking_period)
    .exclude(bookings__end_time__range=booking_period))
but am not able to group them.
I also tried:
(Bike.objects
    .exclude(item__bookings__start_time__range=booking_period)
    .exclude(item__bookings__end_time__range=booking_period)
    .annotate(items_count=Count('item')))
But it removes the whole bike if any of its items is booked.
I seem to be totally stuck. I would prefer doing this without a for loop. The Django documentation doesn't seem to help me here either (which is rare). Is there a problem with my model architecture for the type of problem I want to solve, or am I missing something? Any help would be appreciated.
Thanks in advance !!
from django.db.models import Q, Count, Case, When, Value, BooleanField

bikes = models.Bike.objects.annotate(
    booked=Case(
        When(Q(item__bookings__start_time__lte=booking_period[1]) &
             Q(item__bookings__end_time__gte=booking_period[0]),
             then=Value(True)),
        default=Value(False),
        output_field=BooleanField(),
    )
).filter(booked=False).annotate(item_count=Count('item'))
Please read the documentation about conditional expressions.
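The When clause above encodes the usual interval-overlap test: a booking collides with the period if it starts on or before the period ends and ends on or after the period starts. As a plain-Python sketch of what the question asks for, using the b1/b2 example from the question (the dictionaries and helper names are hypothetical, and this is only a reference for the expected numbers, not a replacement for the query):

```python
from datetime import date


def overlaps(start, end, period):
    """True if [start, end] shares at least one moment with `period`."""
    return start <= period[1] and end >= period[0]


# Sample data mirroring the question: bike -> items, item -> bookings.
items_by_bike = {"b1": ["i11", "i12"], "b2": ["i21", "i22"]}
bookings_by_item = {"i21": [(date(2016, 1, 10), date(2016, 1, 12))]}


def free_item_counts(items_by_bike, bookings_by_item, period):
    """Count, per bike, the items with no booking overlapping `period`."""
    counts = {}
    for bike, items in items_by_bike.items():
        counts[bike] = sum(
            1
            for item in items
            if not any(
                overlaps(start, end, period)
                for start, end in bookings_by_item.get(item, [])
            )
        )
    return counts


booking_period = (date(2016, 1, 9), date(2016, 1, 11))
print(free_item_counts(items_by_bike, bookings_by_item, booking_period))
```

Comparing the ORM result against a small hand-built case like this is a quick way to check whether a bike with one booked item is still listed with its remaining free items.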
I have two models:
Base_Activity:
    some fields

User_Activity:
    user = models.ForeignKey(settings.AUTH_USER_MODEL)
    activity = models.ForeignKey(Base_Activity)
    rating = models.IntegerField(default=0)  # Will be -1, 0, or 1
Now I want to query Base_Activity, and sort the items that have the most corresponding user activities with rating=1 on top. I want to do something like the query below, but the =1 part is obviously not working.
activities = Base_Activity.objects.all().annotate(
    up_votes = Count('user_activity__rating'=1),
).order_by(
    'up_votes'
)
How can I solve this?
You cannot use Count like that, as the error message says:
SyntaxError: keyword can't be an expression
The argument of Count must be a simple string, like user_activity__rating.
I think a good alternative can be to use Avg and Count together:
activities = Base_Activity.objects.all().annotate(
    a=Avg('user_activity__rating'), c=Count('user_activity__rating')
).order_by(
    '-a', '-c'
)
The items with the most rating=1 activities should have the highest average, and among the items with the same average, the ones with the most activities will be listed higher.
If you want to exclude items that have downvotes, make sure to add the appropriate filter or exclude operations after annotate, for example:
activities = Base_Activity.objects.all().annotate(
    a=Avg('user_activity__rating'), c=Count('user_activity__rating')
).filter(user_activity__rating__gt=0).order_by(
    '-a', '-c'
)
UPDATE
To get all the items, ordered by their upvotes, disregarding downvotes, I think the only way is to use raw queries, like this:
from django.db import connection

sql = '''
SELECT o.id, SUM(v.rating > 0) s
FROM user_activity o
JOIN rating v ON o.id = v.user_activity_id
GROUP BY o.id ORDER BY s DESC
'''
cursor = connection.cursor()
cursor.execute(sql)
rows = cursor.fetchall()
Note: instead of hard-coding the table names of your models, get the table names from the models, for example if your model is called Rating, then you can get its table name with Rating._meta.db_table.
I tested this query on an sqlite3 database; I'm not sure the SUM expression there works in all DBMSs. By the way, I had a perfect Django site to test on, where I also use upvotes and downvotes. I use a very similar model for counting upvotes and downvotes, but I order them by the sum value, Stack Overflow style. The site is open source, if you're interested.
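If portability of `SUM(v.rating > 0)` is a concern, a `SUM(CASE WHEN ... THEN 1 ELSE 0 END)` expression is the more widely supported way to count only the matching rows. Below is a minimal sketch against an in-memory SQLite database. The table names, the `title` column, and the sample rows are assumptions for the example and do not come from the question's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE base_activity (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE user_activity (activity_id INTEGER, rating INTEGER);
    INSERT INTO base_activity VALUES (1, 'run'), (2, 'swim'), (3, 'read');
    INSERT INTO user_activity VALUES
        (1, 1), (1, -1),
        (2, 1), (2, 1), (2, 0),
        (3, -1);
""")

# Count only rating = 1 rows per activity; LEFT JOIN keeps activities
# that have no up-votes at all.
sql = """
    SELECT a.title,
           COALESCE(SUM(CASE WHEN u.rating = 1 THEN 1 ELSE 0 END), 0) AS up_votes
    FROM base_activity a
    LEFT JOIN user_activity u ON u.activity_id = a.id
    GROUP BY a.id, a.title
    ORDER BY up_votes DESC, a.title
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

Down-votes and neutral ratings contribute 0 to the sum, so they neither raise nor lower an item's position.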
Let's say I have a model that has lots of fields, but I only care about one CharField. That CharField can hold anything, so I don't know the possible values, but I know that the values frequently overlap. So I could have 20 objects with "abc" and 10 objects with "xyz", or I could have 50 objects with "def" and 80 with "stu", and I have 40000 with no overlap, which I really don't care about.
How do I count the objects efficiently? What I would like returned is something like:
{'abc': 20, 'xyz':10, 'other': 10,000}
or something like that, w/o making a ton of SQL calls.
EDIT:
I don't know if anyone will see this since I am editing it kind of late, but...
I have this model:
class Action(models.Model):
    author = models.CharField(max_length=255)
    purl = models.CharField(max_length=255, null=True)
and from the answers, I have done this:
groups = Action.objects.filter(author='James').values('purl').annotate(count=Count('purl'))
but...
this is what groups is:
{"purl": "waka"},{"purl": "waka"},{"purl": "waka"},{"purl": "waka"},{"purl": "mora"},{"purl": "mora"},{"purl": "mora"},{"purl": "mora"},{"purl": "mora"},{"purl": "lora"}
(I just filled purl with dummy values)
what I want is
{'waka': 4, 'mora': 5, 'lora': 1}
Hopefully someone will see this edit...
EDIT 2:
Apparently my database (BigTable) does not support the aggregate functions of Django and this is why I have been having all the problems.
You want something similar to "count ... group by". You can do this with the aggregation features of Django's ORM:
from django.db.models import Count

fieldname = 'myCharField'
(MyModel.objects.values(fieldname)
    .order_by(fieldname)
    .annotate(the_count=Count(fieldname)))
Previous questions on this subject:
How to query as GROUP BY in django?
Django equivalent of COUNT with GROUP BY
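The grouped query yields one row per distinct value, so getting the dictionary shape from the question is a short post-processing step. Here is a sketch in plain Python. The `summarize` helper and the sample rows are hypothetical, and pooling every value that occurs once under an "other" key is only my reading of what the question wants.

```python
def summarize(rows, field, count_key="the_count"):
    """Turn grouped rows into {value: count}, pooling singletons under 'other'."""
    summary = {}
    other = 0
    for row in rows:
        if row[count_key] > 1:
            summary[row[field]] = row[count_key]
        else:
            other += row[count_key]
    if other:
        summary["other"] = other
    return summary


# Hypothetical rows, shaped like the result of values(...).annotate(...).
rows = [
    {"myCharField": "abc", "the_count": 20},
    {"myCharField": "xyz", "the_count": 10},
    {"myCharField": "q1", "the_count": 1},
    {"myCharField": "q2", "the_count": 1},
]
result = summarize(rows, "myCharField")
print(result)
```

One caveat with this shape: a real field value that is literally "other" would collide with the pooled key, so a different sentinel may be needed.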
This is called aggregation, and Django supports it directly.
You can get your exact output by filtering the values you want to count, getting the list of values, and counting them, all in one set of database calls:
from django.db.models import Count
MyModel.objects.filter(myfield__in=('abc', 'xyz')).\
    values('myfield').annotate(Count('myfield'))
You can use Django's Count aggregation on a queryset to accomplish this. Something like this:
from django.db.models import Count

# values() groups by the field; without it, each object is counted on its own.
queryset = MyModel.objects.values('my_charfield').annotate(count=Count('my_charfield'))
for each in queryset:
    print("%s: %s" % (each['my_charfield'], each['count']))
Unless your field value is always guaranteed to be in a specific case, it may be useful to transform it prior to performing a count, i.e. so 'apple' and 'Apple' would be treated as the same.
from django.db.models import Count
from django.db.models.functions import Lower
MyModel.objects.annotate(lower_title=Lower('title')).values('lower_title').annotate(num=Count('lower_title')).order_by('num')