Django ORM query instead of using a dictionary - django

I have the following models:
class ProjectUser(models.Model):
    # null=True dropped: it has no effect on a ManyToManyField
    categories = models.ManyToManyField('UserCategory', blank=True)
    user_id = models.PositiveIntegerField(db_index=True)
    name = models.CharField(max_length=80)
    actual_rank = models.FloatField(default=0)

class UserCategory(models.Model):
    name = models.CharField(max_length=100)
What I'd like to do is get each category's name and the number of users it appears in, divided by the length of another object.
I'm currently building a dictionary {'category': occurrences}, then iterating over it and dividing each occurrence count by a number to get the category rank.
Maybe this is a fine way to do it, but I'd like to know whether it can be done directly with QuerySet methods. I do similar things in many places, so finding a better and more succinct approach would be great.

This gets all users that have the given category, counts them, and divides that count by the length you are computing elsewhere:
ProjectUser.objects.filter(categories=category).count() / objectLength
Here category is the UserCategory instance you are currently analysing. (Filtering on categories__name__contains=category.name would also match categories whose names merely contain that name as a substring.)
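The query above runs once per category. A single query can count the users of every category at once. In Django this would be something like UserCategory.objects.annotate(n=Count('projectuser')), since projectuser is the default reverse query name of the categories field. The sketch below is not Django: it uses Python's built-in sqlite3 with hypothetical, simplified table names to show roughly the kind of GROUP BY such an annotation produces. object_length is a hypothetical stand-in for the unspecified "length of another object".

```python
import sqlite3

# Hypothetical, simplified tables: usercategory, projectuser and the
# join table that a ManyToManyField creates behind the scenes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE usercategory (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE projectuser (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE projectuser_categories (
        projectuser_id INTEGER REFERENCES projectuser(id),
        usercategory_id INTEGER REFERENCES usercategory(id)
    );
    INSERT INTO usercategory VALUES (1, 'admin'), (2, 'dev'), (3, 'qa');
    INSERT INTO projectuser VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO projectuser_categories VALUES (1, 1), (1, 2), (2, 2), (3, 2);
""")

object_length = 4  # hypothetical stand-in for "the length of another object"

# One query for all categories: the LEFT JOIN keeps categories with no users,
# and COUNT over the join column ignores the NULLs that this produces.
rows = conn.execute("""
    SELECT c.name, COUNT(pc.projectuser_id) AS n
    FROM usercategory c
    LEFT JOIN projectuser_categories pc ON pc.usercategory_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

# Only the final division is done in Python.
ranks = {name: n / object_length for name, n in rows}
print(ranks)  # {'admin': 0.25, 'dev': 0.75, 'qa': 0.0}
```

The dictionary is still built, but the counting happens in the database in one round trip instead of one query per category.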

How to Get a Distinct Filtered QuerySet In Django Without Using the distinct Method?

Below is my post model.
class Post(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    title = models.CharField(max_length=200)
    content = models.TextField()
    datetime = models.DateTimeField(auto_now_add=True)
    votes = models.ManyToManyField(settings.AUTH_USER_MODEL,
                                   related_name="post_votes", default=None, blank=True)
    tags = models.ManyToManyField(Tag, default=None, blank=True)
I want to filter posts which contain a certain query in their title, content or as the name of one of their tags. To do this I've tried:
query_set = Post.objects.filter(Q(content__icontains=query) |
                                Q(tags__name__icontains=query) |
                                Q(title__icontains=query))
But this often returns QuerySets with duplicate results. I have tried using the distinct method to solve this, but that results in incorrect ordering when I sort the posts later on by the number of votes they have:
query_set.annotate(vote_count=Count('votes')).order_by('-vote_count', '-datetime')
If anybody could help me I would be very grateful.
Jack
The duplicates originate from the fact that you filter on related objects, so Django performs a query with a JOIN in it: a post with two tags that match the query appears once per matching tag. You can of course remove duplicates at the Django/Python level, but that is inefficient in two ways: more data is transferred from the database to the Django server, and deduplicating a large collection in Python is much slower than letting the database do it.
Furthermore, the line:
query_set.annotate(vote_count=Count('votes')).order_by('-vote_count', '-datetime')
is effectively a no-op. QuerySet methods do not modify the QuerySet they are called on; they return a new one. Here you did not sort query_set on votes: you constructed a new QuerySet that would do so, but you immediately threw it away, since you did nothing with the result (you would need query_set = query_set.annotate(...)...).
You can add the annotation and the ordering first, and then call .distinct() to obtain distinct results:
from django.db.models import Count, Q

query_set = Post.objects.filter(
    Q(content__icontains=query) |
    Q(tags__name__icontains=query) |
    Q(title__icontains=query)
).annotate(
    vote_count=Count('votes', distinct=True)
).order_by('-vote_count', '-datetime').distinct()  # the field is "datetime", not "date_time"
The distinct=True on the Count is necessary, since, as said before, the query acts like a JOIN, and JOINs can act like "multipliers" when counting things, since a row can occur multiple times.
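The "multiplier" effect can be reproduced outside Django. The sketch below uses Python's built-in sqlite3 with hypothetical, simplified tables (tags are stored by name directly in post_tags, votes in post_votes), and its SQL only approximates what the ORM generates, with LIKE standing in for icontains. It shows both the duplicated rows from the tag JOIN and how COUNT without DISTINCT inflates the vote count once votes are joined too.

```python
import sqlite3

# Hypothetical, simplified tables: tags are stored by name in post_tags,
# and each vote is a (post_id, user_id) row in post_votes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE post_tags (post_id INTEGER, tag_name TEXT);
    CREATE TABLE post_votes (post_id INTEGER, user_id INTEGER);
    INSERT INTO post VALUES (1, 'django tips'), (2, 'python news'), (3, 'misc');
    INSERT INTO post_tags VALUES (1, 'django'), (1, 'django-orm'), (2, 'python'), (3, 'django');
    INSERT INTO post_votes VALUES (1, 10), (1, 11), (1, 12), (2, 10), (3, 11);
""")

pattern = "%django%"

# Filtering through the tag JOIN: post 1 comes back once per tag row.
dup_rows = conn.execute("""
    SELECT p.id FROM post p
    LEFT JOIN post_tags t ON t.post_id = p.id
    WHERE p.title LIKE ? OR t.tag_name LIKE ?
    ORDER BY p.id
""", (pattern, pattern)).fetchall()

# Joining votes as well multiplies the rows again (2 tags x 3 votes for post 1),
# so only COUNT(DISTINCT ...) gives the real number of votes per post.
counts = conn.execute("""
    SELECT p.id,
           COUNT(v.user_id) AS naive_count,
           COUNT(DISTINCT v.user_id) AS vote_count
    FROM post p
    LEFT JOIN post_tags t ON t.post_id = p.id
    LEFT JOIN post_votes v ON v.post_id = p.id
    WHERE p.title LIKE ? OR t.tag_name LIKE ?
    GROUP BY p.id
    ORDER BY vote_count DESC, p.id
""", (pattern, pattern)).fetchall()

print(dup_rows)  # [(1,), (1,), (3,)]
print(counts)    # [(1, 6, 3), (3, 1, 1)]
```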

Django average scores with through

Maybe this is a very newbie question, but I am stuck at this point. I do not know whether the problem is the model or that I do not understand aggregations and annotations very well.
I have a model like this:
class User(models.Model):
    # 'Book' as a string, because Book is defined below
    collection = models.ManyToManyField('Book', through='BookCollection')

class Book(models.Model):
    name = models.CharField(max_length=200)

class BookCollection(models.Model):
    user = models.ForeignKey(User)
    book = models.ForeignKey(Book)
    score = models.IntegerField(default=0)
I want to get the average score of every book over all users, excluding the entries that still have the default score of 0 (this value means the user has the book in their collection but has not rated it yet). I am trying to use an annotation like this:
Book.objects.exclude(bookcollection__score=0).annotate(avg=Avg('bookcollection__score'))
but if a book has been rated 0 and 3, for example, both entries are excluded.
Is there any way to tell Avg() that it should take into account only values greater than 0?
Thanks in advance.
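In older versions of the Django ORM there was no direct way to do that without raw SQL. Since Django 2.0, aggregate functions accept a filter argument, so Avg('bookcollection__score', filter=Q(bookcollection__score__gt=0)) averages only the positive scores.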
A better model would be to allow null values in your score field. Null values are ignored in Avg():
class BookCollection(models.Model):
    ...
    score = models.IntegerField(null=True, blank=True, default=None)
None (NULL in the database) is generally the best way to represent a missing value in a field. This avoids confusion in calculations such as the average.
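The behaviours discussed in this thread can be compared in plain SQL. This sketch uses Python's built-in sqlite3 with a hypothetical, simplified bookcollection table. The NOT IN subquery is only a rough model of how exclude() treats a multi-valued relation, and the CASE expression is a rough model of a conditional aggregate such as Django's filter= argument; neither is the exact SQL Django emits.

```python
import sqlite3

# Hypothetical, simplified through table: score 0 means "in the collection, not rated".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bookcollection (book_id INTEGER, user_id INTEGER, score INTEGER);
    INSERT INTO bookcollection VALUES
        (1, 1, 0), (1, 2, 3), (1, 3, 5),
        (2, 1, 4), (2, 2, 2);
""")


def query(sql):
    return conn.execute(sql).fetchall()


# Averaging with the 0 sentinel included drags book 1 down to 8/3.
with_zero = query("SELECT book_id, AVG(score) FROM bookcollection GROUP BY book_id ORDER BY book_id")

# Rough model of exclude(bookcollection__score=0): any book with a 0 row disappears entirely.
excluded = query("""
    SELECT book_id, AVG(score) FROM bookcollection
    WHERE book_id NOT IN (SELECT book_id FROM bookcollection WHERE score = 0)
    GROUP BY book_id ORDER BY book_id
""")

# Conditional aggregate: rows with score = 0 contribute NULL, which AVG skips.
conditional = query("""
    SELECT book_id, AVG(CASE WHEN score > 0 THEN score END)
    FROM bookcollection GROUP BY book_id ORDER BY book_id
""")

# Storing "not rated" as NULL instead of 0: a plain AVG already ignores it.
conn.execute("UPDATE bookcollection SET score = NULL WHERE score = 0")
with_null = query("SELECT book_id, AVG(score) FROM bookcollection GROUP BY book_id ORDER BY book_id")

print(excluded)     # [(2, 3.0)]
print(conditional)  # [(1, 4.0), (2, 3.0)]
print(with_null)    # [(1, 4.0), (2, 3.0)]
```

Both the conditional aggregate and the NULL-based model give book 1 an average of 4.0, while the exclude-style query drops it altogether.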

Django : Count only non-empty CharField with annotate() & values()

Django recommends not using null on CharField, but annotate() includes empty strings in the count. Is there a way to avoid that without excluding rows with an empty string from the query?
My question isn't simply how to achieve my query, but, fundamentally, whether annotate/aggregate counts should include empty fields or not, given that Django treats the empty string as the replacement for NULL in string-based fields.
My model :
class Book(models.Model):
    name = models.CharField(...)

class Review(models.Model):
    book = models.ForeignKey()
    category = models.ForeignKey()
    review = models.CharField(max_length=200, default='', blank=True)
To count non-empty reviews & group by category, I use
Review.objects.values('category').annotate(count=Count('review'))
This doesn't work because annotate counts empty values too (if the entry were NULL, it would not). I could filter out empty strings before the annotate call, but my query is more complex and I need both empty and non-empty objects.
Is there a smarter way to use annotate and skip empty values from count or should I change the model from
review = models.CharField(max_length=200, default='', blank=True)
to
review = models.CharField(max_length=200, default=None, blank=True, null=True)
I faced a very similar situation. I solved it using conditional expressions, summing 1 for every non-empty review:
from django.db.models import Case, IntegerField, Sum, When

review_count = Case(
    When(review='', then=0),
    default=1,
    output_field=IntegerField(),
)
Review.objects.values('category').annotate(count=Sum(review_count))
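To see why the empty strings are counted, the sketch below runs the equivalent aggregations with Python's built-in sqlite3 on a hypothetical, simplified review table. COUNT(review) counts every non-NULL value, so '' is included; the SUM over a CASE expression mirrors the Sum(Case(...)) idea above (an approximation, not Django's exact SQL); and NULLIF(review, '') shows that turning empty strings into NULL, as the null=True model change would, makes a plain COUNT skip them.

```python
import sqlite3

# Hypothetical, simplified review table; empty reviews are stored as '' (default='').
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE review (
        id INTEGER PRIMARY KEY,
        category_id INTEGER,
        review TEXT NOT NULL DEFAULT ''
    );
    INSERT INTO review (category_id, review) VALUES
        (1, 'great'), (1, ''), (1, 'meh'),
        (2, ''), (2, '');
""")

rows = conn.execute("""
    SELECT category_id,
           -- COUNT counts every non-NULL value, so '' is counted too
           COUNT(review) AS count_all,
           -- the Sum(Case(When(review='', then=0), default=1)) idea
           SUM(CASE WHEN review = '' THEN 0 ELSE 1 END) AS count_case,
           -- turning '' into NULL lets a plain COUNT skip it
           COUNT(NULLIF(review, '')) AS count_nullif
    FROM review
    GROUP BY category_id
    ORDER BY category_id
""").fetchall()

print(rows)  # [(1, 3, 2, 2), (2, 2, 0, 0)]
```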
...and I need all empty & non-empty objects.
This doesn't make any sense when using values. Instead of actual objects, you'll get a list of dictionaries containing just the category and count keys. Apart from a different number in count, you'll see no difference between filtering out empty review values or not. On top of that, you filter for a single book (id=2) and somehow expect that there can be more than one review.
You need to seriously rethink what you are exactly trying to do, and how your model definition fits into that.

Django ORM most occurring value

I have a model where I am tracking users' activities. I want to know which page each user has accessed the most.
Here are my models:
class Visitor(models.Model):
    session_key = models.CharField(max_length=40, primary_key=True)
    user = models.ForeignKey(User, related_name='visit_history', null=True, editable=False)
    ....

class Pageview(models.Model):
    visitor = models.ForeignKey(Visitor, related_name='pageviews')
    url = models.CharField(max_length=500)
    method = models.CharField(max_length=20, null=True)
    view_time = models.DateTimeField()
Here is my query:
Pageview.objects.values(
    'visitor__user__first_name', 'visitor__user__last_name', 'visitor__user'
).annotate(
    url_count=Count('url')
).annotate(
    url_count_unique=Count('url', distinct=True)
)
This gives me, for each user, the number of URLs visited and the number of unique URLs visited.
I also want to know which URL each user has visited the most.
EDIT
In plain words, what I want is:
Go to Pageview, count how many times each unique URL occurs, and give me the URL with the highest count for each user.
I hope the question is clear; if not, let me know.
IMHO you're better off with a many-to-many relationship. You would have something like:
class VisitedURLs(models.Model):
    visitor = models.ForeignKey('Visitor', ....)
    page = models.ForeignKey('PageView', ....)
    timestamp = models.DateTimeField(auto_now_add=True)
and the original models become something like:
class Visitor(models.Model):
    members = models.ManyToManyField('PageView', through='VisitedURLs')

class PageView(models.Model):
    url = models.CharField(max_length=500)
    method = models.CharField(max_length=20, null=True)
In this case, you can use count/distinct on the VisitedURLs model, and each object of that type has an FK to a Visitor object (which gives you the user) and an FK to the URL.
Another way is to explicitly count each unique visitor/URL combination and store it somewhere. Depending on usage (e.g. if you compute or display this often), you may be better off with dedicated storage.
Here is the solution that I came up with:
# user is one of the dicts from the values() query above, so its key is 'visitor__user'
max_visited_node = Pageview.objects.filter(visitor__user_id=user['visitor__user']).values(
    'url').annotate(page_count=Count('id')).order_by('-page_count').first()
if max_visited_node:
    user['max_visited_node'] = max_visited_node
This way I get the visit count of every page the user has visited, order the pages by that count, and take the first element, which contains the URL and its page_count.
This is what I was looking for. Laur Ivan's suggestion is also worth considering.
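The solution above runs one query per user. As a sketch under assumptions, the counts for all users can come from one grouped query, roughly Pageview.objects.values('visitor__user', 'url').annotate(page_count=Count('id')).order_by('visitor__user', '-page_count'), with the top row per user picked in Python. The code below shows that idea with sqlite3 and a hypothetical, simplified pageview table that stores user_id directly instead of going through Visitor.

```python
import sqlite3

# Hypothetical, simplified table: user_id is stored directly on each page view.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pageview (id INTEGER PRIMARY KEY, user_id INTEGER, url TEXT);
    INSERT INTO pageview (user_id, url) VALUES
        (1, '/a'), (1, '/b'), (1, '/a'), (1, '/a'),
        (2, '/b'), (2, '/c'), (2, '/c');
""")

# One query: visit count per (user, url), highest count first within each user.
rows = conn.execute("""
    SELECT user_id, url, COUNT(id) AS page_count
    FROM pageview
    GROUP BY user_id, url
    ORDER BY user_id, page_count DESC, url
""").fetchall()

most_visited = {}
for user_id, url, page_count in rows:
    # Rows are sorted by count descending within each user, so the first one wins.
    most_visited.setdefault(user_id, (url, page_count))

print(most_visited)  # {1: ('/a', 3), 2: ('/c', 2)}
```

Ties are broken alphabetically by URL here; that is an arbitrary choice for the sketch.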

How QuerySets are evaluated in Django?

I have following models:
class Product(models.Model):
    """
    Basic product
    """
    name = models.CharField(max_length=100, db_column='name', unique=True)
    url = models.SlugField(max_length=100, db_column="url", unique=True, db_index=True)
    description = HTMLField(db_column='description')
    category = models.ForeignKey(Category, db_column='category', related_name='products')

class FirstObject(Product):
    pass

class FirstProduct(models.Model):
    product = models.ForeignKey(FirstObject, db_column='product')
    color = models.ForeignKey(Color, db_index=True, db_column='color')

class SecondObject(Product):
    pass

class SecondProduct(models.Model):
    product = models.ForeignKey(SecondObject, db_column='product')
    diameter = models.PositiveSmallIntegerField(db_column='diameter')
In other words I have two different types of products (with different parameters).
For a particular category (a category can contain only one type of product, and I know which one), I want to select all products together with their parameters.
How can this be accomplished efficiently?
If I write Category.objects.get(id=id).products.all() and then use the related manager to fetch the parameters of each product, is the database hit once for every product?
The second approach is to fetch all products in one query and then fetch all of their parameters,
then group them in a list/dictionary.
Which approach is best? Or is there another approach?
Thank you.
Your schema really does not lend itself well to querying. You will very quickly hit the worst-case query behaviour (two queries for every type of modification to a product). I suggest you look at the schema of django-shop-simplevariations and see how it achieves fast lookups (hint: the schema is structured so that prefetch_related is effective).
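The second approach from the question (one query for the products, one query for all of their parameters, then grouping in Python) is essentially what prefetch_related automates; in Django that would be something like FirstObject.objects.filter(category_id=category_id).prefetch_related('firstproduct_set'). The sketch below shows the same two-query pattern with Python's sqlite3 rather than the ORM. The tables are hypothetical and simplified: color is stored as text instead of a Color foreign key, and the Product/FirstObject inheritance is collapsed into one table.

```python
import sqlite3
from collections import defaultdict

# Hypothetical, simplified tables (color as text, inheritance collapsed into product).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, category INTEGER);
    CREATE TABLE firstproduct (id INTEGER PRIMARY KEY, product INTEGER, color TEXT);
    INSERT INTO product VALUES (1, 'mug', 7), (2, 'cap', 7), (3, 'wheel', 8);
    INSERT INTO firstproduct (product, color) VALUES
        (1, 'red'), (1, 'blue'), (2, 'green');
""")

# Query 1: all products in the category.
products = conn.execute(
    "SELECT id, name FROM product WHERE category = ? ORDER BY id", (7,)
).fetchall()

# Query 2: the parameter rows of all those products at once,
# instead of one query per product.
ids = [pid for pid, _ in products]
placeholders = ", ".join("?" for _ in ids)
params = conn.execute(
    f"SELECT product, color FROM firstproduct WHERE product IN ({placeholders}) ORDER BY id",
    ids,
).fetchall()

# Group the parameter rows by product id in Python.
colors_by_product = defaultdict(list)
for product_id, color in params:
    colors_by_product[product_id].append(color)

result = {name: colors_by_product[pid] for pid, name in products}
print(result)  # {'mug': ['red', 'blue'], 'cap': ['green']}
```

However many products the category holds, this stays at two queries.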