Annotate in SQLite - django

I have the following model:
class Model(...):
    date = DateField()
    user = ForeignKey()
    data = ForeignKey()
    time = IntegerField()
I'd like to sum the time field per user for a single data, so I do:
Model.objects.filter(date=..., data=...).values('user_id').annotate(time=Sum('time'))
but I receive a result that looks like:
[{'user_id': 1, 'time': 20},{'user_id': 1, 'time': 10}, {'user_id': 2, 'time': 20}]
So the grouping does not work. I checked the query generated by Django, and I don't know why Django uses date and data for grouping as well, not only user. Am I doing something wrong, or is this only an SQLite issue?

You should append .order_by() to your query set to clear default model ordering.
For your code:
(Model.objects.filter(date=…, data=…)
    .values('user_id')
    .annotate(time=Sum('time'))
    .order_by())  # <---- Here
This is fully explained in the warning about default ordering in the docs:
"Except that it won't quite work. The default ordering by name will
also play a part in the grouping ... you should ... clearing any ordering in the query."
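To see why an ordering column changes the result, here is a minimal sketch that uses plain sqlite3 instead of the ORM. The table name entry and the ordering column created are hypothetical; the two SQL statements only mimic what a GROUP BY looks like with and without an extra ordering column, and they are not the exact SQL Django generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: "created" stands in for a column used by the model's default ordering.
conn.execute("CREATE TABLE entry (user_id INTEGER, created TEXT, time INTEGER)")
conn.executemany(
    "INSERT INTO entry VALUES (?, ?, ?)",
    [(1, "a", 20), (1, "b", 10), (2, "a", 20)],
)

# The ordering column leaks into GROUP BY, so user 1 is split into two groups.
leaky = conn.execute(
    "SELECT user_id, SUM(time) FROM entry "
    "GROUP BY user_id, created ORDER BY user_id, created"
).fetchall()

# With the ordering cleared, only user_id is used for grouping.
clean = conn.execute(
    "SELECT user_id, SUM(time) FROM entry GROUP BY user_id ORDER BY user_id"
).fetchall()

print(leaky)
print(clean)
```

The first result has the same shape as the output in the question; the second is what the empty .order_by() is meant to produce.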

Related

Django ORM: Get all records and the respective last log for each record

I have two models, the simple version would be this:
class Users:
    name = models.CharField()
    birthdate = models.CharField()
    # other fields that play no role in calculations or filters, but I simply need to display
class UserLogs:
    user_id = models.ForeignKey(to='Users', related_name='user_daily_logs', on_delete=models.CASCADE)
    reference_date = models.DateField()
    hours_spent_in_chats = models.DecimalField()
    hours_spent_in_p_channels = models.DecimalField()
    hours_spent_in_challenges = models.DecimalField()
    # other fields that play no role in calculations or filters, but I simply need to display
What I need to write is a query that will return all the fields of all users, with the latest log (reference_date) for each user. So for n users and m logs, the query should return n records. It is guaranteed that each user has at least one log record.
Restrictions:
the query needs to be written in django orm
the query needs to start from the user model. So anything that goes like Users.objects... is ok. Anything that goes like UserLogs.objects... is not. That's because of filters and logic in the viewset, which is beyond my control
It has to be a single query, and no iterations in python, pandas or itertools are allowed. The Queryset will be directly processed by a serializer.
I shouldn't have to specify the names of the columns that need to be returned, one by one. The query must return all the columns from both models
Attempt no. 1 returns only the user id and the log date (for obvious reasons). The date is the right one, but I also need the other columns:
test = Users.objects.select_related("user_daily_logs").values("user_daily_logs__user_id").annotate(
    max_date=Max("user_daily_logs__reference_date"))
Attempt no. 2 raises an error (Cannot resolve expression type, unknown output_field):
logs = UserLogs.objects.filter(user_id=OuterRef('pk')).order_by('-reference_date')[:1]
users = Users.objects.annotate(latest_log = Subquery(logs))
This seems impossible taking into account all the restrictions.
One approach would be to use prefetch_related:
users = Users.objects.all().prefetch_related(
    models.Prefetch(
        'user_daily_logs',
        queryset=UserLogs.objects.filter().order_by('-reference_date'),
        to_attr="daily_logs"
    )
)
This will do two db queries and return all logs for every user, which may or may not be a problem depending on the number of records. If you need only logs for the current day, as the name suggests, you can add that to the filter and reduce the number of UserLogs records. Of course, you then need to get the first element from the list:
user.daily_logs[0]
For that you can create a @property on the Users model, which could look roughly like this:
@property
def latest_log(self):
    # daily_logs only exists when the Prefetch above was applied
    if not getattr(self, 'daily_logs', None):
        return None
    return self.daily_logs[0]

user.latest_log
You can also go a step further and try the following Subquery inside Prefetch to limit the queryset to one element, but I am not sure about the performance of this one (credit: "Django prefetch_related with limit").
users = Users.objects.all().prefetch_related(
    models.Prefetch(
        'user_daily_logs',
        queryset=UserLogs.objects.filter(
            id__in=Subquery(
                UserLogs.objects.filter(user_id=OuterRef('user_id'))
                .order_by('-reference_date')
                .values_list('id', flat=True)[:1]
            )
        ),
        to_attr="latest_log"
    )
)
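The idea behind the limited prefetch (keep only the log whose id is the newest one for its user) can be sketched in plain sqlite3. The table name user_logs and the sample rows are hypothetical, and this shows the shape of the correlated subquery only, not the exact SQL Django emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_logs (id INTEGER PRIMARY KEY, user_id INTEGER, reference_date TEXT);
INSERT INTO user_logs (id, user_id, reference_date) VALUES
    (1, 1, '2021-01-01'),
    (2, 1, '2021-01-03'),
    (3, 2, '2021-01-02'),
    (4, 2, '2021-01-01');
""")

# For each outer row, the inner query picks the id of the newest log of the same user.
latest = conn.execute("""
SELECT o.id, o.user_id, o.reference_date
FROM user_logs AS o
WHERE o.id IN (
    SELECT i.id
    FROM user_logs AS i
    WHERE i.user_id = o.user_id
    ORDER BY i.reference_date DESC
    LIMIT 1
)
ORDER BY o.user_id
""").fetchall()

print(latest)
```

Each user contributes exactly one row, which is why the prefetched list then holds at most one element per user.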

Applying union() on same model is not recognising ordering using GenericRelation

I have an Article model like this
from django.contrib.contenttypes.fields import GenericRelation
from django.db import models
from hitcount.models import HitCountMixin, HitCount
class Article(models.Model):
    title = models.CharField(max_length=250)
    hit_count_generic = GenericRelation(
        HitCount, object_id_field='object_pk',
        related_query_name='hit_count_generic_relation')
When I do Article.objects.order_by('hit_count_generic__hits'), I get results. But when I do
articles_by_id = Article.objects.filter(id__in=ids).annotate(qs_order=models.Value(0, models.IntegerField()))
articles_by_name = Article.objects.filter(title__icontains='sports').annotate(qs_order=models.Value(1, models.IntegerField()))
articles = articles_by_id.union(articles_by_name).order_by('qs_order', 'hit_count_generic__hits')
I get the error
ORDER BY term does not match any column in the result set
How can I achieve a union like this? I had to use union instead of AND and OR because I need to preserve order, i.e. articles_by_id should come first and articles_by_name second.
I am using django-hitcount for the hit counts: https://github.com/thornomad/django-hitcount. The HitCount model is given below.
class HitCount(models.Model):
    """
    Model that stores the hit totals for any content object.
    """
    hits = models.PositiveIntegerField(default=0)
    modified = models.DateTimeField(auto_now=True)
    content_type = models.ForeignKey(
        ContentType, related_name="content_type_set_for_%(class)s", on_delete=models.CASCADE)
    object_pk = models.TextField('object ID')
    content_object = GenericForeignKey('content_type', 'object_pk')
    objects = HitCountManager()
As suggested by @Angela, I tried prefetch_related.
articles_by_id = Article.objects.prefetch_related('hit_count_generic').filter(id__in=[1, 2, 3]).annotate(qs_order=models.Value(0, models.IntegerField()))
articles_by_name = Article.objects.prefetch_related('hit_count_generic').filter(title__icontains='date').annotate(qs_order=models.Value(1, models.IntegerField()))
The query generated with prefetch_related, when checked, does not select the hitcount table at all:
SELECT "articles_article"."id", "articles_article"."created", "articles_article"."last_changed_date", "articles_article"."title", "articles_article"."title_en", "articles_article"."slug", "articles_article"."status", "articles_article"."number_of_comments", "articles_article"."number_of_likes", "articles_article"."publish_date", "articles_article"."short_description", "articles_article"."description", "articles_article"."cover_image", "articles_article"."page_title", "articles_article"."category_id", "articles_article"."author_id", "articles_article"."creator_id", "articles_article"."article_type", 0 AS "qs_order" FROM "articles_article" WHERE "articles_article"."id" IN (1, 2, 3)
From Django's official documentation:
Further, databases place restrictions on what operations are allowed in the combined queries. For example, most databases don’t allow LIMIT or OFFSET in the combined queries.
So, make sure that your database allows combining queries like this.
ORDER BY term does not match any column in the result set
You are getting this error because that's exactly what's happening. The final result set for articles does not contain the hits column from the hitcount table, so the result set cannot be ordered by this column.
Before delving into the answer, let's look at what's happening with your django querysets under the hood.
Retrieve a particular set of articles and include an extra ordering field qs_order set to 0.
articles_by_id = Article.objects.filter(id__in=ids).annotate(qs_order=models.Value(0, models.IntegerField()))
SQL Query for the above
Select id, title,....., 0 as qs_order from article where article.id in (Select ....) # whatever you did to get your ids or just a flat list
Retrieve another set of articles and include an extra ordering field qs_order set to 1
articles_by_name = Article.objects.filter(title__icontains='sports').annotate(qs_order=models.Value(1, models.IntegerField()))
SQL Query for the above
Select id, title, ...1 as qs_order from article where title ilike '%sports%'
Original queryset and order_by hit_count_generic__hits
Article.objects.order_by('hit_count_generic__hits')
This will actually perform an inner join and fetch the hitcount table to order by the hits column.
Query
Select id, title,... from article inner join hitcount on ... order by hits ASC
Union
So when you do your union, the result-set of the above 2 queries is combined and then ordered using your qs_order and then hits ...where it fails.
Solution
Use prefetch_related to get your hitcount table in the initial queryset filtering, so you can then use the hits column in the union to order.
articles_by_id = Article.objects.prefetch_related('hit_count_generic').filter(id__in=ids).annotate(qs_order=models.Value(0, models.IntegerField()))
articles_by_name = Article.objects.prefetch_related('hit_count_generic').filter(title__icontains='sports').annotate(qs_order=models.Value(1, models.IntegerField()))
Now as you have the desired table and its columns in both your SELECT queries, your union should work the way you have defined.
articles = articles_by_id.union(articles_by_name).order_by('qs_order', 'hit_count_generic__hits')
Just replacing prefetch_related with select_related works for me.
https://docs.djangoproject.com/en/3.2/ref/models/querysets/#select-related
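The underlying SQL rule can be reproduced with plain sqlite3: the ORDER BY of a compound query may only refer to columns that appear in the combined result set. This is a sketch with hypothetical, simplified tables (article, hitcount with an integer object_pk); it shows the database behaviour only and does not claim which ORM call produces the second query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE hitcount (object_pk INTEGER, hits INTEGER);
INSERT INTO article VALUES (1, 'news'), (2, 'sports today'), (3, 'sports recap');
INSERT INTO hitcount VALUES (1, 5), (2, 9), (3, 2);
""")

# "hits" is not among the selected columns, so the compound ORDER BY is rejected.
failing = """
SELECT id, title, 0 AS qs_order FROM article WHERE id IN (1)
UNION
SELECT id, title, 1 AS qs_order FROM article WHERE title LIKE '%sports%'
ORDER BY qs_order, hits
"""
try:
    conn.execute(failing)
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

# Selecting hits in both halves makes it a legal ordering column.
working = """
SELECT a.id, a.title, 0 AS qs_order, h.hits AS hits
FROM article a JOIN hitcount h ON h.object_pk = a.id WHERE a.id IN (1)
UNION
SELECT a.id, a.title, 1 AS qs_order, h.hits AS hits
FROM article a JOIN hitcount h ON h.object_pk = a.id WHERE a.title LIKE '%sports%'
ORDER BY qs_order, hits
"""
rows = conn.execute(working).fetchall()

print(error)
print(rows)
```

Whatever ORM construct is used, the check to make is the same as in the question's printed SQL: the hits column has to appear in the SELECT list of both querysets before the union.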

Filtering django DatetimeField__date not working

According to this document, a lookup added in v1.9 lets us query a DateTimeField by date, ignoring the time.
Examples are:
Entry.objects.filter(pub_date__date=datetime.date(2005, 1, 1))
Entry.objects.filter(pub_date__date__gt=datetime.date(2005, 1, 1))
But it is not working for me:
class MilkStorage(models.Model):
    ....
    created_at = models.DateTimeField(null=False)
Usage
from datetime import date
MilkStorage.objects.filter(created_at__date=date.today())
It returns an empty queryset <QuerySet []>.
Does this query only work on PostgreSQL? I'm using MySQL.
Depending on your specific requirements, this may or may not be an ideal solution. I found that __date works if you set USE_TZ = False in settings.py
I had this same problem (I couldn't even filter by __month or __day) until I disabled USE_TZ. Again, may not be ideal for your case, but it should get __date working again.
I have a Visitor model. In this model I filter the data by date, not time, and it works.
models.py
class Visitor(models.Model):
    timestamp = models.DateTimeField(_('Login Date Time'), auto_now=True)
    os_info = models.CharField(_('OS Information'), max_length=30, null=True)
views.py
import datetime
visitor = Visitor.objects.filter(timestamp__lte=datetime.datetime.now().date())
print visitor
Output:
[<Visitor: Windows 10>, <Visitor: Windows 10>]
Use the approach above to filter data by date.
If you are using a MySQL database, you may need to do the following:
Load the time zone tables with mysql_tzinfo_to_sql: https://dev.mysql.com/doc/refman/8.0/en/mysql-tzinfo-to-sql.html
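Some background on why USE_TZ and the time zone tables matter: the Django docs note that with USE_TZ = True, datetimes are converted to the current time zone before the __date comparison, and that this requires time zone definitions in the database. The sketch below only illustrates, in plain Python, that the calendar date of one stored instant depends on the time zone it is viewed in. The UTC+2 offset is a hypothetical example, and the snippet does not reproduce MySQL's behaviour.

```python
from datetime import datetime, timedelta, timezone

# A value as it might be stored in UTC.
stored = datetime(2005, 1, 1, 23, 30, tzinfo=timezone.utc)

# Hypothetical "current time zone" two hours ahead of UTC.
local_tz = timezone(timedelta(hours=2))

utc_date = stored.date()
local_date = stored.astimezone(local_tz).date()

print(utc_date)
print(local_date)
```

Because the date has to be computed after that conversion, a database that cannot perform the conversion cannot match any row, which is consistent with the empty queryset in the question.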

Django: How to filter on inner join based on properties of second table?

My question is simple: in a Django app, I have a table Users and a table StatusUpdates. In the StatusUpdates table, I have a column user which is a foreign key pointing back to Users. How can I do a search expressing something like:
users.filter(latest_status_update.text__contains='Hello')
Edit:
Please excuse my lack of clarity. The query that I would like to make is something like "Give me all the users whose latest status update contains the text 'hello'". In Django code, I would do the following (which is really inefficient and ugly):
def get_hello_users():
    hello_users = []
    for user in User.objects.all():
        latest_status_update = StatusUpdate.objects.filter(user=user).order_by('-creation_date')[0]
        if 'Hello' in latest_status_update.text:
            hello_users.append(user)
    return hello_users
Edit 2:
I've already found the solution but since I was asked, here are the important parts of my models:
class User(models.Model):
    ...
class StatusUpdate(models.Model):
    user = models.ForeignKey(User)
    text = models.CharField(max_length=140)
    creation_date = models.DateTimeField(auto_now_add=True, editable=False)
    ....
Okay, I think I got it:
from django.db.models import Max, F
User.objects\
    .annotate(latest_status_update_id=Max('statusupdate__id'))\
    .filter(
        statusupdate__id=F('latest_status_update_id'),
        statusupdate__text__icontains='hello'
    )
For more info, see this section of the Django documentation.
Please note: I ended up changing my strategy a bit and settling on treating the highest ID as the latest update. I did this because I realized that a User could post two updates at the same time, and that would break my query.
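The intent of that query ("the update with the highest id per user, and only if its text contains hello") can be checked against a small sqlite3 database. Table names and rows here are hypothetical, and the SQL is a hand-written equivalent of the intent, not necessarily the statement the ORM generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE statusupdate (id INTEGER PRIMARY KEY, user_id INTEGER, text TEXT);
INSERT INTO app_user VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
INSERT INTO statusupdate VALUES
    (1, 1, 'Hello world'),
    (2, 1, 'bye'),
    (3, 2, 'nothing'),
    (4, 2, 'well hello there'),
    (5, 3, 'HELLO');
""")

# Keep a user only if the update with the highest id matches the text filter.
# SQLite's LIKE is case-insensitive for ASCII, similar in spirit to icontains.
hello_users = conn.execute("""
SELECT u.name
FROM app_user u
JOIN statusupdate s ON s.user_id = u.id
WHERE s.id = (SELECT MAX(s2.id) FROM statusupdate s2 WHERE s2.user_id = u.id)
  AND s.text LIKE '%hello%'
ORDER BY u.id
""").fetchall()

print(hello_users)
```

Note that ann is excluded even though an older update of hers contains "Hello"; only the latest update counts.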
latest_status_updates = [
    update
    for update in (
        user.statusupdate_set.order_by('-creation_date').first()
        for user in User.objects.all()
    )
    if update is not None and 'hello' in update.text
]
users = list({status_update.user for status_update in latest_status_updates})
EDIT:
Now I first get all LATEST status updates of each user into a list which is then filtered by the text field found in StatusUpdate class. In the second line, I extract users out of the filtered status updates and then produce a unique list of users.
I hope this helps!
Not sure I understand. Are you trying to do something like this?
(StatusUpdates
    .objects
    .select_related("user")
    .filter(text__contains="hello")
    .order_by("-updated")
    .first())
This will return the StatusUpdate that was modified last (if you have a field called updated that stores the time of the last modification) and contains "hello" in the text field. If none of the StatusUpdates contains that string, it will return None.
Then you can do:
latest = (StatusUpdates
    .objects
    .select_related("user")
    .filter(text__contains="hello")
    .order_by("-updated")
    .first())
# then, if you need the user too
if latest is not None:
    user = latest.user  # does not hit the DB again because of select_related
If this isn't what you needed, please provide more details (models) and clarify what you need.

Django Count() in multiple annotations

Say I have a simple forum model:
class User(models.Model):
    username = models.CharField(max_length=25)
    ...
class Topic(models.Model):
    user = models.ForeignKey(User)
    ...
class Post(models.Model):
    user = models.ForeignKey(User)
    ...
Now say I want to see how many topics and posts each user in a subset of users has (e.g. their username starts with "ab").
So if I do one query each for posts and topics:
(User.objects.filter(username__startswith="ab")
    .annotate(posts=Count('post'))
    .values_list("username", "posts"))
Yields:
[('abe', 5),('abby', 12),...]
and
(User.objects.filter(username__startswith="ab")
    .annotate(topics=Count('topic'))
    .values_list("username", "topics"))
Yields:
[('abe', 2),('abby', 6),...]
HOWEVER, when I try annotating both to get one list, I get something strange:
(User.objects.filter(username__startswith="ab")
    .annotate(posts=Count('post'))
    .annotate(topics=Count('topic'))
    .values_list("username", "posts", "topics"))
Yields:
[('abe', 10, 10),('abby', 72, 72),...]
Why are the topics and posts multiplied together? I expected this:
[('abe', 5, 2),('abby', 12, 6),...]
What would be the best way of getting the correct list?
I think Count('topic', distinct=True) should do the right thing. That will use COUNT(DISTINCT topic.id) instead of COUNT(topic.id) to avoid duplicates.
(User.objects.filter(username__startswith="ab")
    .annotate(posts=Count('post', distinct=True))
    .annotate(topics=Count('topic', distinct=True))
    .values_list("username", "posts", "topics"))
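The multiplication in the question comes from joining two child tables at once: every post row is paired with every topic row of the same user. A small sqlite3 sketch with hypothetical table names and the numbers for 'abe' from the question (5 posts, 2 topics) shows both the inflated counts and the effect of COUNT(DISTINCT ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE forum_user (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE topic (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE post (id INTEGER PRIMARY KEY, user_id INTEGER);
INSERT INTO forum_user VALUES (1, 'abe');
""")
conn.executemany("INSERT INTO topic (user_id) VALUES (?)", [(1,)] * 2)
conn.executemany("INSERT INTO post (user_id) VALUES (?)", [(1,)] * 5)

# Two joins produce 5 * 2 = 10 rows for this user before grouping.
joins = (
    "FROM forum_user u "
    "LEFT JOIN post p ON p.user_id = u.id "
    "LEFT JOIN topic t ON t.user_id = u.id "
    "GROUP BY u.id"
)

plain = conn.execute(
    "SELECT u.username, COUNT(p.id), COUNT(t.id) " + joins
).fetchall()
distinct = conn.execute(
    "SELECT u.username, COUNT(DISTINCT p.id), COUNT(DISTINCT t.id) " + joins
).fetchall()

print(plain)
print(distinct)
```

The plain counts both equal the size of the joined row set, while the distinct counts recover the per-table totals.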
Try adding distinct to your last queryset:
(User.objects.filter(username__startswith="ab")
    .annotate(posts=Count('post'))
    .annotate(topics=Count('topic'))
    .values_list("username", "posts", "topics")
    .distinct())
See https://docs.djangoproject.com/en/1.3/ref/models/querysets/#distinct for more details, but basically you're getting duplicate rows because the annotations span multiple tables.