Django count number of records per day - django

I'm using Django 2.0
I am preparing data to show on a graph in a template. I want to fetch the number of records per day.
This is what I'm doing
qs = self.get_queryset().\
extra({'date_created': "date(created)"}).\
values('date_created').\
annotate(item_count=Count('id'))
but the output I get is
[
{'date_created': datetime.date(2018, 5, 24), 'item_count': 1},
{'date_created': datetime.date(2018, 5, 24), 'item_count': 1},
{'date_created': datetime.date(2018, 5, 24), 'item_count': 1},
{'date_created': datetime.date(2018, 5, 24), 'item_count': 1},
{'date_created': datetime.date(2018, 5, 24), 'item_count': 1},
{'date_created': datetime.date(2018, 5, 24), 'item_count': 1},
{'date_created': datetime.date(2018, 5, 24), 'item_count': 1}
]
Here the data is not grouped: the same date is returned repeatedly, each time with a count of 1.

Try using the TruncDate function (from django.db.models.functions) to group on the date part of created.
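As a rough illustration of what the grouped query should do at the SQL level, here is a minimal sketch that uses Python's built-in sqlite3 module instead of Django. The table name item and its rows are assumptions made up for the demo. In the ORM, the analogous query is usually written as annotate(date_created=TruncDate('created')).values('date_created').annotate(item_count=Count('id')).order_by('date_created'). Setting the ordering explicitly matters, because an ordering field that is not part of values() can end up in the GROUP BY and produce one row per record, which is the symptom shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, created TEXT)")
conn.executemany(
    "INSERT INTO item (created) VALUES (?)",
    [
        ("2018-05-24 09:15:00",),
        ("2018-05-24 13:40:00",),
        ("2018-05-24 18:05:00",),
        ("2018-05-25 08:00:00",),
    ],
)

# Group on the date part only: one row per day.
per_day = conn.execute(
    "SELECT date(created) AS date_created, COUNT(id) AS item_count "
    "FROM item GROUP BY date(created) ORDER BY date_created"
).fetchall()

# Grouping on the full timestamp as well (which is what an extra ordering
# column in the GROUP BY amounts to) yields one row per record.
per_record = conn.execute(
    "SELECT date(created) AS date_created, COUNT(id) AS item_count "
    "FROM item GROUP BY date(created), created ORDER BY created"
).fetchall()

print(per_day)
print(per_record)
```

The first query returns one row per day, while the second returns four rows with a count of 1 each.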

Related

How can I annotate django-polymorphic models that have GenericRelations to other models with GenericForeignKeys?

I have a parent model named Content that inherits from Django polymorphic. This is a simplified example, but I have a Post model that inherits from Content.
On the Content model, notice that I have a GenericRelation(Note) named notes.
What I'm trying to do is annotate all Content objects with a count of the number of notes. It's the exact same result you would get in the below for loop.
for content in Content.objects.all():
    print(content.notes.count())
Below is a fully reproducible and simplified example.
To recreate the problem
Set up a new Django project, create a superuser, add django-polymorphic to the project, and copy/paste the models. Make migrations and migrate. My app was called myapp.
Open manage.py shell, import Post model, and run Post.make_entries(n=30)
Run Post.notes_count_answer() and it will return a list of numbers. These numbers are what the annotated Content PolymorphicQuerySet should show. Example:
Post.notes_count_answer()
[3, 2, 3, 1, 3, 1, 3, 1, 2, 1, 2, 2, 3, 3, 3, 1, 3, 3, 2, 3, 2, 3, 2, 1, 2, 1, 1, 1, 1, 2]
The first number 3 in the list means the first Post has 3 notes.
What I have tried (simplest to most complex)
basic
>>> Content.objects.all().annotate(notes_count=Count('notes')).values('notes_count')
<PolymorphicQuerySet [{'notes_count': 0}, {'notes_count': 0}, {'notes_count': 0},
{'notes_count': 0}, {'notes_count': 0}, {'notes_count': 0}, {'notes_count': 0},
{'notes_count': 0}, {'notes_count': 0}, {'notes_count': 0}, {'notes_count': 0},
{'notes_count': 0}, {'notes_count': 0}, {'notes_count': 0}, {'notes_count': 0},
{'notes_count': 0}, {'notes_count': 0}, {'notes_count': 0}, {'notes_count': 0},
{'notes_count': 0}, '...(remaining elements truncated)...']>
hail-mary / weak attempt
Content.objects.all().prefetch_related('notes').annotate(notes_count=Count('notes')).values('notes_count')
<PolymorphicQuerySet [{'notes_count': 0}, {'notes_count': 0}, {'notes_count': 0},
{'notes_count': 0}, {'notes_count': 0}, {'notes_count': 0}, {'notes_count': 0},
{'notes_count': 0}, {'notes_count': 0}, {'notes_count': 0}, {'notes_count': 0},
{'notes_count': 0}, {'notes_count': 0}, {'notes_count': 0}, {'notes_count': 0},
{'notes_count': 0}, {'notes_count': 0}, {'notes_count': 0}, {'notes_count': 0},
{'notes_count': 0}, '...(remaining elements truncated)...']>
subquery?
>>> Content.objects.all().annotate(notes_count=Subquery(
Note.objects.filter(object_id=OuterRef('pk'), content_type_id=OuterRef('polymorphic_ctype_id')).order_by(
'object_id').annotate(c=Count('object_id')).values('c'))).values('notes_count')
<PolymorphicQuerySet [{'notes_count': 1}, {'notes_count': 1}, {'notes_count': 1},
{'notes_count': 1}, {'notes_count': 1}, {'notes_count': 1}, {'notes_count': 1},
{'notes_count': 1}, {'notes_count': 1}, {'notes_count': 1}, {'notes_count': 1},
{'notes_count': 1}, {'notes_count': 1}, {'notes_count': 1}, {'notes_count': 1},
{'notes_count': 1}, {'notes_count': 1}, {'notes_count': 1}, {'notes_count': 1},
{'notes_count': 1}, '...(remaining elements truncated)...']>
close ?
Content.objects.all().annotate(
notes_count=Count(Subquery(
Note.objects.filter(
object_id=OuterRef('pk'), content_type_id=OuterRef('polymorphic_ctype_id')
).order_by('object_id')))).values('notes_count')
# error message
line 357, in execute
return Database.Cursor.execute(self, query, params)
django.db.utils.OperationalError: sub-select returns 4 columns - expected 1
I've really been trying many different variations of Subquery, but haven't been able to get the correct notes count in the annotation.
Expected result:
Your numbers won't match exactly because the data is randomly generated, but this is the idea.
<PolymorphicQuerySet [{'notes_count': 3}, {'notes_count': 2}, {'notes_count': 3},
{'notes_count': 1}, {'notes_count': 3}, {'notes_count': 1}, {'notes_count': 3},
{'notes_count': 1}, {'notes_count': 2}, {'notes_count': 1}, {'notes_count': 2},
{'notes_count': 2}, {'notes_count': 3}, {'notes_count': 3}, {'notes_count': 3},
{'notes_count': 1}, {'notes_count': 3}, {'notes_count': 3}, {'notes_count': 2},
{'notes_count': 3}, {'notes_count': 2}, {'notes_count': 3}, {'notes_count': 2},
{'notes_count': 1}, '...(remaining elements truncated)...']>
requirements.txt
Django==4.1.5
django-polymorphic==3.1.0
settings.py
INSTALLED_APPS = [
'django.contrib.admin',
'django.contrib.auth',
'polymorphic',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
'myapp.apps.MyappConfig',
]
models.py
from django.contrib.contenttypes.fields import GenericRelation, GenericForeignKey
from django.contrib.contenttypes.models import ContentType
from django.core.validators import MaxValueValidator, MinValueValidator
from django.db import models
from django.conf import settings
from polymorphic.models import PolymorphicModel
from django.contrib.auth import get_user_model
class Vote(models.Model):
    value = models.IntegerField(default=0, validators=[MinValueValidator(-1), MaxValueValidator(1)])
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    object_id = models.PositiveIntegerField()
    content_object = GenericForeignKey('content_type', 'object_id')
    def __str__(self):
        return str(self.value)
class Note(models.Model):
    body = models.TextField()
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    object_id = models.PositiveIntegerField()
    content_object = GenericForeignKey('content_type', 'object_id')
    def __str__(self):
        return str(self.id)
class Content(PolymorphicModel):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    title = models.CharField(max_length=100)
    votes = GenericRelation(Vote)  # reverse generic relation
    notes = GenericRelation(Note)  # reverse generic relation
    def __str__(self):
        return str(self.pk)
class Post(Content):
    content = models.TextField(blank=True)
    def __str__(self):
        return str(self.pk)
    @staticmethod
    def make_entries(n=5):
        import random
        user = get_user_model().objects.first()
        for i in range(1, n+1, 1):
            vote_count = random.randrange(0, 5)
            note_count = random.randrange(0, 3)
            p = Post.objects.create(
                user=user,
                title=f'Post #{i}',
                content=f'Content for post {i}',
            )
            content_type = ContentType.objects.get_for_model(p)
            Vote.objects.create(
                value=vote_count,
                content_type=content_type,
                object_id=p.id
            )
            for j in range(note_count + 1):
                Note.objects.create(
                    body=f'Note {j}',
                    object_id=p.id,
                    content_type=content_type
                )
    @staticmethod
    def notes_count_answer():
        return [content.notes.count() for content in Content.objects.all()]
Did it. I guess the key is knowing that Subquery needs to return one single value (a count) and to perform the count inside the Subquery. I was messing around with Count() function a lot and was banging my head against the wall.
from django.db.models import Count, OuterRef, Subquery
from django.db.models.functions import Coalesce
Content.objects.annotate(notes_count=Coalesce(
    Subquery(
        Note.objects.filter(
            object_id=OuterRef('pk'),
        ).order_by('object_id').values('object_id').annotate(
            count=Count('object_id')
        ).values('count')
    ), 0
))
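To see why the count has to happen inside the subquery, it may help to look at the same shape in plain SQL. The sketch below uses the standard-library sqlite3 module with hypothetical content and note tables (the names and rows are assumptions, not the tables Django would create). The correlated subquery returns exactly one value per outer row, and COALESCE turns the NULL for content without notes into 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE note (id INTEGER PRIMARY KEY, object_id INTEGER, body TEXT);
    INSERT INTO content (id, title) VALUES (1, 'Post #1'), (2, 'Post #2'), (3, 'Post #3');
    INSERT INTO note (object_id, body) VALUES
        (1, 'Note 0'), (1, 'Note 1'), (1, 'Note 2'),
        (2, 'Note 0');
""")

# raw_count shows the bare subquery; notes_count wraps it in COALESCE.
sql = """
    SELECT c.id,
           (SELECT COUNT(n.object_id)
              FROM note n
             WHERE n.object_id = c.id
             GROUP BY n.object_id) AS raw_count,
           COALESCE((SELECT COUNT(n.object_id)
                       FROM note n
                      WHERE n.object_id = c.id
                      GROUP BY n.object_id), 0) AS notes_count
      FROM content c
     ORDER BY c.id
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

The third content row has no notes, so the grouped subquery yields no row at all and raw_count is NULL, which is the case the Coalesce(..., 0) wrapper in the answer covers.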

django annotate count is giving wrong output

Suppose
class Comment(models.Model):
    ...
    likes = models.ManyToManyField(User,...)
class Post(models.Model):
    ...
    content = models.CharField(...)
    likes = models.ManyToManyField(User,...)
    comment = models.ManyToManyField(Comment,...)
Now if I run
Statement1
Post.objects.annotate(likecount=Count('likes')).values('content','likecount')
Output:
<QuerySet [{'content': 'delta', 'likecount': 3}, {'content': 'gamma', 'likecount': 6}, {'content': 'beta', 'likecount': 7}, {'content': 'alpha', 'likecount': 3}]>
Statement2
Post.objects.annotate(commentlikecount=Count('comment__likes')).values('content','commentlikecount')
Output:
<QuerySet [{'content': 'delta', 'commentlikecount': 6}, {'content': 'gamma', 'commentlikecount': 0}, {'content': 'beta', 'commentlikecount': 3}, {'content': 'alpha', 'commentlikecount': 0}]>
Statement3
Post.objects.annotate(likecount=Count('likes'),commentlikecount=Count('comment__likes')).values('content','likecount','commentlikecount')
Output:
<QuerySet [{'content': 'delta', 'likecount': 18, 'commentlikecount': 18}, {'content': 'gamma', 'likecount': 6, 'commentlikecount': 0}, {'content': 'beta', 'likecount': 21, 'commentlikecount': 21}, {'content': 'alpha', 'likecount': 3, 'commentlikecount': 0}]>
Why is the output of the third statement this, instead of
<QuerySet [{'content': 'delta', 'likecount': 3, 'commentlikecount': 6}, {'content': 'gamma', 'likecount': 6, 'commentlikecount': 0}, {'content': 'beta', 'likecount': 7, 'commentlikecount': 3}, {'content': 'alpha', 'likecount': 3, 'commentlikecount': 0}]>
How can I get this output?
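The numbers in Statement3 are consistent with the two joins multiplying each other: 18 is 3 * 6 and 21 is 7 * 3. The sketch below reproduces that effect for a single post with the standard-library sqlite3 module; the table and column names are hypothetical stand-ins for the many-to-many tables, not the ones Django generates. In the ORM, Count() accepts distinct=True, which is the usual way to avoid the multiplication, with the caveat that it counts distinct related objects, so a user who liked two comments would be counted once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post_likes (id INTEGER PRIMARY KEY, post_id INTEGER, user_id INTEGER);
    CREATE TABLE post_comment (id INTEGER PRIMARY KEY, post_id INTEGER, comment_id INTEGER);
    CREATE TABLE comment_likes (id INTEGER PRIMARY KEY, comment_id INTEGER, user_id INTEGER);
    INSERT INTO post_likes (post_id, user_id) VALUES (1, 1), (1, 2), (1, 3);
    INSERT INTO post_comment (post_id, comment_id) VALUES (1, 10), (1, 11);
    INSERT INTO comment_likes (comment_id, user_id) VALUES
        (10, 4), (10, 5), (10, 6), (11, 7), (11, 8), (11, 9);
""")

joins = """
      FROM post_likes pl
      JOIN post_comment pc ON pc.post_id = pl.post_id
      JOIN comment_likes cl ON cl.comment_id = pc.comment_id
     WHERE pl.post_id = 1
"""

# Every like row is paired with every comment-like row: 3 * 6 = 18 joined rows.
naive = conn.execute("SELECT COUNT(pl.user_id), COUNT(cl.user_id)" + joins).fetchone()

# Counting distinct values collapses the duplicates created by the joins.
distinct = conn.execute(
    "SELECT COUNT(DISTINCT pl.user_id), COUNT(DISTINCT cl.user_id)" + joins
).fetchone()

print(naive)
print(distinct)
```

With three post likes and six comment likes, the plain counts both come out as 18, while the distinct counts come out as 3 and 6.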

Django ORM queryset equivalent to group by year-month?

I have a Django app that needs some data visualization, and I am stuck on the ORM.
I have an Orders model with a created_at field, and I want to present the data as a bar chart (number of orders per year-month) in a dashboard template.
So I need to aggregate/annotate data from my model, but I did not find a complete solution.
I found a partial answer with TruncMonth and read about serializers, but I wonder if there is a simpler solution using the Django ORM...
In Postgresql it would be:
SELECT date_trunc('month',created_at), count(order_id) FROM "Orders" GROUP BY date_trunc('month',created_at) ORDER BY date_trunc('month',created_at);
"2021-01-01 00:00:00+01" "2"
"2021-02-01 00:00:00+01" "3"
"2021-03-01 00:00:00+01" "3"
...
example
1 "2021-01-04 07:42:03+01"
2 "2021-01-24 13:59:44+01"
3 "2021-02-06 03:29:11+01"
4 "2021-02-06 08:21:15+01"
5 "2021-02-13 10:38:36+01"
6 "2021-03-01 12:52:22+01"
7 "2021-03-06 08:04:28+01"
8 "2021-03-11 16:58:56+01"
9 "2022-03-25 21:40:10+01"
10 "2022-04-04 02:12:29+02"
11 "2022-04-13 08:24:23+02"
12 "2022-05-08 06:48:25+02"
13 "2022-05-19 15:40:12+02"
14 "2022-06-01 11:29:36+02"
15 "2022-06-05 02:15:05+02"
16 "2022-06-05 03:08:22+02"
expected result
[
{"year-month": "2021-01", "number": 2},
{"year-month": "2021-02", "number": 3},
{"year-month": "2021-03", "number": 3},
{"year-month": "2022-03", "number": 1},
{"year-month": "2022-04", "number": 2},
{"year-month": "2022-05", "number": 2},
{"year-month": "2022-06", "number": 3}
]
I have done this but I am not able to order by date:
Orders.objects.annotate(month=TruncMonth('created_at')).values('month').annotate(number=Count('order_id')).values('month', 'number').order_by()
<SafeDeleteQueryset [
{'month': datetime.datetime(2022, 3, 1, 0, 0, tzinfo=<UTC>), 'number': 4},
{'month': datetime.datetime(2022, 6, 1, 0, 0, tzinfo=<UTC>), 'number': 2},
{'month': datetime.datetime(2022, 5, 1, 0, 0, tzinfo=<UTC>), 'number': 1},
{'month': datetime.datetime(2022, 1, 1, 0, 0, tzinfo=<UTC>), 'number': 5},
{'month': datetime.datetime(2021, 12, 1, 0, 0, tzinfo=<UTC>), 'number': 1},
{'month': datetime.datetime(2022, 7, 1, 0, 0, tzinfo=<UTC>), 'number': 1},
{'month': datetime.datetime(2021, 9, 1, 0, 0, tzinfo=<UTC>), 'number': 2},
'...(remaining elements truncated)...'
]>
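Try adding an explicit order_by on the truncated month instead of the empty order_by(), so that multi-year data comes back in chronological order. Ordering on the original created_at field would add it to the GROUP BY and split the monthly groups again.
from django.db.models import Count
from django.db.models.functions import TruncMonth
Orders.objects.values(month=TruncMonth('created_at')).\
    annotate(number=Count('order_id')).\
    order_by('month')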
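Once the queryset returns one row per month, producing the "year-month" strings from the expected result is a plain Python step. The sketch below assumes rows shaped like the queryset output in the question (dicts with a datetime under 'month' and a count under 'number'); the helper name to_year_month is hypothetical. Sorting on the datetime before formatting keeps multi-year data in chronological order even if the query itself was not ordered.

```python
from datetime import datetime, timezone


def to_year_month(rows):
    """Sort month buckets chronologically and format them as 'YYYY-MM'."""
    ordered = sorted(rows, key=lambda row: row["month"])
    return [
        {"year-month": row["month"].strftime("%Y-%m"), "number": row["number"]}
        for row in ordered
    ]


# A few sample rows taken from the unordered queryset output in the question.
rows = [
    {"month": datetime(2022, 3, 1, tzinfo=timezone.utc), "number": 4},
    {"month": datetime(2022, 6, 1, tzinfo=timezone.utc), "number": 2},
    {"month": datetime(2021, 12, 1, tzinfo=timezone.utc), "number": 1},
    {"month": datetime(2021, 9, 1, tzinfo=timezone.utc), "number": 2},
]

result = to_year_month(rows)
print(result)
```

In a view, the list could be passed to the template or serialized to JSON for the chart library.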

How to apply sum on a list of columns

I have a list, var aggList: List[String] = List(), that contains the names of the columns on which the aggregation has to be applied.
I generate the dataframe as below:
var df = sc.parallelize(Seq[(Int, Int, String, Int, Int, Int)](
(1234, 1234, "PRM", 2, 1, 1),
(1235, 1234, "PRM", 1239, 2, 10),
(1246, 1234, "PRM", 1234, 5, 15),
(1247, 1234, "PRM", 1254, 20, 12),
(1246, 1234, "PRM", 1234, 5, 13),
(1246, 1234, "SEC", 1234, 7, 15),
(1249, 1234, "SEC", 1234, 20, 1),
(1248, 1234, "SEC", 1234, 2, 2))
).toDF("col1", "col2", "col3", "col4", "col5", "col6")
I need to do df.groupby(col1).agg(sum(aggList))
How do I achieve this?

How to exclude items with identical field if the datefield is bigger than in others duplicates?

So I have a Comments model and by querying
comments = Comments.objects.values('students_id', 'created_at')
I get this output
<QuerySet [
{'students_id': 4, 'created_at': datetime.date(2019, 6, 19)}, {'students_id': 2, 'created_at': datetime.date(2019, 6, 3)}, {'students_id': 1, 'created_at': datetime.date(2019, 6, 24)}, {'students_id': 6, 'created_at': datetime.date(2019, 6, 4)}, {'students_id': 6, 'created_at': datetime.date(2019, 6, 19)}, {'students_id': 5, 'created_at': datetime.date(2019, 6, 5)}, {'students_id': 4, 'created_at': datetime.date(2019, 7, 28)}, {'students_id': 6, 'created_at': datetime.date(2019, 6, 11)}]>
There are three comments by the student with id=6 and two comments by the student with id=4.
What I need to get is only one latest comment from every student. In this example it'll look like this:
<QuerySet [
{'students_id': 2, 'created_at': datetime.date(2019, 6, 3)}, {'students_id': 1, 'created_at': datetime.date(2019, 6, 24)}, {'students_id': 6, 'created_at': datetime.date(2019, 6, 19)}, {'students_id': 5, 'created_at': datetime.date(2019, 6, 5)}, {'students_id': 4, 'created_at': datetime.date(2019, 7, 28)},]>
Thanks in advance for the answer!
You can use annotate and Max to get the desired result. First import Max from django.db.models:
from django.db.models import Max
Comments.objects.values('students_id').annotate(Max('created_at'))
The output will contain each students_id with its latest created_at, like this:
<QuerySet [
{'students_id': 2, 'created_at__max': datetime.date(2019, 6, 3)}, {'students_id': 1, 'created_at__max': datetime.date(2019, 6, 24)},]>
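Use this code (values('students_id') followed by annotate() already performs the GROUP BY; QuerySet has no group_by() method, and delete() cannot be called after values()):
queryset = Comments.objects.values('students_id').annotate(latest_created_at=Max('created_at'))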
In raw SQL it would be ... WHERE NOT EXISTS(SELECT * FROM Comments cc WHERE cc.students_id = c.students_id AND cc.created_at > c.created_at)
from django.db.models import Exists, OuterRef
later_comments = Comments.objects.filter(
    students_id=OuterRef('students_id'),
    created_at__gt=OuterRef('created_at'),
).values('created_at')
latest_comments = Comments.objects.\
    annotate(has_later_comments=Exists(later_comments)).\
    filter(has_later_comments=False)
If your created_at is a Date column (no time), then a strict > is not enough, because more than one comment can be created on the same day, and a plain >= would also match the comment itself. So the query needs an additional predicate on an extra column that orders the comments (like id): WHERE cc.created_at > c.created_at OR (cc.created_at = c.created_at AND cc.id > c.id)
https://docs.djangoproject.com/en/2.2/ref/models/expressions/#exists-subqueries
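The NOT EXISTS approach and the same-day tie-breaker can be checked at the SQL level with a minimal sketch using the standard-library sqlite3 module. The rows are the ones from the question plus one hypothetical duplicate (same student, same day) added to show the tie; the table layout with an integer id column is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, students_id INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO comments (students_id, created_at) VALUES (?, ?)",
    [
        (4, "2019-06-19"), (2, "2019-06-03"), (1, "2019-06-24"),
        (6, "2019-06-04"), (6, "2019-06-19"), (5, "2019-06-05"),
        (4, "2019-07-28"), (6, "2019-06-11"),
        (2, "2019-06-03"),  # hypothetical extra row: same student, same day
    ],
)

# Strict comparison only: same-day duplicates both survive.
strict = """
    SELECT c.students_id, c.created_at FROM comments c
     WHERE NOT EXISTS (
           SELECT 1 FROM comments cc
            WHERE cc.students_id = c.students_id
              AND cc.created_at > c.created_at)
     ORDER BY c.students_id, c.id
"""

# Tie-breaker on id: exactly one row per student.
tie_broken = """
    SELECT c.students_id, c.created_at FROM comments c
     WHERE NOT EXISTS (
           SELECT 1 FROM comments cc
            WHERE cc.students_id = c.students_id
              AND (cc.created_at > c.created_at
                   OR (cc.created_at = c.created_at AND cc.id > c.id)))
     ORDER BY c.students_id
"""

with_ties = conn.execute(strict).fetchall()
latest = conn.execute(tie_broken).fetchall()
print(with_ties)
print(latest)
```

The strict version returns student 2 twice because of the same-day duplicate, while the tie-broken version returns one row per student.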