Django ORM queryset equivalent to group by year-month? - django

I have an Django app and need some datavisualization and I am blocked with ORM.
I have a models Orders with a field created_at and I want to present data with a diagram bar (number / year-month) in a dashboard template.
So I need to aggregate/annotate data from my model but did find a complete solution.
I find partial answer with TruncMonth and read about serializers but wonder if there is a simpliest solution with Django ORM possibilities...
In Postgresql it would be:
SELECT date_trunc('month',created_at), count(order_id) FROM "Orders" GROUP BY date_trunc('month',created_at) ORDER BY date_trunc('month',created_at);
"2021-01-01 00:00:00+01" "2"
"2021-02-01 00:00:00+01" "3"
"2021-03-01 00:00:00+01" "3"
...
example
1 "2021-01-04 07:42:03+01"
2 "2021-01-24 13:59:44+01"
3 "2021-02-06 03:29:11+01"
4 "2021-02-06 08:21:15+01"
5 "2021-02-13 10:38:36+01"
6 "2021-03-01 12:52:22+01"
7 "2021-03-06 08:04:28+01"
8 "2021-03-11 16:58:56+01"
9 "2022-03-25 21:40:10+01"
10 "2022-04-04 02:12:29+02"
11 "2022-04-13 08:24:23+02"
12 "2022-05-08 06:48:25+02"
13 "2022-05-19 15:40:12+02"
14 "2022-06-01 11:29:36+02"
15 "2022-06-05 02:15:05+02"
16 "2022-06-05 03:08:22+02"
expected result
[
{
"year-month": "2021-01",
"number" : 2
},
{
"year-month": "2021-03",
"number" : 3
},
{
"year-month": "2021-03",
"number" : 3
},
{
"year-month": "2021-03",
"number" : 1
},
{
"year-month": "2021-04",
"number" : 2
},
{
"year-month": "2021-05",
"number" : 3
},
{
"year-month": "2021-06",
"number" : 3
},
]
I have done this but I am not able to order by date:
Orders.objects.annotate(month=TruncMonth('created_at')).values('month').annotate(number=Count('order_id')).values('month', 'number').order_by()
<SafeDeleteQueryset [
{'month': datetime.datetime(2022, 3, 1, 0, 0, tzinfo=<UTC>), 'number': 4},
{'month': datetime.datetime(2022, 6, 1, 0, 0, tzinfo=<UTC>), 'number': 2},
{'month': datetime.datetime(2022, 5, 1, 0, 0, tzinfo=<UTC>), 'number': 1},
{'month': datetime.datetime(2022, 1, 1, 0, 0, tzinfo=<UTC>), 'number': 5},
{'month': datetime.datetime(2021, 12, 1, 0, 0, tzinfo=<UTC>), 'number': 1},
{'month': datetime.datetime(2022, 7, 1, 0, 0, tzinfo=<UTC>), 'number': 1},
{'month': datetime.datetime(2021, 9, 1, 0, 0, tzinfo=<UTC>), 'number': 2},
'...(remaining elements truncated)...'
]>

Try adding the order_by on the original field if you have multi-year data.
from django.db.models import Sum
from django.db.models.functions import TruncMonth
Orders.objects.values(month=TruncMonth('created_at')).
order_by("created_at").annotate(Sum('number')

Related

How add 0 when TruncWeek's week no result in Django Query?

I want query the issue's count of group by weekly.
query1 = MyModel.object.filter(issue_creator__in=group.user_set.all()).\
annotate(week=TruncWeek('issue_creat_date')).values('week').annotate(count=Count('id')).order_by('week'))
the query result is OK. the queryset result:
[
{'week': datetime.datetime(2022, 1, 3, 0, 0, tzinfo=<DstTzInfo 'Europe/Stockholm' CEST+2:00:00 DST>), 'count': 9},
{'week': datetime.datetime(2022, 1, 10, 0, 0, tzinfo=<DstTzInfo 'Europe/Stockholm' CEST+2:00:00 DST>), 'count': 12},
{'week': datetime.datetime(2022, 1, 17, 0, 0, tzinfo=<DstTzInfo 'Europe/Stockholm' CEST+2:00:00 DST>), 'count': 10},
{'week': datetime.datetime(2022, 2, 7, 0, 0, tzinfo=<DstTzInfo 'Europe/Stockholm' CEST+2:00:00 DST>), 'count': 1},
{'week': datetime.datetime(2022, 2, 14, 0, 0, tzinfo=<DstTzInfo 'Europe/Stockholm' CEST+2:00:00 DST>), 'count': 6},
{'week': datetime.datetime(2022, 2, 21, 0, 0, tzinfo=<DstTzInfo 'Europe/Stockholm' CEST+2:00:00 DST>), 'count': 11},
{'week': datetime.datetime(2022, 2, 28, 0, 0, tzinfo=<DstTzInfo 'Europe/Stockholm' CEST+2:00:00 DST>), 'count': 1}
]
but 20220101-20220301 has 9 weeks:
[
datetime.datetime(2022, 1, 3, 0, 0, tzinfo=<DstTzInfo 'Europe/Stockholm' CEST+2:00:00 DST>),
datetime.datetime(2022, 1, 10, 0, 0, tzinfo=<DstTzInfo 'Europe/Stockholm' CEST+2:00:00 DST>),
datetime.datetime(2022, 1, 17, 0, 0, tzinfo=<DstTzInfo 'Europe/Stockholm' CEST+2:00:00 DST>),
datetime.datetime(2022, 1, 24, 0, 0, tzinfo=<DstTzInfo 'Europe/Stockholm' CEST+2:00:00 DST>),
datetime.datetime(2022, 1, 31, 0, 0, tzinfo=<DstTzInfo 'Europe/Stockholm' CEST+2:00:00 DST>),
datetime.datetime(2022, 2, 7, 0, 0, tzinfo=<DstTzInfo 'Europe/Stockholm' CEST+2:00:00 DST>),
datetime.datetime(2022, 2, 14, 0, 0, tzinfo=<DstTzInfo 'Europe/Stockholm' CEST+2:00:00 DST>),
datetime.datetime(2022, 2, 21, 0, 0, tzinfo=<DstTzInfo 'Europe/Stockholm' CEST+2:00:00 DST>),
datetime.datetime(2022, 2, 28, 0, 0, tzinfo=<DstTzInfo 'Europe/Stockholm' CEST+2:00:00 DST>)
]
I want add zero when that week no result as this result:
[
{'week': datetime.datetime(2022, 1, 3, 0, 0, tzinfo=<DstTzInfo 'Europe/Stockholm' CEST+2:00:00 DST>), 'count': 9},
{'week': datetime.datetime(2022, 1, 10, 0, 0, tzinfo=<DstTzInfo 'Europe/Stockholm' CEST+2:00:00 DST>), 'count': 12},
{'week': datetime.datetime(2022, 1, 17, 0, 0, tzinfo=<DstTzInfo 'Europe/Stockholm' CEST+2:00:00 DST>), 'count': 10},
{'week': datetime.datetime(2022, 1, 24, 0, 0, tzinfo=<DstTzInfo 'Europe/Stockholm' CEST+2:00:00 DST>), 'count': 0},
{'week': datetime.datetime(2022, 1, 31, 0, 0, tzinfo=<DstTzInfo 'Europe/Stockholm' CEST+2:00:00 DST>), 'count': 0},
{'week': datetime.datetime(2022, 2, 7, 0, 0, tzinfo=<DstTzInfo 'Europe/Stockholm' CEST+2:00:00 DST>), 'count': 1},
{'week': datetime.datetime(2022, 2, 14, 0, 0, tzinfo=<DstTzInfo 'Europe/Stockholm' CEST+2:00:00 DST>), 'count': 6},
{'week': datetime.datetime(2022, 2, 21, 0, 0, tzinfo=<DstTzInfo 'Europe/Stockholm' CEST+2:00:00 DST>), 'count': 11},
{'week': datetime.datetime(2022, 2, 28, 0, 0, tzinfo=<DstTzInfo 'Europe/Stockholm' CEST+2:00:00 DST>), 'count': 1}
]
how to write the right queryset?
Thanks.
Django method for None value Coalesce.
from django.db.models.functions import Coalesce
query1 = MyModel.object.filter(issue_creator__in=group.user_set.all()).\
annotate(week=TruncWeek('issue_creat_date')).values('week').annotate(count=Count('id')).order_by('week'))

Merging a list of dictionaries

I have a list of dicts as below.
list = [ {id: 1, s_id:2, class: 'a', teacher: 'b'} ]
list1 = [ {id: 1, c_id:1, rank:2, area: 34}, {id:1, c_id:2, rank:1, area: 21} ]
I want to merge the two lists on the common key-value pairs (in this case 'id:1')
Merged_list = [ {id:1, s_id:2, class: 'a', teacher: 'b', list1: {c_id:1, rank: 2, area: 34}, {c_id:2, rank: 1, area: 21} ]
How do I go about this?
Thanks
You can use
merged_list = [{**d1, **d2} for d1, d2 in zip(list1, list2)]
>>> merged_list
[{'id': 1, 's_id': 2, 'class': 'a', 'teacher': 'b', 'rank': 2, 'area': 34},
{'id': 2, 's_id': 3, 'class': 'c', 'teacher': 'd', 'rank': 1, 'area': 21}]
where {**d1, **d2} is just a neat way to combine 2 dictionaries. Keep in mind this will replace the duplicate keys of the first dictionary. If you're on Python 3.9, you could use d1 | d2.
EDIT: For the edit in your question, you can try this horrible one liner (keep in mind this will create the pair list1: [] if no matching indeces were found on list1):
list_ = [{"id": 1, "s_id": 2, "class": 'a', "teacher": 'b'}]
list1 = [{"id": 1, "c_id": 1, "rank": 2, "area": 34}, {"id": 1, "c_id": 2, "rank": 1, "area": 21}]
merged_list = [{**d, **{"list1": [{k: v for k, v in d1.items() if k != "id"} for d1 in list1 if d1["id"] == d["id"]]}} for d in list_]
>>> merged_list
[{'id': 1,
's_id': 2,
'class': 'a',
'teacher': 'b',
'list1': [{'c_id': 1, 'rank': 2, 'area': 34},
{'c_id': 2, 'rank': 1, 'area': 21}]}]
This is equivalent to (with some added benefits):
merged_list = []
for d in list_:
matched_ids = []
for d1 in list1:
if d["id"] == d1["id"]:
d1.pop("id") # remove id from dictionary before appending
matched_ids.append(d1)
if matched_ids: # added benefit of not showing the 'list1' key if matched_ids is empty
found = {"list1": matched_ids}
else:
found = {}
merged_list.append({**d, **found})
Try this
And don't forget to put " " when declare string
list = [ {"id": 1, "s_id": 2 ," class": 'a', "teacher": 'b'}, {"id": 2, "s_id" : 3, "class" : 'c', "teacher": 'd'} ]
list1 = [ {"id": 1, "rank" :2, "area" : 34}, {"id" :2, "rank" :1, "area": 21} ]
list2 = list1 + list
print(list2)

How to exclude items with identical field if the datefield is bigger than in others duplicates?

So I have a Comments model and by querying
comments = Comments.objects.values('students_id', 'created_at')
I get this output
<QuerySet [
{'students_id': 4, 'created_at': datetime.date(2019, 6, 19)}, {'students_id': 2, 'created_at': datetime.date(2019, 6, 3)}, {'students_id': 1, 'created_at': datetime.date(2019, 6, 24)}, {'students_id': 6, 'created_at': datetime.date(2019, 6, 4)}, {'students_id': 6, 'created_at': datetime.date(2019, 6, 19)}, {'students_id': 5, 'created_at': datetime.date(2019, 6, 5)}, {'students_id': 4, 'created_at': datetime.date(2019, 7, 28)}, {'students_id': 6, 'created_at': datetime.date(2019, 6, 11)}]>
It's three comments by student with id=6 and two comments by student with id=4.
What I need to get is only one latest comment from every student. In this example it'll look like this:
<QuerySet [
{'students_id': 2, 'created_at': datetime.date(2019, 6, 3)}, {'students_id': 1, 'created_at': datetime.date(2019, 6, 24)}, {'students_id': 6, 'created_at': datetime.date(2019, 6, 19)}, {'students_id': 5, 'created_at': datetime.date(2019, 6, 5)}, {'students_id': 4, 'created_at': datetime.date(2019, 7, 28)},]>
Thanks in advance for the answer!
You can use annotate and max to get desired result like this Comments.objects.values('students_id').annotate(Max('created_at'))
and the output will be like this <QuerySet [
{'students_id': 2, 'created_at__max': datetime.date(2019, 6, 3)}, {'students_id': 1, 'created_at__max': datetime.date(2019, 6, 24)},]> which will have students_id and latest created_at. To use this you have to import Max from django.db.models like this from django.db.models import Max
use this code :
queryset=Comments.objects.values('students_id', 'created_at').group_by('students_id').annotate(Latest_created_at=Max('created_at'))
queryset.delete()
In raw SQL it would be ... WHERE NOT EXISTS(SELECT * FROM Comments cc WHERE cc.student_id = c.student_id AND cc.created_at > c.created_at)
later_comments = Comments.objects.filter(student_id=OuterRef('student_id'),
created_at__gt=OuterRef('created_at'), ).values('created_at', )
latest_comments = Comments.objects.\
annotate(has_later_comments=Exists(later_comments), ).\
filter(has_later_comments=False, )
If your created_at is a Date column (no time), then you need to use => instead of > because perhaps more than one comment can be created during a day. So the query would contain additional predicate with extra column for ordering comments (like id): WHERE cc.created_at > c.created_at OR cc.created_at = c.created_at AND cc.id > c.id
https://docs.djangoproject.com/en/2.2/ref/models/expressions/#exists-subqueries

Access Pandas MultiIndex column by name

I have a spreadsheet imported with pandas like this:
df = pd.read_excel('my_spreadsheet.xlsx',header = [0,1],index_col=0,sheetname='Sheet1')
The output of df.columns is:
MultiIndex(levels=[[u'MR 1', u'MR 10', u'MR 11', u'MR 12', u'MR 13', u'MR 14', u'MR 15', u'MR 16', u'MR 17', u'MR 18', u'MR 19', u'MR 2', u'MR 20', u'MR 21', u'MR 22', u'MR 3', u'MR 4', u'MR 5', u'MR 6', u'MR 7', u'MR 8', u'MR 9'], [u'BIRADS', u'ExamDesc', u'completedDTTM']],
labels=[[0, 0, 0, 11, 11, 11, 15, 15, 15, 16, 16, 16, 17, 17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 20, 21, 21, 21, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 12, 12, 12, 13, 13, 13, 14, 14, 14], [1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0]],
names=[None, u'De-Identified MRN'])
I have been trying to access the values of column named 'De-Identified MRN', but can't seem to find the way to do this.
What I have tried (based on similar posts):
[in] df.index.get_level_values('De-Identified MRN')
[out] KeyError: 'Level De-Identified MRN must be same as name (None)'
and
[in] df.index.unique(level='De-Identified MRN')
[out] KeyError: 'Level De-Identified MRN must be same as name (None)'
UPDATE:
The following did the trick for some reason. I really do not understand the format of the MultiIndex Pandas Dataframe:
pd.Series(df.index)
By using your data
s="MultiIndex(levels=[[u'MR 1', u'MR 10', u'MR 11', u'MR 12', u'MR 13', u'MR 14', u'MR 15', u'MR 16', u'MR 17', u'MR 18', u'MR 19', u'MR 2', u'MR 20', u'MR 21', u'MR 22', u'MR 3', u'MR 4', u'MR 5', u'MR 6', u'MR 7', u'MR 8', u'MR 9'], [u'BIRADS', u'ExamDesc', u'completedDTTM']],labels=[[0, 0, 0, 11, 11, 11, 15, 15, 15, 16, 16, 16, 17, 17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 20, 21, 21, 21, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 12, 12, 12, 13, 13, 13, 14, 14, 14], [1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0]],names=[None, u'De-Identified MRN'])"
idx=eval(s, {}, {'MultiIndex': pd.MultiIndex})
df=pd.DataFrame(index=idx)
df.index.get_level_values(level=1) # df.index.get_level_values('De-Identified MRN')
Out[336]:
Index(['ExamDesc', 'completedDTTM', 'BIRADS', 'ExamDesc', 'completedDTTM',
'BIRADS', 'ExamDesc', 'completedDTTM', 'BIRADS', 'ExamDesc',...
Also if all above still does not work , try
df.reset_index()['De-Identified MRN']
Try the following:
midx = pd.MultiIndex(
levels=[[u'MR 1', u'MR 10', u'MR 11', u'MR 12', u'MR 13', u'MR 14', u'MR 15', u'MR 16', u'MR 17', u'MR 18', u'MR 19', u'MR 2', u'MR 20', u'MR 21', u'MR 22', u'MR 3', u'MR 4', u'MR 5', u'MR 6', u'MR 7', u'MR 8', u'MR 9'], [u'BIRADS', u'ExamDesc', u'completedDTTM']],
labels=[[0, 0, 0, 11, 11, 11, 15, 15, 15, 16, 16, 16, 17, 17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 20, 21, 21, 21, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 12, 12, 12, 13, 13, 13, 14, 14, 14], [1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0]],
names=[None, u'De-Identified MRN']
)
midx.levels[1] # returns the following
Index(['BIRADS', 'ExamDesc', 'completedDTTM'], dtype='object', name='De-Identified MRN')
midx.levels[1].values # returns the following
array(['BIRADS', 'ExamDesc', 'completedDTTM'], dtype=object)

Django count number of records per day

I'm using Django 2.0
I am preparing data to show on a graph in template. I want to fetch number of records per day.
This is what I'm doing
qs = self.get_queryset().\
extra({'date_created': "date(created)"}).\
values('date_created').\
annotate(item_count=Count('id'))
but, the output given is
[
{'date_created': datetime.date(2018, 5, 24), 'item_count': 1},
{'date_created': datetime.date(2018, 5, 24), 'item_count': 1},
{'date_created': datetime.date(2018, 5, 24), 'item_count': 1},
{'date_created': datetime.date(2018, 5, 24), 'item_count': 1},
{'date_created': datetime.date(2018, 5, 24), 'item_count': 1},
{'date_created': datetime.date(2018, 5, 24), 'item_count': 1},
{'date_created': datetime.date(2018, 5, 24), 'item_count': 1}
]
Here data is not grouped and same date is returning repeatedly with count as 1
Try using TruncDate function.
See that answer