annotate then filter then annotate - django

The code:
now = datetime.now()
year_ago = now - timedelta(days=365)
category_list = Category.objects.annotate(suma = Sum('operation__value')) \
.filter(operation__date__gte = year_ago) \
.annotate(podsuma = Sum('operation__value'))
The idea: get sum of each category and sum of one year back.
But this code result only filtered objects; suma is equal to podsuma.

A queryset only produces one query, so all annotations are calculated over the same filtered data set. You'll need to do two queries.
Update:
Do something like this:
In models.py
class Category(models.Model):
...
def suma(self):
return ...
def podsuma(self):
return ...
Then remove the annotations and your for loop should work as is. It'll mean a lot more queries, but they'll be simpler, and you can always cache them.

Related

query to loop over records by date in django

I am trying to find a better way to loop over orders for the next seven days including today, what I have already:
unfilled_orders_0 = Model.objects.filter(delivery_on__date=timezone.now() + timezone.timedelta(0))
context['todays_orders'] = unfilld_orders_0.aggregate(field_1_sum=Sum('field_1'), field_2_sum=Sum('field_2'),field_3_sum=Sum('field_3'), field_4_sum=Sum('field_4'),field_5_sum=Sum('field_5'))
I'm wondering if I can somehow avoid having to do this seven times--one for each day. I assume there is a more efficient way to do this.
You can do this with a single ORM / db query, by providing Sum with an extra filter argument:
days_ahead = 7
fields = ["field_1", "field_2", ...]
aggregate_kwargs = {
f"s_{field}_{day}": Sum(field, filter=Q(delivery_on__date=now+timedelta(days=day)))
for field in fields
for day in range(days_ahead)
}
unfilled_orders = Model.objects.filter(delivery_on__date__lt=now+timedelta(days=days_ahead)
context.update(unfilled_orders.aggregate(**aggregate_kwargs))
You can approach it with a for loop and store the data in the context like so
from django.utils import timezone
from django.db.models import Sum
context = {}
for i in range(7):
qs = Model.objects.filter(delivery_on__date=(timezone.now() + timezone.timedelta(i)).date())
context = {}
context[f'orders_{i}'] = qs.aggregate(
field_1_sum=Sum('field_1'),
field_2_sum=Sum('field_2'),
field_3_sum=Sum('field_3'),
field_4_sum=Sum('field_4'),
field_5_sum=Sum('field_5'))
This query will hit 7 times the database, otherwise you can use another approach which will hit the db only once
context = {}
qs = Model.objects.filter(delivery_on__date__lte=timezone.now()-timezone.timedelta(days=7)).order_by('delivery_on')
dates = qs.values('delivery_on__date', flat=True).distinct()
for i in dates:
_qs = qs.filter(create_ts__date=i)
context[f'orders_{i}'] = _qs.aggregate(
field_1_sum=Sum('field_1'),
field_2_sum=Sum('field_2'),
field_3_sum=Sum('field_3'),
field_4_sum=Sum('field_4'),
field_5_sum=Sum('field_5'))
You define how many days backwards the qs will be including all orders then distinct the dates and filter the already filtered qs for the dates.

Django - annotate against a prefetch QuerySet?

Is it possible to annotate/count against a prefetched query?
My initial query below, is based on circuits, then I realised that if a site does not have any circuits I won't have a 'None' Category which would show a site as Down.
conn_data = Circuits.objects.all() \
.values('circuit_type__circuit_type') \
.exclude(active_link=False) \
.annotate(total=Count('circuit_type__circuit_type')) \
.order_by('circuit_type__monitor_priority')
So I changed to querying sites and using prefetch, which now has an empty circuits_set for any site that does not have an active link. Is there a Django way of creating the new totals against that circuits_set within conn_data? I was going to loop through all the sites manually and add the totals that way but wanted to know if there was a way to do this within the QuerySet instead?
my end result should have a something like:
[
{'circuit_type__circuit_type': 'Fibre', 'total': 63},
{'circuit_type__circuit_type': 'DSL', 'total': 29},
{'circuit_type__circuit_type': 'None', 'total': 2}
]
prefetch query:
conn_data = SiteData.objects.prefetch_related(
Prefetch(
'circuits_set',
queryset=Circuits.objects.exclude(
active_link=False).select_related('circuit_type'),
)
)
I don't think this will work. Its debatable whether it should work. Let's refer to what prefetch_related does.
Returns a QuerySet that will automatically retrieve, in a single batch, related objects for each of the specified lookups.
So what happens here is that two queries are dispatched and two lists are realized. These lists are then partitioned in memory and grouped to the correct parent records.
Count() and annotate() are directives to the DBMS that resolve to SQL
Select Count(id) from conn_data
Because of the way annotate and prefetch_related work I think its unlikely they will play nice together. prefetch_related is just a convenience though. From a practical perspective running two separate ORM queries and assigning them to SiteData records yourself is effectively the same thing. So something like ...
#Gets all Circuits counted and grouped by SiteData
Circuits.objects.values('sitedata_id)'.exclude(active_link=False).select_related('circuit_type').annotate(Count('site_data_id'));
Then you just loop over your SiteData records and assign the counts.
Ok I got what I wanted with this, probably a better way of doing it but it works never the less:
from collections import Counter
import operator
class ConnData(object):
def __init__(self, priority='', c_type='', count=0 ):
self.priority = priority
self.c_type = c_type
self.count = count
def __repr__(self):
return '{} {}'.format(self.__class__.__name__, self.c_type)
# get all the site data
conn_data = SiteData.objects.exclude(Q(site_type__site_type='Data Centre') | Q(site_type__site_type='Factory')) \
.prefetch_related(
Prefetch(
'circuits_set',
queryset=Circuits.objects.exclude(active_link=False).select_related('circuit_type'),
)
)
# create a list for the conns
conns = []
# add items to list of dictionaries with all required fields
for conn in conn_data:
try:
conn_type = conn.circuits_set.all()[0].circuit_type.circuit_type
prioritiy = conn.circuits_set.all()[0].circuit_type.monitor_priority
conns.append({'circuit_type' : conn_type, 'priority' : prioritiy})
except:
# create category for down sites
conns.append({'circuit_type' : 'Down', 'priority' : 10})
# crate new list for class data
conn_counts = []
# create counter data
conn_count_data = Counter(((d['circuit_type'], d['priority']) for d in conns))
# loop through counter data and add classes to list
for val, count in conn_count_data.items():
cc = ConnData()
cc.priority = val[1]
cc.c_type = val[0]
cc.count = count
conn_counts.append(cc)
# sort the classes by priority
conn_counts = sorted(conn_counts, key=operator.attrgetter('priority'))

Django, general version of prefetch_related()?

Of course, I don't mean to do what prefetch_related does already.
I'd like to mimic what it does.
What I'd like to do is the following.
I have a list of MyModel instances.
A user can either follows or doesn't follow each instance.
my_models = MyModel.objects.filter(**kwargs)
for my_model in my_models:
my_model.is_following = Follow.objects.filter(user=user, target_id=my_model.id, target_content_type=MY_MODEL_CTYPE)
Here I have n+1 query problem, and I think I can borrow what prefetch_related does here. Description of prefetch_related says, it performs the query for all objects and when the related attribute is required, it gets from the pre-performed queryset.
That's exactly what I'm after, perform query for is_following for all objects that I'm interested in. and use the query instead of N individual query.
One additional aspect is that, I'd like to attach queryset rather than attach the actual value, so that I can defer evaluation until pagination.
If that's too ambiguous statement, I'd like to give the my_models queryset that has is_following information attached, to another function (DRF serializer for instance).
How does prefetch_related accomplish something like above?
A solution where you can get only the is_following bit is possible with a subquery via .extra.
class MyModelQuerySet(models.QuerySet):
def annotate_is_follwing(self, user):
return self.extra(
select = {'is_following': 'EXISTS( \
SELECT `id` FROM `follow` \
WHERE `follow`.`target_id` = `mymodel`.id \
AND `follow`.`user_id` = %s)' % user.id
}
)
class MyModel(models.Model):
objects = MyModelQuerySet.as_manager()
usage:
my_models = MyModel.objects.filter(**kwargs).annotate_is_follwing(request.user)
Now another solution where you can get a whole list of following objects.
Because you have a GFK in the Follow class you need to manually create a reverse relation via GenericRelation. Something like:
class MyModelQuerySet(models.QuerySet):
def with_user_following(self, user):
return self.prefetch_related(
Prefetch(
'following',
queryset=Follow.objects.filter(user=user) \
.select_related('user'),
to_attr='following_user'
)
)
class MyModel(models.Model):
following = GenericRelation(Follow,
content_type_field='target_content_type',
object_id_field='target_id'
related_query_name='mymodels'
)
objects = MyModelQuerySet.as_manager()
def get_first_following_object(self):
if hasattr(self, 'following_user') and len(self.following_user) > 0:
return self.following_user[0]
return None
usage:
my_models = MyModel.objects.filter(**kwargs).with_user_following(request.user)
Now you have access to following_user attribute - a list with all follow objects per mymodel, or you can use a method like get_first_following_object.
Not sure if this is the best approach, and I doubt this is what prefetch_related does because I'm joining here.
I found there's way to select extra columns in your query.
extra_select = """
EXISTS(SELECT * FROM follow_follow
WHERE follow_follow.target_object_id = myapp_mymodel.id AND
follow_follow.target_content_type_id = %s AND
follow_follow.user_id = %s)
"""
qs = self.extra(
select={'is_following': extra_select},
select_params=[CONTENT_TYPE_ID, user.id]
)
So you can do this with join.
prefetch_related way of doing it would be separate queryset and look it up in queryset for the attribute.

Filter models by checking one datetime's month against another

Using Django's ORM, I am trying to find instances of myModel based on two of its datetime variables; specifically, where the months of these two datetimes are not equal. I understand to filter by the value of a modelfield, you can use Django's F( ) expressions, so I thought I'd try something like this:
myModel.objects.filter(fixed_date__month=F('closed_date__month'))
I know this wouldn't find instances where they aren't equal, but I thought it'd be a good first step since I've never used the F expressions before. However, it doesn't work as I thought it should. I expected it to give me a queryset of objects where the value of the fixed_date month was equal to the value of the closed_date month, but instead I get an error:
FieldError: Join on field 'closed_date' not permitted. Did you misspell 'month' for the lookup type?
I'm not sure if what I'm trying to do isn't possible or straightforward with the ORM, or if I'm just making a simple mistake.
It doesn't look like django F objects currently support extracting the month inside a DateTimeField, the error message seems to be stating that the F object is trying to convert the '__' inside the string 'closed_date__month' as a Foreignkey between different objects, which are usually stored as joins inside an sql database.
You could carry out the same query by iterating across the objects:
result = []
for obj in myModel.objects.all():
if obj.fixed_date.month != obj.closed_date.month:
result.append(obj)
or as a list comprehension:
result = [obj for obj in myModel.objects.all() if obj.fixed_date.month != obj.closed_date.month]
Alternatively, if this is not efficient enough, the months for the two dates could be cached as IntegerFields within the model, something like:
class myModel(models.Model):
....other fields....
fixed_date = models.DateTimeField()
closed_date = models.DateTimeField()
fixed_month = models.IntegerField()
closed_month = models.IntegerField()
store the two integers when the relevant dates are updated:
myModel.fixed_month = myModel.fixed_date.month
myModel.save()
Then use an F object to compare the two integer fields:
myModel.objects.filter(fixed_month__ne=F('closed_month'))
The ne modifier will do the not equal test.
Edit - using raw sql
If you are using an sql based database, then most efficient method is to use the .raw() method to manually specify the sql:
myModel.objects.raw('SELECT * FROM stuff_mymodel WHERE MONTH(fixed_date) != MONTH(close_date)')
Where 'stuff_mymodel' is the correct name of the table in the database. This uses the SQL MONTH() function to extract the values from the month fields, and compare their values. It will return a collection of objects.
There is some nay-saying about the django query system, for example: http://charlesleifer.com/blog/shortcomings-in-the-django-orm-and-a-look-at-peewee-a-lightweight-alternative/. This example could be taken as demonstrating another inconsistency in it's query api.
My thinking is this:
class myModel(models.Model):
fixed_date = models.DateTimeField()
closed_date = models.DateTimeField()
def has_diff_months(self):
if self.fixed_date.month != self.closed_date.month:
return True
return False
Then:
[x for x in myModel.objects.all() if x.has_diff_months()]
However, for a truly efficient solution you'd have to use another column. It makes sense to me that it'd be a computed boolean field that is created when you save, like so:
class myModel(models.Model):
fixed_date = models.DateTimeField()
closed_date = models.DateTimeField()
diff_months = models.BooleanField()
#overriding save method
def save(self, *args, **kwargs):
#calculating the value for diff_months
self.diff_months = (self.fixed_date.month != self.closed_date.month)
#aaand... saving:
super(Blog, self).save(*args, **kwargs)
Then filtering would simply be:
myModel.objects.filter(diff_months=True)

Query all rows and return most recent of each duplicate

I have a model that has an id that isn't unique. Each model also has a date. I would like to return all results but only the most recent of each row that shares ids. The model looks something like this:
class MyModel(models.Model):
my_id = models.PositiveIntegerField()
date = models.DateTimeField()
title = models.CharField(max_length=36)
## Add some entries
m1 = MyModel(my_id=1, date=yesterday, title='stop')
m1.save()
m2 = MyModel(my_id=1, date=today, title='go')
m2.save()
m3 = MyModel(my_id=2, date=today, title='hello')
m3.save()
Now try to retrieve these results:
MyModel.objects.all()... # then limit duplicate my_id's by most recent
Results should be only m2 and m3
You won't be able to do this with just the ORM, you'll need to get all the records, and then discard the duplicates in Python.
For example:
objs = MyModel.objects.all().order_by("-date")
seen = set()
keep = []
for o in objs:
if o.id not in seen:
keep.append(o)
seen.add(o.id)
Here's some custom SQL that can get what you want from the database:
select * from mymodel where (id, date) in (select id, max(date) from mymodel group by id)
You should be able to adapt this to use in the ORM.
You should also look into abstracting the logic above into a manager:
http://docs.djangoproject.com/en/dev/topics/db/managers/
That way you can call something like MyModel.objects.no_dupes() where you would define no_dupes() in a manager and do the logic Ned laid out in there.
Your models.py would now look like this:
class MyModelManager(models.Manager):
def no_dupes:
objs = MyModel.objects.all().order_by("-date")
seen = set()
keep = []
for o in objs:
if o.id not in seen:
keep.append(o)
seen.add(o.id)
return keep
class MyModel(models.Model):
my_id = models.PositiveIntegerField()
date = models.DateTimeField()
title = models.CharField(max_length=36)
objects = MyModelManager()
With the above code in place, you can call: MyModel.objects.no_dupes(), this should give your desired result. Looks like you can even override the all() function as well if you would want that instead:
http://docs.djangoproject.com/en/1.2/topics/db/managers/#modifying-initial-manager-querysets
I find the manager to be a better solution in case you will need to use this in more than one view across the project, this way you don't have to rewrite the code X number of times.
As Ned says, I don't know of a way to do this with the ORM. But you might be able to use the db to restrict the amount of work you have to do in the for loop in python.
The idea is to use Django's annotate (which is basically running group_by) to find all the instances that have more than one row with the same my_id and process them as Ned suggests. Then for the remainder (which have no duplicates), you can just grab the individual rows.
from django.db.models import Count, Q
annotated_qs = MyModel.objects.annotate(num_my_ids=Count('my_id')).order_by('-date')
dupes = annotated_qs.filter(num_my_ids__gt=1)
uniques = annotated_qs.filter(num_my_ids__lte=1)
for dupe in dupes:
... # just keep the most recent, as Ned describes
keep_ids = [keep.id for keep in keeps]
latests = MyModel.objects.filter(Q(id__in=keep_ids) | Q(id__in=uniques))
If you only have a small number of dupes, this will mean that your for loop is much shorter, at the expense of an extra query (to get the dupes).