Model:
class Foo(models.model):
name = models.CharField(max_length = 50, blank = True, unique = True)
class Bar1(models.Model):
foo = models.ForeignKey('Foo')
value = models.DecimalField(max_digits=10,decimal_places=2)
class Bar2(models.Model):
foo = models.ForeignKey('Foo')
value = models.DecimalField(max_digits=10,decimal_places=2)
Clasess Bar1 and Bar2 are unrelated, so I can't do it as one class what would solve the problem. But this is only example to show the problem as pure as possible.
first = Foo.objects.all().annotate(Sum("bar1__value"))
second = Foo.objects.all().annotate(Sum("bar2__value"))
each of this querysets contains correct values.
I can't merge it into:
both = Foo.objects.all().annotate(Sum("bar1__value")).annotate(Sum("bar2__value"))
Because the sum value multiplicates - this is unfortunately expected behaviour - because of JOINS
And now the problem - how to merge/join first and second to get the both?
Example:
Bar 1:
foo | value
--------------
A | 10
B | 20
B | 20
Bar 2:
foo | value
--------------
A | -0.10
A | -0.10
B | -0.25
both (value differs depends on order of entering bar1 and bar2)
foo | bar1__value__sum | bar2__value__sum
---------------------------------
A | 20 | -0.20
B | 40 | -0.50
expected result:
foo | bar1__value__sum | bar2__value__sum
---------------------------------
A | 10 | -0.20
B | 40 | -0.25
I couldn't use itertools.chains because the result is:
foo | bar1__value__sum | bar2__value__sum
---------------------------------
A | null | -0.20
B | null | -0.25
A | 10 | null
B | 40 | null
Your problem is a known limitation of Django's ORM: https://code.djangoproject.com/ticket/10060.
If you're ok with doing two queries, here's one option:
result = Foo.objects.annotate(b1_sum=Sum("bar1__value"))
bar2_sums = Foo.objects.annotate(b2_sum=Sum("bar2__value")).in_bulk()
for foo in result:
foo.b2_sum = bar2_sums.get(foo.pk).b2_sum
According to answer of #emulbreh i read the ticket and found some solution. I go this way and made this:
models.py:
from django.db.models.expressions import RawSQL
from django.db.models.query import QuerySet
(...)
class NewManager(models.Manager):
"""A re-usable Manager to access a custom QuerySet"""
def __getattr__(self, attr, *args):
try:
return getattr(self.__class__, attr, *args)
except AttributeError:
# don't delegate internal methods to the queryset
if attr.startswith('__') and attr.endswith('__'):
raise
return getattr(self.get_query_set(), attr, *args)
def get_query_set(self):
return self.model.QuerySet(self.model, using=self._db)
class Foo(models.Model):
name = models.CharField(max_length = 50, blank = True, unique = True)
objects =NewManager()
def __str__(self):
return self.name
class QuerySet(QuerySet):
def annotate_sum(self, modelClass, field_name):
annotation_name="%s__%s__%s" % (modelClass._meta.model_name,field_name,'sum')
raw_query = "SELECT SUM({field}) FROM {model2} WHERE {model2}.{model3}_id = {model1}.id".format(
field = field_name,
model3 = self.model._meta.model_name,
model2 = modelClass._meta.db_table,
model1 = self.model._meta.db_table
)
debug.debug("%s" % raw_query)
annotation = {annotation_name: RawSQL(raw_query, [])}
return self.annotate(**annotation)
And views.py:
both = Foo.objects.annotate_sum(Bar1, 'value').annotate_sum( Bar2, 'value')
the sql result is exact what I want:
SELECT "app_foo"."id", "app_foo"."name", (SELECT SUM(value) FROM app_bar1 WHERE app_bar1.foo_id = app_foo.id) AS "bar1__value__sum", (SELECT SUM(value) FROM app_bar2 WHERE app_bar2.foo_id = app_foo.id) AS "bar2__value__sum" FROM "app_foo"
Of course it isn't perfect - it needs some error checking (e.g. double quotes) or aliases, but i think this is the right direction
I landed on this page after having a similar problem, but with Count instead of Sum.
The simplest solution is to use Count(<field>, distinct=True) on the 2nd Count, i.e.
both = Foo.objects.all().annotate(Count("bar1__value")
).annotate(Count("bar2__value", distinct=True))
References:
ticket 10060/comment:60 linked by #emulbreh answer
django 2.0 docs / Aggregation # Combining multiple aggregations
Related
I have three models with a simple relation as below:
models.py
class Person(models.Model):
first_name = models.CharField(max_length=20)
last_name = models.CharField(max_length=20)
class PersonSession(models.Model):
start_time = models.DateTimeField(auto_now_add=True)
end_time = models.DateTimeField(null=True,
blank=True)
person = models.ForeignKey(Person, related_name='sessions')
class Billing(models.Model):
DEBT = 'DE'
BALANCED = 'BA'
CREDIT = 'CR'
session = models.OneToOneField(PersonSession,
blank=False,
null=False,
related_name='billing')
STATUS = ((BALANCED, 'Balanced'),
(DEBT, 'Debt'),
(CREDIT, 'Credit'))
status = models.CharField(max_length=2,
choices=STATUS,
blank=False,
default=BALANCED
)
views.py
class PersonFilter(django_filters.FilterSet):
start_time = django_filters.DateFromToRangeFilter(name='sessions__start_time',
distinct=True)
billing_status = django_filters.ChoiceFilter(name='sessions__billing__status',
choices=Billing.STATUS,
distinct=True)
class Meta:
model = Person
fields = ('first_name', 'last_name')
class PersonList(generics.ListCreateAPIView):
queryset = Person.objects.all()
serializer_class = PersonSerializer
filter_backends = (django_filters.rest_framework.DjangoFilterBackend)
filter_class = PersonFilter
I want to get billings from person endpoint which have DE status in billing and are between a period of time:
api/persons?start_time_0=2018-03-20&start_time_1=2018-03-23&billing_status=DE
But the result is not what I were looking for, this returns all persons has a session in that period and has a billing with the DE status, whether that billing is on the period or not.
In other words, it seems use or operation between two filter fields, I think this post is related to this issue but currently I could not find a way to get the result I want. I am using djang 1.10.3.
Edit
I try to write an example to show what I need and what I get from django filter. If I get persons using below query in the example, I got just two person:
select *
from
test_filter_person join test_filter_personsession on test_filter_person.id=test_filter_personsession.person_id join test_filter_billing on test_filter_personsession.id=test_filter_billing.session_id
where
start_time > '2000-02-01' and start_time < '2000-03-01' and status='DE';
Which gets me just person 1 and 2. But if I get somethings expected similar from url I would get all of persons, the similar url (at least one which I expected to be the same) is as below:
http://address/persons?start_time_0=2000-02-01&start_time_1=2000-03-01&billing_status=DE
Edit2
This is the data that my queries in the example are upon and using them you can see what must returns in queries that I mentioned above:
id | first_name | last_name | id | start_time | end_time | person_id | id | status | session_id
----+------------+-----------+----+---------------------------+---------------------------+-----------+----+--------+------------
0 | person | 0 | 0 | 2000-01-01 16:32:00+03:30 | 2000-01-01 17:32:00+03:30 | 0 | 0 | DE | 0
0 | person | 0 | 1 | 2000-02-01 16:32:00+03:30 | 2000-02-01 17:32:00+03:30 | 0 | 1 | BA | 1
0 | person | 0 | 2 | 2000-03-01 16:32:00+03:30 | 2000-03-01 17:32:00+03:30 | 0 | 2 | DE | 2
1 | person | 1 | 3 | 2000-01-01 16:32:00+03:30 | 2000-01-01 17:32:00+03:30 | 1 | 3 | BA | 3
1 | person | 1 | 4 | 2000-02-01 16:32:00+03:30 | 2000-02-01 17:32:00+03:30 | 1 | 4 | DE | 4
1 | person | 1 | 5 | 2000-03-01 16:32:00+03:30 | 2000-03-01 17:32:00+03:30 | 1 | 5 | DE | 5
2 | person | 2 | 6 | 2000-01-01 16:32:00+03:30 | 2000-01-01 17:32:00+03:30 | 2 | 6 | DE | 6
2 | person | 2 | 7 | 2000-02-01 16:32:00+03:30 | 2000-02-01 17:32:00+03:30 | 2 | 7 | DE | 7
2 | person | 2 | 8 | 2000-03-01 16:32:00+03:30 | 2000-03-01 17:32:00+03:30 | 2 | 8 | BA | 8
Edit3
I try using prefetch_related to join tables and get results as I expected because I thought that extra join causes this problem but this did not work and I still get the same result and this had not any effects.
Edit4
This issue has the same problem.
I don't have a solution yet; but I thought a concise summary of the problem will set more and better minds than mine at work!
From what I understand; your core issue is a result of two pre-conditions:
The fact that you have two discrete filters defined on a related model; resulting in filter spanning-multi-valued-relationships
The way FilterSet implements filtering
Let us look at these in more detail:
filter spanning-multi-valued-relationships
This is a great resource to understand issue pre-condition #1 better:
https://docs.djangoproject.com/en/2.0/topics/db/queries/#spanning-multi-valued-relationships
Essentially, the start_time filter adds a .filter(sessions__start_time=value) to your Queryset, and the billing_status filter adds a .filter(sessions_billing_status=value) to the filter. This results in the "spanning-multi-valued-relationships" issue described above, meaning it will do an OR between these filters instead of an AND as you require it to.
This got me thinking, why don't we see the same issue in the start_time filter; but the trick here is that it is defined as a DateFromToRangeFilter; it internally uses a single filter query with the __range= construct. If instead it did sessions__start_time__gt= and sessions__start_time__lt=, we would have the same issue here.
The way FilterSet implements filtering
Talk is cheap; show me the code
#property
def qs(self):
if not hasattr(self, '_qs'):
if not self.is_bound:
self._qs = self.queryset.all()
return self._qs
if not self.form.is_valid():
if self.strict == STRICTNESS.RAISE_VALIDATION_ERROR:
raise forms.ValidationError(self.form.errors)
elif self.strict == STRICTNESS.RETURN_NO_RESULTS:
self._qs = self.queryset.none()
return self._qs
# else STRICTNESS.IGNORE... ignoring
# start with all the results and filter from there
qs = self.queryset.all()
for name, filter_ in six.iteritems(self.filters):
value = self.form.cleaned_data.get(name)
if value is not None: # valid & clean data
qs = filter_.filter(qs, value)
self._qs = qs
return self._qs
As you can see, the qs property is resolved by iterating over a list of Filter objects, passing the initial qs through each of them successively and returning the result. See qs = filter_.filter(qs, value)
Each Filter object here defines a specific def filter operation, that basically takes teh Queryset and then adds a successive .filter to it.
Here's an example from the BaseFilter class
def filter(self, qs, value):
if isinstance(value, Lookup):
lookup = six.text_type(value.lookup_type)
value = value.value
else:
lookup = self.lookup_expr
if value in EMPTY_VALUES:
return qs
if self.distinct:
qs = qs.distinct()
qs = self.get_method(qs)(**{'%s__%s' % (self.name, lookup): value})
return qs
The line of code that matters is: qs = self.get_method(qs)(**{'%s__%s' % (self.name, lookup): value})
So the two pre-conditions create the perfect storm for this issue.
This worked for me:
class FooFilterSet(FilterSet):
def filter_queryset(self, queryset):
"""
Overrides the basic methtod, so that instead of iterating over tthe queryset with multiple `.filter()`
calls, one for each filter, it accumulates the lookup expressions and applies them all in a single
`.filter()` call - to filter with an explicit "AND" in many to many relationships.
"""
filter_kwargs = {}
for name, value in self.form.cleaned_data.items():
if value not in EMPTY_VALUES:
lookup = '%s__%s' % (self.filters[name].field_name, self.filters[name].lookup_expr)
filter_kwargs.update({lookup:value})
queryset = queryset.filter(**filter_kwargs)
assert isinstance(queryset, models.QuerySet), \
"Expected '%s.%s' to return a QuerySet, but got a %s instead." \
% (type(self).__name__, name, type(queryset).__name__)
return queryset
Overriding the filter_queryset method so that it accumulates the expressions and applies them in a single .filter() call
This is my dictionary
{u'krishna': [u'vijayawada', u'gudivada', u'avanigada']}
I want to iterate items and save in the database,my Models is
class Example(models.Model):
district = models.CharField(max_length=50,**optional)
taluk = models.CharField(max_length=20,**optional)
It should save as:
-----------------------------
|district | taluk |
|-----------|--------------- |
|krishna | vijayawada |
|-----------|----------------|
|krishna | gudivada |
|----------------------------|
|krishna | avanigada |
------------------------------
You can do something like this:
form models import Example
places = {u'krishna': [u'vijayawada', u'gudivada', u'avanigada']}
for district in places:
for taluk in district:
e = Example(district=district, taluk=taluk)
e.save()
for key in dict:
for value in dict[key]:
example = Example()
example.district = key
example.taluk = value
example.save()
This will work for you:-
for districtName in places.keys():
for talukName in places[districtName]:
print districtName,talukName #Try to print it
addData = Example.objects.create(district=districtName,taluk=talukName)
addData.save()
I have a query that for some reason is not returning distinct values even though I have specified distinct, I thought it may be because of the only, so I removed that, but the list is still the same
circuit_providers = CircuitInfoData.objects.only('provider').values('provider').distinct()
I just want a list of unqiue providers
model.py
from __future__ import unicode_literals
from django.db import models
import string
import random
import time
import os
# Create your models here.
from service.models import ServiceContacts
def site_photos_path(instance, filename):
file ,extension = os.path.splitext(filename)
# file will be uploaded to MEDIA_ROOT/user_<id>/<filename>
chars=string.ascii_uppercase + string.digits
random_string = ''.join(random.choice(chars) for _ in range(6))
filename = '%s-%s%s' % (random_string,time.strftime("%d-%m-%H-%M-%S"),extension)
return 'site_photos/{0}'.format(filename)
def service_upload_path(instance, filename):
file ,extension = os.path.splitext(filename)
# file will be uploaded to MEDIA_ROOT/user_<id>/<filename>
chars=string.ascii_uppercase + string.digits
random_string = ''.join(random.choice(chars) for _ in range(6))
filename = '%s-%s%s' % (random_string,time.strftime("%d-%m-%H-%M-%S"),extension)
return 'service_files/{0}'.format(filename)
def site_files_path(instance, filename):
file ,extension = os.path.splitext(filename)
# file will be uploaded to MEDIA_ROOT/user_<id>/<filename>
chars=string.ascii_uppercase + string.digits
random_string = ''.join(random.choice(chars) for _ in range(6))
filename = '%s-%s%s' % (random_string,time.strftime("%d-%m-%H-%M-%S"),extension)
return 'site_files/{0}'.format(filename)
provider_choices = (
('KCOM','KCOM'),
('BT','BT'),
('EE','EE'),
('THREE','THREE'),
)
circuit_choices = (
('DSL','DSL'),
('VDSL','VDSL'),
('MPLS','MPLS'),
('4G','4G'),
('Internet Leased Line','Internet Leased Line'),
)
subnet_mask_choices = (
('/16','/16'),
('/24','/24'),
('/25','/25'),
('/26','/26'),
('/27','/27'),
('/28','/28'),
('/29','/29'),
('/30','/30'),
('/31','/31'),
)
class ShowroomConfigData(models.Model):
location = models.CharField(max_length=50)
subnet = models.GenericIPAddressField(protocol='IPv4')
r1_loopback_ip = models.GenericIPAddressField(protocol='IPv4',verbose_name="R1 Loopback IP")
r2_loopback_ip = models.GenericIPAddressField(protocol='IPv4',verbose_name="R2 Loopback IP")
opening_date = models.DateField(verbose_name="Showroom opening date")
last_hw_refresh_date = models.DateField(verbose_name="Date of latest hardware refresh")
is_showroom = models.BooleanField(default=True,verbose_name="Is this site a showroom?")
class Meta:
verbose_name = "Showroom Data"
verbose_name_plural = "Showroom Data"
ordering = ('location',)
def __unicode__(self):
return self.location
class MajorSiteInfoData(models.Model):
location = models.CharField(max_length=200)
major_subnet = models.GenericIPAddressField(protocol='IPv4',verbose_name="Major Site Subnet")
routed_subnet = models.GenericIPAddressField(protocol='IPv4',verbose_name="Routed Link Subnet")
bgp_as = models.CharField(max_length=6,verbose_name="BGP AS Number")
class Meta:
verbose_name = "Major Site Data"
verbose_name_plural = "Major Site Data"
def __unicode__(self):
return self.location
class CircuitInfoData(models.Model):
showroom_config_data = models.ForeignKey(ShowroomConfigData,verbose_name="Install Showroom")
major_site_info = models.ForeignKey(MajorSiteInfoData,verbose_name="Install Site")
circuit_type = models.CharField(max_length=100,choices=circuit_choices)
circuit_speed = models.IntegerField(blank=True)
circuit_bearer = models.IntegerField(blank=True)
provider = models.CharField(max_length=200,choices=provider_choices)
ref_no = models.CharField(max_length=200,verbose_name="Reference No")
class Meta:
verbose_name = "Circuit Data"
verbose_name_plural = "Circuit Data"
ordering = ('showroom_config_data__location','circuit_speed')
def __unicode__(self):
return '%s | %s | %s | %s | %s' % (self.showroom_config_data.location,self.major_site_info.location, self.provider, self.service_type, self.ref_no)
results from shell below
[root#network-tools infternal]# python manage.py shell
Python 2.7.5 (default, Nov 20 2015, 02:00:19)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from networks.models import CircuitInfoData
>>> d = CircuitInfoData.objects.values('provider').distinct()
>>> for item in d:
... print item
...
{'provider': u'BT'}
{'provider': u'BT'}
{'provider': u'KCOM'}
{'provider': u'BT'}
{'provider': u'BT'}
{'provider': u'KCOM'}
.....
>>> print d.query
SELECT DISTINCT "networks_circuitinfodata"."provider", "networks_showroomconfigdata"."location", "networks_circuitinfodata"."circuit_speed" FROM "networks_circuitinfodata" INNER JOIN "networks_showroomconfigdata" ON ("networks_circuitinfodata"."showroom_config_data_id" = "networks_showroomconfigdata"."id") ORDER BY "networks_showroomconfigdata"."location" ASC, "networks_circuitinfodata"."circuit_speed" ASC
>>>
one thing ive noticed is that when i print items in shell as above
#### with def __unicode__(self): #####
>>> from networks.models import CircuitInfoData
>>> d = CircuitInfoData.objects.only('provider').distinct()
>>> for i in d:
... print i
...
Location1 | Showroom | BT | DSL | N/A
Location2 | Showroom | BT | MPLS | XXXX
Location2 | Showroom | KCOM | MPLS | XXXX
Location3 | Showroom | BT | MPLS | XXXX
Location3 | Showroom | BT | DSL | N/A
Location4 | Showroom | KCOM | MPLS | XXXXX
...
#### with out def __unicode__(self): #####
>>> from networks.models import CircuitInfoData
>>> d = CircuitInfoData.objects.only('provider').distinct()
>>> for i in d:
... print i
...
CircuitInfoData_Deferred_circuit_bearer_circuit_cfb3d62ef325a6acfc8ddcb43c8ae1c6 object
CircuitInfoData_Deferred_circuit_bearer_circuit_cfb3d62ef325a6acfc8ddcb43c8ae1c6 object
CircuitInfoData_Deferred_circuit_bearer_circuit_cfb3d62ef325a6acfc8ddcb43c8ae1c6 object
CircuitInfoData_Deferred_circuit_bearer_circuit_cfb3d62ef325a6acfc8ddcb43c8ae1c6 object
CircuitInfoData_Deferred_circuit_bearer_circuit_cfb3d62ef325a6acfc8ddcb43c8ae1c6 object
CircuitInfoData_Deferred_circuit_bearer_circuit_cfb3d62ef325a6acfc8ddcb43c8ae1c6 object
...
#### with either ####
>>> for i in d:
... print i.provider
...
BT
BT
KCOM
BT
BT
KCOM
...
The documentation for distinct says
Returns a new QuerySet that uses SELECT DISTINCT in its SQL query.
This eliminates duplicate rows from the query results.
By default, a QuerySet will not eliminate duplicate rows. In practice,
this is rarely a problem, because simple queries such as
Blog.objects.all() don’t introduce the possibility of duplicate result
rows.
Distinct gives you distinct rows but you are looking at only one of the fields in the record and in that field items can be duplicated unless it has a unique constraint on it. And in this case you don't.
If you happen to be using postgresql you can do
CircuitInfoData.objects.distinct('provider')
to achieve your objective.
UPDATE:
Since you mentioned in the comments that you use sqlite, use this solution.
CircuitInfoData.objects.values('provider').distinct()
This will work because now each row has only one column. the resulting query will be similar to
SELECT DISTINCT "someapp_circuitinfodata"."name" FROM "someapp_circuitinfodata"
UPDATE 2:
Notice that you have overridden the __unicode__ function.
def __unicode__(self):
return '%s | %s | %s | %s | %s' %
(self.showroom_config_data.location,self.major_site_info.location,
self.provider, self.service_type, self.ref_no)
You are referring to the fields in the related model. This is going to be very costly (unless you use select_related). Also note that if you iterate through a queryset and use print for debug purposes it will give you misleading results (since what you are seeing is the output of __unicode__, a rather complex function)
... and return unexpected results (in Django 1.6.5)
My models.py
class Member(models.Model):
...
class Donation(models.Model):
year = models.PositiveSmallIntegerField()
cheque_amount = models.DecimalField(default=0, max_digits=8, decimal_places=2)
donor = models.ForeignKey(Member)
...
class SpecialTitle(models.Model):
chair_title = models.CharField(max_length=128, blank=True)
member = models.OneToOneField(Member)
...
I'd like the union of the two querysets in one of my admin filters
donors = queryset.filter(
donation__year__exact=2014
).annotate(sum_donation=Sum('donation__cheque_amount')).filter(sum_donation__gte=1000)
chairs = queryset.filter(specialtitle__chair_title__iendswith='Chair')
Here is the puzzling part (in Django manager shell)
>>> donors | chairs == chairs | donors
False
>>> donors.count(); chairs.count()
189
17
>>> (donors | chairs).count(); (chairs | donors).count()
193
291
>>> (donors | chairs).distinct().count(); (chairs | donors).distinct().count()
193
207
And none of them are the correct results. I'd expect a set operation to be
>>> set(donors) | set(chairs) == set(chairs) | set(donors)
True
>>> set(donors) & set(chairs) == set(chairs) & set(donors)
True
>>>
And they return the correct results. However, Django admin filter demands a QuerySet, not a python set (or list)
Why is this? How do I get a proper union of Django QuerySet (of the same type) after annotated filter?
Thank you.
It appears I had no other choice but to use the python set union operator and hit the database again for the desired result.
donors = queryset.filter(
donation__year__exact=2014
).annotate(sum_donation=Sum('donation__cheque_amount')).filter(sum_donation__gte=1000)
chairs = queryset.filter(specialtitle__chair_title__iendswith='Chair')
result = queryset.filter(pk__in=[person.id for person in set(donors) | set(chairs)])
models:
class Category(models.Model):
name = models.CharField(max_length=100)
class Operation(models.Model):
date = models.DateField()
value = models.DecimalField(max_digits = 9, decimal_places = 2)
category = models.ForeignKey(Category, null = True)
comments = models.TextField(null = True)
Now I want to create a view, with 13 columns:
name of category | -11 | -10 | -9 | ... | -1 | 0
eg.
...food.. | $123.00 | $100.14 | ... | $120.13| $54.12
.clothes.| $555.23 | $232.23 | ... | $200.12| $84.44
where $123.00 for example is a sum of values of operations with category food, made 11 months ago, $100.14 - 10 months ago and so on - $54.12 is sum of current month, 555.23 => the same but category clothes...
I googled a lot, but most of examples are simple - without related class (category)
The correct answer after suggestion of Answer 1:
def get_month_sum_series(self):
import qsstats, datetime
from django.db.models import Sum
qss = qsstats.QuerySetStats(self.operation_set.all(), date_field='date', aggregate_field='value',aggregate_class=Sum)
today = datetime.date.today()
year_ago = today - datetime.timedelta(days=365)
return qss.time_series( start_date=year_ago, end_date=today, interval='months')
Take a look at django-qsstats. It has a time_series feature which will alow you to get whole series of data for all time in one request. In your case I'd create a method in Category, something like:
def price_series(self):
return qsstats.time_series(queryset=self.operation_set.all(), start_date=year_ago, end_date=now, interval='months')
Of course, you'll need to set up year_ago and now variables (for example, using datetime module functions).