How to group by two columns on queryset in Django2?

I am getting a little confused about how to use .annotate on a queryset.
To be quick: I have a model:
class Row(models.Model):
    order = models.ForeignKey('order.Header', blank=True, null=True)
    qty = models.IntegerField(blank=True, null=True, default=0)
    name = models.CharField(default='', blank=True, null=True)
    total = models.DecimalField(max_digits=10, decimal_places=2, default=0, blank=True, null=True)
    profit = models.DecimalField(max_digits=10, decimal_places=2, default=0, blank=True, null=True)
    profit_percent = models.DecimalField(max_digits=6, decimal_places=2, default=0, blank=True, null=True)
    month_sold = models.IntegerField(default=0)
    month_painted = models.IntegerField(default=0)
    area_painted_1 = models.DecimalField(max_digits=5, decimal_places=2, default=0, blank=True, null=True)
    area_painted_2 = models.DecimalField(max_digits=5, decimal_places=2, default=0, blank=True, null=True)
What I need is a kind of summary that tells me, month by month, the sums of Total and Profit, the average profit (Profit %), and the sum of the painted areas.
Something like this:
+-------+-------+--------+----------+--------+--------+
| month | Total | Profit | Profit % | area_1 | area_2 |
+-------+-------+--------+----------+--------+--------+
| 0 | 23000 | 3000 | 13% | 55 | 12 |
| Jan | 10000 | 1000 | 10% | 43 | 44 |
| April | 20000 | 1000 | 5% | 99 | 134 |
+-------+-------+--------+----------+--------+--------+
I tried to achieve that with .annotate:
result = (
    Row.objects.values('month_sold')
    .annotate(total=Sum('total')+1)
    .annotate(profit=Sum('profit'))
    .annotate(profit_percent=Round(F('profit')/F('total')*100, 2))
    .annotate(area_2=Sum('area_painted_2'))
    .annotate(area_1=Sum('area_painted_1'))
    .values('month_sold', 'total', 'profit', 'profit_percent',
            'area_1', 'area_2')
    .order_by('month_sold')
)
But obviously, this groups by month_sold. So the total and profit values are correct, but I don't know how to get area_1 and area_2 grouped by month_painted.
Any hints or ideas on how I can solve this?

I'm not sure I've got you right. In your table "Something like that", do you want month to refer to different fields in your model (either month_sold or month_painted) depending on which aggregate you're looking at? So for Total and Profit it's month_sold, and for area_1 and area_2 it's month_painted?
If that's the case, you're not going to achieve it with one single query. In raw SQL, you could join the table with itself on month_sold = month_painted; in Django's ORM, I believe you'd need a subquery for each aggregate that is not grouped on the month type of the main query. For instance:
sq1 = (
    Row.objects
    .filter(month_painted=OuterRef('month_sold'))
    .values('month_painted')
    .annotate(area_1=Sum('area_painted_1'))
    .values('area_1')
)
sq2 = (
    Row.objects
    .filter(month_painted=OuterRef('month_sold'))
    .values('month_painted')
    .annotate(area_2=Sum('area_painted_2'))
    .values('area_2')
)
result = (
    Row.objects
    .values('month_sold')
    .annotate(total=Sum('total')+1)
    .annotate(profit=Sum('profit'))
    .annotate(profit_percent=Round(F('profit')/F('total')*100, 2))
    .annotate(area_1=Subquery(sq1, output_field=models.IntegerField()))
    .annotate(area_2=Subquery(sq2, output_field=models.IntegerField()))
    .values('month_sold', 'total', 'profit', 'profit_percent',
            'area_1', 'area_2')
    .order_by('month_sold')
)
Which month field (month_sold or month_painted) the main query and the subqueries are based on depends on which month type you want to be the outer part of the outer join, i.e. which month type you want to include even if there are no corresponding values for the other month type. To include both (= FULL OUTER JOIN) using the ORM, you'd first have to get a list of all months (whether painted or sold), and then pull in the other columns as individual subqueries.
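The "include both month types" case can also be handled in Python by running one grouped query per month type and merging the results. The following is only a minimal sketch of that merge step in plain Python: the two input lists stand in for the output of a query grouped on month_sold and one grouped on month_painted, and the function name merge_by_month and the sample numbers are made up for illustration.

```python
from decimal import Decimal


def merge_by_month(sold_rows, painted_rows):
    """Combine two per-month aggregates, keeping every month from either side."""
    merged = {}
    for row in sold_rows:
        entry = merged.setdefault(row["month_sold"], {})
        entry["total"] = row["total"]
        entry["profit"] = row["profit"]
    for row in painted_rows:
        entry = merged.setdefault(row["month_painted"], {})
        entry["area_1"] = row["area_1"]
        entry["area_2"] = row["area_2"]
    # Fill the gaps so every month has all four columns (None = no data).
    columns = ("total", "profit", "area_1", "area_2")
    return [
        {"month": month, **{c: merged[month].get(c) for c in columns}}
        for month in sorted(merged)
    ]


# Hypothetical data shaped like the two grouped querysets.
sold = [
    {"month_sold": 1, "total": Decimal("10000"), "profit": Decimal("1000")},
    {"month_sold": 4, "total": Decimal("20000"), "profit": Decimal("1000")},
]
painted = [
    {"month_painted": 1, "area_1": Decimal("43"), "area_2": Decimal("44")},
    {"month_painted": 5, "area_1": Decimal("99"), "area_2": Decimal("134")},
]
summary = merge_by_month(sold, painted)
```

Months that appear on only one side (4 and 5 here) are kept, with None in the columns that have no data, which is the behaviour a FULL OUTER JOIN would give.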

Related

django - improve performance of __in queryset in M2M filtering

I have a model that has an M2M relationship to another model.
These are my models:
class Catalogue(models.Model):
    city = models.CharField(db_index=True, max_length=100, null=True)
    district = models.CharField(db_index=True, max_length=100, null=True)
    type = models.ManyToManyField(Type, db_index=True)
    datetime = models.CharField(db_index=True, max_length=100, null=True)

class Type(models.Model):
    name = models.CharField(max_length=100)

    def __str__(self):
        return self.name
And this is views.py:
class all_ads(generic.ListView):
    paginate_by = 12
    template_name = 'new_list_view_grid-card.html'

    def get_queryset(self):
        city_district = self.request.GET.getlist('city_district')
        usage = self.request.GET.get('usage')
        status = self.request.GET.get('status')
        last2week = datetime.datetime.now() - datetime.timedelta(days=14)
        status = status.split(',')
        if usage:
            usage = usage.split(',')
        else:
            usage = ['1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31']
        intersections = list(set(status).intersection(usage))
        type_q = (Q(type__in=intersections) & Q(type__isnull=False))
        result = models.Catalogue.objects.filter(
            Q(datetime__gte=last2week) &
            type_q &
            ((reduce(operator.or_, (Q(city__contains=x) for x in city_district)) & Q(city__isnull=False)) |
             (reduce(operator.or_, (Q(district__contains=x) for x in city_district)) & Q(district__isnull=False)))
        ).distinct().order_by('-datetime').prefetch_related('type')
        return result
I want to filter a MySQL database with some queries and return the result in a list view.
It works well on a small database, but with a large database it takes over 10 seconds to return results. If I remove the type_q query, it takes 2 seconds (down from over 10 seconds!).
How can I improve the performance of the __in queryset?

It looks like type_q itself is not really the culprit, but acts as a multiplier: the query now makes a LEFT OUTER JOIN, and thus the __contains lookups run over all combinations. This is thus more a peculiarity of the two filters working together.
We can avoid this with:
cat_ids = list(Catalogue.objects.filter(
    Q(*[Q(city__contains=x) for x in city_district], _connector=Q.OR) |
    Q(*[Q(district__contains=x) for x in city_district], _connector=Q.OR)
).values_list('pk', flat=True))
result = models.Catalogue.objects.filter(
    Q(datetime__gte=last2week),
    type_q,
    pk__in=cat_ids
).distinct().order_by('-datetime').prefetch_related('type')
Some databases can even do this with a subquery (although MySQL is known not to optimize subqueries very well). In that case we do not materialize the list, but let Django work with a subquery:
cat_ids = Catalogue.objects.filter(
    Q(*[Q(city__contains=x) for x in city_district], _connector=Q.OR) |
    Q(*[Q(district__contains=x) for x in city_district], _connector=Q.OR)
).values_list('pk', flat=True)
result = models.Catalogue.objects.filter(
    Q(datetime__gte=last2week),
    type_q,
    pk__in=cat_ids
).distinct().order_by('-datetime').prefetch_related('type')
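To make the two-step idea concrete outside of Django, here is a small sketch using the standard-library sqlite3 module. The table and column names (catalogue, catalogue_type) and the sample rows are invented for illustration and are not Django's generated schema; the point is only that the substring match runs on the catalogue table alone, and the join to the M2M table is then restricted to the matching primary keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE catalogue (id INTEGER PRIMARY KEY, city TEXT, district TEXT);
    CREATE TABLE catalogue_type (catalogue_id INTEGER, type_id INTEGER);
    INSERT INTO catalogue VALUES
        (1, 'Springfield', 'North'),
        (2, 'Shelbyville', 'Center'),
        (3, 'Ogdenville', 'Springfield Road');
    INSERT INTO catalogue_type VALUES (1, 1), (1, 2), (2, 1), (3, 3);
""")

needle = "%Springfield%"

# Step 1: substring filter on the catalogue table only (no join involved).
cat_ids = [row[0] for row in conn.execute(
    "SELECT id FROM catalogue WHERE city LIKE ? OR district LIKE ? ORDER BY id",
    (needle, needle),
)]

# Step 2: join to the M2M table, restricted to the ids found in step 1.
placeholders = ",".join("?" * len(cat_ids))
result = [row[0] for row in conn.execute(
    f"""SELECT DISTINCT c.id
        FROM catalogue c
        JOIN catalogue_type ct ON ct.catalogue_id = c.id
        WHERE ct.type_id IN (1, 2) AND c.id IN ({placeholders})
        ORDER BY c.id""",
    cat_ids,
)]
```

Whether this is actually faster than the single joined query depends on the database and its indexes, so it is worth measuring both on real data.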

Can Django queryset generate a SQL statement with self join?

I have a table that relates parents to children. It has the following data:
+-----+-----+-----+--------+
| pid | rel | cid | relcat |
+-----+-----+-----+--------+
| 13 | F | 216 | 1 |
| 13 | F | 229 | 1 |
| 13 | f | 328 | 2 |
| 13 | F | 508 | 1 |
| 13 | F | 599 | 1 |
| 13 | f | 702 | 2 |
| 560 | M | 229 | 1 |
| 560 | m | 702 | 2 |
+-----+-----+-----+--------+
I can find the brothers of 229 by joining the npr table to itself with SQL:
SELECT npr_a.cid,
CASE (SUM(IF(npr_a.relcat=1 AND npr_b.relcat=1,1,0))) WHEN 2 THEN '~FB~' WHEN 1 THEN '~HB~' ELSE '~Foster~' END AS BrotherType,
abs(person_details.isalive) as isalive
FROM person_details,
npr npr_a,
npr npr_b
WHERE ( npr_b.cid = 229) AND
( npr_a.pid = npr_b.pid ) AND
( npr_a.cid <> 229) AND
( npr_b.relcat <> 3 ) AND
( npr_a.relcat <> 3 ) AND
( person_details.id = npr_a.cid )
GROUP BY npr_a.cid;
to get:
+-----+-------------+---------+
| cid | BrotherType | isalive |
+-----+-------------+---------+
| 216 | ~HB~ | 1 |
| 328 | ~Foster~ | 0 |
| 508 | ~HB~ | 0 |
| 599 | ~HB~ | 0 |
| 702 | ~Foster~ | 1 |
+-----+-------------+---------+
I tried many ways to get this using a Django queryset, but none of them gave the correct results. The best I was able to get is:
idp = Npr.objects.filter(cid=229).values_list('pid', flat=True)
idc = Npr.objects.filter(pid__in=idp).exclude(cid=229)
but that solution was not able to generate the BrotherType field.
My models:
class PersonDetails(models.Model):
    id = models.AutoField(db_column='ID', primary_key=True)
    name = models.CharField(db_column='Name', max_length=20, blank=True, null=True)
    isalive = models.BooleanField(db_column='isAlive')

    class Meta:
        managed = False
        db_table = 'person_details'

class Npr(models.Model):
    rid = models.AutoField(db_column='rid', primary_key=True)
    pid = models.ForeignKey(PersonDetails, on_delete=models.CASCADE, db_column='PID')
    rel = models.CharField(max_length=1)
    cid = models.ForeignKey(PersonDetails, on_delete=models.CASCADE, db_column='CID')
    relcat = models.PositiveIntegerField()

    class Meta:
        managed = False
        db_table = 'npr'
        unique_together = (('pid', 'cid'),)
using Django Version: 3.0.4 Python version: 3.7.3 Database: 10.3.22-MariaDB-0+deb10u1-log
Any suggestion to build the required queryset?

Luckily, I found a solution to my problem.
I created a new model pointing to the same database table:
class nNpr(models.Model):
    rid = models.AutoField(db_column='rid', primary_key=True)
    pid = models.ForeignKey(Npr, on_delete=models.CASCADE, db_column='PID', to_field='pid', related_name='rel_pid')
    rel = models.CharField(max_length=1)
    cid = models.PositiveSmallIntegerField(db_column='CID')
    relcat = models.PositiveIntegerField()

    class Meta:
        managed = False
        db_table = 'npr'
        unique_together = (('pid', 'cid'),)
and modified the original model to suppress the warning that Django generates when you join a foreign key field to a non-unique field. The modified model is:
class Npr(models.Model):
    rid = models.AutoField(db_column='rid', primary_key=True)
    pid = models.OneToOneField(PersonDetails, on_delete=models.CASCADE, unique=True, db_column='PID', related_name='rel_pid')
    rel = models.CharField(max_length=1)
    cid = models.ForeignKey(PersonDetails, on_delete=models.CASCADE, db_column='CID', related_name='rel_cid')
    relcat = models.PositiveIntegerField()

    class Meta:
        managed = False
        db_table = 'npr'
        unique_together = (('pid', 'cid'),)
The last thing is the queryset:
brothers = (
    nNpr.objects.select_related('pid')
    .filter(cid=229)
    .exclude(Q(relcat=3) | Q(pid__cid=229) | Q(pid__relcat=3))
    .values('pid__cid', 'pid__cid__name', 'pid__cid__isalive')
    .annotate(cprel=Sum('relcat'), pcrel=Sum('pid__relcat'))
)
and the resulting SQL is:
SELECT T2.`CID`, `person_details`.`Name`, `person_details`.`isAlive`, SUM(`npr`.`relcat`) AS `cprel`\
, SUM(T2.`relcat`) AS `pcrel` FROM `npr` INNER JOIN `npr` T2 ON (`npr`.`PID` = T2.`PID`) INNER\
JOIN `person_details` ON (T2.`CID` = `person_details`.`ID`) WHERE (`npr`.`CID` = 229 AND NOT \
((`npr`.`relcat` = 3 OR T2.`CID` = 198 OR T2.`relcat` = 229))) GROUP BY T2.`CID`,\
`person_details`.`Name`, `person_details`.`isAlive` ORDER BY NULL

I found another, better solution that doesn't require creating a new model pointing to the same database table. The solution is to follow the required relations recursively on a single model:
brothers = Npr.objects.filter(pid__pid__cid = 229, relcat__lt = 3,
pid__pid__relcat__lt = 3).values('cid', 'cid__name')
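For checking what any of these querysets should return, the self join from the question can be reproduced on the sample data with the standard-library sqlite3 module. This is only a sketch for verification: the join to person_details is left out, and MySQL's IF() is rewritten as a portable CASE WHEN expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE npr (pid INTEGER, rel TEXT, cid INTEGER, relcat INTEGER)")
conn.executemany(
    "INSERT INTO npr VALUES (?, ?, ?, ?)",
    [
        (13, "F", 216, 1), (13, "F", 229, 1), (13, "f", 328, 2),
        (13, "F", 508, 1), (13, "F", 599, 1), (13, "f", 702, 2),
        (560, "M", 229, 1), (560, "m", 702, 2),
    ],
)

# npr_b: the rows of child 229; npr_a: other children sharing a parent with 229.
rows = conn.execute(
    """
    SELECT npr_a.cid,
           CASE SUM(CASE WHEN npr_a.relcat = 1 AND npr_b.relcat = 1 THEN 1 ELSE 0 END)
               WHEN 2 THEN '~FB~' WHEN 1 THEN '~HB~' ELSE '~Foster~'
           END AS brother_type
    FROM npr AS npr_a
    JOIN npr AS npr_b ON npr_a.pid = npr_b.pid
    WHERE npr_b.cid = ? AND npr_a.cid <> ?
      AND npr_a.relcat <> 3 AND npr_b.relcat <> 3
    GROUP BY npr_a.cid
    ORDER BY npr_a.cid
    """,
    (229, 229),
).fetchall()
```

The rows match the cid and BrotherType columns of the expected table in the question, so the same data can serve as a fixture when comparing the ORM output.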

Two forms in one model. Combining values for table databases

I have a simple model in Django that looks like this:
class TimeTable(models.Model):
    title = models.CharField(max_length=100)
    start_time = models.CharField(choices=MY_CHOICES, max_length=10)
    end_time = models.CharField(choices=MY_CHOICES, max_length=10)
    day1 = models.BooleanField(default=False)
    day2 = models.BooleanField(default=False)
    day3 = models.BooleanField(default=False)
    day4 = models.BooleanField(default=False)
    day5 = models.BooleanField(default=False)
    day6 = models.BooleanField(default=False)
    day7 = models.BooleanField(default=False)
For this model I have two forms:
class TimeTableForm(ModelForm):
    class Meta:
        model = TimeTable
        fields = ['title', 'start_time', 'end_time']

class WeekDayForm(ModelForm):
    class Meta:
        model = TimeTable
        fields = ['day1', 'day2', 'day3', 'day4', 'day5', 'day6', 'day7']
Now, in views.py, I need to save these values into the database:
def schedule(request):
    if request.method == 'POST':
        form = TimeTableForm(request.POST)
        day_week = WeekDayForm(request.POST)
        if all([form.is_valid(), day_week.is_valid()]):
            form.save()
            day_week.save()
I'm new to Django, so I was thinking that these values would be combined into one object and I would get the proper data, but for every submit I get two separate objects in the database, for example:
title | 8:00 | 10:00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | | 0 | 1 | 1 | 1 | 1 | 0 | 0 |
but I should get
title | 8:00 | 10:00 | 0 | 1 | 1 | 1 | 1 | 0 | 0 |
As you can see, the values from the two forms are stored separately in the database (this is probably obvious to someone more familiar with Django). Is there a way to combine the two into one?

OR definition of filters when using relations in django filter

I have three models with a simple relation as below:
models.py
class Person(models.Model):
    first_name = models.CharField(max_length=20)
    last_name = models.CharField(max_length=20)

class PersonSession(models.Model):
    start_time = models.DateTimeField(auto_now_add=True)
    end_time = models.DateTimeField(null=True,
                                    blank=True)
    person = models.ForeignKey(Person, related_name='sessions')

class Billing(models.Model):
    DEBT = 'DE'
    BALANCED = 'BA'
    CREDIT = 'CR'
    session = models.OneToOneField(PersonSession,
                                   blank=False,
                                   null=False,
                                   related_name='billing')
    STATUS = ((BALANCED, 'Balanced'),
              (DEBT, 'Debt'),
              (CREDIT, 'Credit'))
    status = models.CharField(max_length=2,
                              choices=STATUS,
                              blank=False,
                              default=BALANCED
                              )
views.py
class PersonFilter(django_filters.FilterSet):
    start_time = django_filters.DateFromToRangeFilter(name='sessions__start_time',
                                                      distinct=True)
    billing_status = django_filters.ChoiceFilter(name='sessions__billing__status',
                                                 choices=Billing.STATUS,
                                                 distinct=True)

    class Meta:
        model = Person
        fields = ('first_name', 'last_name')

class PersonList(generics.ListCreateAPIView):
    queryset = Person.objects.all()
    serializer_class = PersonSerializer
    # Trailing comma added so that this is a tuple rather than a bare class.
    filter_backends = (django_filters.rest_framework.DjangoFilterBackend,)
    filter_class = PersonFilter
I want to get, from the person endpoint, the billings that have DE status and fall within a given period of time:
api/persons?start_time_0=2018-03-20&start_time_1=2018-03-23&billing_status=DE
But the result is not what I was looking for: this returns all persons who have a session in that period and have a billing with DE status, whether that billing falls within the period or not.
In other words, it seems to use an OR operation between the two filter fields. I think this post is related to this issue, but so far I have not found a way to get the result I want. I am using Django 1.10.3.
Edit
I tried to write an example to show what I need and what I get from django-filter. If I fetch persons using the query below, I get just two persons:
select *
from
test_filter_person join test_filter_personsession on test_filter_person.id=test_filter_personsession.person_id join test_filter_billing on test_filter_personsession.id=test_filter_billing.session_id
where
start_time > '2000-02-01' and start_time < '2000-03-01' and status='DE';
This gets me just persons 1 and 2. But if I request what I expect to be the equivalent through the URL, I get all persons. The similar URL (at least the one I expected to behave the same) is:
http://address/persons?start_time_0=2000-02-01&start_time_1=2000-03-01&billing_status=DE
Edit2
This is the data that my example queries run against; using it, you can see what the queries mentioned above should return:
id | first_name | last_name | id | start_time | end_time | person_id | id | status | session_id
----+------------+-----------+----+---------------------------+---------------------------+-----------+----+--------+------------
0 | person | 0 | 0 | 2000-01-01 16:32:00+03:30 | 2000-01-01 17:32:00+03:30 | 0 | 0 | DE | 0
0 | person | 0 | 1 | 2000-02-01 16:32:00+03:30 | 2000-02-01 17:32:00+03:30 | 0 | 1 | BA | 1
0 | person | 0 | 2 | 2000-03-01 16:32:00+03:30 | 2000-03-01 17:32:00+03:30 | 0 | 2 | DE | 2
1 | person | 1 | 3 | 2000-01-01 16:32:00+03:30 | 2000-01-01 17:32:00+03:30 | 1 | 3 | BA | 3
1 | person | 1 | 4 | 2000-02-01 16:32:00+03:30 | 2000-02-01 17:32:00+03:30 | 1 | 4 | DE | 4
1 | person | 1 | 5 | 2000-03-01 16:32:00+03:30 | 2000-03-01 17:32:00+03:30 | 1 | 5 | DE | 5
2 | person | 2 | 6 | 2000-01-01 16:32:00+03:30 | 2000-01-01 17:32:00+03:30 | 2 | 6 | DE | 6
2 | person | 2 | 7 | 2000-02-01 16:32:00+03:30 | 2000-02-01 17:32:00+03:30 | 2 | 7 | DE | 7
2 | person | 2 | 8 | 2000-03-01 16:32:00+03:30 | 2000-03-01 17:32:00+03:30 | 2 | 8 | BA | 8
Edit3
I tried using prefetch_related to join the tables and get the results I expected, because I thought an extra join was causing this problem, but it did not work: I still get the same result and it had no effect.
Edit4
This issue has the same problem.

I don't have a solution yet, but I thought a concise summary of the problem would set more and better minds than mine to work!
From what I understand, your core issue is the result of two pre-conditions:
1. You have two discrete filters defined on a related model, which results in a filter spanning multi-valued relationships.
2. The way FilterSet implements filtering.
Let us look at these in more detail:
filter spanning-multi-valued-relationships
This is a great resource to understand issue pre-condition #1 better:
https://docs.djangoproject.com/en/2.0/topics/db/queries/#spanning-multi-valued-relationships
Essentially, the start_time filter adds a .filter(sessions__start_time=value) to your queryset, and the billing_status filter adds a separate .filter(sessions__billing__status=value). This results in the "spanning multi-valued relationships" behaviour described above: each .filter() call may be satisfied by a different related session, so the result looks like an OR between the filters rather than the AND on a single session that you need.
This got me thinking: why don't we see the same issue within the start_time filter itself? The trick is that it is defined as a DateFromToRangeFilter, which internally uses a single filter query with the __range= construct. If it instead did sessions__start_time__gt= and sessions__start_time__lt= in separate calls, we would have the same issue there.
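The difference between chained and single filters can be illustrated without Django. The sketch below uses the rows from the Edit2 table in the question, simplified to dates without the time component, and mimics the two semantics in plain Python; the variable names are made up for illustration.

```python
from datetime import date

# (person_id, session start date, billing status), taken from the Edit2 table.
sessions = [
    (0, date(2000, 1, 1), "DE"), (0, date(2000, 2, 1), "BA"), (0, date(2000, 3, 1), "DE"),
    (1, date(2000, 1, 1), "BA"), (1, date(2000, 2, 1), "DE"), (1, date(2000, 3, 1), "DE"),
    (2, date(2000, 1, 1), "DE"), (2, date(2000, 2, 1), "DE"), (2, date(2000, 3, 1), "BA"),
]
start, end = date(2000, 2, 1), date(2000, 3, 1)


def in_range(day):
    return start <= day < end


people = sorted({pid for pid, _, _ in sessions})

# Like two chained .filter() calls: each condition may be satisfied
# by a different session of the same person.
chained = [
    p for p in people
    if any(pid == p and in_range(day) for pid, day, _ in sessions)
    and any(pid == p and status == "DE" for pid, _, status in sessions)
]

# Like one .filter() call with both lookups: a single session must satisfy both.
single = [
    p for p in people
    if any(pid == p and in_range(day) and status == "DE" for pid, day, status in sessions)
]
```

Here chained contains all three persons, which is what the URL returns, while single contains only persons 1 and 2, which is what the raw SQL returns.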
The way FilterSet implements filtering
Talk is cheap; show me the code
@property
def qs(self):
    if not hasattr(self, '_qs'):
        if not self.is_bound:
            self._qs = self.queryset.all()
            return self._qs
        if not self.form.is_valid():
            if self.strict == STRICTNESS.RAISE_VALIDATION_ERROR:
                raise forms.ValidationError(self.form.errors)
            elif self.strict == STRICTNESS.RETURN_NO_RESULTS:
                self._qs = self.queryset.none()
                return self._qs
            # else STRICTNESS.IGNORE... ignoring
        # start with all the results and filter from there
        qs = self.queryset.all()
        for name, filter_ in six.iteritems(self.filters):
            value = self.form.cleaned_data.get(name)
            if value is not None:  # valid & clean data
                qs = filter_.filter(qs, value)
        self._qs = qs
    return self._qs
As you can see, the qs property is resolved by iterating over the Filter objects, passing the initial qs through each of them successively and returning the result. See qs = filter_.filter(qs, value).
Each Filter object here defines a specific filter method that basically takes the queryset and adds a successive .filter() to it.
Here's an example from the BaseFilter class:
def filter(self, qs, value):
    if isinstance(value, Lookup):
        lookup = six.text_type(value.lookup_type)
        value = value.value
    else:
        lookup = self.lookup_expr
    if value in EMPTY_VALUES:
        return qs
    if self.distinct:
        qs = qs.distinct()
    qs = self.get_method(qs)(**{'%s__%s' % (self.name, lookup): value})
    return qs
The line of code that matters is: qs = self.get_method(qs)(**{'%s__%s' % (self.name, lookup): value})
So the two pre-conditions create the perfect storm for this issue.

This worked for me:
from django.db import models
from django_filters import FilterSet
from django_filters.constants import EMPTY_VALUES


class FooFilterSet(FilterSet):
    def filter_queryset(self, queryset):
        """
        Overrides the basic method, so that instead of iterating over the queryset with multiple `.filter()`
        calls, one for each filter, it accumulates the lookup expressions and applies them all in a single
        `.filter()` call - to filter with an explicit "AND" in many to many relationships.
        """
        filter_kwargs = {}
        for name, value in self.form.cleaned_data.items():
            if value not in EMPTY_VALUES:
                lookup = '%s__%s' % (self.filters[name].field_name, self.filters[name].lookup_expr)
                filter_kwargs.update({lookup: value})
        queryset = queryset.filter(**filter_kwargs)
        assert isinstance(queryset, models.QuerySet), \
            "Expected '%s.%s' to return a QuerySet, but got a %s instead." \
            % (type(self).__name__, name, type(queryset).__name__)
        return queryset
This overrides the filter_queryset method so that it accumulates the lookup expressions and applies them in a single .filter() call.
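The accumulation step on its own can be tried without Django. The following is a minimal sketch: build_filter_kwargs, the sample form data and the lookup strings are hypothetical, and EMPTY_VALUES is written out by hand on the assumption that it matches the constant django-filter uses.

```python
# Assumed to match django-filter's EMPTY_VALUES constant.
EMPTY_VALUES = ([], (), {}, "", None)


def build_filter_kwargs(cleaned_data, lookups):
    """Collect one lookup per non-empty form value into a single kwargs dict."""
    return {
        lookups[name]: value
        for name, value in cleaned_data.items()
        if value not in EMPTY_VALUES
    }


# Hypothetical cleaned form data and the lookup each filter would produce.
cleaned = {"billing_status": "DE", "first_name": "", "last_name": None}
lookups = {
    "billing_status": "sessions__billing__status__exact",
    "first_name": "first_name__exact",
    "last_name": "last_name__exact",
}
kwargs = build_filter_kwargs(cleaned, lookups)
```

Passing the resulting dict to one queryset.filter(**kwargs) call is what makes all conditions apply to the same related row.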

complex sums divided by month

models:
class Category(models.Model):
name = models.CharField(max_length=100)
class Operation(models.Model):
date = models.DateField()
value = models.DecimalField(max_digits = 9, decimal_places = 2)
category = models.ForeignKey(Category, null = True)
comments = models.TextField(null = True)
Now I want to create a view with 13 columns:
name of category | -11 | -10 | -9 | ... | -1 | 0
e.g.
...food.. | $123.00 | $100.14 | ... | $120.13| $54.12
.clothes.| $555.23 | $232.23 | ... | $200.12| $84.44
where $123.00, for example, is the sum of the values of operations in the category food made 11 months ago, $100.14 is the sum for 10 months ago, and so on; $54.12 is the sum for the current month. $555.23 is the same, but for the category clothes...
I googled a lot, but most of the examples are simple, without a related class (category).
The correct answer, after the suggestion in Answer 1:
def get_month_sum_series(self):
    import qsstats, datetime
    from django.db.models import Sum
    qss = qsstats.QuerySetStats(self.operation_set.all(), date_field='date', aggregate_field='value', aggregate_class=Sum)
    today = datetime.date.today()
    year_ago = today - datetime.timedelta(days=365)
    return qss.time_series(start_date=year_ago, end_date=today, interval='months')

Take a look at django-qsstats. It has a time_series feature which will allow you to get a whole series of data for the whole period in one request. In your case I'd create a method in Category, something like:
def price_series(self):
    return qsstats.time_series(queryset=self.operation_set.all(), start_date=year_ago, end_date=now, interval='months')
Of course, you'll need to set up year_ago and now variables (for example, using datetime module functions).
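For comparison, here is a plain-Python sketch of the per-month bucketing that such a series amounts to. It does not use the django-qsstats API; the helper names months_ago and monthly_sums, the fixed "today" and the sample operations are all made up for illustration.

```python
import datetime
from collections import defaultdict
from decimal import Decimal


def months_ago(day, today):
    """Whole calendar months between `day` and `today` (0 = current month)."""
    return (today.year - day.year) * 12 + (today.month - day.month)


def monthly_sums(operations, today):
    """Sum values per category for the current month and the 11 before it."""
    table = defaultdict(lambda: [Decimal("0")] * 12)
    for category, day, value in operations:
        offset = months_ago(day, today)
        if 0 <= offset <= 11:
            # Column 0 is 11 months ago, column 11 is the current month.
            table[category][11 - offset] += value
    return dict(table)


today = datetime.date(2024, 6, 15)  # fixed date so the example is repeatable
year_ago = today - datetime.timedelta(days=365)
operations = [
    ("food", datetime.date(2024, 6, 1), Decimal("54.12")),
    ("food", datetime.date(2024, 5, 20), Decimal("100.00")),
    ("food", datetime.date(2024, 5, 2), Decimal("20.13")),
    ("clothes", datetime.date(2023, 7, 9), Decimal("555.23")),
    ("clothes", datetime.date(2023, 6, 30), Decimal("9.99")),  # 12 months ago: skipped
]
sums = monthly_sums(operations, today)
```

Each category maps to a list of 12 sums, which corresponds to the -11 ... 0 columns of the table in the question.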
