Wrongly large numbers when using multiple Sum and Count aggregations in annotate - django

I have these models:
User:
    email = EmailField()
Payment:
    user = ForeignKey(User)
    sum = DecimalField()
GuestAccount:
    user = ForeignKey(User)
    guest = ForeignKey(User)
I want to get each user's email, the total amount of money that came from that user,
and the number of their guest accounts.
My query:
User.objects.annotate(
    money=Sum('payment__sum'),
    guests_number=Count('guestaccount')
).values('email', 'money', 'guests_number')
But money and guests_number in the result of the query are bigger than they really are:
{'guests_number': 0, 'email': 'a#b.cd', 'money': None}
{'guests_number': 20, 'email': 'user1#mail.com', 'money': Decimal('6600.00')}
{'guests_number': 4, 'email': 'user1000#test.com', 'money': Decimal('2500.00')}
{'guests_number': 0, 'email': 'zzzz#bbbbb.com', 'money': None}
I noticed that I get correct data if I split the query into 2 separate queries:
User.objects.annotate(money=Sum('payment__sum')).values('email', 'money')
User.objects.annotate(guests_number=Count('guestaccount')).values('email', 'guests_number')
Correct result of 1st half:
{'email': 'a#b.cd', 'money': None}
{'email': 'user1#mail.com', 'money': Decimal('1650.00')}
{'email': 'user1000#test.com', 'money': Decimal('1250.00')}
{'email': 'zzzz#bbbbb.com', 'money': None}
Correct result of 2nd half:
{'email': 'a#b.cd', 'guests_number': 0}
{'email': 'user1#mail.com', 'guests_number': 4}
{'email': 'user1000#test.com', 'guests_number': 2}
{'email': 'zzzz#bbbbb.com', 'guests_number': 0}
Also I noticed that I can add distinct=True in Count aggregation:
User.objects.annotate(
    money=Sum('payment__sum'),
    guests_number=Count('guestaccount', distinct=True)
).values('email', 'money', 'guests_number')
It fixes guests_number:
{'guests_number': 0, 'email': 'a#b.cd', 'money': None}
{'guests_number': 4, 'email': 'user1#mail.com', 'money': Decimal('6600.00')}
{'guests_number': 2, 'email': 'user1000#test.com', 'money': Decimal('2500.00')}
{'guests_number': 0, 'email': 'zzzz#bbbbb.com', 'money': None}
Unfortunately, there is no distinct parameter in the Sum aggregation.
What is wrong with my query? How can I stop these numbers from getting bigger with every aggregation added to annotate?

Raw SQL query investigation showed that the problem comes from multiple LEFT OUTER JOINs. So I ended up with raw SQL:
User.objects.extra(select={
    "money": """
        SELECT SUM("website_payment"."sum")
        FROM "website_payment"
        WHERE "website_user"."id" = "website_payment"."user_id"
    """,
    "guests_number": """
        SELECT COUNT("guests_guestaccount"."id")
        FROM "guests_guestaccount"
        WHERE "website_user"."id" = "guests_guestaccount"."user_id"
    """,
}).values('email', 'money', 'guests_number')
But I need these fields annotated onto the queried objects, and extra doesn't do that.
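The join fan-out behind the inflated numbers can be reproduced outside Django. The following is a minimal sketch using the standard-library sqlite3 module. The table and column names (app_user, payment, guestaccount, amount) are made up for illustration and are not the tables from the question. With 5 payments and 4 guest accounts for one user, the double LEFT OUTER JOIN produces 5 x 4 = 20 rows, so both aggregates are multiplied, while correlated subqueries aggregate each table on its own.

```python
import sqlite3

# Hypothetical schema, only loosely modelled on the question's tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE payment (id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER);
    CREATE TABLE guestaccount (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO app_user (id, email) VALUES (1, 'user1');
""")
# One user with 5 payments of 330 each (1650 in total) and 4 guest accounts.
conn.executemany("INSERT INTO payment (user_id, amount) VALUES (1, ?)", [(330,)] * 5)
conn.executemany("INSERT INTO guestaccount (user_id) VALUES (?)", [(1,)] * 4)

# Two joins at once: every payment row is paired with every guest row.
joined = conn.execute("""
    SELECT SUM(p.amount), COUNT(g.id)
    FROM app_user u
    LEFT OUTER JOIN payment p ON p.user_id = u.id
    LEFT OUTER JOIN guestaccount g ON g.user_id = u.id
    GROUP BY u.id
""").fetchone()

# Correlated subqueries: each aggregate only sees its own table.
separate = conn.execute("""
    SELECT (SELECT SUM(amount) FROM payment WHERE user_id = u.id),
           (SELECT COUNT(id) FROM guestaccount WHERE user_id = u.id)
    FROM app_user u
""").fetchone()

print(joined)    # inflated by the row multiplication
print(separate)  # the real totals
```

In the ORM, the same subquery shape can usually be written with Subquery and OuterRef inside annotate, which keeps the values on the queried objects.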

Related

Get respectives values from Django annotate method

I have the following query:
result = data.values('collaborator').annotate(amount=Count('cc'))
top = result.order_by('-amount')[:3]
This takes the collaborator field from data (a Django QuerySet) and works like a GROUP BY query. However, when I call the .values() method on the top variable, it returns all the model instances as dicts in a queryset. I need the result of annotate as a list of dicts.
The following is the content of the top variable in the shell:
<QuerySet [{'collaborator': '1092788966', 'amount': 20}, {'collaborator': '1083692812', 'amount': 20}, {'collaborator': '1083572767', 'amount': 20}]>
But when I call list(top.values()) I get the following result:
[{'name': 'Alyse Caffin', 'cc': '1043346592', 'location': 'Wu’an', 'gender': 'MASCULINO', 'voting_place': 'Corporación Educativa American School Barranquilla', 'table_number': '6', 'status': 'ESPERADO', 'amount': 1}, {'name': 'Barthel Hanlin', 'cc': '1043238706', 'location': 'General Santos', 'gender': 'MASCULINO', 'voting_place': 'Colegio San José – Compañía de Jesús Barranquilla', 'table_number': '10', 'status': 'PENDIENTE', 'amount': 1}, {'name': 'Harv Gertz', 'cc': '1043550513', 'location': 'Makueni', 'gender': 'FEMENINO', 'voting_place': 'Corporación Educativa American School Barranquilla', 'table_number': '7', 'status': 'ESPERADO', 'amount': 1}]
I just want the result to be like:
[{'collaborator': '1092788966', 'amount': 20}, {'collaborator': '1083692812', 'amount': 20}, {'collaborator': '1083572767', 'amount': 20}]
Something is wrong here, maybe a typo (it also seems you do not show the full query: something like data=yourmodel.objects.filter... is missing before it).
The output of list(top.values()) shows completely different model fields than the top QuerySet you posted. Are you sure you really ran:
result = data.values('collaborator').annotate(amount=Count('cc'))
top = result.order_by('-amount')[:3]
list(top.values())
That should deliver what you expect (provided that data is a QuerySet).

Django - Query count of each distinct status

I have a model Model that has Model.status field. The status field can be of value draft, active or cancelled.
Is it possible to get a count of all objects based on their status? I would prefer to do that in one query instead of this:
Model.objects.filter(status='draft').count()
Model.objects.filter(status='active').count()
Model.objects.filter(status='cancelled').count()
I think that aggregate could help.
Yes, you can work with:
from django.db.models import Count
Model.objects.values('status').annotate(
    count=Count('pk')
).order_by('count')
This will return a QuerySet of dictionaries:
<QuerySet [
    {'status': 'active', 'count': 25},
    {'status': 'cancelled', 'count': 14},
    {'status': 'draft', 'count': 13}
]>
This will however not list statuses for which no Model is present in the database.
Or you can make use of an aggregate with filter=:
from django.db.models import Count, Q
Model.objects.aggregate(
    nactive=Count('pk', filter=Q(status='active')),
    ncancelled=Count('pk', filter=Q(status='cancelled')),
    ndraft=Count('pk', filter=Q(status='draft'))
)
This will return a dictionary:
{
    'nactive': 25,
    'ncancelled': 14,
    'ndraft': 13
}
Statuses for which no Model exists will be returned as 0, since Count yields 0 rather than None when nothing matches.
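To see how the two approaches differ for a status that has no rows, here is a small sketch with the standard-library sqlite3 module. The item table is hypothetical, and the CASE WHEN form is only an approximation of the SQL that filter= leads to (the exact SQL depends on the backend). The grouped query simply omits the missing status, while the conditional counts report it as 0.

```python
import sqlite3

# Hypothetical table: two active rows, one draft row, no cancelled rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO item (status) VALUES (?)",
    [("active",), ("active",), ("draft",)],
)

# GROUP BY: one row per status that actually occurs in the table.
grouped = dict(conn.execute("SELECT status, COUNT(id) FROM item GROUP BY status"))

# Conditional counts: one column per status, whether it occurs or not.
conditional = conn.execute("""
    SELECT COUNT(CASE WHEN status = 'active' THEN id END),
           COUNT(CASE WHEN status = 'cancelled' THEN id END),
           COUNT(CASE WHEN status = 'draft' THEN id END)
    FROM item
""").fetchone()

print(grouped)      # 'cancelled' is absent
print(conditional)  # 'cancelled' is counted as 0
```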

Flask: Save session data with redirect

I am using sessions to store product info for a shopping cart. That works, but when I add a product to the session and then redirect to the cart view, the newly added product id is not saved.
from flask import redirect, session, url_for
from flask.views import View

class CartAddProduct(View):
    def dispatch_request(self, product_id):
        print("*** Product id to add: ", product_id)
        # Create the cart on first use
        if 'cart' not in session:
            session['cart'] = []
        product_info = {}
        product_info['product_id'] = str(product_id)
        product_info['amount'] = 1
        print("going to append: ", product_info)
        print("before: ", session['cart'])
        # In-place change of a list already stored in the session
        session['cart'].append(product_info)
        print("after: ", session['cart'])
        return redirect(url_for('cart'))
Response:
*** Product id to add: 5
going to append: {'product_id': '5', 'amount': 1}
before: [{'amount': 1, 'product_id': '9'}]
after: [{'amount': 1, 'product_id': '9'}, {'product_id': '5', 'amount': 1}]
But after I redirect to "cart" and print it, it's unchanged. Is the session not passed along with the redirect?
[{'amount': 1, 'product_id': '9'}]
I forgot to set session.modified = True after appending to the list. From the docs:
https://flask.palletsprojects.com/en/1.1.x/api/#flask.session
Be advised that modifications on mutable structures are not picked up
automatically, in that situation you have to explicitly set the
attribute to True yourself.
Built-in types such as list, set and dict are mutable, so an in-place change to one of them (like append) has to be followed by session.modified = True.
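Why the append goes unnoticed can be illustrated without Flask. The sketch below uses a made-up TrackedDict class as a stand-in for the session object. It is not Flask's actual implementation, only a dict that flags itself as modified when a key is assigned. Appending to a list stored inside it never goes through that assignment, so the flag stays False.

```python
class TrackedDict(dict):
    """Hypothetical stand-in for a session: notices key assignment only."""

    modified = False

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.modified = True


session = TrackedDict()
session["cart"] = [{"product_id": "9", "amount": 1}]
session.modified = False  # pretend an earlier request already saved this cart

# In-place mutation of the nested list: __setitem__ is never called.
session["cart"].append({"product_id": "5", "amount": 1})
flag_after_append = session.modified

# Assigning the key again is seen, as is setting the flag by hand.
session["cart"] = session["cart"] + []
flag_after_assign = session.modified

print(flag_after_append, flag_after_assign)
```

In the real view, setting session.modified = True right after the append has the same effect as the assignment here.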

Django query — how to get list of dictionaries with M2M relation?

Let's say, I have this simple application with two models — Tag and SomeModel
class Tag(models.Model):
    text = ...
class SomeModel(models.Model):
    tags = models.ManyToManyField(Tag, related_name='tags')
And I want to get something like this from database:
[{'id': 1, 'tags': [1, 4, 8, 10]}, {'id': 6, 'tags': []}, {'id': 8, 'tags': [1, 2]}]
It is a list of dictionaries, one per SomeModel, each with the SomeModel's id and the ids of its tags.
What should the Django query look like? I tried this:
>>> SomeModel.objects.values('id', 'tags').filter(pk__in=[1,6,8])
[{'id': 1, 'tags': 1}, {'id': 1, 'tags': 4}, {'id': 1, 'tags': 8}, ...]
This is not what I want, so I tried something like this:
>>> SomeModel.objects.values_list('id', 'tags').filter(pk__in=[1,6,8])
[(1, 1), (1, 4), (1, 8), ...]
And my last try was:
>>> SomeModel.objects.values_list('id', 'tags', flat=True).filter(pk__in=[1,6,8])
...
TypeError: 'flat' is not valid when values_list is called with more than one field.
—
Maybe Django cannot do this, so the most similar result to what I want is:
[{'id': 1, 'tags': 1}, {'id': 1, 'tags': 4}, {'id': 1, 'tags': 8}, ...]
Is there any Python built-in method which transforms it into this?
[{'id': 1, 'tags': [1, 4, 8, 10]}, {'id': 6, 'tags': []}, {'id': 8, 'tags': [1, 2]}]
— EDIT:
If I write a method in SomeModel:
class SomeModel(models.Model):
    tags = models.ManyToManyField(Tag, related_name='tags')
    def get_tag_ids(self):
        aid = []
        # Collect the id of every related tag
        for a in self.tags.all():
            aid.append(a.id)
        return aid
And then call:
>>> sm = SomeModel.objects.only('id', 'tags').filter(pk__in=[1,6,8])
# Hit database
>>> for s in sm:
... s.get_tag_ids()
...
>>> # Hit database 3 times.
This does not work for me, because it accesses the database 4 times. I need just one access.
As ArgsKwargs mentioned in the comments, I wrote my own code, which packs the list:
>>> sm = SomeModel.objects.values('id', 'tags').filter(pk__in=[1,6,8])
>>> a = {}
>>> for s in sm:
...     if s['id'] not in a:
...         a[s['id']] = [s['tags']]
...     else:
...         a[s['id']].append(s['tags'])
...
The output of this code is exactly what I need, and it hits the database only once. But it is not very elegant, I don't like this code :)
Btw. is it better to use pk or id in queries? .values('id', 'tags') or .values('pk', 'tags')?
What about a custom method on the model that returns a list of all tags
class Tag(models.Model):
    text = ...
class SomeModel(models.Model):
    tags = models.ManyToManyField(Tag, related_name='tags')
    def all_tags(self):
        return self.tags.values_list('pk', flat=True)
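and then call it on each instance (values() only accepts field names and expressions, not model methods, so all_tags cannot be passed to it; note that this runs one extra query per instance):
[{'id': s.pk, 'tags': list(s.all_tags())} for s in SomeModel.objects.filter(pk__in=[1, 6, 8])]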
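If a single query is the priority, the flat rows from .values('id', 'tags') can be packed in plain Python. This is a minimal sketch with hard-coded sample rows instead of a real queryset. It assumes that an object without tags comes back as one row with tags set to None. The helper name group_tags is made up for illustration.

```python
from collections import defaultdict


def group_tags(rows):
    """Pack flat {'id', 'tags'} rows into one dict per id with a list of tag ids."""
    grouped = defaultdict(list)
    for row in rows:
        tags = grouped[row["id"]]  # creates the entry even when there are no tags
        if row["tags"] is not None:
            tags.append(row["tags"])
    # Dicts keep insertion order, so ids appear in the order they were first seen.
    return [{"id": pk, "tags": tags} for pk, tags in grouped.items()]


# Sample rows shaped like the output of .values('id', 'tags')
rows = [
    {"id": 1, "tags": 1},
    {"id": 1, "tags": 4},
    {"id": 1, "tags": 8},
    {"id": 1, "tags": 10},
    {"id": 6, "tags": None},
    {"id": 8, "tags": 1},
    {"id": 8, "tags": 2},
]
result = group_tags(rows)
print(result)
```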

Django database caching

The object user has a foreign key relationship to address. Is there a difference between samples 1 and 2? Does sample 1 run the query multiple times? Or is the address object cached?
# Sample 1
country = user.address.country
city = user.address.city
state = user.address.state
# Sample 2
address = user.address
country = address.country
city = address.city
state = address.state
The address object is indeed cached. You can see this if you print the contents of user.__dict__ before and after accessing user.address. For example:
>>> user.__dict__
{'date_joined': datetime.datetime(2010, 4, 1, 12, 31, 59),
'email': u'user#test.com',
'first_name': u'myfirstname',
'id': 1L,
'is_active': 1,
'is_staff': 1,
'is_superuser': 1,
'last_login': datetime.datetime(2010, 4, 1, 12, 31, 59),
'last_name': u'mylastname',
'password': u'sha1$...$...',
'username': u'myusername'}
>>> country = user.address.country
>>> user.__dict__
{'_address': <myapp.models.address object at 0xwherever>,
'email': u'user#test.com',
...etc}
So the user object gains a _address object which is used for subsequent lookups on the related object.
You can use select_related() when you first get the user to pre-populate this cache even before accessing address, so you only hit the database once.
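The caching behaviour can be imitated in plain Python to make it easier to see. The sketch below is not Django's implementation. CachedRelation, load_address and the queries list are made-up names, and the loader only pretends to hit a database. It shows the general idea: the first access loads the related object and stores it on the instance, and later accesses reuse it.

```python
from types import SimpleNamespace

queries = []  # records every simulated database hit


def load_address(user):
    """Hypothetical loader standing in for the query that fetches the address."""
    queries.append(user.address_id)
    return SimpleNamespace(country="US", city="Springfield", state="IL")


class CachedRelation:
    """Descriptor that loads a related object once and caches it on the instance."""

    def __init__(self, loader):
        self.loader = loader

    def __set_name__(self, owner, name):
        self.cache_name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        cache = instance.__dict__
        if self.cache_name not in cache:
            cache[self.cache_name] = self.loader(instance)
        return cache[self.cache_name]


class User:
    address = CachedRelation(load_address)

    def __init__(self, address_id):
        self.address_id = address_id


user = User(address_id=7)
# Same access pattern as sample 1: three attribute chains, one simulated query.
country = user.address.country
city = user.address.city
state = user.address.state
print(len(queries), sorted(user.__dict__))
```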