Group by column Django orm - django

I have a database table users that consists of 4 columns: id, name, country, state. I would like to run an SQL query like
SELECT country, state, name, COUNT(id) FROM users GROUP BY country, state, name
How do I accomplish this using the Django ORM?

You can construct a QuerySet with:
from django.db.models import Count
qs = Users.objects.values('country', 'state', 'name').annotate(
    count=Count('pk')
).order_by('country', 'state', 'name')
This will construct a queryset that is a collection of dictionaries, where each dictionary has four keys: country, state, name and count. The count is the total number of Users with the given name, state and country combination.
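To see the shape of the result without a Django project, the equivalent GROUP BY can be run against an in-memory SQLite database. This is only a sketch: it uses Python's sqlite3 module rather than the ORM, and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory database with the four columns from the question.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets each row be converted to a dict
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, country TEXT, state TEXT)"
)

# Hypothetical sample data: two users share the same (country, state, name).
conn.executemany(
    "INSERT INTO users (name, country, state) VALUES (?, ?, ?)",
    [("Ann", "US", "NY"), ("Ann", "US", "NY"), ("Bob", "US", "CA")],
)

# Same grouping and ordering as the queryset above.
rows = [
    dict(row)
    for row in conn.execute(
        'SELECT country, state, name, COUNT(id) AS "count" FROM users '
        "GROUP BY country, state, name ORDER BY country, state, name"
    )
]

for row in rows:
    print(row)
```

Each entry in rows is a dictionary with the keys country, state, name and count, which mirrors what iterating over the queryset would yield.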

Related

Django Conditional update based on Foreign key values / joined fields

I'm trying to do a conditional update based on the value of a field on a foreign key. Example:
Model Kid: id, parent (a foreign key to Parent), has_rich_parent
Model Parent: id, income
Say I have a queryset of Kid objects. I want to update each item's has_rich_parent based on the value of income on the Kid's parent, in one update. What I was trying to do is:
from django.db.models import Case, When
queryset_of_kids.update(
    has_rich_parent=Case(
        When(parent__income__gte=10, then=True),
        default=False
    )
)
But this gives me the error "Joined field references are not permitted in this query", which I understand to mean that joined fields (following foreign key relationships) aren't allowed in updates.
I'm wondering if there's any other way to accomplish the same thing, as in updating this queryset within one update call? My situation has a couple more fields that I'd like to verify instead of just income here so if I try to do filter then update, the number of calls will be linear to the number of arguments I'd like to filter/update.
Thanks in advance!
Here are the models that I assume you're using:
from django.db import models
class Kid(models.Model):
    parent = models.ForeignKey('Parent', on_delete=models.CASCADE)
    has_rich_parent = models.BooleanField(default=False)
class Parent(models.Model):
    income = models.IntegerField()
You can use a Subquery to update the has_rich_parent field.
The subquery filters on the primary key pk of the surrounding query using .filter(pk=OuterRef('pk')).
It uses a Q query object to obtain whether the parent income is >= 10.
from .models import Kid, Parent
from django.db.models import Q, Subquery, OuterRef
Kid.objects.update(has_rich_parent=Subquery(
    Kid.objects.filter(pk=OuterRef('pk'))
    .values_list(Q(parent__income__gte=10))))
That command produces the following SQL query:
UPDATE "more_kids_kid"
SET "has_rich_parent" = (
    SELECT (U1."income" >= 10) AS "q1"
    FROM "more_kids_kid" U0
    INNER JOIN "more_kids_parent" U1 ON (U0."parent_id" = U1."id")
    WHERE U0."id" = ("more_kids_kid"."id")
)
This query isn't as efficient as a SELECT-then-UPDATE query. However, your database may be able to optimize it.
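The generated statement can be tried outside Django. The sketch below runs the same kind of correlated-subquery UPDATE against an in-memory SQLite database using Python's sqlite3 module. The table names are shortened to kid and parent, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, income INTEGER);
    CREATE TABLE kid (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent(id),
        has_rich_parent BOOLEAN DEFAULT 0
    );
    -- Hypothetical data: parent 1 is below the threshold, parent 2 is above it.
    INSERT INTO parent (id, income) VALUES (1, 5), (2, 50);
    INSERT INTO kid (id, parent_id) VALUES (1, 1), (2, 2), (3, 2);
""")

# One UPDATE statement; the subquery is evaluated per kid row and
# compares the income of that kid's parent with the threshold.
conn.execute("""
    UPDATE kid
    SET has_rich_parent = (
        SELECT (U1.income >= 10)
        FROM kid U0
        INNER JOIN parent U1 ON (U0.parent_id = U1.id)
        WHERE U0.id = kid.id
    )
""")

# Map kid id -> stored flag (SQLite stores booleans as 0/1).
flags = dict(conn.execute("SELECT id, has_rich_parent FROM kid ORDER BY id"))
print(flags)
```

Every kid is updated in a single statement, and the flag follows the parent's income rather than anything stored on the kid row itself.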

Django 2.0 - order a queryset by a field on the prefetch-related attribute

My goal is to run a query on one model but sort the results based on a field in another model fetched via prefetch_related.
Suppose I have two models:
class ModelA(models.Model):
    ...some fields...
class ModelB(models.Model):
    ...some fields...
    model_a = models.ForeignKey(ModelA, db_column='id')
    year = models.IntegerField()
I have tried:
ModelA.objects.filter(...).prefetch_related(
    Prefetch(
        'modelb_set',
        queryset=ModelB.objects.filter().order_by('-year'),
        to_attr="modelb_date"
    )
).order_by('-modelb_date')
but this fails because modelb_date is not a field on ModelA, it's a list. What I want is to order the ModelA queryset according to the latest associated date field (from ModelB). That is, if instance One of ModelA has a modelb_date attribute = [x, y, z] where x.year = 2017 and instance Two of ModelA has a modelb_date attribute = [v, w] where v.year = 2018 then the query would order instance Two before One.
I am using Django 2.0, python 3.6, and Oracle 12c.
Can anybody help? Thanks!
If you want to order by some value of a related model, with a one-to-many relation, then you need some sort of way to first "fold" that related data: for example taking the minimum, maximum (and depending on the type of the data, sum, average, etc. might also be valid options).
For example, to sort the ModelA objects by the minimum year of their related ModelB items, we can combine .annotate(..) and .order_by(..):
from django.db.models import Min, Prefetch
ModelA.objects.filter(...).prefetch_related(
    Prefetch(
        'modelb_set',
        queryset=ModelB.objects.filter().order_by('-year'),
        to_attr="modelb_date"
    )
).annotate(
    first_year=Min('modelb__year')
).order_by('-first_year')
Here the ModelA objects are sorted by first_year in descending order, where first_year is the lowest year among the related ModelB items. To order by the latest year instead, as in your example, use Max in place of Min.
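The choice of aggregate changes the resulting order. As a rough illustration, the sketch below runs the same kind of join, group and order query directly in SQLite through Python's sqlite3 module instead of the ORM. The table names, the label column and the sample years are assumptions made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE model_a (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE model_b (
        id INTEGER PRIMARY KEY,
        model_a_id INTEGER REFERENCES model_a(id),
        year INTEGER
    );
    -- Hypothetical rows: 'One' has years 2017 and 2015, 'Two' has 2018 and 2014.
    INSERT INTO model_a (id, label) VALUES (1, 'One'), (2, 'Two');
    INSERT INTO model_b (model_a_id, year)
    VALUES (1, 2017), (1, 2015), (2, 2018), (2, 2014);
""")


def labels_ordered_by(aggregate):
    """Return model_a labels ordered by the folded year, highest first."""
    if aggregate not in ("MIN", "MAX"):
        raise ValueError("aggregate must be 'MIN' or 'MAX'")
    sql = f"""
        SELECT a.label, {aggregate}(b.year) AS folded_year
        FROM model_a a
        LEFT OUTER JOIN model_b b ON b.model_a_id = a.id
        GROUP BY a.id, a.label
        ORDER BY folded_year DESC
    """
    return [label for label, _ in conn.execute(sql)]


by_latest = labels_ordered_by("MAX")    # compares 2017 with 2018
by_earliest = labels_ordered_by("MIN")  # compares 2015 with 2014
print(by_latest, by_earliest)
```

With these rows, folding by the maximum puts Two first, while folding by the minimum puts One first, so the aggregate should match the ordering you actually want.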

Django filter by annotated field is too slow

I use DRF and I have a model Motocycle with more than 2000 objects in the DB. Each Motocycle has one brand. I want to search by full_name:
from django.db.models import Value
from django.db.models.functions import Concat
queryset = Motocycle.objects.prefetch_related(
    "brand"
).annotate(
    full_name=Concat(
        'brand__title',
        Value(' - '),
        'title',
    )
)
I want to filter by full_name, but the query runs very slowly:
(1.156) SELECT "mp_api_motocycle"."id"...
Without filtering with pagination:
(3.980) SELECT "mp_api_motocycle"."id"...
Is there any way to make this query faster?
Store full_name as a real column in the database and add an index to it.
Otherwise, the database does a full table scan, computing full_name for every row before it can filter on it.
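The difference can be seen in a query plan. The sketch below is not Django code: it uses Python's sqlite3 module with a simplified, assumed schema (tables brand and motocycle, a stored full_name column, and an index named motocycle_full_name_idx) and asks SQLite how it would execute each lookup. Other databases report plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE brand (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE motocycle (
        id INTEGER PRIMARY KEY,
        brand_id INTEGER REFERENCES brand(id),
        title TEXT,
        full_name TEXT  -- stored copy of the brand title, ' - ' and the title
    );
    CREATE INDEX motocycle_full_name_idx ON motocycle (full_name);
""")


def plan(sql):
    """Return SQLite's query plan for sql as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("Honda - CB500",))
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


# full_name computed on the fly: no index can be used for the comparison.
computed_plan = plan(
    "SELECT m.id FROM motocycle m JOIN brand b ON b.id = m.brand_id "
    "WHERE b.title || ' - ' || m.title = ?"
)

# full_name stored and indexed: the lookup can go through the index.
stored_plan = plan("SELECT id FROM motocycle WHERE full_name = ?")

print(computed_plan)
print(stored_plan)
```

The stored column has to be kept in sync whenever the brand title or the title changes. Also note that an ordinary index of this kind helps exact and prefix matches, but generally not substring searches such as LIKE '%text%'.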

Join two queries in Django ORM

I have a Person model which has a birthday. I would like to create a query that returns each person's information along with an additional field that tells how many people share that person's birthday. In SQL I would write it like this:
SELECT p.name, b.count FROM
persons AS p INNER JOIN
(SELECT birthday AS date, COUNT(*) AS count FROM persons GROUP BY birthday) AS b
ON p.birthday = b.date
With Django querysets I can do the inner select but I don't know how to do the inner join.
Seems tough to do with the ORM (though maybe possible with extra).
You could create a dict of counts by date (max 366 values, if ignoring year):
from django.db.models import Count
birthdate = lambda d: d.strftime("%m-%d")
# this runs the subquery in your SQL:
birthdays = Person.objects.values('birthday')
counts = birthdays.annotate(count=Count('birthday'))
# add up counts for dates that share a month and day but differ in year
counts_by_date = {}
for r in counts:
    key = birthdate(r['birthday'])
    counts_by_date[key] = counts_by_date.get(key, 0) + r['count']
for person in Person.objects.all():
    count = counts_by_date[birthdate(person.birthday)]
    print("%d people share your birthday!" % count)
I would add a model method to get the count. You can do it using the ORM like this:
from django.db import models
class Person(models.Model):
    birthdate = models.DateField()
    def shared_count(self):
        return Person.objects.filter(birthdate=self.birthdate).exclude(pk=self.id).count()
Then you can just access the count on the Person instance like this:
my_person = Person.objects.get(pk=12)
count = my_person.shared_count()
Or access it in a template like this:
{{ my_person.shared_count }}

Grouping by multiple columns in Django

I have a model with a few columns:
Model (id, country, city, name, age)
I need to group and sort by a few columns:
SELECT * FROM Model
WHERE country = 'USA'
GROUP BY id, city, age
ORDER BY id, city, age
and I'd like to query the same with Django ORM.
Model.objects.filter(country='USA').group_by('id', 'city', 'age').order_by('id', 'city', 'age')
But this call throws after group_by as it returns None.