Count objects in a queryset by value in a field (Django)

Imagine I have a model that looks something like the following:
class Car(models.Model):
    TYPE_CHOICES = [
        (1, 'Hatchback'),
        (2, 'Saloon'),
    ]
    type = models.CharField(choices=TYPE_CHOICES, ...)
    color = models.CharField()
    owner = models.ForeignKey(User, ...)
I want to count objects by specific values, say black saloons owned by Johns or white hatchbacks owned by Matts.
The best I have come up with so far is:
Car.objects.annotate(
black_saloons_owned_by_Johns=Count(
'type',
filter=(
Q(type=2) &
Q(owner__first_name='John')
)
),
white_hatchbacks_owned_by_Matts=Count(
'type',
filter=(
Q(type=1) &
Q(owner__first_name='Matt')
)
)
).aggregate(
aggregated_black_saloons_owned_by_Johns=Sum(
'black_saloons_owned_by_Johns'
),
aggregated_white_hatchbacks_owned_by_Matts=Sum(
'white_hatchbacks_owned_by_Matts'
)
)
Is there a better way to get the desired result? Thanks.
Update.
As I said, I need to perform multiple lookups in a single query. The query I originally posted had just one example, so I have updated it. I should have pointed that out explicitly. Sorry.

We can filter the queryset, and then use .count() [Django-doc]:
Car.objects.filter(type=2, owner__first_name='John').count()
Or, if you need to perform multiple lookups in a single query, you can count the Car objects directly with .aggregate(..):
Car.objects.aggregate(
total_john=Count(
'pk', filter=Q(type=2, owner__first_name='John')
),
total_matts=Count(
'pk', filter=Q(type=1, owner__first_name='Matt')
)
)
This returns a dictionary with two keys, 'total_john' and 'total_matts', each holding the number of matching Car objects.
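To make the idea concrete without a Django project, here is a minimal sketch of the same conditional counting in plain SQL through Python's sqlite3 module. The table layout (an owner table and a car table), the sample rows, and the CASE WHEN form are assumptions for illustration, not the exact SQL Django emits. Django's documentation notes that filter= on an aggregate is translated to either FILTER (WHERE ...) or CASE WHEN depending on the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owner (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE car (
        id INTEGER PRIMARY KEY,
        type INTEGER,
        owner_id INTEGER REFERENCES owner(id)
    );
    INSERT INTO owner VALUES (1, 'John'), (2, 'Matt'), (3, 'John');
    INSERT INTO car VALUES
        (1, 2, 1),  -- saloon, John
        (2, 2, 3),  -- saloon, another John
        (3, 1, 2),  -- hatchback, Matt
        (4, 1, 1),  -- hatchback, John (matches neither count)
        (5, 2, 2);  -- saloon, Matt (matches neither count)
""")

# One pass over the table, two independent conditional counts.
# COUNT ignores NULL, so rows failing a condition are not counted.
row = conn.execute("""
    SELECT
        COUNT(CASE WHEN car.type = 2 AND owner.first_name = 'John' THEN car.id END),
        COUNT(CASE WHEN car.type = 1 AND owner.first_name = 'Matt' THEN car.id END)
    FROM car
    JOIN owner ON owner.id = car.owner_id
""").fetchone()

result = {"total_john": row[0], "total_matts": row[1]}
print(result)
```

Both counts come back from a single query, which is the point of calling .aggregate(..) with several filtered Count expressions instead of issuing one .count() per condition.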

Related

Add subquery on annotate with the model field value from the same model in Django

I have a model as follows:
class Job(models.Model):
    id = models.UUIDField(default=uuid.uuid4, primary_key=True)
    name = models.CharField(max_length=255)
    parent_job = models.UUIDField(default=uuid.uuid4)
Now I need to get the name of the parent job in an annotation.
I have tried the following, without success:
(
Job.objects.filter()
.annotate(
par_job_name=Subquery(
Job.objects.filter(id=OuterRef("parent_job")).first().name
)
)
.values("id", "par_job_name")
)
(
Job.objects.filter()
.annotate(
par_job_name=Subquery(
Job.objects.filter(id=F("parent_job")).first().name
)
)
.values("id", "par_job_name")
)
How can I get the par_job_name here?
Note: I know that using a foreignkey to self might be a good way to model here but this is existing code and I have to work with this for now. So I have to implement the solution for the existing code.
After going through the docs, I came to know that I was using OuterRef the wrong way. I had to first create a queryset and then pass it to Subquery so that it can be resolved by the outer queryset.
parent_job_qs = Job.objects.filter(id=OuterRef("parent_job"))
(
Job.objects.filter()
.annotate(
par_job_name=Subquery(
parent_job_qs.values('name')
)
)
.values("id", "par_job_name")
)
Note: I will still wait for a better answer for some time. If I do not get any response, I will mark this as the correct answer.
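For readers who want to see what a correlated subquery of this kind does, the following is a rough sketch in plain SQL run through sqlite3. The job table, its short text ids, and the sample rows are made up for illustration, and the SQL is only an approximation of what the ORM would generate. The inner query refers to the row of the outer query, which is the role OuterRef("parent_job") plays in the queryset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job (id TEXT PRIMARY KEY, name TEXT, parent_job TEXT);
    INSERT INTO job VALUES
        ('a', 'root', 'zzz'),       -- parent_job points at no existing row
        ('b', 'child', 'a'),
        ('c', 'grandchild', 'b');
""")

# The inner SELECT is evaluated per outer row; job.parent_job is the
# reference to the outer query.
rows = conn.execute("""
    SELECT job.id,
           (SELECT parent.name
              FROM job AS parent
             WHERE parent.id = job.parent_job) AS par_job_name
      FROM job
     ORDER BY job.id
""").fetchall()

for job_id, par_job_name in rows:
    print(job_id, par_job_name)
```

A job whose parent_job matches no row gets NULL (None in Python) for par_job_name, which is also what an annotation built from a subquery yields when the subquery returns no row.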

Django: is a complex select query possible for this?

I have the following model used to store a bidirectional relationship between two users. The records are always inserted where the smaller user id is user_a while the larger user id is user_b.
Is there a way to retrieve all records belonging to a reference user and the correct value of the status (apply negative transformation to relationship_type if user_a) based on whether the reference user id is larger or smaller than the other user id?
Perhaps two separate queries, one where reference user = user_a and another where reference user = user_b, followed by a join?
class Relationship(models.Model):
    RELATIONSHIP_CHOICES = (
        (0, 'Blocked'),
        (1, 'Allowed'),
        (-2, 'Pending_A'),
        (2, 'Pending_B'),
        (-3, 'Blocked_A'),
        (3, 'Blocked_B'),
    )
    user_a = models.ForeignKey(CustomUser, on_delete=models.SET_NULL, related_name='user_a', null=True)
    user_b = models.ForeignKey(CustomUser, on_delete=models.SET_NULL, related_name='user_b', null=True)
    relationship_type = models.SmallIntegerField(choices=RELATIONSHIP_CHOICES, default=0)
A SQL query of what I'm trying to achieve:
(SELECT user_b as user_select, -relationship_type as type_select WHERE user_a='reference_user') UNION (SELECT user_a as user_select, relationship_type as type_select WHERE user_b='reference_user')
Given you have the id of the user user_id, you can filter with:
from django.db.models import Q
Relationship.objects.filter(Q(user_a_id=user_id) | Q(user_b_id=user_id))
If you have a CustomUser object user, it is almost the same:
from django.db.models import Q
Relationship.objects.filter(Q(user_a=user) | Q(user_b=user))
If you are looking to obtain Relationships with a given type, we can do the following:
from django.db.models import Q
rel_type = 2 # example rel_type
Relationship.objects.filter(
Q(user_a=user, relationship_type=rel_type) |
Q(user_b=user, relationship_type=-rel_type)
)
Here we thus retrieve Relationship objects with user_a the given user and relationship_type=2, or Relationship objects with user_b the given user, and relationship_type=-2.
We could annotate the querysets, and then take the union, like:
from django.db.models import F
qs1 = Relationship.objects.filter(
    user_a=user, relationship_type=rel_type
).annotate(
    user_select=F('user_b'),
    rel_type=F('relationship_type')
)
# second half: the given user is stored as user_b, with the negated type
qs2 = Relationship.objects.filter(
    user_b=user, relationship_type=-rel_type
).annotate(
    user_select=F('user_a'),
    rel_type=-F('relationship_type')
)
qs = qs1.union(qs2)
Although I do not know if that is a good idea: the annotations are not "writable" (so you can not update these).
It might be better to implement some sort of "proxy object" that can swap user_a and user_b, and negate the relationship type, and thus is able to act as if it is a real Relationship object.
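One possible shape for such a proxy object is sketched below in plain Python, without Django. The class names StoredRelationship and RelationshipView are hypothetical, and the sign convention (negate the type when the reference user is stored as user_a) is an assumption taken from the SQL in the question. Flipping it is a one-line change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredRelationship:
    """Stand-in for a Relationship row: user_a_id < user_b_id by convention."""
    user_a_id: int
    user_b_id: int
    relationship_type: int


@dataclass(frozen=True)
class RelationshipView:
    """Read-only view of a stored relationship from one user's side."""
    row: StoredRelationship
    reference_user_id: int

    def __post_init__(self):
        if self.reference_user_id not in (self.row.user_a_id, self.row.user_b_id):
            raise ValueError("reference user is not part of this relationship")

    @property
    def other_user_id(self) -> int:
        if self.reference_user_id == self.row.user_a_id:
            return self.row.user_b_id
        return self.row.user_a_id

    @property
    def relationship_type(self) -> int:
        # Assumption taken from the question's SQL: negate when the
        # reference user is stored as user_a, keep as-is for user_b.
        if self.reference_user_id == self.row.user_a_id:
            return -self.row.relationship_type
        return self.row.relationship_type


row = StoredRelationship(user_a_id=3, user_b_id=7, relationship_type=2)
seen_by_a = RelationshipView(row, reference_user_id=3)
seen_by_b = RelationshipView(row, reference_user_id=7)
print(seen_by_a.other_user_id, seen_by_a.relationship_type)
print(seen_by_b.other_user_id, seen_by_b.relationship_type)
```

Because the view only reads from the underlying row, updates would still go through the real Relationship object, which sidesteps the problem that annotations are not writable.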
As you said, the id in user_a is always smaller than the id in user_b. So if you query with user_b=user, you always get the relationships in which the reference user has the larger id, and with user_a=user those in which it has the smaller id. I think you can therefore use the following querysets:
user = CustomUser.objects.get(id=1)
user_a_references = Relationship.objects.filter(user_a=user)
user_b_references = Relationship.objects.filter(user_b=user)
all_relationships = user_a_references.union(user_b_references)

QuerySet Optimisations in Django

I was just wondering, I have the following two pseudo-related queries:
organisation = Organisation.objects.get(pk=org_id)
employees = Employee.objects.filter(organisation=organisation).filter(is_active=True)
Each Employee has a ForeignKey relationship with Organisation.
I was wondering if there is anything I can leverage to do the above in one Query in the native Django ORM?
Also, would:
employees = Employee.objects.filter(organisation__id=organisation.id).filter(is_active=True)
Be a quicker way to fetch employees?
For Willem's reference, employees is then used as:
# Before constructing **parameters, it is necessary to filter out any superfluous key, value pairs that do not correspond to model attributes:
if len(request.GET.getlist('gender[]')) > 0:
    parameters['gender__in'] = request.GET.getlist('gender[]')
employees = employees.filter(**parameters)
if len(request.GET.getlist('age_group[]')) > 0:
    parameters['age_group__in'] = request.GET.getlist('age_group[]')
employees = employees.filter(**parameters)
results = SurveyResult.objects.filter(
user__in=employees,
created_date__range=date_range,
).annotate(
date=TruncDate('created_date'),
).values(
'survey',
'date',
).annotate(
score=Sum('normalized_score'),
participants=Count('user'),
).order_by(
'survey',
'date',
)
I omitted this as it seemed like unnecessary complications to my original goal.
Also, would:
employees = Employee.objects.filter(organisation__id=organisation.id).filter(is_active=True)
Be a quicker way to fetch employees?
No, or perhaps marginally, since that is in essence what the Django ORM will do itself: it will simply obtain the primary key of the organisation and then make a query like the one you describe.
If you do not need the organisation itself, you can query with:
employees = Employee.objects.filter(organisation_id=org_pk, is_active=True)
Furthermore, you can for example use .select_related(..) [Django-doc] on the organisation to load the organisation data in the same query as the employees, although saving one extra query usually does not make much of a difference. Performance is more of an issue if it results in N+1 queries.
We can for example "piggyback" fetching the Organisation details with fetching the employees, like:
employees = list(
    Employee.objects.select_related('organisation').filter(
        organisation_id=org_pk, is_active=True
    )
)
if employees:  # at least one employee
    organisation = employees[0].organisation
But anyway, as said before, the difference between one and two queries is not that large. It is usually more of a problem if you have N+1 queries. It is a bit of a pity that Django/Python does not seem to have a Haxl [GitHub] equivalent, to enable fast retrieval of (remote) resources through algebraic analysis.
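The N+1 pattern mentioned above can be illustrated outside Django with a small sqlite3 sketch. The tables, sample rows, and the run() helper that logs each query are assumptions made for this example. The first half fetches the employees and then looks up the organisation once per employee, the second half gets the same information with a single JOIN, which is roughly what select_related achieves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organisation (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT,
        is_active INTEGER,
        organisation_id INTEGER REFERENCES organisation(id)
    );
    INSERT INTO organisation VALUES (1, 'Acme');
    INSERT INTO employee VALUES
        (1, 'Ann', 1, 1), (2, 'Bob', 1, 1), (3, 'Cy', 0, 1), (4, 'Di', 1, 1);
""")

query_log = []


def run(sql, params=()):
    """Execute a query and record it, so the round trips can be counted."""
    query_log.append(sql)
    return conn.execute(sql, params).fetchall()


# N+1 pattern: one query for the employees, then one more per employee.
employees = run(
    "SELECT id, name, organisation_id FROM employee "
    "WHERE organisation_id = ? AND is_active = 1", (1,)
)
for _id, _name, org_id in employees:
    run("SELECT name FROM organisation WHERE id = ?", (org_id,))
n_plus_one_queries = len(query_log)

# Joined pattern: the organisation columns arrive with each employee row.
query_log.clear()
joined = run(
    "SELECT employee.name, organisation.name FROM employee "
    "JOIN organisation ON organisation.id = employee.organisation_id "
    "WHERE employee.organisation_id = ? AND employee.is_active = 1 "
    "ORDER BY employee.id", (1,)
)
joined_queries = len(query_log)

print(n_plus_one_queries, joined_queries)
```

With three active employees the first approach issues four queries and the second issues one. The gap grows linearly with the number of rows, which is why N+1 matters far more than the difference between one and two queries.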
In case you are interested in the Employee survey results, you can query with:
results = SurveyResult.objects.filter(
user__organisation_id=org_pk,
created_date__range=date_range,
).annotate(
date=TruncDate('created_date'),
).values(
'survey',
'date',
).annotate(
score=Sum('normalized_score'),
participants=Count('user'),
).order_by(
'survey',
'date',
)
You can thus omit a separate querying of Employees if you do not need these anyway.
You can furthermore add the filters to your query, like:
emp_filter = {}
genders = request.GET.getlist('gender[]')
if genders:
emp_filter['user__gender__in'] = genders
age_groups = request.GET.getlist('age_group[]')
if age_groups:
emp_filter['user__age_group__in'] = age_groups
results = SurveyResult.objects.filter(
user__organisation_id=org_pk,
created_date__range=date_range,
**emp_filter
).annotate(
date=TruncDate('created_date'),
).values(
'survey',
'date',
).annotate(
score=Sum('normalized_score'),
participants=Count('user'),
).order_by(
'survey',
'date',
)
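The pattern of collecting optional lookups in a dictionary can be tried on its own, without a request object. The sketch below uses urllib.parse.parse_qs as a stand-in for request.GET.getlist, and build_employee_filter is a hypothetical helper name. The "user__" prefix mirrors the lookups used in the query above.

```python
from urllib.parse import parse_qs


def build_employee_filter(query_string, prefix="user__"):
    """Translate 'gender[]' / 'age_group[]' parameters into __in lookups."""
    params = parse_qs(query_string)
    lookups = {}
    for param, field in (("gender[]", "gender"), ("age_group[]", "age_group")):
        values = params.get(param, [])
        if values:  # skip parameters that were not sent at all
            lookups[f"{prefix}{field}__in"] = values
    return lookups


emp_filter = build_employee_filter("gender[]=F&gender[]=M&page=2")
print(emp_filter)
```

Only the whitelisted parameters end up in the dictionary, so unrelated keys such as page never reach .filter(**emp_filter).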
If you have a foreign key relation between organisation and employees, you can get the employees using select_related like this:
employees = Employee.objects.select_related('organisation').filter(is_active=True)
OR
organisation = Organisation.objects.get(pk=org_id)
employees = organisation.employee_set.all()  # <lowercased employee model name>_set.all()

Filter based on children_set count in Django

I am filtering a set of children as follows:
ParentModel.objects.all().prefetch_related(
Prefetch(
lookup='childrenmodel_set',
queryset=ChildrenModel.objects.exclude(is_active=False)
)
)
Now each ParentModel object can have a childrenmodel_set but some of these querysets are empty.
How can I exclude those ParentModel objects that have no children?
I've thought of:
ParentModel.objects.all().prefetch_related(
Prefetch(
lookup='childrenmodel_set',
queryset=ChildrenModel.objects.exclude(is_active=False)
)
).exclude(childrenmodel_set=None)
or
ParentModel.objects.all().prefetch_related(
Prefetch(
lookup='childrenmodel_set',
queryset=ChildrenModel.objects.exclude(is_active=False)
).aggregate(num_objects=Count(id))
).exclude(num_objects=0)
or
ParentModel.objects.all().prefetch_related(
Prefetch(
lookup='childrenmodel_set',
queryset=ChildrenModel.objects.exclude(is_active=False)
)
).annotate(childrenset_size=Count(childrenset)).exclude(childrenset_size=0)
I could of course check this in the template but I want to do it on database level.
Edit:
Now my code is
self.model.objects.prefetch_related(
Prefetch(
lookup='periods',
queryset=Period.objects.exclude(is_active=False)
)
).exclude(periods__isnull=True)
It removes those objects that have no periods. But if an object has only inactive periods, it is still present in the queryset. How can I make the exclude take the prefetch filter into account?
Edit 2
My models
class Article:
    name = CharField(max_length=100)
class Period:
    article = ForeignKey(Article)
    is_active = BooleanField(default=True)
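You can exclude the parents without children by testing the reverse relation for null. In a filter the reverse relation is addressed by its query name (the lowercased model name by default, or the related_name if one is set), not by the childrenmodel_set accessor:
ParentModel.objects.exclude(childrenmodel__isnull=True)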
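The follow-up in the question (keep only parents with at least one active child) is a different condition from "has any child at all". The sqlite3 sketch below shows that condition in plain SQL with EXISTS. The article and period tables and their rows are made up to mirror the models in the question. In the ORM, and assuming the reverse relation is named periods as in the question's edit, something along the lines of filter(periods__is_active=True).distinct() would express the same restriction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE period (
        id INTEGER PRIMARY KEY,
        article_id INTEGER REFERENCES article(id),
        is_active INTEGER
    );
    INSERT INTO article VALUES
        (1, 'has active'), (2, 'only inactive'), (3, 'no periods');
    INSERT INTO period VALUES (1, 1, 1), (2, 1, 0), (3, 2, 0);
""")

# Keep an article only if at least one active period points at it.
names = [
    name
    for (name,) in conn.execute("""
        SELECT article.name
          FROM article
         WHERE EXISTS (
                   SELECT 1 FROM period
                    WHERE period.article_id = article.id
                      AND period.is_active = 1
               )
         ORDER BY article.id
    """)
]
print(names)
```

The prefetch itself never narrows the outer queryset, because it runs as a separate query after the parents have been selected. The restriction therefore has to be stated on the parent query as well.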

How do I set a Many-to-Many relation for predefined choices in Django models?

I have a WorkOrder class that has predefined work order types:
class WorkOrder( models.Model ) :
    WORK_TYPE_CHOICES = (
        ( 'hc', 'Heating and cooling' ),
        ( 'el', 'Electrical' ),
        ( 'pl', 'Plumbing' ),
        ( 'ap', 'Appliances' ),
        ( 'pe', 'Pests' ),
        ( 'ex', 'Exterior' ),
        ( 'in', 'Interior' ),
        ( 'ot', 'Others' ),
    )
    work_type = models.CharField( max_length = 2, choices = WORK_TYPE_CHOICES )
    vendor = models.ForeignKey( Vendor, null = True, blank = True )
Therefore each order must have one work order type. Later down the road, a vendor can also be assigned to a work order.
I want the Vendor class to have a M2M relationship to the same work order choices in the WorkOrder class. In other words, each vendor is able to do one or many work types. For example, Bob's Plumbing can only do "Plumbing", whereas Solid Home Repair can do "Electrical", "Plumbing", "Exterior", and "Interior".
I understand I can create another table called WorkType and use foreign keys from WorkOrder and a M2M from Vendor, but since I feel I won't be changing the work type choices, I would rather have them predefined in models.py.
Also if I can predefine it in models.py, then I don't have to pre-populate the table WorkType during deployments and upgrades.
Some options for you:
create a model for work_type_choices, instantiate the records (hc, el, etc), then use a manytomany field, or
create a charfield and save CSV values to it (eg: "hc, el"), splitting/joining the value into its elements as required, or
encapsulate the above charfield and functions into a custom field and use that, or
leverage someone else's snippet, eg:
http://djangosnippets.org/snippets/1200/
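As a minimal sketch of the second option (a comma-separated value in a single CharField), the splitting and joining could look like the plain Python below. The helper names pack_work_types and unpack_work_types are hypothetical, and the choice to sort, de-duplicate, and join without spaces is an assumption made so that the stored string is stable. The same two functions are what a custom field would wrap.

```python
WORK_TYPE_CHOICES = (
    ("hc", "Heating and cooling"),
    ("el", "Electrical"),
    ("pl", "Plumbing"),
    ("ap", "Appliances"),
    ("pe", "Pests"),
    ("ex", "Exterior"),
    ("in", "Interior"),
    ("ot", "Others"),
)
VALID_CODES = {code for code, _label in WORK_TYPE_CHOICES}


def pack_work_types(codes):
    """Join work type codes into the string stored in the CharField."""
    unknown = set(codes) - VALID_CODES
    if unknown:
        raise ValueError(f"unknown work type codes: {sorted(unknown)}")
    # Sort and de-duplicate so equal sets always produce the same string.
    return ",".join(sorted(set(codes)))


def unpack_work_types(value):
    """Split the stored string back into a list of codes."""
    return [code for code in value.split(",") if code]


stored = pack_work_types(["pl", "el", "ex", "in"])
labels = [dict(WORK_TYPE_CHOICES)[c] for c in unpack_work_types(stored)]
print(stored, labels)
```

The trade-off of this approach is that the database cannot index or join on individual work types, so finding every vendor that handles a given type means a substring match on the stored string rather than a relational lookup.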