Django group by a pair of values

I want to group by a pair of values in Django.
Suppose I have the following model:
from django.db import models
class Vehicle(models.Model):
    vehicle_id = models.PositiveIntegerField()
    version_id = models.PositiveIntegerField(default=1)
    name = models.CharField(max_length=8096)
    description = models.CharField(max_length=8096)
vehicle_id is not the primary key, since the table can contain multiple rows with the same vehicle_id but different version_id values.
Now I want to get the latest version of every vehicle:
select * from (
select vehicle_id, MAX(version_id) as MaxVersion from Vehicle group by vehicle_id
) s1
JOIN Vehicle s2
on s2.vehicle_id = s1.vehicle_id AND s2.version_id=s1.MaxVersion;
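To check what this SQL returns, here is a stdlib sqlite3 sketch with invented sample rows. It selects three columns instead of * only so the output is easy to compare:

```python
import sqlite3

# In-memory table mirroring the Vehicle model; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Vehicle (id INTEGER PRIMARY KEY, vehicle_id INTEGER, "
    "version_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO Vehicle (vehicle_id, version_id, name) VALUES (?, ?, ?)",
    [(1, 1, "car v1"), (1, 2, "car v2"), (2, 1, "bike v1"),
     (2, 3, "bike v3"), (3, 1, "van v1")],
)

# Derive the max version per vehicle_id (s1), then join back to the table (s2)
# to pick up the full row that carries that version.
LATEST_SQL = """
SELECT s2.vehicle_id, s2.version_id, s2.name
FROM (
    SELECT vehicle_id, MAX(version_id) AS MaxVersion
    FROM Vehicle GROUP BY vehicle_id
) s1
JOIN Vehicle s2
  ON s2.vehicle_id = s1.vehicle_id AND s2.version_id = s1.MaxVersion
ORDER BY s2.vehicle_id
"""
latest = conn.execute(LATEST_SQL).fetchall()
print(latest)
```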
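I tried to express this in the Django ORM as follows (inside a custom QuerySet method, so self is the queryset):
from django.db.models import Max, Q
feed_max_version = (
    self
    .values("vehicle_id")
    .annotate(max_version=Max("version_id"))
    .order_by("vehicle_id")
)
q_statement = Q()
# One OR branch per vehicle: (vehicle_id = X AND version_id = max version of X)
for pair in feed_max_version:
    q_statement |= Q(vehicle_id__exact=pair["vehicle_id"]) & Q(
        version_id=pair["max_version"]
    )
return self.filter(q_statement)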
But this is inefficient and takes a long time to load: it runs one query to find the maximum versions and then a second query with one OR branch per vehicle. I am not keen on raw SQL, since I would not be able to chain any more queryset methods onto it.
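A single-query alternative is a correlated subquery, the pattern that the Subquery/OuterRef answers further down build. In the ORM it would presumably be written as Vehicle.objects.filter(version_id=Subquery(Vehicle.objects.filter(vehicle_id=OuterRef('vehicle_id')).order_by('-version_id').values('version_id')[:1])), which stays an ordinary queryset, so further methods can be chained; that ORM line is an untested suggestion. The sqlite3 sketch below, with invented rows, only checks the SQL shape and shows it returns the same rows as the join above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vehicle (id INTEGER PRIMARY KEY, vehicle_id INTEGER, "
    "version_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO vehicle (vehicle_id, version_id, name) VALUES (?, ?, ?)",
    [(1, 1, "car v1"), (1, 2, "car v2"), (2, 1, "bike v1"),
     (2, 3, "bike v3"), (3, 1, "van v1")],
)

# Keep a row only if its version_id equals the highest version_id among rows
# sharing its vehicle_id. The inner SELECT refers to the outer row
# (v.vehicle_id), which is what OuterRef expresses in the ORM.
CORRELATED_SQL = """
SELECT v.vehicle_id, v.version_id, v.name
FROM vehicle v
WHERE v.version_id = (
    SELECT u.version_id FROM vehicle u
    WHERE u.vehicle_id = v.vehicle_id
    ORDER BY u.version_id DESC
    LIMIT 1
)
ORDER BY v.vehicle_id
"""
latest = conn.execute(CORRELATED_SQL).fetchall()
print(latest)
```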

Related

How to filter on a foreign key that is grouped?

Model:
from django.db import models
class Person(models.Model):
    name = models.CharField(max_length=100)
class Result(models.Model):
    person = models.ForeignKey(Person, on_delete=models.CASCADE)
    outcome = models.IntegerField()
    time = models.DateTimeField()
SQL:
select * from person as p
inner join (
select person_id, max(time) as max_time, outcome from result
group by person_id
) r on p.id = r.person_id
where r.outcome in (2, 3)
I want to get all person records whose last result outcome was either 2 or 3. I added the raw SQL above to explain further.
I looked at using a subquery to filter person records that have a matching result id:
sub_query = Result.objects.values("person_id").annotate(max_time=Max("time"))
However, using values() strips out the other fields.
Ideally I'd be able to do this in a single Person queryset, but I don't think that is possible.
The query below keeps only results whose outcome is 2 or 3, sorts them by time in descending order (latest first), and finally returns the person for each of them:
from django.db.models import Q
results = Result.objects.filter(Q(outcome=3) | Q(outcome=2)).order_by('-time').values('person')
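Note that this query filters on outcome before looking for the latest result, so a person whose latest result is, say, 5 still matches if an earlier result was 2. A small pure-Python illustration with made-up (person_id, outcome, time) tuples:

```python
from datetime import datetime

# Hypothetical (person_id, outcome, time) rows standing in for Result records.
results = [
    (1, 2, datetime(2023, 1, 1)),
    (1, 5, datetime(2023, 2, 1)),  # person 1's latest outcome is 5
    (2, 1, datetime(2023, 1, 1)),
    (2, 3, datetime(2023, 3, 1)),  # person 2's latest outcome is 3
]

# What the query above selects: anyone with *some* result of 2 or 3.
any_match = {pid for pid, outcome, _ in results if outcome in (2, 3)}

# What the question asks for: only people whose *latest* result is 2 or 3.
latest = {}
for pid, outcome, time in results:
    if pid not in latest or time > latest[pid][1]:
        latest[pid] = (outcome, time)
latest_match = {pid for pid, (outcome, _) in latest.items() if outcome in (2, 3)}

print(sorted(any_match), sorted(latest_match))
```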
As a person may have multiple result records and I only want to check the last one, a subquery was the only way I could find to do this:
from django.db.models import Max, OuterRef, Subquery
last_result = Subquery(
    Result.objects.filter(person_id=OuterRef("pk")).order_by("-time").values("outcome")[:1]
)
people = (
    Person.objects.all()
    .annotate(max_time=Max("result__time"), current_result=last_result)
    .filter(current_result__in=[2, 3])
)
First I create a subquery that returns the outcome of the latest result record. Then I add it as an annotation on the people query, so that I can filter on it for results of 2 or 3.
This way it only returns person records whose current result is 2 or 3.
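The Subquery annotation amounts to a scalar subquery evaluated once per person. A rough sqlite3 sketch of that SQL with invented rows; the SQL Django actually emits will differ in aliases and also repeats the subquery in the SELECT list:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE result (
    id INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(id),
    outcome INTEGER,
    time TEXT
);
INSERT INTO person (id, name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO result (person_id, outcome, time) VALUES
    (1, 2, '2023-01-01'), (1, 5, '2023-02-01'),
    (2, 1, '2023-01-01'), (2, 3, '2023-03-01'),
    (3, 2, '2023-04-01');
""")

# For each person, pick the outcome of their most recent result (ISO date
# strings sort chronologically), then keep the person if it is 2 or 3.
SQL = """
SELECT p.name
FROM person p
WHERE (
    SELECT r.outcome FROM result r
    WHERE r.person_id = p.id
    ORDER BY r.time DESC
    LIMIT 1
) IN (2, 3)
ORDER BY p.id
"""
names = [row[0] for row in conn.execute(SQL)]
print(names)
```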

How to create an annotation in Django that references two related models

I'm trying to add an annotation to a QuerySet that is True/False when the value of a field on one related object is less than the value of a field on a different related object.
Here are some models for an example:
from django.db import models
class RobotManager(models.Manager):
    def get_queryset(self):
        queryset = super(RobotManager, self).get_queryset()
        queryset = queryset.annotate(canteen_empty=...)  # UNKNOWN CODE: what goes here?
        return queryset
class Robot(models.Model):
    # Has some other unrelated stuff
    objects = RobotManager()
class CanteenLevel(models.Model):
    time = models.DateTimeField()
    robot = models.ForeignKey("SomeApp.Robot", on_delete=models.CASCADE)
    gallons = models.IntegerField()
class RobotConfiguration(models.Model):
    time = models.DateTimeField()
    robot = models.ForeignKey("SomeApp.Robot", on_delete=models.CASCADE)
    canteen_empty_level = models.IntegerField()
With the models above, whenever a robot's configuration or canteen level changes, we create a new record and keep the old ones as history.
What I would like to do is add an annotation to a Robot QuerySet that states whether the robot's canteen is considered empty (the robot's latest CanteenLevel.gallons is less than its latest RobotConfiguration.canteen_empty_level).
The aim is to allow for a statement like this using the annotation in the QuerySet:
bad_robots = Robot.objects.filter(canteen_empty=True)
I tried something like this in the annotation:
canteen_empty=ExpressionWrapper(
    CanteenLevel.objects.filter(robot=OuterRef('pk')).order_by('-time').values('gallons')[:1]
    <= RobotConfiguration.objects.filter(robot=OuterRef('robot')).order_by('-time').values('canteen_empty_level')[:1],
    output_field=models.BooleanField(),
)
But the <= operator isn't supported between querysets.
I also tried this:
canteen_empty=Exists(
    CanteenLevel.objects.filter(robot=OuterRef('pk')).order_by('-time').values('gallons')[:1]
    .filter(gallons__lte=Subquery(
        RobotConfiguration.objects.filter(robot=OuterRef('robot')).order_by('-time').values('canteen_empty_level')[:1]
    ))
)
But you can't filter after taking a slice of a QuerySet.
Any help would be appreciated!
We can make two annotations here:
from django.db.models import Subquery, OuterRef
latest_gallons = Subquery(CanteenLevel.objects.filter(
robot=OuterRef('pk')
).order_by('-time').values('gallons')[:1])
latest_canteen = Subquery(RobotConfiguration.objects.filter(
robot=OuterRef('pk')
).order_by('-time').values('canteen_empty_level')[:1])
Then we can annotate the Robot objects with these and filter:
from django.db.models import F
Robot.objects.annotate(
latest_gallons=latest_gallons,
latest_canteen=latest_canteen
).filter(latest_gallons__lte=F('latest_canteen'))
This will construct a query that looks like:
SELECT robot.*,
(SELECT U0.gallons
FROM canteenlevel U0
WHERE U0.robot_id = robot.id
ORDER BY U0.time DESC
LIMIT 1) AS latest_gallons,
(SELECT U0.canteen_empty_level
FROM robotconfiguration U0
WHERE U0.robot_id = robot.id
ORDER BY U0.time DESC
LIMIT 1) AS latest_canteen
FROM robot
WHERE
(SELECT U0.gallons
FROM canteenlevel U0
WHERE U0.robot_id = robot.id
ORDER BY U0.time DESC
LIMIT 1
) <= (
SELECT U0.canteen_empty_level
FROM robotconfiguration U0
WHERE U0.robot_id = robot.id
ORDER BY U0.time DESC
LIMIT 1
)
Note, however, that if a Robot has no related CanteenLevel or no related RobotConfiguration (or neither), that Robot will not be included in the queryset: the corresponding subquery returns NULL, and a comparison with NULL is never true.
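The NULL behaviour behind this note can be checked with sqlite3 and invented rows: robot 2 has no canteen level, so its gallons subquery is NULL and the comparison evaluates to NULL rather than true. The second query uses a CASE expression instead, which yields an explicit canteen_empty flag (false for robot 2) rather than dropping the row. In the ORM this would presumably be a Case(When(latest_gallons__lte=F('latest_canteen'), then=Value(True)), default=Value(False), output_field=BooleanField()) annotation added after the two Subquery annotations, allowing filter(canteen_empty=True); that ORM form is an untested suggestion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE robot (id INTEGER PRIMARY KEY);
CREATE TABLE canteenlevel (robot_id INTEGER, time TEXT, gallons INTEGER);
CREATE TABLE robotconfiguration (robot_id INTEGER, time TEXT, canteen_empty_level INTEGER);
INSERT INTO robot (id) VALUES (1), (2), (3);
-- Robot 1: latest level 3 gallons, threshold 5 -> empty.
INSERT INTO canteenlevel VALUES (1, '2023-01-01', 10), (1, '2023-02-01', 3);
INSERT INTO robotconfiguration VALUES (1, '2023-01-01', 5);
-- Robot 2: has a configuration but no canteenlevel rows at all.
INSERT INTO robotconfiguration VALUES (2, '2023-01-01', 5);
-- Robot 3: latest level 8, threshold later raised to 4 -> not empty.
INSERT INTO canteenlevel VALUES (3, '2023-01-01', 8);
INSERT INTO robotconfiguration VALUES (3, '2023-01-01', 2), (3, '2023-03-01', 4);
""")

# Scalar subqueries for the latest values, as in the generated SQL above.
LATEST = """
    SELECT r.id,
        (SELECT c.gallons FROM canteenlevel c WHERE c.robot_id = r.id
         ORDER BY c.time DESC LIMIT 1) AS latest_gallons,
        (SELECT g.canteen_empty_level FROM robotconfiguration g WHERE g.robot_id = r.id
         ORDER BY g.time DESC LIMIT 1) AS latest_canteen
    FROM robot r
"""

# Comparing NULL with a number yields NULL, not true or false.
print(conn.execute("SELECT NULL <= 5").fetchone())

# Filtering: robot 2's latest_gallons is NULL, so its row is dropped.
filtered = [row[0] for row in conn.execute(
    f"SELECT id FROM ({LATEST}) WHERE latest_gallons <= latest_canteen ORDER BY id"
)]

# Flagging: CASE turns the NULL comparison into an explicit 0 (false).
flags = conn.execute(
    f"SELECT id, CASE WHEN latest_gallons <= latest_canteen THEN 1 ELSE 0 END "
    f"FROM ({LATEST}) ORDER BY id"
).fetchall()

print(filtered)  # robots whose canteen counts as empty
print(flags)     # (robot id, canteen_empty flag) for every robot
```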

Django queryset get max ids for a filter

I want to get a list of max ids for a filter I have in Django
class Foo(models.Model):
    name = models.CharField()
    poo = models.CharField()
# latest_by_id() does not exist; it stands for the kind of call I am looking for:
Foo.objects.filter(name__in=['foo','koo','too']).latest_by_id()
The end result should be a queryset containing only the latest object (by id) for each name. How can I do that in Django?
Edit: I want multiple objects in the end result, not just one object.
Edit 1: Added __in. Again, I need only the latest (and therefore distinct) object for each name.
Something like this:
my_id_list = [Foo.objects.filter(name=name).latest('id').id for name in ['foo','koo','too']]
Foo.objects.filter(id__in=my_id_list)
The above works. But I want a more concise way of doing it. Is it possible to do this in a single query/filter annotate combination?
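You can try:
from django.db.models import Max
qs = Foo.objects.filter(name__in=['foo', 'koo', 'too'])
# Group by name and take the max pk (i.e. the last object) per name;
# without values('name') the Max would be computed per row, returning every pk
max_pks = qs.values('name').annotate(mpk=Max('pk')).order_by().values_list('mpk', flat=True)
# Then filter your queryset down to those pks
result = qs.filter(pk__in=max_pks)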
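Grouping only happens when values('name') comes before annotate(); otherwise Max('pk') is computed per row and every pk comes back. The grouped form roughly becomes a MAX(id) ... GROUP BY name subquery inside the outer pk__in filter. A sqlite3 sketch of that SQL shape with invented rows (Django's actual SQL will use its own aliases and repeat the outer name filter):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT, poo TEXT)")
conn.executemany(
    "INSERT INTO foo (id, name, poo) VALUES (?, ?, ?)",
    [(1, "foo", "a"), (2, "koo", "b"), (3, "foo", "c"),
     (4, "too", "d"), (5, "koo", "e"), (6, "other", "f")],
)

# Inner query: one MAX(id) per name among the wanted names.
# Outer query: fetch the full rows for exactly those ids.
SQL = """
SELECT id, name, poo FROM foo
WHERE id IN (
    SELECT MAX(id) FROM foo
    WHERE name IN ('foo', 'koo', 'too')
    GROUP BY name
)
ORDER BY id
"""
rows = conn.execute(SQL).fetchall()
print(rows)
```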
If you are using PostgreSQL, you can do the following:
Foo.objects.order_by('name', '-id').distinct('name')
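On PostgreSQL, DISTINCT ON (name) keeps the first row of each group of equal name values in the order given by ORDER BY, which is why the ordering has to start with name and then put the highest id first. The same selection rule in plain Python, as an illustration only with hypothetical rows:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (id, name) rows standing in for Foo records.
rows = [(1, "foo"), (2, "koo"), (3, "foo"), (4, "too"), (5, "koo")]

# ORDER BY name, id DESC ...
ordered = sorted(rows, key=lambda r: (r[1], -r[0]))
# ... then DISTINCT ON (name): keep the first row of each name group,
# which is the row with the highest id for that name.
latest = [next(group) for _, group in groupby(ordered, key=itemgetter(1))]
print(latest)
```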
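MySQL is more complicated, since it lacks a DISTINCT ON clause. A plain SELECT * ... GROUP BY name is not enough: MySQL either rejects it (when ONLY_FULL_GROUP_BY is enabled, the default in MySQL 5.7 and later) or returns an arbitrary row per name, which is not necessarily the one with the highest id, and ORDER BY does not change which row is picked. Here is a raw query that joins back on the maximum id per name instead:
Foo.objects.raw("""
    SELECT `foo`.*
    FROM `foo`
    JOIN (
        SELECT `name`, MAX(`id`) AS `max_id`
        FROM `foo`
        GROUP BY `name`
    ) AS `latest` ON `foo`.`id` = `latest`.`max_id`
    ORDER BY `foo`.`name` ASC
""")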

django sub query filter using value from field in parent query

I have the following models:
class Domain(models.Model):
    name = models.CharField(...)
    plan = models.ForeignKey(Plan, ....)
class Plan(models.Model):
    name = models.CharField(...)
    num_ex_accounts = models.IntegerField(...)
class PlanDetails(models.Model):
    accounts = models.IntegerField()
    plan = models.ForeignKey(Plan, ....)
class Mailbox(models.Model):
    domain = models.ForeignKey(Domain, ...)
Every domain has a plan, and every plan has N plan details whose accounts values determine how many mailboxes can be created for a domain. I want a queryset of the domains whose mailbox count has reached or exceeded the total accounts value. In raw SQL this looks like:
SELECT domain
FROM domain, plan
WHERE plan.id = domain.plan_id
AND (
SELECT SUM(accounts)
FROM plandetails WHERE plandetails.plan_id=plan.id
)
<=
(
SELECT COUNT(*)
FROM mailbox WHERE mailbox.domain_id=domain.id
)
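For reference, the question's SQL can be run against sqlite3 with invented rows (selecting domain.name rather than domain). Each correlated subquery aggregates over its own table, so the plan-details sum and the mailbox count cannot inflate each other:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plan (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE domain (id INTEGER PRIMARY KEY, name TEXT, plan_id INTEGER);
CREATE TABLE plandetails (id INTEGER PRIMARY KEY, accounts INTEGER, plan_id INTEGER);
CREATE TABLE mailbox (id INTEGER PRIMARY KEY, domain_id INTEGER);
INSERT INTO plan VALUES (1, 'small'), (2, 'big');
INSERT INTO plandetails (accounts, plan_id) VALUES (1, 1), (1, 1), (10, 2);
INSERT INTO domain VALUES (1, 'a.example', 1), (2, 'b.example', 2);
-- a.example: 2 allowed accounts, 3 mailboxes; b.example: 10 allowed, 1 mailbox.
INSERT INTO mailbox (domain_id) VALUES (1), (1), (1), (2);
""")

# Domains where the plan's total accounts <= number of existing mailboxes.
SQL = """
SELECT domain.name
FROM domain, plan
WHERE plan.id = domain.plan_id
AND (SELECT SUM(accounts) FROM plandetails WHERE plandetails.plan_id = plan.id)
    <= (SELECT COUNT(*) FROM mailbox WHERE mailbox.domain_id = domain.id)
"""
over_limit = [row[0] for row in conn.execute(SQL)]
print(over_limit)
```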
I tried something like this in Django:
domains = Domain.objects.filter(
    Q(
        PlanDetails.objects.filter(plan=Domain.plan).aggregate(Sum('accounts')) <=
        Mailbox.objects.filter(domain=Domain).count()
    )
)
But it doesn't work: it throws an error about Domain.plan. Is there a way to reference that field's value from the parent query in the subquery? Is this queryset valid, or is there another (better) approach? Or should I simply use raw SQL? What is the best option in this case?
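If you are using Django 1.8 or higher, try this:
from django.db.models import Count, F, Sum
Domain.objects.annotate(
    account_sum=Sum('plan__plandetails__accounts'),
    mailbox_count=Count('mailbox'),
).filter(
    account_sum__lte=F('mailbox_count')
)
In query lookups, reverse relations are named after the related model in lowercase (plandetails, mailbox), not after the plandetails_set and mailbox_set attributes used on instances. If you set a related_name (or related_query_name) on the ForeignKey fields to Plan and Domain, use that name instead. Be aware that combining two aggregations over different multi-valued relations in one annotate() joins both tables in the same query, so the sum and the count can both be inflated; the Django aggregation documentation warns about this, and Subquery-based annotations avoid it.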
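A sqlite3 sketch, with invented rows, of how two aggregates over different reverse relations inflate each other when both relations are joined in the same query. The LEFT JOIN chain is only roughly what the ORM emits for this kind of annotate() call:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plan (id INTEGER PRIMARY KEY);
CREATE TABLE domain (id INTEGER PRIMARY KEY, plan_id INTEGER);
CREATE TABLE plandetails (id INTEGER PRIMARY KEY, accounts INTEGER, plan_id INTEGER);
CREATE TABLE mailbox (id INTEGER PRIMARY KEY, domain_id INTEGER);
INSERT INTO plan VALUES (1);
INSERT INTO plandetails (accounts, plan_id) VALUES (1, 1), (1, 1);
INSERT INTO domain VALUES (1, 1);
INSERT INTO mailbox (domain_id) VALUES (1), (1), (1);
""")

# Both aggregates over one chain of LEFT JOINs: every plandetails row is
# paired with every mailbox row, so the grouped input has 2 x 3 = 6 rows.
JOINED = """
SELECT d.id, SUM(pd.accounts), COUNT(m.id)
FROM domain d
LEFT JOIN plan p ON p.id = d.plan_id
LEFT JOIN plandetails pd ON pd.plan_id = p.id
LEFT JOIN mailbox m ON m.domain_id = d.id
GROUP BY d.id
"""
joined = conn.execute(JOINED).fetchone()

# Independent subqueries keep the two totals apart.
SEPARATE = """
SELECT d.id,
    (SELECT SUM(accounts) FROM plandetails WHERE plan_id = d.plan_id),
    (SELECT COUNT(*) FROM mailbox WHERE domain_id = d.id)
FROM domain d
"""
separate = conn.execute(SEPARATE).fetchone()

print(joined, separate)
```

The second query computes each total in its own subquery, which is the behaviour Subquery annotations give in the ORM.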
UPDATE
For very complex queries, it might be better to run the actual SQL using a database cursor. The Django ORM is usually good enough, but depending on your needs it may not be. See the Django documentation on performing raw SQL queries.

Django ORM: Apply ordering to raw query sets

Here is my raw queryset:
ob = Shop.objects.raw('''SELECT * from shops GROUP BY
    (duplicate_field_name) having COUNT(*) = 1 ORDER BY some_field''')
listorder = ["check_in", "check_out", "location"]
The listorder part is dynamic: I don't know in advance what it will be, and it changes from time to time. Also, I can't put this ordering into the raw query itself, because I need the whole data set for another purpose first; only after that can I apply the ordering.
I want to order by the fields in listorder:
mObj = ob.order_by(*listorder)
This fails with an error saying that ordering can't be applied to raw querysets.
Does anyone have an idea?
If you want a raw queryset to be ordered by different fields, you can add them to the ORDER BY clause:
ob = Shop.objects.raw('''SELECT * from shops GROUP BY
    (duplicate_field_name) having COUNT(*) = 1 ORDER BY check_in, check_out, location''')
If you want the order to be reversed for a particular field, you can change it like this:
ob = Shop.objects.raw('''SELECT * from shops GROUP BY
    (duplicate_field_name) having COUNT(*) = 1 ORDER BY check_in, check_out DESC, location''')
If the ordering is going to be dynamic, you can build the query string dynamically:
qs = '''SELECT * from shops GROUP BY
    (duplicate_field_name) having COUNT(*) = 1'''
# some other code here decides what your ordering is, for example:
order_fields = ['id', 'location', 'check_in', 'check_out']
qs = qs + " ORDER BY " + ", ".join(order_fields)
Then you can query as before. Make sure order_fields only ever contains known column names, because they are pasted into the SQL unescaped:
Shop.objects.raw(qs)
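Column names cannot be passed as query parameters, so a dynamic ORDER BY is usually validated against a fixed set of allowed columns. A minimal sketch; ALLOWED_ORDER_FIELDS and build_order_by are hypothetical names, and the '-field' prefix for descending order mimics the convention of QuerySet.order_by():

```python
# Hypothetical whitelist: only these columns may appear in ORDER BY.
ALLOWED_ORDER_FIELDS = {"id", "location", "check_in", "check_out"}


def build_order_by(fields):
    """Turn ['check_in', '-check_out'] into ' ORDER BY check_in, check_out DESC'.

    A leading '-' means descending. Any name outside the whitelist raises
    ValueError instead of reaching the SQL string. An empty list gives ''.
    """
    parts = []
    for field in fields:
        descending = field.startswith("-")
        name = field[1:] if descending else field
        if name not in ALLOWED_ORDER_FIELDS:
            raise ValueError(f"cannot order by {name!r}")
        parts.append(f"{name} DESC" if descending else name)
    return " ORDER BY " + ", ".join(parts) if parts else ""


base = "SELECT * from shops GROUP BY (duplicate_field_name) having COUNT(*) = 1"
print(base + build_order_by(["check_in", "-check_out", "location"]))
```

The resulting string can then be passed to Shop.objects.raw() as before.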