Join two queries in Django ORM

I have a Person model which has a birthday. I would like to create a query that returns all the persons information along with an additional field that tells how many people are sharing each person's birthday. In SQL I would write it like this:
SELECT p.name, b.count FROM
persons AS p INNER JOIN
(SELECT birthday AS date, COUNT(*) AS count FROM persons GROUP BY birthday) AS b
ON p.birthday = b.date
With Django querysets I can do the inner select but I don't know how to do the inner join.

Seems tough to do with the ORM (though maybe possible with extra).
You could create a dict of counts by date (max 366 values, if ignoring year):
from collections import Counter
from django.db.models import Count

birthdate = lambda d: d.strftime("%m-%d")

# this runs the subquery in your SQL:
birthdays = Person.objects.values('birthday')
counts = birthdays.annotate(count=Count('birthday'))

# rows are grouped by full date, so add them up per month-day
counts_by_date = Counter()
for r in counts:
    counts_by_date[birthdate(r['birthday'])] += r['count']

for person in Person.objects.all():
    count = counts_by_date[birthdate(person.birthday)]
    print("%d people share your birthday!" % count)
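The lookup-table idea above does not depend on Django. Here is a minimal sketch with plain `datetime.date` values standing in for Person rows; the sample names and dates are made up for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical sample data standing in for Person rows: (name, birthday).
people = [
    ("Ann", date(1990, 5, 17)),
    ("Bob", date(1985, 5, 17)),
    ("Cid", date(1990, 1, 2)),
]


def month_day(d):
    """Return a key that ignores the year, e.g. '05-17'."""
    return d.strftime("%m-%d")


# First pass: build the lookup table (at most 366 keys).
counts_by_date = Counter(month_day(birthday) for _, birthday in people)

# Second pass: attach the count to each person.
shared = {name: counts_by_date[month_day(birthday)] for name, birthday in people}
```

Each count includes the person themselves, just like `COUNT(*)` in the SQL from the question.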

I would add a model method to get the count. You can do it using the ORM like this:
from django.db import models

class Person(models.Model):
    birthdate = models.DateField()

    def shared_count(self):
        return Person.objects.filter(birthdate=self.birthdate).exclude(pk=self.id).count()
Then you can just access the count on the Person instance like this:
my_person = Person.objects.get(pk=12)
count = my_person.shared_count()
Or access it in a template like this:
{{ my_person.shared_count }}
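Keep in mind that shared_count() issues one COUNT query per call, so listing many people costs one extra query each. For comparison, the single-query join from the question can be tried against an in-memory SQLite database. This is only a sketch: the sample rows are made up, and birthdays are stored as plain text for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (name TEXT, birthday TEXT)")
conn.executemany(
    "INSERT INTO persons VALUES (?, ?)",
    [("Ann", "1990-05-17"), ("Bob", "1990-05-17"), ("Cid", "1991-01-02")],
)

# Join every person against the per-birthday counts in one query.
rows = conn.execute("""
    SELECT p.name, b.count
    FROM persons AS p
    INNER JOIN (
        SELECT birthday AS date, COUNT(*) AS count
        FROM persons
        GROUP BY birthday
    ) AS b ON p.birthday = b.date
    ORDER BY p.name
""").fetchall()
```

Unlike shared_count(), which excludes the person via exclude(pk=self.id), the SQL count includes the person themselves.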

Related

How to filter on a foreign key that is grouped?

Model:
from django.db import models

class Person(models.Model):
    name = models.CharField(max_length=100)

class Result(models.Model):
    person = models.ForeignKey(Person, on_delete=models.CASCADE)
    outcome = models.IntegerField()
    time = models.DateTimeField()
Sql:
select * from person as p
inner join (
    select person_id, max(time) as max_time, outcome from result
    group by person_id
) r on p.id = r.person_id
where r.outcome in (2, 3)
I want to get all person records where the last result outcome was either 2 or 3. I added the raw SQL above to explain further.
I looked at using a subquery to filter person records that have a matching result id
sub_query = Result.objects.values("person_id").annotate(max_time=Max("time"))
however using values strips out the other fields.
Ideally I'd be able to do this in one person queryset but I don't think that is the case.

The query below excludes all results whose outcome is not 2 or 3, sorts the remaining results by time in descending order (latest on top), and finally gets the person for each of them:
from django.db.models import Q
results = Result.objects.filter(Q(outcome=3) | Q(outcome=2)).order_by('-time').values('person')

Since a person may have multiple result records and I only want to check the last one, a subquery was the only way I could find to do this:
from django.db.models import Max, OuterRef, Subquery
last_result = Subquery(
    Result.objects.filter(person_id=OuterRef("pk")).order_by("-time").values("outcome")[:1]
)
people = Person.objects.annotate(max_time=Max("result__time"), current_result=last_result).filter(current_result__in=[2, 3])
First I create a subquery that returns the outcome of the last result record. Then I add this as a field in the people query so that I can filter on it for only outcomes of 2 or 3.
This way it will only return person records where the current result is a 2 or 3.
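The difference between "latest outcome is 2 or 3" and "any outcome is 2 or 3" is easy to check with plain SQL. The sketch below uses an in-memory SQLite database with made-up rows; the correlated subquery plays the role that Subquery/OuterRef play in the ORM version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE result (
        id INTEGER PRIMARY KEY,
        person_id INTEGER REFERENCES person(id),
        outcome INTEGER,
        time TEXT
    );
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob');
    -- Ann: an early 2, but her latest outcome is 5
    -- Bob: an early 1, and his latest outcome is 3
    INSERT INTO result (person_id, outcome, time) VALUES
        (1, 2, '2020-01-01'), (1, 5, '2020-02-01'),
        (2, 1, '2020-01-01'), (2, 3, '2020-02-01');
""")

# Correlated subquery: look only at each person's most recent result.
latest_match = [row[0] for row in conn.execute("""
    SELECT p.name FROM person AS p
    WHERE (
        SELECT r.outcome FROM result AS r
        WHERE r.person_id = p.id
        ORDER BY r.time DESC
        LIMIT 1
    ) IN (2, 3)
    ORDER BY p.name
""")]

# Naive filter: matches anyone who ever had a 2 or 3.
any_match = [row[0] for row in conn.execute("""
    SELECT DISTINCT p.name FROM person AS p
    INNER JOIN result AS r ON r.person_id = p.id
    WHERE r.outcome IN (2, 3)
    ORDER BY p.name
""")]
```

Here latest_match contains only Bob, while any_match also contains Ann because of her older result.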

How to create an annotation in Django that references two related models

I'm trying to add an annotation to a QuerySet that is True/False when the value of a field on one related object is less than the value of a field on a different related object.
Here are some models for an example:
class RobotManager(models.Manager):
    def get_queryset(self):
        queryset = super(RobotManager, self).get_queryset()
        queryset = queryset.annotate(canteen_empty=UNKNOWN CODE)
        return queryset

class Robot(models.Model):
    # Has some other unrelated stuff
    objects = RobotManager()

class CanteenLevel(models.Model):
    time = models.DateTimeField()
    robot = models.ForeignKey("SomeApp.Robot")
    gallons = models.IntegerField()

class RobotConfiguration(models.Model):
    time = models.DateTimeField()
    robot = models.ForeignKey("SomeApp.Robot")
    canteen_empty_level = models.IntegerField()
With the above models, as the Robot's Configuration or CanteenLevel change, we create new records and save the historicals.
What I would like to do is add an annotation to a Robot QuerySet that states if the Robot's Canteen is considered empty (Robot's latest CanteenLevel.gallons is less than the Robot's latest Configuration.canteen_empty_level).
The aim is to allow for a statement like this using the annotation in the QuerySet:
bad_robots = Robot.objects.filter(canteen_empty=True)
I had tried something like this in the annotation:
canteen_empty=ExpressionWrapper(CanteenLevel.objects.filter(robot=OuterRef('pk')).order_by('-time').values('gallons')[:1] <= RobotConfiguration.objects.filter(robot=OuterRef('robot')).order_by('-time').values('canteen_empty_level')[:1], output_field=models.BooleanField))
But obviously the "<=" operator isn't allowed.
I also tried this:
canteen_empty=Exists(CanteenLevel.objects.filter(robot=OuterRef('pk')).order_by('-time').values('gallons')[:1].filter(gallons__lte=Subquery(RobotConfiguration.objects.filter(robot=OuterRef('robot')).order_by('-time').values('canteen_empty_level')[:1]))))
But you can't filter after taking a slice of a QuerySet.
Any help would be appreciated!

We can make two annotations here:
from django.db.models import Subquery, OuterRef

latest_gallons = Subquery(CanteenLevel.objects.filter(
    robot=OuterRef('pk')
).order_by('-time').values('gallons')[:1])

latest_canteen = Subquery(RobotConfiguration.objects.filter(
    robot=OuterRef('pk')
).order_by('-time').values('canteen_empty_level')[:1])
then we can first annotate the Robot objects with these, and filter:
from django.db.models import F

Robot.objects.annotate(
    latest_gallons=latest_gallons,
    latest_canteen=latest_canteen
).filter(latest_gallons__lte=F('latest_canteen'))
This will construct a query that looks like:
SELECT robot.*,
    (SELECT U0.gallons
     FROM canteenlevel U0
     WHERE U0.robot_id = robot.id
     ORDER BY U0.time DESC
     LIMIT 1) AS latest_gallons,
    (SELECT U0.canteen_empty_level
     FROM robotconfiguration U0
     WHERE U0.robot_id = robot.id
     ORDER BY U0.time DESC
     LIMIT 1) AS latest_canteen
FROM robot
WHERE
    (SELECT U0.gallons
     FROM canteenlevel U0
     WHERE U0.robot_id = robot.id
     ORDER BY U0.time DESC
     LIMIT 1
    ) <= (
     SELECT U0.canteen_empty_level
     FROM robotconfiguration U0
     WHERE U0.robot_id = robot.id
     ORDER BY U0.time DESC
     LIMIT 1
    )
Note however that if a Robot has no related CanteenLevel, or RobotConfiguration (one of them, or both), then that Robot will not be included in the queryset.
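The exclusion happens because a missing row makes the subquery return NULL, and a comparison with NULL is never true. A small sketch against an in-memory SQLite database shows this; the table names follow the SQL above, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE robot (id INTEGER PRIMARY KEY);
    CREATE TABLE canteenlevel (robot_id INTEGER, time TEXT, gallons INTEGER);
    CREATE TABLE robotconfiguration (
        robot_id INTEGER, time TEXT, canteen_empty_level INTEGER
    );
    INSERT INTO robot VALUES (1), (2), (3);
    -- robot 1: latest level 2 <= threshold 5, so the canteen is empty
    INSERT INTO canteenlevel VALUES (1, '2020-01-01', 9), (1, '2020-01-02', 2);
    INSERT INTO robotconfiguration VALUES (1, '2020-01-01', 5);
    -- robot 2: latest level 8 > threshold 5, so it is not empty
    INSERT INTO canteenlevel VALUES (2, '2020-01-01', 8);
    INSERT INTO robotconfiguration VALUES (2, '2020-01-01', 5);
    -- robot 3: has a configuration but no canteen level at all
    INSERT INTO robotconfiguration VALUES (3, '2020-01-01', 5);
""")

empty_ids = [row[0] for row in conn.execute("""
    SELECT robot.id
    FROM robot
    WHERE (
        SELECT U0.gallons FROM canteenlevel U0
        WHERE U0.robot_id = robot.id
        ORDER BY U0.time DESC LIMIT 1
    ) <= (
        SELECT U0.canteen_empty_level FROM robotconfiguration U0
        WHERE U0.robot_id = robot.id
        ORDER BY U0.time DESC LIMIT 1
    )
    ORDER BY robot.id
""")]
```

Robot 3 is left out even though it is not known to have any water, so such robots need separate handling if they matter.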

Django: Single query with multiple joins on the same one-to-many relationship

Using the Django QuerySet API, how can I perform multiple joins between the same two tables/models? See the following untested code for illustration purposes:
class DataPacket(models.Model):
    time = models.DateTimeField(auto_now_add=True)

class Field(models.Model):
    packet = models.ForeignKey(DataPacket, models.CASCADE)
    name = models.CharField(max_length=25)
    value = models.FloatField()
I want to grab a list of data packets with only specific named fields. I tried something like this:
pp = DataPacket.objects.prefetch_related('field_set')
result = []
for p in pp:
    o = {
        f.name: f.value
        for f in p.field_set.all()
        if f.name in ('latitude', 'longitude')
    }
    o['time'] = p.time
    result.append(o)
But this has proven extremely inefficient because I'm working with hundreds to thousands of packets with a lot of other fields besides the latitude and longitude fields I want.
Is there a Django QuerySet call which translates into an efficient SQL query performing two inner joins from the datapacket table to the field table on different rows? I can do it with raw SQL, as follows (assuming the Django application is named myapp) (again, untested code for illustration purposes):
from django.db import connection

with connection.cursor() as cursor:
    cursor.execute('''
        SELECT p.time AS time, f1.value AS lat, f2.value AS lon
        FROM myapp_datapacket AS p
        INNER JOIN myapp_field AS f1 ON p.id = f1.packet_id
        INNER JOIN myapp_field AS f2 ON p.id = f2.packet_id
        WHERE f1.name = 'latitude' AND f2.name = 'longitude'
    ''')
    result = list(cursor)
But instinct tells me not to use the low-level DB API if I don't have to. Possible reasons to back that up: my SQL might not be compatible with all the DBMSs Django supports, and I feel more at risk of trashing my database by misunderstanding a SQL command than by misunderstanding a Django API call.
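Since the SQL above is marked as untested, here is a small sketch that runs the same double join against an in-memory SQLite database using only the standard library. The table names follow the question; the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myapp_datapacket (id INTEGER PRIMARY KEY, time TEXT);
    CREATE TABLE myapp_field (
        id INTEGER PRIMARY KEY,
        packet_id INTEGER REFERENCES myapp_datapacket(id),
        name TEXT,
        value REAL
    );
    INSERT INTO myapp_datapacket VALUES
        (1, '2020-01-01 00:00:00'), (2, '2020-01-02 00:00:00');
    -- packet 1 has both coordinates plus an unrelated field
    -- packet 2 has a latitude only
    INSERT INTO myapp_field (packet_id, name, value) VALUES
        (1, 'latitude', 59.9), (1, 'longitude', 10.7), (1, 'altitude', 12.0),
        (2, 'latitude', 41.9);
""")

# Join the field table twice, once per named field.
rows = conn.execute("""
    SELECT p.time AS time, f1.value AS lat, f2.value AS lon
    FROM myapp_datapacket AS p
    INNER JOIN myapp_field AS f1 ON p.id = f1.packet_id
    INNER JOIN myapp_field AS f2 ON p.id = f2.packet_id
    WHERE f1.name = 'latitude' AND f2.name = 'longitude'
    ORDER BY p.id
""").fetchall()
```

Because both joins are inner joins, packet 2 (which has no longitude row) does not appear in the result.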

Try performing raw SQL queries in Django, combined with the approach from "Django select related in raw request".
Prefetch on a raw query:
from django.db.models.query import prefetch_related_objects
raw_queryset = list(raw_queryset)
# Django 1.10+: lookups are positional arguments (older versions took a list)
prefetch_related_objects(raw_queryset, 'a_related_lookup',
                         'another_related_lookup', ...)
Your example:
from django.db.models.query import prefetch_related_objects
raw_DataPacket = list(DataPacket.objects.raw('SELECT * FROM myapp_datapacket'))
# prefetch_related_objects works in place and returns None
prefetch_related_objects(raw_DataPacket, 'field_set')
pp = raw_DataPacket
Example of prefetch_related with a raw queryset:
Models:
class Country(models.Model):
    name = models.CharField(max_length=100)

class City(models.Model):
    country = models.ForeignKey(Country, on_delete=models.CASCADE)
    name = models.CharField(max_length=100)
prefetch_related:
from django.db.models.query import prefetch_related_objects
# raw querysets do not have len(),
# which is why we need to evaluate them into a list
cities = list(City.objects.raw("select * from city inner join country on city.country_id = country.id where city.name = 'london'"))
# Django 1.10+: lookups are positional arguments (older versions took a list)
prefetch_related_objects(cities, 'country')
Answer provided from information from these sources: djangoproject - performing raw queries | Related Stackoverflow Question | Google docs question

Django Queryset - extracting only date from datetime field in query (inside .value() )

I want to extract some particular columns from django query
models.py
class table(models.Model):
    id = models.IntegerField(primary_key=True)
    date = models.DateTimeField()
    address = models.CharField(max_length=50)
    city = models.CharField(max_length=20)
    cityid = models.IntegerField()
This is what I am currently using for my query:
obj = table.objects.filter(date__range=(start, end)).values('id', 'date', 'address', 'city').annotate(count=Count('cityid')).order_by('date', '-count')
I am hoping to have a SQL query that is similar to this
select DATE(date), id,address,city, COUNT(cityid) as count from table where date between "start" and "end" group by DATE(date), address,id, city order by DATE(date) ASC,count DESC;

At least in Django 1.10.5, you can use something like this, without extra and RawSQL:
from django.db.models.functions import Cast
from django.db.models.fields import DateField
table.objects.annotate(date_only=Cast('date', DateField()))
And for filtering, you can use date lookup (https://docs.djangoproject.com/en/1.11/ref/models/querysets/#date):
table.objects.filter(date__date__range=(start, end))
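To see what grouping on the date part of a datetime does, the SQL from the question can be tried on an in-memory SQLite database. This is only a sketch: the table name `visit` is made up (since `table` is a reserved word), the rows are sample data, and it groups by day and city only, because also grouping by the primary key `id` would make every count 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visit (
        id INTEGER PRIMARY KEY,
        date TEXT,
        address TEXT,
        city TEXT,
        cityid INTEGER
    );
    INSERT INTO visit VALUES
        (1, '2020-01-01 08:00:00', 'a', 'Oslo', 10),
        (2, '2020-01-01 17:30:00', 'b', 'Oslo', 10),
        (3, '2020-01-01 09:00:00', 'c', 'Rome', 20),
        (4, '2020-01-02 10:00:00', 'd', 'Rome', 20),
        (5, '2020-02-01 10:00:00', 'e', 'Rome', 20);
""")

start, end = "2020-01-01", "2020-01-31"

# DATE(date) strips the time part, so rows from the same day group together.
rows = conn.execute("""
    SELECT DATE(date) AS day, city, COUNT(cityid) AS n
    FROM visit
    WHERE DATE(date) BETWEEN ? AND ?
    GROUP BY DATE(date), city
    ORDER BY day ASC, n DESC
""", (start, end)).fetchall()
```

The two Oslo rows from 1 January collapse into one group even though their times differ, which is the effect the date-only annotation is meant to achieve.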

For the below case.
select DATE(date), id,address,city, COUNT(cityid) as count from table where date between "start" and "end" group by DATE(date), address,id, city order by DATE(date) ASC,count DESC;
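You can use extra, which lets you call DB functions, and keep the count as a regular annotation so that it is grouped correctly:
from django.db.models import Count
table.objects.filter(date__range=(start, end)).extra(select={'day': 'DATE(date)'}).values('day', 'id', 'address', 'city').annotate(count=Count('cityid')).order_by('day', '-count')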
Hope it will help you.
Thanks.

Django select related in raw request

How to make "manual" select_related imitation to avoid undesirable DB hits?
we have:
class Country(models.Model):
    name = models.CharField(max_length=100)

class City(models.Model):
    country = models.ForeignKey(Country, on_delete=models.CASCADE)
    name = models.CharField(max_length=100)

cities = City.objects.raw("select * from city inner join country on city.country_id = country.id where city.name = 'london'")
# this will hit the DB again
print(cities[0].country.name)
How can I tell Django that the related models are already fetched?

A solution with prefetch_related (which means two queries will be made: one for the cities and one for the countries), taken from django-users. It is not part of the public API, but it works on Django 1.7:
from django.db.models.query import prefetch_related_objects
# raw querysets do not have len(),
# which is why we need to evaluate them into a list
cities = list(City.objects.raw("select * from city inner join country on city.country_id = country.id where city.name = 'london'"))
prefetch_related_objects(cities, ['country'])
UPDATE
As of Django 1.10, prefetch_related_objects is part of the public API. There the lookups are passed as positional arguments: prefetch_related_objects(cities, 'country').

Not sure if you still need this, but I solved it starting with Alasdair's answer. You want to use the info from the query to build the related model yourself, or it will still fire additional queries when you access the foreign key field. So in your case, you'd want:
cities = list(City.objects.raw("""
    SELECT
        city.*, country.name AS countryName
    FROM
        city INNER JOIN country ON city.country_id = country.id
    WHERE
        city.name = 'LONDON'"""))
for city in cities:
    # pass the id as well so that city.country_id is not reset to None
    city.country = Country(id=city.country_id, name=city.countryName)
The line that assigns the country doesn't hit the database; it just creates a model instance in memory. After that, accessing city.country won't fire another database query.

I'm not sure if you can do this. As an alternative, you can select individual fields from the country table and access them on each instance.
cities = City.objects.raw("select city.*, country.name as country_name from city inner join country on city.country_id = country.id where city.name = 'london'")
city = cities[0]
# this will not hit the database again
city.country_name
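The underlying idea in these answers (fetch everything in one joined query, then build the related object from the extra columns) can be sketched without Django. The example below uses sqlite3 and dataclasses; the `City` and `Country` classes here are plain stand-ins, not Django models, and the sample rows are made up.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Country:
    id: int
    name: str


@dataclass
class City:
    id: int
    name: str
    country_id: int
    country: Optional[Country] = None


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER);
    INSERT INTO country VALUES (1, 'England');
    INSERT INTO city VALUES (10, 'london', 1);
""")

# One joined query returns the city columns plus the country name.
rows = conn.execute("""
    SELECT city.id, city.name, city.country_id, country.name AS country_name
    FROM city INNER JOIN country ON city.country_id = country.id
    WHERE city.name = 'london'
""").fetchall()

cities = []
for city_id, city_name, country_id, country_name in rows:
    city = City(city_id, city_name, country_id)
    # Build the related object from the joined columns; no second query.
    city.country = Country(country_id, country_name)
    cities.append(city)
```

Qualifying every column with its table name avoids the ambiguity between city.name and country.name.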
