Django: querying models not related with FK

I'm developing a Django project. I need to make many queries with the following pattern:
I have two models that are not related by a FK, but that can be related through some fields (not their PKs).
I need to query the first model and annotate it with results from the second model, joined on that non-PK field.
I can do it with Subquery and OuterRef:
M2_queryset = M2.objects.filter(f1 = OuterRef('f2'))
M1.objects.annotate(b_f3 = Subquery(M2_queryset.values('f3')))
But if I need to annotate two columns, I need to do this:
M2_queryset = M2.objects.filter(f1 = OuterRef('f2'))
M1.objects.annotate(b_f3 = Subquery(M2_queryset.values('f3'))).annotate(b_f4 = Subquery(M2_queryset.values('f4')))
It's very inefficient because of the two identical subqueries.
It would be nice to be able to do something like this:
M2_queryset = M2.objects.filter(f1 = OuterRef('f2'))
M1.objects.annotate(b_f3, b_f4 = Subquery(M2_queryset.values('f3','f4')))
or, even better, something like this that avoids subqueries altogether:
M1.objects.join(M2 on M2.f1 = M1.f2)...
For example in this model:
[image: db]
I need to do this regular query:
select m1.id,m1.f5, sum(m2.f2), sum(m2.f3)
from M1, M2
where M1.f1 = M2.f2
group by 1,2
without a fk between f1 and f2.
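To make the target result concrete, here is a minimal sketch that runs the same kind of join and aggregation with Python's standard-library sqlite3 module instead of the Django ORM. The table layout and the sample rows are assumptions made up for illustration; only the column names (f1, f2, f3, f5) come from the query above.

```python
import sqlite3

# In-memory database with two tables that share no foreign key.
# The schema and the rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE m1 (id INTEGER PRIMARY KEY, f1 INTEGER, f5 TEXT);
    CREATE TABLE m2 (id INTEGER PRIMARY KEY, f2 INTEGER, f3 INTEGER);
    INSERT INTO m1 (id, f1, f5) VALUES (1, 10, 'a'), (2, 20, 'b');
    INSERT INTO m2 (f2, f3) VALUES (10, 1), (10, 2), (20, 5);
""")

# Join on m1.f1 = m2.f2 (plain columns, not keys) and aggregate per m1 row.
rows = conn.execute("""
    SELECT m1.id, m1.f5, SUM(m2.f2), SUM(m2.f3)
    FROM m1, m2
    WHERE m1.f1 = m2.f2
    GROUP BY m1.id, m1.f5
    ORDER BY m1.id
""").fetchall()

for row in rows:
    print(row)
```

Each m1 row appears once, with both sums computed in a single pass over the join, which is what the two separate subqueries are trying to emulate.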

Related

Django - Getting Related objects

There are such models:
class Nomenclature(models.Model):
    nameNom = models.CharField(max_length=150, verbose_name="Nomenclature name")
    numNom = models.CharField(max_length=50, verbose_name="Nomenclature number", unique=True)
    quantity = models.IntegerField(verbose_name="Quantity", default=0)
    numPolk = models.CharField(max_length=150, verbose_name="Shelf/location number")
class Changes(models.Model):
    numNomenclature = models.ForeignKey(Nomenclature, on_delete=models.CASCADE, related_name="chamges", verbose_name="Nomenclature number")
    quantity = models.IntegerField(verbose_name="Quantity", null=True)
    location = models.CharField(max_length=50, verbose_name="Installation location")
    fullname = models.CharField(max_length=150, verbose_name="Full name")
    appointment = models.CharField(max_length=50, verbose_name="Purpose")
    created_at = models.DateTimeField(auto_now_add=True, verbose_name="Date/time", null=True)
I need to output the name and number of a nomenclature to the template, together with all of its related changes and all of their fields.
I found that select_related exists, but it doesn't seem to work the way I need it to.
I'm not completely sure if this is what you need.
If you need to fetch all of the changes, from a single "Nomenclature" model:
md = Nomenclature.objects.get(id=id) # Not sure how you fetch this, just an example.
all_changes_for_md = Changes.objects.filter(numNomenclature__id=md.id)
This will fetch you all changes for a nomenclature model.
It is also possible to do it like this:
md = Nomenclature.objects.get(id=id) # Not sure how you fetch this, just an example.
all_changes_for_md = md.chamges.all() # You made a typo in the related name.
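select_related has a different purpose: it fetches the related objects in the same query through a JOIN, so that following the foreign key later does not hit the database again.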
From the Django docs:
select_related(*fields)
Returns a QuerySet that will “follow” foreign-key relationships, selecting additional related-object data when it executes its query. This is a performance booster which results in a single more complex query but means later use of foreign-key relationships won’t require database queries.
https://docs.djangoproject.com/en/4.1/ref/models/querysets/#select-related

Django: Single query with multiple joins on the same one-to-many relationship

Using the Django QuerySet API, how can I perform multiple joins between the same two tables/models? See the following untested code for illustration purposes:
class DataPacket(models.Model):
    time = models.DateTimeField(auto_now_add=True)
class Field(models.Model):
    packet = models.ForeignKey(DataPacket, models.CASCADE)
    name = models.CharField(max_length=25)
    value = models.FloatField()
I want to grab a list of data packets with only specific named fields. I tried something like this:
pp = DataPacket.objects.prefetch_related('field_set')
result = []
for p in pp:
    o = {
        f.name: f.value
        for f in p.field_set.all()
        if f.name in ('latitude', 'longitude')
    }
    o['time'] = p.time
    result.append(o)
But this has proven extremely inefficient because I'm working with hundreds to thousands of packets with a lot of other fields besides the latitude and longitude fields I want.
Is there a Django QuerySet call which translates into an efficient SQL query performing two inner joins from the datapacket table to the field table on different rows? I can do it with raw SQL, as follows (assuming the Django application is named myapp) (again, untested code for illustration purposes):
from django.db import connection
with connection.cursor() as cursor:
    cursor.execute('''
        SELECT p.time AS time, f1.value AS lat, f2.value AS lon
        FROM myapp_datapacket AS p
        INNER JOIN myapp_field AS f1 ON p.id = f1.packet_id
        INNER JOIN myapp_field AS f2 ON p.id = f2.packet_id
        WHERE f1.name = 'latitude' AND f2.name = 'longitude'
    ''')
    result = list(cursor)
But instinct tells me not to use the low-level DB API if I don't have to. Possible reasons to back that up: my SQL might not be compatible with all the DBMSs Django supports, and I feel I'm more at risk of trashing my database by misunderstanding a SQL command than by misunderstanding a Django API call.
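The double self-join from the question can be tried outside Django with the standard-library sqlite3 module. This is only a sketch: the table and column names follow the myapp example above, while the rows (including the extra 'altitude' field) are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myapp_datapacket (id INTEGER PRIMARY KEY, time TEXT);
    CREATE TABLE myapp_field (
        id INTEGER PRIMARY KEY,
        packet_id INTEGER REFERENCES myapp_datapacket(id),
        name TEXT,
        value REAL
    );
    INSERT INTO myapp_datapacket (id, time)
        VALUES (1, '2020-01-01'), (2, '2020-01-02');
    INSERT INTO myapp_field (packet_id, name, value) VALUES
        (1, 'latitude', 10.0), (1, 'longitude', 20.0), (1, 'altitude', 5.0),
        (2, 'latitude', 30.0), (2, 'longitude', 40.0);
""")

# Join the field table twice: f1 picks the latitude row and f2 the
# longitude row of the same packet, so each packet yields one result row.
rows = conn.execute("""
    SELECT p.time AS time, f1.value AS lat, f2.value AS lon
    FROM myapp_datapacket AS p
    INNER JOIN myapp_field AS f1 ON p.id = f1.packet_id
    INNER JOIN myapp_field AS f2 ON p.id = f2.packet_id
    WHERE f1.name = 'latitude' AND f2.name = 'longitude'
    ORDER BY p.id
""").fetchall()

for row in rows:
    print(row)
```

Fields other than latitude and longitude never reach Python, which is where the prefetch-based loop loses time.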
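Try performing raw SQL queries in Django, and prefetch the related objects on the result of the raw query.
Prefetch on a raw query:
from django.db.models import prefetch_related_objects
raw_queryset = list(raw_queryset)
# Since Django 1.10 the lookups are passed as separate positional arguments, not as a list.
prefetch_related_objects(raw_queryset, 'a_related_lookup',
                         'another_related_lookup', ...)
Your example:
from django.db.models import prefetch_related_objects
# raw() needs an SQL string; the table name follows the myapp example in the question.
raw_DataPacket = list(DataPacket.objects.raw('SELECT * FROM myapp_datapacket'))
# prefetch_related_objects() fills the instances in place and returns None.
prefetch_related_objects(raw_DataPacket, 'field_set')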
Example of prefetch_related with Raw Queryset:
models:
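class Country(models.Model):
    name = models.CharField(max_length=100)  # max_length is an assumed value
class City(models.Model):
    country = models.ForeignKey(Country, on_delete=models.CASCADE)
    name = models.CharField(max_length=100)  # max_length is an assumed value
prefetch_related:
from django.db.models import prefetch_related_objects
# Raw querysets do not have len(),
# which is why we need to evaluate them into a list.
# Select only the city columns and qualify "name", which exists in both tables.
cities = list(City.objects.raw("select city.* from city inner join country on city.country_id = country.id where city.name = 'london'"))
prefetch_related_objects(cities, 'country')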
Answer provided from information from these sources: djangoproject - performing raw queries | Related Stackoverflow Question | Google docs question

Django: Select data from two tables with foreign key to third table

I have the following models:
class Dictionary(models.Model):
    word = models.CharField(unique=True)
class ProcessedText(models.Model):
    text_id = models.ForeignKey('Text')
    word_id = models.ForeignKey('Dictionary')
class UserDictionary(models.Model):
    word_id = models.ForeignKey('Dictionary')
    user_id = models.ForeignKey('User')
I want to write a query with the Django ORM that is equivalent to the following SQL:
SELECT * FROM ProcessedText, UserDictionary WHERE
ProcessedText.text_id = text_id
AND ProcessedText.word_id = UserDictionary.word_id
AND UserDictionary.user_id = user_id
How can I do it in one query, without using loops?
This might help you:
How do I select from multiple tables in one query with Django?
You may also have to restructure your models so that Django's select_related can follow the relationships.

How to get object from manytomany?

I have models:
class Z(models.Model):
    name = ...
class B(models.Model):
    something = models...
    other = models.ForeignKey(Z)
class A(models.Model):
    date = models.DateTimeField()
    objs_b = models.ManyToManyField(B)
    def get_obj_b(self, z_id):
        self.obj_b = self.objs_b.get(other=z_id)
and query:
qs = A.objects.filter(...)
but if I want to get the B object related to each A, I must call get_obj_b:
for item in qs:
    item.get_obj_b(my_known_z_id)
This generates many queries. Is there a simpler way? I cannot change the models, and I generally have to use the filter function (not my own manager).
If you are using Django 1.4 or later, I would suggest that you use prefetch_related like this:
A.objects.all().prefetch_related('objs_b__other')
This keeps the number of queries fixed no matter how many A rows there are: one for model A, plus one for each level of the prefetch lookup ('objs_b' and then 'other').
And you can combine it with a filter suggested by pastylegs:
A.objects.filter(objs_b__other__id=z_id).prefetch_related('objs_b__other')
For details see: https://docs.djangoproject.com/en/1.4/ref/models/querysets/#prefetch-related

Query all rows and return most recent of each duplicate

I have a model that has an id that isn't unique. Each model also has a date. I would like to return all results but only the most recent of each row that shares ids. The model looks something like this:
class MyModel(models.Model):
my_id = models.PositiveIntegerField()
date = models.DateTimeField()
title = models.CharField(max_length=36)
## Add some entries
m1 = MyModel(my_id=1, date=yesterday, title='stop')
m1.save()
m2 = MyModel(my_id=1, date=today, title='go')
m2.save()
m3 = MyModel(my_id=2, date=today, title='hello')
m3.save()
Now try to retrieve these results:
MyModel.objects.all()... # then limit duplicate my_id's by most recent
Results should be only m2 and m3
You won't be able to do this with just the ORM, you'll need to get all the records, and then discard the duplicates in Python.
For example:
objs = MyModel.objects.all().order_by("-date")
seen = set()
keep = []
for o in objs:
    # Compare on my_id (the non-unique field), not on the primary key.
    if o.my_id not in seen:
        keep.append(o)
        seen.add(o.my_id)
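The same keep-the-newest idea can be checked without a database. The sketch below uses a namedtuple as a stand-in for the model instances; `Row` and `latest_per_id` are hypothetical names, and the dates are made-up sample values.

```python
from collections import namedtuple
from datetime import date

# Stand-in for MyModel instances; not a Django model.
Row = namedtuple("Row", "my_id date title")

def latest_per_id(rows):
    """Return the most recent row for each my_id."""
    seen = set()
    keep = []
    # Newest first, so the first row seen for a my_id is the one to keep.
    for row in sorted(rows, key=lambda r: r.date, reverse=True):
        if row.my_id not in seen:
            keep.append(row)
            seen.add(row.my_id)
    return keep

rows = [
    Row(1, date(2020, 1, 1), "stop"),
    Row(1, date(2020, 1, 2), "go"),
    Row(2, date(2020, 1, 2), "hello"),
]
result = latest_per_id(rows)
titles = sorted(r.title for r in result)
print(titles)
```

The deduplication key has to be my_id; keying on a unique primary key would never discard anything.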
Here's some custom SQL that can get what you want from the database:
select * from mymodel where (my_id, date) in (select my_id, max(date) from mymodel group by my_id)
You should be able to adapt this to use in the ORM.
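As a quick check of that kind of query, here is a sketch using the standard-library sqlite3 module with the question's three sample rows. It groups on my_id, the non-unique column. The table name and the date strings are assumptions, and the row-value form `(a, b) IN (...)` needs SQLite 3.15 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mymodel (
        id INTEGER PRIMARY KEY,
        my_id INTEGER,
        date TEXT,
        title TEXT
    );
    INSERT INTO mymodel (my_id, date, title) VALUES
        (1, '2020-01-01', 'stop'),
        (1, '2020-01-02', 'go'),
        (2, '2020-01-02', 'hello');
""")

# Keep only the rows whose (my_id, date) pair is the newest one for that my_id.
rows = conn.execute("""
    SELECT my_id, title
    FROM mymodel
    WHERE (my_id, date) IN (
        SELECT my_id, MAX(date) FROM mymodel GROUP BY my_id
    )
    ORDER BY my_id
""").fetchall()

for row in rows:
    print(row)
```

If two rows share both my_id and the maximum date, both are returned, so a tie-breaker would be needed in that case.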
You should also look into abstracting the logic above into a manager:
http://docs.djangoproject.com/en/dev/topics/db/managers/
That way you can call something like MyModel.objects.no_dupes() where you would define no_dupes() in a manager and do the logic Ned laid out in there.
Your models.py would now look like this:
class MyModelManager(models.Manager):
    def no_dupes(self):
        # Newest first, so the first row seen for each my_id is the one to keep.
        objs = self.all().order_by("-date")
        seen = set()
        keep = []
        for o in objs:
            if o.my_id not in seen:
                keep.append(o)
                seen.add(o.my_id)
        return keep
class MyModel(models.Model):
    my_id = models.PositiveIntegerField()
    date = models.DateTimeField()
    title = models.CharField(max_length=36)
    objects = MyModelManager()
With the above code in place, you can call: MyModel.objects.no_dupes(), this should give your desired result. Looks like you can even override the all() function as well if you would want that instead:
http://docs.djangoproject.com/en/1.2/topics/db/managers/#modifying-initial-manager-querysets
I find the manager to be a better solution in case you will need to use this in more than one view across the project, this way you don't have to rewrite the code X number of times.
As Ned says, I don't know of a way to do this with the ORM. But you might be able to use the db to restrict the amount of work you have to do in the for loop in python.
The idea is to use Django's annotate (which is basically running group_by) to find all the instances that have more than one row with the same my_id and process them as Ned suggests. Then for the remainder (which have no duplicates), you can just grab the individual rows.
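from django.db.models import Count, Q
# values() before annotate() makes the ORM group by my_id; without it the
# count is taken per row and is always 1.
counts = MyModel.objects.values('my_id').annotate(num_my_ids=Count('id'))
dupe_ids = counts.filter(num_my_ids__gt=1).values_list('my_id', flat=True)
unique_ids = counts.filter(num_my_ids__lte=1).values_list('my_id', flat=True)
dupes = MyModel.objects.filter(my_id__in=dupe_ids).order_by('-date')
keeps = []
for dupe in dupes:
    ...  # just keep the most recent, as Ned describes
keep_ids = [keep.id for keep in keeps]
latests = MyModel.objects.filter(Q(id__in=keep_ids) | Q(my_id__in=unique_ids))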
If you only have a small number of dupes, this will mean that your for loop is much shorter, at the expense of an extra query (to get the dupes).