Following nested foreign keys while prefetching in Django - django

I have an information model with deep and complex foreign key relations. and lots of them. Because of it I am trying to use select_related() and prefetch_related() to minimize the number of queries to my DB.
I am having the problem, however, that I cannot figure out a way to make pre-fetching operators follow foreign keys to an arbitrary depth. I am aware of the double underscore operator (__), but that isn't really an option, because I do not know in advance how deep the nesting will be.
So for example let's say I have objects A, B, C,...Z. Any object can have an arbitrary number of foreign keys pointing to any object that appear, say, later on in the alphabet. How can I make sure that, for example, prefetching the foreign key that from A points to B will follow all the foreign keys on B?
My best shot for now was a semi-hard coded approach on the get_queryset() method on the object manager.
Thank you in advance
EDIT:
Ok so an idea on how I am trying to do it at the moment is as follows:
class MyModelmanager(model.Manger):
def get_queryset()
qs = super().get_queryset()
qs = qs.select_related(*thefiledsiwannaprefetch)
return qs
Now in the fields I am prefetching there are foreign keys relations I would like to follow. How do I achieve that (without using '__')?
EDIT 2
Another attempt was the following:
class MyModelmanager(model.Manger):
def get_queryset()
return super().get_queryset().prefetch_related()
I did then overrode the manger of the other models, so that they also performed prefetching in their get_queryset() method. This also didn't work.

From the docs:
There may be some situations where you wish to call select_related() with a lot of related objects, or where you don’t know all of the relations. In these cases it is possible to call select_related() with no arguments. This will follow all non-null foreign keys it can find - nullable foreign keys must be specified. This is not recommended in most cases as it is likely to make the underlying query more complex, and return more data, than is actually needed.

Related

Differences between Django one-to-one and foreign key relationships at db level?

Is OneToOneField just ForeignKey with unique=True.What is the difference at db level,how about performance?
Thanks so much.
If we look to the source code of the OneToOneField, we see:
class OneToOneField(ForeignKey):
# ...
def __init__(self, to, on_delete, to_field=None, **kwargs):
kwargs['unique'] = True
super().__init__(to, on_delete, to_field=to_field, **kwargs)
It contains some extra attributes such that Django understands that the reverse of a OneToOneField is not a one-to-may field, but that it can thus query the for an object directly.
We thus will construct a UNIQUE(foreign_column) (depending on the database language the syntax can look a bit different).
For most popular database engines, at the database level, by default for a ForeignKey, the database constructs an index, to boost the performance of a reverse lookup. So in case you create a new row, or update an existing one, the database will check that the index does not contain such value (and thus uniqness is guaranteed).
This thus will for updates and creates result in an extra lookup. But the index lookup should be performed anyway when we use a ForeignKey (with an index), since the index is updated. Therefore on reasonable database systems, the performance difference should be negligible.

Getting all foreign keys from a QuerySet

I have two models, Story and FollowUp. FollowUp can have a foreign key of Story. Story has a foreign key of User.
I can get all Story that are associated with a User with user.story_set.all(). I would like to, without iterating through the queryset, be able to get all FollowUps associated with all the Storys.
I had initially tried user.story_set.all().followup_set.all(), but that does not work.
How can I accomplish this?
The easier way would be to use prefetch_related to preload all the objects, and then just iterate:
followups = []
for story in user.story_set.all().prefetch_related():
followups.extend(story.followup_set.all())
Alternatively, you could do:
followups = Followup.objects.filter(story__in = user.story_set.all())
(The second piece of code might need a list compression to use IDs and not objects, but you get the idea)
A more concise approach is:
FollowUp.objects.filter(story__user=user)

Behavior of querysets with foreign keys in Django

When a model object is an aggregate of many other objects, whether via Foreign Key or Many To Many, does iterating over the queryset of that object result in individual queries to the related objects?
Lets say I have
class aggregateObj(models.Model):
parentorg = models.ForeignKey(Parentorgs)
contracts = models.ForeignKey(Contracts)
plans = models.ForeignKey(Plans)
and execute
objs = aggregateObj.objects.all()
if I iterate over objs, does every comparison made within the parentorg, contracts or plan fields result in an individual query to that object?
Yes, by default every comparison will create an individual query. To get around that, you can make use of the select_related (and prefetch_related the relationship is in the 'backwards' direction) QuerySet method to fetch all the related object in the initial query:
Returns a QuerySet that will automatically “follow” foreign-key relationships, selecting that additional related-object data when it executes its query. This is a performance booster which results in (sometimes much) larger queries but means later use of foreign-key relationships won’t require database queries.
Yes. To prevent that, use select_related to fetch the related data via a JOIN at query time.

django select_related for multiple foreign keys

How does select_related work with a model which has multiple foreign keys? Does it just choose the first one?
class Model:
fkey1, fkey2, fkey3...
The documentation doesn't say anything about this, at least not in where the method is specified.
NOTE: Maybe I'm the only one who will get confused. I guess select_related is just a performance booster (I can see that) but I had the wrong idea that it was something else.
You can use select_related in a chain as following
Comment.objects.select_related('user').select_related('article').all()
If your model has multiple foreign keys you can:
Call .select_related(), that will “follow” all non-null foreign-key relationships
Call .select_related('foreign_key1', 'foreign_key2', ...), that will “follow” only the foreign-key provided as arguments.
Note that "to follow a FK relationship" means selecting additional related-object data when the query is executed (by performing a SQL join). This will make the main query heavier but can be used to avoid N + 1 queries problem.
According to select_related documentation, the first method (without arguments) is not recommended as "it is likely to make the underlying query more complex, and return more data, than is actually needed."
If your model has "nested" foreign keys with other models (i.e. Book <>-- Author <>-- Hometown) you can also use select_related as follow:
Call Book.select_related('author__hometown'), that will “follow” the author's foreign-key (in Book model) and the hometown's foreign-key (in Author model).
If your model has many-to-many or many-to-one relations you would like to retrieve from the database, you should take a look at prefetch_related.
On the contrary, the documentation is very clear on the matter. It says that by default all ForeignKeys are followed, but you can give the method a list of fields and it will only follow those relationships.
You can pass foreign keys and even nested foreign keys to the select_related method eg select_related('book__author', 'publisher',) you can add as many foreign keys as you want, if you call select_related() without any argument then it will follow all the foreign key relationship which is not recommended at all because you'll be complicating the query by fetching data you don't need. Finally, from Django documentation "Chaining select_related calls works in a similar way to other methods - that is that select_related('foo', 'bar') is equivalent to select_related('foo').select_related('bar')

Django ORM: Optimizing queries involving many-to-many relations

I have the following model structure:
class Container(models.Model):
pass
class Generic(models.Model):
name = models.CharacterField(unique=True)
cont = models.ManyToManyField(Container, null=True)
# It is possible to have a Generic object not associated with any container,
# thats why null=True
class Specific1(Generic):
...
class Specific2(Generic):
...
...
class SpecificN(Generic):
...
Say, I need to retrieve all Specific-type models, that have a relationship with a particular Container.
The SQL for that is more or less trivial, but that is not the question. Unfortunately, I am not very experienced at working with ORMs (Django's ORM in particular), so I might be missing a pattern here.
When done in a brute-force manner, -
c = Container.objects.get(name='somename') # this gets me the container
items = c.generic_set.all()
# this gets me all Generic objects, that are related to the container
# Now what? I need to get to the actual Specific objects, so I need to somehow
# get the type of the underlying Specific object and get it
for item in items:
spec = getattr(item, item.get_my_specific_type())
this results in a ton of db hits (one for each Generic record, that relates to a Container), so this is obviously not the way to do it. Now, it could, perhaps, be done by getting the SpecificX objects directly:
s = Specific1.objects.filter(cont__name='somename')
# This gets me all Specific1 objects for the specified container
...
# do it for every Specific type
that way the db will be hit once for each Specific type (acceptable, I guess).
I know, that .select_related() doesn't work with m2m relationships, so it is not of much help here.
To reiterate, the end result has to be a collection of SpecificX objects (not Generic).
I think you've already outlined the two easy possibilities. Either you do a single filter query against Generic and then cast each item to its Specific subtype (results in n+1 queries, where n is the number of items returned), or you make a separate query against each Specific table (results in k queries, where k is the number of Specific types).
It's actually worth benchmarking to see which of these is faster in reality. The second seems better because it's (probably) fewer queries, but each one of those queries has to perform a join with the m2m intermediate table. In the former case you only do one join query, and then many simple ones. Some database backends perform better with lots of small queries than fewer, more complex ones.
If the second is actually significantly faster for your use case, and you're willing to do some extra work to clean up your code, it should be possible to write a custom manager method for the Generic model that "pre-fetches" all the subtype data from the relevant Specific tables for a given queryset, using only one query per subtype table; similar to how this snippet optimizes generic foreign keys with a bulk prefetch. This would give you the same queries as your second option, with the DRYer syntax of your first option.
Not a complete answer but you can avoid a great number of hits by doing this
items= list(items)
for item in items:
spec = getattr(item, item.get_my_specific_type())
instead of this :
for item in items:
spec = getattr(item, item.get_my_specific_type())
Indeed, by forcing a cast to a python list, you force the django orm to load all elements in your queryset. It then does this in one query.
I accidentally stubmled upon the following post, which pretty much answers your question :
http://lazypython.blogspot.com/2008/11/timeline-view-in-django.html