Behavior of querysets with foreign keys in Django

When a model object is an aggregate of many other objects, whether via Foreign Key or Many To Many, does iterating over the queryset of that object result in individual queries to the related objects?
Let's say I have:
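from django.db import models

class aggregateObj(models.Model):
    parentorg = models.ForeignKey(Parentorgs, on_delete=models.CASCADE)
    contracts = models.ForeignKey(Contracts, on_delete=models.CASCADE)
    plans = models.ForeignKey(Plans, on_delete=models.CASCADE)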
and execute
objs = aggregateObj.objects.all()
If I iterate over objs, does every comparison made on the parentorg, contracts or plans fields result in an individual query for that related object?

Yes, by default accessing a related object on each instance results in an individual query. To get around that, you can use the select_related QuerySet method (or prefetch_related if the relationship is in the 'backwards' direction) to fetch the related objects up front rather than one at a time:
Returns a QuerySet that will automatically “follow” foreign-key relationships, selecting that additional related-object data when it executes its query. This is a performance booster which results in (sometimes much) larger queries but means later use of foreign-key relationships won’t require database queries.

Yes. To prevent that, use select_related to fetch the related data via a JOIN at query time.
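To make the N + 1 pattern concrete, here is a sketch using Python's built-in sqlite3 module rather than Django itself. The table and column names are hypothetical stand-ins for aggregateObj and its three foreign keys, and the hand-written JOIN only approximates the SQL that select_related('parentorg', 'contracts', 'plans') would generate. A trace callback counts the statements each approach executes.

```python
import sqlite3


def build_db():
    # Hypothetical tables standing in for Parentorgs, Contracts, Plans and aggregateObj
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE parentorg (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE contract  (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE plan      (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE aggregate (
            id INTEGER PRIMARY KEY,
            parentorg_id INTEGER NOT NULL REFERENCES parentorg(id),
            contract_id  INTEGER NOT NULL REFERENCES contract(id),
            plan_id      INTEGER NOT NULL REFERENCES plan(id)
        );
    """)
    for i in range(1, 6):
        conn.execute("INSERT INTO parentorg VALUES (?, ?)", (i, f"org{i}"))
        conn.execute("INSERT INTO contract VALUES (?, ?)", (i, f"contract{i}"))
        conn.execute("INSERT INTO plan VALUES (?, ?)", (i, f"plan{i}"))
        conn.execute("INSERT INTO aggregate VALUES (?, ?, ?, ?)", (i, i, i, i))
    conn.commit()
    return conn


def lazy_access(conn):
    # Default ORM behaviour: one query for the rows, then one per FK access (N + 1)
    out = []
    rows = conn.execute(
        "SELECT parentorg_id, contract_id, plan_id FROM aggregate ORDER BY id"
    ).fetchall()
    for org_id, contract_id, plan_id in rows:
        org = conn.execute("SELECT name FROM parentorg WHERE id = ?", (org_id,)).fetchone()[0]
        con = conn.execute("SELECT name FROM contract WHERE id = ?", (contract_id,)).fetchone()[0]
        pln = conn.execute("SELECT name FROM plan WHERE id = ?", (plan_id,)).fetchone()[0]
        out.append((org, con, pln))
    return out


def joined_access(conn):
    # Roughly what select_related does: the related rows come back in one JOINed query
    return conn.execute("""
        SELECT p.name, c.name, pl.name
        FROM aggregate a
        JOIN parentorg p ON p.id = a.parentorg_id
        JOIN contract  c ON c.id = a.contract_id
        JOIN plan     pl ON pl.id = a.plan_id
        ORDER BY a.id
    """).fetchall()


def count_queries(conn, fn):
    # Record every SQL statement executed while fn runs
    seen = []
    conn.set_trace_callback(seen.append)
    try:
        result = fn(conn)
    finally:
        conn.set_trace_callback(None)
    return result, len(seen)


conn = build_db()
lazy, n_lazy = count_queries(conn, lazy_access)
joined, n_joined = count_queries(conn, joined_access)
print(n_lazy, n_joined)
```

With five rows and three foreign keys, the lazy version issues 1 + 5 * 3 = 16 queries while the joined version issues one, and both return the same data.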

Related

Following nested foreign keys while prefetching in Django

I have an information model with deep and complex foreign key relations, and lots of them. Because of that, I am trying to use select_related() and prefetch_related() to minimize the number of queries to my DB.
I am having the problem, however, that I cannot figure out a way to make pre-fetching operators follow foreign keys to an arbitrary depth. I am aware of the double underscore operator (__), but that isn't really an option, because I do not know in advance how deep the nesting will be.
So, for example, let's say I have objects A, B, C, ... Z. Any object can have an arbitrary number of foreign keys pointing to any object that appears, say, later in the alphabet. How can I make sure that, for example, prefetching the foreign key from A to B will also follow all the foreign keys on B?
My best shot for now was a semi-hard coded approach on the get_queryset() method on the object manager.
Thank you in advance
EDIT:
OK, so here is how I am trying to do it at the moment:
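from django.db import models

class MyModelManager(models.Manager):
    def get_queryset(self):
        qs = super().get_queryset()
        # fields_i_want_to_prefetch: the FK field names to follow with a JOIN
        qs = qs.select_related(*fields_i_want_to_prefetch)
        return qs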
Now in the fields I am prefetching there are foreign keys relations I would like to follow. How do I achieve that (without using '__')?
EDIT 2
Another attempt was the following:
class MyModelManager(models.Manager):
    def get_queryset(self):
        # Called with no lookups, prefetch_related() prefetches nothing
        return super().get_queryset().prefetch_related()
I then overrode the managers of the other models so that they also performed prefetching in their get_queryset() method. This also didn't work.
From the docs:
There may be some situations where you wish to call select_related() with a lot of related objects, or where you don’t know all of the relations. In these cases it is possible to call select_related() with no arguments. This will follow all non-null foreign keys it can find - nullable foreign keys must be specified. This is not recommended in most cases as it is likely to make the underlying query more complex, and return more data, than is actually needed.
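One way to avoid hard-coding the depth is to generate the '__' lookups automatically and pass them to select_related(*paths). The sketch below walks a hypothetical, hard-coded relation graph; in a real project you might build that graph by introspecting each model's _meta.get_fields() (for example, keeping fields whose many_to_one or one_to_one flag is set and reading field.related_model). The max_depth limit and the skipping of models already on the current path are assumptions added here to keep the walk finite when foreign keys form cycles.

```python
from typing import Dict, FrozenSet, List

# Hypothetical relation graph: model name -> {fk_field_name: target model name}.
# D -> A closes a cycle to show the cycle guard.
RELATIONS: Dict[str, Dict[str, str]] = {
    "A": {"b": "B", "c": "C"},
    "B": {"c": "C", "d": "D"},
    "C": {"d": "D"},
    "D": {"a": "A"},
}


def select_related_paths(
    model: str,
    max_depth: int,
    _prefix: str = "",
    _seen: FrozenSet[str] = frozenset(),
) -> List[str]:
    """Return '__'-joined lookup paths for every FK reachable from `model`."""
    seen = _seen | {model}  # models already on the current path
    paths: List[str] = []
    if max_depth <= 0:
        return paths
    for field, target in RELATIONS.get(model, {}).items():
        if target in seen:  # following it would loop back along this path
            continue
        path = _prefix + field
        paths.append(path)
        # Recurse into the target model with one less level of depth allowed
        paths.extend(select_related_paths(target, max_depth - 1, path + "__", seen))
    return paths


print(select_related_paths("A", 2))
```

For model A with max_depth=2 this yields ['b', 'b__c', 'b__d', 'c', 'c__d'], which could then be passed as qs.select_related(*paths). Like select_related() with no arguments, this can produce very wide queries, so a small depth is advisable.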

Django Postgres ArrayField vs One-to-Many relationship

For a model in my database I need to store around 300 values for a specific field. What would be the drawbacks, in terms of performance and simplicity in query, if I use Postgres-specific ArrayField instead of a separate table with One-to-Many relationship?
If you use an array field
Each row in your DB is going to be noticeably larger, so Postgres will be using TOAST tables a lot more (http://www.postgresql.org/docs/9.5/static/storage-toast.html)
Every time you fetch the row, unless you specifically use defer (https://docs.djangoproject.com/en/1.9/ref/models/querysets/#defer) on the field or otherwise exclude it from the query via only, values or something similar, you pay the cost of loading all those values. If that's what you need, then so be it.
Filtering based on values in that array, while possible, isn't going to be as nice, and the Django ORM doesn't make it as obvious as it does for M2M tables.
If you use M2M
You can filter more easily on those related values
Those values aren't loaded by default; you can use prefetch_related if you need them, and then get fancy if you want only a subset of those values loaded
Total storage in the DB is going to be slightly higher with M2M because of the keys and extra id fields
The cost of the joins in this case is negligible because the key columns are indexed.
Personally I'd say go with the M2M tables, but I don't know your specific application. If you're going to be working with a massive amount of data it's likely worth grabbing a representative dataset and testing both methods with it.

How to modify a queryset and save it as new objects?

I need to query for a set of objects for a particular Model, change a single attribute/column ("account"), and then save the entire queryset's objects as new objects/rows. In other words, I want to duplicate the objects, with a single attribute ("account") changed on the duplicates. I'm basically creating a new account and then going through each model and copying a previous account's objects to the new account, so I'll be doing this repeatedly, with different models, probably using django shell. How should I approach this? Can it be done at the queryset level or do I need to loop through all the objects?
i.e.,
MyModel.objects.filter(account="acct_1")
# Now I need to set account = "acct_2" for the entire queryset,
# and save as new rows in the database
From the docs:
If the object’s primary key attribute is not set, or if it’s set but a record doesn’t exist, Django executes an INSERT.
So if you set the id or pk to None, it should work, but I've seen conflicting responses to this solution on SO: Duplicating model instances and their related objects in Django / Algorithm for recursively duplicating an object
This solution should work (thanks @JoshSmeaton for the fix):
objs = MyModel.objects.filter(account="acct_1")
for obj in objs:
    obj.id = None  # no primary key, so save() performs an INSERT
    obj.account = "acct_2"
    obj.save()
I think in my case I have a OneToOneField on the model I'm testing on, so it makes sense that my test wouldn't work with this basic solution. But I believe it should work, as long as you take care of OneToOneFields.
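The question also asks whether this can be done without a Python loop. At the SQL level it can: a single INSERT ... SELECT copies the matching rows and lets the database assign fresh primary keys, which is the SQL counterpart of setting pk to None before save(). The sketch below uses sqlite3 with a hypothetical mymodel table (id, account, name); in Django you could run similar SQL through connection.cursor(), at the cost of bypassing save(), signals and any OneToOneField handling. Within the ORM, collecting the modified instances and passing them to bulk_create() is another option that batches the INSERTs, but it likewise skips save() and the pre_save/post_save signals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for MyModel
conn.execute("CREATE TABLE mymodel (id INTEGER PRIMARY KEY, account TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO mymodel (account, name) VALUES (?, ?)",
    [("acct_1", "a"), ("acct_1", "b"), ("acct_3", "c")],
)
conn.commit()


def copy_to_account(conn, src, dst):
    # Leaving id out of the column list lets the database assign new keys,
    # so every matching row is duplicated with only `account` changed.
    cur = conn.execute(
        "INSERT INTO mymodel (account, name) "
        "SELECT ?, name FROM mymodel WHERE account = ? ORDER BY id",
        (dst, src),
    )
    conn.commit()
    return cur.rowcount  # number of rows inserted


copied = copy_to_account(conn, "acct_1", "acct_2")
rows = conn.execute("SELECT id, account, name FROM mymodel ORDER BY id").fetchall()
print(copied, rows)
```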

SQL Index on Django Generic Relation

Is it possible/sensible to create an SQL index on a GenericForeignKey in a Django model?
I want to perform a lookup on a large number (~1 million) of objects in my PostgreSQL database. My lookup is based on a GenericForeignKey on the relevant model, which is actually stored as two fields: object_id (the pk of the object being linked to) and content_type (a FK to the Django ContentType model representing the type of object being linked to).
In SQL terms this is essentially:
WHERE ("my_model"."content_type_id" = x AND "my_model"."object_id" = y)
object_id is a non-unique field: since the generic FK can link to multiple models, it's possible that objects of different types will have the same pk.
I am wondering whether I can speed up my query times by creating a non-unique index on my_model.object_id. My knowledge of indexing is limited, so I may not have understood their use correctly, but I know that Django automatically creates indexes on normal ForeignKey relations so I assume there is an associated speedup.
Has anyone had any experience creating indexes for GenericForeignKeys? Did you find a resulting performance increase? Any help or insight is much appreciated.
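A common approach is a composite, non-unique index on both columns of the generic relation, since the lookup filters on content_type_id and object_id together (the content_type FK already gets its own index automatically; object_id does not). In Django this can be declared with Meta.indexes = [models.Index(fields=["content_type", "object_id"])]; the contenttypes documentation shows a similar declaration. The sketch below uses sqlite3 rather than PostgreSQL, with a hypothetical my_model table, and only checks via EXPLAIN QUERY PLAN that the planner picks the index; it says nothing about the actual speedup on your data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE my_model (
        id INTEGER PRIMARY KEY,
        content_type_id INTEGER NOT NULL,
        object_id INTEGER NOT NULL
    )
""")
# Three content types, each linking to objects with pks 1..1000 (pks repeat across types)
conn.executemany(
    "INSERT INTO my_model (content_type_id, object_id) VALUES (?, ?)",
    [(ct, obj) for ct in range(1, 4) for obj in range(1, 1001)],
)
# Composite, non-unique index covering both halves of the generic FK
conn.execute("CREATE INDEX my_model_ct_obj ON my_model (content_type_id, object_id)")


def plan(sql, params):
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable description
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


query = "SELECT id FROM my_model WHERE content_type_id = ? AND object_id = ?"
details = plan(query, (2, 500))
matches = conn.execute(query, (2, 500)).fetchall()
print(details, matches)
```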

django select_related for multiple foreign keys

How does select_related work with a model which has multiple foreign keys? Does it just choose the first one?
class Model:
fkey1, fkey2, fkey3...
The documentation doesn't say anything about this, at least not in where the method is specified.
NOTE: Maybe I'm the only one who will get confused. I guess select_related is just a performance booster (I can see that) but I had the wrong idea that it was something else.
You can chain select_related calls as follows:
Comment.objects.select_related('user').select_related('article').all()
If your model has multiple foreign keys you can:
Call .select_related(), that will “follow” all non-null foreign-key relationships
Call .select_related('foreign_key1', 'foreign_key2', ...), which will “follow” only the foreign keys provided as arguments.
Note that "to follow a FK relationship" means selecting additional related-object data when the query is executed (by performing a SQL join). This makes the main query heavier but can be used to avoid the N + 1 queries problem.
According to select_related documentation, the first method (without arguments) is not recommended as "it is likely to make the underlying query more complex, and return more data, than is actually needed."
If your model has "nested" foreign keys to other models (e.g. Book <>-- Author <>-- Hometown), you can also use select_related as follows:
Call Book.objects.select_related('author__hometown'), which will “follow” the author foreign key (in the Book model) and the hometown foreign key (in the Author model).
If your model has many-to-many or many-to-one relations you would like to retrieve from the database, you should take a look at prefetch_related.
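For comparison, the Django docs describe prefetch_related as doing a separate lookup for each relationship and doing the "joining" in Python. The sqlite3 sketch below imitates that strategy for a reverse foreign key (an author and its books) using hypothetical tables: one query for the authors, one IN (...) query for all of their books, then grouping in Python. It illustrates the idea only and is not Django's actual implementation.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       author_id INTEGER REFERENCES author(id));
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (1, 'A1', 1), (2, 'A2', 1), (3, 'B1', 2);
""")


def authors_with_books(conn):
    # Query 1: the "main" queryset
    authors = conn.execute("SELECT id, name FROM author ORDER BY id").fetchall()
    ids = [author_id for author_id, _ in authors]
    # Query 2: every related row at once, filtered by the collected keys
    placeholders = ", ".join("?" for _ in ids)
    books = conn.execute(
        f"SELECT author_id, title FROM book WHERE author_id IN ({placeholders}) ORDER BY id",
        ids,
    ).fetchall()
    # The "join" happens in Python, not in SQL
    by_author = defaultdict(list)
    for author_id, title in books:
        by_author[author_id].append(title)
    return {name: by_author[author_id] for author_id, name in authors}


queries = []
conn.set_trace_callback(queries.append)
result = authors_with_books(conn)
conn.set_trace_callback(None)
print(result, len(queries))
```

Two queries are executed however many authors there are, which is why prefetch_related suits many-to-many and reverse relations where a single JOIN would duplicate the parent rows.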
On the contrary, the documentation is very clear on the matter. It says that when called with no arguments, select_related follows all non-null ForeignKeys, but you can give the method a list of fields and it will only follow those relationships.
You can pass foreign keys and even nested foreign keys to the select_related method, e.g. select_related('book__author', 'publisher'), and you can add as many foreign keys as you want. If you call select_related() without any arguments, it will follow all non-null foreign key relationships, which is not recommended because you'll be complicating the query by fetching data you don't need. Finally, from the Django documentation: "Chaining select_related calls works in a similar way to other methods - that is that select_related('foo', 'bar') is equivalent to select_related('foo').select_related('bar')."
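To show what following several foreign keys at once looks like in SQL, here is a sqlite3 sketch of roughly the query shape behind Book.objects.select_related('author__hometown', 'publisher'), using hypothetical tables. It assumes author and hometown are non-null while publisher is nullable: the non-null chain can use INNER JOINs, while the nullable FK needs a LEFT OUTER JOIN so books without a publisher are still returned. Everything arrives in one query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hometown (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                         hometown_id INTEGER NOT NULL REFERENCES hometown(id));
    CREATE TABLE publisher (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                       author_id INTEGER NOT NULL REFERENCES author(id),
                       publisher_id INTEGER REFERENCES publisher(id));  -- nullable FK
    INSERT INTO hometown VALUES (1, 'Oslo');
    INSERT INTO author VALUES (1, 'Ann', 1);
    INSERT INTO publisher VALUES (1, 'Acme');
    INSERT INTO book VALUES (1, 'First', 1, 1), (2, 'Second', 1, NULL);
""")

# Nested FK (author -> hometown) plus a second FK (publisher) in a single query
SQL = """
    SELECT b.title, a.name, h.name, p.name
    FROM book b
    INNER JOIN author a ON a.id = b.author_id
    INNER JOIN hometown h ON h.id = a.hometown_id
    LEFT OUTER JOIN publisher p ON p.id = b.publisher_id
    ORDER BY b.id
"""
rows = conn.execute(SQL).fetchall()
print(rows)
```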