Django: How to load related model objects when instantiating a model

I have two related models (1-n). From the parent model, I am doing a lot of operations on the child model. For each operation I am calling:
ItensOrder.objects.filter(order=self.pk)
Inside the Order class, which is the parent, I am using the child objects several times, like this:
def total(self):
    itens = ItensOrder.objects.filter(order=self.pk)
    valor = sum(Counter(item.price * item.quantity for item in itens))
    return str(valor)

def details(self):
    itens = ItensOrder.objects.filter(order=self.pk)
    return format_html_join('\n', "{} ({} x {} = {})<br/>",
        ((item.item.name, str(item.quantity), item.price, str(item.price * item.quantity)) for item in itens))
What is the best way to load the related objects ONLY ONCE, so I can avoid hitting the database every time I need them?
I've been trying this on the parent model:
def __init__(self, *args, **kwargs):
    if self.pk is not None:
        self.itens = ItensOrder.objects.filter(order=self.pk)
    else:
        self.itens = None
But this is wrong....
Can anybody help, please?

You can access the related child objects through the reverse accessor of the ForeignKey field:
order = Order.objects.get(id=1)
itens = order.itensorder_set.all()
By default this reverse accessor is the lowercased model name followed by "_set"; you can change it by setting related_name on the foreign key.
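For example, with a hypothetical related_name the accessor changes accordingly (sketch only; 'items' is an assumption, not part of the original models):

class ItensOrder(models.Model):
    # 'items' is a hypothetical related_name; without it the reverse accessor is itensorder_set
    order = models.ForeignKey('Order', on_delete=models.CASCADE, related_name='items')

order = Order.objects.get(id=1)
itens = order.items.all()   # instead of order.itensorder_set.all()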
You can pre-populate the reverse accessor with a cache of all the related objects by using prefetch_related:
order = Order.objects.prefetch_related('itensorder_set').get(id=1)
order.itensorder_set.all() # This can be called multiple times but will not hit the database
In your case
class Order(models.Model):
    def total(self):
        valor = sum(Counter(item.price * item.quantity for item in self.itensorder_set.all()))
        return str(valor)

    def details(self):
        return format_html_join('\n', "{} ({} x {} = {})<br/>",
            ((item.item.name, str(item.quantity), item.price, str(item.price * item.quantity)) for item in self.itensorder_set.all()))
and in your model admin, override get_queryset:
def get_queryset(self, request):
    return super().get_queryset(request).prefetch_related('itensorder_set')
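For context, a minimal sketch of where that override lives (the OrderAdmin class name and list_display are assumptions, not from the original question):

from django.contrib import admin

@admin.register(Order)
class OrderAdmin(admin.ModelAdmin):  # hypothetical admin class
    list_display = ('id', 'total', 'details')

    def get_queryset(self, request):
        # one prefetch per changelist page instead of one query per row
        return super().get_queryset(request).prefetch_related('itensorder_set')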

You can use select_related inside your functions:
def total(self):
    itens = ItensOrder.objects.select_related('order').filter(order=self)
    valor = sum(Counter(item.price * item.quantity for item in itens))
    return str(valor)

def details(self):
    itens = ItensOrder.objects.select_related('order').filter(order=self)
    return format_html_join('\n', "{} ({} x {} = {})<br/>",
        ((item.item.name, str(item.quantity), item.price, str(item.price * item.quantity)) for item in itens))

Related

Optimising number of queries within a DRF ModelSerializer

Within Django Rest Framework's serialiser it is possible to add more data to the serialised object than in the original Model.
This is useful for when calculating statistical information, on the server-side, and adding this extra information when responding to an API call.
As I understand, adding extra data is done using a SerializerMethodField, where each field is implemented by a get_... function.
However, if you have a number of these SerializerMethodFields, each one can be querying the Model/database separately, for what might be essentially the same data.
Is it possible to query the database once, store the list/result as a data member of the ModelSerializer object, and use the result of the queryset in many functions?
Here's a very simple example, just for illustration:
############## Model
class Employee(Model):
    SALARY_TYPE_CHOICES = (('HR', 'Hourly Rate'), ('YR', 'Annual Salary'))
    salary_type = CharField(max_length=2, choices=SALARY_TYPE_CHOICES, blank=False)
    salary = PositiveIntegerField(blank=True, null=True, default=0)
    company = ForeignKey(Company, related_name='employees')

class Company(Model):
    name = CharField(verbose_name='company name', max_length=100)

############## View
class CompanyView(RetrieveAPIView):
    queryset = Company.objects.all()
    lookup_field = 'id'
    serializer_class = CompanySerialiser

class CompanyListView(ListAPIView):
    queryset = Company.objects.all()
    serializer_class = CompanySerialiser

############## Serializer
class CompanySerialiser(ModelSerializer):
    number_employees = SerializerMethodField()
    total_salaries_estimate = SerializerMethodField()

    class Meta:
        model = Company
        fields = ['id', 'name',
                  'number_employees',
                  'total_salaries_estimate',
                  ]

    def get_number_employees(self, obj):
        return obj.employees.count()

    def get_total_salaries_estimate(self, obj):
        employee_list = obj.employees.all()
        salaries_estimate = 0
        HOURS_PER_YEAR = 8 * 200  # 8hrs/day, 200days/year
        for empl in employee_list:
            if empl.salary_type == 'YR':
                salaries_estimate += empl.salary
            elif empl.salary_type == 'HR':
                salaries_estimate += empl.salary * HOURS_PER_YEAR
        return salaries_estimate
The Serialiser can be optimised to:
use an object data member to store the result from the query set,
only retrieve the queryset once,
re-use the result of the queryset for all extra information provided in SerializerMethodFields.
Example:
class CompanySerialiser(ModelSerializer):
    def __init__(self, *args, **kwargs):
        super(CompanySerialiser, self).__init__(*args, **kwargs)
        self.employee_list = None

    number_employees = SerializerMethodField()
    total_salaries_estimate = SerializerMethodField()

    class Meta:
        model = Company
        fields = ['id', 'name',
                  'number_employees',
                  'total_salaries_estimate',
                  ]

    def _populate_employee_list(self, obj):
        if not self.employee_list:  # Query the database only once.
            self.employee_list = obj.employees.all()

    def get_number_employees(self, obj):
        self._populate_employee_list(obj)
        return len(self.employee_list)

    def get_total_salaries_estimate(self, obj):
        self._populate_employee_list(obj)
        salaries_estimate = 0
        HOURS_PER_YEAR = 8 * 200  # 8hrs/day, 200days/year
        for empl in self.employee_list:
            if empl.salary_type == 'YR':
                salaries_estimate += empl.salary
            elif empl.salary_type == 'HR':
                salaries_estimate += empl.salary * HOURS_PER_YEAR
        return salaries_estimate
This works for the single-retrieve CompanyView, and in fact it saves one query/round-trip to the database, since the "count" query is eliminated.
However, it does not work for the list view CompanyListView, because the serialiser object is created once and reused for each Company. Only the first Company's list of employees is stored in the object's "self.employee_list" data member, so all other companies erroneously get the data from the first company.
Is there a best practice solution to this type of problem? Or am I just wrong to use the ListAPIView, and if so, is there an alternative?
I think this issue can be solved if you can pass the queryset to the CompanySerialiser with data already fetched.
You can do the following changes
class CompanyListView(ListAPIView):
    queryset = Company.objects.all().prefetch_related('employees')
    serializer_class = CompanySerialiser
And instead of count(), use len(), because count() queries the database again.
class CompanySerialiser(ModelSerializer):
    number_employees = SerializerMethodField()
    total_salaries_estimate = SerializerMethodField()

    class Meta:
        model = Company
        fields = ['id', 'name',
                  'number_employees',
                  'total_salaries_estimate',
                  ]

    def get_number_employees(self, obj):
        return len(obj.employees.all())

    def get_total_salaries_estimate(self, obj):
        employee_list = obj.employees.all()
        salaries_estimate = 0
        HOURS_PER_YEAR = 8 * 200  # 8hrs/day, 200days/year
        for empl in employee_list:
            if empl.salary_type == 'YR':
                salaries_estimate += empl.salary
            elif empl.salary_type == 'HR':
                salaries_estimate += empl.salary * HOURS_PER_YEAR
        return salaries_estimate
Since the data is prefetched, the serializer will not issue any additional queries. But make sure you do not apply any further filter to the related queryset, because that would trigger another query.
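A minimal sketch of that caveat, using the question's model names (assumed to be unchanged):

company = Company.objects.prefetch_related('employees').get(id=1)

company.employees.all()                      # served from the prefetch cache, no query
[e for e in company.employees.all()          # still no query: filter in Python
 if e.salary_type == 'HR']
company.employees.filter(salary_type='HR')   # new queryset -> hits the database again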
As mentioned by @Ritesh Agrawal, you simply need to prefetch the data. However, I advise doing the aggregations directly inside the database instead of in Python:
class CompanySerializer(ModelSerializer):
    number_employees = IntegerField()
    total_salaries_estimate = FloatField()

    class Meta:
        model = Company
        fields = ['id', 'name',
                  'number_employees',
                  'total_salaries_estimate', ...
                  ]

class CompanyListView(ListAPIView):
    queryset = Company.objects.annotate(
        number_employees=Count('employees'),
        total_salaries_estimate=Sum(
            Case(
                When(employees__salary_type=Value('HR'),
                     then=F('employees__salary') * Value(8 * 200)
                     ),
                default=F('employees__salary'),
                output_field=IntegerField()  # optional a priori, because you only manipulate integers
            )
        )
    )
    serializer_class = CompanySerializer
Notes:
I haven't tested this code, but I'm using the same kind of syntax in my own projects. If you encounter errors (like 'cannot determine type of output' or similar), try wrapping F('employees__salary') * Value(8 * 200) inside an ExpressionWrapper(..., output_field=IntegerField()).
Using aggregation, you can apply filters on the queryset afterwards. However, if you're prefetching your related Employees, then you cannot filter the related objects anymore (as mentioned in the previous answer). BUT, if you already know you'll need the list of employees with hourly rate, you can do .prefetch_related(Prefetch('employees', queryset=Employee.objects.filter(salary_type='HR'), to_attr="hourly_rate_employees")).
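A hedged sketch combining both notes, under the same assumed model names (untested, like the answer above):

from django.db.models import (Case, Count, ExpressionWrapper, F, IntegerField,
                              Prefetch, Sum, Value, When)

queryset = (
    Company.objects
    .annotate(
        number_employees=Count('employees'),
        total_salaries_estimate=Sum(Case(
            When(employees__salary_type=Value('HR'),
                 # ExpressionWrapper makes the output type explicit
                 then=ExpressionWrapper(F('employees__salary') * Value(8 * 200),
                                        output_field=IntegerField())),
            default=F('employees__salary'),
            output_field=IntegerField(),
        ))
    )
    # keep only hourly-rate employees in a separately cached attribute
    .prefetch_related(Prefetch('employees',
                               queryset=Employee.objects.filter(salary_type='HR'),
                               to_attr='hourly_rate_employees'))
)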
Relevant documentation:
Query optimization
Aggregation
Hope this will help you ;)

AND search with reverse relations

I'm working on a django project with the following models.
class User(models.Model):
    pass

class Item(models.Model):
    user = models.ForeignKey(User)
    item_id = models.IntegerField()
There are about 10 million items and 100 thousand users.
My goal is to override the default admin search that takes forever and
return all the matching users that own "all" of the specified item ids within a reasonable timeframe.
These are a couple of the tests I use to better illustrate my criteria.
class TestSearch(TestCase):
    def search(self, searchterm):
        """A tuple is returned with the first element as the queryset"""
        return do_admin_search(User.objects.all())

    def test_return_matching_users(self):
        user = User.objects.create()
        Item.objects.create(item_id=12345, user=user)
        Item.objects.create(item_id=67890, user=user)
        result = self.search('12345 67890')
        assert_equal(1, result[0].count())
        assert_equal(user, result[0][0])

    def test_exclude_users_that_do_not_match_1(self):
        user = User.objects.create()
        Item.objects.create(item_id=12345, user=user)
        result = self.search('12345 67890')
        assert_false(result[0].exists())

    def test_exclude_users_that_do_not_match_2(self):
        user = User.objects.create()
        result = self.search('12345 67890')
        assert_false(result[0].exists())
The following snippet is my best attempt using annotate that takes over 50 seconds.
def search_by_item_ids(queryset, item_ids):
    params = {}
    for i in item_ids:
        cond = Case(When(item__item_id=i, then=True), output_field=BooleanField())
        params['has_' + str(i)] = cond
    queryset = queryset.annotate(**params)
    params = {}
    for i in item_ids:
        params['has_' + str(i)] = True
    queryset = queryset.filter(**params)
    return queryset
Is there anything I can do to speed it up?
Here are some quick suggestions that should improve performance drastically.
Use prefetch_related on the initial queryset to get the related items:
queryset = User.objects.filter(...).prefetch_related('item_set')
Filter with the __in operator instead of looping through a list of IDs
def search_by_item_ids(queryset, item_ids):
    return queryset.filter(item__item_id__in=item_ids)
Don't annotate if it's already a condition of the query
Since you already know that this queryset only contains users whose items have ids in the item_ids list, there is no need to annotate that fact onto each object.
Putting it all together
You can speed up what you are doing drastically just by calling:
queryset = User.objects.filter(
    item__item_id__in=item_ids
).prefetch_related('item_set')
with only 2 db hits for the full query.

How to prefetch descendants with django-treebeard's MP_Node?

I'm developing an application with a hierarchical data structure in django-rest-framework using django-treebeard. My (simplified) main model looks like this
class Task(MP_Node):
    name = models.CharField(_('name'), max_length=64)
    started = models.BooleanField(default=True)
What I'm currently trying to achieve is a list view of all root nodes which shows extra fields (such as whether all children have started). To do this I specified a view:
class TaskViewSet(viewsets.ViewSet):
    def retrieve(self, request, pk=None):
        queryset = Task.get_tree().filter(depth=1, job__isnull=True)
        operation = get_object_or_404(queryset, pk=pk)
        serializer = TaskSerializer(operation)
        return Response(serializer.data)
and serializer
class TaskSerializer(serializers.ModelSerializer):
    are_children_started = serializers.SerializerMethodField()

    def get_are_children_started(self, obj):
        return all(task.started for task in Task.get_tree(obj))
This all works and I get the expected results. However, I run into an N+1 query problem where for each root task I need to fetch all children separately. Normally this would be solvable using prefetch_related, but as I use the Materialized Path structure from django-treebeard, there are no Django relationships between the task models, so prefetch_related doesn't know what to do out of the box. I've tried to use custom Prefetch objects, but as this still requires a Django relation path I could not get it to work.
My current idea is to extend the Task model with a foreign key pointing to its root node, like so:
root_node = models.ForeignKey('self', null=True,
                              related_name='descendant_tasks',
                              verbose_name=_('root task')
                              )
in order to make the MP relationship explicit so it can be queried. However, this does feel like a somewhat non-DRY way of doing it, so I wonder whether anyone has another suggestion on how to tackle it.
In the end I did add a foreign key to each task pointing to its root node, like so:
root_node = models.ForeignKey('self', null=True,
                              related_name='descendant_tasks',
                              verbose_name=_('root task')
                              )
I updated my save method on my Task model to make sure I always point to the correct root node
def save(self, force_insert=False, force_update=False, using=None, update_fields=None):
    try:
        self.root_node = self.get_root()
    except ObjectDoesNotExist:
        self.root_node = None
    return super(Task, self).save(force_insert=force_insert, force_update=force_update,
                                  using=using, update_fields=update_fields)
and this allows me to simply prefetch all descendants using prefetch_related('descendant_tasks').
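A rough usage sketch, assuming the field and related_name are exactly as declared above:

# roots plus all of their descendants cached in one extra query (assumed names)
roots = Task.get_tree().filter(depth=1).prefetch_related('descendant_tasks')
for root in roots:
    started = all(t.started for t in root.descendant_tasks.all())  # no extra query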
Whenever I need to have the descendants in a nested fashion I use the following function to nest the flattened list of descendants again
from operator import attrgetter

def build_nested(tasks):
    def get_basepath(path, depth):
        return path[0:depth * Task.steplen]

    container, link = [], {}
    for task in sorted(tasks, key=attrgetter('depth')):
        depth = int(len(task.path) / Task.steplen)
        try:
            parent_path = get_basepath(task.path, depth - 1)
            parent_obj = link[parent_path]
            if not hasattr(parent_obj, 'sub_tasks'):
                parent_obj.sub_tasks = []
            parent_obj.sub_tasks.append(task)
        except KeyError:  # Append it as root task if no parent exists
            container.append(task)
        link[task.path] = task
    return container
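For instance, a hypothetical call site (assuming the whole tree fits comfortably in memory):

flat_tasks = list(Task.get_tree())   # one query for every node in the tree
roots = build_nested(flat_tasks)     # root tasks, each with a .sub_tasks list attached
for root in roots:
    for child in getattr(root, 'sub_tasks', []):
        ...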
If you want to avoid using a Foreign Key you can iterate over the queryset and re-create the tree structure in memory.
In my case I wanted to have a template tag (much like django-mptt's recursetree templatetag) to show multiple levels of nested pages with only one database query. Basically copying mptt.utils.get_cached_trees I ended up with this:
def get_cached_trees(queryset: QuerySet) -> list:
    """Return top-most pages / roots.

    Each page will have its children stored in `_cached_children` attribute
    and its parent in `_cached_parent`. This avoids having to query the database.
    """
    top_nodes: list = []
    path: list = []
    for obj in queryset:
        obj._cached_children = []
        if obj.depth == queryset[0].depth:
            add_top_node(obj, top_nodes, path)
        else:
            while not is_child_of(obj, parent := path[-1]):
                path.pop()
            add_child(parent, obj)
        if obj.numchild:
            path.append(obj)
    return top_nodes

def add_top_node(obj: MP_Node, top_nodes: list, path: list) -> None:
    top_nodes.append(obj)
    path.clear()

def add_child(parent: MP_Node, obj: MP_Node) -> None:
    obj._cached_parent = parent
    parent._cached_children.append(obj)

def is_child_of(child: MP_Node, parent: MP_Node) -> bool:
    """Return whether `child` is a sub page of `parent` without database query.

    `_get_children_path_interval` is an internal method of MP_Node.
    """
    start, end = parent._get_children_path_interval(parent.path)
    return start < child.path < end
It can be used like this to avoid the dreaded N+1 query problem:
for page in get_cached_trees(queryset):
    for child in page._cached_children:
        ...

Django: Building a QuerySet Mixin for a model and a related model

My question is about creating a QuerySet Mixin which provides identical QuerySet methods for both a model and a related model. Here is example code, and the first class ByPositionMixin is what I am focused on:
from django.db import models
from django.db.models.query import QuerySet
from django.core.exceptions import FieldError

class ByPositionMixin(object):
    def batters(self):
        try:
            return self.exclude(positions=1)
        except FieldError:
            return self.exclude(position=1)

class PlayerQuerySet(QuerySet, ByPositionMixin):
    pass

class PlayerPositionQuerySet(QuerySet, ByPositionMixin):
    pass

class PlayerManager(models.Manager):
    def get_query_set(self):
        return PlayerQuerySet(self.model, using=self._db)

class PlayerPositionManager(models.Manager):
    def get_query_set(self):
        return PlayerPositionQuerySet(self.model, using=self._db)

class Position(models.Model):
    # pos_list in order ('P', 'C', '1B', '2B', '3B', 'SS', 'LF', 'CF', 'RF')
    # pos id / pk correspond to index value of pos_list(pos)
    pos = models.CharField(max_length=2)

class Player(models.Model):
    name = models.CharField(max_length=100)
    positions = models.ManyToManyField(Position, through='PlayerPosition')
    objects = PlayerManager()

class PlayerPosition(models.Model):
    player = models.ForeignKey(Player)
    position = models.ForeignKey(Position)
    primary = models.BooleanField()
    objects = PlayerPositionManager()
Inside ByPositionMixin, I try exclude(positions=1), which queries against PlayerQuerySet, and if that raises a FieldError, I try exclude(position=1), which queries against PlayerPositionQuerySet. The difference in field names is precise: a Player() has positions, but a PlayerPosition() has only one position. So the difference in the exclude() query is 'positions' / 'position'. Since I will have many custom queries (e.g. batters(), pitchers(), by_position() etc.), do I have to write out try / except code for each one?
Or is there a different approach which would let me write custom queries without having to try against one model and then against the other one?
UPDATE: basically, I have decided to write a kwarg helper function, which provides the correct kwargs for both Player and PlayerPosition. It's a little elaborate (and perhaps completely unnecessary), but it should simplify the code for several custom queries.
class ByPositionMixin(object):
    def pkw(self, **kwargs):
        # returns appropriate kwargs; at the moment, only handles one kwarg
        key = kwargs.keys()[0]  # e.g. 'positions__in'
        value = kwargs[key]
        key_args = key.split('__')
        if self.model.__name__ == 'Player':
            first_arg = 'positions'
        elif self.model.__name__ == 'PlayerPosition':
            first_arg = 'position'
        else:
            first_arg = key_args[0]
        key = '__'.join([first_arg] + key_args[1:])
        return {key: value}

    def batters(self):  # shows how pkw() is used
        return self.exclude(**self.pkw(positions=1))

Reducing queries for manytomany models in django

EDIT:
It turns out the real question is - how do I get select_related to follow the m2m relationships I have defined? Those are the ones that are taxing my system. Any ideas?
I have two classes for my django app. The first (Item class) describes an item along with some functions that return information about the item. The second class (Itemlist class) takes a list of these items and then does some processing on them to return different values. The problem I'm having is that returning a list of items from Itemlist is taking a ton of queries, and I'm not sure where they're coming from.
class Item(models.Model):
    # for archiving purposes
    archive_id = models.IntegerField()
    users = models.ManyToManyField(User, through='User_item_rel',
                                   related_name='users_set')
    # for many to one relationship (tags)
    tag = models.ForeignKey(Tag)
    sub_tag = models.CharField(default='', max_length=40)
    name = models.CharField(max_length=40)
    purch_date = models.DateField(default=datetime.datetime.now())
    date_edited = models.DateTimeField(auto_now_add=True)
    price = models.DecimalField(max_digits=6, decimal_places=2)
    buyer = models.ManyToManyField(User, through='Buyer_item_rel',
                                   related_name='buyers_set')
    comments = models.CharField(default='', max_length=400)
    house_id = models.IntegerField()

    class Meta:
        ordering = ['-purch_date']

    def shortDisplayBuyers(self):
        if len(self.buyer_item_rel_set.all()) != 1:
            return "multiple buyers"
        else:
            return self.buyer_item_rel_set.all()[0].buyer.name

    def listBuyers(self):
        return self.buyer_item_rel_set.all()

    def listUsers(self):
        return self.user_item_rel_set.all()

    def tag_name(self):
        return self.tag

    def sub_tag_name(self):
        return self.sub_tag

    def __unicode__(self):
        return self.name
and the second class:
class Item_list:
    def __init__(self, list=None, house_id=None, user_id=None,
                 archive_id=None, houseMode=0):
        self.list = list
        self.house_id = house_id
        self.uid = int(user_id)
        self.archive_id = archive_id
        self.gen_balancing_transactions()
        self.houseMode = houseMode

    def ret_list(self):
        return self.list
So after I construct Itemlist with a large list of items, Itemlist.ret_list() takes up to 800 queries for 25 items. What can I do to fix this?
Try using select_related
As per a question I asked here
Dan is right in telling you to use select_related.
select_related can be read about here.
What it does is return in the same query data for the main object in your queryset and the model or fields specified in the select_related clause.
So, instead of a query like:
select * from item
followed by several queries like this every time you access one of the item_list objects:
select * from item_list where item_id = <one of the items for the query above>
the ORM will generate a query like:
select a.*, b.*
from item a
join item_list b on a.id = b.item_id
In other words: it will hit the database once for all the data.
You probably want to use prefetch_related
It works similarly to select_related, but can deal with relations that select_related cannot. The join happens in Python, but I've found it to be more efficient for this kind of work than the large number of queries.
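A hedged sketch for the models above (assuming the through models expose buyer and user foreign keys, as the Item methods suggest):

# Prefetch the through-table rows (and the people they point to) that the
# Item methods above touch, plus the Tag foreign key.
items = (Item.objects
         .select_related('tag')                          # FK: fetched in the same query
         .prefetch_related('buyer_item_rel_set__buyer',  # through rows + buyers
                           'user_item_rel_set__user'))   # through rows + users (assumed field name)

for item in items:
    item.shortDisplayBuyers()   # served from the prefetch cache, no per-item queries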
Related reading on the subject