Django: pagination with prefetch_related

I have a model of specifications that are divided into categories. To display each category as a header in the table in the template, I use prefetch_related on Category like this:
categories = Category.objects.distinct().prefetch_related('specifications').filter(filters)
Now I can loop over the categories and show the related specifications like:
{% for category in categories %}
<tr>
<th colspan="7">{{ category.name }} - ({{ category.abbr }})</th>
</tr>
{% for specification in category.specifications.all %}
...
I also want to use the paginator, but now the paginator only counts the categories and not the related specifications. Is it possible to paginate on the specifications with the given query or should I change the query to retrieve all the specifications?

Use a Prefetch object:
from django.db.models import Prefetch
categories = Category.objects.distinct().filter(filters)
category_ids = categories.values_list('id', flat=True)  # category ids on page
categories = categories.prefetch_related(
    Prefetch(
        'specifications',
        queryset=Specification.objects.filter(category_id__in=category_ids),
    )
)
This adds an extra lookup for the category ids (Django runs it as a subquery unless you evaluate the list first), but I think it will cost less than prefetching all specifications. It depends on your data structure, but it is definitely one possible solution.

Have you tried django-tables2?
With it you can render the table without writing the loops yourself. Create a CategoryTable class and use it in your view:
import django_tables2 as tables
from django.shortcuts import render
class CategoryTable(tables.Table):
    class Meta:
        model = Category
def your_view(request):
    ...
    categories = (
        Category.objects.distinct()
        .prefetch_related('specifications')
        .filter(filters)
    )
    table = CategoryTable(categories)
    table.paginate(page=request.GET.get("page", 1), per_page=25)
    return render(request, "category_template.html", {"table": table})
Then, in your template just put:
{% load django_tables2 %}
{% render_table table %}

What you are trying to achieve is an anti-pattern for prefetch_related. Prefetching fetches all of the related rows, whereas your use case is to paginate the specifications.
Prefetching works well when the number of related rows per parent row is small (around 5, or up to 10). Remember that DB network bandwidth is wasted as the number of prefetched child rows grows, so if you are considering pagination, it is best to avoid prefetch.
Note: in your use case, child rows = specifications and parent rows = categories.
My answer would be to avoid prefetch in this case. Just use the following:
categories = Category.objects.distinct().filter(filters)
--
{% for category in categories %}
<tr>
<th colspan="7">{{ category.name }} - ({{ category.abbr }})</th>
</tr>
{# Use some table lib like django-tables #}
...
If this is a simple internal project, go ahead with prefetch; otherwise you are going to hit DB performance issues.
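If the goal is to paginate on the specifications themselves, one option is to query the specifications ordered by category, paginate that flat list, and rebuild the category headers for each page. The following is a minimal plain-Python sketch of that idea, not code from the answers above: `Spec`, `paginate`, and `group_page` are hypothetical names, and in a real view the slicing would presumably be done by Django's Paginator over a specification queryset ordered by category.

```python
from collections import namedtuple
from itertools import groupby

# Hypothetical stand-in for a specification row carrying its category name.
Spec = namedtuple("Spec", ["category", "name"])


def paginate(rows, page, per_page):
    """Return the slice of rows that belongs to a 1-based page number."""
    start = (page - 1) * per_page
    return rows[start:start + per_page]


def group_page(rows):
    """Group one page of rows (already ordered by category) under category headers."""
    return [(category, [r.name for r in items])
            for category, items in groupby(rows, key=lambda r: r.category)]


specs = [
    Spec("Engine", "power"), Spec("Engine", "torque"), Spec("Engine", "fuel"),
    Spec("Body", "length"), Spec("Body", "width"),
]
# Rows must be ordered by category so that groupby sees each category as one run.
specs.sort(key=lambda r: r.category)

page_1 = group_page(paginate(specs, page=1, per_page=3))
page_2 = group_page(paginate(specs, page=2, per_page=3))
```

With this approach the page size counts specifications, and a category that straddles a page boundary simply gets its header repeated on the next page.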

Related

Django: Join two tables in a single query

In my template I have a table which I want to populate with data from two different tables which use foreign-keys.
The solution I found while googling is to use values(); my queryset is below:
data = Table1.objects.all().filter(date=today).values('table2__price')
My template
{% for item in data %}
<tr class="original">
<td>{{ item.fruit }}</td>
<td>{{ item.table2__price }}</td>
</tr>
{% endfor %}
It does fetch the price from table2, but it also makes the data from Table1 (item.fruit) disappear. If I remove the values() call, Table1 populates as expected.
Does anyone have feedback on what I am doing wrong?
Thanks
If you use .values(…) [Django-doc], you retrieve a queryset of dictionaries that contain only the values specified. This is not a good idea: not only does it, as you found out, drop the other values of the model, it also "erodes" the model layer, since each item is no longer a Table1 object but a dictionary. This is a form of the primitive obsession antipattern [refactoring.guru].
You can make use of .annotate(…) [Django-doc] to add extra attributes for the elements that arise from this queryset:
from django.db.models import F
data = Table1.objects.filter(date=today).annotate(
    table2__price=F('table2__price')
)
If there is however a ForeignKey (or OneToOneField) from Table1 to Table2, you can select the columns of Table2 in the same query with .select_related(…) [Django-doc]:
data = Table1.objects.filter(date=today).select_related('table2')
In the template, you can then use:
{{ item.fruit }}
{{ item.table2.price }}
{{ item.table2.other_attr }}
The .select_related(…) part is not strictly necessary, but without it, accessing item.table2 triggers an extra query per item, which produces an N+1 problem.
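To make the N+1 effect concrete, here is a small sketch that uses sqlite3 directly instead of the Django ORM. The table layout and the `run` helper are assumptions made up for the illustration; the single JOIN is only roughly what select_related would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, price REAL);
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, fruit TEXT,
                         table2_id INTEGER REFERENCES table2(id));
    INSERT INTO table2 VALUES (1, 0.5), (2, 0.25), (3, 1.0);
    INSERT INTO table1 VALUES (1, 'apple', 1), (2, 'banana', 2), (3, 'cherry', 3);
""")

executed = []  # every SELECT issued through run(), so the strategies can be compared


def run(sql, params=()):
    """Execute a SELECT, record it, and return all rows."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


# Strategy 1: one query for the items, then one more per item (N+1).
executed.clear()
lazy = []
for fruit, table2_id in run("SELECT fruit, table2_id FROM table1 ORDER BY id"):
    (price,) = run("SELECT price FROM table2 WHERE id = ?", (table2_id,))[0]
    lazy.append((fruit, price))
lazy_queries = len(executed)

# Strategy 2: a single JOIN that returns both tables' columns at once.
executed.clear()
joined = run("""
    SELECT t1.fruit, t2.price
    FROM table1 AS t1 JOIN table2 AS t2 ON t2.id = t1.table2_id
    ORDER BY t1.id
""")
joined_queries = len(executed)
```

Both strategies return the same rows, but the first issues one query plus one per item, while the second issues a single query no matter how many items there are.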

Using queryset manager with prefetch_related

I have been successfully using this brilliant technique to keep my code DRY, encapsulating ORM relations in querysets so that the code in views stays simple and contains no foreign key dependencies. But this time I face the following issue, best described by code:
View:
vendors_qs = vendors_qs.select_related().prefetch_related('agreement_vendors')
Model:
class AgreementVendorsQuerySet(models.query.QuerySet):
    def some_filter1(self, option):
        result = self.filter(.....)
        return result
    def some_filter2(self, option):
        result = self.filter(.....)
        return result
And a template:
{% for vendor in vendors_qs %}
<tr>
    ...
    <td>
    {% for vend_agr in vendor.agreement_vendors.all %}
        {{ vend_agr.serial_number }}
    {% endfor %}
    </td>
</tr>
{% endfor %}
The question is: how and where do I apply some_filter1 to the vendor agreements, given that they are fetched through a prefetch_related relation? Do I have to apply the filter in the template somehow, or in the view itself?
If I didn't put the question clearly enough, I will answer your questions to clarify further...
UPDATE:
Anna's answer looks very much like the truth, but one detail remains unclear. What if I want to apply several filters based on if-conditions? For example, if the filters applied to vendors, the code would simply look like:
if condition_1:
    vendors_qs = vendors_qs.filter1()
if condition_2:
    vendors_qs = vendors_qs.filter2()
If I understand your question correctly, you need something like this:
vendors_qs = vendors_qs.prefetch_related(
    models.Prefetch(
        'agreement_vendors',
        queryset=some_filter,  # a filtered queryset of the agreement model
        to_attr='agreement_vendors_list',
    )
)
Then in the template you can iterate over it with {% for vend_agr in vendor.agreement_vendors_list %}.
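For the conditional case raised in the UPDATE, one possible approach is to build the agreements queryset step by step first and only then wrap it in Prefetch. The helper below is a sketch under assumptions: the function name and its parameters are hypothetical, and the Prefetch class is passed in as `prefetch_cls` so that the snippet itself does not import Django.

```python
def prefetch_filtered_agreements(vendors_qs, agreements_qs, prefetch_cls,
                                 option, condition_1=False, condition_2=False):
    """Chain the custom filters conditionally, then hand the result to Prefetch."""
    if condition_1:
        agreements_qs = agreements_qs.some_filter1(option)
    if condition_2:
        agreements_qs = agreements_qs.some_filter2(option)
    # The filtered queryset is only evaluated when the prefetch runs.
    return vendors_qs.prefetch_related(
        prefetch_cls(
            "agreement_vendors",
            queryset=agreements_qs,
            to_attr="agreement_vendors_list",
        )
    )
```

In a view this might be called as prefetch_filtered_agreements(vendors_qs, AgreementVendors.objects.all(), Prefetch, option, condition_1=True), where Prefetch is django.db.models.Prefetch and AgreementVendors is an assumed name for the model whose manager exposes the custom queryset methods (for example through AgreementVendorsQuerySet.as_manager()).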

Django constrain filtered models to include set that matches first criteria

I want to iterate over users and their matching entities directly, without first grouping them in a helper array.
Given a query like this, how can I make User.stone_set contain only the ones that matched the filter?
users = User.objects.filter(payment__due_day__lte=datetime.today()+3)
Now each user should have only the payment instances that matched the filter, but users.payment_set is a manager that is not restricted by it.
Clarification
Here is my model:
class Payment(models.Model):
    owner = models.ForeignKey(User, on_delete=models.CASCADE)
    due_day = models.IntegerField()
Now, a query like User.objects.filter(payment__due_day=10) will give me the users that have a payment with due day 10. When iterating over those user objects, I want each one to carry only the payments I queried for in the first place (only those with due_day = 10), without querying each user again for their payments.
If I understand your question, in the end you want a queryset of Payment objects.
Then, start from that model, not User.
payments = Payment.objects.filter(due_day=10).select_related('owner').order_by('owner')
After that, you can iterate over the queryset and get the owner (user) of each payment without any extra query, because select_related will do a SQL join for you in the background.
The order_by clause is needed if you have multiple payments for each user and you need to show that. In your template you can use the regroup built-in template tag.
For example:
# In your view (EXTREMELY simplified)
from django.shortcuts import render
def my_view(request):
    payments = Payment.objects.filter(due_day=10).select_related('owner').order_by('owner')
    return render(request, 'template.html', {'payments': payments})
# In your template
{% regroup payments by owner as payment_list %}
<ul>
{% for payment in payment_list %}
    <li>{{ payment.grouper }} <!-- this is the owner -->
        <ul>
        {% for item in payment.list %}
            <li>{{ item.due_day }}</li>
        {% endfor %}
        </ul>
    </li>
{% endfor %}
</ul>
If, instead, you want to achieve that in bare python, the best solution is to use itertools.groupby (which is, in fact, used by the regroup tag).
A quick and untested example:
from itertools import groupby
from operator import attrgetter
payments = Payment.objects.filter(due_day=10).select_related('owner').order_by('owner')
for owner, grouped_payments in groupby(payments, attrgetter('owner')):
    do_stuff_with(owner)
    do_some_other_stuff_with(list(grouped_payments))
Be sure to read the docs.
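The reason the ordering matters is that itertools.groupby only merges consecutive items with the same key. A small sketch with plain tuples shows the difference; the `Payment` namedtuple here is a hypothetical stand-in for the model, with the owner reduced to a username.

```python
from collections import namedtuple
from itertools import groupby
from operator import attrgetter

# Hypothetical stand-in for Payment rows; owner is just a username here.
Payment = namedtuple("Payment", ["owner", "due_day"])

payments = [
    Payment("alice", 10),
    Payment("bob", 10),
    Payment("alice", 10),
]


def group_by_owner(rows):
    """Return (owner, number of payments) for each consecutive run of rows."""
    return [(owner, len(list(group)))
            for owner, group in groupby(rows, key=attrgetter("owner"))]


# Without ordering, alice shows up twice because her rows are not adjacent.
unsorted_groups = group_by_owner(payments)

# Ordering first (the role order_by('owner') plays in the queryset) fixes that.
sorted_groups = group_by_owner(sorted(payments, key=attrgetter("owner")))
```

The same applies to the regroup template tag: it expects its input to be ordered by the attribute being grouped on.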

django best practice query foreign key

I am trying to understand the best way to structure queries in django to avoid excessive database hits.
This is similar to the question: Django best practice with foreign key queries, but involves greater 'depth' in the queries.
My situation:
models.py
from django.db import models
class Bucket(models.Model):
    categories = models.ManyToManyField('Category')
class Category(models.Model):
    name = models.CharField(max_length=50)
class SubCategory(models.Model):
    category = models.ForeignKey(Category, on_delete=models.CASCADE)
class SubSubCategory(models.Model):
    subcat = models.ForeignKey(SubCategory, on_delete=models.CASCADE)
views.py
from django.shortcuts import render
def showbucket(request, id):
    bucket = Bucket.objects.prefetch_related('categories').get(pk=id)
    cats = bucket.categories.prefetch_related('subcategory_set__subsubcategory_set')
    return render(request, 'showbucket.html', locals())
and relevant template:
{% for c in cats %}
    {{c}}
    <ul>
    {% for d in c.subcategory_set.all %}
        <li>{{d}}</li>
        <ul>
        {% for e in d.subsubcategory_set.all %}
            <li>{{e}}</li>
        {% endfor %}
        </ul>
    {% endfor %}
    </ul>
{% endfor %}
Despite the use of prefetch_related(), I seem to be hitting the database each time the top two for statements are evaluated, e.g. {% for c in cats %}, (at least I believe so from reading the debug_toolbar). Other ways I've tried have ended up with (C x D x E) number of database hits. Is this something inherently wrong with my use of prefetch, queries, or models? What is the best way in Django to access database objects with a "depth > 1" so-to-speak?
Use select_related() instead:
https://docs.djangoproject.com/en/dev/ref/models/querysets/#django.db.models.query.QuerySet.select_related
bucket = Bucket.objects.select_related('categories',).get(id=id)
cats = bucket.categories.select_related('subcategory_set__subsubcategory_set',)
So, I found out there are a couple of things going on here.
First, my current understanding of select_related vs prefetch_related:
select_related() follows foreign-key relationships, which produces larger result sets but means that later use of the FK won't require additional database hits. It is limited to FK and one-to-one relationships.
prefetch_related() does a separate lookup for each relationship and joins them in Python. It is meant to be used for many-to-many, many-to-one, GenericRelation and GenericForeignKey relationships.
By the book, I should be using prefetch_related(), as I was not 'following' the foreign keys.
That's what I had understood going into this, but my template seemed to be causing additional queries when evaluating the for loops, even when I added {% with %} tags.
At first, I thought I had discovered something similar to this issue, but I was unable to replicate it when I built out my simplified example. I switched from using the debug toolbar to checking directly with the following template code (from the article Tracking SQL Queries for a Request using Django by Karen Tracey; I would link, but I am link-limited):
{% with sql_queries|length as qcount %}
{{ qcount }} quer{{ qcount|pluralize:"y,ies" }}
{% for qdict in sql_queries %}
{{ qdict.sql }} ({{ qdict.time }} seconds)
{% endfor %}
{% endwith %}
Using this method, I see only 5 queries when using prefetch_related() (7 with debug_toolbar), while the number of queries grows linearly when using select_related() (with +2 for debug_toolbar), which I believe is expected.
I will gladly take any other advice/tools on getting a handle on these types of issues.
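To illustrate why the prefetch approach stays at a fixed number of queries, here is a sketch of the same strategy written by hand against sqlite3: one query per level of the hierarchy, with the joining done in Python. The schema, the sample rows, and the `select` and `children_by_parent` helpers are assumptions made up for this illustration and are not what Django executes verbatim.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE subcategory (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
    CREATE TABLE subsubcategory (id INTEGER PRIMARY KEY, name TEXT, subcat_id INTEGER);
    INSERT INTO category VALUES (1, 'A'), (2, 'B');
    INSERT INTO subcategory VALUES (1, 'A1', 1), (2, 'A2', 1), (3, 'B1', 2);
    INSERT INTO subsubcategory VALUES (1, 'A1x', 1), (2, 'A2x', 2), (3, 'B1x', 3), (4, 'B1y', 3);
""")

executed = []  # every SELECT issued, to make the query count visible


def select(sql, params=()):
    """Execute a SELECT, record it, and return all rows."""
    executed.append(sql)
    return conn.execute(sql, params).fetchall()


def children_by_parent(table, fk, parent_ids):
    """Fetch all children of the given parents with a single IN (...) query."""
    marks = ", ".join("?" for _ in parent_ids)
    rows = select(
        f"SELECT id, name, {fk} FROM {table} WHERE {fk} IN ({marks}) ORDER BY id",
        list(parent_ids),
    )
    grouped = defaultdict(list)
    for row_id, name, parent_id in rows:
        grouped[parent_id].append((row_id, name))
    return grouped


categories = select("SELECT id, name FROM category ORDER BY id")
subs = children_by_parent("subcategory", "category_id", [c[0] for c in categories])
sub_ids = [sid for rows in subs.values() for sid, _ in rows]
subsubs = children_by_parent("subsubcategory", "subcat_id", sub_ids)

# The "join" happens in Python: build the nested tree from the three result sets.
tree = {
    cat_name: {sub_name: [n for _, n in subsubs[sub_id]]
               for sub_id, sub_name in subs[cat_id]}
    for cat_id, cat_name in categories
}
```

Three levels cost three queries here regardless of how many rows each level holds, whereas a per-row lookup inside the loops would grow with the data.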

Filter Django Haystack results like QuerySet?

Is it possible to combine a Django Haystack search with "built-in" QuerySet filter operations, specifically filtering with Q() instances and lookup types not supported by SearchQuerySet? In either order:
haystack-searched -> queryset-filtered
or
queryset-filtered -> haystack-searched
Browsing the Django Haystack documentation didn't give any directions how to do this.
You could filter your queryset based on the results of a Haystack search, using the objects' PKs:
from django.shortcuts import render
from haystack.forms import ModelSearchForm
def view(request):
    docs = Document.objects.none()  # empty result when there is no search term
    if request.GET.get('q'):
        form = ModelSearchForm(request.GET, searchqueryset=None, load_all=True)
        searchqueryset = form.search()
        results = [r.pk for r in searchqueryset]
        docs = Document.objects.filter(pk__in=results)
    # do something with your plain old regular queryset
    return render(request, 'results.html', {'documents': docs})
Not sure how this scales, but for small resultsets (a few hundred, in my case), this works fine.
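One thing the pk__in approach does not preserve is the ranking of the search results, since the database returns the rows in its own order. If the ranking matters, the objects can be reordered afterwards. The helper below is a plain-Python sketch; `in_search_order` and the `Document` namedtuple are hypothetical names used only for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a model instance loaded from the database.
Document = namedtuple("Document", ["pk", "title"])


def in_search_order(result_pks, objects):
    """Reorder database objects to follow the ranking of the search results.

    Primary keys are compared as strings because a search backend may return
    them as text while the database returns integers.
    """
    rank = {str(pk): position for position, pk in enumerate(result_pks)}
    matched = [obj for obj in objects if str(obj.pk) in rank]
    return sorted(matched, key=lambda obj: rank[str(obj.pk)])


# Search ranked document 7 first; the database returned rows in pk order.
ranked_pks = ["7", "2", "5"]
from_db = [Document(2, "beta"), Document(5, "gamma"), Document(7, "alpha")]
ordered = in_search_order(ranked_pks, from_db)
```

This sorts in memory, so it suits the small result sets mentioned above rather than very large ones.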
From the docs:
SearchQuerySet.load_all(self)
Efficiently populates the objects in the search results. Without using
this method, DB lookups are done on a per-object basis, resulting in
many individual trips to the database. If load_all is used, the
SearchQuerySet will group similar objects into a single query,
resulting in only as many queries as there are different object types
returned.
http://django-haystack.readthedocs.org/en/latest/searchqueryset_api.html#load-all
Therefore, after you have a filtered SQS, you can do a load_all() on it and just access the database objects via SearchResult.object. E.g.
from haystack.query import SearchQuerySet
sqs = SearchQuerySet()
# filter as needed, then load_all
sqs = sqs.load_all()
for result in sqs:
    my_obj = result.object
    # my_obj is your model object
If you want to keep the relevance ordering, you have to access the database object through "object".
Example in your template:
{% for result in results %}
{{ result.object.title }}
{{ result.object.author }}
{% endfor %}
But this is really bad, since Haystack will make an extra request like "SELECT * FROM blah WHERE id = 42" for each result.
It seems you are fetching those objects from your database because you didn't put some extra fields in your index, is that right? If you add both the title and the author to your SearchIndex, you can just use your results directly:
{% for result in results %}
{{ result.title }}
{{ result.author }}
{% endfor %}
and avoid some extra queries.