Prefetching model with GenericForeignKey - django

I have a data structure in which a Document has many Blocks which have exactly one Paragraph or Header. A simplified implementation:
class Document(models.Model):
title = models.CharField()
class Block(models.Model):
document = models.ForeignKey(to=Document)
content_block_type = models.ForeignKey(to=ContentType)
content_block_id = models.CharField()
content_block = GenericForeignKey(
ct_field="content_block_type",
fk_field="content_block_id",
)
class Paragraph(models.Model):
text = models.TextField()
class Header(models.Model):
text = models.TextField()
level = models.SmallPositiveIntegerField()
(Note that there is an actual need for having Paragraph and Header in separate models unlike in the implementation above.)
I use jinja2 to template a Latex file for the document. Templating is slow though as jinja performs a new database query for every Block and Paragraph or Header.
template = get_template(template_name="latex_templates/document.tex", using="tex")
return template.render(context={'script': self.script})
\documentclass[a4paper,10pt]{report}
\begin{document}
{% for block in chapter.block_set.all() %}
{% if block.content_block_type.name == 'header' %}
\section{ {{- block.content_block.latex_text -}} }
{% elif block.content_block_type.name == 'paragraph' %}
{{ block.content_block.latex_text }}
{% endif %}
{% endfor %}
\end{document}
(content_block.latex_text() is a function that converts a HTML string to a Latex string)
Hence I would like to prefetch script.blocks and blocks.content_block. I understand that there are two methods for prefetching in Django:
select_related() performs a JOIN query but only works on ForeignKeys. It would work for script.blocks but not for blocks.content_block.
prefetch_related() works with GenericForeignKeys as well, but if I understand the docs correctly, it can only fetch one ContentType at a time while I have two.
Is there any way to perform the necessary prefetching here? Thank you for your help.

Not really an elegant solution but you can try using reverse generic relations:
from django.contrib.contenttypes.fields import GenericRelation
class Paragraph(models.Model):
text = models.TextField()
blocks = GenericRelation(Block, related_query_name='paragraph')
class Header(models.Model):
text = models.TextField()
level = models.SmallPositiveIntegerField()
blocks = GenericRelation(Block, related_query_name='header')
and prefetch on that:
Document.objects.prefetch_related('block_set__header', 'block_set__paragraph')
then change the template rendering to something like (not tested, will try to test later):
\documentclass[a4paper,10pt]{report}
\begin{document}
{% for block in chapter.block_set.all %}
{% if block.header %}
\section{ {{- block.header.0.latex_text -}} }
{% elif block.paragraph %}
{{ block.paragraph.0.latex_text }}
{% endif %}
{% endfor %}
\end{document}

My bad, I did not notice that document is an FK, and reverse FK can not be joined with select_related.
First of all, I would suggest to add related_name="blocks" anyway.
When you prefetch, you can pass the queryset. But you should not pass filters by doc_id, Django's ORM adds it automatically.
And if you pass the queryset, you can also add select/prefetch related call there.
blocks_qs = Block.objects.all().prefetch_related('content_block')
doc_prefetched = Document.objects.prefetch_related(
Prefetch('blocks', queryset=blocks_qs)
).get(uuid=doc_uuid)
But if you don't need extra filters or annotation, the simpler syntax would probably work for you
document = (
Document.objects
.prefecth_related('blocks', 'blocks__content_block')
.get(uuid=doc_uuid)
)

Related

Joining Models And Passing To Template

I have multiple related models and on a single page I need to access data from different models on a variety of conditions. Most of this data is accessed without a form as only a few fields need to be manipulated by the user.
To give you an idea of how interconnected my Models are:
class User(models.Model):
first = models.ManyToManyField(First)
second= models.ManyToManyField(Second)
class First(models.Model):
[...]
class Second(models.Model):
first = models.ManyToManyField(First)
class Third(models.Model):
first = models.ForeignKey(First)
class Fourth(models.Model):
user = models.ForeignKey(User)
second = models.ForeignKey(Second)
third = models.ForeignKey(Third)
class Fifth(models.Model):
fourth = models.ForeignKey(Fourth)
class Sixth(models.Model):
fifth = models.ForeignKey(Fifth)
I'm having a lot of difficulty passing this data to my template and displaying it in efficiently. I originally displayed it to the user in a horrendous way as I just wanted to see if my data was accessible/displaying properly. This is an example of some of the dodgy code I used to test that my data was being passed properly:
{% for second in user.second.all %}
{% for first in second.first.all %}
[...]
{% for fourth in user.fourth.all %}
[...]
{% if fourth.first_id == first.id and fourth.second_id == second.id %}
{% if fourth.checked == True %}
[...]
{% else %}
[...]
{% endif %}
{% endif %}
{% endfor %}
{% endfor %}
{% endfor %}
[...]
Isn't it an abomination, and it goes deeper. Now I am trying to reformat this. I know that my processing should be done within my view rather than my template. The thing is I can't just filter out a lot of the data in my views as I need a lot of it in the template at different times (eg I might have 20 pieces of data in a model and I need to access all 20, but some when the id match, some when the id and type match etc. It's a lot of template side logic - I think that's bad).
This is the kind of thing I have been trying to do so far but am having no luck:
second = Second.objects.filter(user__id=user.id) .select_related('First')
Any help with how to join/pass all these models to my view and access the data without the nested nested loops, or even just pointers on how to approach this would be appreciated. I's also unsure if I should be aiming to join as many models as possible then pass into my view or pass many separate models.
Thank you.
you are serializer the data use a drf serializer you will need to use a nested serializer
example for nested serializer
class HospitalsSerializer(serializers.ModelSerializer):
class Meta:
model = Hospitals
fields = '__all__'
class HealthProductSerializers(serializers.ModelSerializer):
hospital_list = HospitalsSerializer(many=True)
class Meta:
model = HealthProduct
exclude = ("created_at", "updated_at")

django count specific rows in queryset

class Order(models.Model):
name = models.CharField(max_length=100)
# other fields..
user = models.ForeginKey(User)
old = models.BooleanField(default=False)
I want to display all the orders of a specific user, but I want to split them those which are "old" and the ones who are not.
So, currently I do in views.py:
orders = Order.objects.filter(user=user)
In template:
First table:
<table>
{% for order in orders %}
{% if not order.old %}
<tr>
<td>... </td>
</tr>
{% endif %}
{% endfor %}
</table>
And another table:
{% for order in orders %}
{% if order.old %}
<tr>
<td>...</td>
<tr>
{% endif %}
{% endfor %}
This way have some drawbacks, first, now I want to count how many of the orders are "old", to display this number in the template. I can't, unless I do another query.
Is it possible to annotate(number_of_old=Count('old'))? Or I have to do another query?
So what would be the best?
1. Do two queries, one with old=False, another with old=True, and pass two querysets to the template. And use |len filter on the querysets
2. Do one query like this and split them somehow in python? That will be less convenient as I have a similar structures which I want to split like that.
And should I call the DB .count() at all?
EDIT:
If I would write my model like this:
class Order(models.Model):
name = models.CharField(max_length=100)
# other fields..
user = models.ForeginKey(User)
old = models.BooleanField(default=False)
objects = CustomManager() # Custom manager
class CustomQueryset(models.QuerySet):
def no_old(self):
return self.filter(old=False)
class CustomManager(models.Manager):
def get_queryset(self):
return CustomQuerySet(model=self.model, using=self._db)
Is this template code produce one or two queries ?
{% if orders.no_old %}
{% for order orders.no_old %}
...
{% endfor %}
{% endif %}
You can't do any annotations, and there is no need to make .count() since you already have all the data in memory. So its really just between:
orders = Order.objects.filter(user=user)
old_orders = [o for o in orders if o.old]
new_orders = [o for o in orders if not o.old]
#or
old_orders = Order.objects.filter(user=user, old=True)
new_orders = Order.objects.filter(user=user, old=False)
In this specific scenario, I don't think there will be any performance difference. Personally I will choose the 2nd approach with the two queries.
A good read with tips about the problem: Django Database access optimization
Update
About the custom Manager which you introduce. I don't think you are doing it correctly I think what you want is this:
class CustomQueryset(models.QuerySet):
def no_old(self):
return self.filter(old=False)
class Order(models.Model):
name = models.CharField(max_length=100)
# other fields..
user = models.ForeginKey(User)
old = models.BooleanField(default=False)
#if you already have a manager
#objects = CustomManager.from_queryset(CustomQueryset)()
#if you dont:
objects = CustomQueryset.as_manager()
So having:
orders = Order.objects.filter(user=user)
If you do {% if orders.no_old %} will do another query, because this is new QuerySet instance which has no cache..
About the {% regroup %} tag
As you mention, in order to use it, you need to .order_by('old'), and if you have another order, you can still use it, just apply your order after the old, e.g. .order_by('old', 'another_field'). This way you will use only one Query and this will save you one iteration over the list (because Django will split the list iterating it only once), but you will get less readability in the template.

Django Haystack: responsibilities of the template and model indexes

I've gone over the docs, and I've even created a few search back ends, but I"m still really confused on what these things do in haystack. Is the search back end searching the fields you put in your class that inherits indexes.SearchIndex, indexes.Indexable, or is the back end searching the text inside your template? Can someone explain this to me?
In django haystack you will create a class that defines what fields should be queried (well that's how I understand it) like so:
class ProductIndex(indexes.SearchIndex, indexes.Indexable):
text = indexes.CharField(document=True, use_template=True)
name = indexes.CharField(model_attr='title', boost=1.75)
description = indexes.CharField(model_attr='description')
short_description = indexes.CharField(model_attr='short_description')
def get_model(self):
return Product
def index_queryset(self, using=None):
"""Used when the entire index for model is updated."""
return self.get_model().objects.filter(active=True,
published_at__lte=datetime.now())
You'll also create a template txt that will do something - I"m not sure what. I know that the search backend will go over this template during the searching algorithm.
{{ object.name }}
{{ object.description }}
{{ object.short_description }}
{% for related in object.related %}
{{ related.name }}
{{ related.description }}
{% endfor %}
{% for category in object.categories.all %}
{% if category.active %}
{{ category.name }}
{% endif %}
{% endfor %}
As you can see the template has some fields that my index class doesn't have, however, these will be searched by the search backend. So why even have fields in the index? What are the rolls of the index class, and the index template? Can someone please explain this to me.
The ProductIndex class is the main thing here. Haystack will use this configuration to index your Product model according to the fields you have chosen to be indexed and in what way. You can read more about it here.
The template which you have created will be used by this field text = indexes.CharField(document=True, use_template=True). In this template we include every important data from model or related models, why? because this is used to perform search query on all data if you don't want to lookup in just one field.
# filtering on single field
qs = SearchQuerySet().models(Product).filter(name=query)
# filtering on multiple fields
qs = SearchQuerySet().models(Product).filter(name=query).filter(description=query)
# filtering on all data where ever there is a match
qs = SearchQuerySet().models(Product).filter(text=query)

Django: retrieving ManyToManyField objects with minimum set of queries

My code looks like this:
models.py
class Tag(models.Model):
name = models.CharField(max_length=42)
class Post(models.Model):
user = models.ForeignKey(User, related_name='post')
#...various fields...
tags = models.ManyToManyField(Tag, null=True)
views.py
posts = Post.objects.all().values('id', 'user', 'title')
tags_dict = {}
for post in posts: # Iteration? Why?
p = Post.objects.get(pk=[post['id']]) # one extra query? Why?
tags_dict[post['id']] = p.tags.all()
How am I supposed to create a dictionary with tags for each Post object with minimum set of queries? Is it possible to avoid iterating, too?
Yes you will need a loop. But you can save one extra query in each iteration, you don't need to get post object to get all its tags. You can directly query on Tag model to get tags related to post id:
for post in posts:
tags_dict[post['id']] = Tag.objects.filter(post__id=post['id'])
Or use Dict Comprehension for efficiency:
tags_dict = {post['id']: Tag.objects.filter(post__id=post['id']) for post in posts}
If you have Django version >= 1.4 and don't really need a dictionary, but need to cut down the count of queries, you can use this method like this:
posts = Post.objects.all().only('id', 'user', 'title').prefetch_related('tags')
It seems to execute only 2 queries (one for Post and another for Tag with INNER JOIN).
And then you can access post.tags.all without extra queries, because tags was already prefetched.
{% for post in posts %}
{% for tag in post.tags.all %}
{{ tag.name }}
{% endfor %}
{% endfor %}

Iterating over results, sorted by a m2m

My data model is simple:
class Neighborhood(models.Model):
name = models.CharField(max_length = 50)
slug = models.SlugField()
class Location(models.Model):
company = models.ForeignKey(Company)
alt_name = models.CharField()
neighborhoods = models.ManyToManyField(Neighborhood)
I would like to supply a page on my site that lists all locations by their neighborhood (s). If it were singular, I think {% regroup %} with a {% ifchanged %} applied to the neighborhood name would be all that I need, but in my case, having it be a m2m, I'm not sure how do this. A location may have multiple neighborhoods, and so I would like them to be redundantly displayed under each matching neighborhood.
I'm also aware of FOO_set but that's per Object; I want to load the entire data set.
The final result (in the template) should be something like:
Alameda
Crazy Thai
Castro
Kellys Burgers
Pizza Orgasmica
Filmore
Kellys Burgers
Some Jazz Bar
Mission
Crazy Thai
Elixir
...
The template syntax would (ideally?) look something like:
{% for neighborhood in neighborhood_list %}
{% ifchanged %}{{ neighborhood.name }}{% endifchanged %}
{% for location in neighborhood.restaurants.all %}
{{ location.name }}
{% endfor %}
{% endfor %}
I'd just do it the expensive way and cache the result over scratching my head. Your template example would work fine until performance becomes an issue in generating that one page per X cache timeout.
You could do it in python as well if the result set is small enough:
# untested code - don't have your models
from collections import defaultdict
results = defaultdict(list)
for location_m2m in Location.neighborhoods.through.objects.all() \ # line wrap
.select_related('neighborhood', 'location', 'location__company__name'):
if location_m2m.location not in results[location_m2m.neighborhood]:
results[location_m2m.neighborhood].append(location_m2m.location)
# sort now unique locations per neighborhood
map(lambda x: x.sort(key=lambda x: x.location.company.name), results.values())
# sort neighborhoods by name
sorted_results = sorted(results.items(), key=lambda x:x[0].name)
create another model that would define your m2m relationship:
class ThroughModel(models.Model):
location = models.ForeignKey('Location')
neighborhood = models.ForeignKey('Neighborhood')
and use it as the through model for your m2m field:
class Location(models.Model):
...
neighborhoods = models.ManyToManyField(Neighborhood, through='ThroughModel')
Then you can get all your neighbourhoods sorted:
ThroughModel.objects.select_related().order_by('neighborhood__name')
This is untested.
Or, if you cannot change the database, just do a raw SQL join.