Django Haystack: responsibilities of the template and model indexes - django

I've gone over the docs, and I've even created a few search back ends, but I'm still really confused about what these things do in Haystack. Is the search back end searching the fields you put in your class that inherits from indexes.SearchIndex and indexes.Indexable, or is it searching the text inside your template? Can someone explain this to me?
In django haystack you will create a class that defines what fields should be queried (well that's how I understand it) like so:
class ProductIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    name = indexes.CharField(model_attr='title', boost=1.75)
    description = indexes.CharField(model_attr='description')
    short_description = indexes.CharField(model_attr='short_description')

    def get_model(self):
        return Product

    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.filter(
            active=True, published_at__lte=datetime.now()
        )
You'll also create a .txt template that does something, though I'm not sure what. I know that the search backend goes over this template as part of its search algorithm.
{{ object.name }}
{{ object.description }}
{{ object.short_description }}
{% for related in object.related %}
{{ related.name }}
{{ related.description }}
{% endfor %}
{% for category in object.categories.all %}
{% if category.active %}
{{ category.name }}
{% endif %}
{% endfor %}
As you can see, the template has some fields that my index class doesn't have; however, these will still be searched by the search backend. So why even have fields in the index? What are the roles of the index class and the index template? Can someone please explain this to me?

The ProductIndex class is the main thing here. Haystack uses this configuration to index your Product model: it defines which fields are indexed and how. You can read more about it here.
The template you created is used by this field: text = indexes.CharField(document=True, use_template=True). In the template we include all the important data from the model and its related models. Why? Because this field is what you query when you want to search across all the data rather than look in just one field.
# filtering on single field
qs = SearchQuerySet().models(Product).filter(name=query)
# filtering on multiple fields
qs = SearchQuerySet().models(Product).filter(name=query).filter(description=query)
# filtering on all data where ever there is a match
qs = SearchQuerySet().models(Product).filter(text=query)
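As a rough mental model only (plain Python, not Haystack's actual implementation), the split of responsibilities can be sketched like this: the index class decides which fields exist in the search engine, and the template only decides what text ends up in the single document field. The helper names build_index_entry, render_text_template and search are hypothetical, and products are plain dicts here.

```python
def render_text_template(product):
    """Stand-in for the .txt template: join everything that should be searchable."""
    parts = [product["title"], product["description"]]
    # Related data can be pulled in here even though it has no field of its own.
    parts += [c["name"] for c in product["categories"] if c["active"]]
    return "\n".join(parts)


def build_index_entry(product):
    """Stand-in for ProductIndex: one stored value per declared field."""
    return {
        "text": render_text_template(product),  # document=True, use_template=True
        "name": product["title"],               # model_attr='title'
        "description": product["description"],  # model_attr='description'
    }


def search(entries, field, query):
    """Naive substring match on a single indexed field."""
    return [e for e in entries if query.lower() in e[field].lower()]
```

In this sketch a category name is only findable through the text field, which mirrors why filter(text=query) matches more than filter(name=query).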

Related

Prefetching model with GenericForeignKey

I have a data structure in which a Document has many Blocks which have exactly one Paragraph or Header. A simplified implementation:
class Document(models.Model):
    title = models.CharField()

class Block(models.Model):
    document = models.ForeignKey(to=Document)
    content_block_type = models.ForeignKey(to=ContentType)
    content_block_id = models.CharField()
    content_block = GenericForeignKey(
        ct_field="content_block_type",
        fk_field="content_block_id",
    )

class Paragraph(models.Model):
    text = models.TextField()

class Header(models.Model):
    text = models.TextField()
    level = models.PositiveSmallIntegerField()
(Note that there is an actual need for having Paragraph and Header in separate models unlike in the implementation above.)
I use jinja2 to template a LaTeX file for the document. Rendering is slow, though, because jinja triggers a new database query for every Block and for every Paragraph or Header.
template = get_template(template_name="latex_templates/document.tex", using="tex")
return template.render(context={'script': self.script})
\documentclass[a4paper,10pt]{report}
\begin{document}
{% for block in chapter.block_set.all() %}
{% if block.content_block_type.name == 'header' %}
\section{ {{- block.content_block.latex_text -}} }
{% elif block.content_block_type.name == 'paragraph' %}
{{ block.content_block.latex_text }}
{% endif %}
{% endfor %}
\end{document}
(content_block.latex_text() is a function that converts an HTML string to a LaTeX string.)
Hence I would like to prefetch script.blocks and blocks.content_block. I understand that there are two methods for prefetching in Django:
select_related() performs a JOIN query but only works on ForeignKeys. It would work for script.blocks but not for blocks.content_block.
prefetch_related() works with GenericForeignKeys as well, but if I understand the docs correctly, it can only fetch one ContentType at a time while I have two.
Is there any way to perform the necessary prefetching here? Thank you for your help.
Not really an elegant solution but you can try using reverse generic relations:
from django.contrib.contenttypes.fields import GenericRelation

class Paragraph(models.Model):
    text = models.TextField()
    blocks = GenericRelation(Block, related_query_name='paragraph')

class Header(models.Model):
    text = models.TextField()
    level = models.PositiveSmallIntegerField()
    blocks = GenericRelation(Block, related_query_name='header')
and prefetch on that:
Document.objects.prefetch_related('block_set__header', 'block_set__paragraph')
then change the template rendering to something like (not tested, will try to test later):
\documentclass[a4paper,10pt]{report}
\begin{document}
{% for block in chapter.block_set.all %}
{% if block.header %}
\section{ {{- block.header.0.latex_text -}} }
{% elif block.paragraph %}
{{ block.paragraph.0.latex_text }}
{% endif %}
{% endfor %}
\end{document}
My bad, I did not notice that document is a ForeignKey on Block; a reverse ForeignKey cannot be joined with select_related.
First of all, I would suggest adding related_name="blocks" to that ForeignKey anyway.
When you prefetch, you can pass a queryset. Do not filter it by the document id yourself; Django's ORM adds that filter automatically.
Because you pass a queryset, you can also chain select_related or prefetch_related calls on it.
from django.db.models import Prefetch

blocks_qs = Block.objects.all().prefetch_related('content_block')
doc_prefetched = Document.objects.prefetch_related(
    Prefetch('blocks', queryset=blocks_qs)
).get(uuid=doc_uuid)
But if you don't need extra filters or annotations, the simpler syntax would probably work for you:
document = (
    Document.objects
    .prefetch_related('blocks', 'blocks__content_block')
    .get(uuid=doc_uuid)
)
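For intuition about why prefetching 'content_block' helps even with two content types: as far as I know, Django groups the referenced ids by content type and runs one query per type instead of one per block. The plain-Python sketch below imitates that idea only; the fetch_many callback and the dict-shaped blocks are made up for illustration and are not Django APIs.

```python
from collections import defaultdict


def prefetch_generic(blocks, fetch_many):
    """Attach block['content_block'] using one bulk lookup per content type.

    fetch_many(content_type, ids) is a hypothetical callback standing in for
    a single Model.objects.filter(pk__in=ids) query; it returns {id: object}.
    """
    # Step 1: collect the ids that each content type needs.
    ids_by_type = defaultdict(set)
    for block in blocks:
        ids_by_type[block["content_block_type"]].add(block["content_block_id"])

    # Step 2: one bulk fetch per content type (two here: paragraph, header).
    fetched = {
        ctype: fetch_many(ctype, ids) for ctype, ids in ids_by_type.items()
    }

    # Step 3: join in Python, with no further lookups per block.
    for block in blocks:
        by_id = fetched[block["content_block_type"]]
        block["content_block"] = by_id.get(block["content_block_id"])
    return blocks
```

With N blocks spread over two content types, this performs two bulk lookups rather than N individual ones.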

How to correctly display data from two related models in Django ListView

I have two models as below:
class Loo(models.Model):
    loo_type = models.CharField(max_length=3, default="LOO"...)
    loo_from = models.ForeignKey(Harr, on_delete=models.CASCADE, ...)
    loo_fac = models.DecimalField(max_digits=7, decimal_places=.....)

class Woo(models.Model):
    woo_item = models.AutoField(primary_key=True, ...)
    woo_loo = models.ForeignKey(Loo, on_delete=models.CASCADE, ...)
    woo_dt = models.DateField(null=True, ...)
    woo_rate = models.DecimalField(max_digits=7, decimal_places=.....)
I am trying to display data from the models using the following listview:
class WhoLooView(ListView):
    template_name = "who_loo_list.html"
    context_object_name = 'wholoos'
    model = Loo

    def get_context_data(self, **kwargs):
        context = super(WhoLooView, self).get_context_data(**kwargs)
        context.update({
            'woo_item_list': Woo.objects.order_by('-woo_dt'),
        })
        return context

    def get_queryset(self):
        return Loo.objects.order_by('loo_from')
Note that there can be more than one Woo ("woo_item") per instance of Loo (id), so in the list view the same Loo id will sometimes have two or more Woo instances, which need to be displayed separately (preferably in distinct rows).
What I have tried so far creates extra (adjacent) columns for each Loo id; Loos that have a single Woo instance are shown in the normal fashion, as expected.
How does one handle such a situation? Can we have a nested row for cases where there is more than one instance of Woo?
Edit
What I have tried (based on your code sample):
{% for obj in wholoos %} <!-- wholoos : context object name -->
{{ obj.loo_type }}
{% for item in obj.woo_set.all %}
{{ item.woo_dt }}
{% endfor %}
{% endfor %}
But now I am not getting anything from the second model Woo.
Edit 2
I am getting the same result as earlier with my original code. Check the image below:
If you notice (in the image above), objects # 31 and 34 have one each of child objects (sub-objects). # 32 and 33 have two each. I want them to be in separate rows and not columns. The reason is, in case there are a number of items for each parent item (which my DB design makes it imperative to be), I would end up with an enormous number of extra columns for the sub-objects (and that too without any column header).
You can loop over the instances of Loo in your template as shown below; you don't have to override get_context_data.
{% for obj in object_list %}
{{ obj.loo_type }}
{% for item in obj.woo_set.all %}
{{ item.woo_dt }}
{% endfor %}
{% endfor %}

django count specific rows in queryset

class Order(models.Model):
    name = models.CharField(max_length=100)
    # other fields..
    user = models.ForeignKey(User)
    old = models.BooleanField(default=False)
I want to display all the orders of a specific user, but I want to split them into those which are "old" and those which are not.
So, currently I do in views.py:
orders = Order.objects.filter(user=user)
In template:
First table:
<table>
{% for order in orders %}
{% if not order.old %}
<tr>
<td>... </td>
</tr>
{% endif %}
{% endfor %}
</table>
And another table:
<table>
{% for order in orders %}
{% if order.old %}
<tr>
<td>...</td>
</tr>
{% endif %}
{% endfor %}
</table>
This approach has some drawbacks. First, I now want to count how many of the orders are "old" and display that number in the template, and I can't do that without another query.
Is it possible to annotate(number_of_old=Count('old'))? Or do I have to do another query?
So what would be best?
1. Do two queries, one with old=False and another with old=True, pass both querysets to the template, and use the |length filter on them.
2. Do one query like this and split the results somehow in Python? That would be less convenient, as I have similar structures that I want to split the same way.
And should I call the DB .count() at all?
EDIT:
If I would write my model like this:
class Order(models.Model):
    name = models.CharField(max_length=100)
    # other fields..
    user = models.ForeignKey(User)
    old = models.BooleanField(default=False)
    objects = CustomManager()  # Custom manager

class CustomQueryset(models.QuerySet):
    def no_old(self):
        return self.filter(old=False)

class CustomManager(models.Manager):
    def get_queryset(self):
        return CustomQueryset(model=self.model, using=self._db)
Does this template code produce one or two queries?
{% if orders.no_old %}
{% for order in orders.no_old %}
...
{% endfor %}
{% endif %}
You can't do any annotations, and there is no need to call .count(), since you already have all the data in memory. So it's really just a choice between:
orders = Order.objects.filter(user=user)
old_orders = [o for o in orders if o.old]
new_orders = [o for o in orders if not o.old]
#or
old_orders = Order.objects.filter(user=user, old=True)
new_orders = Order.objects.filter(user=user, old=False)
In this specific scenario, I don't think there will be any performance difference. Personally, I would choose the second approach with the two queries.
A good read with tips about the problem: Django Database access optimization
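The in-memory split from the first option can also be done in a single pass over the fetched orders, which gives the "old" count for free via len(). This is a minimal sketch: Order is a plain dataclass standing in for the model, and split_orders is a hypothetical helper name.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """Plain stand-in for the Django model; only the fields used here."""
    name: str
    old: bool


def split_orders(orders):
    """Return (old_orders, new_orders) after one pass over `orders`."""
    old_orders, new_orders = [], []
    for order in orders:
        # Append to whichever list matches the flag.
        (old_orders if order.old else new_orders).append(order)
    return old_orders, new_orders
```

Passing both lists to the template lets it show the number of old orders with the |length filter and no extra query.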
Update
About the custom manager you introduced: I don't think you are doing it correctly. I think what you want is this:
class CustomQueryset(models.QuerySet):
    def no_old(self):
        return self.filter(old=False)

class Order(models.Model):
    name = models.CharField(max_length=100)
    # other fields..
    user = models.ForeignKey(User)
    old = models.BooleanField(default=False)
    # if you already have a manager:
    # objects = CustomManager.from_queryset(CustomQueryset)()
    # if you don't:
    objects = CustomQueryset.as_manager()
So having:
orders = Order.objects.filter(user=user)
then {% if orders.no_old %} will run another query, because orders.no_old returns a new QuerySet instance with an empty cache.
About the {% regroup %} tag
As you mention, to use it you need .order_by('old'). If you need another ordering as well, apply it after old, e.g. .order_by('old', 'another_field'). This way you use only one query and save one iteration over the list (Django splits the list while iterating over it once), at the cost of a less readable template.
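The grouping that {% regroup %} performs on an ordered list behaves much like itertools.groupby. A plain-Python sketch of the ordering-then-grouping idea follows; Order is a dataclass stand-in for the model, 'name' plays the role of another_field, and regroup_by_old is a hypothetical helper.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter


@dataclass
class Order:
    """Plain stand-in for the Django model."""
    name: str
    old: bool


def regroup_by_old(orders):
    """Group consecutive orders that share the same `old` value."""
    return [
        (key, list(items))
        for key, items in groupby(orders, key=attrgetter("old"))
    ]


# Sorting by (old, name) plays the role of .order_by('old', 'name').
orders = sorted(
    [Order("c", True), Order("a", False), Order("b", True)],
    key=attrgetter("old", "name"),
)
groups = regroup_by_old(orders)
```

Each group keeps the secondary ordering, so the extra sort key does not interfere with grouping by old.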

Django: retrieving ManyToManyField objects with minimum set of queries

My code looks like this:
models.py
class Tag(models.Model):
    name = models.CharField(max_length=42)

class Post(models.Model):
    user = models.ForeignKey(User, related_name='post')
    #...various fields...
    tags = models.ManyToManyField(Tag, null=True)
views.py
posts = Post.objects.all().values('id', 'user', 'title')
tags_dict = {}
for post in posts:  # Iteration? Why?
    p = Post.objects.get(pk=post['id'])  # one extra query? Why?
    tags_dict[post['id']] = p.tags.all()
How am I supposed to create a dictionary with tags for each Post object with minimum set of queries? Is it possible to avoid iterating, too?
Yes, you will need a loop. But you can save one query per iteration: you don't need to fetch the Post object to get its tags. You can query the Tag model directly for the tags related to a post id:
for post in posts:
    tags_dict[post['id']] = Tag.objects.filter(post__id=post['id'])
Or write it more compactly as a dict comprehension (it still runs one query per post once each queryset is evaluated):
tags_dict = {post['id']: Tag.objects.filter(post__id=post['id']) for post in posts}
If you have Django version >= 1.4 and don't really need a dictionary, but need to cut down the number of queries, you can use prefetch_related like this:
posts = Post.objects.all().only('id', 'user', 'title').prefetch_related('tags')
It seems to execute only 2 queries (one for Post and another for Tag with INNER JOIN).
And then you can access post.tags.all without extra queries, because tags was already prefetched.
{% for post in posts %}
{% for tag in post.tags.all %}
{{ tag.name }}
{% endfor %}
{% endfor %}
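If a dictionary is still needed, it can be assembled in Python from rows that were fetched in bulk, so no query runs inside the loop. The sketch below assumes (post_id, tag_name) pairs as input, for example the kind of rows that Post.objects.values_list('id', 'tags__name') should return; tags_by_post is a hypothetical helper name.

```python
def tags_by_post(post_tag_pairs):
    """Group (post_id, tag_name) pairs into {post_id: [tag_name, ...]}.

    A tag_name of None marks a post without tags; it still gets an
    (empty) entry in the result.
    """
    tags_dict = {}
    for post_id, tag_name in post_tag_pairs:
        tags = tags_dict.setdefault(post_id, [])
        if tag_name is not None:
            tags.append(tag_name)
    return tags_dict
```

The loop still exists, but it only touches data already in memory.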

Using multiple model fields to regroup list in Django template

I'm using the regroup tag in a Django template to list a number of items, grouped by Customer. My model is:
class Customer(models.Model):
    name = models.CharField(max_length=25)
    city = models.CharField(max_length=25)
I can list the items by customer.name (or customer.city), but what I really want is to group them as "Name, City". According to the documentation, "Any valid template lookup is a legal grouping attribute for the regroup tag, including methods, attributes, dictionary keys and list items." [1] How do I define a method for this? And how do I call it from my template?
[1] https://docs.djangoproject.com/en/dev/ref/templates/builtins/
Update: As I understand the regroup functionality, and as I am using it now, I group the list by one of the object's fields. The group header, customer.grouper, displays the value of that particular field, in my case customer.name or customer.city. My goal is to present these together, like "customer.name, customer.city" (i.e. "Microsoft, Redmond"). The documentation mentions this briefly but I cannot figure it out.
def display_name(self):
    return "%s, %s" (self.name, self.city)
I have tried a method like the one above, as part of my Customer model, to fix my problem. But I'm not sure how to call it from my template.
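Your model method is almost correct, but it needs a % operator before the parentheses:
def display_name(self):
    return "%s, %s" % (self.name, self.city)
Your view should pass a list of objects, not the dictionaries returned by .values(), since the method is only available on model instances. The list should also be ordered by name and city, because regroup only groups consecutive items.
Let the list be tp, so your template code should be something like this: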
{% regroup tp by display_name as tp_list %}
<ul>
{% for t in tp_list %}
<li>{{ t.grouper }}
<ul>
{% for item in t.list %}
<...something of your code....>
{% endfor %}
</ul>
</li>
{% endfor %}
</ul>
This should work out for you well enough.
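To see why the list order matters when grouping by a method, here is a plain-Python sketch in which itertools.groupby stands in for the regroup tag. Customer is an ordinary class rather than a Django model, and regroup is a hypothetical helper.

```python
from itertools import groupby


class Customer:
    """Plain stand-in for the Customer model."""

    def __init__(self, name, city):
        self.name = name
        self.city = city

    def display_name(self):
        return "%s, %s" % (self.name, self.city)


def regroup(items, key):
    """Group consecutive items with equal keys, as the regroup tag does."""
    return [(grouper, list(group)) for grouper, group in groupby(items, key=key)]


customers = [
    Customer("Microsoft", "Redmond"),
    Customer("Apple", "Cupertino"),
    Customer("Microsoft", "Redmond"),
]

# Unsorted input: "Microsoft, Redmond" shows up as two separate groups.
unsorted_groups = regroup(customers, Customer.display_name)

# Sorted by the same key: each display name forms exactly one group.
sorted_groups = regroup(
    sorted(customers, key=Customer.display_name), Customer.display_name
)
```

The same applies in the template: order the queryset by name and city in the view before handing it to regroup.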