Django-Haystack - How to faceting? - django

I'm using Django-Haystack with ElasticSearch.
I will need to have Faceting.
The Django-Haystack documentation says:
You generally create a unique SearchIndex for each type of Model you wish to index, though you can reuse the same SearchIndex between different models if you take care in doing so and your field names are very standardized.
My question is: to get faceting working, can I use one index per model, or must I create a single shared index?

You can have multiple index classes, but each of them should have the field you will be faceting on.
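To see why each index class needs the facet field, here is a toy sketch in plain Python. It is not Haystack's implementation, only a model of the idea: documents from two hypothetical indexes end up in one search backend, and a facet is a count of the values of one field across the matching documents. The names note_docs, blog_docs, facet_counts, and the author and category fields are made up for illustration.

```python
from collections import Counter

# Documents as two hypothetical index classes might emit them.
# Both kinds carry an "author" field; only blog entries carry "category".
note_docs = [
    {"django_ct": "notes.note", "author": "alice"},
    {"django_ct": "notes.note", "author": "bob"},
]
blog_docs = [
    {"django_ct": "blog.blogentry", "author": "alice", "category": "news"},
]


def facet_counts(docs, field):
    """Count how many documents carry each value of `field`.

    Documents that lack the field are left out of the counts entirely.
    """
    return Counter(doc[field] for doc in docs if field in doc)


all_docs = note_docs + blog_docs
author_facet = facet_counts(all_docs, "author")
category_facet = facet_counts(all_docs, "category")
```

Faceting on author covers all three documents, while faceting on category silently leaves out the notes, which is why the shared field matters.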

Related

Filter List in Django JSON Field with string contains

My Django JSON field contains a list of values like ["N05BB01", "R06AX33"].
atc_code = JSONField(default=list)
I would like to filter on this field with a condition like 'does any string in the list contain "N05"?'.
Something like:
mymodel.objects.filter(?????)
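In this case, if you are on PostgreSQL, you can use icontains, which does a case-insensitive substring match against the text form of the whole JSON value (plain contains on a JSONField is a JSON containment test rather than a substring match, and it is not supported on SQLite or Oracle):
mymodel.objects.filter(atc_code__icontains='N05')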
Which generates this SQL:
SELECT * FROM "mymodel" WHERE UPPER("mymodel"."atc_code"::text) LIKE UPPER(%N05%)
Relational-based answer
Storing a list of values in a JSONField like this usually means the relational structure is being used in the wrong way.
The best approach here:
Create a new model that describes your atc_code entity, for example AtcCode.
Depending on the meaning of atc_code and its relation to the main entity, use ForeignKey or ManyToManyField.
Take advantage of the relational database and the Django ORM, with built-in features such as filtering, adding, removing, and querying on any database backend.
It will work on any supported database. A relational database will work faster when you are using relations properly.
My recommendation is to use JSONField when you have a really unstructured object.
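As a rough illustration of the relational layout described above, the sketch below uses the standard-library sqlite3 module instead of Django so that it runs on its own. The table and column names (medication, atc_code, medication_atc_codes) and the sample rows are hypothetical. In Django, the join table would correspond to a ManyToManyField, and the query to a filter across that relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE medication (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE atc_code (id INTEGER PRIMARY KEY, code TEXT NOT NULL UNIQUE);
    -- Join table: the SQL shape of a many-to-many relation.
    CREATE TABLE medication_atc_codes (
        medication_id INTEGER NOT NULL REFERENCES medication(id),
        atc_code_id INTEGER NOT NULL REFERENCES atc_code(id),
        PRIMARY KEY (medication_id, atc_code_id)
    );
    INSERT INTO medication VALUES (1, 'first'), (2, 'second');
    INSERT INTO atc_code VALUES (1, 'N05BB01'), (2, 'R06AX33'), (3, 'A10BA02');
    INSERT INTO medication_atc_codes VALUES (1, 1), (1, 2), (2, 3);
""")

# "Does any code attached to this medication contain N05?"
matching = conn.execute(
    """
    SELECT DISTINCT m.id, m.name
    FROM medication AS m
    JOIN medication_atc_codes AS link ON link.medication_id = m.id
    JOIN atc_code AS a ON a.id = link.atc_code_id
    WHERE a.code LIKE ?
    ORDER BY m.id
    """,
    ("%N05%",),
).fetchall()
```

Because each code is its own row, the substring test applies to individual codes rather than to the serialized text of a whole list.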

Ordering Django querysets using a JSONField's properties

I have a model that kinda looks like this:
class Person(models.Model):
    data = JSONField()
The data field has 2 properties, name and age. Now, let's say I want to get a paginated queryset (each page containing 20 people), with a filter where age is greater than 25, and the queryset is to be ordered by age in descending order. In a usual setup, that is, a normalized database, I can write this query like so:
person_list_page_1 = Person.objects.filter(age__gt=25).order_by('-age')[:20]
Now, what is the equivalent of the above when filtering and ordering using keys stored in the JSONField? I have researched this, and it seems it was meant to be a feature for 2.1, but I can't seem to find anything relevant.
Link to the ticket about it being implemented in the future
I also have another question. Let's say we filter and order using the JSONField. Will the ORM have to get all the objects, filter, and order them before sending the first 20 in such a case? That is, will performance be noticeably slower?
Obviously, I know a normalized database is far better for these things, but my hands are kinda tied.
You can use PostgreSQL's SQL syntax to extract subfields. They can then be used just like any other field on the model in queryset filters.
from django.db.models.expressions import RawSQL
Person.objects.annotate(
    age=RawSQL("(data->>'age')::int", [])
).filter(age__gt=25).order_by('-age')[:20]
See the PostgreSQL docs for other operators and functions.
In some cases, you might have to add explicit typecasts (::int, for example).
https://www.postgresql.org/docs/current/static/functions-json.html
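The filter, ordering, and slice are all compiled into a single SQL query, so the database does the work and only the first 20 rows are sent back. Performance will be slower than with a proper field, but it's not bad.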
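The same idea can be tried without a PostgreSQL server. The sketch below swaps in the standard-library sqlite3 module and SQLite's json_extract function in place of PostgreSQL's ->> operator, and it assumes the SQLite build includes the JSON functions. The person table, sample rows, and page size of 2 are made up for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, data TEXT NOT NULL)")

people = [("ann", 31), ("bob", 19), ("cy", 47), ("di", 26), ("ed", 25)]
conn.executemany(
    "INSERT INTO person (data) VALUES (?)",
    [(json.dumps({"name": name, "age": age}),) for name, age in people],
)

page_size = 2  # the question uses 20; kept small for the sample data

# Filtering, ordering, and limiting all happen inside the database,
# so only one page of rows is returned to Python.
page_1 = conn.execute(
    """
    SELECT json_extract(data, '$.name') AS name,
           json_extract(data, '$.age') AS age
    FROM person
    WHERE json_extract(data, '$.age') > 25
    ORDER BY age DESC
    LIMIT ?
    """,
    (page_size,),
).fetchall()
```

Without an index on the extracted expression, the database still has to read every row to evaluate the filter, which is where the slowdown relative to a proper column comes from.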

How to return only indexed objects of a specific type in Haystack

Is there any way to use SearchQuerySet and restrict the results to only a specific indexed model? That is, if I add Note and NoteIndex to Haystack, can I pull out just the results that correspond to Note instances?
EDIT:
I have had a look and found that there is a reserved field named django_ct that is stored on every indexed model. Is it possible to filter on this field? What values does it take?
DOUBLE EDIT:
Never mind. After reading the Haystack source code, django_ct is 'appname.modelname' internally and can be queried with SearchQuerySet().filter(django_ct='appname.modelname')
According to the Haystack documentation, a SearchQuerySet object has a method called models() that restricts the results to those models.
e.g.
SearchQuerySet().models(BlogEntry, Comment).filter(content='foo')
As you can see, it takes the actual model classes. My guess is that it uses them to look up the content type and perform the filter.

Filter on a list of tags

I'm trying to select all the songs in my Django database whose tag is any of those in a given list. There is a Song model, a Tag model, and a SongTag model (for the many to many relationship).
This is my attempt:
taglist = ["cool", "great"]
tags = Tag.objects.filter(name__in=taglist).values_list('id', flat=True)
song_tags = SongTag.objects.filter(tag__in=list(tags))
At this point I'm getting an error:
DatabaseError: MultiQuery does not support keys_only.
What am I getting wrong? If you can suggest a completely different approach to the problem, it would be more than welcome too!
EDIT: I should have mentioned I'm using Django on Google AppEngine with django-nonrel
You shouldn't use m2m relationships with AppEngine. NoSQL databases (and BigTable is one of them) generally don't support JOINs, and the programmer is expected to denormalize the data structure. This is a deliberate design decision: while your database will contain redundant data, your read queries will be much simpler (no need to combine data from 3 tables), which in turn makes the design of the DB server much simpler as well (of course, this is done for the sake of optimization and scaling).
In your case you should probably get rid of the Tag and SongTag models and just store the tag in the Song model as a string. I am assuming that the Tag model only contains id and name; if Tag in fact contains more data, you should still keep the Tag model, and the Song model should then contain both tag_id and tag_name. The idea, as explained above, is to introduce redundancy for the sake of simpler queries.
Please, please let the ORM build the query for you:
song_tags = SongTag.objects.filter(tag__name__in=taglist)
You should try to use only one query, so that Django also generates only one query using a join.
Something like this should work:
Song.objects.filter(tags__name__in=taglist)
You may need to change some names from this example (most likely the tags in tags__name__in), see https://docs.djangoproject.com/en/1.3/ref/models/relations/.
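For a relational backend (so not the django-nonrel setup from the question), the single-query approach boils down to one JOIN with an IN clause. The sketch below shows that shape with the standard-library sqlite3 module. The table names, sample rows, and the songs_with_tags helper are hypothetical. It also shows that a song matching several tags appears once per matching tag unless DISTINCT is used, which in Django corresponds to calling distinct() on the queryset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE song (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE song_tag (song_id INTEGER NOT NULL, tag_id INTEGER NOT NULL);
    INSERT INTO song VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO tag VALUES (1, 'cool'), (2, 'great'), (3, 'dull');
    INSERT INTO song_tag VALUES (1, 1), (1, 2), (2, 2), (3, 3);
""")


def songs_with_tags(names, distinct):
    """Return (id, title) rows for songs having any of the given tag names."""
    placeholders = ", ".join("?" for _ in names)
    select = "SELECT DISTINCT" if distinct else "SELECT"
    sql = (
        f"{select} s.id, s.title FROM song AS s "
        "JOIN song_tag AS st ON st.song_id = s.id "
        "JOIN tag AS t ON t.id = st.tag_id "
        f"WHERE t.name IN ({placeholders}) ORDER BY s.id"
    )
    return conn.execute(sql, names).fetchall()


taglist = ["cool", "great"]
with_duplicates = songs_with_tags(taglist, distinct=False)
unique_songs = songs_with_tags(taglist, distinct=True)
```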

Django - How to annotate QuerySet using multiple field values?

I have a model called "Story" that has two integer fields called "views" and "votes". When I retrieve all the Story objects I would like to annotate the returned QuerySet with a "ranking" field that is simply "views"/"votes". Then I would like to sort the QuerySet by "ranking". Something along the lines of...
Story.objects.annotate(ranking=CalcRanking('views', 'votes')).order_by('ranking')
How can I do this in Django? Or should it be done after the QuerySet is retrieved in Python (like creating a list that contains the ranking for each object in the QuerySet)?
Thanks!
PS: In my actual program, the ranking calculation isn't as simple as above and depends on other filters to the initial QuerySet, so I can't store it as another field in the Story model.
In older versions of Django, the things you can pass to annotate (and aggregate) must be subclasses of django.db.models.aggregates.Aggregate. You can't just pass arbitrary Python objects to it, since the aggregation/annotation actually happens inside the database (that's the whole point of aggregate and annotate). Writing custom aggregations is not officially supported in those versions (there is no documentation for it). The only information available is this minimal source code: https://code.djangoproject.com/browser/django/trunk/django/db/models/aggregates.py
On those versions, this means you either have to store the calculations in the database somehow, figure out how the aggregation API works, or use raw SQL (the raw method on the Manager) to do what you need. Since Django 1.8, annotate also accepts query expressions built from F objects, such as F('views') / F('votes').
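The raw SQL option can be sketched as follows, using the standard-library sqlite3 module so the example runs on its own. The story table and its rows are made up, and skipping stories with zero votes is an assumption about how division by zero should be handled. A SELECT of this shape (including the primary key) is the kind of statement that could be passed to the manager's raw method.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE story (
        id INTEGER PRIMARY KEY,
        views INTEGER NOT NULL,
        votes INTEGER NOT NULL
    );
    INSERT INTO story VALUES (1, 100, 10), (2, 90, 3), (3, 50, 0), (4, 7, 2);
""")

# Compute the ranking inside the database and sort by it there.
# CAST avoids integer division; stories with no votes are skipped.
ranked = conn.execute(
    """
    SELECT id,
           CAST(views AS REAL) / votes AS ranking
    FROM story
    WHERE votes > 0
    ORDER BY ranking DESC
    """
).fetchall()

ranked_ids = [story_id for story_id, _ in ranked]
```

Doing the calculation in SQL keeps the sorting in the database, so it combines naturally with other filters and with slicing for pagination.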