Haystack scores make no sense

I'm using haystack with elastic search for a project, but the scores I get make no sense (to me).
The model I'm trying to index and search looks similar to:
class Car(models.Model):
    name = models.CharField(max_length=255)
class Color(models.Model):
    car = models.ForeignKey(Car)
    name = models.CharField(max_length=255)
And here is the search index. Although I'm interested in cars, I index them by color, because I want to display a picture of that specific color:
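from haystack import indexes
class CarIndex(indexes.SearchIndex, indexes.Indexable):
    # CharField lives in haystack.indexes, so it needs the indexes. prefix
    text = indexes.CharField(document=True)
    def get_model(self):
        return Color
    def prepare_text(self, obj):
        # Some cleaning
        return " ".join([obj.name, obj.car.name])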
Now I add one car in three colors: a LaFerrari in Red, Black and White. Although there is only one car model, for search purposes there are three documents.
So I check Kibana and I get a normal output.
Then I perform a normal search: LaFerrari
All three documents have the same info; only the color name in the text field changes. I've even tried removing the color from the text, and guess what I got.
After this fiasco, I tried the Python elasticsearch library, indexing and searching manually, and I got normal results: all three colors had the same score when I searched for LaFerrari.
Any idea what is going on?
I'm thinking about moving from haystack to plain elasticsearch, any recommendations?

If you want to search more distinctively, you should add two more fields to the index:
color (the actual color, such as white, whatever you name the models and attributes)
name (the brand name)
The catch-all document field will get you only so far. You would then have Elasticsearch use a DisMax query that searches all configured fields for the given search terms.
https://www.elastic.co/guide/en/elasticsearch/reference/1.7/query-dsl-dis-max-query.html
So far I've only used SearchQuerySet with Elasticsearch (based on the catch-all field), plus a lot of custom querying with Solr. The SearchQuerySet fits in very nicely with the Django ORM, but it will only get you so far, so you are probably right that you might have to use custom code for querying. I would still recommend Haystack for indexing, though: it might be slower, but it is very easy to set up and maintain.
Looking at your example, what you gain with different fields would be:
You search for LaFerrari, and this is the exact value found in the field name (or brand_name) of all three documents. The results will then have the same scores.
Different fields also enable you to use facets: https://www.elastic.co/guide/en/elasticsearch/reference/1.7/search-facets.html#search-facets
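As a rough illustration of the DisMax suggestion above, here is a minimal sketch that only builds the query body as a plain Python dict. The field names name and color come from the suggestion above; the helper build_dis_max_query is hypothetical, and nothing here talks to a real cluster.

```python
import json


def build_dis_max_query(term, fields, tie_breaker=0.0):
    """Build an Elasticsearch dis_max query body matching `term` against each field."""
    return {
        "query": {
            "dis_max": {
                # One match clause per field; the best-scoring clause
                # determines the document's score.
                "queries": [{"match": {field: term}} for field in fields],
                "tie_breaker": tie_breaker,
            }
        }
    }


# "name" and "color" are assumed field names, as suggested in the answer.
body = build_dis_max_query("LaFerrari", ["name", "color"])
print(json.dumps(body, indent=2))
```

A body like this would then be sent with whatever client you use for custom querying; the index name depends on your Haystack connection settings.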

Related

Filter multiple Django model fields with variable number of arguments

I'm implementing search functionality with the option of finding a record by matching multiple fields across multiple tables.
Say I want to find a Customer by his/her first or last name, or by the ID of a placed Order, which is stored in a different model than Customer.
The easy scenario, which I have already implemented, is that the user types a single word into the search field. I then use Django Q objects to query the Order model, using either a direct field reference or a related_query_name reference:
result = Order.objects.filter(
    Q(customer__first_name__icontains=user_input)
    | Q(customer__last_name__icontains=user_input)
    | Q(order_id__icontains=user_input)
).distinct()
Piece of cake, no problems at all.
But what if the user wants to narrow the search and types multiple words into the search field?
Example: the user has typed Bruce and got a whole lot of records back.
Now he/she wants to be more specific and adds the customer's last name, so the search becomes Bruce Wayne. After splitting this into separate parts I have Bruce and Wayne. Obviously I don't want to search the Order model here: order_id is a single word that is sufficient to find the customer on its own, so in this case I drop it from the query entirely.
Now I'm trying to match the customer by both first AND last name. I also want to handle the words arriving in any order, so that Bruce Wayne and Wayne Bruce both work: I still have the customer's full name, but the positions of the first and last name aren't fixed.
And this is the question I'm looking for an answer to: how do I build a query that searches multiple fields of a model without knowing which search word belongs to which table?
I'm guessing the solution is trivial and there's surely an elegant way to create such a dynamic query, but I can't think of one.

You can dynamically OR a variable number of Q objects together to achieve your desired search. The approach below makes it trivial to add or remove fields you want to include in the search.
from functools import reduce
from operator import or_
from django.db.models import Q
fields = (
    'customer__first_name__icontains',
    'customer__last_name__icontains',
    'order_id__icontains',
)
parts = []
terms = ["Bruce", "Wayne"]  # produce this from your search input field
# Build one Q object per (term, field) combination
for term in terms:
    for field in fields:
        parts.append(Q(**{field: term}))
query = reduce(or_, parts)
result = Order.objects.filter(query).distinct()
reduce combines the Q objects by ORing them together, so a record matches if any term matches any field. Credit for that part of the answer goes to this answer.

The solution I came up with is rather complex, but it works exactly the way I wanted to handle this problem:
from functools import reduce
from operator import and_, or_
from django.db.models import Q
search_keys = user_input.split()
if len(search_keys) > 1:
    # Several words: the first name must match one of them AND the last name must too
    first_name_set = set()
    last_name_set = set()
    for key in search_keys:
        first_name_set.add(Q(customer__first_name__icontains=key))
        last_name_set.add(Q(customer__last_name__icontains=key))
    query = reduce(and_, [reduce(or_, first_name_set), reduce(or_, last_name_set)])
else:
    # Single word: match it against any of the three fields
    search_fields = [
        Q(customer__first_name__icontains=user_input),
        Q(customer__last_name__icontains=user_input),
        Q(order_id__icontains=user_input),
    ]
    query = reduce(or_, search_fields)
result = Order.objects.filter(query).distinct()
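To see why combining AND and OR makes the word order irrelevant, here is a minimal sketch on plain Python data instead of the ORM. It uses a slightly different but related shape: every term must match at least one field (AND across terms, OR across fields). The sample records and the helper matches_all_terms are hypothetical; with Q objects, the same shape would presumably be an and_ reduction over per-term or_ reductions.

```python
def matches_all_terms(record, terms, fields):
    """Return True if every term appears (case-insensitively) in at least one field."""
    return all(
        any(term.lower() in str(record[field]).lower() for field in fields)
        for term in terms
    )


# Hypothetical sample data standing in for Customer rows.
customers = [
    {"first_name": "Bruce", "last_name": "Wayne"},
    {"first_name": "Bruce", "last_name": "Banner"},
    {"first_name": "Wayne", "last_name": "Rooney"},
]
fields = ("first_name", "last_name")

for user_input in ("Bruce Wayne", "Wayne Bruce"):
    hits = [c for c in customers if matches_all_terms(c, user_input.split(), fields)]
    print(user_input, "->", hits)
```

Both inputs select only the Bruce Wayne record, because each word is free to match either field.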

django subquery with a join in it

I'm on Django 1.8.5 and Python 3.4.3, and I'm trying to create a subquery that constrains my main data set, but the subquery itself (I think) needs a join in it. Or maybe there is a better way to do it.
Here's a trimmed down set of models:
class Lot(models.Model):
    lot_id = models.CharField(max_length=200, unique=True)
class Lot_Country(models.Model):
    lot = models.ForeignKey(Lot)
    country = CountryField()
class Discrete(models.Model):
    discrete_id = models.CharField(max_length=200, unique=True)
    master_id = models.ForeignKey(Inventory_Master)
    location = models.ForeignKey(Location)
    lot = models.ForeignKey(Lot)
I am filtering on various attributes of Discrete (which is discrete supply) and I want to go "up" through Lot and over to Lot_Country, meaning "I only want to get rows from Discrete if the Lot associated with that row has an entry in Lot_Country for my appropriate country" (let's say US).
I've tried something like this:
oklots=list(Lot_Country.objects.filter(country='US'))
But first of all, that gives me the str representation back, which I don't really want (I changed it to be lot_id, but that's a hack).
What's the best way to constrain Discrete through Lot and over to Lot_Country? In SQL I would just join in the subquery (or even in the main query; maybe that's what I need? I guess I don't know how to join up to a parent and then down into that parent's other child...)
Thanks in advance for your help.

I'm not sure what you mean by "it gives me the str back"... Lot_Country.objects.filter(country='US') will return a queryset. Of course, if you print it in your console, you will see the string representation of each object.
I also think your models need refactoring. The way you have currently defined them, you can associate multiple Lot_Country rows with one Lot, but each Lot_Country row can only be associated with one Lot.
If I understand your general model correctly, that isn't what you want: you want to associate multiple Lots with one Lot_Country. To do that you need to reverse your foreign key relationship (i.e., put it inside the Lot).
Then, for fetching all the Discrete lots that are in a given country, you would do:
discretes_in_us = Discrete.objects.filter(lot__lot_country__country='US')
This will give you a queryset of all Discretes whose Lot is in the US.
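In SQL terms, a lookup that spans from Discrete up to Lot and over to Lot_Country corresponds roughly to a pair of joins. The following is a minimal sketch using the standard-library sqlite3 module; the table names, column names and sample rows are assumptions made for illustration, not the schema Django would actually generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lot (id INTEGER PRIMARY KEY, lot_id TEXT UNIQUE);
    CREATE TABLE lot_country (
        id INTEGER PRIMARY KEY,
        lot_id INTEGER REFERENCES lot(id),
        country TEXT
    );
    CREATE TABLE discrete (
        id INTEGER PRIMARY KEY,
        discrete_id TEXT UNIQUE,
        lot_id INTEGER REFERENCES lot(id)
    );

    -- Hypothetical sample data: LOT-A is allowed in US and CA, LOT-B only in DE.
    INSERT INTO lot VALUES (1, 'LOT-A'), (2, 'LOT-B');
    INSERT INTO lot_country VALUES (1, 1, 'US'), (2, 1, 'CA'), (3, 2, 'DE');
    INSERT INTO discrete VALUES (1, 'D-1', 1), (2, 'D-2', 2), (3, 'D-3', 1);
""")

# Join up from discrete to its lot, then down to the lot's country rows.
rows = conn.execute("""
    SELECT DISTINCT d.discrete_id
    FROM discrete AS d
    JOIN lot AS l ON l.id = d.lot_id
    JOIN lot_country AS lc ON lc.lot_id = l.id
    WHERE lc.country = ?
    ORDER BY d.discrete_id
""", ("US",)).fetchall()
print(rows)
```

In this sketch the foreign key still points from lot_country to lot, as in the question's models, and the join works in that direction as well.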

Searching a many to many database using Google Cloud Datastore

I am quite new to Google App Engine. I know Google Datastore is not SQL, but I am trying to get many-to-many relationship behaviour in it. As you can see below, I have Gif entities and Tag entities. I want my application to search Gif entities by related tag. Here is what I have done:
class Gif(ndb.Model):
    author = ndb.UserProperty()
    link = ndb.StringProperty(indexed=False)
class Tag(ndb.Model):
    name = ndb.StringProperty()
class TagGifPair(ndb.Model):
    tag_id = ndb.IntegerProperty()
    gif_id = ndb.IntegerProperty()
    @classmethod
    def search_gif_by_tag(cls, tag_name):
        query = cls.query(name=tag_name)
        # I am stuck here ...
Is this the right way to start? If so, how can I finish it? If not, how should I do it?

You can use repeated properties: https://developers.google.com/appengine/docs/python/ndb/properties#repeated. The sample in the link uses tags on an entity as its example, but for your exact use case it would look like this:
class Gif(ndb.Model):
    author = ndb.UserProperty()
    link = ndb.StringProperty(indexed=False)
    # you store an array of tag keys here; you can also just make this
    # StringProperty(repeated=True)
    tag = ndb.KeyProperty(repeated=True)
    @classmethod
    def get_by_tag(cls, tag_name):
        # a query on a repeated property works the same as if it was a single value
        return cls.query(cls.tag == ndb.Key(Tag, tag_name)).fetch()
# we will put the tag_name as its key.id()
# you only really need this if you wanna keep records of your tags
# you can simply keep the tags as strings too
class Tag(ndb.Model):
    gif_count = ndb.IntegerProperty(indexed=False)

Maybe you want to use a list? I would do something like this if you only need to search gifs by tag. I'm using db since I'm not familiar with ndb.
class Gif(db.Model):
    author = db.UserProperty()
    link = db.StringProperty(indexed=False)
    tags = db.StringListProperty(indexed=True)
Then query like this:
Gif.all().filter('tags =', tag).fetch(1000)

There are different ways of doing many-to-many relationships. Using ListProperties is one way. The limitation to keep in mind with ListProperties is that there's a limit to the number of indexes per entity, and a limit to the total entity size. This means that there's a limit to the number of entries in the list (depending on whether you hit the index count or the entity size first). See the bottom of this page: https://developers.google.com/appengine/docs/python/datastore/overview
If you believe the number of references will work within this limit, this is a good way to go. Considering that you're not going to have thousands of admins for a Page, this is probably the right way.
The other way is to have an intermediate entity that has reference properties to both sides of your many-to-many. This method will let you scale much higher, but because of all the extra entity writes and reads, this is much more expensive.
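The two approaches can be compared on plain in-memory data. This is only a sketch of the data shapes, not Datastore code: the gif ids, links and helper functions are hypothetical, and a real datastore would answer the list-based lookup from its index rather than by scanning.

```python
# Approach 1: each gif carries its own list of tags (like a repeated/list property).
gifs = {
    "gif1": {"link": "http://example.com/1.gif", "tags": ["cat", "funny"]},
    "gif2": {"link": "http://example.com/2.gif", "tags": ["dog"]},
    "gif3": {"link": "http://example.com/3.gif", "tags": ["cat"]},
}


def gifs_by_tag_list(tag):
    """Find gifs whose own tag list contains the tag (one entity read per hit)."""
    return sorted(gif_id for gif_id, gif in gifs.items() if tag in gif["tags"])


# Approach 2: separate (tag, gif) pair records, like an intermediate entity.
pairs = [(tag, gif_id) for gif_id, gif in gifs.items() for tag in gif["tags"]]


def gifs_by_tag_pairs(tag):
    """Look up the pair records for the tag, then collect the gif ids they point to."""
    return sorted(gif_id for pair_tag, gif_id in pairs if pair_tag == tag)


print(gifs_by_tag_list("cat"))
print(gifs_by_tag_pairs("cat"))
```

Both lookups return the same gifs; the difference is that the pair approach stores one extra record per relationship, which is where the additional reads and writes come from.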

Sort order for model items in django admin

Say I have a model:
class Question(models.Model):
    text = models.TextField(verbose_name=u'Вопрос', max_length=1024)  # "Question"
    is_free_text = models.BooleanField(verbose_name=u'Ответ в виде текста?')  # "Answer as text?"
    sort_order = models.PositiveSmallIntegerField()
Is there a plugin or something for the Django admin that lists the model's items somewhere and lets me move them up and down (or drag and drop them) to define their sort order? I basically want every question to have a unique sort order, so that they are sorted by their sort_order when fetched.

There are lots of options but you will most likely have to tinker a bit with them. Here are some snippets (I'm not sure how well these will work):
http://djangosnippets.org/snippets/2306/
http://djangosnippets.org/snippets/2047/
http://djangosnippets.org/snippets/2057/
Some apps that claim to do it:
https://github.com/ff0000/django-sortable
https://github.com/centralniak/django-inline-ordering
If you are interested in installing django-grappelli (which is a skin for the entire Django admin), you can make use of its sortable change list items.

To Use Django-Haystack or not?

So this might be obvious to some, but I'm not sure what the right answer is. I have a simple donation application where Donor objects get created through a form. A feature to be added is searching for a Donor by last name and/or phone number.
Is this a good case to use django-haystack or should I just create my own filters? The problem I may see with haystack is that a few donations are being submitted every minute so could indexing be a problem? There are currently around 130,000 records and growing. I have started to implement haystack but have realized it might not be necessary?

Don't use haystack -- that's for fast full-text search when the underlying relational database can't handle it easily. The use case for haystack is when you store many large documents with huge chunks of text that you want indexed by words in the document so you can easily search.
Django by default already allows you to easily search text records. For example, in the admin you simply specify search_fields and you can search by name or telephone number. (It will generally do case-insensitive contains searches, which find partial matches; e.g., the name "John Doe" will come up if you search for just "doe" or "ohn".)
So if your models.py has:
class Donor(models.Model):
    name = models.CharField(max_length=50)
    phone = models.CharField(max_length=15)
and an admin.py with:
from django.contrib import admin
from mysite.myapp.models import Donor
class DonorAdmin(admin.ModelAdmin):
    model = Donor
    search_fields = ['name', 'phone']
admin.site.register(Donor, DonorAdmin)
it should work fine. If an improvement is needed, consider adding a full-text index to the underlying RDBMS. For example, PostgreSQL 8.3 and later let you create a text search index with a one-liner in the database: http://www.postgresql.org/docs/8.3/static/textsearch-indexes.html. Note that Django will not necessarily use such an index automatically: the admin's default contains search is translated into a LIKE-style query, so check that your queries can actually benefit from it.
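To show what a case-insensitive contains search over two fields amounts to, here is a minimal sketch using the standard-library sqlite3 module. The donor table, the sample rows and the search_donors helper are hypothetical, and the SQL is only an approximation of what the admin generates for search_fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE donor (name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO donor VALUES (?, ?)",
    [("John Doe", "555-0101"), ("Jane Roe", "555-0199")],
)


def search_donors(term):
    """Case-insensitive substring match on name OR phone.

    SQLite's LIKE is case-insensitive for ASCII by default. A term that
    contains % or _ would act as a wildcard in this simplified version.
    """
    pattern = f"%{term}%"
    return conn.execute(
        "SELECT name FROM donor WHERE name LIKE ? OR phone LIKE ? ORDER BY name",
        (pattern, pattern),
    ).fetchall()


print(search_donors("doe"))
print(search_donors("ohn"))
print(search_donors("0199"))
```

On a table of around 130,000 rows a scan like this is usually acceptable, which is the point of the answer above; measuring on your own data is the only way to be sure.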
