Django-Haystack: How to limit search within entries that have a specific value for a given field? - django

Say I have a model Person for which there is a PersonIndex class in search_indexes.py that makes all of its fields searchable. How can I search within only those entries where, say, the has_title field is True?
I tried the following, but it just searches among all entries, not just the ones where has_title is True:
srch = request.GET.get('search', "")
sqs = SearchQuerySet().filter(has_title=True)
clean_query = sqs.query.clean(srch)
results = sqs.raw_search(clean_query)
I am using Whoosh 2.4.1, Django-haystack 1.2.7 and Django 1.4.

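Use filter(content=clean_query) instead of raw_search(clean_query). raw_search sends the query string straight to the backend and ignores the filters already built on the SearchQuerySet, which is why has_title=True had no effect. See here for more details.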
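Putting the answer together, a minimal sketch might look like the following. The helper name search_titled_people is hypothetical, and the queryset is passed in rather than built inside the function only so the helper can be exercised on its own; it assumes the object behaves like Haystack's SearchQuerySet (filter() returns a new queryset and query.clean() sanitizes user input).

```python
def search_titled_people(sqs, raw_query):
    """Search raw_query only among entries whose has_title field is True.

    `sqs` is assumed to be a Haystack SearchQuerySet (or a compatible object).
    """
    # Narrow the queryset first, exactly as in the question.
    titled = sqs.filter(has_title=True)
    # Sanitize the user's input with the backend's own cleaning rules.
    clean_query = titled.query.clean(raw_query)
    # Chain a second filter instead of raw_search, so both conditions apply.
    return titled.filter(content=clean_query)
```

In a view this would be called as search_titled_people(SearchQuerySet(), request.GET.get('search', "")).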

Related

Django - Search matches with all objects - even if they don't actually match

This is the model that has to be searched:
class BlockQuote(models.Model):
    debate = models.ForeignKey(Debate, related_name='quotes')
    speaker = models.ForeignKey(Speaker, related_name='quotes')
    text = models.TextField()
I have around a thousand instances in the database on my laptop (and around 50,000 on the production server).
I am creating a manage.py command that searches the database and returns all BlockQuote objects whose text field contains the keyword.
I am doing this with Django's (1.11) Postgres full-text search in order to use the 'rank' attribute, which sounds like something that would come in handy. I used the official Django full-text search documentation for the code below.
Yet when I run this code, it matches all objects, regardless of whether BlockQuote.text actually contains the query.
def handle(self, *args, **options):
    vector = SearchVector('text')
    query = options['query'][0]
    # Lowercase local name so it does not shadow the Search_Instance model
    search_instance = Search_Instance.objects.create(query=query)
    set = BlockQuote.objects.annotate(rank=SearchRank(vector, query)).order_by('-rank')
    for result in set:
        match = QueryMatch.objects.create(quote=result, query=search_instance)
        match.save()
Does anyone have an idea of what I am doing wrong?
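I don't see you actually filtering anywhere. annotate() only attaches a rank to every row, so every BlockQuote is still returned; non-matching rows simply rank lowest. Filter on the rank, with a threshold tuned to your data:
BlockQuote.objects.annotate(...).filter(rank__gte=0.5)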

Django haystack elasticsearch problems with autocomplete (and queries with Capital letters)

I've got a basic django haystack elasticsearch installation running that seems to be working, until I hit an autocomplete problem:
It doesn't return matches for partial input, only for the full field value. Another problem is with data that contains capital letters and isn't normalized (such as usernames).
My installation:
django 1.6.4
haystack 2.1.0
elasticsearch 1.3.1
py-elasticsearch 0.6.1
class SocialProfileIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    username = indexes.CharField(model_attr='username')
    first_name = indexes.CharField(model_attr='first_name')
    last_name = indexes.CharField(model_attr='last_name')
    # Auto-complete
    username_auto = indexes.EdgeNgramField(model_attr='username')
    first_name_auto = indexes.EdgeNgramField(model_attr='first_name')
    last_name_auto = indexes.EdgeNgramField(model_attr='last_name')
    def get_model(self):
        return SocialProfile
    def index_queryset(self, using=None):
        return self.get_model().objects.all()
Then in the view I return:
results = SearchQuerySet().models(SocialProfile).autocomplete(username_auto=q)
So when indexing a SocialProfile with:
username=alonisser
when q (the query) is 'alonisser' I get the correct reply, but when I try 'alon' or similar I don't get any results.
When I access elasticsearch directly through py-elasticsearch (without haystack):
es = Elasticsearch('http://elasticsearch.url:9200')
es.search('username_auto:alon', index='haystack')
I do get the correct result, so the data is stored there and the problem is probably something I am doing wrong with haystack.
A similar but different problem occurs when the indexed item has capital letters, like 'Alonisser': searching for 'alonisser' doesn't return any result, but searching for 'Alonisser' does.
What am I doing wrong? Thanks for the help.
I think you already got an answer in the haystack forums, but just to bring it out here as well.
One way to get rid of the capitalization problem is to use a custom prepare method in your index class, although my haystack installation somehow handles it by default.
def prepare_username_auto(self, obj):
    return obj.username.lower()
This will convert all usernames to lowercase when you run 'update_index'. You can then lowercase the user's search term as well, which should produce correct results.
To search for part of a word you need to use:
results = SearchQuerySet().models(SocialProfile).autocomplete(username_auto__startswith=q)
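To see why lowercasing both sides matters for prefix matching, here is a rough plain-Python sketch of what an edge n-gram field stores. It is an illustration only, not Haystack's or Elasticsearch's implementation; the function names are hypothetical and the min_gram and max_gram values are assumptions, so check the analyzer settings of your own backend.

```python
def edge_ngrams(term, min_gram=2, max_gram=15):
    """Return the lowercase prefixes of `term` that an edge n-gram index might hold.

    min_gram and max_gram are assumed values for illustration.
    """
    term = term.lower()
    longest = min(len(term), max_gram)
    return [term[:size] for size in range(min_gram, longest + 1)]


def prefix_matches(indexed_value, user_query):
    """True if the lowercased query equals one of the stored prefixes."""
    return user_query.lower() in edge_ngrams(indexed_value)


if __name__ == "__main__":
    print(edge_ngrams("Alonisser"))
    print(prefix_matches("Alonisser", "alon"))
```

Because both the indexed value and the query are lowercased, 'Alonisser' and 'alonisser' produce the same prefixes, and a partial term such as 'alon' matches one of them.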

Character folding for a Django haystack and whoosh

I have a django based app with haystack and the whoosh search engine. I want to provide accent- and special-character-independent search, so that indexed data containing special characters can also be found by searching with words that lack them:
Indexed is:
'café'
Search term:
'cafe'
'café'
I've written a specific FoldingWhooshSearchBackend which uses a StemmingAnalyzer and a CharsetFilter(accent_map) as described in the following document:
https://gist.github.com/gregplaysguitar/1727204
However the search still doesn't work as expected, i.e. I cannot search with 'cafe' and find 'café'. I've looked into the search index using:
from whoosh.index import open_dir
ix = open_dir('myservice/settings/whoosh_index')
searcher = ix.searcher()
for doc in searcher.documents():
    print(doc)
The special characters are still in the index.
Do I have to do something additional? Is it about changing the index template?
You have to write Haystack SearchIndex classes for your models. That's how you can prepare model data for the search index.
Example of myapp/search_indexes.py:
from django.template.defaultfilters import slugify
from haystack import site  # Haystack 1.x registration API
from haystack import indexes
from myapp.models import UserProfile  # assumes the model lives in myapp/models.py
class UserProfileIndex(indexes.SearchIndex):
    text = indexes.CharField(document=True)
    def prepare_text(self, obj):
        # Index both the original text and an ASCII-folded copy of it
        data = [obj.get_full_name(), obj.user.email, obj.phone]
        original = ' '.join(data)
        slugified = slugify(original)
        return ' '.join([original, slugified])
site.register(UserProfile, UserProfileIndex)
If a user is named café, you will find their profile with both search terms, café and cafe.
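The idea of indexing an accent-free copy next to the original can be tried out in plain Python. This is a minimal sketch that uses unicodedata as a stand-in for Django's slugify (it folds accents only and does not lowercase or hyphenate); the names fold_accents and searchable_text are hypothetical.

```python
import unicodedata


def fold_accents(text):
    """Strip combining accent marks, e.g. turning 'café' into 'cafe'."""
    # NFKD splits an accented letter into its base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def searchable_text(original):
    """Return the original text followed by its accent-folded copy."""
    return " ".join([original, fold_accents(original)])


if __name__ == "__main__":
    print(searchable_text("café"))
```

Indexing the output of searchable_text means a document containing 'café' holds both spellings, so either search term can match it.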
I think the best approach is to let Haystack create the schema for maximum forwards compatibility, and then hack the CharsetFilter in.
This code is working for me with Haystack 2.4.0 and Whoosh 2.7.0:
from haystack.backends.whoosh_backend import WhooshEngine, WhooshSearchBackend
from whoosh.analysis import CharsetFilter, StemmingAnalyzer
from whoosh.support.charset import accent_map
from whoosh.fields import TEXT
class FoldingWhooshSearchBackend(WhooshSearchBackend):
    def build_schema(self, fields):
        schema = super(FoldingWhooshSearchBackend, self).build_schema(fields)
        for name, field in schema[1].items():
            if isinstance(field, TEXT):
                field.analyzer = StemmingAnalyzer() | CharsetFilter(accent_map)
        return schema
class FoldingWhooshEngine(WhooshEngine):
    backend = FoldingWhooshSearchBackend
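For the custom engine to take effect, Haystack 2.x has to be pointed at it in the project settings. A sketch of that fragment follows; the dotted module path myservice.search_backends is an assumption (use wherever the classes above actually live), and the index path is the one from the question.

```python
import os

HAYSTACK_CONNECTIONS = {
    "default": {
        # Hypothetical dotted path to the FoldingWhooshEngine class defined above.
        "ENGINE": "myservice.search_backends.FoldingWhooshEngine",
        # Index directory taken from the question; adjust to your project layout.
        "PATH": os.path.join("myservice", "settings", "whoosh_index"),
    },
}
```

Documents indexed before the analyzer changed keep their old terms, so run python manage.py rebuild_index afterwards to re-analyze the existing data.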

Django: Querying comments based on object field

I've been using the built-in Django comments system which has been working great. On a particular page I need to list the latest X comments which I've just been fetching with:
latest_comments = (Comment.objects
    .filter(is_public=True, is_removed=False)
    .order_by('submit_date').reverse()[:5])
However I've now introduced a Boolean field 'published' into the parent object of the comments, and I want to include that in the query above. I've tried using the content_type and object_pk fields but I'm not really getting anywhere. Normally you'd do something like:
Comment.objects.filter(blogPost__published=True)
But as it is not stored like that I am not sure how to proceed.
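# object_pk is a text column, so collect the ids as strings, e.g. ['3', '4', '5', ...]
posts_ids = [str(pk) for pk in BlogPost.objects.filter(published=True).values_list('id', flat=True)]
ctype = ContentType.objects.get_for_model(BlogPost)
# content_object is a GenericForeignKey and cannot be used in filter(); match on object_pk instead
latest_comments = Comment.objects.filter(is_public=True, is_removed=False, content_type=ctype, object_pk__in=posts_ids).order_by('-submit_date')[:5]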
Comments use a GenericForeignKey to store the relation to the parent object. Because of the way generic relations work, related lookups using the __<field> syntax are not supported.
You can get the desired behaviour with the 'in' lookup, although it will require a lot of comparisons when there are many BlogPosts.
# Get the list of ids (as strings, since object_pk is a text column); you would probably want to limit the number of items returned here
ids = [str(pk) for pk in BlogPost.objects.filter(published=True).values_list('id', flat=True)]
# Because we only want comments on BlogPost objects
content_type = ContentType.objects.get_for_model(BlogPost)
latest_comments = Comment.objects.filter(content_type=content_type, object_pk__in=ids, is_public=True, is_removed=False).order_by('submit_date').reverse()[:5]
See the Comment model doc for the description of all fields.
You just cannot do that in one query. Comments use GenericForeignKey. Documentation says:
Due to the way GenericForeignKey is implemented, you cannot use such
fields directly with filters (filter() and exclude(), for example) via
the database API.

django-reversion revert ManyToMany fields outside admin

I am using django-reversion in my project.
It works well except for one thing:
I can't get previous versions of ManyToMany fields. It works in the Django admin, but not in my code.
To get a previous version I use the following code:
vprod = Version.objects.get_for_date(product, ondate).get_object_version().object
This works for everything except the m2m field. Here 'product' is an instance of the Product class:
class Product(models.Model):
    name = models.CharField(max_length=255)
    # String reference, because Sku is defined further down
    elements = models.ManyToManyField('Sku')
class Sku(models.Model):
    name = models.CharField(max_length=255, verbose_name="SKU Name")
I can get vprod.name and it returns what I need, but when I try vprod.elements.all() it returns only the elements of the current (latest) version, even if the set of elements has changed.
If I understand it correctly, you should get the revision for the version: a version contains the data of a single object, while a revision groups the versions of multiple objects that were saved together. Have a look at:
some_version.revision.version_set.all()
Concretely, I think you should use (untested):
[
    v for v in Version.objects.get_for_date(product, ondate).revision.version_set.all()
    if v.content_type == ContentType.objects.get_for_model(Sku)
]
Note, by the way, that django-reversion has to be told to follow relationships. Using the low-level API:
reversion.register(YourModel, follow=["your_foreign_key_field"])
I had the same issue and thanks to @Webthusiast's answer I got my code working. Adapted to your example, it would be something like this.
Imports:
from django.contrib.contenttypes.models import ContentType
import reversion
Register your models:
reversion.register(Sku)
reversion.register(Product, follow=['elements'])
And then you can iterate:
product = Product.objects.get(pk=some_id)
versions = reversion.get_for_object(product)
for version in versions:
    # The Sku versions saved in the same revision are the elements at that point in time
    elements = [v.object_version.object
                for v in version.revision.version_set.all()
                if v.content_type == ContentType.objects.get_for_model(Sku)]
The documentation for this is now on Read the Docs. Refer to the 'Advanced model registration' section of the Low-level API page.