Django Haystack refuses to show no results, even for absurd queries - django

My question may be a bit strange, but it's been bothering me since the behavior is not what I expected. Here is my query:
query = request.GET.get('q','')
#in search_indexes:
#start_datetime = indexes.DateTimeField(model_attr='start_datetime',null=True)
#end_datetime = indexes.DateTimeField(model_attr='end_datetime')
search_events = (SearchQuerySet().models(Event)
                 .filter(content=query)
                 .filter(end_datetime__gte=datetime.now())
                 .order_by("start_datetime"))
Now I type in a query like "asdfasdfjasldf lolol hwtf asdlfka" and I still get 3 results. (Note, I only have 5 events to start with. Not sure if that could affect anything.) I print out the scores, and they are [42,42,42]. Doesn't filter() match on exact phrases? Especially if I use quotes?
Edit: I also tried using auto_query, and the results are the same.
I'm really confused about what's happening, so hopefully somebody can help clear this up. Thanks in advance!

It turns out that someone else on my team had set HAYSTACK_DEFAULT_OPERATOR to 'OR' instead of 'AND'. That explains everything: the additional filter() clause was actually expanding the number of results!
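To see why an OR default operator makes an extra filter widen the results instead of narrowing them, here is a toy model in plain Python. It only illustrates boolean AND/OR semantics. It is not how Haystack or the search backend actually evaluates queries, and the event data and the `matches` helper are hypothetical. In a real project the fix is the setting itself, i.e. `HAYSTACK_DEFAULT_OPERATOR = 'AND'` in settings.py (AND is Haystack's default).

```python
from datetime import datetime, timedelta

# Hypothetical stand-ins for five indexed Event documents.
NOW = datetime(2024, 1, 1)
EVENTS = [
    {"id": 1, "text": "jazz night", "end": NOW + timedelta(days=3)},
    {"id": 2, "text": "book club", "end": NOW + timedelta(days=10)},
    {"id": 3, "text": "chess meetup", "end": NOW + timedelta(days=1)},
    {"id": 4, "text": "old concert", "end": NOW - timedelta(days=5)},
    {"id": 5, "text": "past lecture", "end": NOW - timedelta(days=2)},
]


def matches(event, query, operator):
    """Combine the two clauses (content match, end date not passed) with AND or OR."""
    content_ok = query in event["text"]
    date_ok = event["end"] >= NOW
    if operator == "AND":
        return content_ok and date_ok
    return content_ok or date_ok


def search(query, operator):
    """Return the ids of all toy events accepted under the given operator."""
    return [e["id"] for e in EVENTS if matches(e, query, operator)]


if __name__ == "__main__":
    nonsense = "asdfasdfjasldf lolol"
    print(search(nonsense, "AND"))  # → []
    print(search(nonsense, "OR"))   # → [1, 2, 3]
```

With AND, the date clause can only remove results. With OR, every event that satisfies the date clause comes back regardless of the gibberish query. That is consistent with the three identically scored results described in the question.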

You might want to perform the search using auto_query():
search_events = (SearchQuerySet().models(Event)
                 .auto_query(query)
                 .filter(end_datetime__gte=datetime.now())
                 .order_by("start_datetime"))
It has some extra features, for example exact-phrase searching when a phrase is enclosed in quotes.
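As a rough illustration of what "exact-phrase searching when a phrase is enclosed in quotes" means, the sketch below splits a raw query string into quoted phrases and plain terms. It is a simplified approximation written for this answer, not Haystack's actual auto_query code. The function name `split_query` is hypothetical, and real backends also apply their own text analysis (stemming, stop words) to each part.

```python
import re

# Quoted substrings are treated as exact phrases (simplified assumption).
PHRASE_RE = re.compile(r'"([^"]*)"')


def split_query(raw):
    """Return (phrases, terms) for a raw user query string."""
    # Collect non-empty quoted phrases.
    phrases = [p.strip() for p in PHRASE_RE.findall(raw) if p.strip()]
    # Remove the quoted parts and split whatever is left on whitespace.
    remainder = PHRASE_RE.sub(" ", raw)
    terms = remainder.split()
    return phrases, terms


if __name__ == "__main__":
    print(split_query('"better than expected" results lolol'))
    # → (['better than expected'], ['results', 'lolol'])
```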

Related

TO_PURE_TEXT with cell referencing doesn't strip double quotes, contrary to the documentation

I want to get a clean number from a string like "123.45". Using =TO_PURE_TEXT(C10) doesn't work for me, contrary to the example in the documentation. Absolute referencing doesn't help either.
But if I don't use a cell reference and pass the value directly instead, like =TO_PURE_TEXT("123.45"), the result is correct, as expected, without quotes.
Is this some kind of bug, or am I really doing something wrong? How can I get this to work with a cell reference?
All you need is:
=SUBSTITUTE(C10, """", )*1
or:
=REGEXREPLACE(C10, """", )*1
I can't speak to whether it's a bug. Does seem odd, but this should work for now:
=1*SUBSTITUTE(C10,CHAR(34),"")
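The same idea in Python, for anyone cleaning such values outside of Sheets, takes two steps. First remove every double-quote character (CHAR(34) in the formula). Then convert the remainder to a number, which is what the `*1` does in the formulas above. The function name `to_number` is hypothetical.

```python
def to_number(cell_text: str) -> float:
    """Strip double quotes (like SUBSTITUTE(C10, CHAR(34), "")) and coerce to a number (like *1)."""
    cleaned = cell_text.replace('"', "")
    return float(cleaned)


if __name__ == "__main__":
    print(to_number('"123.45"'))  # → 123.45
```

As with `*1` in Sheets, which gives an error for non-numeric text, `float()` raises `ValueError` if anything non-numeric is left after the quotes are removed.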

How do I make a Gerrit query that spans a few specific projects?

I tried for a few hours to find the right syntax for a regex query that returns reviews from 2-3 different projects, but I failed and decided to crowdsource the task ;)
The search is documented at https://review.openstack.org/Documentation/user-search.html and mentions the possible use of REGEX, but it just didn't work.
Task: return all CRs from the openstack-infra/gerritlib and openstack-infra/git-review projects on https://review.openstack.org
Doing it for one project works well: project:openstack-infra/gerritlib
Ideally I would like to search for something like ^openstack-infra\/(gerritlib|git-review), which is at least the standard regex syntax.
Still, so far I have found it impossible to use parentheses; every time I used them, the search stopped returning any results.
1) You don't need to escape the "/" character.
2) You need to use double quotes to make the parentheses work.
So the following search should work for you:
project:"^openstack-infra/(gerritlib|git-review)"
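To check the pattern's logic before pasting it into Gerrit, you can test it locally against a few project names. Python's `re` module stands in here only to check the alternation and anchoring. Gerrit evaluates the regex with its own engine, so dialect details may differ. For that reason the sample names deliberately avoid suffixed names like "gerritlib-foo", whose treatment may differ between engines. The candidate list is made up for illustration.

```python
import re

# The pattern from the answer above; ^ anchors it to the start of the project name.
# As in Gerrit, the "/" needs no escaping here.
PROJECT_RE = re.compile(r"^openstack-infra/(gerritlib|git-review)")

# Hypothetical project names to test against.
CANDIDATES = [
    "openstack-infra/gerritlib",
    "openstack-infra/git-review",
    "openstack-infra/system-config",
    "openstack/nova",
]


def matching_projects(names):
    """Return the names the pattern accepts, in their original order."""
    return [n for n in names if PROJECT_RE.match(n)]


if __name__ == "__main__":
    print(matching_projects(CANDIDATES))
    # → ['openstack-infra/gerritlib', 'openstack-infra/git-review']
```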

Is there a way to search terms in order with RegexpQuery in lucene?

I would like to search my indexed documents in order using RegexpQuery.
For example, I have two documents:
text: Oracle unveils better than expected quarterly results.
text: Research In Motion shares gained almost 13 per cent on the Toronto Stock Exchange Friday, a day after the smartphone maker posted better than expected quarterly results.
So far I have tried this, with no luck:
Query regexq = new RegexpQuery(new Term("text", "^.+better.+quarterly.+results"));
Is there another way of implementing this?
Thanks
I believe a PhraseQuery fits what you are looking for better. You can use PhraseQuery.setSlop(int) to allow other terms to appear between the terms of the query. This would look like:
PhraseQuery pq = new PhraseQuery();
pq.add(new Term("text", "better"));
pq.add(new Term("text", "quarterly"));
pq.add(new Term("text", "results"));
pq.setSlop(10); //Or whatever is an appropriate slop value for you.
This sort of query is also supported by the standard QueryParser syntax, like:
text:"better quarterly results"~10
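To make the slop idea concrete, here is a simplified Python model of an in-order sloppy phrase match over a tokenized field. The query terms must appear in order, and the total number of extra tokens between them must not exceed the slop. This is only an approximation. Lucene's real slop is an edit distance over term positions and also permits out-of-order matches at higher slop values. The naive `tokenize` helper is an assumption and does not reproduce Lucene's analyzers.

```python
import re


def tokenize(text):
    """Naive analyzer stand-in: lowercase word tokens, punctuation dropped."""
    return re.findall(r"\w+", text.lower())


def in_order_within_slop(tokens, phrase, slop):
    """True if the phrase terms occur in order with at most `slop` extra tokens in total."""
    for start, tok in enumerate(tokens):
        if tok != phrase[0]:
            continue
        pos, gaps, ok = start, 0, True
        for term in phrase[1:]:
            try:
                # Earliest next occurrence keeps the total gap as small as possible.
                nxt = tokens.index(term, pos + 1)
            except ValueError:
                ok = False
                break
            gaps += nxt - pos - 1
            pos = nxt
        if ok and gaps <= slop:
            return True
    return False


DOCS = [
    "Oracle unveils better than expected quarterly results.",
    "Research In Motion shares gained almost 13 per cent on the Toronto Stock "
    "Exchange Friday, a day after the smartphone maker posted better than "
    "expected quarterly results.",
]
PHRASE = ["better", "quarterly", "results"]

if __name__ == "__main__":
    for doc in DOCS:
        print(in_order_within_slop(tokenize(doc), PHRASE, slop=10))
```

In both sample documents the only extra tokens are "than expected", so a slop of 2 would already be enough under this model.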
I think a PhraseQuery is most definitely the better implementation here, but...
Regarding RegexpQuery:
I believe it is intended to compare terms against the regex, and since the phrase you are searching for (I am assuming) is tokenized, no single Term matches your whole regex. You would need to index the entire field as a single Term to make this work, using StringField, KeywordAnalyzer, or similar.
I believe it works like Matcher.matches(), rather than Matcher.find(), which is to say, it must match the entire input term, rather than a portion of it. So, if you had specified "text" as a StringField, you would need to add a .* to the end to consume the rest of the input.
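The whole-term versus substring distinction can be reproduced with Python's `re.fullmatch()` (like `Matcher.matches()`) and `re.search()` (like `Matcher.find()`). This is only an analogy for how RegexpQuery compares each indexed term against the pattern, as described above. The two term lists below are simplified stand-ins: one term per word for a tokenizing analyzer, and the whole value as one term for StringField/KeywordAnalyzer. Lucene's regex dialect is not identical to Python's, although `.+` and `.*` behave the same here.

```python
import re

TEXT = "Oracle unveils better than expected quarterly results."

# Tokenized field: roughly one term per word (simplified analyzer stand-in).
TOKEN_TERMS = re.findall(r"\w+", TEXT.lower())
# Untokenized field (StringField/KeywordAnalyzer-like): the whole value is one term.
KEYWORD_TERMS = [TEXT]

ORIGINAL = r".+better.+quarterly.+results"     # the question's regex without the ^
WITH_TAIL = r".+better.+quarterly.+results.*"  # trailing .* consumes the final "."


def any_term_fully_matches(pattern, terms):
    """Whole-term matching, in the spirit of Matcher.matches()."""
    return any(re.fullmatch(pattern, t) for t in terms)


if __name__ == "__main__":
    print(any_term_fully_matches(ORIGINAL, TOKEN_TERMS))     # no single word matches
    print(any_term_fully_matches(ORIGINAL, KEYWORD_TERMS))   # trailing "." is left over
    print(any_term_fully_matches(WITH_TAIL, KEYWORD_TERMS))  # whole term consumed
    print(re.search(ORIGINAL, TEXT) is not None)             # find()-style would succeed
```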
On a similar note, I'm not sure whether it supports using the character "^" as the start of input, since it is redundant in that case. I don't see it specified in Lucene's RegExp syntax, but I have seen references to its use, so I'm not sure whether it would be accepted.
To summarize, if you used a StringField or KeywordAnalyzer to index the entire field as a single Term, a RegexpQuery could work like:
Query regexq = new RegexpQuery(new Term("text", ".+better.+quarterly.+results.*"));
With the leading wildcard in your regexp, though, you could expect very poor performance from it (See the warning at the top of the RegexpQuery documentation).

Using Regex to validate the number of words in a text area

I am attempting to write an MVC model validation that verifies that there are 10 or more words in a string. The string is being populated correctly, so I did not include the HTML. I have done a fair bit of research, and it seems that something along the lines of what I have tried should work, but for whatever reason mine always fails. Any ideas as to what I am doing wrong here?
(using System.ComponentModel.DataAnnotations, in an MVC 4 VB.NET environment)
I have tried ([\w]+){10,}, ((\\S+)\s?){10,}, [\b]{20,}, [\w+\w?]{10,}, (\b(\w+?)\b){10,}, ([\w]+?\s){10}, ([\w]+?\s){9}[\w], ([\S]+\s){9}[\S], ([a-zA-Z0-9,.'":;$-]+\s+){10,} and several more variations on the same basic idea.
<Required(ErrorMessage:="The Description of Operations field is required"), RegularExpression("([\w]+){20,}", ErrorMessage:="ERROZ")>
Public Property DescOfOperations As String = String.Empty
Correct solution: ([\S]+\s+){9}[\S\s]+
Edit: I moved the accepted version to the top and removed the unused versions. If the whole sequence needs to match, use something like this (which also accounts for double spaces):
([\S]+\s+){9}[\S\s]+
Or:
([\w]+?\s+){9}[\w]+
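The whole sequence does matter here. As far as I can tell, DataAnnotations' RegularExpressionAttribute treats a value as valid only when the pattern matches the entire string, not just a substring of it. Python's `re.fullmatch()` has the same whole-string semantics, so the accepted pattern can be sanity-checked like this. Python's regex flavor stands in for .NET's; `\S` and `\s` behave alike for plain ASCII text. The helper name `has_ten_words` is hypothetical.

```python
import re

# Accepted pattern: nine "word + whitespace" groups, then at least one more character.
TEN_WORDS = re.compile(r"([\S]+\s+){9}[\S\s]+")


def has_ten_words(text):
    """Whole-string match, mirroring how the validation attribute judges the value."""
    return TEN_WORDS.fullmatch(text) is not None


if __name__ == "__main__":
    print(has_ten_words("one two three four five six seven eight nine ten"))  # → True
    print(has_ten_words("one two three four five six seven eight nine"))      # → False
```

Because each repetition needs at least one whitespace character after a word, nine words contain too few whitespace runs to satisfy `{9}` and still leave a tenth word for `[\S\s]+`.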
Give this a try:
([a-zA-Z0-9,.'":;$-]+\s){10,}

Using Regex to find multiple values (groups) in webpage

I'm trying to retrieve 2 fields from a web page. I'm using the following two patterns:
string paternExperience = @"Experience\s\:\s\<strong\>(?<Level>.*?)\<";
string paternAccount = @"account_value\""\>(?<Account>.*?)\<";
and the following method to retrieve the values, which works:
Regex.Matches(pageBody, patern..., RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase | RegexOptions.Compiled |RegexOptions.Multiline);
I was trying to avoid calling the method twice to retrieve two values, so I'm trying to create a pattern that gets Level and Account in a single call to the Matches method. I thought something like the one below should work:
string paternBoth = @"Experience\s\:\s\<strong\>(?<Level>.*?)\< .* account_value\""\>(?<Account>.*?)\<";
But it doesn't work, I think because the two values are on different lines in the HTML. So I added RegexOptions.Singleline, and now the method times out (the page is around 20 KB).
Can you help me please with some advice? Thank you!
You could try putting those two values in one variable, then just check that variable with your regex.
I know it doesn't really make much sense, but I try out things like that and sometimes it actually works.
I've never had this scenario, but I did have similar problems in the past.
It might not be the best way, but sometimes making it work is more important than making it look pretty. ;)
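A different technique from the single spanning pattern in the question avoids crossing from the first value to the second, which is what requires Singleline and a long `.*`. Instead, use an alternation that matches either field and collect the named groups from each match while iterating. The sketch below is in Python. The HTML snippet is made up, and capturing with `[^<]*` instead of `.*?` keeps each capture from running past the next tag. The alternation should carry over to .NET's `Regex.Matches`. Note that Python spells named groups `(?P<Level>...)`, whereas .NET uses `(?<Level>...)`.

```python
import re

# Either piece of the page can match; each match fills only one of the two groups.
BOTH = re.compile(
    r'Experience\s:\s<strong>(?P<Level>[^<]*)'
    r'|account_value">(?P<Account>[^<]*)',
    re.IGNORECASE,
)


def extract_fields(page_body):
    """Return a dict with 'Level' and/or 'Account' found in one pass over the page."""
    found = {}
    for m in BOTH.finditer(page_body):
        for name in ("Level", "Account"):
            value = m.group(name)  # None when this branch of the alternation didn't match
            if value is not None and name not in found:
                found[name] = value
    return found


SAMPLE = (
    '<div>Experience : <strong>42</strong></div>\n'
    '<p>other markup</p>\n'
    '<span class="account_value">1,234</span>'
)

if __name__ == "__main__":
    print(extract_fields(SAMPLE))  # → {'Level': '42', 'Account': '1,234'}
```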