Partial text search with PostgreSQL & Django

I tried to implement partial text search with PostgreSQL and Django, using the following query:
Entry.objects.filter(headline__contains="search text")
This only returns records that contain the exact phrase. For example, checking the record "welcome to the new world" with __contains="welcome world" returns zero records.
How can I implement this partial text search with PostgreSQL 8.4 and Django?

If you want this kind of partial search, you can use the startswith field lookup: Entry.objects.filter(headline__startswith="search text"). See more info at https://docs.djangoproject.com/en/dev/ref/models/querysets/#startswith.
This lookup creates a LIKE query ("SELECT ... WHERE headline LIKE 'search text%'"), so if you're looking for a full-text alternative you can check out PostgreSQL's built-in full-text search (the former Tsearch2 extension, part of the core server since 8.3) or other options such as Xapian, Solr, Sphinx, etc.
Each of these engines has Django apps that make it easier to integrate: Djapian for Xapian, or Haystack for several backends in one app.
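To see why the original query finds nothing: headline__contains="welcome world" becomes roughly headline LIKE '%welcome world%', so the two words must appear next to each other. The plain-Python model below contrasts that with matching each word separately. The per-word approach (chaining one headline__icontains filter per word, shown only in a comment) is an assumption added here rather than part of the answer above, and the function names are hypothetical.

```python
def like_contains(value: str, text: str) -> bool:
    """Model of filter(headline__contains=text), i.e. LIKE '%text%':
    the whole phrase must appear verbatim."""
    return text in value


def all_words_contained(value: str, text: str) -> bool:
    """Model of chaining one case-insensitive filter per word, e.g.
        qs = Entry.objects.all()
        for word in text.split():
            qs = qs.filter(headline__icontains=word)
    Every word must appear somewhere in the value, in any order."""
    value = value.lower()
    return all(word.lower() in value for word in text.split())


headline = "welcome to the new world"
print(like_contains(headline, "welcome world"))        # False
print(all_words_contained(headline, "welcome world"))  # True
```

Chained filter() calls on the same field are combined with AND, so each word narrows the result set; on large tables the full-text options mentioned above will scale better than repeated LIKE scans.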

Related

fuzzy search in django postgresql without using Elasticsearch

I am trying to add a fuzzy search function to a Django project without using Elasticsearch.
1. I am using Postgres, so I first tried levenshtein (from the fuzzystrmatch extension), but it did not work for my purpose.
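from django.db.models import F, Func, IntegerField, Value


class Levenshtein(Func):
    # levenshtein() is provided by PostgreSQL's fuzzystrmatch extension
    function = "levenshtein"
    output_field = IntegerField()

    def __init__(self, expression, search_term, **extras):
        # Pass the search term as a bound parameter (Value) instead of
        # interpolating it into the SQL template, which allowed SQL injection
        super().__init__(expression, Value(search_term), **extras)


items = Product.objects.annotate(
    lev_dist=Levenshtein(F("sort_name"), searchterm)
).filter(lev_dist__lte=2)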
Searching "glyoxl" did not pick up "4-Methylphenylglyoxal hydrate", because levenshtein compares my search term "glyoxl" with the entire value, not with the word "Methylphenylglyoxal", so the distance is far above 2.
2. trigram_similar gave weird results and was slow
items = Product.objects.filter(sort_name__trigram_similar=searchterm)
"phnylglyoxal" did not pick up "4-Methylphenylglyoxal hydrate", but
picked up some other similar terms: "4-Hydroxyphenylglyoxal hydrate",
"2,4,6-Trimethylphenylglyoxal hydrate"
"glyoxl" did not pick up any of the above terms
3. The Python package fuzzywuzzy seems to solve my problem, but I was not able to use it inside a query.
from fuzzywuzzy import fuzz

ratio = fuzz.partial_ratio('glyoxl', '4-Methylphenylglyoxal hydrate')
# ratio = 83
I tried to use fuzz.partial_ratio inside annotate(), but it did not work:
items = Product.objects.annotate(ratio=fuzz.partial_ratio(searchterm, 'full_name')).filter(
ratio__gte=75
)
Here is the error message:
QuerySet.annotate() received non-expression(s): 12.
According to this Stack Overflow post (1), annotate() does not accept regular Python functions. The post also mentions that since Django 2.1 one can subclass Func to create a custom function, but it seems that Func can only wrap database functions such as levenshtein.
Is there any way to solve these problems? Thanks!
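Two points the attempts above run into, sketched in plain Python under stated assumptions. First, levenshtein() compares the whole value with the search term, so the distance is at least the difference in length (23 for "glyoxl" against "4-Methylphenylglyoxal hydrate"), far above the cutoff of 2. Second, because annotate() only accepts SQL expressions, one workaround for small tables is to fetch the candidate names (for example with Product.objects.values_list("sort_name", flat=True)) and score them in Python. partial_score and fuzzy_filter are hypothetical helpers in the spirit of fuzz.partial_ratio (best match against any window of the same length), built on the standard library's difflib instead of fuzzywuzzy, so their numbers will not match fuzzywuzzy's exactly.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Edit distance between the WHOLE strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def partial_score(needle: str, haystack: str) -> float:
    """Best SequenceMatcher ratio of needle against any window of haystack
    with the same length (case-insensitive); 1.0 means an exact substring."""
    needle, haystack = needle.lower(), haystack.lower()
    if len(needle) >= len(haystack):
        return SequenceMatcher(None, needle, haystack).ratio()
    width = len(needle)
    return max(
        SequenceMatcher(None, needle, haystack[i:i + width]).ratio()
        for i in range(len(haystack) - width + 1)
    )


def fuzzy_filter(names, term, threshold=0.75):
    """Keep the names whose partial_score against term reaches threshold."""
    return [name for name in names if partial_score(term, name) >= threshold]


names = [
    "4-Methylphenylglyoxal hydrate",
    "4-Hydroxyphenylglyoxal hydrate",
    "Benzaldehyde",
]
print(levenshtein("glyoxl", names[0]))  # 23: far above the lev_dist__lte=2 cutoff
print(fuzzy_filter(names, "glyoxl"))
# ['4-Methylphenylglyoxal hydrate', '4-Hydroxyphenylglyoxal hydrate']
```

This scans every row on every search, so it only suits small tables. On Django 3.0 or later with the pg_trgm extension, the trigram_word_similar lookup and TrigramWordSimilarity in django.contrib.postgres, which compare the term with the most similar continuous part of the value, may be worth trying inside the database instead.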

How to run a combination of query and filter in elasticsearch?

I am experimenting with Elasticsearch in a dummy Django project, trying to build a search page with django-elasticsearch-dsl. The user may provide a title, a summary and a score to search for. The search should match all the information the user gives, but any field the user leaves empty should be skipped.
I am running the following code to search for all the values.
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

client = Elasticsearch()
s = Search().using(client).query("match", title=title_value)\
    .query("match", summary=summary_value)\
    .filter('range', score={'gt': scorefrom_value, 'lte': scoreto_value})
When I have a value for every field, the search works correctly. But if, for example, I do not provide summary_value, I expect the search to still match on the remaining values; instead, it returns nothing.
Is there some value that the fields should have by default in case the user does not provide a value? Or how should I approach this?
UPDATE 1
I tried the following, but it returns the same results every time, no matter what input I give:
s = Search(using=client)
if title:
    s.query("match", title=title)
if summary:
    s.query("match", summary=summary)
response = s.execute()
UPDATE 2
I can print the query using to_dict().
If it is written like the following, s is empty:
s = Search(using=client)
s.query("match", title=title)
If it is written like this:
s = Search(using=client).query("match", title=title)
then it works properly, but if I add s.query("match", summary=summary) it still does nothing.
You need to assign the result back to s, because query() returns a new Search object instead of modifying s in place:
if title:
    s = s.query("match", title=title)
if summary:
    s = s.query("match", summary=summary)
I can see in the Search example that django-elasticsearch-dsl lets you apply aggregations after a search so...
How about "staging" your search? I can think of the following:
# First, declare the Search object
s = Search(using=client, index="my-index")
# If parameter1 exists, add a filter (filter() returns a new Search, so reassign)
if parameter1:
    s = s.filter("term", field1=parameter1)
# If parameter2 exists, add a query (again reassigning the result)
if parameter2:
    s = s.query("match", field=parameter2)
Do the same for all your parameters (with the needed method for each) so only the ones that exist will appear in your query. At the end just run
response = s.execute()
and everything should work as you want :D
I would recommend using the Python Elasticsearch client. It lets you manage many things related to your cluster: set mappings, run health checks, run queries, etc.
In its method .search(), the body parameter is where you send your query as you normally would run it ({"query"...}). Check the Usage example.
Now, for your particular case, you can keep a template of your query in a variable. Start with an "empty query" that contains only a filter, like this:
query = {
    "query": {
        "bool": {
            "filter": []
        }
    }
}
From here, you can build your query from the parameters you have:
# This looks a little messy, but it's useful ;)
# If parameter1 is not None or empty
# (change the if statement for your particular case)
if parameter1:
    query["query"]["bool"]["filter"].append({"term": {"field1": parameter1}})
Do the same for all your parameters (use "term" for exact string values and "range" for ranges, as usual), then send the query in the body parameter of .search() and it should work as you want.
Hope this is helpful! :D
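Following the last answer's approach of building the request body only from the parameters that were actually supplied, here is a sketch under these assumptions: the field names title, summary and score come from the question; the free-text fields go into bool.must as match clauses (scored) and the score range into bool.filter (not scored); build_query is a hypothetical name. When nothing is supplied it falls back to match_all instead of sending an empty bool query.

```python
from typing import Any, Dict, Optional


def build_query(title: Optional[str] = None,
                summary: Optional[str] = None,
                score_from: Optional[float] = None,
                score_to: Optional[float] = None) -> Dict[str, Any]:
    """Return an Elasticsearch request body containing only the clauses
    for the parameters the user actually filled in."""
    must = []     # full-text clauses that contribute to relevance
    filters = []  # yes/no clauses that do not affect scoring
    if title:
        must.append({"match": {"title": title}})
    if summary:
        must.append({"match": {"summary": summary}})
    score_range = {}
    if score_from is not None:
        score_range["gt"] = score_from
    if score_to is not None:
        score_range["lte"] = score_to
    if score_range:
        filters.append({"range": {"score": score_range}})
    if not must and not filters:
        return {"query": {"match_all": {}}}
    bool_query: Dict[str, Any] = {}
    if must:
        bool_query["must"] = must
    if filters:
        bool_query["filter"] = filters
    return {"query": {"bool": bool_query}}


print(build_query(title="django", score_to=5))
```

The resulting dict is what the answer above suggests passing as the body of the client's search call; the index name is up to you.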

What's the best approach for storing permalinks in the database with ZF2 and Doctrine 2?

I'm building a website with ZF2 and Doctrine. How can I save permalinks to my database? What's the best approach? Should I save the permalink in my entities?
// Product Doctrine entity
public function setProdName($prodName){
$this->prodName = $prodName;
//is this right way?
$this->setSeoLink = urlhelper->url($prodName);
}
I suggest creating your URLs with a hash, like
/this-is-the-product-title-xyz123.html
where xyz123 is the hash of your product.
Doing this gives you these advantages:
1) You can change your product title at any time, and the product can still be reached even if search engines or links on other websites use the old one. For instance, both
/this-is-my-product-xyz123.html
and
/this-is-my-wonderful-product-xyz123.html
will work.
2) You are not exposing the real ID to people.
To accomplish this I used the HashId module, creating a factory service that returns an already configured object; you can then encrypt/decrypt with this code:
$hashids = $this->getServiceLocator()->get('my.service.hashid');
// decrypt having hash
$id = $hashids->decrypt($hash);
// encrypt having id
$hash = $hashids->encrypt($id);
The .html suffix is just as example, you can create your url as you wish.
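The URL scheme above can be sketched in Python as follows. Only the token before .html is used to find the product, so the slug in front of it can change freely. Assumptions: plain base-36 encoding stands in for Hashids here, and unlike Hashids (which uses a salt) it does not hide the real ID, since anyone can decode it; for advantage 2 you would still want a salted encoder. All function names are hypothetical.

```python
import re

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def encode_id(n: int) -> str:
    """Base-36 encode a non-negative id (stand-in for a Hashids encoder)."""
    if n == 0:
        return "0"
    digits = []
    while n:
        n, r = divmod(n, 36)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))


def decode_id(token: str) -> int:
    return int(token, 36)


def slugify(title: str) -> str:
    """Lowercase the title and turn every run of other characters into '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def product_url(title: str, product_id: int) -> str:
    return f"/{slugify(title)}-{encode_id(product_id)}.html"


def id_from_url(url: str) -> int:
    # Only the token after the last hyphen matters; the slug is ignored.
    match = re.search(r"-([0-9a-z]+)\.html$", url)
    if match is None:
        raise ValueError(f"not a product URL: {url}")
    return decode_id(match.group(1))


print(product_url("This is my product", 12345))          # /this-is-my-product-9ix.html
print(id_from_url("/this-is-my-wonderful-product-9ix.html"))  # 12345
```

Because the lookup ignores the slug, an old link keeps resolving after a rename; a common refinement is to redirect to the current canonical URL when the slug in the request differs.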

How to match strings in MongoDb and ignore any whitespace

Is it possible to ignore all whitespace using regex in MongoDB queries?
My Node.js program uses Cheerio to pull data from a number of websites, parses and then stores the data in MongoDB. My database has a People collection that keys on the string field Name.
The problem occurs because one website (site-A) shows the name HTML text as John&nbsp;Smith, whereas another website (site-B) shows the name as John Smith. My program has two scripts, one that scrapes site-A and another that scrapes site-B; both use the following to scrape the Name data:
var $ = cheerio.load(htmlrow);
var personobj = { name: $('td.person a').text().trim() }
Each script then uses the following MongoDB command (with the native driver) to upsert the scraped data, keyed on the Name field. However, this results in two records in the People collection:
db.collection('people').update(
{ Name: personobj.name },
{ $set: { LastScan: new Date() }},
{ upsert: true },
function(){} );
I then tried using the regex "extended" x option to query in MongoDB, but it's not working. In fact, I tested the x option with the find operator in Robomongo, and it returns zero records. I also noticed that when testing find in Robomongo, if I simply type Name: "John Smith", it only returns the site-B record, the one without the &nbsp; whitespace, even though the name strings appear identical when I view the details of both records. (I suppose the difference is introduced somewhere in all the encoding and decoding involved in scraping, parsing, storing and retrieving, but I'm not sure where or why.)
Is it possible to ignore all whitespace when querying MongoDb using regex?
Or, is it easier to handle this in my javascript parse line, to somehow replace and 'standardize' all possible whitespace characters? (Any recommended library to do so?)
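A minimal normalization sketch in Python, assuming the fix is applied at parse time, before the name is used as the upsert key. &nbsp; decodes to U+00A0 NO-BREAK SPACE, which would explain why the two names look identical but compare unequal. Python's str.split() with no arguments treats U+00A0 as whitespace, so rejoining the pieces with single spaces standardizes both variants. In the Node.js scripts the analogous step would be something like .replace(/\s+/g, ' ') on the trimmed text, since JavaScript's \s also matches U+00A0. normalize_name is a hypothetical name.

```python
def normalize_name(raw: str) -> str:
    """Collapse every run of whitespace (including U+00A0 NO-BREAK SPACE)
    into a single ASCII space and strip the ends."""
    return " ".join(raw.split())


site_a = "John\u00a0Smith"  # name as scraped from site-A (&nbsp; decoded)
site_b = "John Smith"       # name as scraped from site-B
print(site_a == site_b)                                  # False
print(normalize_name(site_a) == normalize_name(site_b))  # True
```

Records already stored with the non-breaking space would still need a one-off cleanup so that they merge with the normalized keys.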

SOLR Java API - Adding multiple conditions

I am using the Solr API to index records and to implement search. I use the following code to search:
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("q", "country_id:("+id+")");
I would like to add another parameter, such as state_id, and combine the conditions with logical AND/OR operations so that the matching records are retrieved. I searched on Google but could not find a way to combine the conditions. Is this possible through the Solr API, or am I doing something wrong?
You can build your query as follows:
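import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.params.MoreLikeThisParams;

String qr = "cstm_text:" + "devotional";
SolrQuery qry = new SolrQuery(qr);
qry.setIncludeScore(true);
qry.setShowDebugInfo(true);
qry.setRows(1);
qry.setFacet(true);
// Extra request parameters: spellcheck request handler and MoreLikeThis
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("qt", "/spell");
params.set("spellcheck", "on");
params.set(MoreLikeThisParams.MLT, true);
qry.add(params);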
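Neither snippet above shows the AND/OR part. With the standard Lucene query parser, conditions can be combined directly in the q string, for example country_id:(1) AND state_id:(5); that string is what would go into params.set("q", ...). Alternatively, each restriction can be sent as its own fq (filter query) parameter, which Solr intersects with the main query. A Python sketch of building the q string, assuming the values are plain IDs that need no escaping; combine_conditions is a hypothetical helper.

```python
from typing import Dict


def combine_conditions(conditions: Dict[str, object], operator: str = "AND") -> str:
    """Join field:(value) clauses with a Lucene boolean operator.
    Values are assumed to be plain ids that need no query-syntax escaping."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be AND or OR")
    clauses = [f"{field}:({value})" for field, value in conditions.items()]
    return f" {operator} ".join(clauses)


print(combine_conditions({"country_id": 1, "state_id": 5}))        # country_id:(1) AND state_id:(5)
print(combine_conditions({"country_id": 1, "state_id": 5}, "OR"))  # country_id:(1) OR state_id:(5)
```

Mixed logic can be grouped with parentheses, e.g. (country_id:(1) AND state_id:(5)) OR country_id:(2).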