Elastic search querying

Elastic search querying - django

I'm having some issues with my Elasticsearch querying. I have the following fields: patientid, patientfirstname, patientmidname, and patientlastname. I want to be able to enter a value for any one of those 4 fields and get matching results returned. So far my query works only if I use a patientid. If I type something like harry (first name), or a middle or last name, it returns nothing. Individual term queries work for each of them.
q = Q({"bool": { "should": [ {"term":{"patientid":text}}, {"wildcard":{"patientlastname":"*"+text+"*"}}, {"wildcard":{"patientfirstname":"*"+text+"*"}}, {"wildcard":{"patientmidname":"*"+text+"*"}} ]}})
r = Search().query(q)[0:10000]

The matching depends on your analyzers. What I would recommend is to just use:
Search().query('multi_match', query=text, fields=['patientid', 'patientlastname', 'patientfirstname', 'patientmidname'])
This will query across those fields (you can read about the different types of multi_match query in [0]).
You just need to make sure that all the patient name fields are properly analyzed (see [1] for details).
0 - https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html
1 - https://www.elastic.co/guide/en/elasticsearch/reference/6.4/analysis.html#_index_time_analysis
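For reference, the multi_match call above corresponds to a request body along these lines. This is only a sketch built from plain dictionaries: build_patient_query is a made-up helper name, and the field list assumes the four fields named in the question.

```python
import json

# Field names taken from the question.
PATIENT_FIELDS = ["patientid", "patientlastname", "patientfirstname", "patientmidname"]


def build_patient_query(text, fields=PATIENT_FIELDS):
    """Return a search request body with a multi_match query over `fields`.

    build_patient_query is a hypothetical helper, not part of any library.
    """
    return {"query": {"multi_match": {"query": text, "fields": list(fields)}}}


body = build_patient_query("harry")
print(json.dumps(body, indent=2))
```

Printing the body this way makes it easy to compare against what the client library actually sends.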

Related

django split data and apply search istartswith = query

I have a project where, when searching, I need to split the data (not the search query) into words and apply the search to each word.
For example:
my query is 'bot' (while typing 'bottle'),
but if I use meta_keywords__icontains=query, the filter also returns results containing 'robot'.
Here meta_keywords are the keywords that can be used for searching.
If I use meta_keywords__istartswith instead, I can't find the record when meta_keywords is 'water bottle'. Is there any approach I can use in this case?
All I need is to search every word of the data using just istartswith.
I could simply create a model for meta_keywords and populate it by splitting the current data and saving each word as a separate row. I know that might be the best way, but I am looking for other ways to achieve it.

You can search the name field for each word that starts with the value in the variable query.
import re
from django.db.models import Q
instances = Model.objects.filter(Q(name__iregex=r'[[:<:]]' + re.escape(query)))
E.g., 'Hello world' can be found using the query 'hello' or 'world'. It does not do a substring (icontains) match.
Note: it works only in Python 3.
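The same word-start rule can be checked in plain Python, which may help when testing inputs before running them against the database. This sketch uses the \b word boundary from Python's re module rather than the [[:<:]] marker, which is interpreted by the database's regex engine; starts_a_word is a made-up helper name.

```python
import re


def starts_a_word(keywords, query):
    """Return True if any word in `keywords` starts with `query` (case-insensitive)."""
    # \b is a word boundary in Python's re module; re.escape keeps
    # characters such as "." or "+" in the query from acting as regex syntax.
    pattern = re.compile(r"\b" + re.escape(query), re.IGNORECASE)
    return pattern.search(keywords) is not None


print(starts_a_word("water bottle", "bot"))  # True
print(starts_a_word("robot", "bot"))         # False
```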

Filter multiple Django model fields with variable number of arguments

I'm implementing search functionality with an option of looking for a record by matching multiple tables and multiple fields in these tables.
Say I want to find a Customer by his/her first or last name, or by ID of placed Order which is stored in different model than Customer.
The easy scenario, which I have already implemented, is that a user types only a single word into the search field. I then use Django Q to query the Order model using a direct field reference or a related_query_name reference, like:
result = Order.objects.filter(
    Q(customer__first_name__icontains=user_input)
    | Q(customer__last_name__icontains=user_input)
    | Q(order_id__icontains=user_input)
).distinct()
Piece of cake, no problems at all.
But what if the user wants to narrow the search and types multiple words into the search field?
Example: the user has typed Bruce and got a whole lot of records back as a result of the search.
Now he/she wants to be more specific and adds the customer's last name to the search. So the search becomes Bruce Wayne; after splitting this into separate parts I have Bruce and Wayne. Obviously I don't want to search the Order model, because order_id is a single-word value that is sufficient on its own to find the customer, so in this case I drop it from the query altogether.
Now I'm trying to match the customer by both first AND last name. I also want to handle the scenario where the order of the provided data is random, so that Bruce Wayne and Wayne Bruce are both handled properly: I still have the customer's full name, but the positions of the first and last name aren't fixed.
And this is the question I'm looking for an answer to: how do I build a query that searches multiple fields of a model without knowing which of the search words belongs to which field?
I'm guessing the solution is trivial and there's surely an elegant way to create such a dynamic query, but I can't think of one.

You can dynamically OR a variable number of Q objects together to achieve your desired search. The approach below makes it trivial to add or remove fields you want to include in the search.
from functools import reduce
from operator import or_
from django.db.models import Q

fields = (
    'customer__first_name__icontains',
    'customer__last_name__icontains',
    'order_id__icontains'
)
parts = []
terms = ["Bruce", "Wayne"]  # produce this from your search input field
for term in terms:
    for field in fields:
        # one Q object per (field, term) pair
        parts.append(Q(**{field: term}))
query = reduce(or_, parts)
result = Order.objects.filter(query).distinct()
The use of reduce folds the list of Q objects into a single one by ORing them together. Credit for that part of the answer goes to this answer.
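To see what this OR fold returns without a database, here is a small in-memory sketch. The rows and the matching_ids helper are made up for illustration; sets of IDs stand in for Q objects, and the case-insensitive substring test stands in for icontains.

```python
from functools import reduce
from operator import or_

# Made-up sample rows standing in for Order records joined with their customer.
orders = [
    {"order_id": "A100", "first_name": "Bruce", "last_name": "Wayne"},
    {"order_id": "A200", "first_name": "Clark", "last_name": "Kent"},
    {"order_id": "A300", "first_name": "Wayne", "last_name": "Rooney"},
]
fields = ("first_name", "last_name", "order_id")
terms = ["Bruce", "Wayne"]


def matching_ids(field, term):
    """IDs of rows whose `field` contains `term`, ignoring case (like icontains)."""
    return {row["order_id"] for row in orders if term.lower() in row[field].lower()}


# One set per (term, field) pair, folded into a single set with `|`,
# the same way reduce(or_, parts) folds the Q objects.
parts = [matching_ids(field, term) for term in terms for field in fields]
matched = reduce(or_, parts)
print(sorted(matched))  # ['A100', 'A300']
```

Note that ORing every term against every field widens the result as terms are added: the row for Wayne Rooney matches even though the input was Bruce Wayne.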

The solution I came up with is rather complex, but it works exactly the way I wanted to handle this problem:
from functools import reduce
from operator import and_, or_
from django.db.models import Q

search_keys = user_input.split()
if len(search_keys) > 1:
    first_name_set = set()
    last_name_set = set()
    for key in search_keys:
        first_name_set.add(Q(customer__first_name__icontains=key))
        last_name_set.add(Q(customer__last_name__icontains=key))
    # (first name matches any key) AND (last name matches any key)
    query = reduce(and_, [reduce(or_, first_name_set), reduce(or_, last_name_set)])
else:
    search_fields = [
        Q(customer__first_name__icontains=user_input),
        Q(customer__last_name__icontains=user_input),
        Q(order_id__icontains=user_input),
    ]
    query = reduce(or_, search_fields)
result = Order.objects.filter(query).distinct()
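A related formulation, which is not the one used above, is to require that every search word matches at least one name field. The sketch below illustrates that rule on made-up in-memory rows; matches_all_terms is a hypothetical helper, and with Django the same shape could presumably be built as reduce(and_, ...) over one reduce(or_, ...) per word.

```python
def matches_all_terms(row, terms, fields):
    """True if every term occurs (case-insensitively) in at least one field."""
    return all(
        any(term.lower() in row[field].lower() for field in fields)
        for term in terms
    )


# Made-up sample customers.
customers = [
    {"first_name": "Bruce", "last_name": "Wayne"},
    {"first_name": "Bruce", "last_name": "Banner"},
    {"first_name": "Wayne", "last_name": "Rooney"},
]
name_fields = ("first_name", "last_name")

# Word order does not matter: both inputs select the same customer.
for user_input in ("Bruce Wayne", "Wayne Bruce"):
    hits = [c for c in customers if matches_all_terms(c, user_input.split(), name_fields)]
    print(user_input, "->", hits)
```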

Django postgress - dynamic SearchQuery object creation

I have an app that lets the user search a database of +/- 100,000 documents for keywords / sentences.
I am using Django 1.11 and the Postgres full text search features described in the documentation.
However, I am running into the following problem and I was wondering if someone knows a solution:
I want to create a SearchQuery object for each word in the supplied query, like so:
query typed in by the user in the input field: ['term1' , 'term2', 'term3']
query = SearchQuery('term1') | SearchQuery('term2') | SearchQuery('term3')
vector = SearchVector('text')
Document.objects.annotate(rank=SearchRank(vector, query)).order_by('-rank').annotate(similarity=TrigramSimilarity(vector, query)).filter(similarity__gt=0.3).order_by('-similarity')
The problem is that I used 3 terms for my query in the example, but I want that number to be dynamic. A user could also supply 1, or 10 terms, but I do not know how to add the relevant code to the query assignment.
I briefly thought about having the program write something like this to an empty document:
for query in terms:
    file.write(' | SearchQuery(%s)' % query)
But having a python program writing python code seems like a very convoluted solution. Does anyone know a better way to achieve this?

I've never used it, but to build a dynamic query you can just loop and combine.
compound_statement = SearchQuery(list_of_words[0])
for term in list_of_words[1:]:
    compound_statement = compound_statement | SearchQuery(term)
But the documentation tells us that
By default, all the words the user provides are passed through the stemming algorithms, and then it looks for matches for all of the resulting terms.
Are you sure you need this?
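The loop above is a left fold, so it can also be written with functools.reduce. The sketch below shows the pattern without needing a database: Term is a made-up stand-in for any object that supports the | operator (it is not Django's SearchQuery), and combine is a hypothetical helper name.

```python
from functools import reduce
from operator import or_


class Term:
    """Hypothetical stand-in for an object that supports `|`, such as SearchQuery."""

    def __init__(self, expr):
        self.expr = expr

    def __or__(self, other):
        # Combining two terms yields a new term, so the result can be combined again.
        return Term(f"{self.expr} | {other.expr}")


def combine(words):
    """OR together one Term per word; works for one word or many."""
    if not words:
        raise ValueError("at least one search term is required")
    return reduce(or_, (Term(word) for word in words))


print(combine(["term1", "term2", "term3"]).expr)  # term1 | term2 | term3
print(combine(["term1"]).expr)                    # term1
```

The empty-input check matters because reduce raises a TypeError on an empty sequence when no initial value is given.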

Python Neo4j retrieve count in a single query for multiple match queries

I am trying to get the number of links a user has, using the query shown below with neo4j-driver for python.
with driver.session() as session:
    query = 'MATCH (n:User {userId: "1234"})-[r]-() RETURN COUNT(r)'
    result = session.run(query)
Problem is, this takes a lot of time since I have a lot of user ids. I am a noobie with Neo4j. I was just wondering if there was a way to retrieve the count for multiple user ids with a single query. I am looking for something like:
'MATCH (n:User {userId: "1234", "1235", "1236", ...})-\
[r1, r2, r3...]-() RETURN COUNT(r1), COUNT(r2), COUNT(r3)...'
Thanks in advance.

I am assuming that you want the relationship count for all users:
MATCH (n:User)-[r]-() RETURN DISTINCT n.userId ,COUNT(r)
user1,10
user2,11
user3,10
Or, if you have a list of particular users, you can do it like this:
MATCH (n:User)-[r]-()
WHERE n.userId in ["1234", "1235", "1236"]
RETURN DISTINCT n.userId , COUNT(r)

You can get the degree of relationships from a node without paying the cost of actually expanding the relationships. This is done by getting the size of the pattern that includes only the starting node, the relationship type, and the relationship direction (though for this example, ignoring direction).
So in this case:
MATCH (n:User)
WHERE n.userId in ["1234", "1235", "1236"]
RETURN n.userId , size((n)-[r]-()) as degree
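From Python, the list of IDs can be passed as a query parameter instead of being pasted into the query text. The following is a sketch under assumptions: degree_query is a made-up helper, the label and property names (User, userId) come from the question, and the query follows the count(r) form from the first answer.

```python
def degree_query(user_ids):
    """Return a Cypher query and its parameters for counting relationships per user.

    degree_query is a hypothetical helper; the label and property names
    (User, userId) come from the question.
    """
    query = (
        "MATCH (n:User)-[r]-() "
        "WHERE n.userId IN $user_ids "
        "RETURN n.userId AS userId, count(r) AS degree"
    )
    return query, {"user_ids": list(user_ids)}


query, params = degree_query(["1234", "1235", "1236"])
print(query)
print(params)

# With an open session from the question's driver, this could be run as:
#     result = session.run(query, params)
#     degrees = {record["userId"]: record["degree"] for record in result}
```

With this MATCH pattern, users that have no relationships at all do not appear in the result.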

How do I use django's Q with django taggit?

I have a Result object that is tagged with "one" and "two". When I try to query for objects tagged "one" and "two", I get nothing back:
q = Result.objects.filter(Q(tags__name="one") & Q(tags__name="two"))
print len(q)
# prints zero, was expecting 1
Why does it not work with Q? How can I make it work?

The way django-taggit implements tagging is essentially through a ManyToMany relationship. In such cases there is a separate table in the database that holds these relations. It is usually called a "through" or intermediate model as it connects the two models. In the case of django-taggit this is called TaggedItem. So you have the Result model, which is your model, and you have two models, Tag and TaggedItem, provided by django-taggit.
When you make a query such as Result.objects.filter(Q(tags__name="one")) it translates to looking up rows in the Result table that have a corresponding row in the TaggedItem table that has a corresponding row in the Tag table that has the name="one".
Trying to match two tag names would translate to looking up rows in the Result table that have a corresponding row in the TaggedItem table that has a corresponding row in the Tag table that has both name="one" AND name="two". You obviously never have that, as you only have one value in a row: it's either "one" or "two".
These details are hidden away from you in the django-taggit implementation, but this is what happens whenever you have a ManyToMany relationship between objects.
To resolve this you can:
Option 1
Query tag after tag evaluating the results each time, as it is suggested in the answers from others. This might be okay for two tags, but will not be good when you need to look for objects that have 10 tags set on them. Here would be one way to do this that would result in two queries and get you the result:
# get the IDs of the Result objects tagged with "one"
query_1 = Result.objects.filter(tags__name="one").values('id')
# use this in a second query to filter the ID and look for the second tag.
results = Result.objects.filter(pk__in=query_1, tags__name="two")
You could achieve this with a single query so you only have one trip from the app to the database, which would look like this:
from django.db.models import Exists, OuterRef
# create django subquery - this is not evaluated, but used to construct the final query
subquery = Result.objects.filter(pk=OuterRef('pk'), tags__name="one").values('id')
# perform a combined query using a subquery against the database
results = Result.objects.filter(Exists(subquery), tags__name="two")
This would only make one trip to the database. (Note: filtering on sub-queries requires django 3.0).
But you are still limited to two tags. If you need to check for 10 tags or more, the above is not really workable...
Option 2
Query the relationship table directly instead, and aggregate the results in a way that gives you the object IDs.
from django.contrib.contenttypes.models import ContentType
from django.db.models import Count
from taggit.models import TaggedItem

# django-taggit uses Content Types so we need to pick up the content type from cache
result_content_type = ContentType.objects.get_for_model(Result)
tag_names = ["one", "two"]
tagged_results = (
    TaggedItem.objects.filter(tag__name__in=tag_names, content_type=result_content_type)
    .values('object_id')
    .annotate(occurence=Count('object_id'))
    .filter(occurence=len(tag_names))
    .values_list('object_id', flat=True)
)
TaggedItem is the hidden table in the django-taggit implementation that contains the relationships. The above will query that table and aggregate all the rows that refer either to the "one" or "two" tags, group the results by the ID of the objects and then pick those where the object ID had the number of tags you are looking for.
This is a single query and at the end gets you the IDs of all the objects that have been tagged with both tags. It is also the exact same query regardless if you need 2 tags or 200.
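The difference between the two query shapes can be reproduced with nothing but sqlite3. This is a simplified sketch: the table and column names are made up, content types are left out, and it assumes each object carries a given tag at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE result (id INTEGER PRIMARY KEY);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tagged_item (object_id INTEGER, tag_id INTEGER);
    INSERT INTO result (id) VALUES (1), (2);
    INSERT INTO tag (id, name) VALUES (1, 'one'), (2, 'two');
    -- result 1 has both tags, result 2 only has "one"
    INSERT INTO tagged_item (object_id, tag_id) VALUES (1, 1), (1, 2), (2, 1);
""")

# A single join asks one tag row to be named "one" AND "two": never true.
single_join = conn.execute("""
    SELECT r.id FROM result r
    JOIN tagged_item ti ON ti.object_id = r.id
    JOIN tag t ON t.id = ti.tag_id
    WHERE t.name = 'one' AND t.name = 'two'
""").fetchall()

# Grouping the relationship rows and counting them finds objects with every tag.
tag_names = ["one", "two"]
placeholders = ", ".join("?" for _ in tag_names)
grouped = conn.execute(f"""
    SELECT ti.object_id FROM tagged_item ti
    JOIN tag t ON t.id = ti.tag_id
    WHERE t.name IN ({placeholders})
    GROUP BY ti.object_id
    HAVING COUNT(*) = ?
""", (*tag_names, len(tag_names))).fetchall()

print(single_join)  # []
print(grouped)      # [(1,)]
```

The second query stays the same shape however many names are in tag_names, which mirrors the point made above about 2 tags versus 200.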
Please review this and let me know if anything needs clarification.

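First of all, the first two of these express the same condition (the first is shown only for illustration: Python rejects a repeated keyword argument). The third, with chained filter() calls, is handled differently by Django for multi-valued relations such as tags:
Result.objects.filter(tags__name="one", tags__name="two")
Result.objects.filter(Q(tags__name="one") & Q(tags__name="two"))
Result.objects.filter(tags__name__in=["one"]).filter(tags__name__in=["two"])
I think the name field is a CharField, and no single tag record can be equal to "one" and "two" at the same time.
In Python code, the first two queries look like this (always false, which is why you are getting no results):
from random import choice
name = choice(["abtin", "shino"])
if name == "abtin" and name == "shino":
    print("never reached")
We use Q objects to implement OR or other complex queries.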

In the example that works, you do an AND on two Python objects (querysets). That gets applied to any record, not necessarily to the same record that has both one AND two as its tag.
P.S. Why do you use the in filter?

q = Result.objects.filter(tags__name__in=["one"]).filter(tags__name__in=["two"])
Add .distinct() to remove duplicates if you expect more than one unique object.
