I have 2 Doctrine tables that are linked with a ManyToMany relationship.
Table: entries
Table: tags
I would like to find the entries whose tags match every tag that I input.
Ex: An entry "foo" has the tags "1" and "2". If I search for entries with the tag "1", I find this entry; if I search for both "1" and "2", I find it again; but if I add the tag "3" to the search, the entry is no longer matched.
So far I have found some easy ways to implement this with an OR, but that doesn't give me the results I want, and I don't really know how to do this kind of search with Doctrine 2.
Normally I would use the relation table to do that, but I don't know if it's possible with Doctrine.
Not really sure I understood, but see if this works:
// class EntryRepository
public function yourFunction($tags)
{
    return $this->createQueryBuilder("o")
        ->innerJoin("o.Tags", "t", "WITH", "t.name IN (:tags)")
        ->setParameter("tags", $tags)
        ...
}
This will return entries that have at least one of the tags in the $tags array. If that is what you want, you might also play with
->addSelect("COUNT(t.id) AS HIDDEN relevance")->groupBy("o.id")
->orderBy("relevance", "DESC")
That would return the results ordered by how many tags matched, best matches first, but I didn't test it.
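For the stricter case in the question (entries that carry every requested tag), the usual relational approach is to filter the join table on the requested tags, group by entry, and keep only the groups whose tag count equals the number of requested tags. Below is a minimal sketch of that SQL shape, run from Python against an in-memory SQLite database. The table and column names (entries, tags, entries_tags) are assumptions for illustration, not the schema Doctrine generates. With the query builder above, the same idea would presumably be a groupBy plus a having() on the tag count, but that is untested here.

```python
import sqlite3

def entries_with_all_tags(conn, tag_names):
    """Return the names of entries that carry every tag in tag_names."""
    tag_names = list(set(tag_names))  # duplicates would break the count check
    if not tag_names:
        return []
    placeholders = ",".join("?" for _ in tag_names)
    sql = f"""
        SELECT e.name
        FROM entries e
        JOIN entries_tags et ON et.entry_id = e.id
        JOIN tags t ON t.id = et.tag_id
        WHERE t.name IN ({placeholders})
        GROUP BY e.id
        HAVING COUNT(DISTINCT t.id) = ?
        ORDER BY e.name
    """
    return [row[0] for row in conn.execute(sql, (*tag_names, len(tag_names)))]

# Hypothetical sample data: "foo" has tags 1 and 2, "bar" has only tag 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entries (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE entries_tags (entry_id INTEGER, tag_id INTEGER);
    INSERT INTO entries VALUES (1, 'foo'), (2, 'bar');
    INSERT INTO tags VALUES (1, '1'), (2, '2'), (3, '3');
    INSERT INTO entries_tags VALUES (1, 1), (1, 2), (2, 1);
""")

print(entries_with_all_tags(conn, ["1"]))            # ['bar', 'foo']
print(entries_with_all_tags(conn, ["1", "2"]))       # ['foo']
print(entries_with_all_tags(conn, ["1", "2", "3"]))  # []
```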
I'm having some issues with my Elasticsearch querying. I have the following fields: patientid, patientfirstname, patientmidname, and patientlastname. I want to be able to enter a value for any one of those 4 fields and get matching results returned. So far my query only works if I search by patientid. If I type something like harry (first name), or a middle/last name, it returns nothing. Individual term querying works for each of them.
q = Q({"bool": { "should": [ {"term":{"patientid":text}}, {"wildcard":{"patientlastname":"*"+text+"*"}}, {"wildcard":{"patientfirstname":"*"+text+"*"}}, {"wildcard":{"patientmidname":"*"+text+"*"}} ]}})
r = Search().query(q)[0:10000]
The matching depends on your analyzers. What I would recommend is to just use:
Search().query('multi_match', query=text, fields=['patientid', 'patientlastname', 'patientfirstname', 'patientmidname'])
which will query across those fields (you can read about different types of multi_match query in [0]).
You just need to make sure that all the patient name fields are properly analyzed (see [1] for details)
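To see what the recommended call sends to Elasticsearch, here is a rough sketch that builds the equivalent request body as a plain Python dict, with no client library involved. The helper name build_patient_query and the default size are made up for illustration; the field names are the ones from the question.

```python
import json

PATIENT_FIELDS = ["patientid", "patientlastname", "patientfirstname", "patientmidname"]

def build_patient_query(text, fields=PATIENT_FIELDS, size=10):
    """Build a search request body that matches text against any of the fields."""
    return {
        "query": {"multi_match": {"query": text, "fields": list(fields)}},
        "size": size,
    }

body = build_patient_query("harry")
print(json.dumps(body, indent=2))
```

If patientid is mapped as a numeric type, a non-numeric search text may be rejected for that field; multi_match has a lenient option intended for that situation, which may be worth trying.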
0 - https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html
1 - https://www.elastic.co/guide/en/elasticsearch/reference/6.4/analysis.html#_index_time_analysis
I have a Result object that is tagged with "one" and "two". When I try to query for objects tagged "one" and "two", I get nothing back:
q = Result.objects.filter(Q(tags__name="one") & Q(tags__name="two"))
print len(q)
# prints zero, was expecting 1
Why does it not work with Q? How can I make it work?
django-taggit implements tagging essentially through a ManyToMany relationship. In such cases there is a separate table in the database that holds these relations. It is usually called a "through" or intermediate model, as it connects the two models. In django-taggit this is called TaggedItem. So you have the Result model, which is yours, and the two models Tag and TaggedItem provided by django-taggit.
When you make a query such as Result.objects.filter(Q(tags__name="one")), it translates to looking up rows in the Result table that have a corresponding row in the TaggedItem table that has a corresponding row in the Tag table that has name="one".
Trying to match two tag names in the same filter() call translates to looking up rows in the Result table that have a corresponding row in the TaggedItem table that has a corresponding row in the Tag table that has both name="one" AND name="two". That can never be true, because a row holds only one value: it's either "one" or "two".
These details are hidden away from you in the django-taggit implementation, but this is what happens whenever you have a ManyToMany relationship between objects.
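The effect can be reproduced outside Django. The following is a minimal sketch using SQLite from Python; the two tables (result, tagged) are a simplified stand-in chosen for illustration and are not django-taggit's real schema. It contrasts a single join carrying both conditions with one join per condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE result (id INTEGER PRIMARY KEY);
    CREATE TABLE tagged (result_id INTEGER, name TEXT);
    INSERT INTO result VALUES (1), (2);
    INSERT INTO tagged VALUES (1, 'one'), (1, 'two'), (2, 'one');
""")

# One join: both conditions must hold on the same joined row, which never happens.
single_join = conn.execute("""
    SELECT DISTINCT r.id FROM result r
    JOIN tagged t ON t.result_id = r.id
    WHERE t.name = 'one' AND t.name = 'two'
""").fetchall()

# Two joins: each condition is checked against its own joined row.
double_join = conn.execute("""
    SELECT DISTINCT r.id FROM result r
    JOIN tagged t1 ON t1.result_id = r.id
    JOIN tagged t2 ON t2.result_id = r.id
    WHERE t1.name = 'one' AND t2.name = 'two'
""").fetchall()

print(single_join)  # []
print(double_join)  # [(1,)]
```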
To resolve this you can:
Option 1
Query tag after tag, evaluating the results each time, as suggested in the other answers. This might be okay for two tags, but it does not scale when you need to look for objects that have 10 tags set on them. Here is one way to do this that results in two queries and gets you the result:
# get the IDs of the Result objects tagged with "one"
query_1 = Result.objects.filter(tags__name="one").values('id')
# use this in a second query to filter the ID and look for the second tag.
results = Result.objects.filter(pk__in=query_1, tags__name="two")
You could achieve this with a single query so you only have one trip from the app to the database, which would look like this:
from django.db.models import Exists, OuterRef
# create a Django subquery - this is not evaluated, but used to construct the final query
subquery = Result.objects.filter(pk=OuterRef('pk'), tags__name="one").values('id')
# perform a combined query using the subquery against the database
results = Result.objects.filter(Exists(subquery), tags__name="two")
This would only make one trip to the database. (Note: filtering directly on subqueries requires Django 3.0.)
But you are still limited to two tags. If you need to check for 10 tags or more, the above is not really workable...
Option 2
Query the relationship table directly instead and aggregate the results in a way that gives you the object IDs.
from django.contrib.contenttypes.models import ContentType
from django.db.models import Count
from taggit.models import TaggedItem

# django-taggit uses Content Types so we need to pick up the content type from cache
result_content_type = ContentType.objects.get_for_model(Result)
tag_names = ["one", "two"]
tagged_results = (
    TaggedItem.objects.filter(tag__name__in=tag_names, content_type=result_content_type)
    .values('object_id')
    .annotate(occurrence=Count('object_id'))
    .filter(occurrence=len(tag_names))
    .values_list('object_id', flat=True)
)
TaggedItem is the hidden table in the django-taggit implementation that contains the relationships. The above will query that table and aggregate all the rows that refer either to the "one" or "two" tags, group the results by the ID of the objects and then pick those where the object ID had the number of tags you are looking for.
This is a single query and at the end gets you the IDs of all the objects that have been tagged with both tags. It is also the exact same query regardless of whether you need 2 tags or 200.
Please review this and let me know if anything needs clarification.
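First of all, these two are the same:
Result.objects.filter(Q(tags__name="one"), Q(tags__name="two"))
Result.objects.filter(Q(tags__name="one") & Q(tags__name="two"))
(Writing filter(tags__name="one", tags__name="two") is not even valid Python, because the keyword argument is repeated.) Chaining two filter() calls is different: across a many-to-many relation each call gets its own join, so this one can match:
Result.objects.filter(tags__name__in=["one"]).filter(tags__name__in=["two"])
I think the name field is a CharField, and within a single filter() call no single tag row can be equal to "one" and "two" at the same time.
In Python code the condition looks like this (always false, which is why you are getting no results):
from random import choice
name = choice(["abtin", "shino"])
if name == "abtin" and name == "shino":
    print("never reached")
We use Q objects to implement OR or other complex queries.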
In the example that works, you do an AND on two Python objects (querysets). That condition can be satisfied by different tag records; it does not require one and the same record to have both "one" AND "two" as its tag.
PS: Why do you use the in filter?
q = Result.objects.filter(tags__name__in=["one"]).filter(tags__name__in=["two"])
Add .distinct() to remove duplicates if you expect more than one unique object.
I want to have a content entry block. When a user types #word or #blah in a field, I want to efficiently search that field and add the string right after the "#" to a different field, as an entry in a different table. Like what Twitter does. This would allow a user to sort by that string later.
I believe that I would do this as part of the save method on the model, but I'm not sure. And if the #blah already exists, then the content would belong to that "blah".
Can anyone suggest samples of how to do this? This is a little beyond what I'm able to figure out on my own.
Thanks!
You can use a regex (the re module) in save(), or wherever suits you, to check whether your field text contains #(?P<blah>\w+), extract blah, and use it for whatever you want.
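As a rough illustration of that suggestion, the sketch below pulls the hashtags out of a piece of text with a named group. The function name extract_hashtags, the lowercasing, and the de-duplication are assumptions about what the asker wants, not part of the original answer.

```python
import re

# "#" followed by one or more word characters; the tag itself is captured by name.
HASHTAG_RE = re.compile(r"#(?P<tag>\w+)")

def extract_hashtags(text):
    """Return the unique hashtags in text, lowercased, in order of first appearance."""
    seen = []
    for match in HASHTAG_RE.finditer(text):
        tag = match.group("tag").lower()
        if tag not in seen:
            seen.append(tag)
    return seen

print(extract_hashtags("Loving #Django and #python, more #django soon"))
# ['django', 'python']
```

Inside a model's save(), each extracted tag could then be looked up or created in the other table, for example with get_or_create on a hypothetical Tag model, so that an existing #blah is reused instead of duplicated.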
I have a model pretty much like this:
Document Release
has an embedded array ReleaseDetails[]
The ReleaseDetails array contains documents of type ReleaseDetails
A ReleaseDetails document has a field called ArtistName of type text
A ReleaseDetails document has a field called Type of type text
I basically want to do this:
retrieve all the Release documents that have an entry in their ReleaseDetails array which (both) has ArtistName=someRegexExpression AND Type=someOtherRegexExpression. Basically I do this:
db.getCollection("releasesCollection").
    find({ "ReleaseDetails" :
        { "$elemMatch" :
            { "ArtistName" : {$regex:"^David"},
              "Type" : {$regex:".*singer.*"}}}})
Problem is, if I call explain() on such a query I can see that the indexes I've made on
ReleaseDetails.ArtistName and ReleaseDetails.Type are effectively not taken into account (the query just goes through all documents in the collection).
On the other hand, if I do the exact same query but replace the regex expressions with actual values, in other words, if I do this:
db.getCollection("releasesCollection").
find({ "ReleaseDetails" :
{ "$elemMatch" :
{ "ArtistName" : "David Halliday",
"Type" : "mainSinger"}}})
In this case the indexes ARE taken into account (explain() shows that clearly).
My question is then: is there a way to have a query that does $elemMatch WITH regex take advantage of the indexes?
(I'm asking because I've also seen that if you do a regex query on a basic field (like a text field, not an embedded-array field) AND that field is indexed, the regex query will in fact take advantage of the index. Why is it that a regex query on a basic indexed field uses the index, but a regex query on an embedded-array indexed field fails to use it?)
Two important things you may have missed:
1. Only a case-sensitive prefix regexp can use an index in MongoDB; all others can't.
For example, the following query will use an index:
db.users.find({ "name": /^andrew/ })
2. A query will generally use only one index, so it is better to create a compound index for your query:
db.items.createIndex({"ReleaseDetails.ArtistName": 1, "ReleaseDetails.Type": 1});
And to take advantage of MongoDB indexes you should not use a LIKE-style regexp such as "Type" : {$regex:".*singer.*"} (this regexp is probably why your query does not use the index).
If you really need a LIKE-style search, you can tokenize Type yourself and store the tokens as an array. For example:
If you have the following Type: "My favorite singer", you can:
Split the phrase into words and store them in lower case: [my, favorite, singer]
Tokenize words that are longer than 3 chars to support LIKE-style search, like this:
[my, fav, favo, favor, favori, favorit, favorite, avorite, vorite, orite, rite, ite,
avorit, vori] (I've skipped tokenizing the word singer)
Once the tokenizing is done, you can search by exact match and your query will use the index, but your database will certainly be bigger ;).
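One possible reading of that tokenizing step is sketched below: every word is kept as-is, and words longer than the minimum length are additionally expanded into all of their substrings of at least that length. The function names and the "all substrings" choice are assumptions; the list in the answer is only a partial illustration, and a real setup might restrict itself to prefixes or fixed-size n-grams to keep the array small.

```python
def substrings(word, min_len=3):
    """All substrings of word with at least min_len characters."""
    return {
        word[i:j]
        for i in range(len(word))
        for j in range(i + min_len, len(word) + 1)
    }

def tokenize(phrase, min_len=3):
    """Lowercase the phrase, split on whitespace and expand each word into search tokens."""
    tokens = set()
    for word in phrase.lower().split():
        tokens.add(word)  # the whole word is always searchable
        if len(word) > min_len:
            tokens |= substrings(word, min_len)
    return sorted(tokens)

tokens = tokenize("My favorite singer")
print("vori" in tokens, "inge" in tokens, "fa" in tokens)
# True True False
```

The resulting list would be stored in an indexed array field, and a search for "inge" then becomes an exact match against that array instead of an unanchored regex.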
For algorithms to tokenize words, you can look at full-text search engines such as Lucene and Sphinx.
Recently I implemented django-sphinx search on my website.
It is working fine for each separate model.
But now my client's requirements have changed.
To implement the new functionality I need the name of the field in which the match was found.
Suppose my query is:
"select id, name, description from table1"
and the search keyword matches a value in the field "name". Then I need to return that field name as well.
Is it possible to get the field name, or is there any method provided by django-sphinx that returns it?
Please help me...
As far as I know, this isn't possible. You might look at the contents of _sphinx though.
Well, with django-sphinx itself it might not be possible. But there is a workaround:
Make different indexes, each index covering only one field that you need to search.
In your django-sphinx models, do this when searching:
search1 = SphinxSearch(index='index1')
search2 = SphinxSearch(index='index2')
...
After getting all the search results, you aggregate them, and since each result set comes from a known index you know which field matched.
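The aggregation step could look roughly like the sketch below. It does not call django-sphinx at all: it assumes the per-index searches have already been run and reduced to lists of object ids, keyed by the field each index covers. The helper name merge_hits and the sample ids are made up for illustration.

```python
from collections import defaultdict

def merge_hits(hits_by_field):
    """Map each matched object id to the sorted list of fields whose index returned it."""
    matched = defaultdict(set)
    for field, ids in hits_by_field.items():
        for obj_id in ids:
            matched[obj_id].add(field)
    return {obj_id: sorted(fields) for obj_id, fields in matched.items()}

# Hypothetical ids returned by a "name" index and a "description" index.
hits = {"name": [1, 4], "description": [4, 7]}
print(merge_hits(hits))
# {1: ['name'], 4: ['description', 'name'], 7: ['description']}
```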