Query Elasticsearch by id using the regex or wildcard filter

I got a list of IDs:
bc2***********************13
b53***********************92
39f***********************bb
eb7***********************7a
80b***********************22
Each * is an unknown character, and I need to find all IDs matching these patterns.
I tried the regex filter on field names like id, _id and ID, always with "bc2.*13" (or others) but always got no matches even for existing documents.

By default, the _id field is not indexed, which is why you get no results.
Try setting the _id field as analyzed in the mapping:
POST /test_id/
{
"mappings":{
"indexed":{
"_id":{
"index":"analyzed"
}
}
}
}
Adding some docs:
PUT /test_id/indexed/bc2***********************13
{
"content":"test1"
}
PUT /test_id/indexed/b53***********************92
{
"content":"test2"
}
I checked with one of your simple regexp queries:
POST /test_id/_search
{
"query": {
"regexp": {
"_id": "bc2.*13"
}
}
}
Result:
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "test_id",
"_type": "indexed",
"_id": "bc2***********************13",
"_score": 1,
"_source": {
"content": "test1"
}
}
]
}
Hope this helps :)

If the *'s are of a known and constant length:
bc2.{23}13|b53.{23}92|39f.{23}bb|eb7.{23}7a|80b.{23}22
Else:
bc2.*?13|b53.*?92|39f.*?bb|eb7.*?7a|80b.*?22
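As a rough sketch of how the fixed-length pattern above could be generated from the masked IDs, the helper below turns each run of n asterisks into `.{n}` and joins the results with `|`. The function names `maskToPattern` and `masksToRegexp` and the short sample IDs are made up for illustration.

```javascript
// Convert masked IDs such as "bc2***13" into one regular expression.
// Each run of n asterisks becomes ".{n}", i.e. exactly n unknown characters.
function maskToPattern(maskedId) {
  return maskedId
    .split(/(\*+)/) // the capture group keeps the asterisk runs as separate parts
    .filter((part) => part.length > 0)
    .map((part) =>
      part[0] === "*"
        ? `.{${part.length}}`
        : part.replace(/[.?+*|{}[\]()"\\]/g, "\\$&") // escape regex metacharacters
    )
    .join("");
}

function masksToRegexp(maskedIds) {
  return maskedIds.map(maskToPattern).join("|");
}

const pattern = masksToRegexp(["bc2***13", "b53**92"]);
console.log(pattern); // bc2.{3}13|b53.{2}92
```

The resulting string can be used as the value of the regexp query, provided the id field is actually searchable as discussed in the other answers.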

Use the _uid field and the wildcard query:
GET yourIndex/yourType/_search
{
"query": {
"wildcard": {
"_uid": "bc2***********************13"
}
}
}

Related

ElasticSearch wildcard not returning when value has special characters

I have an Elasticsearch service that fetches results as you type into a text input and then populates a table. The search works correctly (it returns filtered data) for all alphanumeric values, but not for special characters (hyphens in particular). For example, for the country Timor-Leste, if I pass in Timor as the term I get the result, but as soon as I add the hyphen (Timor-) I get an empty array response.
const queryService = {
search(tableName, field, term) {
// If there is no search term, run the wildcard search with 20 values
// for the smaller lists to be pre-populated, like "Gender"
return `
{
"size": ${term ? 200 : 20},
"query": {
"bool": {
"must": [
{
"match": {
"tablename": "${tableName}"
}
},
{
"wildcard": {
"${field}": {
"value": "${term ? `*${term.trim()}*` : '*'}",
"boost": 1.0,
"rewrite": "constant_score"
}
}
}
]
}
}
}
`;
},
};
Is there a way I can modify my wildcard request to allow hyphens? The other response I've seen on here has suggested using "analyze_wildcard": true which hasn't worked. I've also tried to manually escape by putting a \ before each hyphen with .replace.
It all boils down to Elasticsearch analyzers.
By default, all text fields will be run through the standard analyzer, e.g.:
GET _analyze/
{
"text": ["Timor-Leste"],
"analyzer": "standard"
}
This will lowercase your input, strip any special chars, and produce the tokens:
["timor", "leste"]
If you'd like to forgo this default process, add a .keyword mapping:
PUT your-index/
{
"mappings": {
"properties": {
"country": {
"type": "text",
"fields": { <---
"keyword": {
"type": "keyword"
}
}
}
}
}
}
Then reindex your docs, and when dynamically constructing the wildcard query with the newly created .keyword field, make sure the hyphen (and all other special chars) is properly escaped:
POST your-index/_search
{
"query": {
"wildcard": {
"country.keyword": {
"value": "*Timor\\-*" <---
}
}
}
}
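A minimal sketch of how the request from the question could be built against the .keyword sub-field, assuming the mapping above has been applied and the documents reindexed. The helper names `escapeWildcard` and `buildSearchBody` and the table name "countries" are hypothetical. This sketch escapes only `*`, `?` and `\`, which are the characters with a special meaning in a wildcard pattern; a hyphen escape like the one shown above could be added in the same place.

```javascript
// Escape the characters that have a special meaning in a wildcard pattern.
function escapeWildcard(term) {
  return term.replace(/[\\*?]/g, "\\$&");
}

// Build the body as an object and let JSON.stringify handle the quoting,
// instead of interpolating user input into a template string.
function buildSearchBody(tableName, field, term) {
  const trimmed = term ? term.trim() : "";
  return JSON.stringify({
    size: trimmed ? 200 : 20,
    query: {
      bool: {
        must: [
          { match: { tablename: tableName } },
          {
            wildcard: {
              // Assumes the ".keyword" sub-field exists for this field.
              [`${field}.keyword`]: {
                value: trimmed ? `*${escapeWildcard(trimmed)}*` : "*",
              },
            },
          },
        ],
      },
    },
  });
}

console.log(buildSearchBody("countries", "country", "Timor-"));
```

Keep in mind that a keyword field is matched case-sensitively, so "timor-" would not match "Timor-Leste" with this query.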

Get keys from Json with regex Jmeter

I'm struggling with regex and trying to get the ids from this body.
But only the ids in the members list, and not the id in the verified key. :)
To clarify, I'm using the Regular Expression Extractor in JMeter.
{
"id": "9c40ffca-0f1a-4f93-b068-1f6332707d02", //<--not this
"me": {
"id": "38a2b866-c8a9-424f-a5d4-93b379f080ce", //<--not this
"isMe": true,
"user": {
"verified": {
"id": "257e30f4-d001-47b3-9e7f-5772e591970b" //<--not this
}
}
},
"members": [
{
"id": "88a2b866-c8a9-424f-a5d4-93b379f780ce", //<--this
"isMe": true,
"user": {
"verified": {
"id": "223e30f4-d001-47b3-9e7f-5772e781970b" //<--not this
}
}
},
{
"id": "53cdc218-4784-4e55-a784-72e6a3ffa9bc", //<--this
"isMe": false,
"user": {
"unverified": {
"verification": "XYZ"
}
}
}
]
}
At the moment I have this regex:
("id": )("[\w-]+")
But it returns every GUID.
Any ideas on how to go on?
Thanks in advance.
Since the input data type is JSON, it is recommended to use the JMeter's JSON Path Extractor Plugin.
Once you add it, use the
$.members[*].id
JSON path expression to match the id value of each top-level members entry in the document.
If you may have nested members, you can get them all using
$..members[*].id
You can test these expressions at https://jsonpath.com/.
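To illustrate what $.members[*].id selects, here is a plain JavaScript sketch that parses a shortened copy of the body from the question and collects only the direct id of each members entry. The function name `memberIds` is made up for this example, and it is not JMeter code.

```javascript
// Shortened copy of the response body from the question.
const body = JSON.stringify({
  id: "9c40ffca-0f1a-4f93-b068-1f6332707d02",
  me: { id: "38a2b866-c8a9-424f-a5d4-93b379f080ce" },
  members: [
    {
      id: "88a2b866-c8a9-424f-a5d4-93b379f780ce",
      user: { verified: { id: "223e30f4-d001-47b3-9e7f-5772e781970b" } },
    },
    {
      id: "53cdc218-4784-4e55-a784-72e6a3ffa9bc",
      user: { unverified: { verification: "XYZ" } },
    },
  ],
});

// Equivalent of $.members[*].id: the direct "id" of each entry in "members",
// ignoring the top-level id and the ids nested under "verified".
function memberIds(json) {
  const doc = JSON.parse(json);
  return (doc.members || []).map((member) => member.id);
}

console.log(memberIds(body));
```

Because the structure is walked rather than pattern-matched, the nested verified ids never enter the result, which is the part that is hard to express with a single regular expression.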

AWS ElasticSearch Query for Keyword not getting results I expect

I have an ElasticSearch query that looks like:
{
"query": {
"constant_score": {
"filter": {
"bool": {
"should": [
{
"wildcard": {
"Message.keyword": "*System.Net.WebClient).DownloadString(*"
}
},
{
"wildcard": {
"Message.keyword": "*system.net.webclient).downloadfile(*"
}
}
]
}
}
}
}
}
And a Doc in my Index that includes:
message:Engine state is changed from None to Available. Details: NewEngineState=Available PreviousEngineState=None SequenceNumber=13 HostName=ConsoleHost HostVersion=5.1.18362.628 HostId=3dd1a50a-cc15-45e0-bf63-4456d556fb67 HostApplication=powershell.exe -command PowerShell -ExecutionPolicy bypass -noprofile -windowstyle hidden -command (New-Object System.Net.WebClient).DownloadFile('https://drive.google.com/uc?export=download EngineVersion=5.1.18362.628 RunspaceId=de762b62-056c-4be1-90bf-a12cfe6fbc72
As you can see above it includes:
(New-Object System.Net.WebClient).DownloadFile('https:....
It seems like the filter here should be matching the message, but when I execute the Query through Kibana, nothing matches even though I can see the doc above inside my index through Kibana UI if I just query for *.
I think maybe this is because the query above is querying for Message.keyword? How do I get it to successfully hit the document above?
Edit:
mapping: https://pastebin.com/cWN4jF3d
Sample data: https://pastebin.com/SyErqaG8
There are two reasons for the query not returning the result:
1. The field name in the mapping is message, whereas the query uses Message.
2. A field with the keyword datatype indexes the data as it is, which means it is also case sensitive. The document you shared contains the text System.Net.WebClient).DownloadFile( with upper-case characters, whereas the search pattern you expect to match, "*system.net.webclient).downloadfile(*", is all lower case.
Therefore the query should be:
{
"query": {
"constant_score": {
"filter": {
"bool": {
"should": [
{
"wildcard": {
"message.keyword": "*System.Net.WebClient).DownloadString(*"
}
},
{
"wildcard": {
"message.keyword": "*System.Net.WebClient).DownloadFile(*"
}
}
]
}
}
}
}
}
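To see why the lower-case pattern fails, here is a rough emulation of case-sensitive wildcard matching in plain JavaScript. It is only an illustration of the matching rule, not how Elasticsearch implements it; the function name `wildcardToRegExp` is made up, and backslash escapes inside the pattern are not handled.

```javascript
// "*" matches any run of characters, "?" exactly one character,
// and everything else must match literally, including its case.
function wildcardToRegExp(pattern) {
  const source = pattern
    .split("")
    .map((ch) => {
      if (ch === "*") return "[\\s\\S]*";
      if (ch === "?") return "[\\s\\S]";
      return ch.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    })
    .join("");
  return new RegExp(`^${source}$`); // no "i" flag: the comparison is case-sensitive
}

// Fragment of the message from the question.
const message =
  "-command (New-Object System.Net.WebClient).DownloadFile('https://drive.google.com/uc?export=download";

console.log(wildcardToRegExp("*system.net.webclient).downloadfile(*").test(message)); // false
console.log(wildcardToRegExp("*System.Net.WebClient).DownloadFile(*").test(message)); // true
```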
The keyword fields are used only for exact match. You will need to match the regular fields if you only want to match a substring / subset of the string, by querying on Message instead of Message.keyword:
{
"query": {
"constant_score": {
"filter": {
"bool": {
"should": [
{
"wildcard": {
"Message": "*System.Net.WebClient).DownloadString(*"
}
},
{
"wildcard": {
"Message": "*system.net.webclient).downloadfile(*"
}
}
]
}
}
}
}
}

Using regular expressions in elasticsearch term queries

I want to find all items whose ID matches some regular expression like
*TEST123* //pattern for regexp
So the expected results are items
ATEST123001
ATEST123002
ATEST123003
TTTTEST123001
...
I could write a script that scans the full storage and saves the IDs to a log file that I can check later, but I would like to find a better solution.
Updated
I tried
"query" : { "match_all" : { }, "filtered" : { "filter" : { "regexp": { "id":".test123." } } } }, }
I receive
//nested: ElasticsearchParseException[Expected field name but got START_OBJECT \"filtered\"]
When I tried
{
"regexp": {
"id": "test123"
}
}
//Parse Failure [No parser for element [regexp]]]
ES 1.7.4 and Lucene 4.10.4
You can use regular expression queries. The regexp query allows you to use regular expression term queries.
Ref:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html
Sample regex query :
{
"regexp":{
"id": ".*test123.*"
}
}
Update:
In 2.0 regexp filter has been replaced by regexp query.
{
"query": {
"filtered": {
"filter": {
"regexp":{
"id":".*TEST123.*"
}
}
}
}
}
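A small sketch of how a "contains" pattern could be built for the regexp query from an arbitrary fragment. The helper name `containsRegexpQuery` is hypothetical, and the list of escaped characters assumes the default regexp flags, where # @ & < > ~ also act as operators.

```javascript
// Characters treated as reserved by the regexp syntax
// (assumption: default flags, where # @ & < > ~ are operators too).
const RESERVED = /[.?+*|{}[\]()"\\#@&<>~]/g;

function containsRegexpQuery(field, fragment) {
  const escaped = fragment.replace(RESERVED, "\\$&");
  // The pattern is matched against the whole term, so ".*" is needed on both sides.
  return { query: { regexp: { [field]: `.*${escaped}.*` } } };
}

console.log(JSON.stringify(containsRegexpQuery("id", "TEST123")));
// {"query":{"regexp":{"id":".*TEST123.*"}}}
```

If the field is analyzed, the indexed terms are typically lower-cased, so the fragment may need to be lower-cased as well before building the pattern.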
You can try Query String.
{
"query": {
"query_string": {
"default_field": "id",
"query": "*test123*"
}
}
}

MongoDB Search and Sort, with Number of Matches and Exact Match

I want to create a small MongoDB search query that sorts the result set by exact match first, followed by the number of matches.
For example, if I have the following labels
Physics
11th-Physics
JEE-IIT-Physics
Physics-Physics
Then, if I search for "Physics" it should sort as
Physics
Physics-Physics
11th-Physics
JEE-IIT-Physics
Looking for the sort of "scoring" you are talking about here is an exercise in "imperfect solutions". In this case, the "best fit" here starts with "text search", and "imperfect" is the term to consider first when working with the text search capabilities of MongoDB.
MongoDB is "not" a dedicated "text search" product, nor is it ( like most databases ) trying to be one. The full capabilities of "text search" are reserved for dedicated products that do that as their area of expertise. So it may not be the best fit, but "text search" is given as an option for those who can live with the limitations and don't want to implement another engine. Or at least not yet.
With that said, let's look at what you can do with the data sample as given. First set up some data in a collection:
db.junk.insert([
{ "data": "Physics" },
{ "data": "11th-Physics" },
{ "data": "JEE-IIT-Physics" },
{ "data": "Physics-Physics" },
{ "data": "Something Unrelated" }
])
Then of course, to "enable" the text search capabilities, you need to index at least one of the fields in the document with the "text" index type:
db.junk.createIndex({ "data": "text" })
Now that is "ready to go", let's have a look at a first basic query:
db.junk.find(
{ "$text": { "$search": "\"Physics\"" } },
{ "score": { "$meta": "textScore" } }
).sort({ "score": { "$meta": "textScore" } })
That is going to give results like this:
{
"_id" : ObjectId("55af83b964876554be823f33"),
"data" : "Physics-Physics",
"score" : 1.5
}
{
"_id" : ObjectId("55af83b964876554be823f30"),
"data" : "Physics",
"score" : 1
}
{
"_id" : ObjectId("55af83b964876554be823f31"),
"data" : "11th-Physics",
"score" : 0.75
}
{
"_id" : ObjectId("55af83b964876554be823f32"),
"data" : "JEE-IIT-Physics",
"score" : 0.6666666666666666
}
So that is "close" to your desired result, but of course there is no "exact match" component. In addition, the logic here used by the text search capabilities with the $text operator means that "Physics-Physics" is the preferred match here.
This is because the engine does not recognize "non words" such as the "hyphen" in between. To it, the word "Physics" appears several times in the indexed content for the document, therefore it has a higher score.
Now the rest of your logic here depends on the application of "exact match" and what you mean by that. If you are looking for "Physics" in the string and "not" surrounded by "hyphens" or other characters then the following does not suit. But you can just match a field "value" that is "exactly" just "Physics":
db.junk.aggregate([
{ "$match": {
"$text": { "$search": "Physics" }
}},
{ "$project": {
"data": 1,
"score": {
"$add": [
{ "$meta": "textScore" },
{ "$cond": [
{ "$eq": [ "$data", "Physics" ] },
10,
0
]}
]
}
}},
{ "$sort": { "score": -1 } }
])
And that will give you a result that both looks at the "textScore" produced by the engine and then applies some math with a logical test. In this case where the "data" is exactly equal to "Physics" then we "weight" the score by an additional factor using $add:
{
"_id": ObjectId("55af83b964876554be823f30"),
"data" : "Physics",
"score" : 11
}
{
"_id" : ObjectId("55af83b964876554be823f33"),
"data" : "Physics-Physics",
"score" : 1.5
}
{
"_id" : ObjectId("55af83b964876554be823f31"),
"data" : "11th-Physics",
"score" : 0.75
}
{
"_id" : ObjectId("55af83b964876554be823f32"),
"data" : "JEE-IIT-Physics",
"score" : 0.6666666666666666
}
That is what the aggregation framework can do for you, by allowing manipulation of the returned data with additional conditions. The end result is passed to the $sort stage ( notice it is in descending order ) to allow that new value to be the sorting key.
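The boost-and-sort logic of that pipeline can be sketched in plain JavaScript. This is only an emulation of the $cond / $add / $sort steps using the text scores copied from the first result listing; the function name `boostExact` is made up for illustration.

```javascript
// Text scores copied from the $text result shown earlier.
const hits = [
  { data: "Physics-Physics", score: 1.5 },
  { data: "Physics", score: 1 },
  { data: "11th-Physics", score: 0.75 },
  { data: "JEE-IIT-Physics", score: 0.6666666666666666 },
];

function boostExact(docs, term, boost = 10) {
  return docs
    // Like $cond inside $add: add the boost only when the value equals the term exactly.
    .map((doc) => ({ ...doc, score: doc.score + (doc.data === term ? boost : 0) }))
    // Like { "$sort": { "score": -1 } }: highest score first.
    .sort((a, b) => b.score - a.score);
}

console.log(boostExact(hits, "Physics").map((doc) => doc.data).join(", "));
// Physics, Physics-Physics, 11th-Physics, JEE-IIT-Physics
```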
But the aggregation framework can really only deal with "exact matches" like this on strings. There is no facility at present to deal with regular expression matches or index positions in strings that return a meaningful value for projection. Not even a logical match. And the $regex operation is only used to "filter" in queries, so not of use here.
So if you were looking for something in a "phrase" that was a bit more involved than a "string equals" exact match, then the other option is using mapReduce.
This is another "imperfect" approach as the limitations of the mapReduce command mean that the "textScore" from such a query by the engine is "completely gone". While the actual documents will be selected correctly, the inherent "ranking data" is not available to the engine. This is a by-product of how MongoDB "projects" the "score" into the document in the first place, and "projection" is not a feature available to mapReduce.
But you can "play with" the strings using JavaScript, as in my "imperfect" sample:
db.junk.mapReduce(
function() {
var _id = this._id,
score = 0;
delete this._id;
score += this.data.indexOf(search);
score += this.data.lastIndexOf(search);
emit({ "score": score, "id": _id }, this);
},
function() {},
{
"out": { "inline": 1 },
"query": { "$text": { "$search": "Physics" } },
"scope": { "search": "Physics" }
}
)
Which gives results like this:
{
"_id" : {
"score" : 0,
"id" : ObjectId("55af83b964876554be823f30")
},
"value" : {
"data" : "Physics"
}
},
{
"_id" : {
"score" : 8,
"id" : ObjectId("55af83b964876554be823f33")
},
"value" : {
"data" : "Physics-Physics"
}
},
{
"_id" : {
"score" : 10,
"id" : ObjectId("55af83b964876554be823f31")
},
"value" : {
"data" : "11th-Physics"
}
},
{
"_id" : {
"score" : 16,
"id" : ObjectId("55af83b964876554be823f32")
},
"value" : {
"data" : "JEE-IIT-Physics"
}
}
My own "silly little algorithm" here is basically taking both the "first" and "last" index position of the matched string here and adding them together to produce a score. It's likely not what you really want, but the point is that if you can code your logic in JavaScript, then you can throw it at the engine to produce the desired "ranking".
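The scoring used in that map function can be checked outside MongoDB with a few lines of plain JavaScript. The function name `positionScore` is made up for this sketch; the labels are the sample data from above.

```javascript
// Same idea as the map function: first index plus last index of the search term.
function positionScore(text, search) {
  return text.indexOf(search) + text.lastIndexOf(search);
}

const labels = ["Physics", "11th-Physics", "JEE-IIT-Physics", "Physics-Physics"];

const ranked = labels
  .map((data) => ({ data, score: positionScore(data, "Physics") }))
  .sort((a, b) => a.score - b.score); // ascending, like the mapReduce key order

console.log(ranked.map((r) => `${r.data}=${r.score}`).join(", "));
// Physics=0, Physics-Physics=8, 11th-Physics=10, JEE-IIT-Physics=16
```

Lower is better here, since a term that starts at position 0 and does not repeat further along gets the smallest sum.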
The only real "trick" here to remember is that the "score" must be the "preceding" part of the grouping "key" here, and that if including the original document _id value then that composite key part must be renamed, otherwise the _id will take precedence in the ordering.
This is just part of mapReduce where as an "optimization" all output "key" values are sorted in "ascending order" before being processed by the reducer. Which of course does nothing here since we are not "aggregating", but just using the JavaScript runner and document reshaping of mapReduce in general.
So the overall note is, those are the available options. None of them perfect, but you might be able to live with them or even just "accept" the default engine result.
If you want more then look at external "dedicated" text search products, which would be better suited.
Side Note: The $text searches here are preferred over $regex because they can use an index. A "non-anchored" regular expression ( without the caret ^ ) cannot use an index optimally with MongoDB. Therefore the $text searches are generally going to be a better base for finding "words" within a phrase.
One more way is to use the $indexOfCP aggregation operator to get the index of the matched string and then sort on that computed field:
Data insertion
db.junk.insert([
{ "data": "Physics" },
{ "data": "11th-Physics" },
{ "data": "JEE-IIT-Physics" },
{ "data": "Physics-Physics" },
{ "data": "Something Unrelated" }
])
Query
const data = "Physics";
db.junk.aggregate([
{ "$match": { "data": { "$regex": data, "$options": "i" }}},
{ "$addFields": { "score": { "$indexOfCP": [{ "$toLower": "$data" }, { "$toLower": data }]}}},
{ "$sort": { "score": 1 }}
])
Output:
[
{
"_id": ObjectId("5a934e000102030405000000"),
"data": "Physics",
"score": 0
},
{
"_id": ObjectId("5a934e000102030405000003"),
"data": "Physics-Physics",
"score": 0
},
{
"_id": ObjectId("5a934e000102030405000001"),
"data": "11th-Physics",
"score": 5
},
{
"_id": ObjectId("5a934e000102030405000002"),
"data": "JEE-IIT-Physics",
"score": 8
}
]