AWS ElasticSearch Query for Keyword not getting results I expect - amazon-web-services

I have an ElasticSearch query that looks like:
{
"query": {
"constant_score": {
"filter": {
"bool": {
"should": [
{
"wildcard": {
"Message.keyword": "*System.Net.WebClient).DownloadString(*"
}
},
{
"wildcard": {
"Message.keyword": "*system.net.webclient).downloadfile(*"
}
}
]
}
}
}
}
}
And a Doc in my Index that includes:
message:Engine state is changed from None to Available. Details: NewEngineState=Available PreviousEngineState=None SequenceNumber=13 HostName=ConsoleHost HostVersion=5.1.18362.628 HostId=3dd1a50a-cc15-45e0-bf63-4456d556fb67 HostApplication=powershell.exe -command PowerShell -ExecutionPolicy bypass -noprofile -windowstyle hidden -command (New-Object System.Net.WebClient).DownloadFile('https://drive.google.com/uc?export=download EngineVersion=5.1.18362.628 RunspaceId=de762b62-056c-4be1-90bf-a12cfe6fbc72
As you can see above it includes:
(New-Object System.Net.WebClient).DownloadFile('https:....
It seems like the filter here should be matching the message, but when I execute the Query through Kibana, nothing matches even though I can see the doc above inside my index through Kibana UI if I just query for *.
I think maybe this is because the query above is querying for Message.keyword? How do I get it to successfully hit the document above?
Edit:
mapping: https://pastebin.com/cWN4jF3d
Sample data: https://pastebin.com/SyErqaG8

There are two reasons why the query does not return the document:
The field name in the mapping is message, whereas the query uses Message. Field names are case sensitive.
A field with the keyword datatype indexes the value as is, so matching against it is case sensitive as well. The document you shared contains System.Net.WebClient).DownloadFile( with upper-case characters, whereas the pattern you expect to match, "*system.net.webclient).downloadfile(*", is all lower case.
Therefore the query should be:
{
"query": {
"constant_score": {
"filter": {
"bool": {
"should": [
{
"wildcard": {
"message.keyword": "*System.Net.WebClient).DownloadString(*"
}
},
{
"wildcard": {
"message.keyword": "*System.Net.WebClient).DownloadFile(*"
}
}
]
}
}
}
}
}
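To see why the case of the pattern matters, the comparison can be approximated outside Elasticsearch. The sketch below uses Python's fnmatch.fnmatchcase as a stand-in for how a wildcard pattern is checked against the single, unmodified term stored in a keyword field. It is only an illustration: Elasticsearch does not use fnmatch, matches_keyword is a hypothetical helper, and the stored value is a shortened copy of the message from the question.

```python
from fnmatch import fnmatchcase

# Shortened copy of the message from the question. A keyword field
# indexes the whole value as one unmodified term.
stored_term = (
    "HostApplication=powershell.exe -command (New-Object "
    "System.Net.WebClient).DownloadFile('https://drive.google.com/uc?export=download"
)


def matches_keyword(term, pattern):
    """Case-sensitive glob match against the whole term (illustration only)."""
    return fnmatchcase(term, pattern)


# Pattern with the same casing as the stored value.
print(matches_keyword(stored_term, "*System.Net.WebClient).DownloadFile(*"))  # True
# All lower-case pattern from the original query.
print(matches_keyword(stored_term, "*system.net.webclient).downloadfile(*"))  # False
```

Only the pattern whose casing matches the stored value succeeds, which is the behaviour described for message.keyword above.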

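Keyword fields index the whole value as a single term, so they are mostly used for exact matches. If you only want to match a substring of the value, you can try querying the regular text field instead of its .keyword sub-field, i.e. Message instead of Message.keyword (keep in mind that a wildcard on an analyzed text field is compared against the individual tokens, not against the whole message):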
{
"query": {
"constant_score": {
"filter": {
"bool": {
"should": [
{
"wildcard": {
"Message": "*System.Net.WebClient).DownloadString(*"
}
},
{
"wildcard": {
"Message": "*system.net.webclient).downloadfile(*"
}
}
]
}
}
}
}
}

Related

ElasticSearch wildcard not returning when value has special characters

I have an elastic search service that fetches when you type into a text input to then populate a table. The search is working (returning filtered data) correctly for all alphanumeric values but not special characters (hyphens in particular). For example for the country Timor-Leste if I pass in Timor as the term I get the result but as soon as I add the hyphen (Timor-) I get an empty array response.
const queryService = {
search(tableName, field, term) {
// If there is no search term, run the wildcard search with 20 values
// for the smaller lists to be pre-populated, like "Gender"
return `
{
"size": ${term ? 200 : 20},
"query": {
"bool": {
"must": [
{
"match": {
"tablename": "${tableName}"
}
},
{
"wildcard": {
"${field}": {
"value": "${term ? `*${term.trim()}*` : '*'}",
"boost": 1.0,
"rewrite": "constant_score"
}
}
}
]
}
}
}
`;
},
};
Is there a way I can modify my wildcard request to allow hyphens? The other response I've seen on here has suggested using "analyze_wildcard": true which hasn't worked. I've also tried to manually escape by putting a \ before each hyphen with .replace.

It all boils down to Elasticsearch analyzers.
By default, all text fields will be run through the standard analyzer, e.g.:
GET _analyze/
{
"text": ["Timor-Leste"],
"analyzer": "standard"
}
This will lowercase your input, strip any special chars, and produce the tokens:
["timor", "leste"]
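A rough way to picture the effect on the wildcard search: the sketch below imitates the analyzer with a simple lowercase-and-split step and then checks a pattern against each token separately. This is an approximation made up for illustration (the real standard analyzer follows Unicode segmentation rules, and rough_standard_analyze and any_token_matches are hypothetical names), but it shows why a pattern containing the hyphen has nothing to match.

```python
import re
from fnmatch import fnmatchcase


def rough_standard_analyze(text):
    """Very rough stand-in for the standard analyzer: lowercase, split on non-word chars."""
    return re.findall(r"\w+", text.lower())


def any_token_matches(tokens, pattern):
    """A wildcard on a text field is checked against each indexed token on its own."""
    return any(fnmatchcase(token, pattern) for token in tokens)


tokens = rough_standard_analyze("Timor-Leste")
print(tokens)                                 # ['timor', 'leste']
print(any_token_matches(tokens, "*timor*"))   # True
print(any_token_matches(tokens, "*timor-*"))  # False, no token contains a hyphen
```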
If you'd like to forgo this default process, add a .keyword mapping:
PUT your-index/
{
"mappings": {
"properties": {
"country": {
"type": "text",
"fields": { <---
"keyword": {
"type": "keyword"
}
}
}
}
}
}
Then reindex your docs, and when dynamically constructing the wildcard query with the newly created .keyword field, make sure the hyphen (and all other special chars) is properly escaped:
POST your-index/_search
{
"query": {
"wildcard": {
"country.keyword": {
"value": "*Timor\\-*" <---
}
}
}
}
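When the request body is assembled by string interpolation, as in the question, a quote or backslash in the user's input can also break the JSON itself. A minimal sketch of an alternative, assuming the country.keyword sub-field from the mapping above: escape the term, build the body as a dict, and let json.dumps handle the quoting. The helper names are hypothetical. As far as I know, the wildcard query only gives special meaning to *, ? and the backslash, so this sketch escapes just those; escaping the hyphen as shown above should be harmless.

```python
import json


def escape_wildcard(term):
    """Backslash-escape the characters the wildcard query treats specially."""
    escaped = []
    for ch in term:
        if ch in ("\\", "*", "?"):
            escaped.append("\\")
        escaped.append(ch)
    return "".join(escaped)


def build_wildcard_search(field, term, size=200):
    """Return the request body as a JSON string; json.dumps takes care of quoting."""
    term = term.strip()
    value = "*" + escape_wildcard(term) + "*" if term else "*"
    body = {"size": size, "query": {"wildcard": {field: {"value": value}}}}
    return json.dumps(body)


print(build_wildcard_search("country.keyword", "Timor-"))
```

The returned string can be sent as the body of POST your-index/_search.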

elasticsearch in json string (and / or )

I am new to AWS Elasticsearch but need to create queries to search the following data with different criteria.
search_metadata (JSON string with key/value pairs) - "{\"number\":\"111\"; \"area\":\"central\"; \"code\":\"1111\"; \"type\":\"internal\"}"
category - "statement" or "bill" or "email"
datetime - "2019-05-04T00:00:00" or "2019-07-16T00:01:00"
flag - "good" or "bad"
I need to construct a query to do the following:
AND or OR condition in search_metadata field (JSON string) -> not sure how to do it.
along with AND condition for category, datetime range and flag. -> Do I need to use multi-match for flag and category?
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"search_metadata": "number 111" --> not sure about AND or OR with "area" and others
}
},
{
"range": {
"datetime": {
"gte": "2019-05-04T00:00:00Z",
"lte": "2019-07-16T00:01:00Z"
}
}
}
]
}
}
}
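One possible shape for such a query, offered as a sketch rather than a tested answer: a bool query treats the clauses in must as AND, and a nested bool with should clauses as OR. The field names come from the question; build_search and its parameters are hypothetical. Since multi_match searches one piece of text across several fields, separate match clauses for category and flag look like the better fit here.

```python
import json


def build_search(metadata_phrases, require_all, category, flag, start, end):
    """Combine metadata phrases with AND or OR, then AND the other criteria."""
    phrase_clauses = [{"match_phrase": {"search_metadata": p}} for p in metadata_phrases]
    if require_all:
        metadata = {"bool": {"must": phrase_clauses}}  # AND
    else:
        metadata = {"bool": {"should": phrase_clauses, "minimum_should_match": 1}}  # OR
    return {
        "query": {
            "bool": {
                "must": [
                    metadata,
                    {"match": {"category": category}},
                    {"match": {"flag": flag}},
                    {"range": {"datetime": {"gte": start, "lte": end}}},
                ]
            }
        }
    }


body = build_search(
    ["number 111", "area central"], False,
    "bill", "good", "2019-05-04T00:00:00Z", "2019-07-16T00:01:00Z",
)
print(json.dumps(body, indent=2))
```

Passing True for require_all switches the metadata part from OR to AND without touching the other criteria.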

Wildcard on multiple columns in Elastic search query

I have a requirement to match an input passed by user across 2 attributes in elastic search and it needs to be a wildcard search.
I am using AWS-ES version 6.4
When I query on one single attribute the results are okay but when I include both the attributes, it gives me 400 status code.
Query which works:
{"query":
{"bool": {"should": [
{"wildcard": { "phone1.searchTerm": "*1234*" }}
]}}
}
Query which fails: (phone1 and phone2 both)
Is there a binding on should/must condition to have only one wildcard inside it?
{"query":
{"bool": {"should": [
{"wildcard": { "phone1.searchTerm": "*1234*" }} ,
{"wildcard": { "phone2.searchTerm": "*1234*" }} ]}}
}
Does this have something to do with the Elasticsearch version?

It will work as follows:
{
"query": {
"bool": {
"must": [
{
"wildcard": {
"Author": "roh*"
}
},
{
"wildcard": {
"Title": "q*"
}
}
]
}
}
}
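Note that the example above uses must (every wildcard has to match), whereas the question asks for a match on either phone field, which is what should expresses. A bool clause can hold any number of wildcard queries. A minimal sketch, using the field names from the question and a hypothetical helper name, that builds the body as a dict so the resulting JSON is always well formed:

```python
import json


def wildcard_any_field(fields, pattern):
    """Match documents where at least one of the fields matches the wildcard pattern."""
    return {
        "query": {
            "bool": {
                "should": [{"wildcard": {field: pattern}} for field in fields],
                "minimum_should_match": 1,
            }
        }
    }


body = wildcard_any_field(["phone1.searchTerm", "phone2.searchTerm"], "*1234*")
print(json.dumps(body, indent=2))
```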

Using regular expressions in elasticsearch term queries

I want to find all items whose ID matches a pattern like
*TEST123* //pattern for regexp
So the expected results are items such as
ATEST123001
ATEST123002
ATEST123003
TTTTEST123001
...
I could write a script that scans the full storage and saves the IDs to a log file that I can check later, but I would like to find a better solution.
Updated
I tried
"query" : { "match_all" : { }, "filtered" : { "filter" : { "regexp": { "id":".test123." } } } }, }
I receive
//nested: ElasticsearchParseException[Expected field name but got START_OBJECT \"filtered\"]
When I tried
{
"regexp": {
"id": "test123"
}
}
//Parse Failure [No parser for element [regexp]]]
ES 1.7.4 and Lucene 4.10.4

You can use a regexp query, which matches the indexed terms of a field against a regular expression.
Ref:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html
Sample regexp query (the pattern has to match the whole term, hence the .* on both sides):
{
"regexp":{
"id": ".*test123.*"
}
}
Update:
In 2.0 the regexp filter was replaced by the regexp query. On 1.x, the regexp filter goes inside a filtered query:
{
"query": {
"filtered": {
"filter": {
"regexp":{
"id":".*TEST123.*"
}
}
}
}
}
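The reason the pattern needs .* on both sides is that Lucene regular expressions are anchored: the pattern has to match the entire term, not just part of it. The sketch below mimics that with Python's re.fullmatch on the sample IDs from the question. Python's regex syntax is not identical to Lucene's, so treat this as an illustration of the anchoring only; OTHER999 is a made-up non-matching ID.

```python
import re

ids = ["ATEST123001", "ATEST123002", "ATEST123003", "TTTTEST123001", "OTHER999"]


def matching_ids(pattern):
    """Keep the ids that the pattern matches in full, as an anchored regexp would."""
    return [i for i in ids if re.fullmatch(pattern, i) is not None]


print(matching_ids("TEST123"))      # [] because no id is exactly "TEST123"
print(matching_ids(".*TEST123.*"))  # the four ids that contain TEST123
```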
You can also try a query_string query:
{
"query": {
"query_string": {
"default_field": "id",
"query": "*test123*"
}
}
}

Elasticsearch escape "¬" character in regex

I am stuck on the symbol "¬" when trying to run an Elasticsearch regexp query
that should return records in the format "prefix-content¬value".
Example (not limited to the website pattern; it can be any value): "website-website descriptions that not required¬www.google.com".
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"regexp": {
"information": "(website?)(.*¬)(www.google.com?)"
}
}
}
}
}
Has anyone encountered this problem before and managed to handle it? Thanks.
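A hedged observation on the pattern itself: as far as I know, "¬" is not among the reserved characters of the Elasticsearch regexp syntax, so it should not need escaping. The unescaped dots in www.google.com match any character, and the trailing ? makes the preceding character optional. The sketch below escapes the literal part of the pattern; escape_regexp_literal is a hypothetical helper, and the reserved set is my reading of the documented list. Whether this resolves the problem is an assumption, since the mapping is not shown: if the field is analyzed, the regexp only sees individual tokens and never the full "prefix-content¬value" string.

```python
# Reserved characters of the Elasticsearch regexp syntax, including the
# ones used by optional operators (assumed from the documentation).
RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def escape_regexp_literal(text):
    """Backslash-escape reserved characters so the text is matched literally."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


# Prefix, anything up to the separator, then the literal value.
pattern = "website.*¬" + escape_regexp_literal("www.google.com")
print(pattern)  # website.*¬www\.google\.com
```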