Not able to get desired search results in ElasticSearch search api - regex

I have field "xyz" on which i want to search. The type of the field is keyword. The different values of the field "xyz "are -
a/b/c/d
a/b/c/e
a/b/f/g
a/b/f/h
Now for the following query -
{
  "query": {
    "query_string": {
      "query": "(xyz:(\"a/b/c\"*))"
    }
  }
}
I should only get these two results -
a/b/c/d
a/b/c/e
but I get all four results -
a/b/c/d
a/b/c/e
a/b/f/g
a/b/f/h
Edit -
Actually, I am not querying Elasticsearch directly. I am using this API https://atlas.apache.org/api/v2/resource_DiscoveryREST.html#resource_DiscoveryREST_searchWithParameters_POST, which generates the above-mentioned Elasticsearch query, so I don't have much control over the query_string. What I can change is the Elasticsearch analyzer for this field or its type.

You'll need to let the query_string parser know you'll be using regex, so wrap the whole thing in /.../ and escape the forward slashes:
{
  "query": {
    "query_string": {
      "query": "xyz:/(a\\/b\\/c\\/.*)/"
    }
  }
}
Or, you might as well use a regexp query (note that regexp queries are anchored, so the pattern has to match the entire keyword value, hence the trailing .*):
{
  "query": {
    "regexp": {
      "xyz": "a/b/c/.*"
    }
  }
}

Related

Elasticsearch Update Doc String Replacement

I have some documents in my Elasticsearch. I want to update my document contents using a string regexp.
For example, I would like to replace all http words with https words. Is that possible?
Thank you
This should get you off to a start. Check out the "Update by Query" API here. The API allows you to include the update script and search query in the same request body.
Regarding your case, an example might look like this...
POST addresses/_update_by_query
{
  "script": {
    "lang": "painless",
    "inline": "ctx._source.data.url = ctx._source.data.url.replace('http', 'https')"
  },
  "query": {
    "query_string": {
      "query": "http://*",
      "analyze_wildcard": true
    }
  }
}
Pretty self-explanatory, but script is where we do the update, and query returns the documents to update.
Painless supports regex, so you're in luck; look here for some examples, and update the inline value accordingly.
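For instance, a rough sketch of a regex-based version (assuming regexes are enabled for Painless, e.g. via the script.painless.regex.enabled setting, and reusing the data.url field from the example above) could anchor the match to the start of the URL so only the scheme is rewritten:
POST addresses/_update_by_query
{
  "script": {
    "lang": "painless",
    "inline": "ctx._source.data.url = /^http:/.matcher(ctx._source.data.url).replaceAll('https:')"
  },
  "query": {
    "query_string": {
      "query": "http://*",
      "analyze_wildcard": true
    }
  }
}
Anchoring on ^http: keeps the replacement from touching an "http" that happens to appear later in the URL, which the plain replace('http', 'https') above would also rewrite.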

Elasticsearch matching words

I have a problem. I want my query for mysql to match mysql!, mysql(, mysql ... in other words, I want my query results to show only those values that contain just the word "mysql", or the word "mysql" with special characters around it.
For example, the query should match - mysql, mysql, #mysql, mysql$. But not "mysql and R" or "mysql mysql".
I tried the query below for matching, but I keep getting "query malformed, no field after start_object", so I'm not sure what to do.
I also wanted to ask whether there are other options to perform the same task.
POST skills/_search
{
  "query": {
    "filtered": {
      "query": {
        "match": {
          "skillname": "mysql"
        }
      },
      "filter": {
        "type": "pattern_capture",
        "patterns": ["mysql([##]\\w+)"],
        "preserve_original": true
      }
    }
  }
}

How to use the elasticsearch regex query correctly?

I am working on translating a Splunk query to Elasticsearch DSL.
I want to check if a URL in the logs contains something like:
"script>" OR "UNION ALL SELECT"
Fair enough, I thought, and went to the docs:
{
  "regexp": {
    "http.url": "script>"
  }
}
Elasticsearch (2.3) replies:
"root_cause": [
{
"reason": "failed to parse search source. unknown search element [regexp]",
"type": "search_parse_exception",
"line": 2,
Could someone please enlighten me about these kinds of queries?
This is a pretty straightforward mistake when starting out with the documentation. In the docs, we generally only show the raw query (and its parameters). Queries are either compound queries or leaf queries. regexp is an example of a leaf query.
However, that's not enough to actually send the query. You're missing a simple wrapper part of the DSL for any query:
{
  "query": {
    "regexp": {
      "http.url": "script>"
    }
  }
}
To use a compound query, the best option is the bool compound query.
It has must, must_not, should, and filter, and each accepts an array of queries (or filters, which are just scoreless, cacheable queries). should is the OR-like aspect of it, but do read the docs on how it behaves when you add must alongside it. The gist is that should by itself is exactly like an OR (as shown below), but once you combine it with must it becomes completely optional unless you also set "minimum_should_match": 1.
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "http.url": "script>"
          }
        },
        {
          "term": {
            "http.url": "UNION ALL SELECT"
          }
        }
      ]
    }
  }
}
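And as a sketch of the must/should caveat above (http.method here is just a made-up field for illustration), adding "minimum_should_match": 1 keeps at least one of the should clauses required once a must clause is present:
{
  "query": {
    "bool": {
      "must": [
        { "term": { "http.method": "get" } }
      ],
      "should": [
        { "term": { "http.url": "script>" } },
        { "term": { "http.url": "UNION ALL SELECT" } }
      ],
      "minimum_should_match": 1
    }
  }
}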

Kibana regex not working

I need to search for a range value from my logs, but my regex doesn't work in Kibana.
/(took":[1-9][0-9][0-9][,])/g
Content:
{"real_time":"2016-05-03T10:02:13.360Z","content":{"delay":687,"updated":true,"searchItems":{"monitoring_id":"111354","params":{"pass":["111354"],"named":{"d":"2016-04-29|2016-04-30"},"action":"mentions","plugin":null,"controller":"api11","form":[],"url":{"url":"1.1\/mentions\/111354\/","publickey":"yn68FDuQ","time":"1462303544,8356","signature":"102ade1f6749e89be876fdb00a7b9ade","published_date":"2016-04-29|2016-04-30","ipp":"100","page":"14"},"isAjax":false},"source_ids":"","timestamp":"","pagination":"1300, 100","trackerId":"","onlyIds":[],"exceptIds":[],"timezone":"Brazil\/East"},"search":[{"index":"mentions_ro","type":"mention","from":1300,"size":100,"body":{"query":{"bool":{"must":[{"term":{"monitoring.id":"111354"}},{"range":{"published_at":{"gte":"1969-12-31T21:00:00-03:00","lte":"1969-12-31T21:00:00-03:00"}}}]}},"sort":{"published_at":{"order":"desc"}}},"fields":[]}],"response":{"took":500,"timed_out":false,"_shards":{"total":21,"successful":21,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}}}
My regex is working here, however:
https://regex101.com/r/pV4mR7/1
Note:
I already tried escaping some characters.
If I look at the request sent to Elasticsearch, Kibana uses a query string.
Any tips?
According to their documentation, these characters are always metacharacters and must be escaped if you want them as literals:
. ? + * | { } [ ] ( ) " \
These characters are metacharacters only when the corresponding optional operators are enabled:
# @ & < > ~
You don't need to put the comma in a char class.
It also looks like you might not be able to just throw the regex into the search box.
Kibana only matches regexps over the _all field. Try to "inspect" one of the elements on your page; you will see that the _all field is hardcoded:
"global": true,
"facet_filter": {
"fquery": {
"query": {
"filtered": {
"query": {
"regexp": {
"_all": {
"value": "category: /pattern/"
https://github.com/elastic/kibana/issues/631
Try this:
(took\":[1-9][0-9][0-9],)
I'm not familiar with Elasticsearch or Kibana, but your query may end up looking like this:
"regexp": {
"_all": {
"value": "category: /(took\":[1-9][0-9][0-9],)/"
}
}

Elasticsearch - behavior of regexp query against a non-analyzed field

What is the default behavior for a regexp query against a non-analyzed field? Also, is that the same answer when dealing with .raw fields?
After everything I've read, I understand the following:
1. Regexp queries will work on analyzed and non-analyzed fields.
2. A regexp query should work across the entire phrase, rather than just matching a single token, in non-analyzed fields.
Here's the problem though: I cannot actually get this to work. I've tried it across multiple fields.
The setup I'm working with is a stock ELK install, and I'm dumping pfSense and Snort logs into it with a basic parser. I'm currently on Kibana 4.3 and ES 2.1.
I queried the mapping for one of the fields, and it indicates the field is not_analyzed, yet the regex does not work across the entire field.
"description": {
"type": "string",
"norms": {
"enabled": false
},
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
}
}
}
What am I missing here?
If a field is not analyzed, the whole field value is indexed as a single token, so a regexp query has to match the entire value.
It's the same answer when dealing with .raw fields, at least in my experience.
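So, as a sketch (the description.raw field name comes from your mapping; the pattern itself is just an illustration), a regexp against the not_analyzed sub-field has to cover the whole value, leading and trailing .* included:
{
  "query": {
    "regexp": {
      "description.raw": ".*blocked by firewall.*"
    }
  }
}
Also note the "ignore_above": 256 in your mapping: values longer than 256 characters are not indexed into the .raw sub-field at all, so regexes against long descriptions will never match there.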
You can use a Groovy script:
matcher = (doc['fields.raw'].value =~ /${pattern}/);
if (matcher.matches()) {
  matcher.group(matchname)
}
You can pass pattern and matchname in params.
What do you mean by "tried it across multiple fields"? If your situation is more complex, maybe you could write a native Java plugin.
UPDATE
{
  "script_fields": {
    "regexp_field": {
      "script": "matcher = (doc[fieldname].value =~ /${pattern}/ );if(matcher.matches()) {matcher.group(matchname)}",
      "params": {
        "pattern": "your pattern",
        "matchname": "your match",
        "fieldname": "fields.raw"
      }
    }
  }
}