Elasticsearch - behavior of regexp query against a non-analyzed field

What is the default behavior for a regexp query against a non-analyzed field? Also, is the answer the same for .raw fields?
After everything I've read, I understand the following:
1. Regexp queries work on both analyzed and non-analyzed fields.
2. On a non-analyzed field, a regexp query should match against the entire phrase rather than a single token.
Here's the problem, though: I cannot actually get this to work. I've tried it across multiple fields.
The setup I'm working with is a stock ELK install, and I'm dumping pfSense and Snort logs into it with a basic parser. I'm currently on Kibana 4.3 and ES 2.1.
I queried the mapping for one of the fields, and it indicates the field is not_analyzed, yet the regex does not work across the entire field.
"description": {
"type": "string",
"norms": {
"enabled": false
},
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
}
}
}
What am I missing here?

If a field is not analyzed, its whole value is indexed as a single token, so a regexp query has to match that entire value.
The same holds for .raw fields, at least in my experience.
You can also use a Groovy script:
matcher = (doc[fieldname].value =~ /${pattern}/);
if (matcher.matches()) {
    matcher.group(matchname)
}
You can pass pattern, matchname and fieldname in params.
What do you mean by "tried it across multiple fields"? If your situation is more complex, you could write a native Java plugin.
UPDATE
{
"script_fields" : {
"regexp_field" : {
"script" : "matcher = (doc[fieldname].value =~ /${pattern}/ );if(matcher.matches()) {matcher.group(matchname)}",
"params" : {
"pattern" : "your pattern",
"matchname" : "your match",
"fieldname" : "fields.raw"
}
}
}
}
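To make the single-token point concrete, here is a minimal Python sketch. It assumes the regexp query is anchored, meaning the pattern has to match a whole indexed term, and emulates that with `re.fullmatch`. Python's `re` is only a stand-in for Lucene's regex engine, and the sample log line and the whitespace tokenizer are made up for illustration.

```python
import re

# Hypothetical log line; not taken from the question's data.
description = "ET SCAN Potential SSH Scan"

# Analyzed field: the value is split into lowercase terms (a rough stand-in
# for the default analyzer). not_analyzed field: the value is one term.
analyzed_terms = description.lower().split()
raw_terms = [description]


def regexp_hits(pattern, terms):
    """Return the terms that the pattern matches in full (anchored match)."""
    return [t for t in terms if re.fullmatch(pattern, t)]


# A pattern that spans several words can only match the single raw term.
phrase_pattern = r"ET SCAN .*SSH.*"
print(regexp_hits(phrase_pattern, analyzed_terms))  # no term matches
print(regexp_hits(phrase_pattern, raw_terms))       # the whole value matches

# A bare "SSH" does not match the raw term, because it is not the whole value.
print(regexp_hits(r"SSH", raw_terms))


def regexp_query(field, pattern):
    """Build the request body for a regexp query on the given field."""
    return {"query": {"regexp": {field: pattern}}}


body = regexp_query("description.raw", "ET SCAN .*SSH.*")
```

If this picture is right, a regex meant to span the whole phrase should be sent against description.raw rather than description, and it should cover the value from start to end.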

Related

Not able to get desired search results in ElasticSearch search api

I have a field "xyz" that I want to search on. The type of the field is keyword. The values of the field "xyz" are -
a/b/c/d
a/b/c/e
a/b/f/g
a/b/f/h
Now for the following query -
{
"query": {
"query_string" : {
"query" : "(xyz:(\"a/b/c\"*))"
}
}
}
I should only get these two results -
a/b/c/d
a/b/c/e
but I get all four results -
a/b/c/d
a/b/c/e
a/b/f/g
a/b/f/h
Edit -
Actually, I am not querying Elasticsearch directly. I am using this API https://atlas.apache.org/api/v2/resource_DiscoveryREST.html#resource_DiscoveryREST_searchWithParameters_POST which creates the above-mentioned query for Elasticsearch, so I don't have much control over the Elasticsearch query_string. What I can change is the Elasticsearch analyzer for this field or its type.
You'll need to let the query_string parser know you're using a regex, so wrap the whole thing in /.../ and escape the forward slashes:
{
"query": {
"query_string": {
"query": "xyz:/(a\\/b\\/c\\/.*)/"
}
}
}
Or, you might as well use a regexp query:
{
"query": {
"regexp": {
"xyz": "a/b/c/.*"
}
}
}
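As a rough check of both variants, the Python sketch below filters the four keyword values with an anchored match (`re.fullmatch` standing in for Lucene's regexp matching, which is an approximation) and builds the query_string body with the forward slashes escaped. The helper names are hypothetical.

```python
import json
import re

# The four keyword values from the question.
values = ["a/b/c/d", "a/b/c/e", "a/b/f/g", "a/b/f/h"]


def matching_terms(pattern, terms):
    """Keep the terms that the pattern matches from start to end."""
    return [t for t in terms if re.fullmatch(pattern, t)]


selected = matching_terms(r"a/b/c/.*", values)
print(selected)


def query_string_regex(field, pattern):
    """Wrap a regex in /.../ for query_string, escaping forward slashes."""
    escaped = pattern.replace("/", "\\/")
    return {"query": {"query_string": {"query": f"{field}:/{escaped}/"}}}


body = query_string_regex("xyz", "a/b/c/.*")
# json.dumps doubles each backslash, which gives the \\/ seen in the request.
print(json.dumps(body, indent=2))
```

Only the two values under a/b/c/ survive the filter, which is the result the question asks for.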

Issues with regex in Kibana

I am having a hard time using a regex pattern inside Kibana/Elasticsearch version 6.5.4. The field I am searching for has the following mapping:
"field": {
"type": "text",
"analyzer": "custom_analyzer"
},
Regex searches on this field return several hits when sent straight to Elasticsearch:
GET /my_index/_search
{
"query": {
"regexp":{
"field": "abc[0-9]{4}"
}
}
}
On the other hand, in Kibana's Discover and Dashboard pages, all of the queries below return nothing:
original query - field:/abc[0-9]{4}/
escaped query - field:/abc\[0\-9\]\{4\}/
desperate query - field:/.*/
Inspecting the request done by kibana to elasticsearch reveals the following query:
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "field:/abc[0-9]{4}/",
"analyze_wildcard": true,
"default_field": "*"
}
}
I expected Kibana to understand the forward-slash syntax /my_query/ and issue a regexp query instead of a query_string query. I have tried this with both query languages, "lucene" and "kuery", and with the optional "experimental query features" both enabled and disabled.
Digging further, I found this old issue, which says that Elasticsearch only runs regexes against the now-deprecated _all field. If this still holds true, I am not sure how regexes work in Kibana/Elasticsearch 6.x.
What am I missing? Any help in clarifying the conditions for using regex in Kibana would be much appreciated.
All other Stack questions on this subject are either old or concern syntax issues or a misunderstanding of how the analyzer handles whitespace, and none of them helped me.
I don't have an answer for how to make Lucene-syntax regexp searches work in Kibana, but I figured out another way to do this in Kibana.
The solution is to use a filter with custom DSL.
Here is an example of what to put in the Query JSON:
{
"regexp": {
"req.url.keyword": "/questions/[0-9]+/answer"
}
}
Example URL from my data: /questions/432142/answer
In addition, you can add more filters using the Kibana search bar (Lucene syntax).
It runs the appropriate search, with no escaping issues.
Hope it helps.
Make sure Kibana doesn't have the query features option (top right) turned on.

Elasticsearch Update Doc String Replacement

I have some documents in my Elasticsearch index. I want to update their contents using a string regexp.
For example, I would like to replace every occurrence of http with https. Is that possible?
Thank you
This should get you off to a start. Check out the "Update by Query" API here. The API allows you to include the update script and search query in the same request body.
Regarding your case, an example might look like this...
POST addresses/_update_by_query
{
"script":
{
"lang": "painless",
"inline": "ctx._source.data.url = ctx._source.data.url.replace('http://', 'https://')"
},
"query":
{
"query_string":
{
"query": "http://*",
"analyze_wildcard": true
}
}
}
Pretty self-explanatory: script is where we do the update, and query selects the documents to update.
Painless supports regex, so you're in luck; look here for some examples and update the inline value accordingly.
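One thing to watch with this kind of rewrite: a plain substring replacement of http with https is not idempotent, because a URL that already starts with https would come out as httpss. The Python sketch below illustrates the pitfall and an anchored alternative. The sample URLs are made up, and the same idea would still need to be translated into Painless for the update script.

```python
import re


def upgrade_scheme(url):
    """Rewrite a leading http:// to https://, leaving other URLs untouched."""
    return re.sub(r"^http://", "https://", url)


# Hypothetical sample values for the data.url field.
urls = [
    "http://example.com/a",
    "https://example.com/b",
    "ftp://example.com/http",
]

upgraded = [upgrade_scheme(u) for u in urls]
print(upgraded)

# Naive substring replacement corrupts a URL that is already https.
naive = "https://example.com/b".replace("http", "https")
print(naive)  # httpss://example.com/b
```

Anchoring on the scheme (including the ://) keeps the update safe to run more than once over the same documents.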

How to use the elasticsearch regex query correctly?

I am working on translating a Splunk query to Elasticsearch DSL.
I want to check if a URL in the logs contains something like:
"script>" OR "UNION ALL SELECT"
Fair enough I thought, went to the doc, and:
{
"regexp": {
"http.url": "script>"
}
}
Elasticsearch (2.3) replies:
"root_cause": [
{
"reason": "failed to parse search source. unknown search element [regexp]",
"type": "search_parse_exception",
"line": 2,
Could someone enlighten me please about these kinds of queries?
This is a pretty straightforward mistake when starting out with the documentation. In the docs, we generally only show the raw query (and its parameters). Queries are either compound queries or leaf queries. regexp is an example of a leaf query.
However, that's not enough to actually send the query. You're missing a simple wrapper part of the DSL for any query:
{
"query": {
"regexp": {
"http.url": "script>"
}
}
}
To combine queries, the best approach is the bool compound query.
It has must, must_not, should, and filter clauses, and each accepts an array of queries (or filters, which are just scoreless, cacheable queries). should is the OR-like part, but do read the docs on how it behaves when you add must alongside it. The gist is that should by itself behaves exactly like an OR (as shown below), but if you combine it with must, the should clauses become completely optional unless you set "minimum_should_match": 1.
{
"query": {
"bool": {
"should": [
{
"term": {
"http.url": "script>"
}
},
{
"term": {
"http.url": "UNION ALL SELECT"
}
}
]
}
}
}
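Since the question asks whether the URL contains either string, a regexp-based variant of the same bool/should structure might look like the Python sketch below. It rests on several assumptions: that regexp patterns must match a whole term (hence the surrounding .*), that http.url is indexed as a single unanalyzed term, and that the reserved-character list matches the Elasticsearch regexp syntax documentation. The helper names are hypothetical.

```python
import json

# Characters treated as reserved in Lucene regular expressions. This list is
# an assumption taken from the Elasticsearch regexp syntax docs, including
# the optional operators such as < and >.
RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def escape_literal(text):
    """Backslash-escape reserved characters so the text matches literally."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


def contains_query(field, literal):
    """Build a regexp leaf query matching terms that contain the literal."""
    return {"regexp": {field: ".*" + escape_literal(literal) + ".*"}}


def any_of(queries):
    """Combine leaf queries with bool/should so that at least one must match."""
    return {"query": {"bool": {"should": queries, "minimum_should_match": 1}}}


body = any_of([
    contains_query("http.url", "script>"),
    contains_query("http.url", "UNION ALL SELECT"),
])
print(json.dumps(body, indent=2))
```

Patterns that start with .* tend to be expensive, so this is worth testing on a small index first.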

Extract multiple strings using Regex jmeter

I have the following JSON response.
{ "Customer1": { "details": { "acc": { "number": "91422915166" }, "phone": { "number": "98400915180" } }, "DateofBirth": "1979-04-03", "firstName": "Harry", "lastName": "Potter" } }
Jmeter script structure:
Thread group (Get customer details)
+Regular expression extractor
.....name: customer
.....expression:"number":(.+?)"DateofBirth":"(.+?)"
.....MatchNo: -1
I want to use an extractor expression that only extracts the Phone "number". My present code is extracting both acc "number" and phone "number". Can you please tell me what expression I need to use in order to get this working? Thank you
If I understand correctly, the first number in your response is the account number, which you don't want. If so,
Expression: "phone": { "number": "(\d+)" }
should help.
P.S.: Your expression also includes DateofBirth, but you only need the number. If you want to extract two variables with multiple occurrences, I have a tutorial on exactly that here: http://goo.gl/w3u1r
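A quick way to sanity-check such an expression outside JMeter is to run it against the sample response. The Python sketch below does that with a slightly more tolerant variant: the \s* and the escaped brace are my own additions so the pattern survives whitespace differences, and Python's `re` is not the engine JMeter uses, so treat this as an approximation.

```python
import re

# Sample response copied from the question.
response = (
    '{ "Customer1": { "details": { "acc": { "number": "91422915166" }, '
    '"phone": { "number": "98400915180" } }, "DateofBirth": "1979-04-03", '
    '"firstName": "Harry", "lastName": "Potter" } }'
)

# Anchor on the "phone" key so the account number is skipped.
phone_pattern = re.compile(r'"phone":\s*\{\s*"number":\s*"(\d+)"')

phone_numbers = phone_pattern.findall(response)
print(phone_numbers)
```

With MatchNo set to -1, each match found this way would correspond to one extracted variable in JMeter.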