Kibana Regular expression search - regex

I am a newbie to ELK. I want to search for docs based on the order of occurrence of words in a field. For example:
In doc1, my_field: "MY FOO WORD BAR EXAMPLE"
In doc2, my_field: "MY BAR WORD FOO EXAMPLE"
I would like to query in Kibana for docs where "FOO" is followed by "BAR" and not the opposite. So I would like doc1 to be returned in this case and not doc2.
I tried using the query below in the Kibana search bar, but it is not working; it doesn't produce any search results at all.
my_field.raw:/.*FOO.*BAR.*/
I also tried with the analyzed field (just my_field), though I came to know that it should not work. And of course, that didn't produce any results either.
Please help me with this regex search. Why am I not getting any matching results for that query?

I'm not sure offhand why that regex query isn't working, but I believe Kibana uses Elasticsearch's query string query documented here. For instance, you could do a phrase query (documented in the link) by putting your search in double quotes, and it would look for the word "foo" followed by "bar". This would perform better too, since you would run it against your analyzed field (my_field), where each word has been tokenized for fast lookups. So your search in Kibana would be:
my_field: "FOO BAR"
Update:
Looks like this is an annoying quirk of Kibana (probably for backwards compatibility reasons). Anyway, this isn't matching for you because you're searching against a non-analyzed field, and apparently Kibana by default lowercases the search, so it won't match the non-analyzed uppercase "FOO". You can configure this in the Kibana advanced settings mentioned here, specifically by setting the configuration option "lowercase_expanded_terms" to false.
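For reference, a hedged sketch of what that might look like in Kibana's Advanced Settings (the exact key, query:queryString:options, depends on your Kibana version):
query:queryString:options: { "analyze_wildcard": true, "lowercase_expanded_terms": false }
With that in place, the original case-sensitive regex against the non-analyzed field should keep its case:
my_field.raw:/.*FOO.*BAR.*/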

Kibana’s standard query language is based on Lucene query syntax.
And the default analyzer will tokenize the text into separate words: [MY, FOO, WORD, BAR, EXAMPLE]
Instead of using regex match, you can try the following search string in Kibana:
my_field: FOO AND my_field: BAR
And if your "my_field" data looks like "MYFOOWORDBAREXAMPLE", which cannot be tokenized, you should use the query string:
my_field: *FOO*BAR*
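If you want to verify how the text is tokenized, you can run the _analyze API directly against Elasticsearch (a minimal sketch, assuming the default standard analyzer):
GET /_analyze
{
  "analyzer": "standard",
  "text": "MY FOO WORD BAR EXAMPLE"
}
The response lists one token per word, which is why the two term searches above can match regardless of word order.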

GET /_search
{
  "query": {
    "regexp": {
      "user": {
        "value": "k.*y",
        "flags": "ALL",
        "max_determinized_states": 10000,
        "rewrite": "constant_score"
      }
    }
  }
}
More details here.

Related

What is the correct way to format regex metacharacters and options when using the regex operator in $searchBeta in MongoDB?

I'm trying to do full-text search in MongoDB with $searchBeta (aggregation) and I'm using the 'regex' operator to do so. Here's the portion of the $searchBeta I have that isn't working how I expected it would:
$searchBeta: {
  regex: {
    query: '\blightn', // '\b' is the word boundary metacharacter
    path: ["name", "set_name"],
    allowAnalyzedField: true
  }
}
Here's an example of two documents that I'm expecting to get matched by the expression:
{
  "name": "Lightning Bolt",
  "set_name": "Masters 25"
},
{
  "name": "Chain Lightning",
  "set_name": "Battlebond"
}
What I actually get:
[] //empty array
If I use an expression like:
$searchBeta: {
  regex: {
    query: '[a-zA-Z]',
    path: ["name", "set_name"],
    allowAnalyzedField: true
  }
}
then I get results back.
I can't get any expression that has regex metacharacters and/or options in it to work, so I'm pretty sure I'm just entering it wrong in my query string. The $searchBeta regex documentation doesn't really cover how to format metacharacters into your query string. Also, the $searchBeta regex operator is different from $regex because it doesn't require slashes (i.e. "/your expression/" ). Really pulling my hair out on something so simple that I can't figure out.
$searchBeta uses Lucene for regular expressions, which is not Perl Compatible (PCRE) and doesn't support \b. You can read about the Lucene regex syntax here and also Elastic's docs on it are also helpful.
Here is a similar question for ElasticSearch and includes some workarounds.
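As a concrete workaround (a hedged sketch, assuming the fields are analyzed so that each word becomes its own lowercased token), you can drop the \b and let the implicit anchoring of Lucene regexes do the word-boundary work for you:
$searchBeta: {
  regex: {
    query: 'lightn.*',          // matches any analyzed token starting with "lightn"
    path: ["name", "set_name"],
    allowAnalyzedField: true
  }
}
Since Lucene regexes are matched against whole tokens, 'lightn.*' matches any token that starts with "lightn", e.g. "lightning".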

Kibana Custom Filter ,How to create Regex to eliminate all terms with numeric values

I have a list of requests coming in based on free text searches or codes.
I would like to eliminate the code-like requests and only keep the natural language requests.
Therefore I need a query that can separate those terms.
Below is the query JSON I already tried:
{
  "query": {
    "regexp": {
      "q": "[^\d\W]"
    }
  }
}
The error I get is "Bad String" for the line "q": "[^\d\W]".
The expectation is to improve the regex so that only the relevant data is kept.
You may use
"regexp": {
"q": "[^0-9]+"}
}
The Lucene regex engine used in Kibana anchors all patterns by default, so [^0-9]+ will match any string, from start to end of which there are only characters other than digits.
Moreover, \d and \W and other shorthand character classes are not supported either.
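Put together as a complete request (a minimal sketch, assuming the field really is named q), that would be:
GET /_search
{
  "query": {
    "regexp": {
      "q": "[^0-9]+"
    }
  }
}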

Regex HTTP Response Body Message

I use JMeter for REST testing.
I have made a HTTP Request, and this is the response data:
{"id":11,"name":"value","password":null,"status":"ACTIVE","lastIp":"0.0.0.0","lastLogin":null,"addedDate":1429090984000}
I need just the ID (which is 11) in
{"id":11,....
I use the regex below:
([0-9].+?)
It works perfectly, but it will be a problem if my ID has more than 2 digits. Then I would need to change the regex to:
([0-9][0-9].+?)
Is there a dynamic regex for my problem? Thank you for your attention.
Regards,
Stefio
If you want any integer between {"id": and the next comma, use the following regular expression:
{"id":(\d+),
However, the smarter way of dealing with JSON data could be the JSON Path Extractor (available via JMeter Plugins); going forward, this option can be much easier to use against complex JSON.
See Using the XPath Extractor in JMeter guide (scroll down to "Parsing JSON") to learn more on syntax and use cases.
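If you stick with the regex approach, here is a hedged sketch of how the Regular Expression Extractor could be configured (the reference name "id" is just an example):
Reference Name: id
Regular Expression: {"id":(\d+),
Template: $1$
Match No.: 1
Default Value: NOT_FOUND
The extracted value is then available as ${id} in later samplers.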
I suggest using the following regular expression:
"id":([^,]*),
This will first find "id": and then look for anything that is not a comma until it finds a comma. Note the character grouping is only around the value of the ID.
This will work for ANY length ID.
Edit:
The same concept works for almost any JSON data, for example where the value is quoted:
"key":"([^"]*)"
That regular expression will extract the value from given key, as long as value is quoted and does not contain quotes. It first finds "key": and then matches anything that is not a quote until the next quote.
You can use the quantifier like this:
([0-9]{2,}.+?)
It will catch 2 or more digits, and then any symbol, 1 or more times. If you want to allow no other characters after the digits, use * instead of +:
([0-9]{2,}.*?)
Regex demo

Kibana query exact match

I would like to know how to query a field to exactly match a string.
I'm actually trying to query like this:
url : "http://www.domain_name.com"
Which returns all strings starting with http://www.domain_name.com.
I had a similar issue, and I found that ".raw" fixed it. In your example, try:
url.raw : "http://www.domain_name.com"
Or, for newer versions of ES (5.x, 6.x):
url.keyword : "http://www.domain_name.com"
Just giving more visibility to #dezhi's comment.
In newer versions of ES (5.x, 6.x), you should use `url.keyword` instead, as they have changed to the new keyword type.
Therefore, it would be:
url.keyword : "http://www.domain_name.com"
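For context, url.keyword works because Elasticsearch's default dynamic mapping indexes string fields both as analyzed text and as a keyword sub-field. A hedged sketch of what the generated mapping typically looks like:
"url": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}
Querying url matches individual analyzed tokens, while url.keyword matches the whole, untouched string.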
Exact value is not supported out of the box.
http://blogs.perl.org/users/mark_leighton_fisher/2012/01/stupid-lucene-tricks-exact-match-starts-with-ends-with.html
Out of the box, Lucene does not provide exact field matches, like matching "Acer Negundo Ab" and only "Acer Negundo Ab" (not also "Acer Negundo Ab IgG"). Neither does Lucene provide "Starts With" or "Ends With" functionality. Fortunately, there are workarounds.
"Cannot change the info of a user"
To search for an exact string, you need to wrap the string in double quotation marks. Without quotation marks, the search in the example would match any documents containing one of the following words: "Cannot" OR "change" OR "the" OR "info" OR "a" OR "user".
(Kibana v6.5)
As per your query, it seems fine.
For matching the exact string, the syntax is:
fieldname : string
And for matching a substring, use the wildcard (*). Syntax:
fieldname : *string*
Also, whatever query you applied becomes part of the query criteria of your particular output component, so I suggest you check whether any wildcard is being applied in your search.

ElasticSearch Regexp Filter

I'm having problems correctly expressing a regexp for the ElasticSearch Regexp Filter. I'm trying to match on anything in "info-for/media" in the url field e.g. http://mydomain.co.uk/info-for/media/press-release-1. To try and get the regex right I'm using match_all for now, but this will eventually be match_phrase with the user's query string.
POST to localhost:9200/_search
{
  "query" : {
    "match_all" : { },
    "filtered" : {
      "filter" : {
        "regexp": {
          "url": ".*info-for/media.*"
        }
      }
    }
  }
}
This returns 0 hits, but does parse correctly. .*info.* does get results containing the url, but unfortunately is too broad, e.g. matching any urls containing "information". As soon as I add the hyphen in "info-for" back in, I get 0 results again. No matter what combination of escape characters I try, I either get a parse exception, or no matches. Can anybody help explain what I'm doing wrong?
First, to the extent possible, try to never use regular expressions or wildcards that don't have a prefix. The way a search for .*foo.* is done, is that every single term in the index's dictionary is matched against the pattern, which in turn is constructed into an OR-query of the matching terms. This is O(n) in the number of unique terms in your corpus, with a subsequent search that is quite expensive as well.
This article has some more details about that: https://www.found.no/foundation/elasticsearch-from-the-bottom-up/
Secondly, your url is probably tokenized in a way that makes "info-for" and "media" separate terms in your index. Thus, there is no info-for/media-term in the dictionary for the regexp to match.
What you probably want to do is to index the path and the domain separately, with a path_hierarchy-tokenizer to generate the terms.
Here is an example that demonstrates how the tokens are generated: https://www.found.no/play/gist/ecf511d4102a806f350b#analysis
I.e. /foo/bar/baz generates the tokens /foo/bar/baz, /foo/bar, /foo and the domain foo.example.com is tokenized to foo.example.com, example.com, com
A search for anything in below /foo/bar could then be a simple term filter matching path:/foo/bar. That's a massively more performant filter, which can also be cached.
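A hedged sketch of what that setup could look like (index, field, and analyzer names are made up for illustration, and the mapping syntax assumes a recent ES version without mapping types):
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "path_analyzer": {
          "type": "custom",
          "tokenizer": "path_hierarchy"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "url_path": {
        "type": "text",
        "analyzer": "path_analyzer"
      }
    }
  }
}
With that in place, the filter becomes a plain term query on the generated path token:
{ "query": { "term": { "url_path": "/info-for/media" } } }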