regex breaks when I use a colon(:) - regex

I just started working with elastic search. By started working I mean I have to query an already running elastic database. Is there a good documentation of the regex they follow. I know about the one on their official site, but its not very helpful.
The more specific problem is that I want to query for lines of the sort:
10:02:37:623421|0098-TSOT {TRANSITION} {ID} {1619245525} {securityID} {} {fromStatus} {NOT_PRESENT} {toStatus} {WAITING}
or
01:01:36:832516|0058-CT {ADD} {0} {3137TTDR7} {23} {COM} {New} {0} {0} {52} {1}
and more of a similar structure. I don't want a generalized regex. If possible, could someone give me a regex expression for each of these that would run with elastic?
I noticed that it matches if the regexp matches with a substring too when I ran with:
query = {"query":
{"regexp":
{
"message": "[0-9]{2}"
}
},
"sort":
[
{"#timestamp":"asc"}
]
}
But it wont match anything if I use:
query = {"query":
{"regexp":
{
"message": "[0-9]{2}:.*"
}
},
"sort":
[
{"#timestamp":"asc"}
]
}
I want to write regex that are more specific and that are different for the two examples given near the top.

turns out my message is present in the tokenized form instead of the raw form, and : is one of the default delimiters of the tokenizer, in elastic. And as a reason, I can't use regexp query on the whole message because it matches it with each token individually.

Related

Find string in between in kibana elastic search with regex like in splunk

In splunk, we can filter out dynamic string in between two strings.
Say for example,
<TextileType>Shirt</TextileType>
<TextileType>Trousers</TextileType>
<TextileType>Shirt</TextileType>
<TextileType>Trousers</TextileType>
<TextileType>Shirt</TextileType>
The output I am expecting:
Shirt - 3
Trousers - 2
I am able to do this in splunk, easily.
Picture copied from Google (not exact one)
How can I achieve this in Kibana ?
Tried many ways, but not able to do any regex as per my need.
Note: Here's the example json query, in which I need to add regex. In this example, I am just trying to search for "Shirt" manually, which I am expecting to get dynamically.
{
"query": {
"match": {
"text": {
"query": "Shirt",
"type": "phrase"
}
}
}
}
Considering data is in the sample index, you can use a wildcard search:
GET /sample/_search
{
"query": {
"wildcard":{
"column2":"*Shirt*"
}
}
}
Notice how it only returns results containing keyword Shirt
If you are looking to clean the data, you'd need to run it through a logstash pipeline to strip the XML tags and leave you with the text.

How to write a regexp in elastisearch so that it gives me URLs with numbers?

I am trying to write a query in Kibana which works with Elastisearch Query DSL. The basic filter is as follows:
{
"query": {
"match": {
"path": {
"query": "/abc/",
"type": "phrase"
}
}
}
}
Now I need to write a query so that it gives me "path" which is of the form /abc/(0-9)/.
I tried the reference provided here but it does not make sense to me (I am not well versed with Elasticsearch):
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html
I would like to filter out results which are of the form path = /abc/12345/
This RegEx might help you to do so:
\x22query\x22:\s\x22(\/.*)\x22
It creates a target capturing group, where your desired output is and you might be able to call it using $1.
You may add additional boundaries to your pattern, if you wish, such as this RegEx:
\x22query\x22:\s\x22([\/a-z0-9]+)\x22

Elasticsearch Query on indexes whose name is matching a certain pattern

I have a couple of indexes in my Elasticsearch DB as follows
Index_2019_01
Index_2019_02
Index_2019_03
Index_2019_04
.
.
Index_2019_12
Suppose I want to search only on the first 3 Indexes.
I mean a regular expression like this:
select count(*) from Index_2019_0[1-3] where LanguageId="English"
What is the correct way to do that in Elasticsearch?
How can I query several indexes with certain names?
This can be achieved via multi-index search, which is a built-in capability of Elasticsearch. To achieve described behavior one should try a query like this:
POST /index_2019_01,index_2019_02/_search
{
"query": {
"match": {
"LanguageID": "English"
}
}
}
Or, using URI search:
curl 'http://<host>:<port>/index_2019_01,index_2019_02/_search?q=LanguageID:English'
More details are available here. Note that Elasticsearch requires index names to be lowercase.
Can I use a regex to specify index name pattern?
In short, no. It is possible to use index name in queries using a special "virtual" field _index but its use is limited. For instance, one cannot use a regexp against index name:
The _index is exposed as a virtual field — it is not added to the
Lucene index as a real field. This means that you can use the _index
field in a term or terms query (or any query that is rewritten to a
term query, such as the match, query_string or simple_query_string
query), but it does not support prefix, wildcard, regexp, or fuzzy
queries.
For instance, the query from above can be rewritten as:
POST /_search
{
"query": {
"bool": {
"must": [
{
"terms": {
"_index": [
"index_2019_01",
"index_2019_02"
]
}
},
{
"match": {
"LanguageID": "English"
}
}
]
}
}
}
Which employs a bool and a terms queries.
Hope that helps!
Why use POST when you are not adding any additional data to it.
I advise using GET for your case. Secondly, If the Index have similar names like in your case, you should be using an index pattern like in the query below,
GET /index_2019_*/_search
{
"query": {
"match": {
"LanguageID": "English"
}
}
}
OR in a URL
curl -XGET "http://<host>:<port>/index_2019_*/_search" -H 'Content-Type: application/json' -d'{"query": {"match":{"LanguageID": "English"}}}'
While searching for indices using a regex is not possible you might be able to use date math to take you a bit further.
You can look at the docs here
As an example, lets say you wish the last 3 months from those indices
that means that if we have
index_2019_01
index_2019_02
index_2019_03
index_2019_04
And today is 2019/04/20, we could use the following query to get 04,03 and 02
GET /<index-{now/M-0M{yyyy_MM}}>,<index-{now/M-1M{yyyy_MM}}>,<index-{now/M-2M{yyyy_MM}}>
I used M-0M for the first one so the query construction loop doesn't need a special case for the first index
Look at the docs regarding URL encoding this query and how to have literal braces in the index name, if a client is used the URL encoding is done for you (at least in the python client)

Regular Expression If 2nd parameter is Enrollment

I have below response
{
"id": "3452",
"enrollable_id": "3452",
"enrollable_type": "Enrollment"
}
{
"id": "3453",
"enrollable_id": "3453",
"enrollable_type": "Task"
}
{
"id": "3454",
"enrollable_id": "3454",
"enrollable_type": "Enrollment"
}
{
"id": "3455",
"enrollable_id": "3455",
"enrollable_type": "Task"
}
I would like to get id [3452 and 3454] only if enrollable_type= Enrollment. This is for jmeter regex extractor so it would be great if I can just use one liner regex to fetch 3452 and 3454.
The RegEx you are looking for is:
_id":\s*"([^"]+(?=[^\0}]+_type":\s*"E))
Try it online!
Explanation
_id":\s*" Finds the place where the enrollment_id is
[^"]+(?= Matches the ID if:
[^\0}]+_type":\s* Finds the place where enrollable_type is
"E Checks if the enrollable type begins with an uppercase E
) End if
( ) Captures the ID
It's important to note that this RegEx will match on valid people and capture the valid ID. This means you will need to get each match's capture rather than just getting each match.
Disclaimer
The above RegEx contains backslashes, which you will need to escape if using the RegEx as a string literal.
This is the RegEx with all necessary-to-escape characters escaped:
_id":\\s*"([^"]+(?=[^\\0}]+_type":\\s*"E))
It's usually a bad idea to parse structured data with just a regex, but if you're intent on going this route then here you go:
"(\d+)"\s*,\s*(?="enrollable_type":\s*"Enrollment")
This assumes that entrollable_type always follows enrollable_id and that everything is quoted consistently with a little allowance for variance in white space. You should be able to handle a little more variance if necessary, such as if you're unsure if can depend on keys or data being quoted (["']?). However, if you can depend on the order of the properties (such as if they type comes before id) then you should abandon using a regex.
Here's a sample working in JavaScript
const text = `{ "id": "3452", "enrollable_id": "3452", "enrollable_type": "Enrollment" } { "id": "3453", "enrollable_id": "3453", "enrollable_type": "Task" } { "id": "3454", "enrollable_id": "3454", "enrollable_type": "Enrollment" } { "id": "3455", "enrollable_id": "3455", "enrollable_type": "Task" }`;
const re = /"(\d+)"\s*,\s*(?="enrollable_type":\s*"Enrollment")/g;
var match;
while(match = re.exec(text)) {
console.log(match[1]);
}
Your response seems to be a JSON one (however it's malformed). If this is the case and it's really JSON - I would recommend going for JSON Extractor instead as regular expressions are fragile, sensitive to markup change, new lines, order of elements, etc. while JSON Extractor looks only into the content.
The relevant JSON Path query would be something like:
$..[?(#.enrollable_type == 'Enrollment')].enrollable_id
Demo:
More information: JMeter's JSON Path Extractor Plugin - Advanced Usage Scenarios
You can extract the data in 2 ways
Using Json Extractor.
To extract data using json extractor response data should follow json syntax rules,
To extract data use the following JSON path in json extractor
$..[?(#.enrollable_type=="Enrollment")].id
and use match no -1 as shown below
To extract data using regular expression extractor use the following regex
id": "(.+?)",\s*(.+?)\s*"enrollable_type": "Enrollment
template : $1$2$3$4$
Match no -1
as shown below
you can see the variables stored using debug sampler
More information
extract variables

Mongodb distinct query with contains query

I have a mongo collection User which contains data like:-
{
id : 1,
name : "gaurav",
skills : "C++ HTML CSS"
}
when I am searching for all users that have C++ skill in it with the following query I am getting correct results as expected
db.user.find({skills:{contains:"C++"}});
But when I am searching all the unique names from the user using the same condition I m not getting any desired result
db.user.distinct('name',{skills:{contains:"C++"}});
Can anyone help me with what I am doing wrong?
The "contains" is not a valid keyword for MongoDB queries. You need $regex which submits a general "regular expression" statement matching the pcre specifications:
db.user.distinct( "name", { "skills": { "$regex": "C\+\+" } })
If using JavaScript as you language then this is also safe:
db.user.distinct( "name", { "skills": /C\+\+/ })
To determine if the string "C++" occurred somewhere within the string value of the field being tested. The + character is reserved in "regex" operations and therefore you need to escape it with a \ char as the standard escaping mechanism.
On your data this is the result:
db.user.distinct( "name", { "skills": { "$regex": "C\+\+" } })
[ "gaurav" ]
Try to use REGEX like below query
db.user.distinct("name",{"skills":{"$regex":"C++.*"}})