How to split a JSON value in a log file using grok/regular expressions

I have a log file from which I need to extract the JSON content and parse it with the Logstash json filter. I wrote a grok pattern, but it is not working properly. Below is my log file.
2016-12-18 12:13:52.313 -08:00 [Information] 636176600323139749 1b2c4c40-3da6-46ff-b93f-0eb07a57f2a3 18 - API: GET https://aaa.com/o/v/S?$filter=uid eq '9'&$expand=org($filter=org eq '0')
{
"Id": "1b",
"App": "D",
"User": "",
"Machine": "DC",
"RequestIpAddress": "xx.xxx.xxx",
"RequestHeaders": {
"Cache-Control": "no-transform",
"Connection": "close",
"Accept": "application/json"
},
"RequestTimestamp": "2016-12-18T12:13:52.2609587-08:00",
"ResponseContentType": "application/json",
"ResponseContentBody": {
"#od","value":[
{
"uid":"","sId":"10,org":[
{
"startDate":"2015-02-27T08:00:00Z","Code":"0","emailId":"xx#gg.COM"
}
]
}
]
},
"ResponseStatusCode": 200,
"ResponseHeaders": {
"Content-Type": "application/json;"
},
"ResponseTimestamp": "2016-12-18T12:13:52.3119655-08:00"
}
My Grok pattern
grok {
  match => [ "message", "%{TIMESTAMP_ISO8601:exclude}%{GREEDYDATA:exclude1}(?<exclude2>[\s])(?<json_value>[\W\w]+)" ]
}

Assuming this is all one message (it's not multiline, or has been combined before now) and there's a space between the URI and the JSON, this grok pattern should work:
%{TIMESTAMP_ISO8601} %{NOTSPACE:timezone} \[%{WORD:severity}] %{WORD:field1} %{UUID:field2} %{NUMBER:field3} - API: %{WORD:verb} (?<field4>[^\{]*) %{GREEDYDATA:json}
It would have been nice to use %{URI}, but the string you have is not a valid URI (it contains unescaped spaces).
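If that pattern matches, the captured json field can then be handed to the json filter the question mentions. A minimal sketch of the combined Logstash filter block (the api_log target name is an assumption, not something from the original post):
filter {
  grok {
    match => [ "message", "%{TIMESTAMP_ISO8601} %{NOTSPACE:timezone} \[%{WORD:severity}] %{WORD:field1} %{UUID:field2} %{NUMBER:field3} - API: %{WORD:verb} (?<field4>[^\{]*) %{GREEDYDATA:json}" ]
  }
  json {
    source => "json"     # parse the captured JSON text
    target => "api_log"  # hypothetical field to hold the parsed object
  }
}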

Related

ElasticSearch wildcard not returning when value has special characters

I have an Elasticsearch service that fetches as you type into a text input, to then populate a table. The search works (returns filtered data) correctly for all alphanumeric values, but not for special characters (hyphens in particular). For example, for the country Timor-Leste, if I pass in Timor as the term I get the result, but as soon as I add the hyphen (Timor-) I get an empty array response.
const queryService = {
  search(tableName, field, term) {
    // If there is no search term, run the wildcard search with 20 values
    // for the smaller lists to be pre-populated, like "Gender"
    return `
    {
      "size": ${term ? 200 : 20},
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "tablename": "${tableName}"
              }
            },
            {
              "wildcard": {
                "${field}": {
                  "value": "${term ? `*${term.trim()}*` : '*'}",
                  "boost": 1.0,
                  "rewrite": "constant_score"
                }
              }
            }
          ]
        }
      }
    }
    `;
  },
};
Is there a way I can modify my wildcard request to allow hyphens? The other response I've seen on here suggested using "analyze_wildcard": true, which hasn't worked. I've also tried to manually escape by putting a \ before each hyphen with .replace.
It all boils down to Elasticsearch analyzers.
By default, all text fields will be run through the standard analyzer, e.g.:
GET _analyze
{
  "text": ["Timor-Leste"],
  "analyzer": "standard"
}
This will lowercase your input, strip any special chars, and produce the tokens:
["timor", "leste"]
If you'd like to forgo this default process, add a .keyword mapping:
PUT your-index
{
  "mappings": {
    "properties": {
      "country": {
        "type": "text",
        "fields": {          <---
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
Then reindex your docs, and when dynamically constructing the wildcard query with the newly created .keyword field, make sure the hyphen (and all other special chars) is properly escaped:
POST your-index/_search
{
  "query": {
    "wildcard": {
      "country.keyword": {
        "value": "*Timor\\-*"    <---
      }
    }
  }
}
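Applied back to the question's queryService, the clause can target the new .keyword subfield and escape the term before interpolating it. A minimal sketch (escapeWildcard is a hypothetical helper, not a library function):
// Backslash-escape wildcard specials (\, *, ?) plus the hyphen,
// mirroring the escaped query above. The doubled backslashes in the
// generated text survive JSON parsing as a single escaping backslash.
const escapeWildcard = (s) => s.replace(/[\\*?-]/g, '\\\\$&');

const wildcardClause = (field, term) => `
  {
    "wildcard": {
      "${field}.keyword": {
        "value": "${term ? `*${escapeWildcard(term.trim())}*` : '*'}",
        "boost": 1.0,
        "rewrite": "constant_score"
      }
    }
  }`;
Bear in mind that keyword fields are case-sensitive, so a lowercase term will no longer match Timor; depending on your Elasticsearch version you may want "case_insensitive": true on the wildcard clause (available since 7.10) or a lowercase normalizer on the keyword subfield.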

ElasticSearch regexp query of a path

So far I've used a query that would match paths and get aggregations of those paths:
{
  "query": {
    "terms": {
      "path.keyword": [
        "/api/v1.0/cc-dashboard/aggregated",
        "/api/v1.1/cc-dashboard/aggregated",
        "/api/v1.2/cc-dashboard/aggregated",
        "/api/v1.3/cc-dashboard/aggregated"
      ]
    }
  },
  "size": 0,
  "aggs": { ...
Since the only difference between the paths is the version number (which keeps changing) I thought about using Regexp query.
In a normal regex I would search for \/api\/v1\.\d\/cc-dashboard\/aggregated.
I know ElasticSearch uses different reserved characters for this and I've tried everything I know, but the search comes back without hits.
Any thoughts?
I think there are a couple of things to watch out for here. First, make sure that path.keyword is actually of type "keyword", or else you will have trouble matching, because you would be matching against individual tokens and Elasticsearch would split the path on /. Second, Elasticsearch's regexp syntax doesn't support the \d digit shorthand, but it does allow [0-9]. Third, to escape the . I had to use two backslashes: \\.
So all together now:
PUT /stackoverflow
{
  "mappings": {
    "properties": {
      "path.keyword": {
        "type": "keyword"
      }
    }
  }
}

POST /stackoverflow/_doc/1
{
  "path.keyword": "/api/v1.0/cc-dashboard/aggregated"
}

POST /stackoverflow/_doc/2
{
  "path.keyword": "/api/v1.1/cc-dashboard/aggregated"
}

POST /stackoverflow/_doc/3
{
  "path.keyword": "/api/not/cc-dashboard/aggregated"
}

GET /stackoverflow/_search
{
  "query": {
    "regexp": {
      "path.keyword": {
        "value": "/api/v1\\.[0-9]/cc-dashboard/aggregated"
      }
    }
  }
}

DELETE /stackoverflow
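Run against these three documents, the regexp search should return documents 1 and 2 but not document 3, which confirms that the escaped dot and the [0-9] class are doing their job.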

Get keys from JSON with regex in JMeter

I'm struggling with regex, trying to get the ids from this body.
But only the ids in the members list, not the id in the verified key. :)
To clarify, I'm using the Regular Expression Extractor in JMeter.
{
  "id": "9c40ffca-0f1a-4f93-b068-1f6332707d02",        //<--not this
  "me": {
    "id": "38a2b866-c8a9-424f-a5d4-93b379f080ce",      //<--not this
    "isMe": true,
    "user": {
      "verified": {
        "id": "257e30f4-d001-47b3-9e7f-5772e591970b"   //<--not this
      }
    }
  },
  "members": [
    {
      "id": "88a2b866-c8a9-424f-a5d4-93b379f780ce",    //<--this
      "isMe": true,
      "user": {
        "verified": {
          "id": "223e30f4-d001-47b3-9e7f-5772e781970b" //<--not this
        }
      }
    },
    {
      "id": "53cdc218-4784-4e55-a784-72e6a3ffa9bc",    //<--this
      "isMe": false,
      "user": {
        "unverified": {
          "verification": "XYZ"
        }
      }
    }
  ]
}
At the moment I have this regex:
("id": )("[\w-]+")
But as you can see, it returns every GUID.
Any ideas on how to go on?
Thanks in advance.
Since the input data type is JSON, it is recommended to use JMeter's JSON Path Extractor plugin.
Once you add it, use the
$.members[*].id
JSONPath expression to match the id values of the top-level members entries in the document.
If you may have nested members, you can get them all using
$..members[*].id
You may test these expressions at https://jsonpath.com/.
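For the body above, $.members[*].id should extract exactly the two wanted ids:
["88a2b866-c8a9-424f-a5d4-93b379f780ce", "53cdc218-4784-4e55-a784-72e6a3ffa9bc"]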

How to use String replace in a Velocity template (AWS AppSync + Elasticsearch)?

I'm writing an AppSync query to search for records by phone number in Elasticsearch (using a Velocity template).
The data stored in Elasticsearch has the form "0123456789", but the request may take the form "012-123-1234". So I intended to use the String replace function to remove the "-" character. However, my code returns the following error:
"message": "Lexical error, Encountered: \" _ \ "(95), after: \". \ "at * unset * [line 11, column 51]"
I am not sure if my writing is correct or not, please help.
This is my code:
{
  "version": "2017-02-28",
  "operation": "GET",
  "path": "/res/res/_search",
  "params": {
    "headers": {},
    "queryString": {},
    "body": {
      "from": $util.defaultIfNull($ctx.args.nextToken, 0),
      "size": $util.defaultIfNull($ctx.args.limit, 20),
      "query": {
        "match": { "phoneNumber": "$context.args.phoneNumber".replace('-', '') }
      }
    }
  }
}
Well, I found the error: the " character was in the wrong position.
"match": { "phoneNumber": "$context.args.phoneNumber".replace('-', '') }
=>
"match": { "phoneNumber": "$context.args.phoneNumber.replace('-', '')" }
With the closing quote moved, the whole of $context.args.phoneNumber.replace('-', '') is a single Velocity reference and gets evaluated; in the original version, .replace('-', '') sat outside the quoted reference, so Velocity tried to parse it as template syntax and threw the lexical error.

ElasticSearch and Regex queries

I am trying to query for documents that have dates within the body of the "content" field.
curl -XGET 'http://localhost:9200/index/_search' -d '{
  "query": {
    "regexp": {
      "content": "^(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.]((19|20)\\d\\d)$"
    }
  }
}'
Getting closer maybe?
curl -XGET 'http://localhost:9200/index/_search' -d '{
  "filtered": {
    "query": {
      "match_all": {}
    },
    "filter": {
      "regexp": {
        "content": "^(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.]((19|20)\\d\\d)$"
      }
    }
  }
}'
My regex seems to have been off. This regex has been validated on regex101.com. The following query still returns nothing from the 175k documents I have.
curl -XPOST 'http://localhost:9200/index/_search?pretty=true' -d '{
  "query": {
    "regexp": {
      "content": "/[0-9]{4}-[0-9]{2}-[0-9]{2}|[0-9]{2}-[0-9]{2}-[0-9]{4}|[0-9]{2}/[0-9]{2}/[0-9]{4}|[0-9]{4}/[0-9]{2}/[0-9]{2}/g"
    }
  }
}'
I am starting to think that my index might not be set up for such a query. What type of field do you have to use to be able to use regular expressions?
mappings: {
  doc: {
    properties: {
      content: { type: string },
      title: { type: string },
      host: { type: string },
      cache: { type: string },
      segment: { type: string },
      query: {
        properties: {
          match_all: { type: object }
        }
      },
      digest: { type: string },
      boost: { type: string },
      tstamp: { format: dateOptionalTime, type: date },
      url: { type: string },
      fields: { type: string },
      anchor: { type: string }
    }
  }
}
I want to find any record that has a date and graph the volume of documents by that date. Step 1. is to get this query working. Step 2. will be to pull the dates out and group them by them accordingly. Can someone suggest a way to get the first part working as I know the second part will be really tricky.
Thanks!
You should read Elasticsearch's Regexp Query documentation carefully, you are making some incorrect assumptions about how the regexp query works.
Probably the most important thing to understand here is what the string you are trying to match is. You are trying to match terms, not the entire string. If this is being indexed with StandardAnalyzer, as I would suspect, your dates will be separated into multiple terms:
"01/01/1901" becomes tokens "01", "01" and "1901"
"01 01 1901" becomes tokens "01", "01" and "1901"
"01-01-1901" becomes tokens "01", "01" and "1901"
"01.01.1901" actually will be a single token: "01.01.1901" (Due to decimal handling, see UAX #29)
You can only match a single, whole token with a regexp query.
Elasticsearch (and lucene) don't support full Perl-compatible regex syntax.
In your first couple of examples, you are using anchors, ^ and $. These are not supported. Your regex must match the entire token to get a match anyway, so anchors are not needed.
Shorthand character classes like \d (or \\d) are also not supported. Instead of \\d\\d, use [0-9]{2}.
In your last attempt, you are using /{regex}/g, which is also not supported. Since your regex needs to match the whole string, the global flag wouldn't even make sense in context. Unless you are using a query parser which uses them to denote a regex, your regex should not be wrapped in slashes.
(By the way: How did this one validate on regex101? You have a bunch of unescaped /s. It complains at me when I try it.)
To support this sort of query on such an analyzed field, you'll probably want to look to span queries, and particularly Span Multiterm and Span Near. Perhaps something like:
{
  "span_near": {
    "clauses": [
      { "span_multi": {
        "match": {
          "regexp": { "content": "0[1-9]|[12][0-9]|3[01]" }
        }
      }},
      { "span_multi": {
        "match": {
          "regexp": { "content": "0[1-9]|1[012]" }
        }
      }},
      { "span_multi": {
        "match": {
          "regexp": { "content": "(19|20)[0-9]{2}" }
        }
      }}
    ],
    "slop": 0,
    "in_order": true
  }
}
For newer Elasticsearch versions (tested on 8.5), we can use .keyword on the field; the regexp is then matched against the whole string rather than against individual tokens.
{
  "size": 10,
  "_source": [
    "load",
    "unload"
  ],
  "query": {
    "bool": {
      "should": [
        {
          "regexp": {
            "load.keyword": {
              "value": ".*Search Term.*",
              "flags": "ALL"
            }
          }
        }
      ]
    }
  }
}
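Note the leading and trailing .* in the value: Elasticsearch's regexp query is implicitly anchored to the entire term (here the whole keyword string), so without them only an exact full-string match would count as a hit.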