grok parse optional field pattern doesn't work - regex

I've got a log like this:
ERROR_MESSAGE:Invalid Credentials,THROTTLED_OUT_REASON:API_LIMIT_EXCEEDED
I'm trying to parse it with grok using grok debugger:
ERROR_MESSAGE:%{GREEDYDATA:errorMassage},THROTTLED_OUT_REASON:%{GREEDYDATA:throttledOutReason}
It works, but sometimes the log comes without THROTTLED_OUT_REASON field.
ERROR_MESSAGE:%{GREEDYDATA:errorMassage}
In that case I tried below code since THROTTLED_OUT_REASON is an optional field.
ERROR_MESSAGE:%{GREEDYDATA:errorMassage}(,THROTTLED_OUT_REASON:%{GREEDYDATA:throttledOutReason})?
So this should work for both cases. The given output for the log with optional field is:
{
"errorMassage": [
[
"Invalid Credentials,THROTTLED_OUT_REASON:API_LIMIT_EXCEEDED"
]
],
"throttledOutReason": [
[
null
]
]
}
But the expected output for the log with optional field:
{
"errorMassage": [
[
"Invalid Credentials"
]
],
"throttledOutReason": [
[
"API_LIMIT_EXCEEDED"
]
]
}
expected output for the log without optional field:
{
"errorMassage": [
[
"Invalid Credentials"
]
],
"throttledOutReason": [
[
null
]
]
}
Can anyone suggest a solution which gives correct output for both type of logs?

Since you use GREEDYDATA it "eats" as much as it can get in order to fill errormessage.
I do not know GROK enough to tell you what alternative defined patterns there are, but you should be able to use a custom pattern:
ERROR_MESSAGE:(?<errorMassage>.*?),THROTTLED_OUT_REASON:%{GREEDYDATA:throttledOutReason}

I got the answer using #Skeeve 's idea.
Here it is for anyone who would come up with a similar question:
I've used custom pattern in order to avoid excess eating of GREEDYDATA (for errorMessage field).
ERROR_MESSAGE:(?<errorMassage>([^,]*)?)(,THROTTLED_OUT_REASON:%{GREEDYDATA:throttledOutReason})?

Related

Logstash log time and date parsing

Hello I have below log
12-Apr-2021 16:11:41.078 WARNING [https-jsse-nio2-8443-exec-3] org.apache.catalina.realm.LockOutRealm.filterLockedAccounts An attempt was made to authenticate the locked user [user1]
I am trying to build a pattern for these for logstash.
I have following
%{MY_DATE_PATTERN:timestamp}\s%{WORD:severity}\s\[%{DATA:thread}\]\s%{NOTSPACE:type_log}
which parses below
{
"timestamp": [
"12-Apr-2021 16:01:01.505"
],
"severity": [
"FINE"
],
"thread": [
"https-jsse-nio2-8443-exec-8"
],
"type_log": [
"org.apache.catalina.realm.CombinedRealm.authenticate"
]
}
My Date stamp is a custom pattern it works with grok debugger but not with the system that i am using so i would need help to get date and time with regex. would anyone help me please?
12-Apr-2021 16:11:41.078 GROK REGEX for this
Instead of %{MY_DATE_PATTERN:timestamp}, you can use
(?<timestamp>%{MONTHDAY}-%{MONTH}-%{YEAR} %{HOUR}:%{MINUTE}:%{SECOND})
Legend:
%{MONTHDAY} - (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])
%{MONTH} - \b(?:Jan(?:uary|uar)?|Feb(?:ruary|ruar)?|M(?:a|รค)?r(?:ch|z)?|Apr(?:il)?|Ma(?:y|i)?|Jun(?:e|i)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|O(?:c|k)?t(?:ober)?|Nov(?:ember)?|De(?:c|z)(?:ember)?)\b
%{YEAR} - (?>\d\d){1,2}`
%{HOUR} - (?:2[0123]|[01]?[0-9])
%{MINUTE} - (?:[0-5][0-9])
%{SECOND} - (?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?).

Regex and config.json -file

I am building an Angular application and trying to figure out the way to write ngsw-config.json -file in order to define rules for service worker.
I assumed that regex would be recognized as regex in configuration file and not interpret as normal characters / text automatically, but it was not so. I have for example following piece of a code:
"name": "authentication",
"urls": [
"/login",
"/.*authentication.*"
],
part .* is not in my understanding recognized as regex (regex meaning in this case that any path that has text "authentication" would fall into this category, right?). This piece of a configuration tries to prevent service worker to take a lead in these two cases, it works with /login, but not with authentication part.
Question:
Can I somehow modify my file to make it recognize regex definitions?
According to the documentation at https://angular.io/guide/service-worker-config
you can use a limited glob format.
I don't know what kind of url you want to match.
Option: If you want to match a url like /foo/bar/authentication/foo2/bar2 you could use:
"name": "authentication",
"urls": [
"/login",
"/**/authentication/**/*"
],
Option: If you want to match a url like /foo/bar/something-authentication-otherthing/foo2/bar2 you could use:
"name": "authentication",
"urls": [
"/login",
"/**/*authentication*/**/*"
],

Regex In body of API test

I'm testing API with https://cloud.google.com/datastore/docs/reference/data/rest/v1/projects/lookup
The following brings a found result with data. I would like to use a regular expression with bring back all records with name having the number 100867. All my attempts result wit a missing result set.
i.e. change to "name": "/1000867.*/"
{
"keys": [
{
"path": [
{
"kind": "Job",
"name": "1000867:100071805:1"
}
]
}
]
}
The Google documentation for lookup key states that the name is a "string" and that
The name of the entity. A name matching regex __.*__ is reserved/read-only. A name must not be more than 1500 bytes when UTF-8 encoded. Cannot be "".
The regex part threw me off and the solution was to use runQuery!
Consider this closed.

Regex working with calculator but not with AWS lambda function

Trouble with regex and gather all data between [ and ].
Testing with the program: http://regexr.com/
String data
{
"Items": [
{
"UserID": "1487840267893246",
"Timestamp": 1487204364877,
},
{
"UserID": "1487840267893336",
"Timestamp": 1487204364888,
}
],
"Count": 2,
"ScannedCount": 3
}
The below (fired in AWS lambda) has the intention of pulling all chars between the [ and ] and outputting it. (\[[^]*\]) works with the regex calc above, but only returns "undefined" in Lambda. Why?
Items = data.match(/"(\[[^]*\])"/);
console.log(Items);
An alternative solution was to extract the data into an array as follows
userID = data.match(/"UserID":"([^"]+)"/g);
console.log(userID);
Try the dotall flag:
Items = data.match(/"(?s)\[.*\]");
And you didn't need those brackets.

Regex match if exist or ignore

i'm with a throble in regex
Input example:
/aaaa/admin.php?file=xpto.js&version=abcd123
/aaaa/admin.php
Output 1 -
url => /aaaa/admin.php
var => file=xpto.js&version=abcd123
Output 2 -
url => /aaaa/admin.php
i tried %{NOTSPACE:url}(?:/?%{NOTSPACE:var}) and a others but not worked
You may use
%{URIPATH:path}(?:%{URIPARAM:param})?
The patterns are provided at https://github.com/elastic/logstash/blob/v1.4.2/patterns/grok-patterns.
The %{URIPATH:path} will match the path, while (?:%{URIPARAM:param})? will match 1 or 0 occurrences (due to the optional non-capturing group (?:...)?) of the query string.
If you need to get rid of the ? in the param, you may also use
(?<path>(?:/[A-Za-z0-9$.+!*'(){},~:;=##%_-]*)+)(?:\?(?<param>[A-Za-z0-9$.+!*'|(){},~##%&/=:;_?\[\]-]*))?
The output for the /aaaa/admin.php?file=xpto.js&version=abcd123 input:
{
"path": [
[
"/aaaa/admin.php"
]
],
"param": [
[
"?file=xpto.js&version=abcd123"
]
]
}
The output for /aaaa/admin.php:
{
"path": [
[
"/aaaa/admin.php"
]
],
"param": [
[
null
]
]
}
Try this regex :
(%{NOTSPACE:url})(?:\?(%{NOTSPACE:var}))?
Demo : http://regexr.com/3f6sm
Is this what your looking for?
([^\s?]+)(?:\?(\S+))?
You can test it here.
Also, you could just split the url string on ?