Logstash log parsing with regex and grok

Hello, I have the logs below:
12-Apr-2021 16:11:41.078 WARNING [https-jsse-nio2-8443-exec-3] org.apache.catalina.realm.LockOutRealm.filterLockedAccounts An attempt was made to authenticate the locked user [uv19nb]
12-Apr-2021 16:01:01.505 FINE [https-jsse-nio2-8443-exec-8] org.apache.catalina.realm.CombinedRealm.authenticate Failed to authenticate user [uv19nb] with realm [org.apache.catalina.realm.JNDIRealm]
12-Apr-2021 17:12:45.289 FINE [https-jsse-nio2-8443-exec-5] org.apache.catalina.authenticator.FormAuthenticator.doAuthenticate Authentication of 'uv19nb' was successful
I am trying to build a Logstash pattern for these.
I have the following:
%{MY_DATE_PATTERN:timestamp}\s%{WORD:severity}\s\[%{DATA:thread}\]\s%{NOTSPACE:type_log}
which parses to the following:
{
"timestamp": [
"12-Apr-2021 16:01:01.505"
],
"severity": [
"FINE"
],
"thread": [
"https-jsse-nio2-8443-exec-8"
],
"type_log": [
"org.apache.catalina.realm.CombinedRealm.authenticate"
]
}
I would also like to split the remaining part of each log line into two fields: the leading message text and the user name. What would you advise?
An attempt was made to authenticate the locked user [uv19nb]
Failed to authenticate user [uv19nb] with realm [org.apache.catalina.realm.JNDIRealm]
Authentication of 'uv19nb' was successful
I have tried using (?<action>[^\[]*) and (?<action>[^']*) but they only capture if the next character is either [ or '.
I believe I need a regex/grok pattern that captures the whole sentence up to the first special character, and for the user name I need to extract the letters and digits from inside [] and ''.

Provided the MY_DATE_PATTERN works well for you, you can use
%{MY_DATE_PATTERN:timestamp}\s+%{WORD:severity}\s+\[%{DATA:thread}\]\s+%{NOTSPACE:type_log}\s+(?<action>\w(?:[\w\s]*\w)?)
I added \s+(?<action>\w(?:[\w\s]*\w)?):
\s+ - one or more whitespaces
(?<action>\w(?:[\w\s]*\w)?) - Group "action":
\w - a word char followed by
(?:[\w\s]*\w)? - an optional group: zero or more word and whitespace chars, then an obligatory word char (so the match never ends with whitespace).
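To see how the action group behaves on the three sample messages, here is a minimal Python sketch. It only covers the message part after type_log, and it uses Python's (?P<name>...) spelling for named groups. The USER pattern and the split_message helper are assumptions added for illustration, they are not part of the answer above.

```python
import re

# Tail of the grok pattern from the answer, rewritten for Python's re module.
ACTION = re.compile(r"(?P<action>\w(?:[\w\s]*\w)?)")

# Assumption (not from the answer): the user name is the first run of word
# characters wrapped in [...] or '...'.
USER = re.compile(r"[\['](?P<user>\w+)['\]]")


def split_message(message):
    """Return (action, user) for the message part of a log line."""
    action = ACTION.match(message)
    user = USER.search(message)
    return (
        action.group("action") if action else None,
        user.group("user") if user else None,
    )


messages = [
    "An attempt was made to authenticate the locked user [uv19nb]",
    "Failed to authenticate user [uv19nb] with realm [org.apache.catalina.realm.JNDIRealm]",
    "Authentication of 'uv19nb' was successful",
]
for message in messages:
    print(split_message(message))
```

The action match stops at the last word character before the first [ or ', so no trailing space ends up in the field.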

Related

Logstash log time and date parsing

Hello I have below log
12-Apr-2021 16:11:41.078 WARNING [https-jsse-nio2-8443-exec-3] org.apache.catalina.realm.LockOutRealm.filterLockedAccounts An attempt was made to authenticate the locked user [user1]
I am trying to build a Logstash pattern for it.
I have the following:
%{MY_DATE_PATTERN:timestamp}\s%{WORD:severity}\s\[%{DATA:thread}\]\s%{NOTSPACE:type_log}
which parses to the following:
{
"timestamp": [
"12-Apr-2021 16:01:01.505"
],
"severity": [
"FINE"
],
"thread": [
"https-jsse-nio2-8443-exec-8"
],
"type_log": [
"org.apache.catalina.realm.CombinedRealm.authenticate"
]
}
My timestamp is a custom pattern. It works in the grok debugger but not in the system I am using, so I need help matching the date and time with a regex. Could anyone help me, please?
I need a grok regex for this: 12-Apr-2021 16:11:41.078
Instead of %{MY_DATE_PATTERN:timestamp}, you can use
(?<timestamp>%{MONTHDAY}-%{MONTH}-%{YEAR} %{HOUR}:%{MINUTE}:%{SECOND})
Legend:
%{MONTHDAY} - (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])
%{MONTH} - \b(?:Jan(?:uary|uar)?|Feb(?:ruary|ruar)?|M(?:a|ä)?r(?:ch|z)?|Apr(?:il)?|Ma(?:y|i)?|Jun(?:e|i)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|O(?:c|k)?t(?:ober)?|Nov(?:ember)?|De(?:c|z)(?:ember)?)\b
%{YEAR} - (?>\d\d){1,2}
%{HOUR} - (?:2[0123]|[01]?[0-9])
%{MINUTE} - (?:[0-5][0-9])
%{SECOND} - (?:(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?)
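If it helps to check the timestamp format outside Logstash, here is a rough Python sketch. The TIMESTAMP regex is a simplified stand-in for the grok sub-patterns in the legend, not an exact translation, and the helper names are made up for this example. English month abbreviations are assumed, since %b depends on the locale.

```python
import re
from datetime import datetime

# Simplified approximation of
# %{MONTHDAY}-%{MONTH}-%{YEAR} %{HOUR}:%{MINUTE}:%{SECOND}
TIMESTAMP = re.compile(
    r"(?P<timestamp>\d{1,2}-[A-Za-z]{3}-\d{4} \d{1,2}:\d{2}:\d{2}(?:[.,]\d+)?)"
)


def extract_timestamp(line):
    """Return the leading timestamp text of a log line, or None."""
    match = TIMESTAMP.match(line)
    return match.group("timestamp") if match else None


def to_datetime(text):
    """Parse text such as '12-Apr-2021 16:11:41.078' into a datetime."""
    return datetime.strptime(text, "%d-%b-%Y %H:%M:%S.%f")


line = (
    "12-Apr-2021 16:11:41.078 WARNING [https-jsse-nio2-8443-exec-3] "
    "org.apache.catalina.realm.LockOutRealm.filterLockedAccounts "
    "An attempt was made to authenticate the locked user [user1]"
)
stamp = extract_timestamp(line)
print(stamp, to_datetime(stamp))
```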

Regex and config.json -file

I am building an Angular application and trying to figure out how to write the ngsw-config.json file in order to define rules for the service worker.
I assumed that a regex in the configuration file would be recognized as a regex and not interpreted as literal text, but that was not the case. For example, I have the following piece of configuration:
"name": "authentication",
"urls": [
"/login",
"/.*authentication.*"
],
As far as I can tell, the .* part is not recognized as a regex (meaning, in this case, that any path containing the text "authentication" would fall into this category, right?). This piece of configuration is meant to prevent the service worker from taking over in these two cases. It works for /login, but not for the authentication part.
Question:
Can I somehow modify my file to make it recognize regex definitions?
According to the documentation at https://angular.io/guide/service-worker-config
you can use a limited glob format.
I don't know what kind of url you want to match.
Option: If you want to match a url like /foo/bar/authentication/foo2/bar2 you could use:
"name": "authentication",
"urls": [
"/login",
"/**/authentication/**/*"
],
Option: If you want to match a url like /foo/bar/something-authentication-otherthing/foo2/bar2 you could use:
"name": "authentication",
"urls": [
"/login",
"/**/*authentication*/**/*"
],
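To get a feel for why the regex-style entry does not match while the glob entries do, here is a rough Python sketch of the glob rules as the documentation describes them: ** matches zero or more path segments and * matches zero or more characters other than /. This is only an approximation for experimenting, not Angular's actual implementation. The glob_to_regex name is made up, and the ? wildcard and the ! prefix are left out.

```python
import re


def glob_to_regex(glob):
    """Translate a limited glob into a compiled regex (approximation only)."""
    parts = glob.split("/")
    pieces = []
    for index, part in enumerate(parts):
        is_last = index == len(parts) - 1
        if part == "**":
            # Zero or more whole path segments.
            pieces.append(".*" if is_last else "(?:.+/)?")
        else:
            # Everything is literal except *, which stays inside one segment.
            literal = re.escape(part).replace(r"\*", "[^/]*")
            pieces.append(literal if is_last else literal + "/")
    return re.compile("".join(pieces))


def matches(glob, url):
    return glob_to_regex(glob).fullmatch(url) is not None


print(matches("/**/authentication/**/*", "/foo/bar/authentication/foo2/bar2"))
print(matches("/**/*authentication*/**/*",
              "/foo/bar/something-authentication-otherthing/foo2/bar2"))
# The dot in "/.*authentication.*" is treated as a literal dot:
print(matches("/.*authentication.*", "/foo-authentication-bar"))
```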

grok parse optional field pattern doesn't work

I've got a log like this:
ERROR_MESSAGE:Invalid Credentials,THROTTLED_OUT_REASON:API_LIMIT_EXCEEDED
I'm trying to parse it with grok using grok debugger:
ERROR_MESSAGE:%{GREEDYDATA:errorMassage},THROTTLED_OUT_REASON:%{GREEDYDATA:throttledOutReason}
It works, but sometimes the log comes without THROTTLED_OUT_REASON field.
ERROR_MESSAGE:%{GREEDYDATA:errorMassage}
In that case I tried the pattern below, since THROTTLED_OUT_REASON is an optional field.
ERROR_MESSAGE:%{GREEDYDATA:errorMassage}(,THROTTLED_OUT_REASON:%{GREEDYDATA:throttledOutReason})?
I expected this to work for both cases, but the actual output for the log with the optional field is:
{
"errorMassage": [
[
"Invalid Credentials,THROTTLED_OUT_REASON:API_LIMIT_EXCEEDED"
]
],
"throttledOutReason": [
[
null
]
]
}
But the expected output for the log with the optional field is:
{
"errorMassage": [
[
"Invalid Credentials"
]
],
"throttledOutReason": [
[
"API_LIMIT_EXCEEDED"
]
]
}
The expected output for the log without the optional field is:
{
"errorMassage": [
[
"Invalid Credentials"
]
],
"throttledOutReason": [
[
null
]
]
}
Can anyone suggest a solution which gives the correct output for both types of logs?
Since you use GREEDYDATA, it "eats" as much as it can get in order to fill errorMassage.
I do not know grok well enough to tell you which alternative predefined patterns there are, but you should be able to use a custom pattern:
ERROR_MESSAGE:(?<errorMassage>.*?),THROTTLED_OUT_REASON:%{GREEDYDATA:throttledOutReason}
I got the answer using @Skeeve's idea.
Here it is for anyone who comes up with a similar question:
I've used a custom pattern to stop GREEDYDATA from consuming too much (for the errorMassage field).
ERROR_MESSAGE:(?<errorMassage>([^,]*)?)(,THROTTLED_OUT_REASON:%{GREEDYDATA:throttledOutReason})?
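The difference between the two patterns can be reproduced with plain regexes. Below is a small Python sketch that assumes %{GREEDYDATA} expands to .* and uses Python's (?P<name>...) syntax for named groups. The parse helper is made up for this example.

```python
import re

# Original attempt: the greedy .* swallows the optional part as well.
GREEDY = re.compile(
    r"ERROR_MESSAGE:(?P<errorMassage>.*)"
    r"(,THROTTLED_OUT_REASON:(?P<throttledOutReason>.*))?"
)

# Accepted fix: the first field stops at the first comma.
FIXED = re.compile(
    r"ERROR_MESSAGE:(?P<errorMassage>[^,]*)"
    r"(,THROTTLED_OUT_REASON:(?P<throttledOutReason>.*))?"
)


def parse(pattern, line):
    """Return the named groups of the match as a dict."""
    return pattern.match(line).groupdict()


with_reason = "ERROR_MESSAGE:Invalid Credentials,THROTTLED_OUT_REASON:API_LIMIT_EXCEEDED"
without_reason = "ERROR_MESSAGE:Invalid Credentials"

print(parse(GREEDY, with_reason))
print(parse(FIXED, with_reason))
print(parse(FIXED, without_reason))
```

One caveat of the [^,]* approach: an error message that itself contains a comma would be cut off at that comma.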

Regex for json with negative lookahead

I have the following json array:
[
{
"roleId": 128,
"roleName": "B",
"Permissions": []
},
{
"roleId": 310,
"roleName": "ROLE",
"Permissions": [
{
"permissionId": 8074,
"isPermissionActive": true
},
{
"permissionId": 2271,
"isPermissionActive": true
},
{
"permissionId": 8075,
"isPermissionActive": true
},
{
"permissionId": 2275,
"isPermissionActive": true
}
]
},
{
"roleId": 201,
"roleName": "B",
"Permissions": []
},
{
"roleId": 5,
"roleName": "B",
"Permissions": []
}
]
I need to select the roleId of the JSON objects with non-null (i.e. non-empty) permissions. I've tried using a negative lookahead, but it doesn't seem to work.
Before selecting the roleId, I wanted to get a correct match on the JSON object, so I tried \{(\s*.*?(?!"Permissions":\s*\[\]))*\}, but it seems to match all the objects and even the inner objects of the Permissions array. How should I change the lookahead to match properly? I'm currently matching in Sublime Text 3.
Try the following Regex:
(?<="roleId": )\d*(?=(?:[^{]*"Permissions": \[(?!])))
Explanation:
The first part, (?<="roleId": )\d*, uses a positive lookbehind to find the roleId.
The second part, (?=(?:[^{]*"Permissions": \[(?!]))), uses a positive lookahead to keep only a roleId that is followed by a "Permissions" array, with a nested negative lookahead (?!]) to make sure that array is not empty.
See detailed explanation here
While this doesn't directly answer your question, I would like to point out that it might be more natural to process JSON with JavaScript or with one of many languages that can easily convert JSON to native types. Alternatively, you could use a JSON processing language such as jq:
jq '
map( select(.Permissions != []) | .roleId )
'
which produces an array of role IDs with non-empty permissions.
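The same filter can be sketched in Python with the standard json module. The sample below is a shortened version of the array from the question, and the function name is made up for this example.

```python
import json


def roles_with_permissions(text):
    """Return the roleId of every role whose Permissions list is non-empty."""
    roles = json.loads(text)
    return [role["roleId"] for role in roles if role["Permissions"] != []]


sample = """
[
  {"roleId": 128, "roleName": "B", "Permissions": []},
  {"roleId": 310, "roleName": "ROLE", "Permissions": [
    {"permissionId": 8074, "isPermissionActive": true},
    {"permissionId": 2271, "isPermissionActive": true}
  ]},
  {"roleId": 201, "roleName": "B", "Permissions": []},
  {"roleId": 5, "roleName": "B", "Permissions": []}
]
"""
print(roles_with_permissions(sample))
```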
A lookbehind to check if the number is a "roleId".
And a lookahead to check that the "Permissions" array opens with a curly brace ({), i.e. that it is not empty.
(?<="roleId": )\d+(?=[^\}]*?"Permissions":\s*\[\s*\{)
Test it here
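For a quick check outside an editor, the pattern also works unchanged in Python's re module, because the lookbehind has a fixed width. The text below is a shortened, pretty-printed excerpt of the array from the question.

```python
import re

ROLE_ID = re.compile(r'(?<="roleId": )\d+(?=[^\}]*?"Permissions":\s*\[\s*\{)')

text = """[
  {
    "roleId": 128,
    "roleName": "B",
    "Permissions": []
  },
  {
    "roleId": 310,
    "roleName": "ROLE",
    "Permissions": [
      {
        "permissionId": 8074,
        "isPermissionActive": true
      }
    ]
  }
]"""

# [^\}]* cannot cross the closing brace of a role object, so a role with an
# empty Permissions array never reaches the "[ {" of a later role.
print(ROLE_ID.findall(text))
```

Note that this relies on the exact spacing after "roleId": and on "Permissions" following "roleId" inside the same object.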

Regex for matching repeating k/v pairs plus trailing string in logstash

I need to write a bit of regex that is a bit over my head. The goal here is to parse the following type of log lines inside a logstash filter:
severity=I time=2017-02-23T10:04:31Z [SKYLIGHT] [0.5.1] Unable to start
severity=I time=2017-02-23T10:04:31Z adapter=redis adapter_host=1.1.1.1 Cache read: /model/reference/6235290d29a17a935f4d3d72d2e0a903750dd54b
severity=I time=2017-02-23T10:04:31Z remote_ip=1.1.1.1 uuid=daa8090d method=GET path=/somepath.json format=json controller=app action=index status=200 duration=30.47 view=10.04
severity=D time=2017-02-23T10:04:31Z remote_ip=1.1.1.1 uuid=daa8090d SOLR Request (18.3ms) [path=/admin/luke parameters={numTerms: 0}]
Essentially, the output format is a set of arbitrary k=v pairs, occasionally followed by a "raw message". Just using the logstash kv filter directly produces undesired behavior, since the trailing "message" can have k=v formats nested inside of it, such as path=/admin/luke in the final log line above. My working plan is to split each log line into two parts: the k/v pairs as a string, and the trailing message. The k/v string could then be sent into the normal logstash kv filter. So, for instance, the final log line would produce two groups:
severity=D time=2017-02-23T10:04:31Z remote_ip=1.1.1.1 uuid=daa8090d
SOLR Request (18.3ms) [path=/admin/luke parameters={numTerms: 0}]
With the end goal of the log document to be:
[
{
"severity": "I",
"time": "2017-02-23T10:04:31Z",
"message": "[SKYLIGHT] [0.5.1] Unable to start"
},
{
"severity": "I",
"time": "2017-02-23T10:04:31Z",
"adapter": "redis",
"adapter_host": "1.1.1.1",
"message": "Cache read: /model/reference/6235290d29a17a935f4d3d72d2e0a903750dd54b"
},
{
"severity": "I",
"time": "2017-02-23T10:04:31Z",
"message": "[SKYLIGHT] [0.5.1] Unable to start"
},
{
"severity": "I",
"time": "2017-02-23T10:04:31Z",
"remote_ip": "1.1.1.1",
"uuid": "daa8090d",
"method": "GET",
"path": "/somepath.json",
"format": "json",
"controller": "app",
"action": "index",
"status": "200",
"duration": "30.47",
"view": "10.04"
},
{
"severity": "D",
"time": "2017-02-23T10:04:31Z",
"remote_ip": "1.1.1.1",
"uuid": "daa8090d",
"message": "SOLR Request (18.3ms) [path=/admin/luke parameters={numTerms: 0}]"
}
]
Thank you!
For each row use the following regular expression:
(?:([^ =]+)=([^ =]+) ?)|(.+)
Explanation:
(?: - "External", non-capturing group (xxxx=yyyy).
([^ =]+) - First capturing group (xxxx).
= - Equals sign (between xxxx and yyyy).
([^ =]+) - Second capturing group (yyyy).
? - A space (may occur).
) - End of the "external" group.
| - Separator between variants.
(.+) - Second variant - third capturing group, any non-empty sequence of chars.
Note that the regex engine first tries the 1st variant (before the |),
capturing xxxx=yyyy pairs.
Then, once the 1st variant fails (after all xxxx=yyyy pairs),
the 2nd variant is tried, capturing the message (if any).
I tried this regex using an online verifier (regex101.com) for each of your input rows.
E.g. for the last row
(severity=D time=2017-02-23T10:04:31Z remote_ip=1.1.1.1 uuid=daa8090d SOLR Request (18.3ms) [path=/admin/luke parameters={numTerms: 0})
I got the following results:
Match 1
Full match 0-11 `severity=D `
Group 1. 0-8 `severity`
Group 2. 9-10 `D`
Match 2
Full match 11-37 `time=2017-02-23T10:04:31Z `
Group 1. 11-15 `time`
Group 2. 16-36 `2017-02-23T10:04:31Z`
Match 3
Full match 37-55 `remote_ip=1.1.1.1 `
Group 1. 37-46 `remote_ip`
Group 2. 47-54 `1.1.1.1`
Match 4
Full match 55-69 `uuid=daa8090d `
Group 1. 55-59 `uuid`
Group 2. 60-68 `daa8090d`
Match 5
Full match 69-133 `SOLR Request (18.3ms) [path=/admin/luke parameters={numTerms: 0}`
Group 3. 69-133 `SOLR Request (18.3ms) [path=/admin/luke parameters={numTerms: 0}`
Note that in case of matches No 1 to 4, groups 1 and 2 were found.
But for the last match, group 3 was found.
So, processing each match, you have to check:
If group 1 is not empty, then group 2 is also not empty
and they contain k and v.
Otherwise, group 3 holds the content of the message.
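The group check described above can be sketched in Python as follows. This runs outside Logstash, and the parse_line name and the "message" key are choices made for this example.

```python
import re

PAIR_OR_MESSAGE = re.compile(r"(?:([^ =]+)=([^ =]+) ?)|(.+)")


def parse_line(line):
    """Collect leading k=v pairs into a dict; the rest goes under 'message'."""
    doc = {}
    for match in PAIR_OR_MESSAGE.finditer(line):
        key, value, message = match.groups()
        if key is not None:
            # Groups 1 and 2 are set: a k=v pair.
            doc[key] = value
        else:
            # Group 3 is set: the trailing raw message.
            doc["message"] = message
    return doc


line = (
    "severity=D time=2017-02-23T10:04:31Z remote_ip=1.1.1.1 uuid=daa8090d "
    "SOLR Request (18.3ms) [path=/admin/luke parameters={numTerms: 0}]"
)
print(parse_line(line))
```

Because (.+) consumes everything up to the end of the line, the k=v text nested inside the message (path=/admin/luke) stays part of the message instead of becoming a separate field.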