Regex Derived Fields

I'm trying to write a regex for the log line below and use it in the derived fields of a Grafana Loki data source.
2022-03-03 11:59:18,008 ERROR XXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXX XXXXXXXXXXXXXXXXXX
So far I have covered these fields like this:
Date - ([0-9]{4}-[0-9]{2}-[0-9]{2})
Time - ([0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3})
LogLevel - (TRACE|DEBUG|INFO|NOTICE|WARN|WARNING|ERROR|SEVERE|FATAL)
Everything after the log level should become the message, but I can't manage to capture it.
I tried several negation-based regexes but could not get the part after ERROR out as a Message field.
I expect Message to contain everything after ERROR, i.e. XXXXXXXXXXXXXXXXXXXX XXXXXXXXXXXX XXXXXXXXXXXXXXXXXX
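If I remember correctly, each derived field in the Loki data source settings takes its own regex and uses its first capture group as the field value, so the Message field only needs a regex whose single capture group is the tail of the line. A sketch (untested against your exact Grafana version; the Date/Time/LogLevel regexes can stay as they are):
Message - (?:TRACE|DEBUG|INFO|NOTICE|WARN|WARNING|ERROR|SEVERE|FATAL)\s+(.*)
The level alternation sits in a non-capturing group, so the one capture group grabs everything after it.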

Related

Create a cakephp filter for fail2ban

I would like to create a fail2ban filter to search for and block bad requests like "Controller class * could not be found."
For this, I created a cakephp.conf file in fail2ban's filter.d directory with the following content:
[Definition]
failregex = ^[0-9]{4}\-[0-9]{2}\-[0-9]{2}.*Error:.*\nStack Trace:\n(\-.*|\n)*\n.*\n.*\nClient IP: <HOST>\n$
ignoreregex =
My example error log looks like this:
...
2020-10-08 19:59:46 Error: [Cake\Http\Exception\MissingControllerException] Controller class Webfig could not be found. in /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Controller/ControllerFactory.php on line 158
Stack Trace:
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Controller/ControllerFactory.php:46
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/BaseApplication.php:249
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:77
- /home/myapplication/htdocs/vendor/cakephp/authentication/src/Middleware/AuthenticationMiddleware.php:122
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:73
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:77
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Middleware/CsrfProtectionMiddleware.php:146
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:73
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:58
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Routing/Middleware/RoutingMiddleware.php:172
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:73
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Routing/Middleware/AssetMiddleware.php:68
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:73
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Error/Middleware/ErrorHandlerMiddleware.php:121
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:73
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:58
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Server.php:90
- /home/myapplication/htdocs/webroot/index.php:40
Request URL: /webfig/
Referer URL: http://X.X.X.X/webfig/
Client IP: X.X.X.X
...
(The X.X.X.X addresses are redacted.)
But I can't match any IP addresses. The fail2ban tester says:
root@test:~# fail2ban-regex /home/myapplication/htdocs/logs/error.log /etc/fail2ban/filter.d/cakephp.conf
Running tests
=============
Use failregex filter file : cakephp, basedir: /etc/fail2ban
Use log file : /home/myapplication/htdocs/logs/error.log
Use encoding : UTF-8
Results
=======
Failregex: 0 total
Ignoreregex: 0 total
Date template hits:
|- [# of hits] date format
| [719] {^LN-BEG}ExYear(?P<_sep>[-/.])Month(?P=_sep)Day(?:T| ?)24hour:Minute:Second(?:[.,]Microseconds)?(?:\s*Zone offset)?
`-
Lines: 15447 lines, 0 ignored, 0 matched, 15447 missed
[processed in 10.02 sec]
Missed line(s): too many to print. Use --print-all-missed to print all 15447 lines
I can't see the problem myself. Can you help me? :)
Thanks
The issue is that your log is poorly suited to parsing: it is a multiline log file (the IP is on a different line than the failure message).
Not only does the IP line lack any ID (shared information tying it to the failure line), it can get even worse if several messages interleave (so a Client IP from another, non-failure message could come right after the failure message).
If you can change the log format, you'd better do that (so that the date, IP, and failure indicator are on the same line); e.g. if you use nginx, you can set up conditional access logging for the PHP location in the error case.
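A rough sketch of what that conditional logging could look like in nginx (my illustration, not part of the original answer; it assumes the access_log if= parameter available since nginx 1.7.0, and the location and log path are placeholders):
# in the http context:
map $status $log_php_error {
    ~^[45]  1;
    default 0;
}
# inside the relevant server block:
location ~ \.php$ {
    # the combined format puts client IP, date and status on one line
    access_log /var/log/nginx/php-error-access.log combined if=$log_php_error;
    # ... usual PHP/fastcgi handling here
}
With that, only 4xx/5xx requests land in the extra log, and each entry carries the IP, date and failure status on a single line, which is easy for fail2ban to match.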
See Fail2ban :: wiki :: Best practice for more info.
If you cannot do that (though changing the format would still be better), you can use multi-line buffering and parsing with the maxlines parameter and the <SKIPLINES> regex.
Your filter would then look something like this:
[Definition]
# we ignore the stack trace, so the buffer window doesn't need to be very large;
# 5 would be enough, but to be safe (in case some log messages interleave):
maxlines = 10
ignoreregex = ^(?:Stack |- /)
failregex = ^\s+Error: \[[^\]]+\] Controller class \S+ could not be found\..*<SKIPLINES>^((?:Request|Referer) URL:.*<SKIPLINES>)*^Client IP: <HOST>
To test it directly use:
fail2ban-regex --maxlines=5 /path/to/log '^\s+Error: \[[^\]]+\] Controller class \S+ could not be found\..*<SKIPLINES>^((?:Request|Referer) URL:.*<SKIPLINES>)*^Client IP: <HOST>' '^(?:Stack |- /)'
But as already said, this is really ugly; it's better to find a way to log everything on a single line.

fluentd regexp to extract events from a log file

I'm new to fluentd.
I have a log that I want to push to AWS with fluentd but I can't figure out what the regexp should be.
All the log lines, except the multiline ones, start with a UUID.
Here's a sample log:
6b0815f2-8ff1-4181-a4e6-058148288281 2020-11-03 13:00:05.976366 [DEBUG] switch_core_state_machine.c:611 (some_other_data) State Change CS_REPORTING -> CS_DESTROY
I'm trying to extract the UUID, the datetime, and the message.
With this regex:
/^(?<UUID>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}) (?<time>.*) (?<message>[^ ]*)/gm
I only get the last word, CS_DESTROY, as the message.
I tried fluentular and still got:
text:
f6a6e1ae-e52e-4aba-a8a5-4e3cc7f40914 2020-11-03 14:32:34.975779 [CRIT] mod_dptools.c:1866 audio3: https://mydomain.s3-eu-west-1.amazonaws.com/media/576d06e5-04fc-11eb-a52c-020fd8c14d18/5f9ddf2d5df0f698094395.mpg
regexp:
^(?<UUID>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}) (?<time>.*) (?<message>[^ ]*)$
and got:
time 2020/11/03 14:32:34 +0000
UUID f6a6e1ae-e52e-4aba-a8a5-4e3cc7f40914
message https://mydomain.s3-eu-west-1.amazonaws.com/media/576d06e5-04fc-11eb-a52c-020fd8c14d18/5f9ddf2d5df0f698094395.mpg
The message is missing everything between the datetime and "https".
Try:
^(?<UUID>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}) (?<time>[^\[]*) (?<message>\[.*)$
Live at rubular: https://rubular.com/r/JQQXs5VTkr2IxM
Here's the output for both logs:
Match 1
UUID 6b0815f2-8ff1-4181-a4e6-058148288281
time 2020-11-03 13:00:05.976366
message [DEBUG] switch_core_state_machine.c:611 (some_other_data) State Change CS_REPORTING -> CS_DESTROY
Match 2
UUID f6a6e1ae-e52e-4aba-a8a5-4e3cc7f40914
time 2020-11-03 14:32:34.975779
message [CRIT] mod_dptools.c:1866 audio3: https://mydomain.s3-eu-west-1.amazonaws.com/media/576d06e5-04fc-11eb-a52c-020fd8c14d18/5f9ddf2d5df0f698094395.mpg
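For completeness, a sketch of how that expression could be plugged into a fluentd tail source (the path, pos_file, tag and time_format are assumptions to adapt to your setup):
<source>
  @type tail
  # placeholders - point these at your real log and position file
  path /path/to/app.log
  pos_file /var/log/td-agent/app.log.pos
  tag app.log
  <parse>
    @type regexp
    expression /^(?<UUID>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}) (?<time>[^\[]*) (?<message>\[.*)$/
    time_key time
    time_format %Y-%m-%d %H:%M:%S.%N
  </parse>
</source>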

JAVAMETHOD grok pattern with optional thread number at the end

I'm trying to parse log4j messages:
2019-12-02 20:48:20.198utc DEBUG UnknownElementContentHandler,streamLock-9-th-11:32 - blabla
2019-11-19 23:40:04.014utc WARN AnnotationBinder,localhost-startStop-1:611 - blabla
2019-11-19 23:40:04.014utc INFO CovImCtl,main:109 - blabla
with grok pattern
%{TIMESTAMP_ISO8601:timestamp}utc%{SPACE}%{LOGLEVEL:level}%{SPACE}%{JAVACLASS:class},%{JAVAMETHOD1:method}:%{POSINT:lineno}%{SPACE}-%{SPACE}%{GREEDYDATA:message}
using this variation on the standard JAVAMETHOD pattern:
JAVAMETHOD (?:(<(?:cl)?init>)|[a-zA-Z$_][a-zA-Z$_0-9]*)
JAVAMETHOD1 (?:(<(?:cl)?init>)|[a-zA-Z$_][a-zA-Z$_\-0-9]*)
The stock JAVAMETHOD worked for "main" but not for the others (the pattern doesn't allow -).
JAVAMETHOD1 works, but I also need the optional trailing integer extracted as a "thread_no" field (11 from streamLock-9-th-11, 1 from localhost-startStop-1).
I'm racking my brain: a method like streamLock-9-th-11 has an internal "-\d+" ("-9") that belongs to "streamLock-9-th" itself.
Any ideas?
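One possible approach (an untested sketch): replace the custom JAVAMETHOD1 with an inline lazy named capture and peel off an optional trailing -<digits> as thread_no, e.g.:
%{TIMESTAMP_ISO8601:timestamp}utc%{SPACE}%{LOGLEVEL:level}%{SPACE}%{JAVACLASS:class},(?<method>[a-zA-Z$_][a-zA-Z$_\-0-9]*?)(?:-(?<thread_no>\d+))?:%{POSINT:lineno}%{SPACE}-%{SPACE}%{GREEDYDATA:message}
The lazy *? lets the optional -(?<thread_no>\d+) group claim the last -11 / -1 before the colon (method becomes streamLock-9-th and localhost-startStop), while plain main simply skips the group; the <init>/<clinit> alternative from the stock JAVAMETHOD is dropped here for brevity.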

grok regex parsing not matching a log when specifying a group as optional, but not the last group

Example:
info: 2014-10-28T22:39:46.593Z - info: an error occurred while trying to handle command: PlaceMarketOrderCommand, xkkdAAGRIl. Error: Insufficient Cash #userId=5 #orderId=Y5545
pattern:
%{LOGLEVEL:stream_level}: %{TIMESTAMP_ISO8601:timestamp} - %{LOGLEVEL:log_level}: %{MESSAGE:message} (#userId=%{USER_ID:user_id})? (#orderId=%{ORDER_ID:order_id})?
extra patterns used:
USER_ID (\d+|None)
ORDER_ID .*
ORDER_ID_HASH \s*(#orderId=%{ORDER_ID:order_id})?
USER_ID_HASH \s*(#userId=%{USER_ID:user_id})?
MESSAGE (.*?)
This works fine.
Removing the optional last orderId also works:
info: 2014-10-28T22:39:46.593Z - info: an error occurred while trying to handle command: PlaceMarketOrderCommand, xkkdAAGRIl. Error: Insufficient Cash #userId=5
But if I keep the orderId and remove the userId, I get a "no match":
info: 2014-10-28T22:39:46.593Z - info: an error occurred while trying to handle command: PlaceMarketOrderCommand, xkkdAAGRIl. Error: Insufficient Cash #orderId=Y5545
The user_id group also ends with a ? to make it optional.
I'm testing with the Grok Debugger on Heroku.
Is this a bug (Logstash 1.4.2), or am I missing something in the regex? (More probable, but what?)
I looked at the regex library grok uses, and this syntax is supposed to work. It does work for the last group (orderId) but not for the one before it.
Thanks for the help!
You are forcing a literal space before each optional group, so when #userId is absent the pattern still demands both spaces (two in a row), which the line doesn't have. Make the spaces optional by adding ? after them:
%{LOGLEVEL:stream_level}: %{TIMESTAMP_ISO8601:timestamp} - %{LOGLEVEL:log_level}: %{MESSAGE:message} ?(#userId=%{USER_ID:user_id})? ?(#orderId=%{ORDER_ID:order_id})?
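A variant I would also try (my untested suggestion, not from the original answer): since MESSAGE is a lazy (.*?) and everything after it is optional, the pattern above can in principle succeed with an empty message, so folding the space into each optional group and anchoring the end is a bit more robust:
%{LOGLEVEL:stream_level}: %{TIMESTAMP_ISO8601:timestamp} - %{LOGLEVEL:log_level}: %{MESSAGE:message}(?: #userId=%{USER_ID:user_id})?(?: #orderId=%{ORDER_ID:order_id})?$
Here a space is only consumed when the corresponding field is actually present, and the $ forces message to expand up to the #userId/#orderId suffix.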

Issues filtering some events in splunk on an indexer

I'm working with Splunk version 5.0.1.
I want to filter the logs being indexed.
I've added these lines to the transforms.conf file:
[setparsing]
REGEX = log: myCompany|\[CRIT\]|\[ERR\]
DEST_KEY = queue
FORMAT = indexQueue
What I want is to index all log entries that contain one of these strings.
But for some reason, only log entries containing this string are indexed:
log: myCompany
while log entries with the string "[CRIT]" or "[ERR]" aren't indexed.
What am I missing? Is there something wrong with the regex? I checked many Perl examples, and that's how you write a regex for log: myCompany or [CRIT] or [ERR].
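For reference, a sketch of the usual "keep only matching events" setup from Splunk's routing-and-filtering documentation; the [setparsing] stanza mirrors the one above, the source path in props.conf is a placeholder, and the order matters (send everything to nullQueue first, then override the matching events to indexQueue):
props.conf:
[source::/path/to/your/log]
TRANSFORMS-set = setnull, setparsing

transforms.conf:
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[setparsing]
REGEX = log: myCompany|\[CRIT\]|\[ERR\]
DEST_KEY = queue
FORMAT = indexQueue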