Add additional field in fluentd - regex

I have a message like the one below:
{"log":"kubernetes.var.log.dev-2019-12-24.log\u0009{\"msg\":\"[2019-12-24 10:34:58] app.ERROR: [ApiExceptionHandler:onKernelException]: default not match= exception is Symfony\\\\Component\\\\HttpKernel\\\\Exception\\\\NotFoundHttpException [] []\"}\n","stream":"stdout","time":"2019-12-24T10:34:58.295814385Z"}
I want to split it into 4 parts:
file_name: kubernetes.var.log.dev-2019-12-24.log
time: 2019-12-24 10:34:58
message_type: app.ERROR
msg: the rest of the message
In the fluentd configuration, I set a regex like this:
<parse>
@type "regexp"
expression [(?<time>.+)] (?<kind>.*ERROR|.*INFO): (?<msg>.*)$
</parse>
but it does not work as expected; it shows a warning:
2019-12-27 02:34:30 +0000 [warn]: [fluentd-containers.log] pattern not match: "{\"log\":\"kubernetes.var.log.dev-2019-12-27.log\\u0009{\\\"msg\\\":\\\"[2019-12-27 02:34:27] security.INFO: Populated the TokenStorage with an anonymous Token. [] []\\\"}\\n\",\"stream\":\"stdout\",\"time\":\"2019-12-27T02:34:30.699454425Z\"}"
2019-12-27 02:34:30 +0000 [warn]: [fluentd-containers.log] pattern not match: "{\"log\":\"kubernetes.var.log.dev-2019-12-27.log\\u0009{\\\"msg\\\":\\\"[2019-12-27 02:34:27] app.INFO: [UserCtrl:Login]: request_data= {\\\\\\\"email\\\\\\\":\\\\\\\"tui#gmail.com\\\\\\\",\\\\\\\"password\\\\\\\":\\\\\\\"asfasfd\\\\\\\"} [] []\\\"}\\n\",\"stream\":\"stdout\",\"time\":\"2019-12-27T02:34:30.699458964Z\"}"
I think there is something wrong with the regex. How can I fix it?

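You need to escape [ and ]. Unescaped, [(?<time>.+)] is parsed as a character class rather than as literal brackets around a named group, so the pattern no longer means what you intended: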
expression \[(?<time>.+)\] (?<kind>.*ERROR|.*INFO): (?<msg>.*)$
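To check the escaped expression outside fluentd, here is a small Python sketch. It assumes the container's JSON wrapper and the inner {"msg": ...} object are decoded before the regex runs (in fluentd that would typically mean parsing the line as JSON and then applying the regexp to the log key, e.g. with a parser filter using key_name log; treat that pipeline as an assumption to verify), and it also pulls out file_name by splitting on the tab (\u0009). Python's re spells named groups (?P<name>...) instead of Ruby's (?<name>...); the function name split_record is made up for illustration.

```python
import json
import re

# Raw container line copied from the question (Docker json-file format).
RAW = r'{"log":"kubernetes.var.log.dev-2019-12-24.log\u0009{\"msg\":\"[2019-12-24 10:34:58] app.ERROR: [ApiExceptionHandler:onKernelException]: default not match= exception is Symfony\\\\Component\\\\HttpKernel\\\\Exception\\\\NotFoundHttpException [] []\"}\n","stream":"stdout","time":"2019-12-24T10:34:58.295814385Z"}'

# The answer's escaped regex, with (?<name>...) spelled (?P<name>...) for Python.
LINE_RE = re.compile(r"\[(?P<time>.+)\] (?P<kind>.*ERROR|.*INFO): (?P<msg>.*)$")


def split_record(raw):
    """Return file_name, time, message_type and msg for one container line."""
    log = json.loads(raw)["log"]                          # unwrap the Docker JSON
    file_name, payload = log.rstrip("\n").split("\t", 1)  # \u0009 decodes to a tab
    inner = json.loads(payload)["msg"]                    # unwrap {"msg": "..."}
    m = LINE_RE.match(inner)
    if m is None:
        return None
    return {
        "file_name": file_name,
        "time": m["time"],
        "message_type": m["kind"],
        "msg": m["msg"],
    }


print(split_record(RAW))
```

If the bracketed part after the level can itself contain "] ", a tighter time group such as [^\]]+ avoids relying on backtracking from the greedy .+.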

Related

Sending logs from fluentd to splunk

I am using log4j, so I have logs in different formats. I am able to send most of the logs from fluentd to Splunk using the multiline format below, but a few of them behave differently (the logs with a different date format).
<source>
@type tail
path /tmp/LOG_SPLUNK.*
pos_file /tmp/my-splunk.pos
path_key log_type
read_from_head true
tag "splunk.#log.mylogs"
format multiline
format_firstline /^\[/
format1 /\[(?<timestamp>[^ ]* [^ ]*)\] (?<level>[^ ]*) (?<message>.*)/
time_type string
time_key timestamp
time_format %Y-%m-%d %H:%M:%S,%N
keep_time_key true
</source>
Below are the log formats:
[2022-04-13 06:27:08,340] INFO Loading plugin from: /my/path (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
java.lang.NullPointerException
at java.util.Properties$LineReader.readLine(Properties.java:434)
at java.util.Properties.load0(Properties.java:353)
at java.util.Properties.load(Properties.java:341)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.versionFor(DelegatingClassLoader.java:380)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.versionFor(DelegatingClassLoader.java:385)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.getPluginDesc(DelegatingClassLoader.java:355)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanPluginPath(DelegatingClassLoader.java:328)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanUrlsAndAddPlugins(DelegatingClassLoader.java:261)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.registerPlugin(DelegatingClassLoader.java:253)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.initPluginLoader(DelegatingClassLoader.java:222)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.initLoaders(DelegatingClassLoader.java:199)
at org.apache.kafka.connect.runtime.isolation.Plugins.<init>(Plugins.java:60)
at org.apache.kafka.connect.cli.ConnectDistributed.startConnect(ConnectDistributed.java:91)
at org.apache.kafka.connect.cli.ConnectDistributed.main(ConnectDistributed.java:78)
[2022-04-13 06:27:09,520] INFO Registered loader: PluginClassLoader{pluginLocation=file:/my/path/} (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
Apr 13, 2022 6:27:17 AM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource will be ignored.
I am able to send all of the above formats to Splunk, but some behave differently. Is there a format that can handle all of them? If I got a "pattern not match" error I could add another format, but I don't get one.
Try this:
format1 /\[(?<timestamp>[^ ]* [^ ]*)\] (?<level>[^ ]*) (?<message>[\W\w]+)/
.* stops at a newline, while [\W\w]+ matches any character including newlines, so it will capture your whole stack trace in the message field.
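The difference between .* and [\W\w]+ can be seen with Python's re, assuming (as is the default in both Python and Ruby) that . does not match a newline in the engine applying the pattern. The event below is a shortened copy of the question's first multi-line entry, joined with newlines the way a multiline buffer would hold it.

```python
import re

# Shortened multi-line event from the question, as one string with newlines.
EVENT = (
    "[2022-04-13 06:27:08,340] INFO Loading plugin from: /my/path "
    "(org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)\n"
    "java.lang.NullPointerException\n"
    "at java.util.Properties$LineReader.readLine(Properties.java:434)"
)

HEAD = r"\[(?P<timestamp>[^ ]* [^ ]*)\] (?P<level>[^ ]*) "
DOT_RE = re.compile(HEAD + r"(?P<message>.*)")       # original format1
ANY_RE = re.compile(HEAD + r"(?P<message>[\W\w]+)")  # answer's version

dot_msg = DOT_RE.match(EVENT)["message"]
any_msg = ANY_RE.match(EVENT)["message"]
print(repr(dot_msg))  # stops before the first newline
print(repr(any_msg))  # includes the stack trace lines
```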

Create a cakephp filter for fail2ban

I would like to create a fail2ban filter that finds and blocks bad requests such as "Controller class * could not be found."
For this, I created a cakephp.conf file in fail2ban's filter.d directory. Its content:
[Definition]
failregex = ^[0-9]{4}\-[0-9]{2}\-[0-9]{2}.*Error:.*\nStack Trace:\n(\-.*|\n)*\n.*\n.*\nClient IP: <HOST>\n$
ignoreregex =
My example error log looks like this:
...
2020-10-08 19:59:46 Error: [Cake\Http\Exception\MissingControllerException] Controller class Webfig could not be found. in /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Controller/ControllerFactory.php on line 158
Stack Trace:
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Controller/ControllerFactory.php:46
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/BaseApplication.php:249
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:77
- /home/myapplication/htdocs/vendor/cakephp/authentication/src/Middleware/AuthenticationMiddleware.php:122
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:73
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:77
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Middleware/CsrfProtectionMiddleware.php:146
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:73
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:58
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Routing/Middleware/RoutingMiddleware.php:172
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:73
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Routing/Middleware/AssetMiddleware.php:68
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:73
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Error/Middleware/ErrorHandlerMiddleware.php:121
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:73
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Runner.php:58
- /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Http/Server.php:90
- /home/myapplication/htdocs/webroot/index.php:40
Request URL: /webfig/
Referer URL: http://X.X.X.X/webfig/
Client IP: X.X.X.X
...
(The X.X.X.X addresses are redacted.)
But I can't match any IP addresses. The fail2ban tester says:
root@test:~# fail2ban-regex /home/myapplication/htdocs/logs/error.log /etc/fail2ban/filter.d/cakephp.conf
Running tests
=============
Use failregex filter file : cakephp, basedir: /etc/fail2ban
Use log file : /home/myapplication/htdocs/logs/error.log
Use encoding : UTF-8
Results
=======
Failregex: 0 total
Ignoreregex: 0 total
Date template hits:
|- [# of hits] date format
| [719] {^LN-BEG}ExYear(?P<_sep>[-/.])Month(?P=_sep)Day(?:T| ?)24hour:Minute:Second(?:[.,]Microseconds)?(?:\s*Zone offset)?
`-
Lines: 15447 lines, 0 ignored, 0 matched, 15447 missed
[processed in 10.02 sec]
Missed line(s): too many to print. Use --print-all-missed to print all 15447 lines
I can't see the problem. Can you help me? :)
Thanks
The issue is that your log is poorly suited to parsing: it is a multi-line log file (the IP is on a different line from the failure message).
Besides, the line with the IP carries no ID (nothing in common with the failure line), and it can get even worse if several messages interleave, so that a Client IP line from another, non-failure message comes after the failure message.
If you can change the log format, do that (so the date, IP and failure marker are on the same line); e.g. if you use nginx, set up conditional access logging from the PHP location for the error case.
See Fail2ban :: wiki :: Best practice for more info.
If you cannot do that (though changing it would be better), you can use multi-line buffering and parsing with the maxlines parameter and the <SKIPLINES> regex.
Your filter would be something like this:
[Definition]
# we ignore stack trace, so don't need to hold buffer window too large,
# 5 would be enough, but to be sure (if some log-messages crossing):
maxlines = 10
ignoreregex = ^(?:Stack |- /)
failregex = ^\s+Error: \[[^\]]+\] Controller class \S+ could not be found\..*<SKIPLINES>^((?:Request|Referer) URL:.*<SKIPLINES>)*^Client IP: <HOST>
To test it directly, use:
fail2ban-regex --maxlines=5 /path/to/log '^\s+Error: \[[^\]]+\] Controller class \S+ could not be found\..*<SKIPLINES>^((?:Request|Referer) URL:.*<SKIPLINES>)*^Client IP: <HOST>' '^(?:Stack |- /)'
But as already said, it is really ugly; it is better to find a way to log everything on a single line.
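The interplay of maxlines, ignoreregex and <SKIPLINES> can be hard to picture. The Python sketch below is a rough, hypothetical approximation of the idea: a failure line opens a window of up to maxlines lines in which a Client IP line is accepted, and stack-trace lines matching the answer's ignoreregex are skipped. It is not fail2ban's own algorithm, and whether fail2ban counts ignored lines toward maxlines exactly like this is an assumption. 192.0.2.10 is a documentation address standing in for the redacted X.X.X.X, and failed_hosts is a made-up name.

```python
import re

# Hypothetical re-implementation for illustration only; not fail2ban code.
FAIL_RE = re.compile(r"Error: \[[^\]]+\] Controller class \S+ could not be found\.")
IGNORE_RE = re.compile(r"^(?:Stack |- /)")  # same pattern as the answer's ignoreregex
HOST_RE = re.compile(r"^Client IP: (?P<host>\S+)$")


def failed_hosts(lines, maxlines=10):
    """Collect the Client IP that follows a 'could not be found' error."""
    hosts = []
    remaining = 0  # lines still allowed between the failure and the IP
    for line in lines:
        if IGNORE_RE.match(line):
            continue  # stack-trace lines are skipped and not counted
        if FAIL_RE.search(line):
            remaining = maxlines
            continue
        if remaining:
            m = HOST_RE.match(line)
            if m:
                hosts.append(m["host"])
                remaining = 0
            else:
                remaining -= 1
    return hosts


SAMPLE = [
    r"2020-10-08 19:59:46 Error: [Cake\Http\Exception\MissingControllerException] "
    r"Controller class Webfig could not be found. in /home/myapplication/htdocs/vendor/cakephp/cakephp/src/Controller/ControllerFactory.php on line 158",
    "Stack Trace:",
    "- /home/myapplication/htdocs/webroot/index.php:40",
    "Request URL: /webfig/",
    "Referer URL: http://192.0.2.10/webfig/",
    "Client IP: 192.0.2.10",
]
print(failed_hosts(SAMPLE))
```

Like the real filter, this happily attributes a Client IP line from an interleaved, unrelated message to the failure, which is exactly the weakness described above.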

fluentd regexp to extract events from a log file

I'm new to fluentd.
I have a log that I want to push to AWS with fluentd but I can't figure out what the regexp should be.
All the log lines, except the continuation lines of multi-line entries, start with a UUID.
Here's a sample log:
6b0815f2-8ff1-4181-a4e6-058148288281 2020-11-03 13:00:05.976366 [DEBUG] switch_core_state_machine.c:611 (some_other_data) State Change CS_REPORTING -> CS_DESTROY
I'm trying to extract the UUID, DateTime, and Message.
With this regex:
/^(?<UUID>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}) (?<time>.*) (?<message>[^ ]*)/gm
I'm only getting the last word, CS_DESTROY, as the message.
I tried fluentular and still got:
text:
f6a6e1ae-e52e-4aba-a8a5-4e3cc7f40914 2020-11-03 14:32:34.975779 [CRIT] mod_dptools.c:1866 audio3: https://mydomain.s3-eu-west-1.amazonaws.com/media/576d06e5-04fc-11eb-a52c-020fd8c14d18/5f9ddf2d5df0f698094395.mpg
regexp:
^(?<UUID>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}) (?<time>.*) (?<message>[^ ]*)$
and got:
time 2020/11/03 14:32:34 +0000
UUID f6a6e1ae-e52e-4aba-a8a5-4e3cc7f40914
message https://mydomain.s3-eu-west-1.amazonaws.com/media/576d06e5-04fc-11eb-a52c-020fd8c14d18/5f9ddf2d5df0f698094395.mpg
It's missing what's between the datetime and "https".
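In your regex the greedy (?<time>.*) runs up to the last space in the line, so (?<message>[^ ]*) only gets the last word. Stop the time group at the [ that opens the log level instead: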
^(?<UUID>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}) (?<time>[^\[]*) (?<message>\[.*)$
Live at rubular: https://rubular.com/r/JQQXs5VTkr2IxM
Here's the output for both logs:
Match 1
UUID 6b0815f2-8ff1-4181-a4e6-058148288281
time 2020-11-03 13:00:05.976366
message [DEBUG] switch_core_state_machine.c:611 (some_other_data) State Change CS_REPORTING -> CS_DESTROY
Match 2
UUID f6a6e1ae-e52e-4aba-a8a5-4e3cc7f40914
time 2020-11-03 14:32:34.975779
message [CRIT] mod_dptools.c:1866 audio3: https://mydomain.s3-eu-west-1.amazonaws.com/media/576d06e5-04fc-11eb-a52c-020fd8c14d18/5f9ddf2d5df0f698094395.mpg
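Both regexes can be compared on the two sample lines with Python's re (named groups spelled (?P<name>...) instead of (?<name>...)). The original pattern reproduces the "last word only" result, and the answer's pattern gives the split shown above.

```python
import re

UUID = r"(?P<UUID>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
ORIGINAL_RE = re.compile(r"^" + UUID + r" (?P<time>.*) (?P<message>[^ ]*)$")
FIXED_RE = re.compile(r"^" + UUID + r" (?P<time>[^\[]*) (?P<message>\[.*)$")

LINES = [
    "6b0815f2-8ff1-4181-a4e6-058148288281 2020-11-03 13:00:05.976366 [DEBUG] "
    "switch_core_state_machine.c:611 (some_other_data) State Change CS_REPORTING -> CS_DESTROY",
    "f6a6e1ae-e52e-4aba-a8a5-4e3cc7f40914 2020-11-03 14:32:34.975779 [CRIT] "
    "mod_dptools.c:1866 audio3: https://mydomain.s3-eu-west-1.amazonaws.com/media/"
    "576d06e5-04fc-11eb-a52c-020fd8c14d18/5f9ddf2d5df0f698094395.mpg",
]

for line in LINES:
    # Greedy .* leaves only the last space-free word for "message".
    print("original:", ORIGINAL_RE.match(line).groupdict())
    # [^\[]* stops before the "[" of the log level.
    print("fixed:   ", FIXED_RE.match(line).groupdict())
```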

Grok Filter for Confluence Logs

I am trying to write a Grok expression to parse Confluence logs and I am partially successful.
My current Grok pattern is:
%{TIMESTAMP_ISO8601:conflog_timestamp} %{LOGLEVEL:conflog_severity} \[%{APPNAME:conflog_ModuleName}\] \[%{DATA:conflog_classname}\] (?<conflog_message>(.|\r|\n)*)
APPNAME [a-zA-Z0-9\.\#\-\+_%\:]+
I am able to parse the log line below:
Log line 1:
2020-06-14 10:44:01,575 INFO [Caesium-1-1] [directory.ldap.cache.AbstractCacheRefresher] synchroniseAllGroupAttributes finished group attribute sync with 0 failures in [ 2030ms ]
However, I also have other log lines such as:
Log line 2:
2020-06-15 09:24:32,068 WARN [https-jsse-nio2-8443-exec-13] [atlassian.confluence.pages.DefaultAttachmentManager] getAttachmentData Could not find data for attachment:
-- referer: https://confluence.jira.com/index.action | url: /download/attachments/393217/global.logo | traceId: 2a0bfc77cad7c107 | userName: abcd
and log line 3:
2020-06-12 01:19:03,034 WARN [https-jsse-nio2-8443-exec-6] [atlassian.seraph.auth.DefaultAuthenticator] login login : 'ABC' tried to login but they do not have USE permission or weren't found. Deleting remember me cookie.
-- referer: https://confluence.jira.com/login.action?os_destination=%2Findex.action&permissionViolation=true | url: /dologin.action | traceId: 8744d267e1e6fcc9
Here the params "userName", "referer", "url" and "traceId" may or may not be present in the log line.
I could write a separate grok expression for each of these. Can we instead handle them all in a single grok expression?
In short, match all log lines:
If the log line has a "referer" param, store it in a variable. If not, proceed to match the rest of the params.
If the log line has a "url" param, store it; if not, try to match the rest of the params.
Repeat for "traceId" and "userName".
Thank you..
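The usual regex way to say "capture it if it is there" is to wrap each optional field in a non-capturing group followed by ?. Grok accepts raw named groups, as the (?<conflog_message>...) in the pattern above shows, so the same structure should carry over. The Python sketch below only illustrates that structure: the grok pieces (TIMESTAMP_ISO8601, LOGLEVEL, APPNAME, DATA) are approximated by plain regex, and the field order referer, url, traceId, userName is assumed from the samples.

```python
import re

# Fixed head of every line; the grok patterns are approximated by plain regex.
HEAD = (
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<severity>[A-Z]+) \[(?P<module>[^\]]+)\] \[(?P<classname>[^\]]+)\] "
    r"(?P<message>[^\n]*)"
)
# Optional "-- key: value | key: value" line; every field may be missing.
TAIL = (
    r"(?:\n-- "
    r"(?:referer: (?P<referer>\S+))?"
    r"(?:(?: \| )?url: (?P<url>\S+))?"
    r"(?:(?: \| )?traceId: (?P<traceId>\S+))?"
    r"(?:(?: \| )?userName: (?P<userName>\S+))?"
    r")?"
)
CONFLUENCE_RE = re.compile(HEAD + TAIL + r"\s*$")

LINE1 = ("2020-06-14 10:44:01,575 INFO [Caesium-1-1] [directory.ldap.cache.AbstractCacheRefresher] "
         "synchroniseAllGroupAttributes finished group attribute sync with 0 failures in [ 2030ms ]")
LINE2 = ("2020-06-15 09:24:32,068 WARN [https-jsse-nio2-8443-exec-13] "
         "[atlassian.confluence.pages.DefaultAttachmentManager] getAttachmentData Could not find data for attachment:\n"
         "-- referer: https://confluence.jira.com/index.action | url: /download/attachments/393217/global.logo"
         " | traceId: 2a0bfc77cad7c107 | userName: abcd")
LINE3 = ("2020-06-12 01:19:03,034 WARN [https-jsse-nio2-8443-exec-6] [atlassian.seraph.auth.DefaultAuthenticator] "
         "login login : 'ABC' tried to login but they do not have USE permission or weren't found. "
         "Deleting remember me cookie.\n"
         "-- referer: https://confluence.jira.com/login.action?os_destination=%2Findex.action&permissionViolation=true"
         " | url: /dologin.action | traceId: 8744d267e1e6fcc9")

for line in (LINE1, LINE2, LINE3):
    # Missing fields come back as None instead of making the match fail.
    print(CONFLUENCE_RE.match(line).groupdict())
```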

Fluentd Regular Expression Matching Error

I am trying to parse Kubernetes logs like this one:
2018-08-14 13:21:20.013 [INFO][67] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}
And this is the configuration:
<source>
@id calico-node.log
@type tail
format /^(?<time>[^ ]* [^ ,]*)[^\[]*\[[^\]]*\]\[(?<severity>[^ \]]*) *\] (?<message>.*)$/
time_format %Y-%m-%d %H:%M:%S
path /var/log/containers/calico-node**.log
pos_file /var/log/es-calico.pos
tag calico-node
</source>
According to regex101.com, this pattern should match this string. However, fluentd gives me an error while trying to parse it:
2018-08-14 13:21:20 +0000 [warn]: [calico-node.log] "{\"log\":\"2018-08-14 13:21:20.013 [INFO][67] health.go 150: Overall health summary=\\u0026health.HealthReport{Live:true, Ready:true}\\n\",\"stream\":\"stdout\",\"time\":\"2018-08-14T13:21:20.013908223Z\"}" error="invalid time format: value = {\"log\":\"2018-08-14 13:21:20.013, error_class = ArgumentError, error = string doesn't match"
What could be wrong? I have had similar errors with the built-in parser for Apache logs as well.
From what I can see, you are missing something in the fluentd config.
Your time_format %Y-%m-%d %H:%M:%S will not work with the timestamp 2018-08-14 13:21:20.013, as it's missing .%3N.
It should be as follows:
time_format %Y-%m-%d %H:%M:%S.%3N or time_format %Y-%m-%d %H:%M:%S.%L
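The underlying point, that a format without a fractional-seconds directive fails on ".013", can be reproduced with Python's datetime.strptime. This is only an analogy: Python spells fractional seconds %f, which is not a fluentd directive, so it says nothing about which of %3N, %N or %L fluentd accepts. try_parse is a made-up helper name.

```python
from datetime import datetime

STAMP = "2018-08-14 13:21:20.013"  # timestamp from the question's log line


def try_parse(value, fmt):
    """Return a datetime, or None when the format does not fit the value."""
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        return None


no_fraction = try_parse(STAMP, "%Y-%m-%d %H:%M:%S")       # ".013" is left over
with_fraction = try_parse(STAMP, "%Y-%m-%d %H:%M:%S.%f")  # %f accepts 1-6 digits
print(no_fraction, with_fraction)
```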
Just faced a similar issue.
I think @Crou's answer is correct, but maybe try %N instead.
According to the documentation, the fluentd parser does not support %3N, %6N, %9N, and %L:
https://docs.fluentd.org/configuration/parse-section
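The error text also hints at a second problem: the value fluentd tried to parse as a time is {\"log\":\"2018-08-14 13:21:20.013, which suggests the format regex ran against the whole Docker JSON line rather than the log field. This Python sketch applies the question's regex both ways to show that. How to decode the JSON first inside fluentd (for example a json parser followed by a parser filter on the log key) is an assumption to check against your setup. On this sample the severity group also captures 67, the second bracket, rather than INFO.

```python
import json
import re

# Raw Docker json-file line, unescaped from the warning in the question.
RAW = r'{"log":"2018-08-14 13:21:20.013 [INFO][67] health.go 150: Overall health summary=\u0026health.HealthReport{Live:true, Ready:true}\n","stream":"stdout","time":"2018-08-14T13:21:20.013908223Z"}'

# The question's format regex, with (?<name>...) spelled (?P<name>...) for Python.
CALICO_RE = re.compile(
    r"^(?P<time>[^ ]* [^ ,]*)[^\[]*\[[^\]]*\]\[(?P<severity>[^ \]]*) *\] (?P<message>.*)$"
)

on_raw = CALICO_RE.match(RAW)                     # regex sees the JSON wrapper
on_log = CALICO_RE.match(json.loads(RAW)["log"])  # regex sees only the log text
print(on_raw["time"])  # {"log":"2018-08-14 13:21:20.013
print(on_log["time"])  # 2018-08-14 13:21:20.013
```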