FluentD datetime format doesn't match - regex

This is my FluentD parser config:
<filter format.3.**>
#type parser
format /^\[(?<module>[^\]]+)\] (?<time>.+): (?<msg>.*)$/
time_format %Y-%m-%d %H:%M:%S
key_name log
keep_time_key true
reserve_data false
</filter>
And that's an example log line:
[Macaron] 2017-04-26 16:54:26: Started GET / for 172.20.0.0
In the FluentD error log I'm getting:
2017-04-27 12:01:58 +0000 [warn]: plugin/filter_parser.rb:69:rescue in block in filter_stream: invalid time format: value = 2017-04-27 12:01:58, error_class = ArgumentError, error = invalid strptime format - `%Y-%m-%d %H:%M:%S'
The important part is this:
invalid time format: value = 2017-04-27 12:01:58, error_class = ArgumentError, error = invalid strptime format - `%Y-%m-%d %H:%M:%S'
But I can't see how %Y-%m-%d %H:%M:%S doesn't match 2017-04-27 12:01:58, or why this format would be invalid.
according to this tool it should match

I figured it out, there were some special characters to set colors in the log. Apparently when copy-pasting them into fluentular.herokuapp.com they did not get copied and that's why it worked there:
{"log":"[Macaron] \u001b[1;32m2017-04-27 12:34:07: Completed /node 200 OK in 1.163351ms\u001b[0m\n","stream":"stdout","time":"2017-04-27T12:34:07.953993591Z"}

Related

Sending logs from fluentd to splunk

I am using log4j , so have different formats of logs. I am able to send most of the logs using the below multiline format from fluentd to splunk, but few of them behave differently(The logs with different date format).
<source>
#type tail
path /tmp/LOG_SPLUNK.*
pos_file /tmp/my-splunk.pos
path_key log_type
read_from_head true
tag "splunk.#log.mylogs"
format multiline
format_firstline /^\[/
format1 /\[(?<timestamp>[^ ]* [^ ]*)\] (?<level>[^ ]*) (?<message>.*)/
time_type string
time_key timestamp
time_format %Y-%m-%d %H:%M:%S,%N
keep_time_key true
</source>
Below are logs formats:
[2022-04-13 06:27:08,340] INFO Loading plugin from: /my/path (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
java.lang.NullPointerException
at java.util.Properties$LineReader.readLine(Properties.java:434)
at java.util.Properties.load0(Properties.java:353)
at java.util.Properties.load(Properties.java:341)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.versionFor(DelegatingClassLoader.java:380)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.versionFor(DelegatingClassLoader.java:385)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.getPluginDesc(DelegatingClassLoader.java:355)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanPluginPath(DelegatingClassLoader.java:328)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanUrlsAndAddPlugins(DelegatingClassLoader.java:261)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.registerPlugin(DelegatingClassLoader.java:253)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.initPluginLoader(DelegatingClassLoader.java:222)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.initLoaders(DelegatingClassLoader.java:199)
at org.apache.kafka.connect.runtime.isolation.Plugins.<init>(Plugins.java:60)
at org.apache.kafka.connect.cli.ConnectDistributed.startConnect(ConnectDistributed.java:91)
at org.apache.kafka.connect.cli.ConnectDistributed.main(ConnectDistributed.java:78)
[2022-04-13 06:27:09,520] INFO Registered loader: PluginClassLoader{pluginLocation=file:/my/path/} (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
Apr 13, 2022 6:27:17 AM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource will be ignored.
I am able to send all the above formats to splunk, but some behave differently. Is there any format using which i will be able to handle all. If i got a pattern not match error i could have included a format, but I don't
Try this.
[(?[^ ]* [^ ])] (?[^ ]) (?[\W\w]+)
.* stops at a new line . [\W\w]+ will capture your whole stack trace in the message field.

Google Fluentd Decode Base64

I have log file which have record between two tags RecordStart and RecordEnd the recorded message is base64 encoded I want to decode the message using google-fluentd and so it can send to other services.
My Config:
<source>
#type tail
path <path_ot>/metrics.log
pos_file /var/lib/google-fluentd/pos/metrics.pos
read_from_head true
format multiline
multiline_flush_interval 2s
format_firstline /^RecordStart/
format1 /^RecordStart\n(?<record>(\n|.)*)RecordEnd$/
tag presto_server
</source>
I am not able to figure out how to decode base64 Any help ?
Try using the filter plugin to decode base64 files.
Your config file in this case may look like this:
<source>
#type tail
path <path_ot>/metrics.log
pos_file /var/lib/google-fluentd/pos/metrics.pos
read_from_head true
format multiline
multiline_flush_interval 2s
format_firstline /^RecordStart/
format1 /^RecordStart\n(?<record>(\n|.)*)RecordEnd$/
tag presto_server
</source>
<filter presto_server>
type base64_decode
fields mesg
</filter>
This is an adaptation of the config file I found here.
You may also find this documentation helpful: HYow to modify log records ingested by fluentd.

fluentd regexp to extract events from a log file

I'm new to fluentd.
I have a log that I want to push to AWS with fluentd but I can't figure out what the regexp should be.
All the log lines, except the multilines, start with a UUID.
Here's a sample log:
6b0815f2-8ff1-4181-a4e6-058148288281 2020-11-03 13:00:05.976366 [DEBUG] switch_core_state_machine.c:611 (some_other_data) State Change CS_REPORTING -> CS_DESTROY
And, I'm trying to get UUID, DateTime, and Message.
With this regex:
/^(?<UUID>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}) (?<time>.*) (?<message>[^ ]*)/gm
I'm getting the last word CS_DESTROY.
I tried fluentular and still got:
text:
f6a6e1ae-e52e-4aba-a8a5-4e3cc7f40914 2020-11-03 14:32:34.975779 [CRIT] mod_dptools.c:1866 audio3: https://mydomain.s3-eu-west-1.amazonaws.com/media/576d06e5-04fc-11eb-a52c-020fd8c14d18/5f9ddf2d5df0f698094395.mpg
regexp:
^(?<UUID>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}) (?<time>.*) (?<message>[^ ]*)$
and got:
time 2020/11/03 14:32:34 +0000
UUID f6a6e1ae-e52e-4aba-a8a5-4e3cc7f40914
message https://mydomain.s3-eu-west-1.amazonaws.com/media/576d06e5-04fc-11eb-a52c-020fd8c14d18/5f9ddf2d5df0f698094395.mpg
It's missing what's between the datetime and "https".
Try:
^(?<UUID>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}) (?<time>[^\[]*) (?<message>\[.*)$
Live at rubular: https://rubular.com/r/JQQXs5VTkr2IxM
Here's the output for both logs:
Match 1
UUID 6b0815f2-8ff1-4181-a4e6-058148288281
time 2020-11-03 13:00:05.976366
message [DEBUG] switch_core_state_machine.c:611 (some_other_data) State Change CS_REPORTING -> CS_DESTROY
Match 2
UUID f6a6e1ae-e52e-4aba-a8a5-4e3cc7f40914
time 2020-11-03 14:32:34.975779
message [CRIT] mod_dptools.c:1866 audio3: https://mydomain.s3-eu-west-1.amazonaws.com/media/576d06e5-04fc-11eb-a52c-020fd8c14d18/5f9ddf2d5df0f698094395.mpg

Add additional field in fluentd

I have a messesae as below
{"log":"kubernetes.var.log.dev-2019-12-24.log\u0009{\"msg\":\"[2019-12-24 10:34:58] app.ERROR: [ApiExceptionHandler:onKernelException]: default not match= exception is Symfony\\\\Component\\\\HttpKernel\\\\Exception\\\\NotFoundHttpException [] []\"}\n","stream":"stdout","time":"2019-12-24T10:34:58.295814385Z"}
Now I want to split it in 4 parts:
file_name: kubernetes.var.log.dev-2019-12-24.log
time: 2019-12-24 10:34:58
messeage_type: app.ERROR
msg: all remainding messeage
In fluentd configuration, I set a regex like:
<parse>
#type "regexp"
expression [(?<time>.+)] (?<kind>.*ERROR|.*INFO): (?<msg>.*)$
</parse>
but it not work as expect, it show a warning
2019-12-27 02:34:30 +0000 [warn]: [fluentd-containers.log] pattern not match: "{\"log\":\"kubernetes.var.log.dev-2019-12-27.log\\u0009{\\\"msg\\\":\\\"[2019-12-27 02:34:27] security.INFO: Populated the TokenStorage with an anonymous Token. [] []\\\"}\\n\",\"stream\":\"stdout\",\"time\":\"2019-12-27T02:34:30.699454425Z\"}"
2019-12-27 02:34:30 +0000 [warn]: [fluentd-containers.log] pattern not match: "{\"log\":\"kubernetes.var.log.dev-2019-12-27.log\\u0009{\\\"msg\\\":\\\"[2019-12-27 02:34:27] app.INFO: [UserCtrl:Login]: request_data= {\\\\\\\"email\\\\\\\":\\\\\\\"tui#gmail.com\\\\\\\",\\\\\\\"password\\\\\\\":\\\\\\\"asfasfd\\\\\\\"} [] []\\\"}\\n\",\"stream\":\"stdout\",\"time\":\"2019-12-27T02:34:30.699458964Z\"}"
I think there is something wrong in regex, please advise me how to fix it
You need to escape [ and ]:
expression \[(?<time>.+)\] (?<kind>.*ERROR|.*INFO): (?<msg>.*)$

Fluentd Regular Expression Matching Error

I am trying to parse the logs from kubernetes like this for example
2018-08-14 13:21:20.013 [INFO][67] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}
And this is the configuration
<source>
#id calico-node.log
#type tail
format /^(?<time>[^ ]* [^ ,]*)[^\[]*\[[^\]]*\]\[(?<severity>[^ \]]*) *\] (?<message>.*)$/
time_format %Y-%m-%d %H:%M:%S
path /var/log/containers/calico-node**.log
pos_file /var/log/es-calico.pos
tag calico-node
</source>
According to regex101.com, this pattern should match this string. However, I get an error from fluentd while trying to parse this
2018-08-14 13:21:20 +0000 [warn]: [calico-node.log] "{\"log\":\"2018-08-14 13:21:20.013 [INFO][67] health.go 150: Overall health summary=\\u0026health.HealthReport{Live:true, Ready:true}\\n\",\"stream\":\"stdout\",\"time\":\"2018-08-14T13:21:20.013908223Z\"}" error="invalid time format: value = {\"log\":\"2018-08-14 13:21:20.013, error_class = ArgumentError, error = string doesn't match"```
What could be wrong? I have had similar errors with the built-in parser for apache logs as well?
From what I can see, you are missing something in the fluentd config.
Your time_format %Y-%m-%d %H:%M:%S will not work with the timestamp 2018-08-14 13:21:20.013, as it's missing .%3N.
It should be as follows:
time_format %Y-%m-%d %H:%M:%S.%3N or time_format %Y-%m-%d %H:%M:%S.%L
Just faced a similar issue.
I think the #Crou's answer is correct but maybe try %N instead.
according to the document, fluentd parser does not support %3N, %6N, %9N, and %L
https://docs.fluentd.org/configuration/parse-section