Fluentd Regular Expression Matching Error

I am trying to parse logs from Kubernetes, like this one for example:
2018-08-14 13:21:20.013 [INFO][67] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}
And this is the configuration:
<source>
@id calico-node.log
@type tail
format /^(?<time>[^ ]* [^ ,]*)[^\[]*\[[^\]]*\]\[(?<severity>[^ \]]*) *\] (?<message>.*)$/
time_format %Y-%m-%d %H:%M:%S
path /var/log/containers/calico-node**.log
pos_file /var/log/es-calico.pos
tag calico-node
</source>
According to regex101.com, this pattern should match this string. However, I get an error from fluentd while trying to parse it:
2018-08-14 13:21:20 +0000 [warn]: [calico-node.log] "{\"log\":\"2018-08-14 13:21:20.013 [INFO][67] health.go 150: Overall health summary=\\u0026health.HealthReport{Live:true, Ready:true}\\n\",\"stream\":\"stdout\",\"time\":\"2018-08-14T13:21:20.013908223Z\"}" error="invalid time format: value = {\"log\":\"2018-08-14 13:21:20.013, error_class = ArgumentError, error = string doesn't match"
What could be wrong? I have had similar errors with the built-in parser for Apache logs as well.

From what I can see, you are missing something in the fluentd config.
Your time_format %Y-%m-%d %H:%M:%S will not work with the timestamp 2018-08-14 13:21:20.013, as it's missing .%3N.
It should be as follows:
time_format %Y-%m-%d %H:%M:%S.%3N or time_format %Y-%m-%d %H:%M:%S.%L

Just faced a similar issue.
I think @Crou's answer is correct, but maybe try %N instead.
According to the documentation, the fluentd parser does not support %3N, %6N, %9N, or %L:
https://docs.fluentd.org/configuration/parse-section
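The mismatch is easy to reproduce outside of fluentd. Here's a small Python sketch of the same idea (Python's strptime uses %f for fractional seconds, where fluentd's parser uses %N), showing that a format without a fractional-seconds directive rejects the timestamp:

```python
from datetime import datetime

ts = "2018-08-14 13:21:20.013"

# Without a fractional-seconds directive, parsing fails, mirroring
# fluentd's "invalid time format" warning:
try:
    datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
except ValueError as e:
    print("no match:", e)

# Including the fractional part succeeds (%f in Python, %N in fluentd):
parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")
print(parsed.microsecond)  # 13000
```

The principle carries over directly: the time_format must account for every character of the timestamp, including the fractional seconds.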

Sending logs from fluentd to splunk

I am using log4j, so I have different formats of logs. I am able to send most of them from fluentd to Splunk using the multiline format below, but a few behave differently (the logs with a different date format).
<source>
@type tail
path /tmp/LOG_SPLUNK.*
pos_file /tmp/my-splunk.pos
path_key log_type
read_from_head true
tag "splunk.#log.mylogs"
format multiline
format_firstline /^\[/
format1 /\[(?<timestamp>[^ ]* [^ ]*)\] (?<level>[^ ]*) (?<message>.*)/
time_type string
time_key timestamp
time_format %Y-%m-%d %H:%M:%S,%N
keep_time_key true
</source>
Below are the log formats:
[2022-04-13 06:27:08,340] INFO Loading plugin from: /my/path (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
java.lang.NullPointerException
at java.util.Properties$LineReader.readLine(Properties.java:434)
at java.util.Properties.load0(Properties.java:353)
at java.util.Properties.load(Properties.java:341)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.versionFor(DelegatingClassLoader.java:380)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.versionFor(DelegatingClassLoader.java:385)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.getPluginDesc(DelegatingClassLoader.java:355)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanPluginPath(DelegatingClassLoader.java:328)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanUrlsAndAddPlugins(DelegatingClassLoader.java:261)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.registerPlugin(DelegatingClassLoader.java:253)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.initPluginLoader(DelegatingClassLoader.java:222)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.initLoaders(DelegatingClassLoader.java:199)
at org.apache.kafka.connect.runtime.isolation.Plugins.<init>(Plugins.java:60)
at org.apache.kafka.connect.cli.ConnectDistributed.startConnect(ConnectDistributed.java:91)
at org.apache.kafka.connect.cli.ConnectDistributed.main(ConnectDistributed.java:78)
[2022-04-13 06:27:09,520] INFO Registered loader: PluginClassLoader{pluginLocation=file:/my/path/} (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
Apr 13, 2022 6:27:17 AM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource will be ignored.
I am able to send all the above formats to Splunk, but some behave differently. Is there any format with which I can handle them all? If I got a pattern-not-match error I could have added another format, but I don't.
Try this.
\[(?<timestamp>[^ ]* [^ ]*)\] (?<level>[^ ]*) (?<message>[\W\w]+)
.* stops at a newline; [\W\w]+ will capture your whole stack trace in the message field.
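The difference between the two message patterns can be checked with a quick Python sketch (Python spells named groups as (?P<name>…); the log line is shortened for illustration):

```python
import re

log = ("[2022-04-13 06:27:08,340] INFO Loading plugin\n"
       "java.lang.NullPointerException\n"
       "    at java.util.Properties$LineReader.readLine(Properties.java:434)")

# With .* the message group stops at the first newline:
head_only = re.match(
    r"\[(?P<timestamp>[^ ]* [^ ]*)\] (?P<level>[^ ]*) (?P<message>.*)", log)

# With [\W\w]+ it swallows the newlines and keeps the stack trace:
full = re.match(
    r"\[(?P<timestamp>[^ ]* [^ ]*)\] (?P<level>[^ ]*) (?P<message>[\W\w]+)", log)

print(head_only.group("message"))  # Loading plugin
print(full.group("message"))       # Loading plugin + the whole stack trace
```

The character class [\W\w] matches literally any character, including newlines, which is why it works where an unflagged . does not.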

Google Fluentd Decode Base64

I have a log file in which each record sits between two tags, RecordStart and RecordEnd. The recorded message is base64-encoded, and I want to decode it using google-fluentd so it can be sent to other services.
My Config:
<source>
@type tail
path <path_ot>/metrics.log
pos_file /var/lib/google-fluentd/pos/metrics.pos
read_from_head true
format multiline
multiline_flush_interval 2s
format_firstline /^RecordStart/
format1 /^RecordStart\n(?<record>(\n|.)*)RecordEnd$/
tag presto_server
</source>
I am not able to figure out how to decode the base64. Any help?
Try using the filter plugin to decode base64 files.
Your config file in this case may look like this:
<source>
@type tail
path <path_ot>/metrics.log
pos_file /var/lib/google-fluentd/pos/metrics.pos
read_from_head true
format multiline
multiline_flush_interval 2s
format_firstline /^RecordStart/
format1 /^RecordStart\n(?<record>(\n|.)*)RecordEnd$/
tag presto_server
</source>
<filter presto_server>
type base64_decode
fields mesg
</filter>
This is an adaptation of the config file I found here.
You may also find this documentation helpful: How to modify log records ingested by fluentd.
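Conceptually, the filter just base64-decodes the value of the captured field. A minimal Python sketch of that step, using a hypothetical payload (the real field name and content depend on your logs):

```python
import base64

# Hypothetical record: the multiline regex above puts everything between
# RecordStart and RecordEnd into the "record" field as a base64 string.
record = {"record": base64.b64encode(b'{"metric": 42}').decode("ascii")}

# Decode the field in place, as the filter plugin would:
record["record"] = base64.b64decode(record["record"]).decode("utf-8")
print(record["record"])  # {"metric": 42}
```

If the filter plugin doesn't fit your setup, the same transformation can also be done in a record-modifying filter with a small Ruby snippet.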

Add additional field in fluentd

I have a message as below:
{"log":"kubernetes.var.log.dev-2019-12-24.log\u0009{\"msg\":\"[2019-12-24 10:34:58] app.ERROR: [ApiExceptionHandler:onKernelException]: default not match= exception is Symfony\\\\Component\\\\HttpKernel\\\\Exception\\\\NotFoundHttpException [] []\"}\n","stream":"stdout","time":"2019-12-24T10:34:58.295814385Z"}
Now I want to split it into 4 parts:
file_name: kubernetes.var.log.dev-2019-12-24.log
time: 2019-12-24 10:34:58
message_type: app.ERROR
msg: all the remaining message
In fluentd configuration, I set a regex like:
<parse>
@type "regexp"
expression [(?<time>.+)] (?<kind>.*ERROR|.*INFO): (?<msg>.*)$
</parse>
but it does not work as expected; it shows a warning:
2019-12-27 02:34:30 +0000 [warn]: [fluentd-containers.log] pattern not match: "{\"log\":\"kubernetes.var.log.dev-2019-12-27.log\\u0009{\\\"msg\\\":\\\"[2019-12-27 02:34:27] security.INFO: Populated the TokenStorage with an anonymous Token. [] []\\\"}\\n\",\"stream\":\"stdout\",\"time\":\"2019-12-27T02:34:30.699454425Z\"}"
2019-12-27 02:34:30 +0000 [warn]: [fluentd-containers.log] pattern not match: "{\"log\":\"kubernetes.var.log.dev-2019-12-27.log\\u0009{\\\"msg\\\":\\\"[2019-12-27 02:34:27] app.INFO: [UserCtrl:Login]: request_data= {\\\\\\\"email\\\\\\\":\\\\\\\"tui#gmail.com\\\\\\\",\\\\\\\"password\\\\\\\":\\\\\\\"asfasfd\\\\\\\"} [] []\\\"}\\n\",\"stream\":\"stdout\",\"time\":\"2019-12-27T02:34:30.699458964Z\"}"
I think there is something wrong with the regex; please advise me how to fix it.
You need to escape [ and ]:
expression \[(?<time>.+)\] (?<kind>.*ERROR|.*INFO): (?<msg>.*)$
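A quick Python check (named groups spelled (?P<name>…)) confirms the escaped pattern extracts the expected fields from the sample line:

```python
import re

line = ("[2019-12-24 10:34:58] app.ERROR: "
        "[ApiExceptionHandler:onKernelException]: default not match")

# Escaped brackets match the literal [ and ] around the timestamp;
# unescaped, they would define a character class instead.
m = re.match(r"\[(?P<time>.+)\] (?P<kind>.*ERROR|.*INFO): (?P<msg>.*)$", line)

print(m.group("time"))  # 2019-12-24 10:34:58
print(m.group("kind"))  # app.ERROR
print(m.group("msg"))   # [ApiExceptionHandler:onKernelException]: default not match
```

Note that the full log line is still JSON-wrapped by Kubernetes, so this expression only applies to the inner msg field once it has been extracted.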

Stackdriver Logging - Log severity levels not reported/received when sent via syslog

It appears that log severity is not being passed to the Google Cloud Logging platform via the fluentd agent. To reproduce, you can try:
Bash:
logger -p user.crit "My log"
or PHP:
php -r "syslog(LOG_CRIT,'My log');"
or Python:
import syslog
syslog.syslog(syslog.LOG_ERR, 'My log')
Log entries are reaching the Google Logs Viewer, but severity is not being sent across. Any ideas why that would be?
OK, I managed to find the solution, here you go:
Update your syslog output format under /etc/rsyslog.conf to the following:
$template googlelogger,"%syslogseverity-text% %timegenerated% %HOSTNAME% %syslogtag% %msg%\n"
$ActionFileDefaultTemplate googlelogger
Then update the template format in /etc/google-fluentd/config.d/syslog.conf:
format /^(?<severity>[a-zA-Z]*) (?<time>[^ ]*\s*[^ ]* [^ ]*) (?<host>[^ ]*) (?<service>[a-zA-Z0-9_\/\.\-]*): *(?<message>.*)$/
time_format %b %d %H:%M:%S
Make sure to restart both rsyslog and google-fluentd. After that, severity will be sent to Google Cloud Logging.
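As a sanity check, the fluentd pattern above can be tested against a hypothetical line produced by the rsyslog template (the hostname and tag here are made up):

```python
import re

pattern = (r"^(?P<severity>[a-zA-Z]*) (?P<time>[^ ]*\s*[^ ]* [^ ]*) "
           r"(?P<host>[^ ]*) (?P<service>[a-zA-Z0-9_\/\.\-]*): *(?P<message>.*)$")

# Hypothetical output of the googlelogger template:
# %syslogseverity-text% %timegenerated% %HOSTNAME% %syslogtag% %msg%
line = "crit Aug 14 13:21:20 myhost root: My log"

m = re.match(pattern, line)
print(m.group("severity"))  # crit
print(m.group("time"))      # Aug 14 13:21:20
print(m.group("message"))   # My log
```

The \s* between the month and the day likely exists to tolerate the double space rsyslog emits before single-digit days (e.g. "Aug  4").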

FluentD datetime format doesn't match

This is my FluentD parser config:
<filter format.3.**>
@type parser
format /^\[(?<module>[^\]]+)\] (?<time>.+): (?<msg>.*)$/
time_format %Y-%m-%d %H:%M:%S
key_name log
keep_time_key true
reserve_data false
</filter>
And that's an example log line:
[Macaron] 2017-04-26 16:54:26: Started GET / for 172.20.0.0
In the FluentD error log I'm getting:
2017-04-27 12:01:58 +0000 [warn]: plugin/filter_parser.rb:69:rescue in block in filter_stream: invalid time format: value = 2017-04-27 12:01:58, error_class = ArgumentError, error = invalid strptime format - `%Y-%m-%d %H:%M:%S'
The important part is this:
invalid time format: value = 2017-04-27 12:01:58, error_class = ArgumentError, error = invalid strptime format - `%Y-%m-%d %H:%M:%S'
But I can't see how %Y-%m-%d %H:%M:%S doesn't match 2017-04-27 12:01:58, or why this format would be invalid.
According to this tool, it should match.
I figured it out: there were some special characters in the log that set terminal colors. Apparently they did not survive copy-pasting into fluentular.herokuapp.com, which is why it worked there:
{"log":"[Macaron] \u001b[1;32m2017-04-27 12:34:07: Completed /node 200 OK in 1.163351ms\u001b[0m\n","stream":"stdout","time":"2017-04-27T12:34:07.953993591Z"}
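The effect of those color escapes is easy to reproduce in Python (a sketch; (?P<name>…) is Python's named-group syntax): the time group captures the escape sequence, and stripping ANSI codes first restores the expected match:

```python
import re

# The raw log line contains ANSI color escapes (\x1b[1;32m ... \x1b[0m):
line = "[Macaron] \x1b[1;32m2017-04-27 12:34:07: Completed /node 200 OK\x1b[0m"
pat = r"^\[(?P<module>[^\]]+)\] (?P<time>.+): (?P<msg>.*)$"

# The time group captures the invisible escape prefix, so strptime fails:
m = re.match(pat, line)
print(repr(m.group("time")))  # '\x1b[1;32m2017-04-27 12:34:07'

# Removing ANSI escape sequences before matching yields a clean timestamp:
clean = re.sub(r"\x1b\[[0-9;]*m", "", line)
m2 = re.match(pat, clean)
print(m2.group("time"))  # 2017-04-27 12:34:07
```

So the fix is either to disable colored output in the application or to strip the escape sequences before the parse step.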