Google Fluentd Decode Base64 - google-cloud-platform

I have a log file with records between two tags, RecordStart and RecordEnd. The recorded message is base64 encoded, and I want to decode it using google-fluentd so it can be sent to other services.
My Config:
<source>
@type tail
path <path_to>/metrics.log
pos_file /var/lib/google-fluentd/pos/metrics.pos
read_from_head true
format multiline
multiline_flush_interval 2s
format_firstline /^RecordStart/
format1 /^RecordStart\n(?<record>(\n|.)*)RecordEnd$/
tag presto_server
</source>
I am not able to figure out how to decode the base64 message. Any help?
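For clarity, this is what the format1 regex above captures into the record field; a quick Python check (the base64 payload is an assumed sample, and Python spells named groups ?P<...>):
```python
import re

# Assumed sample chunk in the RecordStart/RecordEnd framing;
# "Zm9vPWJhcg==" is base64 for "foo=bar".
chunk = "RecordStart\nZm9vPWJhcg==\nRecordEnd"

m = re.match(r"^RecordStart\n(?P<record>(\n|.)*)RecordEnd$", chunk)
print(repr(m.group("record")))  # 'Zm9vPWJhcg==\n' -- still base64 encoded
```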

Try using a filter plugin to decode the base64 records.
Your config file in this case may look like this:
<source>
@type tail
path <path_to>/metrics.log
pos_file /var/lib/google-fluentd/pos/metrics.pos
read_from_head true
format multiline
multiline_flush_interval 2s
format_firstline /^RecordStart/
format1 /^RecordStart\n(?<record>(\n|.)*)RecordEnd$/
tag presto_server
</source>
<filter presto_server>
type base64_decode
fields mesg
</filter>
This is an adaptation of the config file I found here.
You may also find this documentation helpful: How to modify log records ingested by fluentd.
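For what it's worth, the decoding step itself is plain Base64; a minimal Python sketch of the per-record transformation (the field name mesg is taken from the filter config above, the payload is an assumed sample):
```python
import base64

# Assumed sample record whose "mesg" field holds a base64-encoded payload.
record = {"mesg": "Zm9vPWJhcg=="}  # base64 for "foo=bar"

# What a base64-decoding filter conceptually does to each record:
record["mesg"] = base64.b64decode(record["mesg"]).decode("utf-8")
print(record)  # {'mesg': 'foo=bar'}
```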

Related

How can I parse only error logs using fluentd

I want to push only error and warning logs to CloudWatch log groups, and I want to use fluentd for this.
This is how my general log file looks:
2022-08-18 06:15:48,983 | 3349 | process_message | INFO | N.A | -1 | -1 | -1 | N.A. | message is empty |
I am using the fluent-plugin-cloudwatch-logs plugin.
This is how my td-agent conf file looks:
<source>
@type tail
path /var/log/*/*.log
pos_file /var/log/td-agent/apps.pos
tag disagg-logs
<parse>
@type regexp
expression /\[\w+\] ERROR\s|(?<message>.*)$/
</parse>
</source>
<match disagg-logs>
@type cloudwatch_logs
log_group_name disagg-logs
log_stream_name disagg-logs
auto_create_stream true
region us-east-1
</match>
With the above configuration file it is pushing even INFO logs.
I was able to do it with the below regex:
^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})[\s\|]+(?<pid>\d+)[\s\|]+(?<location>[\w.]+)[\s\|]+(?<level>(INFO|ERROR|WARNING))[\s\|]+(?<uuid>[(\w\d\-|N\.A)]+)[\s\|]+(?<timestart>-?\d+)[\s\|]+(?<timeend>-?\d+)[\s\|]+(?<Id>-?\d+)[\s\|]+(?<type>[(\w\d\-|N\.A)]+)[\s\|]+(?<message>[A-Za-z0-9_ ]+)[\s\|]+$
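As a sanity check, here is the same regex run over the sample line in Python (named groups rewritten to ?P<...>); the extracted level field is what lets ERROR/WARNING records be kept and INFO dropped:
```python
import re

# The regex from above, ported to Python's named-group syntax.
PATTERN = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})[\s\|]+(?P<pid>\d+)"
    r"[\s\|]+(?P<location>[\w.]+)[\s\|]+(?P<level>INFO|ERROR|WARNING)"
    r"[\s\|]+(?P<uuid>[(\w\d\-|N\.A)]+)[\s\|]+(?P<timestart>-?\d+)"
    r"[\s\|]+(?P<timeend>-?\d+)[\s\|]+(?P<Id>-?\d+)"
    r"[\s\|]+(?P<type>[(\w\d\-|N\.A)]+)[\s\|]+(?P<message>[A-Za-z0-9_ ]+)[\s\|]+$"
)

line = ("2022-08-18 06:15:48,983 | 3349 | process_message | INFO | N.A "
        "| -1 | -1 | -1 | N.A. | message is empty |")
m = PATTERN.match(line)
print(m.group("level"))  # INFO
print("forward" if m.group("level") in ("ERROR", "WARNING") else "drop")  # drop
```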

Sending logs from fluentd to splunk

I am using log4j, so I have different formats of logs. I am able to send most of the logs from fluentd to Splunk using the multiline format below, but a few of them behave differently (the logs with a different date format).
<source>
@type tail
path /tmp/LOG_SPLUNK.*
pos_file /tmp/my-splunk.pos
path_key log_type
read_from_head true
tag "splunk.#log.mylogs"
format multiline
format_firstline /^\[/
format1 /\[(?<timestamp>[^ ]* [^ ]*)\] (?<level>[^ ]*) (?<message>.*)/
time_type string
time_key timestamp
time_format %Y-%m-%d %H:%M:%S,%N
keep_time_key true
</source>
Below are the log formats:
[2022-04-13 06:27:08,340] INFO Loading plugin from: /my/path (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
java.lang.NullPointerException
at java.util.Properties$LineReader.readLine(Properties.java:434)
at java.util.Properties.load0(Properties.java:353)
at java.util.Properties.load(Properties.java:341)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.versionFor(DelegatingClassLoader.java:380)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.versionFor(DelegatingClassLoader.java:385)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.getPluginDesc(DelegatingClassLoader.java:355)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanPluginPath(DelegatingClassLoader.java:328)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanUrlsAndAddPlugins(DelegatingClassLoader.java:261)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.registerPlugin(DelegatingClassLoader.java:253)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.initPluginLoader(DelegatingClassLoader.java:222)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.initLoaders(DelegatingClassLoader.java:199)
at org.apache.kafka.connect.runtime.isolation.Plugins.<init>(Plugins.java:60)
at org.apache.kafka.connect.cli.ConnectDistributed.startConnect(ConnectDistributed.java:91)
at org.apache.kafka.connect.cli.ConnectDistributed.main(ConnectDistributed.java:78)
[2022-04-13 06:27:09,520] INFO Registered loader: PluginClassLoader{pluginLocation=file:/my/path/} (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
Apr 13, 2022 6:27:17 AM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource will be ignored.
I am able to send all the above formats to Splunk, but some behave differently. Is there any format with which I can handle all of them? If I got a pattern-not-match error I could have added another format, but I don't get one.
Try this.
\[(?<timestamp>[^ ]* [^ ]*)\] (?<level>[^ ]*) (?<message>[\W\w]+)
.* stops at a newline; [\W\w]+ will capture your whole stack trace in the message field.
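The difference is easy to demonstrate in Python (?P<...> group syntax): without the DOTALL flag, . stops at a newline, while the class [\W\w] matches any character, newlines included:
```python
import re

# A two-line sample: a log line followed by the start of a stack trace.
log = "[2022-04-13 06:27:08,340] INFO Loading plugin\njava.lang.NullPointerException"

head = r"\[(?P<timestamp>[^ ]* [^ ]*)\] (?P<level>[^ ]*) "
dot_star = re.match(head + r"(?P<message>.*)", log)
any_char = re.match(head + r"(?P<message>[\W\w]+)", log)

print(dot_star.group("message"))  # 'Loading plugin' -- stops at the newline
print(any_char.group("message"))  # both lines, including the stack trace
```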

How to parse a specific message and send it to a different output with fluent bit

I need to parse a specific message from a log file with fluent-bit and send it to a file. All messages should be sent to stdout, and every message containing a specific string should also be sent to a file. I have managed to do it with a filter, using the following configuration:
[SERVICE]
Flush 1
Log_Level info
[INPUT]
Name tail
Path inputfile.log
Tag test_tag
[FILTER]
Name rewrite_tag
Match test_tag
Rule $log (user_temporarily_disabled) from.$TAG.new true
Emitter_Name re_emitted
[OUTPUT]
Name stdout
Match test_tag
[OUTPUT]
Name file
Match from.*
File myoutput.log
With the above configuration, whenever I send a line to the input file it goes to stdout in any case, and it goes to the file if the line contains the "user_temporarily_disabled" string. This is achieved by rewriting the tag with the rewrite_tag filter.
What I need in addition is to parse the message and rewrite it into a new form. I have tried to add a parser, with no success.
OK, I found it after spending some time:
[SERVICE]
Parsers_File parserFile.conf
[INPUT]
Name tail
Path inputfile.log
Tag inputtag
#first filter to redirect to parser
[FILTER]
Name parser
Match inputtag*
Key_Name log
Parser myparser
#second filter to rewrite tag after parser
[FILTER]
Name rewrite_tag
Match *
Rule $ALARMTEXT (user_temporarily_disabled) newtag true
Emitter_Name re_emitted
[OUTPUT]
Name file
Match newtag*
File output.log
[OUTPUT]
Name stdout
Match *
and the parser should be something like this:
[PARSER]
Name myparser
Format regex
Regex ^(?<INT>[^ ]+) (?<FLOAT>[^ ]+) (?<BOOL>[^ ]+) (?<ALARMTEXT>.+)$
Now if I send something like this to the input file:
echo "111 0.1 true user_temporarily_disabled" >> inputfile.log
it goes to the file AND to stdout.
Anything not parsed goes to stdout only.
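To double-check the parser against that echo line, here is the same regex in Python (?P<...> named-group syntax):
```python
import re

# The myparser regex from above, in Python's named-group syntax.
PARSER = re.compile(
    r"^(?P<INT>[^ ]+) (?P<FLOAT>[^ ]+) (?P<BOOL>[^ ]+) (?P<ALARMTEXT>.+)$"
)

m = PARSER.match("111 0.1 true user_temporarily_disabled")
print(m.groupdict())
# {'INT': '111', 'FLOAT': '0.1', 'BOOL': 'true',
#  'ALARMTEXT': 'user_temporarily_disabled'}
```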

Fluentd Regular Expression Matching Error

I am trying to parse logs from Kubernetes, like this one for example:
2018-08-14 13:21:20.013 [INFO][67] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}
And this is the configuration
<source>
@id calico-node.log
@type tail
format /^(?<time>[^ ]* [^ ,]*)[^\[]*\[[^\]]*\]\[(?<severity>[^ \]]*) *\] (?<message>.*)$/
time_format %Y-%m-%d %H:%M:%S
path /var/log/containers/calico-node**.log
pos_file /var/log/es-calico.pos
tag calico-node
</source>
According to regex101.com, this pattern should match this string. However, I get an error from fluentd while trying to parse it:
2018-08-14 13:21:20 +0000 [warn]: [calico-node.log] "{\"log\":\"2018-08-14 13:21:20.013 [INFO][67] health.go 150: Overall health summary=\\u0026health.HealthReport{Live:true, Ready:true}\\n\",\"stream\":\"stdout\",\"time\":\"2018-08-14T13:21:20.013908223Z\"}" error="invalid time format: value = {\"log\":\"2018-08-14 13:21:20.013, error_class = ArgumentError, error = string doesn't match"
What could be wrong? I have had similar errors with the built-in parser for apache logs as well.
From what I can see, you are missing something in the fluentd config.
Your time_format %Y-%m-%d %H:%M:%S will not work with the timestamp 2018-08-14 13:21:20.013, as it's missing .%3N.
It should be as follows:
time_format %Y-%m-%d %H:%M:%S.%3N or time_format %Y-%m-%d %H:%M:%S.%L
Just faced a similar issue.
I think @Crou's answer is correct, but maybe try %N instead.
According to the documentation, the fluentd parser does not support %3N, %6N, %9N, or %L:
https://docs.fluentd.org/configuration/parse-section
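The fractional-seconds point is easy to reproduce in Python terms (where the equivalent directive is %f rather than Ruby's %N/%L):
```python
from datetime import datetime

ts = "2018-08-14 13:21:20.013"

# With a fractional-seconds directive, the timestamp parses fine.
print(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f"))

# Without one, parsing fails the same way the fluentd config did.
try:
    datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
except ValueError as e:
    print(e)  # unconverted data remains: .013
```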

FluentD regex parsed values not appearing in Elasticsearch

So I saw there were a few other questions of this type, but none seemed to solve my issue.
I am attempting to take Springboot logs from files, parse out useful information, and send the result to Elasticsearch, and ultimately read from Kibana. My fluentd.conf looks like the following:
<source>
type tail
read_from_head true
path /path/to/log/
pos_file /path/to/pos_file
format /^(?<date>[0-9]+-[0-9]+-[0-9]+\s+[0-9]+:[0-9]+:[0-9]+.[0-9]+)\s+(?<log_level>[Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)\s+(?<pid>[0-9]+)\s+---\s+(?<message>.*)$/
tag my.app
</source>
<match my.app>
type stdout
</match>
<match my.app>
type elasticsearch
logstash_format true
host myhosthere
port 9200
index_name fluentd-app
type_name fluentd
</match>
Given a typical Springboot log line:
2015-07-16 19:20:04.074 INFO 16649 --- [ main] {springboot message}
By also writing to stdout as a test, I see my parser is resulting in:
{
"date":"2015-07-16 19:20:04.074",
"log_level":"INFO",
"pid":"16649",
"message":"[ main] {springboot message}"
}
However, when this gets written to Elasticsearch, all that results is:
{
_index: "fluentd-app-2015.07.16",
_type: "fluentd",
_id: "AU6YT5sjvkxiJXWCxeM8",
_score: 1,
_source: {
message: "2015-07-16 19:20:04.074 INFO 16649 --- [ main] {springboot message}",
@timestamp: "2015-07-16T19:20:04+00:00"
}
},
From what I had read about fluent-plugin-elasticsearch, I expected _source to contain all of the parsed fields that I see in stdout. I have also tried the grok parser, though it seems the issue lies with my understanding of the fluentd Elasticsearch plugin. How do I get the fields I parsed to persist to Elasticsearch?