FluentD regex parsed values not appearing in Elasticsearch - regex

So I saw there were a few other questions of this type, but none seemed to solve my issue.
I am attempting to take Springboot logs from files, parse out useful information, and send the result to Elasticsearch, and ultimately read from Kibana. My fluentd.conf looks like the following:
<source>
type tail
read_from_head true
path /path/to/log/
pos_file /path/to/pos_file
format /^(?<date>[0-9]+-[0-9]+-[0-9]+\s+[0-9]+:[0-9]+:[0-9]+.[0-9]+)\s+(?<log_level>[Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)\s+(?<pid>[0-9]+)\s+---\s+(?<message>.*)$/
tag my.app
</source>
<match my.app>
type stdout
</match>
<match my.app>
type elasticsearch
logstash_format true
host myhosthere
port 9200
index_name fluentd-app
type_name fluentd
</match>
Given a typical Springboot log line:
2015-07-16 19:20:04.074 INFO 16649 --- [ main] {springboot message}
By also writing to stdout as a test, I see my parser is resulting in:
{
"date":"2015-07-16 19:20:04.074",
"log_level":"INFO",
"pid":"16649",
"message":"[ main] {springboot message}"
}
However, when this gets written to Elasticsearch, all that results is:
{
_index: "fluentd-app-2015.07.16",
_type: "fluentd",
_id: "AU6YT5sjvkxiJXWCxeM8",
_score: 1,
_source: {
message: "2015-07-16 19:20:04.074 INFO 16649 --- [ main] {springboot message}",
#timestamp: "2015-07-16T19:20:04+00:00"
}
},
From what I had read about fluentd-plugin-elasticsearch I expected _source to contain all of the parsed fields that I see in stdout. I have also tried the grok parser - though it seems apparent the issue lies with understanding of the fluentd elasticsearch plugin. How do I get the fields I parsed to persist to elasticsearch?

Related

How to disable JSON format and send only the log message to Sumologic with Fluentbit?

We are using Fluentbit as as Sidecar container in our ECS fargate Cluster which is running a dotnet application, initially we faced the issue of fluentbit sending the logs in multiline and we solved it using Fluentbit Multilne feature. Now the logs are being sent to Sumologic in Multiple however it is being sent as Json format whereas we just want fluentbit send only the raw log
Logs are currently
{
date:1675120653.269619,
container_id:"xvgbertytyuuyuyu",
container_name:"XXXXXXXXXX",
source:"stdout",
log:"2023-01-30 23:17:33.269Z DEBUG [.NET ThreadPool Worker] Connection.ManagedDbConnection - ComponentInstanceEntityAsync - Executing stored proc: dbo.prcGetComponentInstance"
}
We want only the line
2023-01-30 23:17:33.269Z DEBUG [.NET ThreadPool Worker] Connection.ManagedDbConnection - ComponentInstanceEntityAsync - Executing stored proc: dbo.prcGetComponentInstance
You need to modify Fluent Bit configuration to have the following filters and output configuration:
fluent.conf:
## prepare headers for Sumo Logic
[FILTER]
Name record_modifier
Match *
Record headers.content-type text/plain
## Set headers as headers attribute
[FILTER]
Name nest
Match *
Operation nest
Wildcard headers.*
Nest_under headers
Remove_prefix headers.
[OUTPUT]
Name http
...
# use log key as body
body_key $log
# use headers key as headers
headers_key $headers
That way, you are going to craft HTTP request manually. This is going to send request per log, which is not necessary a good idea. In order to mitigate that you can add the following parser and use it (flush_timeout may need an adjustment):
parsers.conf
# merge everything as one big log
[MULTILINE_PARSER]
name multiline-all
type regex
flush_timeout 500
#
# Regex rules for multiline parsing
# ---------------------------------
#
# configuration hints:
#
# - first state always has the name: start_state
# - every field in the rule must be inside double quotes
#
# rules | state name | regex pattern | next state
# ------|---------------|--------------------------------------------
rule "start_state" ".*" "cont"
rule "cont" ".*" "cont"
fluent.conf:
[INPUT]
name tail
...
multiline.parser multiline-all

Unable to get log messages from GELF appender (graylog) - spring

Graylog web page is running as below:
Following the documentation for spring boot: graylog-springboot
However, nothing shows in the result. Could you Please advise me if you know what im doing wrong.
I have created log4j.xml file as below:
<appender name="graylog" class="org.graylog2.log.GelfAppender">
<param name="graylogHost" value="ec2-x-x-x-x.ap-west-1.compute.amazonaws.com"/>
<param name="originHost" value="ec2-x-x-x-x.eu-west-1.compute.amazonaws.com"/>
<param name="graylogPort" value="12201"/>
<param name="extractStacktrace" value="true"/>
<param name="addExtendedInformation" value="true"/>
<param name="facility" value="log4j"/>
<param name="Threshold" value="INFO"/>
<param name="additionalFields" value="{'environment': 'DEV', 'application': 'GraylogDemoApplication'}"/>
Accordingly opened a port for 9000 and 12201 in the security group.
in the build.gradle:
dependencies {
compile 'org.springframework.boot:spring-boot-starter-web'
compile 'org.springframework.boot:spring-boot-starter-log4j2'
implementation group: 'org.graylog2', name: 'gelfj', version: '1.1.16'
}
configurations {
all {
exclude group: 'org.springframework.boot', module: 'spring-boot-starter-logging'
}}
In the application.properties file:
Ensure that the version of Elastic you are running is compatible, currently the highest supported version is 7.10.2. If this is a fresh install it would be worth considering running Opensearch, this would also mean Graylog version installed should be above 4.3. Elastic and Graylog are going through a divorce.
Try the below command, replacing the hostname to test if anything is being ingested.
echo '{ "version": "1.1", "host": "example.org", "short_message": "A short message that helps you identify what is going on", "level": 5, "_some_info": "foo" }\0' | nc -w 1 HOSTNAME 12201
Is there anything in the logs suggesting the message is being dropped?
Where are you sending logs from, does networking/firewalls need to be considered.

Sending logs from fluentd to splunk

I am using log4j , so have different formats of logs. I am able to send most of the logs using the below multiline format from fluentd to splunk, but few of them behave differently(The logs with different date format).
<source>
#type tail
path /tmp/LOG_SPLUNK.*
pos_file /tmp/my-splunk.pos
path_key log_type
read_from_head true
tag "splunk.#log.mylogs"
format multiline
format_firstline /^\[/
format1 /\[(?<timestamp>[^ ]* [^ ]*)\] (?<level>[^ ]*) (?<message>.*)/
time_type string
time_key timestamp
time_format %Y-%m-%d %H:%M:%S,%N
keep_time_key true
</source>
Below are logs formats:
[2022-04-13 06:27:08,340] INFO Loading plugin from: /my/path (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
java.lang.NullPointerException
at java.util.Properties$LineReader.readLine(Properties.java:434)
at java.util.Properties.load0(Properties.java:353)
at java.util.Properties.load(Properties.java:341)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.versionFor(DelegatingClassLoader.java:380)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.versionFor(DelegatingClassLoader.java:385)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.getPluginDesc(DelegatingClassLoader.java:355)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanPluginPath(DelegatingClassLoader.java:328)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanUrlsAndAddPlugins(DelegatingClassLoader.java:261)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.registerPlugin(DelegatingClassLoader.java:253)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.initPluginLoader(DelegatingClassLoader.java:222)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.initLoaders(DelegatingClassLoader.java:199)
at org.apache.kafka.connect.runtime.isolation.Plugins.<init>(Plugins.java:60)
at org.apache.kafka.connect.cli.ConnectDistributed.startConnect(ConnectDistributed.java:91)
at org.apache.kafka.connect.cli.ConnectDistributed.main(ConnectDistributed.java:78)
[2022-04-13 06:27:09,520] INFO Registered loader: PluginClassLoader{pluginLocation=file:/my/path/} (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
Apr 13, 2022 6:27:17 AM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource will be ignored.
I am able to send all the above formats to splunk, but some behave differently. Is there any format using which i will be able to handle all. If i got a pattern not match error i could have included a format, but I don't
Try this.
[(?[^ ]* [^ ])] (?[^ ]) (?[\W\w]+)
.* stops at a new line . [\W\w]+ will capture your whole stack trace in the message field.

Google Fluentd Decode Base64

I have log file which have record between two tags RecordStart and RecordEnd the recorded message is base64 encoded I want to decode the message using google-fluentd and so it can send to other services.
My Config:
<source>
#type tail
path <path_ot>/metrics.log
pos_file /var/lib/google-fluentd/pos/metrics.pos
read_from_head true
format multiline
multiline_flush_interval 2s
format_firstline /^RecordStart/
format1 /^RecordStart\n(?<record>(\n|.)*)RecordEnd$/
tag presto_server
</source>
I am not able to figure out how to decode base64 Any help ?
Try using the filter plugin to decode base64 files.
Your config file in this case may look like this:
<source>
#type tail
path <path_ot>/metrics.log
pos_file /var/lib/google-fluentd/pos/metrics.pos
read_from_head true
format multiline
multiline_flush_interval 2s
format_firstline /^RecordStart/
format1 /^RecordStart\n(?<record>(\n|.)*)RecordEnd$/
tag presto_server
</source>
<filter presto_server>
type base64_decode
fields mesg
</filter>
This is an adaptation of the config file I found here.
You may also find this documentation helpful: HYow to modify log records ingested by fluentd.

AWS logs agent setup

We have recently setup AWS logs agent on one of our test servers. Our log files usually contain multi-line events. e.g one of our log event is:
[10-Jun-2016 07:30:16 UTC] SQS Post Response: Array
(
[Status] => 200
[ResponseBody] => <?xml version="1.0"?><SendMessageResponse xmlns="http://queue.amazonaws.com/doc/2009-02-01/"><SendMessageResult><MessageId>053c7sdf5-1e23-wa9d-99d8-2a0cf9eewe7a</MessageId><MD5OfMessageBody>8e542d2c2a1325a85eeb9sdfwersd58f</MD5OfMessageBody></SendMessageResult><ResponseMetadata><RequestId>4esdfr30-c39b-526b-bds2-14e4gju18af</RequestId></ResponseMetadata></SendMessageResponse>
)
The log agent reference documentation says to use 'multi_line_start_pattern' option for such logs. Our AWS Log agent config is as follows:
[httpd_info.log]
file = /var/log/httpd/info.log*
log_stream_name = info.log
initial_position = start_of_file
log_group_name = test.server.name
multi_line_start_pattern = '(\[)+\d{2}-[a-zA-Z]{3}+-\d{4}'
However, the logs agent reporting breaks on aforementioned and similar events. The way it is being reported to CloudWatch Logs is as follows:
Event 1:
[10-Jun-2016 11:21:26 UTC] SQS Post Response: Array
Event 2:
( [Status] => 200 [ResponseBody] => <?xml version="1.0"?><SendMessageResponse xmlns="http://queue.amazonaws.com/doc/2009-02-01/"><SendMessageResult><MessageId>053c7sdf5-1e23-wa9d-99d8-2a0cf9eewe7a</MessageId><MD5OfMessageBody>8e542d2c2a1325a85eeb9sdfwersd58f</MD5OfMessageBody></SendMessageResult><ResponseMetadata><RequestId>4esdfr30-c39b-526b-bds2-14e4gju18af</RequestId></ResponseMetadata></SendMessageResponse>
Event 3:
)
Despite of the fact that its only a single event. Any clue whats going on here?
I think all you need to add is the following to your awslogs.conf
datetime_format = %d-%b-%Y %H:%M:%S UTC
time_zone = UTC
multi_line_start_pattern = {datetime_format}
http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AgentReference.html
multi_line_start_pattern
Specifies the pattern for identifying the start of a log message. A log message is made of a line that matches the pattern and any following lines that don't match the pattern. The valid values are regular expression or {datetime_format}. When using {datetime_format}, the datetime_format option should be specified. The default value is ‘^[^\s]' so any line that begins with non-whitespace character closes the previous log message and starts a new log message.
If that datetime format didn't work, you would need to update your regex to actually match your specific datetime. I don't think the one you have listed above actually works for your given format.
You could try this for instance:
[\d{2}-[\w]{3}-\d{4}\s{1}\d{2}:\d{2}:\d{2}\s{1}\w+]
does match
[10-Jun-2016 11:21:26 UTC]
See here: http://www.regexpal.com/?fam=96811
Once completed, issue a restart of the service and check to see if its parsing correctly.
$ sudo service awslogs restart