How can I parse only error logs using fluentd? - amazon-web-services

I want to push only error and warning logs to CloudWatch log groups, and I want to use fluentd for this. This is how my general log file looks:
2022-08-18 06:15:48,983 | 3349 | process_message | INFO | N.A | -1 | -1 | -1 | N.A. | message is empty |
I am using the fluent-plugin-cloudwatch-logs plugin. This is how my td-agent conf file looks:
```
<source>
  @type tail
  path /var/log/*/*.log
  pos_file /var/log/td-agent/apps.pos
  tag disagg-logs
  <parse>
    @type regexp
    expression /\[\w+\] ERROR\s|(?<message>.*)$/
  </parse>
</source>

<match disagg-logs>
  @type cloudwatch_logs
  log_group_name disagg-logs
  log_stream_name disagg-logs
  auto_create_stream true
  region us-east-1
</match>
```
With the above configuration it is pushing even INFO logs.

I was able to do it with the regex below:
```
^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})[\s\|]+(?<pid>\d+)[\s\|]+(?<location>[\w.]+)[\s\|]+(?<level>(INFO|ERROR|WARNING))[\s\|]+(?<uuid>[(\w\d\-|N\.A)]+)[\s\|]+(?<timestart>-?\d+)[\s\|]+(?<timeend>-?\d+)[\s\|]+(?<Id>-?\d+)[\s\|]+(?<type>[(\w\d\-|N\.A)]+)[\s\|]+(?<message>[A-Za-z0-9_ ]+)[\s\|]+$
```
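With a level field extracted by that parse, a <filter> using fluentd's built-in grep plugin can drop everything except ERROR and WARNING before the CloudWatch <match>; a minimal sketch, assuming the disagg-logs tag from the question:
```
<filter disagg-logs>
  @type grep
  <regexp>
    # keep only events whose parsed level field is ERROR or WARNING
    key level
    pattern /^(ERROR|WARNING)$/
  </regexp>
</filter>
```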

Related

Sending logs from fluentd to splunk

I am using log4j, so I have different formats of logs. I am able to send most of them from fluentd to Splunk using the multiline format below, but a few behave differently (the logs with a different date format).
```
<source>
  @type tail
  path /tmp/LOG_SPLUNK.*
  pos_file /tmp/my-splunk.pos
  path_key log_type
  read_from_head true
  tag "splunk.#log.mylogs"
  format multiline
  format_firstline /^\[/
  format1 /\[(?<timestamp>[^ ]* [^ ]*)\] (?<level>[^ ]*) (?<message>.*)/
  time_type string
  time_key timestamp
  time_format %Y-%m-%d %H:%M:%S,%N
  keep_time_key true
</source>
```
Below are the log formats:
```
[2022-04-13 06:27:08,340] INFO Loading plugin from: /my/path (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
java.lang.NullPointerException
at java.util.Properties$LineReader.readLine(Properties.java:434)
at java.util.Properties.load0(Properties.java:353)
at java.util.Properties.load(Properties.java:341)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.versionFor(DelegatingClassLoader.java:380)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.versionFor(DelegatingClassLoader.java:385)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.getPluginDesc(DelegatingClassLoader.java:355)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanPluginPath(DelegatingClassLoader.java:328)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanUrlsAndAddPlugins(DelegatingClassLoader.java:261)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.registerPlugin(DelegatingClassLoader.java:253)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.initPluginLoader(DelegatingClassLoader.java:222)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.initLoaders(DelegatingClassLoader.java:199)
at org.apache.kafka.connect.runtime.isolation.Plugins.<init>(Plugins.java:60)
at org.apache.kafka.connect.cli.ConnectDistributed.startConnect(ConnectDistributed.java:91)
at org.apache.kafka.connect.cli.ConnectDistributed.main(ConnectDistributed.java:78)
[2022-04-13 06:27:09,520] INFO Registered loader: PluginClassLoader{pluginLocation=file:/my/path/} (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
Apr 13, 2022 6:27:17 AM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource will be ignored.
```
I am able to send all the above formats to Splunk, but some behave differently. Is there any format with which I can handle all of them? If I got a pattern-not-match error I could have added another format, but I don't.
Try this:
```
\[(?<timestamp>[^ ]* [^ ]*)\] (?<level>[^ ]*) (?<message>[\W\w]+)
```
.* stops at a new line. [\W\w]+ will capture your whole stack trace in the message field.
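Applied to the source block from the question, only the format1 line changes; a sketch with paths and tag kept as-is:
```
<source>
  @type tail
  path /tmp/LOG_SPLUNK.*
  pos_file /tmp/my-splunk.pos
  path_key log_type
  read_from_head true
  tag "splunk.#log.mylogs"
  format multiline
  format_firstline /^\[/
  # [\W\w]+ keeps matching across newlines, so the stack trace
  # lands in the message field of the preceding record
  format1 /\[(?<timestamp>[^ ]* [^ ]*)\] (?<level>[^ ]*) (?<message>[\W\w]+)/
  time_type string
  time_key timestamp
  time_format %Y-%m-%d %H:%M:%S,%N
  keep_time_key true
</source>
```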

Google Fluentd Decode Base64

I have a log file with records between two tags, RecordStart and RecordEnd; the recorded message is base64-encoded. I want to decode the message using google-fluentd so it can be sent to other services.
My config:
```
<source>
  @type tail
  path <path_ot>/metrics.log
  pos_file /var/lib/google-fluentd/pos/metrics.pos
  read_from_head true
  format multiline
  multiline_flush_interval 2s
  format_firstline /^RecordStart/
  format1 /^RecordStart\n(?<record>(\n|.)*)RecordEnd$/
  tag presto_server
</source>
```
I am not able to figure out how to decode the base64. Any help?
Try using a filter plugin to decode the base64 records.
Your config file in this case may look like this:
```
<source>
  @type tail
  path <path_ot>/metrics.log
  pos_file /var/lib/google-fluentd/pos/metrics.pos
  read_from_head true
  format multiline
  multiline_flush_interval 2s
  format_firstline /^RecordStart/
  format1 /^RecordStart\n(?<record>(\n|.)*)RecordEnd$/
  tag presto_server
</source>

<filter presto_server>
  type base64_decode
  fields mesg
</filter>
```
This is an adaptation of the config file I found here.
You may also find this documentation helpful: How to modify log records ingested by fluentd.
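If that plugin isn't available, fluentd's built-in record_transformer filter with enable_ruby is another option; a minimal sketch, assuming the captured field is named record (the capture-group name in format1 above) and that Ruby's base64 stdlib can be required inside the placeholder:
```
<filter presto_server>
  @type record_transformer
  enable_ruby true
  <record>
    # decode the base64 payload captured by the multiline regex
    decoded ${require "base64"; Base64.decode64(record["record"])}
  </record>
</filter>
```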

Fluentd Regular Expression Matching Error

I am trying to parse logs from Kubernetes, like this one for example:
2018-08-14 13:21:20.013 [INFO][67] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}
And this is the configuration:
```
<source>
  @id calico-node.log
  @type tail
  format /^(?<time>[^ ]* [^ ,]*)[^\[]*\[[^\]]*\]\[(?<severity>[^ \]]*) *\] (?<message>.*)$/
  time_format %Y-%m-%d %H:%M:%S
  path /var/log/containers/calico-node**.log
  pos_file /var/log/es-calico.pos
  tag calico-node
</source>
```
According to regex101.com, this pattern should match this string. However, I get an error from fluentd while trying to parse it:
```
2018-08-14 13:21:20 +0000 [warn]: [calico-node.log] "{\"log\":\"2018-08-14 13:21:20.013 [INFO][67] health.go 150: Overall health summary=\\u0026health.HealthReport{Live:true, Ready:true}\\n\",\"stream\":\"stdout\",\"time\":\"2018-08-14T13:21:20.013908223Z\"}" error="invalid time format: value = {\"log\":\"2018-08-14 13:21:20.013, error_class = ArgumentError, error = string doesn't match"
```
What could be wrong? I have had similar errors with the built-in parser for apache logs as well.
From what I can see, you are missing something in the fluentd config.
Your time_format %Y-%m-%d %H:%M:%S will not work with the timestamp 2018-08-14 13:21:20.013, as it's missing .%3N.
It should be as follows:
time_format %Y-%m-%d %H:%M:%S.%3N or time_format %Y-%m-%d %H:%M:%S.%L
Just faced a similar issue.
I think @Crou's answer is correct, but maybe try %N instead.
According to the documentation, the fluentd parser does not support %3N, %6N, %9N, or %L:
https://docs.fluentd.org/configuration/parse-section
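Putting the two answers together, a sketch of the adjusted source block (everything except time_format kept from the question; use %N if your parser rejects %3N and %L):
```
<source>
  @id calico-node.log
  @type tail
  format /^(?<time>[^ ]* [^ ,]*)[^\[]*\[[^\]]*\]\[(?<severity>[^ \]]*) *\] (?<message>.*)$/
  # include the fractional seconds so 13:21:20.013 parses
  time_format %Y-%m-%d %H:%M:%S.%N
  path /var/log/containers/calico-node**.log
  pos_file /var/log/es-calico.pos
  tag calico-node
</source>
```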

FluentD regex parsed values not appearing in Elasticsearch

So I saw there were a few other questions of this type, but none seemed to solve my issue.
I am attempting to take Spring Boot logs from files, parse out useful information, send the result to Elasticsearch, and ultimately read it from Kibana. My fluentd.conf looks like the following:
```
<source>
  type tail
  read_from_head true
  path /path/to/log/
  pos_file /path/to/pos_file
  format /^(?<date>[0-9]+-[0-9]+-[0-9]+\s+[0-9]+:[0-9]+:[0-9]+.[0-9]+)\s+(?<log_level>[Aa]lert|ALERT|[Tt]race|TRACE|[Dd]ebug|DEBUG|[Nn]otice|NOTICE|[Ii]nfo|INFO|[Ww]arn?(?:ing)?|WARN?(?:ING)?|[Ee]rr?(?:or)?|ERR?(?:OR)?|[Cc]rit?(?:ical)?|CRIT?(?:ICAL)?|[Ff]atal|FATAL|[Ss]evere|SEVERE|EMERG(?:ENCY)?|[Ee]merg(?:ency)?)\s+(?<pid>[0-9]+)\s+---\s+(?<message>.*)$/
  tag my.app
</source>

<match my.app>
  type stdout
</match>

<match my.app>
  type elasticsearch
  logstash_format true
  host myhosthere
  port 9200
  index_name fluentd-app
  type_name fluentd
</match>
```
Given a typical Springboot log line:
2015-07-16 19:20:04.074 INFO 16649 --- [ main] {springboot message}
By also writing to stdout as a test, I can see my parser is producing:
```
{
  "date":"2015-07-16 19:20:04.074",
  "log_level":"INFO",
  "pid":"16649",
  "message":"[ main] {springboot message}"
}
```
However, when this gets written to Elasticsearch, all that results is:
```
{
  _index: "fluentd-app-2015.07.16",
  _type: "fluentd",
  _id: "AU6YT5sjvkxiJXWCxeM8",
  _score: 1,
  _source: {
    message: "2015-07-16 19:20:04.074 INFO 16649 --- [ main] {springboot message}",
    @timestamp: "2015-07-16T19:20:04+00:00"
  }
},
```
From what I had read about fluent-plugin-elasticsearch, I expected _source to contain all of the parsed fields that I see in stdout. I have also tried the grok parser, though it seems the issue lies with my understanding of the fluentd Elasticsearch plugin. How do I get the fields I parsed to persist to Elasticsearch?
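One thing worth noting about the config above: fluentd routes an event only to the first <match> block whose pattern matches its tag, so with two <match my.app> blocks the elasticsearch output never receives anything. A sketch using the built-in copy output to fan out to both stores (host and port kept from the question):
```
<match my.app>
  @type copy
  <store>
    # keep the stdout sink for debugging
    @type stdout
  </store>
  <store>
    @type elasticsearch
    logstash_format true
    host myhosthere
    port 9200
  </store>
</match>
```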

What effect do SPF-record qualifiers have if they are included in another SPF record, when the parent record is stricter than the child?

I want to put a strict FAIL qualifier on all email sources that are not listed explicitly in the SPF record of my domain.
This can simply be accomplished with the following record (the -all designates that all other sources should not be accepted):
mydomain.com. IN TXT "v=spf1 ip4:my-ip-address/32 -all"
Now my problem is that I additionally want to whitelist my email provider (mailgun.com) as well as Google Apps, so I created the following record:
mydomain.com. IN TXT "v=spf1 include:mailgun.org include:_spf.google.com ip4:my-ip-address/32 -all"
Now the SPF record of mailgun.com (in the case of Google the same situation applies) resolves to:
mailgun.org. 3600 IN TXT "v=spf1 ip4:173.193.210.32/27 ip4:50.23.218.192/27 ip4:174.37.226.64/27 ip4:208.43.239.136/30 ip4:50.23.215.176/30 ip4:184.173.105.0/24 ip4:184.173.153.0/24 ip4:209.61.151.0/24 ip4:166.78.68.0/22 ip4:198.61.254.0/23 ip4:192.237.158.0/23 " "~all"
Now what's interesting is that they put a soft-fail qualifier "~all" on their SPF record.
Wikipedia describes the include directive as follows:
"If the included (a misnomer) policy passes the test this mechanism matches. This is typically used to include policies of more than one ISP."
I interpret this in the way that an unknown sender is qualified as SOFTFAIL by the included records, and therefore passes as SOFTFAIL because they are included in the root record, even if the root record puts a FAIL on all non-included sources. So the included records effectively render the FAIL qualifier of the root record useless, and the least strict record defines the overall qualifier for unknown sources.
Am I correct in this assumption? If not, how is an unknown sender qualified in the given example?
The behaviour is described in section 5.2 of the RFC, where it says:
"Whether this mechanism matches, does not match, or throws an exception depends on the result of the recursive evaluation of check_host():"
```
+---------------------------------+---------------------------------+
| A recursive check_host() result | Causes the "include" mechanism  |
| of:                             | to:                             |
+---------------------------------+---------------------------------+
| Pass                            | match                           |
|                                 |                                 |
| Fail                            | not match                       |
|                                 |                                 |
| SoftFail                        | not match                       |
|                                 |                                 |
| Neutral                         | not match                       |
|                                 |                                 |
| TempError                       | throw TempError                 |
|                                 |                                 |
| PermError                       | throw PermError                 |
|                                 |                                 |
| None                            | throw PermError                 |
+---------------------------------+---------------------------------+
```
The mechanism in this context refers to the "include" functionality.
As shown in the table, a SoftFail causes a not-match.
It also says:
"In hindsight, the name "include" was poorly chosen. Only the evaluated result of the referenced SPF record is used, rather than acting as if the referenced SPF record was literally included in the first."
I interpret this to mean that only the result of the included record is relevant, which, in the case of a soft fail, is a not-match (the same as if the record had a FAIL qualifier).
Here's also a test result with the pyspf library, performed on this website:
```
Input accepted, querying now...
Mail sent from this IP address: 1.2.3.4
Mail from (Sender): scknpbi@cacxjxv.com
Mail checked using this SPF policy: v=spf1 ip4:4.5.6.7/32 include:mailgun.org -all
Results - FAIL Message may be rejected
Mail sent from: 1.2.3.4
Mail Server HELO/EHLO identity: blanivzsrxvbla@saucjw.com
HELO/EHLO Results - none
```
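For reference, the same check can be reproduced locally; a minimal sketch, assuming the pyspf package (imported as spf) is installed along with a DNS backend such as dnspython, and noting that the addresses below are the anonymized placeholders from the test output, so a real run needs a domain that actually publishes the policy:
```python
# pip install pyspf dnspython
import spf

# check2 performs live DNS lookups and returns a (result, explanation) pair,
# e.g. ('fail', ...) for an IP address the published policy does not list.
result, explanation = spf.check2(
    i='1.2.3.4',              # connecting IP address
    s='scknpbi@cacxjxv.com',  # envelope sender (MAIL FROM)
    h='saucjw.com',           # HELO/EHLO hostname (domain part of the placeholder)
)
print(result, explanation)
```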