AWS logs agent setup - amazon-web-services

We recently set up the AWS Logs agent on one of our test servers. Our log files usually contain multi-line events, e.g. one of our log events is:
[10-Jun-2016 07:30:16 UTC] SQS Post Response: Array
(
[Status] => 200
[ResponseBody] => <?xml version="1.0"?><SendMessageResponse xmlns="http://queue.amazonaws.com/doc/2009-02-01/"><SendMessageResult><MessageId>053c7sdf5-1e23-wa9d-99d8-2a0cf9eewe7a</MessageId><MD5OfMessageBody>8e542d2c2a1325a85eeb9sdfwersd58f</MD5OfMessageBody></SendMessageResult><ResponseMetadata><RequestId>4esdfr30-c39b-526b-bds2-14e4gju18af</RequestId></ResponseMetadata></SendMessageResponse>
)
The log agent reference documentation says to use the 'multi_line_start_pattern' option for such logs. Our AWS Logs agent config is as follows:
[httpd_info.log]
file = /var/log/httpd/info.log*
log_stream_name = info.log
initial_position = start_of_file
log_group_name = test.server.name
multi_line_start_pattern = '(\[)+\d{2}-[a-zA-Z]{3}+-\d{4}'
However, the agent splits this and similar events when reporting them. This is how the event above shows up in CloudWatch Logs:
Event 1:
[10-Jun-2016 11:21:26 UTC] SQS Post Response: Array
Event 2:
( [Status] => 200 [ResponseBody] => <?xml version="1.0"?><SendMessageResponse xmlns="http://queue.amazonaws.com/doc/2009-02-01/"><SendMessageResult><MessageId>053c7sdf5-1e23-wa9d-99d8-2a0cf9eewe7a</MessageId><MD5OfMessageBody>8e542d2c2a1325a85eeb9sdfwersd58f</MD5OfMessageBody></SendMessageResult><ResponseMetadata><RequestId>4esdfr30-c39b-526b-bds2-14e4gju18af</RequestId></ResponseMetadata></SendMessageResponse>
Event 3:
)
This happens even though it is only a single event. Any clue what's going on here?

I think all you need to do is add the following to your awslogs.conf:
datetime_format = %d-%b-%Y %H:%M:%S UTC
time_zone = UTC
multi_line_start_pattern = {datetime_format}
http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AgentReference.html
multi_line_start_pattern
Specifies the pattern for identifying the start of a log message. A log message is made of a line that matches the pattern and any following lines that don't match the pattern. The valid values are regular expression or {datetime_format}. When using {datetime_format}, the datetime_format option should be specified. The default value is '^[^\s]' so any line that begins with non-whitespace character closes the previous log message and starts a new log message.
If that datetime format doesn't work, you will need to update your regex so that it actually matches your specific datetime. I don't think the one you listed above works for your format.
You could try this, for instance (note the escaped square brackets, which would otherwise be read as a character class):
\[\d{2}-\w{3}-\d{4}\s\d{2}:\d{2}:\d{2}\s\w+\]
which matches
[10-Jun-2016 11:21:26 UTC]
See here: http://www.regexpal.com/?fam=96811
Once that is done, restart the service and check whether it's parsing correctly.
$ sudo service awslogs restart
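Before restarting the agent, the grouping rule quoted from the documentation (a matching line opens a new event, non-matching lines are appended to the previous one) can be approximated offline. The following is a minimal Python sketch of that rule, not the agent's actual implementation; the name group_events, the shortened XML body, and the second "Next event" line are made up for illustration, and the pattern is a bracket-escaped regex for timestamps like [10-Jun-2016 11:21:26 UTC].

```python
import re

# Assumed start pattern: a bracketed timestamp such as [10-Jun-2016 11:21:26 UTC].
START_PATTERN = re.compile(r"\[\d{2}-\w{3}-\d{4}\s\d{2}:\d{2}:\d{2}\s\w+\]")


def group_events(lines, start_pattern=START_PATTERN):
    """Group raw log lines into multi-line events.

    A line matching start_pattern opens a new event; every other line is
    appended to the event that is currently open.
    """
    events = []
    for line in lines:
        if start_pattern.match(line) or not events:
            events.append([line])
        else:
            events[-1].append(line)
    return ["\n".join(event) for event in events]


# Shortened, partly hypothetical sample based on the log in the question.
sample = [
    "[10-Jun-2016 07:30:16 UTC] SQS Post Response: Array",
    "(",
    "[Status] => 200",
    "[ResponseBody] => <?xml version=\"1.0\"?>...",
    ")",
    "[10-Jun-2016 07:30:17 UTC] Next event",
]
events = group_events(sample)
print(len(events))
```

If the pattern is right, the five lines of the SQS response end up in one event and the hypothetical follow-up line starts a second one.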

Related

How to disable JSON format and send only the log message to Sumologic with Fluentbit?

We are using Fluent Bit as a sidecar container in our ECS Fargate cluster, which runs a .NET application. Initially Fluent Bit was splitting multi-line logs into separate events, and we solved that with the Fluent Bit multiline feature. The logs now reach Sumo Logic as multi-line events, but they are sent in JSON format, whereas we want Fluent Bit to send only the raw log line.
The logs currently look like this:
{
date:1675120653.269619,
container_id:"xvgbertytyuuyuyu",
container_name:"XXXXXXXXXX",
source:"stdout",
log:"2023-01-30 23:17:33.269Z DEBUG [.NET ThreadPool Worker] Connection.ManagedDbConnection - ComponentInstanceEntityAsync - Executing stored proc: dbo.prcGetComponentInstance"
}
We want only the line
2023-01-30 23:17:33.269Z DEBUG [.NET ThreadPool Worker] Connection.ManagedDbConnection - ComponentInstanceEntityAsync - Executing stored proc: dbo.prcGetComponentInstance
You need to modify Fluent Bit configuration to have the following filters and output configuration:
fluent.conf:
## prepare headers for Sumo Logic
[FILTER]
    Name   record_modifier
    Match  *
    Record headers.content-type text/plain
## Set headers as headers attribute
[FILTER]
    Name          nest
    Match         *
    Operation     nest
    Wildcard      headers.*
    Nest_under    headers
    Remove_prefix headers.
[OUTPUT]
    Name        http
    ...
    # use log key as body
    body_key    $log
    # use headers key as headers
    headers_key $headers
That way, you craft the HTTP request manually. This sends one request per log record, which is not necessarily a good idea. To mitigate that, you can add the following parser and use it (flush_timeout may need adjusting):
parsers.conf:
# merge everything as one big log
[MULTILINE_PARSER]
    name          multiline-all
    type          regex
    flush_timeout 500
    #
    # Regex rules for multiline parsing
    # ---------------------------------
    #
    # configuration hints:
    #
    #  - first state always has the name: start_state
    #  - every field in the rule must be inside double quotes
    #
    # rules |   state name  | regex pattern | next state
    # ------|---------------|---------------|-----------
    rule      "start_state"   ".*"            "cont"
    rule      "cont"          ".*"            "cont"
fluent.conf:
[INPUT]
    name             tail
    ...
    multiline.parser multiline-all
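To see what the two filters in this answer do to a record before the http output picks out the body and headers, here is a rough Python model. It is only a sketch of the intended record transformation under my reading of the record_modifier and nest options, not Fluent Bit code; the helper names and the shortened log line are made up for illustration.

```python
def record_modifier(record, key, value):
    """Model of record_modifier's 'Record <key> <value>': add one key to the record."""
    return {**record, key: value}


def nest(record, prefix, nest_under):
    """Model of 'Operation nest' with Wildcard <prefix>*, Nest_under and Remove_prefix."""
    nested = {k[len(prefix):]: v for k, v in record.items() if k.startswith(prefix)}
    rest = {k: v for k, v in record.items() if not k.startswith(prefix)}
    if nested:
        rest[nest_under] = nested
    return rest


# Shortened version of the record shown in the question.
record = {
    "source": "stdout",
    "log": "2023-01-30 23:17:33.269Z DEBUG [.NET ThreadPool Worker] ...",
}
record = record_modifier(record, "headers.content-type", "text/plain")
record = nest(record, prefix="headers.", nest_under="headers")

body = record["log"]         # the field that body_key $log selects
headers = record["headers"]  # the map that headers_key $headers selects
print(headers)
print(body)
```

The point of the nest step is that the output needs the headers as a single map-valued key, while the body stays a plain string.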

set alertmanager to distribute alerts to different channel by job name

I want to send my alerts to two different distribution lists in Alertmanager for Prometheus. The only way to distinguish my alerts is by their job name.
My alerts look like this:
sample1:
Labels
alertname = SyslogErrors
instance = 22.32.23.32:2324
job = my-job-sample-service-dev
message = Exception raised during message subscription. Trying again in 60 seconds
monitor = server1
severity = critical
Annotations
description = Errors have been found for my-job-sample-service-dev application in /data/logs/messages/my-job-sample-service-dev syslog file
Source
sample2:
Labels
alertname = SyslogErrors
instance = 22.32.23.32:2324
job = my-job-sample-service-pre-dev
message = Exception raised during message subscription. Trying again in 60 seconds
monitor = server1
severity = critical
Annotations
description = Errors have been found for my-job-sample-service-pre-dev application in /data/logs/messages/my-job-sample-service-pre-dev syslog file
Source
Here is my sample alertmanager config file:
global:
  smtp_smarthost: 'mail.server.com:25'
  smtp_from: 'dev@server.com'
  smtp_require_tls: false
templates:
  - '/etc/alertmanager/template/*.tmpl'
route:
  receiver: mail-receiver-dev
  group_by: ['alertname']
  group_wait: 3s
  group_interval: 5s
  repeat_interval: 1h
  # All alerts that do not match the following child routes
  # will remain at the root node and be dispatched to 'default-receiver'.
  routes:
    - receiver: 'mail-pre-dev'
      group_wait: 10s
      match_re:
        - job = .*pre-dev.*
    - receiver: 'mail-dev'
      group_wait: 10s
      match_re:
        - job = .*dev.*
receivers:
  - name: 'mail-dev'
    email_configs:
      - to: 'dev-group@server.com'
        send_resolved: true
  - name: 'mail-pre-dev'
    email_configs:
      - to: 'pre-dev-group@server.com'
        send_resolved: true
I am using the below link as a reference:
reference
Testing config file link
testscript for using above link: {service="foo-service",severity="critical",job="my-job-sample-service-dev"}
So the question is: how do I send an alert to a different channel by using a regex on the job name? At the moment, when I test, all the alerts go to pre-dev.
Change the following:
match_re:
  - job = .*pre-dev.*
To:
matchers:
  - job =~ ".*pre-dev.*"
Note:
"match_re" is deprecated and should be replaced by "matchers", but if you still want to use it, it takes a map of label name to regex rather than a list, so the correct syntax is:
match_re:
  job: ".*pre-dev.*"
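Once the matcher syntax is valid, the order of the child routes still matters, because ".*dev.*" also matches the pre-dev job names. The following Python sketch models first-match routing over the two child routes from the question; it is a simplified illustration (it ignores "continue" and nested routes) rather than Alertmanager's implementation, and pick_receiver is a made-up name. It relies on two behaviors I am confident of: regex matchers are anchored to the whole label value, and an alert stops at the first matching sibling route unless "continue" is set.

```python
import re

# Child routes in the order they appear in the config: (job regex, receiver).
ROUTES = [
    (".*pre-dev.*", "mail-pre-dev"),
    (".*dev.*", "mail-dev"),
]
DEFAULT_RECEIVER = "mail-receiver-dev"  # receiver of the root route


def pick_receiver(labels, routes=ROUTES, default=DEFAULT_RECEIVER):
    """Return the receiver of the first child route whose regex matches the job label."""
    job = labels.get("job", "")
    for pattern, receiver in routes:
        # fullmatch models the anchoring of the regex to the whole label value.
        if re.fullmatch(pattern, job):
            return receiver
    return default


dev = pick_receiver({"job": "my-job-sample-service-dev"})
pre_dev = pick_receiver({"job": "my-job-sample-service-pre-dev"})
print(dev, pre_dev)
```

With the routes in this order the two sample jobs go to different receivers. If the two entries were swapped, the pre-dev job would be caught by ".*dev.*" first, so the more specific route should stay on top.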

How do I consume json logs inside Fargate using CDK?

I have a docker container running in Fargate that emits json logs to the console using log4j-layout-template.
The logs emitted look like this:
{"@timestamp":"2022-03-22T09:08:16.838Z","ecs.version":"1.2.0","log.level":"INFO","message":"Server version name: Apache Tomcat/8.5.76","process.thread.name":"main","log.logger":"org.apache.catalina.startup.VersionLoggerListener"}
{"@timestamp":"2022-03-22T09:08:16.838Z","ecs.version":"1.2.0","log.level":"INFO","message":"Server built: Feb 23 2022 17:59:11 UTC","process.thread.name":"main","log.logger":"org.apache.catalina.startup.VersionLoggerListener"}
I configure my CDK with the following:
var def = ingestGatewayTaskDefinition.addContainer(
    id + "Container",
    ContainerDefinitionOptions
        .builder()
        .image(fromEcrRepository(ecrRepository))
        .memoryLimitMiB(memory)
        .cpu(cpu)
        .environment(environment)
        .secrets(secrets)
        .logging(
            LogDriver.awsLogs(
                AwsLogDriverProps
                    .builder()
                    .logGroup(
                        LogGroup.Builder
                            .create(this, props.getServiceName())
                            .logGroupName("dev/" + props.getServiceName())
                            .retention(RetentionDays.ONE_DAY)
                            .build()
                    )
                    .streamPrefix("dev/" + props.getServiceName())
                    //.datetimeFormat("%Y-%m-%dT%H:%M:%SZ") //??
                    .build()
            )
        )
        .build()
);
But in CloudWatch the message portion is the raw JSON; it is not parsed into fields, although the fields should be discoverable.
How do I parse these fields?
This is what it ends up looking like:
What I am looking for in CloudWatch is this:
@timestamp                ecs.version  log.level  message                  log.logger
2022-03-22T09:08:16.838Z  1.2.0        INFO       Server version name:...  org.apache...
2022-03-22T09:08:16.838Z  1.2.0        INFO       Server built:...         org.apache...
There's nothing wrong with the parsing; your events are being parsed correctly.
The following query should work correctly:
fields @timestamp, @message
| filter log.level="INFO"
| sort @timestamp desc
The Log Stream UI does not show the inferred nested structure, but it's still available for querying.

fluentd regexp to extract events from a log file

I'm new to fluentd.
I have a log that I want to push to AWS with fluentd but I can't figure out what the regexp should be.
All the log lines, except the multilines, start with a UUID.
Here's a sample log:
6b0815f2-8ff1-4181-a4e6-058148288281 2020-11-03 13:00:05.976366 [DEBUG] switch_core_state_machine.c:611 (some_other_data) State Change CS_REPORTING -> CS_DESTROY
And, I'm trying to get UUID, DateTime, and Message.
With this regex:
/^(?<UUID>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}) (?<time>.*) (?<message>[^ ]*)/gm
I'm getting the last word CS_DESTROY.
I tried fluentular and still got:
text:
f6a6e1ae-e52e-4aba-a8a5-4e3cc7f40914 2020-11-03 14:32:34.975779 [CRIT] mod_dptools.c:1866 audio3: https://mydomain.s3-eu-west-1.amazonaws.com/media/576d06e5-04fc-11eb-a52c-020fd8c14d18/5f9ddf2d5df0f698094395.mpg
regexp:
^(?<UUID>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}) (?<time>.*) (?<message>[^ ]*)$
and got:
time 2020/11/03 14:32:34 +0000
UUID f6a6e1ae-e52e-4aba-a8a5-4e3cc7f40914
message https://mydomain.s3-eu-west-1.amazonaws.com/media/576d06e5-04fc-11eb-a52c-020fd8c14d18/5f9ddf2d5df0f698094395.mpg
It's missing what's between the datetime and "https".
Try:
^(?<UUID>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}) (?<time>[^\[]*) (?<message>\[.*)$
Live at rubular: https://rubular.com/r/JQQXs5VTkr2IxM
Here's the output for both logs:
Match 1
UUID 6b0815f2-8ff1-4181-a4e6-058148288281
time 2020-11-03 13:00:05.976366
message [DEBUG] switch_core_state_machine.c:611 (some_other_data) State Change CS_REPORTING -> CS_DESTROY
Match 2
UUID f6a6e1ae-e52e-4aba-a8a5-4e3cc7f40914
time 2020-11-03 14:32:34.975779
message [CRIT] mod_dptools.c:1866 audio3: https://mydomain.s3-eu-west-1.amazonaws.com/media/576d06e5-04fc-11eb-a52c-020fd8c14d18/5f9ddf2d5df0f698094395.mpg
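The original pattern failed because "(?<time>.*)" is greedy and runs up to the last space, leaving only the final word for "[^ ]*". The corrected pattern can also be checked outside fluentd; the following is a small Python sketch of that check. Note that Python spells named groups "(?P<name>...)" rather than Ruby's "(?<name>...)", so the pattern below is a transliteration for testing only and the fluentd config should keep the Ruby form.

```python
import re

# Same pattern as in the answer, with Python-style named groups.
LINE_RE = re.compile(
    r"^(?P<UUID>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}) "
    r"(?P<time>[^\[]*) "
    r"(?P<message>\[.*)$"
)

line = (
    "6b0815f2-8ff1-4181-a4e6-058148288281 2020-11-03 13:00:05.976366 "
    "[DEBUG] switch_core_state_machine.c:611 (some_other_data) "
    "State Change CS_REPORTING -> CS_DESTROY"
)
match = LINE_RE.match(line)
fields = match.groupdict() if match else {}
print(fields["time"])
print(fields["message"])
```

The time group stops at the first "[" because "[^\[]*" cannot cross it, so everything from the log level onward lands in message.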

CloudWatch logs acting weird

I have two log files with multi-line log statements. Both of them have the same datetime format at the beginning of each log statement. The configuration looks like this:
state_file = /var/lib/awslogs/agent-state
[/opt/logdir/log1.0]
datetime_format = %Y-%m-%d %H:%M:%S
file = /opt/logdir/log1.0
log_stream_name = /opt/logdir/logs/log1.0
initial_position = start_of_file
multi_line_start_pattern = {datetime_format}
log_group_name = my.log.group
[/opt/logdir/log2-console.log]
datetime_format = %Y-%m-%d %H:%M:%S
file = /opt/logdir/log2-console.log
log_stream_name = /opt/logdir/log2-console.log
initial_position = start_of_file
multi_line_start_pattern = {datetime_format}
log_group_name = my.log.group
The CloudWatch Logs agent sends the log1.0 logs to my log group correctly; however, it is not sending the logs for log2-console.log.
awslogs.log says:
2016-11-15 08:11:41,308 - cwlogs.push.batch - WARNING - 3593 - Thread-4 - Skip event: {'timestamp': 1479196444000, 'start_position': 42330916L, 'end_position': 42331504L}, reason: timestamp is more than 2 hours in future.
2016-11-15 08:11:41,308 - cwlogs.push.batch - WARNING - 3593 - Thread-4 - Skip event: {'timestamp': 1479196451000, 'start_position': 42331504L, 'end_position': 42332092L}, reason: timestamp is more than 2 hours in future.
The server time is correct, though. Another weird thing is that the line numbers mentioned in start_position and end_position do not exist in the actual log file being pushed.
Anyone else experiencing this issue?
I was able to fix this.
The state of awslogs was broken. The state is stored in a sqlite database in /var/awslogs/state/agent-state. You can access it via
sudo sqlite3 /var/awslogs/state/agent-state
sudo is needed to have write access.
List all streams with
select * from stream_state;
Look up your log stream and note the source_id which is part of a json data structure in the v column.
Then, list all records with this source_id (in my case it was 7675f84405fcb8fe5b6bb14eaa0c4bfd) in the push_state table
select * from push_state where k="7675f84405fcb8fe5b6bb14eaa0c4bfd";
The resulting record has a JSON data structure in the v column which contains a batch_timestamp, and this batch_timestamp seems to be wrong. It was in the past, and any log entries more than 2 hours newer than it were no longer processed.
The solution is to update this record. Copy the v column, replace the batch_timestamp with the current timestamp and update with something like
update push_state set v='... insert new value here ...' where k='7675f84405fcb8fe5b6bb14eaa0c4bfd';
Restart the service with
sudo /etc/init.d/awslogs restart
I hope it works for you!
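The manual copy, edit, and update of the v column described above can also be scripted. The following is a minimal Python sketch under stated assumptions: the table and column names (push_state, k, v) and the batch_timestamp key are taken from this answer, the v column is assumed to hold JSON text, and the default of epoch milliseconds is my guess at the unit, so check it against the existing value first. The function name reset_batch_timestamp is made up. Stop the agent and back up the state file before trying anything like this.

```python
import json
import sqlite3
import time


def reset_batch_timestamp(db_path, source_id, new_timestamp=None):
    """Rewrite batch_timestamp in the push_state row whose key is source_id.

    Returns the previous batch_timestamp, or None if no such row exists.
    """
    if new_timestamp is None:
        # Assumption: the agent stores epoch milliseconds.
        new_timestamp = int(time.time() * 1000)
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT v FROM push_state WHERE k = ?", (source_id,)
        ).fetchone()
        if row is None:
            return None
        state = json.loads(row[0])
        previous = state.get("batch_timestamp")
        state["batch_timestamp"] = new_timestamp
        conn.execute(
            "UPDATE push_state SET v = ? WHERE k = ?",
            (json.dumps(state), source_id),
        )
        conn.commit()
        return previous
    finally:
        conn.close()
```

A call would look like reset_batch_timestamp("/var/awslogs/state/agent-state", "7675f84405fcb8fe5b6bb14eaa0c4bfd"), run with enough privileges to write the file, followed by the service restart.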
We had the same issue, and the following steps fixed it.
If log groups are not updating with the latest events, run these steps:
Stop the awslogs service.
Delete the file /var/awslogs/state/agent-state.
Update the /var/awslogs/etc/awslogs.conf configuration to use the instance ID instead of the hostname, e.g. change
log_stream_name = {hostname}
to
log_stream_name = {instance_id}
Start the awslogs service.
I was able to resolve this issue on Amazon Linux by:
sudo yum reinstall awslogs
sudo service awslogs restart
This method retained my config files in /var/awslogs/, though you may wish to back them up before a reinstall.
Note: In my troubleshooting, I had also deleted my Log Group via the AWS Console. The restart fully reloaded all historical logs, but at the present timestamp, which is of less value. I'm unsure whether deleting the Log Group was necessary for this method to work. You might want to look at setting the initial_position config to end_of_file before you restart.
I found the reason. The time zone in my Docker container was inconsistent with the time zone of my host computer. After setting the two time zones to be consistent, the problem was solved.
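A time zone mismatch like this produces the "more than 2 hours in future" warning because the log timestamps carry no zone, so whoever parses them has to assume one. The following Python sketch illustrates the effect with made-up numbers (a container writing in UTC+8 and a host assuming UTC); it is not how the agent parses timestamps, and to_epoch_ms is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # same format as datetime_format in the question


def to_epoch_ms(text, utc_offset_hours):
    """Interpret a zone-less log timestamp as local time at the given UTC offset."""
    naive = datetime.strptime(text, DATETIME_FORMAT)
    aware = naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return int(aware.timestamp() * 1000)


stamp = "2016-11-15 16:11:41"         # hypothetical line written by a container in UTC+8
as_container = to_epoch_ms(stamp, 8)  # the instant the writer meant
as_host = to_epoch_ms(stamp, 0)       # the instant a reader assuming UTC would compute
skew_hours = (as_host - as_container) / 3_600_000
print(skew_hours)
```

Here the reader places the event eight hours later than it really happened, which is well past a two-hour tolerance.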