Logs from only some files showing up in AWS CloudWatch

I configured the AWS CloudWatch Logs agent on my Linux instance. In the config file I set it to keep track of 3 log files:
[general]
state_file = /var/lib/awslogs/agent-state
[plugins]
cwlogs = cwlogs
[default]
region = us-west-1
[/var/log/cron]
file = /var/log/cron
log_group_name = /var/log/cron
log_stream_name = {instance_id}
datetime_format = %b %d %H:%M:%S
[/var/log/messages]
file = /var/log/messages
log_group_name = /var/log/messages
log_stream_name = {instance_id}
datetime_format = %b %d %H:%M:%S
[/var/log/test.log]
file = /var/log/test.log
log_group_name = /var/log/test.log
log_stream_name = {instance_id}
datetime_format = %b %d %H:%M:%S
However, in my console I'm only seeing logs showing up from messages. The permissions for the 3 files I'm trying to keep track of are -rw-------.
Does anybody know why this might be happening? I'm echoing test logs into each individual file and only the ones inserted into messages are showing up.
EDIT: Here is my awslogs.log
2016-08-25 17:58:31,227 - cwlogs.push - INFO - 631 - MainThread - Missing or invalid value for use_gzip_http_content_encoding config. Defaulting to using gzip encoding.
2016-08-25 17:58:31,228 - cwlogs.push - INFO - 631 - MainThread - Using default logging configuration.
2016-08-25 17:58:31,234 - cwlogs.push.stream - INFO - 631 - Thread-1 - Starting publisher for [d4a8beb9b6b4535cac41dc75f252df59, /var/log/messages]
2016-08-25 17:58:31,234 - cwlogs.push.stream - INFO - 631 - Thread-1 - Starting reader for [d4a8beb9b6b4535cac41dc75f252df59, /var/log/messages]
2016-08-25 17:58:31,235 - cwlogs.push.reader - INFO - 631 - Thread-4 - Replay events end at 52578.
2016-08-25 17:58:31,235 - cwlogs.push.reader - INFO - 631 - Thread-4 - Start reading file from 52284.
2016-08-25 17:58:32,308 - cwlogs.push.publisher - WARNING - 631 - Thread-2 - Caught exception: An error occurred (DataAlreadyAcceptedException) when calling the PutLogEvents operation: The given batch of log events has already been accepted. The next batch can be sent with sequenceToken: 49561203985967314162297491311273568778757530964511949634

It's possible your agent state file is corrupted because you kept making changes to the configuration. There are two ways to fix this:
Option 1: Use a new name for your configuration block header.
That is, change [/var/log/cron] to [/something/else].
Option 2: Delete the agent state file after stopping the service.
sudo service awslogs stop
sudo rm /var/lib/awslogs/agent-state
sudo service awslogs start
Please note that Option 2 may initially cause duplicate logs to be pushed to CloudWatch as a new state file is created.
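Before deleting anything, it can help to confirm which configured files the agent actually started reading. The following is a minimal Python sketch, not part of the agent: the function names are made up for illustration, and the regular expression assumes the "Starting reader for [...]" line format shown in the awslogs.log excerpt in the question.

```python
import configparser
import re

# Assumed line shape, taken from the awslogs.log excerpt in the question:
#   ... Starting reader for [<source id>, <file path>]
READER_RE = re.compile(r"Starting reader for \[[0-9a-f]+, ([^\]]+)\]")


def configured_files(conf_text):
    """Return the set of `file` values found in an awslogs-style config."""
    # interpolation=None: the '%b', '%d', ... codes in datetime_format are
    # not configparser interpolation markers and should be read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return {parser[name]["file"] for name in parser.sections()
            if "file" in parser[name]}


def files_with_reader(agent_log_text):
    """Return the file paths for which the agent logged a reader start."""
    return set(READER_RE.findall(agent_log_text))


def files_without_reader(conf_text, agent_log_text):
    """Configured files that never appear in a 'Starting reader' line."""
    return sorted(configured_files(conf_text) - files_with_reader(agent_log_text))
```

Passing the contents of your config file and of awslogs.log to files_without_reader lists the files that are configured but have no reader. With the excerpt in the question, only /var/log/messages has one.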

Set Alertmanager to distribute alerts to different channels by job name

I want to send my alert to two different distribution lists in Alertmanager for Prometheus. The only way to distinguish my alerts is by their job name.
My alerts look like this:
sample1:
Labels
alertname = SyslogErrors
instance = 22.32.23.32:2324
job = my-job-sample-service-dev
message = Exception raised during message subscription. Trying again in 60 seconds
monitor = server1
severity = critical
Annotations
description = Errors have been found for my-job-sample-service-dev application in /data/logs/messages/my-job-sample-service-dev syslog file
Source
sample2:
Labels
alertname = SyslogErrors
instance = 22.32.23.32:2324
job = my-job-sample-service-pre-dev
message = Exception raised during message subscription. Trying again in 60 seconds
monitor = server1
severity = critical
Annotations
description = Errors have been found for my-job-sample-service-pre-dev application in /data/logs/messages/my-job-sample-service-pre-dev syslog file
Source
Here is my sample Alertmanager config file:
global:
  smtp_smarthost: 'mail.server.com:25'
  smtp_from: 'dev@server.com'
  smtp_require_tls: false
templates:
- '/etc/alertmanager/template/*.tmpl'
route:
  receiver: mail-receiver-dev
  group_by: ['alertname']
  group_wait: 3s
  group_interval: 5s
  repeat_interval: 1h
  # All alerts that do not match the following child routes
  # will remain at the root node and be dispatched to 'default-receiver'.
  routes:
  - receiver: 'mail-pre-dev'
    group_wait: 10s
    match_re:
    - job = .*pre-dev.*
  - receiver: 'mail-dev'
    group_wait: 10s
    match_re:
    - job = .*dev.*
receivers:
- name: 'mail-dev'
  email_configs:
  - to: 'dev-group@server.com'
    send_resolved: true
- name: 'mail-pre-dev'
  email_configs:
  - to: 'pre-dev-group@server.com'
    send_resolved: true
I am using the below link as a reference:
reference
Testing config file link
testscript for using above link: {service="foo-service",severity="critical",job="my-job-sample-service-dev"}
So the question is: how do I send an alert to a different channel by using a regex on the job name? At the moment, when I test, all the alerts go to pre-dev.
Change the following:
match_re:
- job = .*pre-dev.*
To:
matchers:
- job =~ ".*pre-dev.*"
Note:
"match_re" is deprecated and should be replaced by "matchers", but if you still want to use it, it takes a map of label names to regexes rather than a list, so the correct syntax is:
match_re:
  job: ".*pre-dev.*"
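Note that the order of the two child routes matters, because ".*dev.*" also matches the pre-dev job name. The Python sketch below is only a simplified model of the routing, not Alertmanager code: it assumes that sibling routes are tried in order, that the first match wins (no "continue: true"), and that regexes are matched against the whole label value. Python's re module stands in for Go's regex engine here, and pick_receiver is a made-up name.

```python
import re

# (receiver, label, regex) in the same order as the child routes in the question.
ROUTES = [
    ("mail-pre-dev", "job", r".*pre-dev.*"),
    ("mail-dev", "job", r".*dev.*"),
]
DEFAULT_RECEIVER = "mail-receiver-dev"  # receiver of the root route


def pick_receiver(labels, routes=ROUTES, default=DEFAULT_RECEIVER):
    """Return the receiver of the first route whose regex matches the label."""
    for receiver, label, pattern in routes:
        # A missing label is treated as an empty string in this model.
        if re.fullmatch(pattern, labels.get(label, "")):
            return receiver
    return default


print(pick_receiver({"job": "my-job-sample-service-dev"}))      # mail-dev
print(pick_receiver({"job": "my-job-sample-service-pre-dev"}))  # mail-pre-dev
```

If the two routes were listed in the opposite order, the pre-dev alerts would be caught by ".*dev.*" first, so keeping the more specific route on top is the safer layout.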

awslogs all logs are being skipped

I have awslogs set up and it ships logs to CloudWatch. It works fine for a few hours and then suddenly stops.
Here is the log from awslogs.log
2020-07-06 14:58:27,701 - cwlogs.push.reader - WARNING - 23093 - Thread-6 - Fall back to previous event time: {'timestamp': 1594062573000, 'start_position': 85848600L, 'end_position': 85848777L}, previousEventTime: 1594062573000, reason: timestamp could not be parsed from message.
2020-07-06 14:58:27,701 - cwlogs.push.batch - WARNING - 23093 - Thread-6 - Skip event: {'timestamp': 1594062573000, 'start_position': 85848600L, 'end_position': 85848777L}, reason: timestamp is more than 2 hours in future.
2020-07-06 14:58:27,701 - cwlogs.push.reader - WARNING - 23093 - Thread-6 - Fall back to previous event time: {'timestamp': 1594062573000, 'start_position': 85848777L, 'end_position': 85848952L}, previousEventTime: 1594062573000, reason: timestamp could not be parsed from message.
2020-07-06 14:58:27,701 - cwlogs.push.batch - WARNING - 23093 - Thread-6 - Skip event: {'timestamp': 1594062573000, 'start_position': 85848777L, 'end_position': 85848952L}, reason: timestamp is more than 2 hours in future.
Here's my configuration in /var/awslogs/etc/config/api.conf
[/var/log/app.js/api.log]
datetime_format = %Y-%m-%d %H:%M:%S
buffer_duration = 5000
log_stream_name = {hostname}
initial_position = end_of_file
log_group_name = app-js-logs-prod
file = /var/log/app.js/api.log
[/root/.pm2/pm2.log]
datetime_format = %Y-%m-%d %H:%M:%S
buffer_duration = 5000
log_stream_name = {hostname}
initial_position = end_of_file
log_group_name = pm2-logs-prod
I'm not able to find out why the logs are skipped. Any help will be greatly appreciated.
EDIT:
timedatectl output:
root@ip-10-0-5-68:/home/ubuntu# timedatectl
Local time: Mon 2020-07-06 15:18:42 UTC
Universal time: Mon 2020-07-06 15:18:42 UTC
RTC time: Mon 2020-07-06 15:18:43
Time zone: Etc/UTC (UTC, +0000)
System clock synchronized: yes
systemd-timesyncd.service active: yes
My local time should be in IST. Is there something wrong with my system?
Before events are sent to CloudWatch, the agent checks the timestamp parsed from each line of your log file(s) against what it believes the current time is.
You should first validate that the system time is correct (in its current timezone) and ensure that the timezone of the dates in the logs matches the local timezone (this is the default).
If these match, restart the CloudWatch agent.
If that still does not fix it, follow some of the suggestions in this previous post.
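To illustrate how a timezone mismatch can trigger the "more than 2 hours in future" warning, here is a small Python sketch. It is a model of the check, not the agent's code. The sample log timestamp is hypothetical and assumes the application writes IST wall-clock times while the system runs on UTC. The 2-hour limit and the "now" value are taken from the question.

```python
from datetime import datetime, timedelta, timezone

MAX_FUTURE = timedelta(hours=2)  # limit quoted in the agent's "Skip event" warning
IST = timezone(timedelta(hours=5, minutes=30))
FMT = "%Y-%m-%d %H:%M:%S"  # datetime_format from the question's config


def parse_in_zone(text, fmt, tz):
    """Parse a timestamp that carries no zone and interpret it in `tz`."""
    return datetime.strptime(text, fmt).replace(tzinfo=tz)


def is_skipped(event_time, now):
    """True if the event lies more than MAX_FUTURE ahead of `now`."""
    return event_time - now > MAX_FUTURE


now = datetime(2020, 7, 6, 15, 18, 42, tzinfo=timezone.utc)  # from timedatectl
stamp = "2020-07-06 20:48:42"  # hypothetical line written in IST at that moment

print(is_skipped(parse_in_zone(stamp, FMT, timezone.utc), now))  # read as UTC: True
print(is_skipped(parse_in_zone(stamp, FMT, IST), now))           # read as IST: False
```

Read as UTC, the IST timestamp appears 5 hours 30 minutes ahead of the system clock, which is past the limit; read in its real zone it is not in the future at all.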

AWS Cloudwatch logs not working as expected

I am trying to use AWS CloudWatch to maintain the application logs on an Ubuntu EC2 instance. I installed the awslogs agent using the following command, as suggested in the documentation, to monitor the file application.log and push any new entries in the file to CloudWatch.
Setup command - sudo python3 ./awslogs-agent-setup.py --region ap-south-1
It was working fine for a day when I tested it out after setting it up, then it stopped working from the next day. I can see that the changes in the log files are being detected by the AWS Agent, as there is an entry in the awslogs.log file as soon as there is a new entry in the application.log file. However, the same updates are not being pushed/reflected in the CloudWatch console.
What might have gone wrong here?
Entry in /var/log/awslogs.log
2020-02-27 12:19:03,376 - cwlogs.push.reader - WARNING - 1388 - Thread-4 - Fall back to previous event time: {'end_position': 10483213, 'timestamp': 1582261391000, 'start_position': 10483151}, previousEventTime: 1582261391000, reason: timestamp could not be parsed from message.
2020-02-27 12:19:07,437 - cwlogs.push.publisher - INFO - 1388 - Thread-3 - Log group: branchpayout-python-pilot, log stream: ip-172-27-99-136_application.log, queue size: 0, Publish batch: {'fallback_events_count': 2, 'source_id': 'c0bd7124acf1c35ede963da6b8ec9882', 'num_of_events': 2, 'first_event': {'end_position': 10483151, 'timestamp': 1582261391000, 'start_position': 10482278}, 'skipped_events_count': 0, 'batch_size_in_bytes': 985, 'last_event': {'end_position': 10483213, 'timestamp': 1582261391000, 'start_position': 10483151}}
Configuration in /var/awslogs/etc/awslogs.conf
[/home/ubuntu/application-name/application.log]
file = /home/ubuntu/application-name/application.log
datetime_format = %Y-%m-%d %H:%M:%S,%f
log_stream_name = {hostname}_application.log
buffer_duration = 5000
log_group_name = branchpayout-python-pilot
initial_position = end_of_file
multi_line_start_pattern = {datetime_format}
Check your log format and update your awslogs.conf accordingly.
For me, the timestamp format in the nginx access.log was "%d/%b/%Y:%H:%M:%S %z", so my config file contains:
datetime_format = %d/%b/%Y:%H:%M:%S %z
Below are some examples.
Log file               Sample timestamp             datetime_format
Nginx error.log        2017/08/12 05:04:00          %Y/%m/%d %H:%M:%S
Nginx access.log       12/Aug/2017:06:19:17 +0900   %d/%b/%Y:%H:%M:%S %z
php-fpm error.log      12-Aug-2017 05:24:38         %d-%b-%Y %H:%M:%S
php-fpm www-error.log  10-Aug-2017 23:40:46 UTC     %d-%b-%Y %H:%M:%S
messages               Aug 12 06:13:36              %b %d %H:%M:%S
secure                 Aug 11 04:03:33              %b %d %H:%M:%S
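A quick way to test a candidate datetime_format against a real log line is to try it with Python's strptime. This is only a rough check under the assumption that the agent's format codes behave like Python's, which I have not verified for every code. The helper name leading_timestamp and the 64-character cap are made up for this sketch.

```python
from datetime import datetime


def leading_timestamp(line, fmt, max_len=64):
    """Parse the longest prefix of `line` that matches `fmt`, or return None."""
    # strptime needs the whole string to match, so shrink the prefix from
    # the right until it parses or nothing is left.
    for end in range(min(len(line), max_len), 0, -1):
        try:
            return datetime.strptime(line[:end], fmt)
        except ValueError:
            continue
    return None


line = "2020-02-27 12:19:03,376 - example message"
print(leading_timestamp(line, "%Y-%m-%d %H:%M:%S,%f"))  # parses
print(leading_timestamp(line, "%d-%b-%Y %H:%M:%S"))     # None: wrong format
```

If the function returns None for lines from your log, the agent will most likely report "timestamp could not be parsed from message" for them as well.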

Invalid Token Error Using AWS logs

I have been battling this for hours and it's driving me nuts. I installed log agent and set it up correctly.
I can access the instance with the command eb ssh.
However, when I run the command sudo service awslogs restart, I get weird errors like
2017-06-12 16:31:41,899 - cwlogs.push.publisher - WARNING - 31909 -
Thread-7 - Caught exception: An error occurred
(UnrecognizedClientException) when calling the PutLogEvents operation:
The security token included in the request is invalid.
2017-06-12 16:31:41,899 - cwlogs.threads - ERROR - 31909 - Thread-7 -
Exception caught in <EventBatchPublisher(Thread-7, started daemon
140242458298112)>
Traceback (most recent call last):
I have changed the credentials multiple times, all to no avail.
Also, I get this error in the awslogs.log file:
2017-06-12 16:31:40,862 - cwlogs.push.reader -
WARNING - 31909 - Thread-8 - Fall back to previous event time:
{'timestamp': 1497246644000, 'start_position': 7142L, 'end_position':
7246L}, previousEventTime: 1497246644000, reason: timestamp could not
be parsed from message.
I am using the following format:
[/var/log/tomcat8/catalina.out]
datetime_format = %d-%b-%Y %H:%M:%S
file = /var/log/tomcat8/catalina.out
buffer_duration = 5000
log_stream_name = {instance_id}
initial_position = start_of_file
log_group_name = Catalina
Any help at this point will be appreciated.
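Kindly prefix the "aws configure" command with "sudo", i.e. run sudo aws configure, so that the credentials are written for root, which is presumably the user the awslogs service runs as.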

CloudWatch logs acting weird

I have two log files with multi-line log statements. Both of them have the same datetime format at the beginning of each log statement. The configuration looks like this:
state_file = /var/lib/awslogs/agent-state
[/opt/logdir/log1.0]
datetime_format = %Y-%m-%d %H:%M:%S
file = /opt/logdir/log1.0
log_stream_name = /opt/logdir/logs/log1.0
initial_position = start_of_file
multi_line_start_pattern = {datetime_format}
log_group_name = my.log.group
[/opt/logdir/log2-console.log]
datetime_format = %Y-%m-%d %H:%M:%S
file = /opt/logdir/log2-console.log
log_stream_name = /opt/logdir/log2-console.log
initial_position = start_of_file
multi_line_start_pattern = {datetime_format}
log_group_name = my.log.group
The CloudWatch Logs agent is sending log1.0 logs correctly to my log group on CloudWatch; however, it's not sending the logs for log2-console.log.
awslogs.log says:
2016-11-15 08:11:41,308 - cwlogs.push.batch - WARNING - 3593 - Thread-4 - Skip event: {'timestamp': 1479196444000, 'start_position': 42330916L, 'end_position': 42331504L}, reason: timestamp is more than 2 hours in future.
2016-11-15 08:11:41,308 - cwlogs.push.batch - WARNING - 3593 - Thread-4 - Skip event: {'timestamp': 1479196451000, 'start_position': 42331504L, 'end_position': 42332092L}, reason: timestamp is more than 2 hours in future.
The server time is correct, though. Another odd thing is that the line numbers mentioned in start_position and end_position do not exist in the actual log file being pushed.
Anyone else experiencing this issue?
I was able to fix this.
The state of awslogs was broken. The state is stored in a sqlite database in /var/awslogs/state/agent-state. You can access it via
sudo sqlite3 /var/awslogs/state/agent-state
sudo is needed to have write access.
List all streams with
select * from stream_state;
Look up your log stream and note the source_id which is part of a json data structure in the v column.
Then, list all records with this source_id (in my case it was 7675f84405fcb8fe5b6bb14eaa0c4bfd) in the push_state table
select * from push_state where k='7675f84405fcb8fe5b6bb14eaa0c4bfd';
The resulting record has a JSON data structure in the v column which contains a batch_timestamp, and this batch_timestamp seems to be wrong. It was in the past, and any log entries more than 2 hours newer than it were no longer processed.
The solution is to update this record. Copy the v column, replace the batch_timestamp with the current timestamp and update with something like
update push_state set v='... insert new value here ...' where k='7675f84405fcb8fe5b6bb14eaa0c4bfd';
Restart the service with
sudo /etc/init.d/awslogs restart
I hope it works for you!
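The manual copy-and-paste update above could also be scripted. The following Python sketch makes several assumptions that should be checked against your own state file first: that the v column holds JSON text, that batch_timestamp is a top-level key in it, and that it is expressed in epoch milliseconds like the timestamps in awslogs.log. The function name is made up for illustration.

```python
import json
import sqlite3
import time


def refresh_batch_timestamp(conn, source_id, now_ms=None):
    """Set batch_timestamp in the push_state record of `source_id` to now."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # assumed unit: epoch milliseconds
    row = conn.execute(
        "SELECT v FROM push_state WHERE k = ?", (source_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no push_state record for {source_id}")
    state = json.loads(row[0])  # assumes v is a JSON object
    state["batch_timestamp"] = now_ms
    conn.execute(
        "UPDATE push_state SET v = ? WHERE k = ?",
        (json.dumps(state), source_id),
    )
    conn.commit()
    return state
```

To use it, stop the awslogs service, back up the state file, open it with sqlite3.connect("/var/awslogs/state/agent-state") as root, call the function with your source_id, and then restart the service.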
We had the same issue, and the following steps fixed it.
If log groups are not updating with the latest events, run these steps:
Stop the awslogs service.
Delete the file /var/awslogs/state/agent-state.
Update the /var/awslogs/etc/awslogs.conf configuration from hostname to instance ID, e.g. change log_stream_name = {hostname} to log_stream_name = {instance_id}.
Start the awslogs service.
I was able to resolve this issue on Amazon Linux by:
sudo yum reinstall awslogs
sudo service awslogs restart
This method retained my config files in /var/awslogs/, though you may wish to back them up before a reinstall.
Note: During my troubleshooting, I had also deleted my Log Group via the AWS Console. The restart fully reloaded all historical logs, but stamped with the present time, which makes them less valuable. I'm unsure whether deleting the Log Group was necessary for this method to work. You might want to set the initial_position config to end_of_file before you restart.
I found the reason. The time zone in my Docker container was inconsistent with the time zone of my host computer. After making the two time zones consistent, the problem was solved.