CloudWatch Log Agent Issue - amazon-web-services

I have been working with the CloudWatch Logs agent for a long time but have never faced the issue below.
I made some changes to the CloudWatch agent JSON file:
/opt/aws/amazon-cloudwatch-agent/bin/config.json
After the changes, the agent stopped exporting logs to CloudWatch:
{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "root"
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [{
          "file_path": "/var/lib/jenkins/jobs/**/builds/**/logs",
          "log_group_name": "log_data",
          "log_stream_name": "log_stream",
          "retention_in_days": 3
        }]
      }
    }
  }
}
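As a sanity check on the path in that config, this is a shell-side approximation I can run on the instance (whether the agent expands ** in file_path the same way as the shell's globstar is an assumption, not something I have confirmed; the TOML path is the one shown in the validation output below):

# Shell-side approximation: does the pattern match any files at all?
shopt -s globstar nullglob
matches=(/var/lib/jenkins/jobs/**/builds/**/logs)
echo "matched ${#matches[@]} path(s)"
printf '%s\n' "${matches[@]}" | head

# What the config translator actually produced for the logfile input:
grep -A 5 'file_path' /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml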
After running the command:
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json -s
Command Output
***** processing amazon-cloudwatch-agent ******
/opt/aws/amazon-cloudwatch-agent/bin/config-downloader --output-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d --download-source file:/opt/aws/amazon-cloudwatch-agent/bin/config.json --mode ec2 --config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml --multi-config default
I! Trying to detect region from ec2
D! [EC2] Found active network interface
Successfully fetched the config and saved in /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_config.json.tmp
Start configuration validation...
/opt/aws/amazon-cloudwatch-agent/bin/config-translator --input /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json --input-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d --output /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml --mode ec2 --config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml --multi-config default
2022/11/23 06:37:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_config.json.tmp ...
2022/11/23 06:37:55 I! Valid Json input schema.
I! Detecting run_as_user...
I! Trying to detect region from ec2
D! [EC2] Found active network interface
No csm configuration found.
Configuration validation first phase succeeded
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent -schematest -config /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
Configuration validation second phase succeeded
Configuration validation succeeded
The logs are still not being exported.
I also checked for the issue in amazon-cloudwatch-agent.log.
Command:
cat /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log
Output
2022/11/22 19:22:35 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
2022/11/22 19:22:35 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_config.json ...
2022/11/22 19:22:36 I! Valid Json input schema.
2022/11/22 19:22:36 I! Detected runAsUser: root
2022/11/22 19:22:36 I! Changing ownership of [/opt/aws/amazon-cloudwatch-agent/logs /opt/aws/amazon-cloudwatch-agent/etc /opt/aws/amazon-cloudwatch-agent/var] to 0:0
2022-11-22T19:22:36Z I! Starting AmazonCloudWatchAgent 1.247355.0
2022-11-22T19:22:36Z I! AWS SDK log level not set
2022-11-22T19:22:36Z I! Loaded inputs: disk logfile mem
2022-11-22T19:22:36Z I! Loaded aggregators:
2022-11-22T19:22:36Z I! Loaded processors: ec2tagger
2022-11-22T19:22:36Z I! Loaded outputs: cloudwatch cloudwatchlogs
2022-11-22T19:22:36Z I! Tags enabled: host=ip-172-31-1-218
2022-11-22T19:22:36Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"ip-172-31-1-218", Flush Interval:1s
2022-11-22T19:22:36Z I! [processors.ec2tagger] ec2tagger: Check ec2 metadata
2022-11-22T19:22:36Z I! [logagent] starting
2022-11-22T19:22:36Z I! [logagent] found plugin cloudwatchlogs is a log backend
2022-11-22T19:22:36Z I! [logagent] found plugin logfile is a log collection
2022-11-22T19:22:36Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started initialization.
2022-11-22T19:22:36Z I! [processors.ec2tagger] ec2tagger: Check ec2 metadata
2022-11-22T19:22:36Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started initialization.
2022-11-22T19:22:36Z I! cloudwatch: get unique roll up list []
2022-11-22T19:22:36Z I! cloudwatch: publish with ForceFlushInterval: 1m0s, Publish Jitter: 1s
2022-11-22T19:22:36Z I! [processors.ec2tagger] ec2tagger: Initial retrieval of tags succeded
2022-11-22T19:22:36Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
2022-11-22T19:22:36Z I! [processors.ec2tagger] ec2tagger: Initial retrieval of tags succeded
2022-11-22T19:22:36Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
I am also attaching the old logs from the same file, where you can see the actual difference:
2022/10/11 06:58:04 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
2022/10/11 06:58:04 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_config.json ...
2022/10/11 06:58:04 I! Valid Json input schema.
2022/10/11 06:58:04 I! Detected runAsUser: root
2022/10/11 06:58:04 I! Changing ownership of [/opt/aws/amazon-cloudwatch-agent/logs /opt/aws/amazon-cloudwatch-agent/etc /opt/aws/amazon-cloudwatch-agent/var] to 0:0
2022-10-11T06:58:04Z I! Starting AmazonCloudWatchAgent 1.247355.0
2022-10-11T06:58:04Z I! AWS SDK log level not set
2022-10-11T06:58:04Z I! Loaded inputs: disk logfile mem
2022-10-11T06:58:04Z I! Loaded aggregators:
2022-10-11T06:58:04Z I! Loaded processors: ec2tagger
2022-10-11T06:58:04Z I! Loaded outputs: cloudwatch cloudwatchlogs
2022-10-11T06:58:04Z I! Tags enabled: host=ip-172-31-1-218
2022-10-11T06:58:04Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"ip-172-31-1-218", Flush Interval:1s
2022-10-11T06:58:04Z I! [processors.ec2tagger] ec2tagger: Check ec2 metadata
2022-10-11T06:58:04Z I! [logagent] starting
2022-10-11T06:58:04Z I! [logagent] found plugin cloudwatchlogs is a log backend
2022-10-11T06:58:04Z I! [logagent] found plugin logfile is a log collection
2022-10-11T06:58:04Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started initialization.
2022-10-11T06:58:04Z I! [processors.ec2tagger] ec2tagger: Check ec2 metadata
2022-10-11T06:58:04Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started initialization.
2022-10-11T06:58:04Z I! cloudwatch: get unique roll up list []
2022-10-11T06:58:04Z I! cloudwatch: publish with ForceFlushInterval: 1m0s, Publish Jitter: 54s
2022-10-11T06:58:04Z I! [processors.ec2tagger] ec2tagger: Initial retrieval of tags succeded
2022-10-11T06:58:04Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
2022-10-11T06:58:04Z I! [processors.ec2tagger] ec2tagger: Initial retrieval of tags succeded
2022-10-11T06:58:04Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
2022-10-11T06:58:05Z I! [inputs.logfile] Reading from offset 841815 in /var/log/syslog
2022-10-11T06:58:05Z I! [logagent] piping log from Test/syslog(/var/log/syslog) to cloudwatchlogs with retention 3
2022-10-11T06:58:10Z I! [outputs.cloudwatchlogs] First time sending logs to Test/syslog since startup so sequenceToken is nil, learned new token:(0xc000904100): The given sequenceToken is invalid. The next expected sequenceToken is: 49623781346189785515230421563841207337059424902309742034
2022-10-11T06:58:10Z W! [outputs.cloudwatchlogs] Retried 0 time, going to sleep 112.557969ms before retrying.
2022-10-11T07:35:05Z W! [agent] ["outputs.cloudwatch"] did not complete within its flush interval
2022-10-11T07:35:13Z W! [agent] ["outputs.cloudwatch"] did not complete within its flush interval
2022-10-11T07:35:15Z W! [agent] ["outputs.cloudwatch"] did not complete within its flush interval
2022-10-11T07:35:16Z W! [agent] ["outputs.cloudwatchlogs"] did not complete within its flush interval
2022-10-11T07:35:17Z W! [agent] ["outputs.cloudwatchlogs"] did not complete within its flush interval
2022-10-11T07:35:18Z W! [agent] ["outputs.cloudwatch"] did not complete within its flush interval
2022-10-11T07:35:19Z W! [agent] ["outputs.cloudwatch"] did not complete within its flush interval
2022-10-11T07:35:19Z W! [agent] ["outputs.cloudwatchlogs"] did not complete within its flush interval
2022-10-11T07:35:22Z W! [agent] ["outputs.cloudwatch"] did not complete within its flush interval
2022/10/11 07:46:06 I! D! [EC2] Found active network interface
I! Detected the instance is EC2
2022/10/11 07:46:06 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
2022/10/11 07:46:06 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_config.json ...
2022/10/11 07:46:06 I! Valid Json input schema.
I! Detecting run_as_user...
I! Trying to detect region from ec2
No csm configuration found.
Configuration validation first phase succeeded
The difference I can spot between the old and the new logs: after the "...finished initial retrieval of tags and Volumes" line, the old logs continue with the following lines, which never appear in the new logs:
2022-10-11T06:58:05Z I! [inputs.logfile] Reading from offset 841815 in /var/log/syslog
2022-10-11T06:58:05Z I! [logagent] piping log from Test/syslog(/var/log/syslog) to cloudwatchlogs with retention 3
2022-10-11T06:58:10Z I! [outputs.cloudwatchlogs] First time sending logs to Test/syslog since startup so sequenceToken is nil, learned new token:(0xc000904100): The given sequenceToken is invalid. The next expected sequenceToken is: 49623781346189785515230421563841207337059424902309742034
2022-10-11T06:58:10Z W! [outputs.cloudwatchlogs] Retried 0 time, going to sleep 112.557969ms before retrying.
2022-10-11T07:35:05Z W! [agent] ["outputs.cloudwatch"] did not complete within its flush interval
2022-10-11T07:35:13Z W! [agent] ["outputs.cloudwatch"] did not complete within its flush interval
2022-10-11T07:35:15Z W! [agent] ["outputs.cloudwatch"] did not complete within its flush interval
2022-10-11T07:35:16Z W! [agent] ["outputs.cloudwatchlogs"] did not complete within its flush interval
Please help me resolve this issue; I have been stuck on it for a very long time. I have also tried reinstalling the agent, but that didn't help.
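A couple of further checks that may be relevant (a sketch only; it assumes the AWS CLI is available on the instance and uses the log group name log_data from the config above):

# Is the agent actually running after fetch-config?
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a status

# Has anything ever reached the configured log group?
aws logs describe-log-streams \
  --log-group-name log_data \
  --order-by LastEventTime --descending --max-items 5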

Related

Amazon CloudWatch Agent not sending OnPremise metrics/logs

I'm trying to connect my on-premises instance to AWS via the CloudWatch agent to record system metrics. I installed the latest version from the .deb package and the nightly version from the .deb package, and the results are the same (except that the nightly version runs via systemd).
Even though logging is enabled for both the agent and the SDK, nothing appears in the logs.
After launching via the ctl script the agent starts, but the service daemon is constantly restarted; when started with service amazon-cloudwatch-agent start it is reported as failed:
Apr 22 00:20:56 sero1 systemd[1]: Started Amazon CloudWatch Agent.
Apr 22 00:21:02 sero1 start-amazon-cloudwatch-agent[7614]: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
Apr 22 00:21:02 sero1 start-amazon-cloudwatch-agent[7614]: I! Detecting run_as_user...
Apr 22 00:21:02 sero1 start-amazon-cloudwatch-agent[7614]: 2022-04-21T22:21:02Z I! AWS_SDK_LOG_LEVEL is set to "LogDebug"
Apr 22 00:21:02 sero1 start-amazon-cloudwatch-agent[7614]: 2022-04-21T22:21:02Z I! CWAGENT_LOG_LEVEL is set to "DEBUG"
Apr 22 00:25:03 sero1 systemd[1]: amazon-cloudwatch-agent.service: Main process exited, code=exited, status=1/FAILURE
Apr 22 00:25:03 sero1 systemd[1]: amazon-cloudwatch-agent.service: Failed with result 'exit-code'.
The log file has more details:
2022-04-21T22:21:02Z I! Starting AmazonCloudWatchAgent 1.247351.0-10-g2ece109-nightly-build
2022-04-21T22:21:02Z I! AWS SDK log level, LogDebug
2022-04-21T22:21:02Z I! Loaded inputs: net swap cpu disk diskio logfile mem
2022-04-21T22:21:02Z I! Loaded aggregators:
2022-04-21T22:21:02Z I! Loaded processors: delta ec2tagger
2022-04-21T22:21:02Z I! Loaded outputs: cloudwatch cloudwatchlogs
2022-04-21T22:21:02Z I! Tags enabled: host=sero1
2022-04-21T22:21:02Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"sero1", Flush Interval:1s
2022-04-21T22:21:02Z D! [agent] Initializing plugins
2022-04-21T22:21:02Z I! [logagent] starting
2022-04-21T22:21:02Z I! [logagent] found plugin cloudwatchlogs is a log backend
2022-04-21T22:21:02Z I! [logagent] found plugin logfile is a log collection
2022-04-21T22:21:32Z I! CWAGENT_LOG_LEVEL is set to "DEBUG"
2022-04-21T22:21:32Z I! AWS_SDK_LOG_LEVEL is set to "LogDebug"
2022-04-21T22:22:02Z D! Profiler dump:
[no stats is available...]
2022-04-21T22:23:02Z D! Profiler dump:
[no stats is available...]
2022-04-21T22:24:02Z D! Profiler dump:
[no stats is available...]
2022-04-21T22:25:02Z D! Profiler dump:
[no stats is available...]
2022-04-21T22:25:03Z E! [processors.ec2tagger] ec2tagger: Unable to retrieve InstanceId. This plugin must only be used on an EC2 instance
2022-04-21T22:25:03Z E! [telegraf] Error running agent: could not initialize processor ec2tagger: ec2tagger: Unable to retrieve InstanceId. This plugin must only be used on an EC2 instance
2022/04/22 00:26:09 I! 2022/04/22 00:26:03 D! [EC2] Found active network interface
2022/04/22 00:26:09 E! ec2metadata is not available
I! Detected the instance is OnPremise
2022/04/22 00:26:09 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
2022/04/22 00:26:09 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json ...
2022/04/22 00:26:09 I! Valid Json input schema.
I! Detecting run_as_user...
Got Home directory: /root
I! Set home dir Linux: /root
I! SDKRegionWithCredsMap region: eu-central-1
No csm configuration found.
Under path : /logs/ | Info : Got hostname sero1 as log_stream_name
Configuration validation first phase succeeded
2022/04/22 00:26:09 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
2022/04/22 00:26:09 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
2022/04/22 00:26:09 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json ...
2022/04/22 00:26:09 I! Valid Json input schema.
2022/04/22 00:26:09 I! Detected runAsUser: cwagent
2022/04/22 00:26:09 I! Changing ownership of [/opt/aws/amazon-cloudwatch-agent/logs /opt/aws/amazon-cloudwatch-agent/etc /opt/aws/amazon-cloudwatch-agent/var] to 996:996
2022/04/22 00:26:09 I! Set HOME: /home/cwagent
After this line the service simply dies. No metrics are ever reported to CloudWatch.
I had the same error, and in my case it was because I was setting "append_dimensions". (I was misreading the official documentation and setting a custom field and value.)
This field can only contain values related to EC2 metadata, and it seems that if this field is configured, the CloudWatch agent tries to load the processors.ec2tagger plugin and fails.
I was able to upload my metrics and logs after deleting the "append_dimensions" field.
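For anyone hitting the same thing, a minimal sketch of what append_dimensions can contain, as I understand the docs: only the EC2 metadata placeholders below, no custom keys. The surrounding metrics block is illustrative, not my real config.

# Illustrative snippet only: append_dimensions may only reference EC2
# metadata placeholders; an on-premises host has no EC2 metadata, so
# ec2tagger fails there when this block is present.
cat > /tmp/append-dimensions-example.json <<'EOF'
{
  "metrics": {
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}",
      "InstanceType": "${aws:InstanceType}",
      "ImageId": "${aws:ImageId}",
      "AutoScalingGroupName": "${aws:AutoScalingGroupName}"
    },
    "metrics_collected": {
      "mem": { "measurement": ["mem_used_percent"] }
    }
  }
}
EOF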

AWS Cloudwatch agent config file removed after startup

Problem
I am simply trying to install the CloudWatch agent on Amazon Linux 2 instances at startup, using EC2 user data. For some reason, after cloud-init has finished running, all services get restarted and the configuration file I put in the CloudWatch folder is not there anymore.
I am using a custom AMI pre-built with Packer; my configuration file is placed in /opt/aws/amazon-cloudwatch-agent/etc/custom/amazon-cloudwatch-agent.json from an Ansible template. This is the configuration file I want to use, holding all the metrics and logs I want to send. At startup, after the agent installation, I copy it to /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json.
Here is my userdata script:
#!/bin/bash
yum install amazon-cloudwatch-agent -y
cp /opt/aws/amazon-cloudwatch-agent/etc/custom/amazon-cloudwatch-agent.json /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
What is happening
After startup has finished, I can see the script ran correctly. If I run cat /opt/aws/amazon-cloudwatch-agent/log/amazon-cloudwatch-agent.log I can see the following:
2021/07/16 13:33:46 I! I! Detected the instance is EC2
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json ...
Valid Json input schema.
I! Detecting run_as_user...
No csm configuration found.
Configuration validation first phase succeeded
2021/07/16 13:33:46 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json ...
2021/07/16 13:33:46 I! Detected runAsUser: root
2021/07/16 13:33:46 I! Changing ownership of [/opt/aws/amazon-cloudwatch-agent/logs /opt/aws/amazon-cloudwatch-agent/etc /opt/aws/amazon-cloudwatch-agent/var] to root:root
2021-07-16T13:33:46Z I! Starting AmazonCloudWatchAgent 1.247347.4
2021-07-16T13:33:46Z I! Loaded inputs: netstat diskio logfile mem net processes swap cpu disk
2021-07-16T13:33:46Z I! Loaded aggregators:
2021-07-16T13:33:46Z I! Loaded processors: delta ec2tagger
2021-07-16T13:33:46Z I! Loaded outputs: cloudwatch cloudwatchlogs
2021-07-16T13:33:46Z I! Tags enabled: host=ip-XX-XX-X-XXX.eu-west-1.compute.internal
2021-07-16T13:33:46Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"ip-XX-XX-X-XXX.eu-west-1.compute.internal", Flush Interval:1s
2021-07-16T13:33:46Z I! [logagent] starting
2021-07-16T13:33:46Z I! [logagent] found plugin cloudwatchlogs is a log backend
2021-07-16T13:33:46Z I! [logagent] found plugin logfile is a log collection
2021-07-16T13:33:46Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started initialization.
=======> 2021-07-16T13:33:46Z I! cloudwatch: get unique roll up list [[AutoScalingGroupName] [InstanceId InstanceType] []]
2021-07-16T13:33:46Z I! cloudwatch: publish with ForceFlushInterval: 30s, Publish Jitter: 11s
2021-07-16T13:33:46Z I! [processors.ec2tagger] ec2tagger: Initial retrieval of tags succeded
2021-07-16T13:33:46Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
=======> 2021-07-16T13:33:47Z I! [logagent] piping log from APP-DEV-php-errors-logs/XX.XX.X.XXX(/var/log/php-fpm/error.log) to cloudwatchlogs
2021-07-16T13:33:54Z I! Profiler is stopped during shutdown
2021-07-16T13:33:54Z I! [agent] Hang on, flushing any cached metrics before shutdown
2021/07/16 13:33:55 I! I! Detected the instance is EC2
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/default ...
Valid Json input schema.
I! Detecting run_as_user...
No csm configuration found.
No log configuration found.
Configuration validation first phase succeeded
2021/07/16 13:33:55 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/default ...
2021/07/16 13:33:55 I! Detected runAsUser: cwagent
2021/07/16 13:33:55 I! Changing ownership of [/opt/aws/amazon-cloudwatch-agent/logs /opt/aws/amazon-cloudwatch-agent/etc /opt/aws/amazon-cloudwatch-agent/var] to 994:992
2021/07/16 13:33:55 I! Set HOME: /home/cwagent
2021-07-16T13:33:55Z I! Starting AmazonCloudWatchAgent 1.247348.0
2021-07-16T13:33:55Z I! Loaded inputs: disk mem
2021-07-16T13:33:55Z I! Loaded aggregators:
2021-07-16T13:33:55Z I! Loaded processors: ec2tagger
2021-07-16T13:33:55Z I! Loaded outputs: cloudwatch
2021-07-16T13:33:55Z I! Tags enabled: host=ip-XX-XX-X-XXX.eu-west-1.compute.internal
2021-07-16T13:33:55Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"ip-XX-XX-X-XXX.eu-west-1.compute.internal", Flush Interval:1s
2021-07-16T13:33:55Z I! [logagent] starting
2021-07-16T13:33:55Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started initialization.
=======> 2021-07-16T13:33:55Z I! cloudwatch: get unique roll up list []
2021-07-16T13:33:55Z I! cloudwatch: publish with ForceFlushInterval: 1m0s, Publish Jitter: 26s
2021-07-16T13:33:55Z I! [processors.ec2tagger] ec2tagger: Initial retrieval of tags succeded
2021-07-16T13:33:55Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
2021-07-16T13:39:07Z I! [processors.ec2tagger] ec2tagger: Refresh is no longer needed, stop refreshTicker.
So as you can see, the initial command from user data runs fine and the custom metrics and logs are collected (see the =======> marks before the relevant lines).
However, a few seconds later, after cloud-init is over, the CloudWatch agent is somehow restarted by systemd and, again somehow, the file amazon-cloudwatch-agent.json is absent from the filesystem, so the agent runs with default parameters.
If I rerun the command manually after startup everything works fine, but of course I need this automated for when auto scaling launches instances.
What I have tried
I have tried launching the CloudWatch agent directly with systemd, making the config file read-only, and fetching the config only and letting the system start the agent itself, but the problem still persists.
Thank you for your help
Workaround
The preinstalled ssm-agent conflicts with the CloudWatch agent. Uninstall ssm-agent during the Packer build:
sudo yum erase amazon-ssm-agent --assumeyes
Explanation
I finally found out that the newly installed CloudWatch agent conflicts with the SSM agent installed by default in the Amazon Linux 2 image.
Indeed, I first tried an ugly workaround, which was to replace the ExecStart line of the amazon-cloudwatch-agent service using sed in the user data:
sed -i '/ExecStart/c\ExecStart=/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/etc/custom/amazon-cloudwatch-agent.json' /etc/systemd/system/amazon-cloudwatch-agent.service
That way when the service gets restarted after instance startup it would use my custom configuration.
However, I then found out that the service file also got replaced after cloud-init ended.
Reviewing the system messages, I noticed that ssm-agent was performing some configuration reloading after cloud-init ended, so I assumed it could be the culprit.
I ended up uninstalling it in the Packer build that builds my AMI, so it is not present at instance startup, and my configuration finally stopped getting overwritten.
Note that I do not have a deep understanding of how ssm-agent works, and there is probably a proper way to set up the CloudWatch agent through some SSM configuration.
Since we do not currently use SSM and I do not have enough time to study this option, I chose this compromise.
If someone can come up with a cleaner solution, using ssm-agent through an automated method, it would be greatly appreciated.
You can try using AWS Systems Manager Parameter Store; the SSM agent shouldn't/can't remove the config there.
1. Ensure the server has the AWS-managed policy CloudWatchAgentServerPolicy attached; this allows any Parameter Store parameter named AmazonCloudWatch-* to be read.
2. Ensure the contents of what would have been in amazon-cloudwatch-agent.json are stored in Parameter Store as valid JSON, with an appropriate name per (1).
3. As found in cat /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl, run amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c ssm:AmazonCloudWatch-Config.json -s (replacing AmazonCloudWatch-Config.json with the name of your parameter); a CLI sketch of steps 2 and 3 follows below.
See the AWS docs.
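A minimal CLI sketch of steps 2 and 3 (the parameter name AmazonCloudWatch-Config.json and the source path are examples; configs larger than the standard parameter size limit may need --tier Advanced):

# Store the agent config in Parameter Store (run wherever the JSON file and
# SSM write permissions are available).
aws ssm put-parameter \
  --name "AmazonCloudWatch-Config.json" \
  --type String \
  --value file:///opt/aws/amazon-cloudwatch-agent/etc/custom/amazon-cloudwatch-agent.json \
  --overwrite

# On the instance (or in user data): fetch the config from Parameter Store.
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config -m ec2 -c ssm:AmazonCloudWatch-Config.json -s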
Update 2022-10-02: multiple configs can be added by using append-config (see the AWS docs), or by placing config files into /etc/amazon/amazon-cloudwatch-agent/amazon-cloudwatch-agent.d/ and restarting the agent (sudo service amazon-cloudwatch-agent restart). This helps when deploying on Amazon Linux 2, which ships its application logs using a CloudWatch log config from this folder alongside a custom config that ships other logs from the server; using fetch-config would otherwise overwrite the log config for the AL2 application.

Bash, Conda, Docker, and Ray: What startup commands should be given to Ray to properly source the bash profile in a docker container at runtime?

I'm trying to use Ray and Docker to launch jobs programmatically on EC2. I want to use conda in my Docker container for package management. I've figured out how to build the container such that if I run
docker run -i -t my_container:my_tag /bin/bash I can launch my jobs in the container locally. The problem is that when I add Ray into the picture to launch the jobs remotely, Ray fails with errors like these:
start: ray: command not found
Cluster: my-cluster
Checking AWS environment settings
AWS config
IAM Profile: ray-head-v1
EC2 Key pair (head & workers): [redacted]
VPC Subnets (head & workers): [redacted]
EC2 Security groups (head & workers): [redacted]
EC2 AMI (head & workers): [redacted]
No head node found. Launching a new cluster. Confirm [y/N]: y [automatic, due to --yes]
Acquiring an up-to-date head node
Launched 1 nodes [subnet_id=[redacted]]
Launched instance i-067e250cc8591da86 [state=pending, info=pending]
Launched a new head node
Fetching the new head node
<1/1> Setting up head node
Prepared bootstrap config
New status: waiting-for-ssh
[1/6] Waiting for SSH to become available
Running `uptime` as a test.
Waiting for IP
Not yet available, retrying in 10 seconds
Not yet available, retrying in 10 seconds
Not yet available, retrying in 10 seconds
Received: 3.21.104.163
SSH still not available SSH command failed., retrying in 5 seconds.
SSH still not available SSH command failed., retrying in 5 seconds.
Success.
Updating cluster configuration. [hash=1e011279ffec6f94b2bff4ebf536e6966be5c79a]
New status: syncing-files
[3/6] Processing file mounts
[4/6] No worker file mounts to sync
New status: setting-up
[3/6] No initialization commands to run.
[4/6] No setup commands to run.
[6/6] Starting the Ray runtime
New status: update-failed
!!!
SSH command failed.
!!!
Failed to setup head node.
At this point I've reached the limit of what I understand about how Ray and Docker interact. I assume the problem is that head_start_ray_commands gets passed to docker run somehow. Since Docker uses the sh shell to run commands, the bash profile isn't getting sourced properly, so packages like conda and ray aren't working. That explains why there's nothing wrong with the container when I launch a bash shell in interactive mode in a local container instance. I've tried adding /bin/bash --login at the beginning of head_start_ray_commands but that only seems to cause the whole program to freeze.
What is the right way to get Ray to source the bash profile before executing commands? If that isn't possible, is there a better way to do this? For reference, here's my current ray config:
init:
  address: null
remote: {}
cluster:
  cluster_name: my-cluster
  min_workers: 0
  max_workers: 2
  initial_workers: 0
  autoscaling_mode: default
  target_utilization_fraction: 0.8
  idle_timeout_minutes: 5
  docker:
    image: [redacted]
    container_name: 'my-container'
    pull_before_run: true
    run_options: ["--gpus 'all'"]
  provider:
    type: aws
    region: us-east-2
    availability_zone: us-east-2a,us-east-2b
    cache_stopped_nodes: false
    key_pair:
      key_name: [redacted]
  auth:
    ssh_user: ubuntu
  head_node:
    IamInstanceProfile:
      Arn: [redacted]
    InstanceType: p2.xlarge
    ImageId: ami-08e16447bd5caf26a
  worker_nodes:
    IamInstanceProfile:
      Arn: [redacted]
    InstanceType: p2.xlarge
    ImageId: ami-08e16447bd5caf26a
  file_mounts: {}
  initialization_commands: []
  setup_commands: []
  head_setup_commands: []
  worker_setup_commands: []
  head_start_ray_commands:
    - ray stop
    - ulimit -n 65536; ray start --head --port=6379 --object-manager-port=8076
      --autoscaling-config=~/ray_bootstrap_config.yaml
  worker_start_ray_commands:
    - ray stop
    - ulimit -n 65536; ray start --address=$RAY_HEAD_IP:6379 --object-manager-port=8076
Edit
The simplest fix seems to be just avoiding conda altogether in favor of venv.
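For reference, a sketch of an alternative that keeps conda: wrap each entry of head_start_ray_commands / worker_start_ray_commands in an explicit login shell so the container's bash profile is sourced before ray runs. The environment name my_env is made up, and this has not been verified against this exact setup.

# Each list entry runs inside the container; bash -lc sources the login
# profile so conda and ray end up on PATH before the command executes.
bash -lc 'conda activate my_env && ray stop'
bash -lc 'conda activate my_env && ulimit -n 65536 && ray start --head --port=6379 --object-manager-port=8076 --autoscaling-config=~/ray_bootstrap_config.yaml'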

Elastic Beanstalk - Command failed on instance.An unexpected error has occurred [ErrorCode: 0000000001]

This is my first time trying to deploy a Django app to Elastic Beanstalk. The application uses Django Channels.
These are my config files:
option_settings:
  aws:elasticbeanstalk:container:python:
    WSGIPath: "dashboard/dashboard/wsgi.py"
  aws:elasticbeanstalk:application:environment:
    DJANGO_SETTINGS_MODULE: "dashboard/dashboard/settings.py"
    PYTHONPATH: /opt/python/current/app/dashboard:$PYTHONPATH
  aws:elbv2:listener:80:
    DefaultProcess: http
    ListenerEnabled: 'true'
    Protocol: HTTP
    Rules: ws
  aws:elbv2:listenerrule:ws:
    PathPatterns: /websockets/*
    Process: websocket
    Priority: 1
  aws:elasticbeanstalk:environment:process:http:
    Port: '80'
    Protocol: HTTP
  aws:elasticbeanstalk:environment:process:websocket:
    Port: '5000'
    Protocol: HTTP
container_commands:
  00_pip_upgrade:
    command: "source /opt/python/run/venv/bin/activate && pip install --upgrade pip"
    ignoreErrors: false
  01_migrate:
    command: "django-admin.py migrate"
    leader_only: true
  02_collectstatic:
    command: "django-admin.py collectstatic --noinput"
  03_wsgipass:
    command: 'echo "WSGIPassAuthorization On" >> ../wsgi.conf'
When I run eb create django-env I get the following logs:
Creating application version archive "app-200617_112710".
Uploading: [##################################################] 100% Done...
Environment details for: django-env
Application name: dashboard
Region: us-west-2
Deployed Version: app-200617_112710
Environment ID: e-rdgipdg4z3
Platform: arn:aws:elasticbeanstalk:us-west-2::platform/Python 3.7 running on 64bit Amazon Linux 2/3.0.2
Tier: WebServer-Standard-1.0
CNAME: UNKNOWN
Updated: 2020-06-17 10:27:48.898000+00:00
Printing Status:
2020-06-17 10:27:47 INFO createEnvironment is starting.
2020-06-17 10:27:49 INFO Using elasticbeanstalk-us-west-2-041741961231 as Amazon S3 storage bucket for environment data.
2020-06-17 10:28:10 INFO Created security group named: sg-0942435ec637ad173
2020-06-17 10:28:25 INFO Created load balancer named: awseb-e-r-AWSEBLoa-19UYXEUG5IA4F
2020-06-17 10:28:25 INFO Created security group named: awseb-e-rdgipdg4z3-stack-AWSEBSecurityGroup-17RVV1ZT14855
2020-06-17 10:28:25 INFO Created Auto Scaling launch configuration named: awseb-e-rdgipdg4z3-stack-AWSEBAutoScalingLaunchConfiguration-H5E4G2YJ3LEC
2020-06-17 10:29:30 INFO Created Auto Scaling group named: awseb-e-rdgipdg4z3-stack-AWSEBAutoScalingGroup-1I2C273N6RN8S
2020-06-17 10:29:30 INFO Waiting for EC2 instances to launch. This may take a few minutes.
2020-06-17 10:29:30 INFO Created Auto Scaling group policy named: arn:aws:autoscaling:us-west-2:041741961231:scalingPolicy:8d4c8dcf-d77d-4d18-92d8-67f8a2c1cd9e:autoScalingGroupName/awseb-e-rdgipdg4z3-stack-AWSEBAutoScalingGroup-1I2C273N6RN8S:policyName/awseb-e-rdgipdg4z3-stack-AWSEBAutoScalingScaleDownPolicy-1JAUAII3SCELN
2020-06-17 10:29:30 INFO Created Auto Scaling group policy named: arn:aws:autoscaling:us-west-2:041741961231:scalingPolicy:0c3d9c2c-bc65-44ed-8a22-2f9bef538ba7:autoScalingGroupName/awseb-e-rdgipdg4z3-stack-AWSEBAutoScalingGroup-1I2C273N6RN8S:policyName/awseb-e-rdgipdg4z3-stack-AWSEBAutoScalingScaleUpPolicy-XI8Z22SYWQKR
2020-06-17 10:29:30 INFO Created CloudWatch alarm named: awseb-e-rdgipdg4z3-stack-AWSEBCloudwatchAlarmHigh-572C6W1QYGIC
2020-06-17 10:29:30 INFO Created CloudWatch alarm named: awseb-e-rdgipdg4z3-stack-AWSEBCloudwatchAlarmLow-1RTNBIHPHISRO
2020-06-17 10:33:05 ERROR [Instance: i-01576cfe5918af1c3] Command failed on instance. An unexpected error has occurred [ErrorCode: 0000000001].
2020-06-17 10:33:05 INFO Command execution completed on all instances. Summary: [Successful: 0, Failed: 1].
2020-06-17 10:34:07 ERROR Create environment operation is complete, but with errors. For more information, see troubleshooting documentation.
ERROR: ServiceError - Create environment operation is complete, but with errors. For more information, see troubleshooting documentation.
The error is extremely vague, and I have no clue as to what I'm doing wrong.
I had a similar issue. I used psycopg2-binary instead of psycopg2 and created a new environment. The health status is now OK.
Since this is getting some attention: I suggest you check your Elastic Beanstalk logs in the AWS console, since the error is completely generic and can be anything. Check mainly the command execution and activity logs (a quick way to pull them from the CLI is sketched at the end of this answer).
In my case, it was because I had the following listed in requirements.txt, and they failed to install on EC2:
mkl-fft==1.1.0
mkl-random==1.1.0
mkl-service==2.3.0
pypiwin32==223
pywin32==228
Removing those from requirements.txt fixed the issue.
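For pulling those logs without the console, a quick sketch (it assumes the EB CLI is installed and the Amazon Linux 2 platform log paths; the environment name is the one from the question):

# Pull the full log bundle for the environment via the EB CLI.
eb logs django-env --all

# Or SSH in and read the deployment logs directly (Amazon Linux 2 paths).
eb ssh django-env
sudo tail -n 100 /var/log/eb-engine.log      # deployment engine output
sudo tail -n 100 /var/log/cfn-init-cmd.log   # container_commands output
sudo tail -n 100 /var/log/web.stdout.log     # application stdout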
It is most likely a connection error. Make sure the instance can access the internet and that you have VPC endpoints for SQS/CloudFormation/CloudWatch/S3/elasticbeanstalk/elasticbeanstalk-health. Also make sure the security groups for these endpoints allow access from your instance.

aws cloudwatch log for elasticbeanstalk

From a web app, the end user will specify a start time and an end time for fetching logs (either as a .zip file or just displayed in a new tab). I want to use CloudWatch for Elastic Beanstalk logging. What Java APIs are available for doing this, e.g. enabling CloudWatch logs in Elastic Beanstalk, creating log streams, etc.?
Why do you want to use a Java API? You can follow the steps below to install and configure CloudWatch Logs in an EB environment.
Add a CloudWatch policy to your Elastic Beanstalk EC2 role.
Write a config in .ebextensions to install and configure the CloudWatch Logs agent (awslogs) on the EB servers.
Example config for the CloudWatch Logs agent installation and configuration:
packages:
  yum:
    awslogs: []
container_commands:
  01_get_awslogs_conf_file:
    command: "cp .ebextensions/awslogs.conf /etc/awslogs/awslogs.conf"
  03_restart_awslogs:
    command: "sudo service awslogs restart"
  04_start_awslogs_at_system_boot:
    command: "sudo chkconfig awslogs on"
Your awslogs.conf should be available in the .ebextensions directory.
Example awslogs.conf file:
[general]
state_file = value
logging_config_file = value
use_gzip_http_content_encoding = [true | false]
[logstream1]
log_group_name = value
log_stream_name = value
datetime_format = value
time_zone = [LOCAL|UTC]
file = value
file_fingerprint_lines = integer | integer-integer
multi_line_start_pattern = regex | {datetime_format}
initial_position = [start_of_file | end_of_file]
encoding = [ascii|utf_8|..]
buffer_duration = integer
batch_count = integer
batch_size = integer
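A filled-in sketch of that file for a single stream, written from the .ebextensions directory as in the container_commands above (the log group name and the monitored file are placeholders to adapt):

# Example .ebextensions/awslogs.conf with one concrete stream.
cat > .ebextensions/awslogs.conf <<'EOF'
[general]
state_file = /var/lib/awslogs/agent-state

[/var/log/messages]
file = /var/log/messages
log_group_name = my-eb-app
log_stream_name = {instance_id}
datetime_format = %b %d %H:%M:%S
initial_position = start_of_file
buffer_duration = 5000
EOF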
If you are not seeing logs under CloudWatch Logs in the AWS console, check the agent log on your server. The agent's default log path is /var/log/awslogs.log.
Hope this helps you set up CloudWatch Logs on EB.