Amazon CloudWatch Agent not sending OnPremise metrics/logs - amazon-web-services

I'm trying to connect my on-premises server to AWS via the CloudWatch Agent to record system metrics. I installed the latest release from its .deb package and then the nightly build from its .deb package, and the results are the same (except that the nightly build runs via systemd).
Even though logging is enabled for both the agent and the SDK, nothing appears in the logs.
After launching via the ctl script the agent starts, but the service daemon is in fact restarted constantly; when started with service amazon-cloudwatch-agent start, it is reported as failed:
Apr 22 00:20:56 sero1 systemd[1]: Started Amazon CloudWatch Agent.
Apr 22 00:21:02 sero1 start-amazon-cloudwatch-agent[7614]: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
Apr 22 00:21:02 sero1 start-amazon-cloudwatch-agent[7614]: I! Detecting run_as_user...
Apr 22 00:21:02 sero1 start-amazon-cloudwatch-agent[7614]: 2022-04-21T22:21:02Z I! AWS_SDK_LOG_LEVEL is set to "LogDebug"
Apr 22 00:21:02 sero1 start-amazon-cloudwatch-agent[7614]: 2022-04-21T22:21:02Z I! CWAGENT_LOG_LEVEL is set to "DEBUG"
Apr 22 00:25:03 sero1 systemd[1]: amazon-cloudwatch-agent.service: Main process exited, code=exited, status=1/FAILURE
Apr 22 00:25:03 sero1 systemd[1]: amazon-cloudwatch-agent.service: Failed with result 'exit-code'.
The log file has more details:
2022-04-21T22:21:02Z I! Starting AmazonCloudWatchAgent 1.247351.0-10-g2ece109-nightly-build
2022-04-21T22:21:02Z I! AWS SDK log level, LogDebug
2022-04-21T22:21:02Z I! Loaded inputs: net swap cpu disk diskio logfile mem
2022-04-21T22:21:02Z I! Loaded aggregators:
2022-04-21T22:21:02Z I! Loaded processors: delta ec2tagger
2022-04-21T22:21:02Z I! Loaded outputs: cloudwatch cloudwatchlogs
2022-04-21T22:21:02Z I! Tags enabled: host=sero1
2022-04-21T22:21:02Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"sero1", Flush Interval:1s
2022-04-21T22:21:02Z D! [agent] Initializing plugins
2022-04-21T22:21:02Z I! [logagent] starting
2022-04-21T22:21:02Z I! [logagent] found plugin cloudwatchlogs is a log backend
2022-04-21T22:21:02Z I! [logagent] found plugin logfile is a log collection
2022-04-21T22:21:32Z I! CWAGENT_LOG_LEVEL is set to "DEBUG"
2022-04-21T22:21:32Z I! AWS_SDK_LOG_LEVEL is set to "LogDebug"
2022-04-21T22:22:02Z D! Profiler dump:
[no stats is available...]
2022-04-21T22:23:02Z D! Profiler dump:
[no stats is available...]
2022-04-21T22:24:02Z D! Profiler dump:
[no stats is available...]
2022-04-21T22:25:02Z D! Profiler dump:
[no stats is available...]
2022-04-21T22:25:03Z E! [processors.ec2tagger] ec2tagger: Unable to retrieve InstanceId. This plugin must only be used on an EC2 instance
2022-04-21T22:25:03Z E! [telegraf] Error running agent: could not initialize processor ec2tagger: ec2tagger: Unable to retrieve InstanceId. This plugin must only be used on an EC2 instance
2022/04/22 00:26:09 I! 2022/04/22 00:26:03 D! [EC2] Found active network interface
2022/04/22 00:26:09 E! ec2metadata is not available
I! Detected the instance is OnPremise
2022/04/22 00:26:09 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
2022/04/22 00:26:09 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json ...
2022/04/22 00:26:09 I! Valid Json input schema.
I! Detecting run_as_user...
Got Home directory: /root
I! Set home dir Linux: /root
I! SDKRegionWithCredsMap region: eu-central-1
No csm configuration found.
Under path : /logs/ | Info : Got hostname sero1 as log_stream_name
Configuration validation first phase succeeded
2022/04/22 00:26:09 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
2022/04/22 00:26:09 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
2022/04/22 00:26:09 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json ...
2022/04/22 00:26:09 I! Valid Json input schema.
2022/04/22 00:26:09 I! Detected runAsUser: cwagent
2022/04/22 00:26:09 I! Changing ownership of [/opt/aws/amazon-cloudwatch-agent/logs /opt/aws/amazon-cloudwatch-agent/etc /opt/aws/amazon-cloudwatch-agent/var] to 996:996
2022/04/22 00:26:09 I! Set HOME: /home/cwagent
After this line the service simply dies. No metrics are ever reported in CloudWatch.

I had the same error, and in my case it was because I was setting "append_dimensions". (I had misread the official documentation and was setting a custom field and value.)
This field can only contain values related to EC2 metadata, and it seems that if this field is configured, the CloudWatch Agent tries to load the processors.ec2tagger plugin and fails.
I was able to upload my metrics and logs after deleting the "append_dimensions" field.
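The fix described above can be sketched as a small script that removes the field before the config is handed to the agent. This is only a minimal sketch, assuming the field in question is the top-level metrics.append_dimensions entry of the agent JSON config; the function name strip_global_append_dimensions and the sample config values are hypothetical.

```python
import copy
import json


def strip_global_append_dimensions(config):
    """Return a copy of an agent config dict without metrics.append_dimensions.

    Assumption: the problematic field is the one directly under "metrics".
    The input dict is left untouched.
    """
    cleaned = copy.deepcopy(config)
    metrics = cleaned.get("metrics")
    if isinstance(metrics, dict):
        metrics.pop("append_dimensions", None)
    return cleaned


# Hypothetical on-premises config with a custom dimension set by mistake.
example = {
    "metrics": {
        "append_dimensions": {"Environment": "onprem"},
        "metrics_collected": {"mem": {"measurement": ["mem_used_percent"]}},
    }
}

print(json.dumps(strip_global_append_dimensions(example), indent=2))
```

The cleaned dict can then be written back to the JSON file that is passed to fetch-config.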

Related

Cloudwatch Log Agent Issue

I have been working with the CloudWatch Logs agent for a long time but have never faced the issue below.
I made some changes to the CloudWatch agent JSON file:
/opt/aws/amazon-cloudwatch-agent/bin/config.json
After making the changes, log export to CloudWatch stopped:
{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "root"
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [{
          "file_path": "/var/lib/jenkins/jobs/**/builds/**/logs",
          "log_group_name": "log_data",
          "log_stream_name": "log_stream",
          "retention_in_days": 3
        }]
      }
    }
  }
}
After running the command:
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json -s
Command Output
***** processing amazon-cloudwatch-agent ******
/opt/aws/amazon-cloudwatch-agent/bin/config-downloader --output-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d --download-source file:/opt/aws/amazon-cloudwatch-agent/bin/config.json --mode ec2 --config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml --multi-config default
I! Trying to detect region from ec2
D! [EC2] Found active network interface
Successfully fetched the config and saved in /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_config.json.tmp
Start configuration validation...
/opt/aws/amazon-cloudwatch-agent/bin/config-translator --input /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json --input-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d --output /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml --mode ec2 --config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml --multi-config default
2022/11/23 06:37:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_config.json.tmp ...
2022/11/23 06:37:55 I! Valid Json input schema.
I! Detecting run_as_user...
I! Trying to detect region from ec2
D! [EC2] Found active network interface
No csm configuration found.
Configuration validation first phase succeeded
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent -schematest -config /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
Configuration validation second phase succeeded
Configuration validation succeeded
The logs are still not being exported.
I also checked amazon-cloudwatch-agent.log for the issue.
Command:
cat /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log
Output
2022/11/22 19:22:35 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
2022/11/22 19:22:35 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_config.json ...
2022/11/22 19:22:36 I! Valid Json input schema.
2022/11/22 19:22:36 I! Detected runAsUser: root
2022/11/22 19:22:36 I! Changing ownership of [/opt/aws/amazon-cloudwatch-agent/logs /opt/aws/amazon-cloudwatch-agent/etc /opt/aws/amazon-cloudwatch-agent/var] to 0:0
2022-11-22T19:22:36Z I! Starting AmazonCloudWatchAgent 1.247355.0
2022-11-22T19:22:36Z I! AWS SDK log level not set
2022-11-22T19:22:36Z I! Loaded inputs: disk logfile mem
2022-11-22T19:22:36Z I! Loaded aggregators:
2022-11-22T19:22:36Z I! Loaded processors: ec2tagger
2022-11-22T19:22:36Z I! Loaded outputs: cloudwatch cloudwatchlogs
2022-11-22T19:22:36Z I! Tags enabled: host=ip-172-31-1-218
2022-11-22T19:22:36Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"ip-172-31-1-218", Flush Interval:1s
2022-11-22T19:22:36Z I! [processors.ec2tagger] ec2tagger: Check ec2 metadata
2022-11-22T19:22:36Z I! [logagent] starting
2022-11-22T19:22:36Z I! [logagent] found plugin cloudwatchlogs is a log backend
2022-11-22T19:22:36Z I! [logagent] found plugin logfile is a log collection
2022-11-22T19:22:36Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started initialization.
2022-11-22T19:22:36Z I! [processors.ec2tagger] ec2tagger: Check ec2 metadata
2022-11-22T19:22:36Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started initialization.
2022-11-22T19:22:36Z I! cloudwatch: get unique roll up list []
2022-11-22T19:22:36Z I! cloudwatch: publish with ForceFlushInterval: 1m0s, Publish Jitter: 1s
2022-11-22T19:22:36Z I! [processors.ec2tagger] ec2tagger: Initial retrieval of tags succeded
2022-11-22T19:22:36Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
2022-11-22T19:22:36Z I! [processors.ec2tagger] ec2tagger: Initial retrieval of tags succeded
2022-11-22T19:22:36Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
I am also attaching the old logs from the same file, where you can see the actual difference:
2022/10/11 06:58:04 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
2022/10/11 06:58:04 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_config.json ...
2022/10/11 06:58:04 I! Valid Json input schema.
2022/10/11 06:58:04 I! Detected runAsUser: root
2022/10/11 06:58:04 I! Changing ownership of [/opt/aws/amazon-cloudwatch-agent/logs /opt/aws/amazon-cloudwatch-agent/etc /opt/aws/amazon-cloudwatch-agent/var] to 0:0
2022-10-11T06:58:04Z I! Starting AmazonCloudWatchAgent 1.247355.0
2022-10-11T06:58:04Z I! AWS SDK log level not set
2022-10-11T06:58:04Z I! Loaded inputs: disk logfile mem
2022-10-11T06:58:04Z I! Loaded aggregators:
2022-10-11T06:58:04Z I! Loaded processors: ec2tagger
2022-10-11T06:58:04Z I! Loaded outputs: cloudwatch cloudwatchlogs
2022-10-11T06:58:04Z I! Tags enabled: host=ip-172-31-1-218
2022-10-11T06:58:04Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"ip-172-31-1-218", Flush Interval:1s
2022-10-11T06:58:04Z I! [processors.ec2tagger] ec2tagger: Check ec2 metadata
2022-10-11T06:58:04Z I! [logagent] starting
2022-10-11T06:58:04Z I! [logagent] found plugin cloudwatchlogs is a log backend
2022-10-11T06:58:04Z I! [logagent] found plugin logfile is a log collection
2022-10-11T06:58:04Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started initialization.
2022-10-11T06:58:04Z I! [processors.ec2tagger] ec2tagger: Check ec2 metadata
2022-10-11T06:58:04Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started initialization.
2022-10-11T06:58:04Z I! cloudwatch: get unique roll up list []
2022-10-11T06:58:04Z I! cloudwatch: publish with ForceFlushInterval: 1m0s, Publish Jitter: 54s
2022-10-11T06:58:04Z I! [processors.ec2tagger] ec2tagger: Initial retrieval of tags succeded
2022-10-11T06:58:04Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
2022-10-11T06:58:04Z I! [processors.ec2tagger] ec2tagger: Initial retrieval of tags succeded
2022-10-11T06:58:04Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
2022-10-11T06:58:05Z I! [inputs.logfile] Reading from offset 841815 in /var/log/syslog
2022-10-11T06:58:05Z I! [logagent] piping log from Test/syslog(/var/log/syslog) to cloudwatchlogs with retention 3
2022-10-11T06:58:10Z I! [outputs.cloudwatchlogs] First time sending logs to Test/syslog since startup so sequenceToken is nil, learned new token:(0xc000904100): The given sequenceToken is invalid. The next expected sequenceToken is: 49623781346189785515230421563841207337059424902309742034
2022-10-11T06:58:10Z W! [outputs.cloudwatchlogs] Retried 0 time, going to sleep 112.557969ms before retrying.
2022-10-11T07:35:05Z W! [agent] ["outputs.cloudwatch"] did not complete within its flush interval
2022-10-11T07:35:13Z W! [agent] ["outputs.cloudwatch"] did not complete within its flush interval
2022-10-11T07:35:15Z W! [agent] ["outputs.cloudwatch"] did not complete within its flush interval
2022-10-11T07:35:16Z W! [agent] ["outputs.cloudwatchlogs"] did not complete within its flush interval
2022-10-11T07:35:17Z W! [agent] ["outputs.cloudwatchlogs"] did not complete within its flush interval
2022-10-11T07:35:18Z W! [agent] ["outputs.cloudwatch"] did not complete within its flush interval
2022-10-11T07:35:19Z W! [agent] ["outputs.cloudwatch"] did not complete within its flush interval
2022-10-11T07:35:19Z W! [agent] ["outputs.cloudwatchlogs"] did not complete within its flush interval
2022-10-11T07:35:22Z W! [agent] ["outputs.cloudwatch"] did not complete within its flush interval
2022/10/11 07:46:06 I! D! [EC2] Found active network interface
I! Detected the instance is EC2
2022/10/11 07:46:06 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
2022/10/11 07:46:06 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_config.json ...
2022/10/11 07:46:06 I! Valid Json input schema.
I! Detecting run_as_user...
I! Trying to detect region from ec2
No csm configuration found.
Configuration validation first phase succeeded
The differences I can spot between the old and the new logs are:
initial retrieval of tags and Volumes
2022-10-11T06:58:05Z I! [inputs.logfile] Reading from offset 841815 in /var/log/syslog
2022-10-11T06:58:05Z I! [logagent] piping log from Test/syslog(/var/log/syslog) to cloudwatchlogs with retention 3
2022-10-11T06:58:10Z I! [outputs.cloudwatchlogs] First time sending logs to Test/syslog since startup so sequenceToken is nil, learned new token:(0xc000904100): The given sequenceToken is invalid. The next expected sequenceToken is: 49623781346189785515230421563841207337059424902309742034
2022-10-11T06:58:10Z W! [outputs.cloudwatchlogs] Retried 0 time, going to sleep 112.557969ms before retrying.
2022-10-11T07:35:05Z W! [agent] ["outputs.cloudwatch"] did not complete within its flush interval
2022-10-11T07:35:13Z W! [agent] ["outputs.cloudwatch"] did not complete within its flush interval
2022-10-11T07:35:15Z W! [agent] ["outputs.cloudwatch"] did not complete within its flush interval
2022-10-11T07:35:16Z W! [agent] ["outputs.cloudwatchlogs"] did not complete within it
Please help me resolve this issue; I have been stuck on it for a very long time. I have also tried reinstalling the agent, but it didn't help.
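One way to narrow this down is to check whether the file_path pattern from the config currently matches any regular files on the host, since the newer log no longer contains a "piping log from" line. The sketch below is only a rough check: it uses Python's recursive glob, and treating its handling of ** as equivalent to the agent's own matching is an assumption. The helper name matching_files and the demo directory names are hypothetical.

```python
import glob
import os
import tempfile


def matching_files(pattern):
    """Return the regular files that a recursive glob pattern currently matches."""
    return sorted(p for p in glob.glob(pattern, recursive=True) if os.path.isfile(p))


# Demo on a throwaway directory tree (hypothetical job and build names).
with tempfile.TemporaryDirectory() as root:
    log_dir = os.path.join(root, "jobs", "demo-job", "builds", "1")
    os.makedirs(log_dir)
    with open(os.path.join(log_dir, "logs"), "w") as handle:
        handle.write("build output\n")

    pattern = os.path.join(root, "jobs", "**", "builds", "**", "logs")
    demo_relative = [os.path.relpath(p, root) for p in matching_files(pattern)]
    print(demo_relative)
```

On the real host, calling matching_files("/var/lib/jenkins/jobs/**/builds/**/logs") as the run_as_user account would show whether anything matches; an empty list would suggest looking at the pattern or at file permissions first.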

before install: CodeDeploy agent was not able to receive the lifecycle event. Check the CodeDeploy agent logs on your host and make sure the agent is

I have set up a pipeline, but I get the following error during deployment:
before install CodeDeploy agent was not able to receive the lifecycle event. Check the CodeDeploy agent logs on your host and make sure the agent is running and can connect to the CodeDeploy server.
The CodeDeploy agent is running, but I do not know what the problem is. I checked the CodeDeploy agent logs:
[ec2-user@ip-172-31-255-11 ~]$ sudo cat /var/log/aws/codedeploy-agent/codedeploy-agent.log
2022-09-27 00:00:02 INFO [codedeploy-agent(3694)]: [Aws::CodeDeployCommand::Client 200 45.14352 0 retries] poll_host_command(host_identifier:"arn:aws:ec2:us-east-1:632547665100:instance/i-01d3b4303d7c9c948")
2022-09-27 00:00:03 INFO [codedeploy-agent(3694)]: Version file found in /opt/codedeploy-agent/.version with agent version OFFICIAL_1.4.0-2218_rpm.
2022-09-27 00:00:03 INFO [codedeploy-agent(3694)]: Version file found in /opt/codedeploy-agent/.version with agent version OFFICIAL_1.4.0-2218_rpm.
I was also unlucky enough to run into this problem today.
Please use this guide and look at the CodeDeploy agent logs on your compute platform instance (EC2, probably).
In my case, it turned out that I had not added an AppSpec file to the project.

AWS Cloudwatch agent config file removed after startup

Problem
I am simply trying to install the CloudWatch Agent on Amazon Linux 2 instances at startup, using AWS user data. For some reason, after Cloud Init has finished running, all services get restarted and the configuration file I put in the CloudWatch folder is no longer there.
I am using a custom AMI pre-built with Packer. My configuration file is placed in /opt/aws/amazon-cloudwatch-agent/etc/custom/amazon-cloudwatch-agent.json from an Ansible template. This is the configuration file I want to use; it holds all the metrics and logs I want to send. I then copy it to /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json at startup, after the agent installation.
Here is my userdata script:
#!/bin/bash
yum install amazon-cloudwatch-agent -y
cp /opt/aws/amazon-cloudwatch-agent/etc/custom/amazon-cloudwatch-agent.json /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
What is happening
After startup has finished, I can see the script ran correctly. If I run cat /opt/aws/amazon-cloudwatch-agent/log/amazon-cloudwatch-agent.log I can see that the following:
2021/07/16 13:33:46 I! I! Detected the instance is EC2
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json ...
Valid Json input schema.
I! Detecting run_as_user...
No csm configuration found.
Configuration validation first phase succeeded
2021/07/16 13:33:46 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
2021/07/16 13:33:46 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_amazon-cloudwatch-agent.json ...
2021/07/16 13:33:46 I! Detected runAsUser: root
2021/07/16 13:33:46 I! Changing ownership of [/opt/aws/amazon-cloudwatch-agent/logs /opt/aws/amazon-cloudwatch-agent/etc /opt/aws/amazon-cloudwatch-agent/var] to root:root
2021-07-16T13:33:46Z I! Starting AmazonCloudWatchAgent 1.247347.4
2021-07-16T13:33:46Z I! Loaded inputs: netstat diskio logfile mem net processes swap cpu disk
2021-07-16T13:33:46Z I! Loaded aggregators:
2021-07-16T13:33:46Z I! Loaded processors: delta ec2tagger
2021-07-16T13:33:46Z I! Loaded outputs: cloudwatch cloudwatchlogs
2021-07-16T13:33:46Z I! Tags enabled: host=ip-XX-XX-X-XXX.eu-west-1.compute.internal
2021-07-16T13:33:46Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"ip-XX-XX-X-XXX.eu-west-1.compute.internal", Flush Interval:1s
2021-07-16T13:33:46Z I! [logagent] starting
2021-07-16T13:33:46Z I! [logagent] found plugin cloudwatchlogs is a log backend
2021-07-16T13:33:46Z I! [logagent] found plugin logfile is a log collection
2021-07-16T13:33:46Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started initialization.
=======> 2021-07-16T13:33:46Z I! cloudwatch: get unique roll up list [[AutoScalingGroupName] [InstanceId InstanceType] []]
2021-07-16T13:33:46Z I! cloudwatch: publish with ForceFlushInterval: 30s, Publish Jitter: 11s
2021-07-16T13:33:46Z I! [processors.ec2tagger] ec2tagger: Initial retrieval of tags succeded
2021-07-16T13:33:46Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
=======> 2021-07-16T13:33:47Z I! [logagent] piping log from APP-DEV-php-errors-logs/XX.XX.X.XXX(/var/log/php-fpm/error.log) to cloudwatchlogs
2021-07-16T13:33:54Z I! Profiler is stopped during shutdown
2021-07-16T13:33:54Z I! [agent] Hang on, flushing any cached metrics before shutdown
2021/07/16 13:33:55 I! I! Detected the instance is EC2
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/default ...
Valid Json input schema.
I! Detecting run_as_user...
No csm configuration found.
No log configuration found.
Configuration validation first phase succeeded
2021/07/16 13:33:55 I! Config has been translated into TOML /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
2021/07/16 13:33:55 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/default ...
2021/07/16 13:33:55 I! Detected runAsUser: cwagent
2021/07/16 13:33:55 I! Changing ownership of [/opt/aws/amazon-cloudwatch-agent/logs /opt/aws/amazon-cloudwatch-agent/etc /opt/aws/amazon-cloudwatch-agent/var] to 994:992
2021/07/16 13:33:55 I! Set HOME: /home/cwagent
2021-07-16T13:33:55Z I! Starting AmazonCloudWatchAgent 1.247348.0
2021-07-16T13:33:55Z I! Loaded inputs: disk mem
2021-07-16T13:33:55Z I! Loaded aggregators:
2021-07-16T13:33:55Z I! Loaded processors: ec2tagger
2021-07-16T13:33:55Z I! Loaded outputs: cloudwatch
2021-07-16T13:33:55Z I! Tags enabled: host=ip-XX-XX-X-XXX.eu-west-1.compute.internal
2021-07-16T13:33:55Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"ip-XX-XX-X-XXX.eu-west-1.compute.internal", Flush Interval:1s
2021-07-16T13:33:55Z I! [logagent] starting
2021-07-16T13:33:55Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started initialization.
=======> 2021-07-16T13:33:55Z I! cloudwatch: get unique roll up list []
2021-07-16T13:33:55Z I! cloudwatch: publish with ForceFlushInterval: 1m0s, Publish Jitter: 26s
2021-07-16T13:33:55Z I! [processors.ec2tagger] ec2tagger: Initial retrieval of tags succeded
2021-07-16T13:33:55Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
2021-07-16T13:39:07Z I! [processors.ec2tagger] ec2tagger: Refresh is no longer needed, stop refreshTicker.
As you can see, the initial command from user data runs fine, and custom metrics and logs are collected (see the =======> marks before the relevant lines).
However, a few seconds later, after Cloud Init is over, the CloudWatch agent is somehow restarted by systemd, and the file amazon-cloudwatch-agent.json is somehow absent from the filesystem, so the agent runs with default parameters.
If I rerun the command manually after startup, everything works fine, but of course I need this automated for when autoscaling launches new instances.
What I have tried
I tried launching the CloudWatch agent directly with systemd, making the config file read-only, and only fetching the config and letting the system start the agent itself, but the problem still persists.
Thank you for your help
Workaround
The preinstalled ssm-agent conflicts with the CloudWatch Agent. Uninstall ssm-agent during the Packer build:
sudo yum erase amazon-ssm-agent --assumeyes
Explanation
I finally found out that the newly installed CloudWatch agent conflicts with the SSM agent installed by default in the Amazon Linux 2 image.
Indeed, I first tried an ugly workaround, which was to replace the ExecStart line of the amazon-cloudwatch-agent service using sed in the user data:
sed -i '/ExecStart/c\ExecStart=/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/etc/custom/amazon-cloudwatch-agent.json' /etc/systemd/system/amazon-cloudwatch-agent.service
That way when the service gets restarted after instance startup it would use my custom configuration.
However, I then found out that the service file was also replaced after Cloud Init ended.
Reviewing the system messages, I noticed that ssm-agent was reloading some configuration after Cloud Init ended, so I assumed it could be the culprit.
I ended up uninstalling it in the Packer build that creates my AMI, so that it would not be present at instance startup, and my configuration finally stopped being overwritten.
Note that I do not have a deep understanding of how ssm-agent works, and there is probably a proper way to instantiate the CloudWatch Agent using some SSM configuration.
Since we do not currently use SSM and I do not have enough time to study this option, I chose this compromise.
If someone can come up with a cleaner solution, using ssm-agent through an automated method, this would be greatly appreciated.
You can try using AWS Systems Manager Parameter Store - the SSM agent shouldn't/can't remove the config there.
1. Ensure the server has the AWS-managed policy CloudWatchAgentServerPolicy attached; this allows any Parameter Store parameter named AmazonCloudWatch-* to be read.
2. Ensure the contents of what would have been in amazon-cloudwatch-agent.json are stored in Parameter Store as valid JSON, with an appropriate name from (1).
3. As found in cat /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl, run amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c ssm:AmazonCloudWatch-Config.json -s (replacing AmazonCloudWatch-Config.json with the name of the parameter from Parameter Store).
See aws docs.
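The Parameter Store steps above can be sketched as a small helper that checks the two preconditions (a parameter name starting with AmazonCloudWatch- and valid JSON content) and then builds the fetch-config command quoted in the answer. This is a sketch under those assumptions, not a definitive implementation: the function name build_fetch_config_command is hypothetical, and it only assembles the argument list rather than running anything.

```python
import json

# Path and flags are the ones quoted in the answer above.
CTL = "/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl"
PREFIX = "AmazonCloudWatch-"


def build_fetch_config_command(parameter_name, config_text):
    """Validate the inputs and return the ctl command as an argument list."""
    if not parameter_name.startswith(PREFIX):
        raise ValueError(f"parameter name should start with {PREFIX!r}")
    # json.loads raises json.JSONDecodeError (a ValueError) on invalid JSON.
    json.loads(config_text)
    return [CTL, "-a", "fetch-config", "-m", "ec2",
            "-c", f"ssm:{parameter_name}", "-s"]


cmd = build_fetch_config_command("AmazonCloudWatch-Config.json", '{"agent": {}}')
print(" ".join(cmd))
```

The returned list could be passed to subprocess.run in a user data or provisioning script once the parameter exists in Parameter Store.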
Update 2022-10-02 - multiple configs can be added by using append-config (see aws docs), or by placing config files into /etc/amazon/amazon-cloudwatch-agent/amazon-cloudwatch-agent.d/ and restarting the agent with sudo service amazon-cloudwatch-agent restart. This helps when deploying on Amazon Linux 2, which ships its application logs using a CloudWatch log config from this folder, alongside a custom config to ship other logs from the server. Using fetch-config would otherwise overwrite the log config for the AL2 application.

Amazon ECS agent on ubuntu not starting

I am currently trying to build a custom Ubuntu AMI for AWS Batch, following the document here:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-install.html
However, when I try to start the Docker agent on that machine, it always gives me this error:
2018-07-04T23:34:01Z [INFO] Amazon ECS agent Version: 1.18.0, Commit: c0defea9
2018-07-04T23:34:01Z [INFO] Loading state! module="statemanager"
2018-07-04T23:34:01Z [INFO] Event stream ContainerChange start listening...
2018-07-04T23:34:01Z [INFO] Creating root ecs cgroup: /ecs
2018-07-04T23:34:01Z [INFO] Creating cgroup /ecs
2018-07-04T23:34:01Z [WARN] Disabling TaskCPUMemLimit because agent is unabled to setup '/ecs' cgroup: cgroup create: unable to create controller: mkdir /sys/fs/cgroup/systemd/ecs: read-only file system
2018-07-04T23:34:01Z [WARN] Error getting valid credentials (AKID ): NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
2018-07-04T23:34:01Z [INFO] Registering Instance with ECS
2018-07-04T23:34:01Z [ERROR] Could not register: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
2018-07-04T23:34:01Z [ERROR] Error registering: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
I made sure the instance has the ecsInstanceRole associated with it.
Can you guys let me know what I am missing?
I'm not certain how you are starting the ecs-agent. We ran into this error:
Disabling TaskCPUMemLimit because agent is unabled to setup '/ecs' cgroup: cgroup create: unable to create controller: /sys/fs/cgroup/systemd/ecs: read-only file system
We resolved it by adding the volume --volume=/sys/fs/cgroup:/sys/fs/cgroup:ro to the systemd unit file that we use to launch ECS.
Outside of that, I assume the issue lies with the ecsInstanceRole. Can you verify it has the following permissions? AmazonEC2ContainerRegistryFullAccess, AmazonEC2ContainerServiceFullAccess, AmazonEC2ContainerServiceforEC2Role
Below is the full systemd unit file for ecs-agent.
[Unit]
Description=Docker Container %I
Requires=docker.service
After=docker.service
[Service]
Restart=always
ExecStartPre=-/usr/bin/docker rm -f %i
ExecStart=/usr/bin/docker run --name %i \
--restart=on-failure:10 \
--volume=/var/run:/var/run \
--volume=/var/log/ecs/:/log:Z \
--volume=/var/lib/ecs/data:/data:Z \
--volume=/etc/ecs:/etc/ecs \
--volume=/sys/fs/cgroup:/sys/fs/cgroup:ro \
--net=host \
--env-file=/etc/ecs/ecs.config \
--env LOGSPOUT=ignore \
amazon/amazon-ecs-agent:latest
ExecStop=/usr/bin/docker stop %i
[Install]
WantedBy=default.target
I ran into the same messages. You need to create the IAM role and launch the instance with that role, per this documentation: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/launch_container_instance.html

"Permission denied (publickey)" when starting AWS EC2 Spark multi node cluster from master

I have been following this guide to create a multi-node Spark cluster on AWS using EC2 instances. Everything went fine without errors until I tried to start the cluster from the master EC2 instance with:
$SPARK_HOME/sbin/start-all.sh
where I get a "Permission denied (publickey)" error as follows:
$SPARK_HOME/sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark/logs/spark-ubuntu-org.apache.spark.deploy.master.Master-1-ip-172-31-2-17.out
ec2-34-205-81-113.compute-1.amazonaws.com: Warning: Permanently added 'ec2-34-205-81-113.compute-1.amazonaws.com,172.31.6.154' (ECDSA) to the list of known hosts.
ec2-34-205-255-52.compute-1.amazonaws.com: Warning: Permanently added 'ec2-34-205-255-52.compute-1.amazonaws.com,172.31.11.106' (ECDSA) to the list of known hosts.
ec2-34-201-21-89.compute-1.amazonaws.com: Warning: Permanently added 'ec2-34-201-21-89.compute-1.amazonaws.com,172.31.4.124' (ECDSA) to the list of known hosts.
ec2-34-205-81-113.compute-1.amazonaws.com: Permission denied (publickey).
ec2-34-205-255-52.compute-1.amazonaws.com: Permission denied (publickey).
ec2-34-201-21-89.compute-1.amazonaws.com: Permission denied (publickey).
The log file is as follows:
Spark Command: /usr/bin/java -cp /usr/local/spark/conf/:/usr/local/spark/jars/* -Xmx1g org.apache.spark.deploy.master.Master
--host ip-172-31-2-17.ec2.internal --port 7077 --webui-port 8080
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/01/15 22:31:57 INFO Master: Started daemon with process name: 5772@ip-172-31-2-17
18/01/15 22:31:57 INFO SignalUtils: Registered signal handler for TERM
18/01/15 22:31:57 INFO SignalUtils: Registered signal handler for HUP
18/01/15 22:31:57 INFO SignalUtils: Registered signal handler for INT
18/01/15 22:31:58 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform...
using builtin-java classes where applicable
18/01/15 22:31:58 INFO SecurityManager: Changing view acls to: ubuntu
18/01/15 22:31:58 INFO SecurityManager: Changing modify acls to: ubuntu
18/01/15 22:31:58 INFO SecurityManager: Changing view acls groups to:
18/01/15 22:31:58 INFO SecurityManager: Changing modify acls groups to:
18/01/15 22:31:58 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users
with view permissions: Set(ubuntu); groups with view permissions: Set(); users with modify permissions:
Set(ubuntu); groups with modify permissions: Set()
18/01/15 22:31:59 INFO Utils: Successfully started service 'sparkMaster' on port 7077.
18/01/15 22:31:59 INFO Master: Starting Spark master at spark://ip-172-31-2-17.ec2.internal:7077
18/01/15 22:31:59 INFO Master: Running Spark version 2.0.2
18/01/15 22:31:59 INFO Utils: Successfully started service 'MasterUI' on port 8080.
18/01/15 22:31:59 INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://ec2-34-205-4-1.compute-1.amazonaws.com:8080
18/01/15 22:31:59 INFO Utils: Successfully started service on port 6066.
18/01/15 22:31:59 INFO StandaloneRestServer: Started REST server for submitting applications on port 6066
18/01/15 22:31:59 INFO Master: I have been elected leader! New state: ALIVE
I have searched online for what this error is and a suggested solution was running the following on my master:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
No luck there either, it gave me this:
cat: /home/ubuntu/.ssh/id_rsa.pub: No such file or directory
Also, at the end of the guide someone suggested this:
If you are having “permission denied” problems while starting up the
cluster, it’s possible you don’t have the private ssh key on the
master node. Once you move the private key to the master node, you’ll
also have to designate the private key as the default by specifying
the key location in .ssh/config. For example:
IdentityFile ~/.ssh/spark_cluster.pem
But no part of the tutorial talks about placing or moving SSH keys, and I have no idea where to go from here. I don't think I missed any instruction from the tutorial. All my machines run Ubuntu 16.04. Please help.
It looks like there is a problem with SSH. A previous, similar question was solved by copying the public key from the master node to the new slave node. In your case, try generating a key (since one doesn't seem to exist) and then adding it to authorized_keys on all the slave nodes.
cd ~/.ssh
ssh-keygen -t rsa
# Append instead of overwriting, so keys already in authorized_keys keep working
cat ~/.ssh/id_rsa.pub | ssh remote_username@host 'cat >> ~/.ssh/authorized_keys'