Does anyone know a way to install Cloudwatch agents automatically on EC2 instances while launching them through a launch template/configuration on terraform ?
I have just struggled through the process myself and would have benefited from a clear guide. So here's my attempt to provide one (for Amazon Linux 2 AMI):
Create your Cloudwatch agent configuration json file, which defines the metrics you want to collect. Easiest way is to SSH onto your EC2 instance and run this command to generate the file using the wizard: sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard. This is what my file looks like, it is the most basic config which only collects metrics on disk and memory usage every 60 seconds:
{
"agent": {
"metrics_collection_interval": 60,
"region": "eu-west-1",
"logfile": "/opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log",
"run_as_user": "root"
},
"metrics": {
"metrics_collected": {
"disk": {
"measurement": [
"used_percent"
],
"metrics_collection_interval": 60,
"resources": [
"*"
]
},
"mem": {
"measurement": [
"mem_used_percent"
],
"metrics_collection_interval": 60
}
}
}
}
Create a shell script template file which will run when the EC2 instance is created. This is what mine looks like, it is called userdata.sh.tpl:
Content-Type: multipart/mixed; boundary="==BOUNDARY=="
MIME-Version: 1.0
--==BOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"
#!/bin/bash
# Install Cloudwatch agent
sudo yum install -y amazon-cloudwatch-agent
# Write Cloudwatch agent configuration file
sudo cat >> /opt/aws/amazon-cloudwatch-agent/bin/config_temp.json <<EOF
{
"agent": {
"metrics_collection_interval": 60,
"run_as_user": "root"
},
"metrics": {
"metrics_collected": {
"disk": {
"measurement": [
"used_percent"
],
"metrics_collection_interval": 60,
"resources": [
"*"
]
},
"mem": {
"measurement": [
"mem_used_percent"
],
"metrics_collection_interval": 60
}
}
}
}
EOF
# Start Cloudwatch agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json
--==BOUNDARY==--
Create a directory called templates in your terraform module directory and store the userdata.sh.tpl file in there.
Create a data block in the appropriate .tf file as follows:
data "template_file" "user_data" {
template = file("${path.module}/templates/userdata.sh.tpl")
vars = {
...
}
}
In your aws_launch_configuration block, pass in the following value for the user_data variable:
resource "aws_launch_configuration" "example" {
name = "example_server_name"
image_id = data.aws_ami.ubuntu.id
instance_type = "t2.micro"
user_data = data.template_file.user_data.rendered
}
Add the CloudWatchAgentServerPolicy policy to the IAM role used by your EC2 server. This will give your role all the required service-level permissions e.g. "cloudwatch:PutMetricData".
Relaunch your EC2 server, and SSH on to check that the CloudWatch agent is installed and running using systemctl status amazon-cloudwatch-agent.service
Navigate to the CloudWatch UI and select Metrics from the left-hand menu. You should see CWAgent in the list of namespaces.
Yes this can be achieved with a Bash script (assuming Linux)
Steps to consider
Create UserData.sh file
Use templatefile to link userdata.sh to launch template
Write userdata to install AWS Cloudwatch agent (https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/install-CloudWatch-Agent-on-EC2-Instance.html)
Terminate/create instance
Check cloudwatch agent is installed, up and running systemctl status amazon-cloudwatch-agent
Related
I would like to forward the logs of select services running on my EKS cluster to CloudWatch for cluster-independent storage and better observability.
Following the quickstart outlined at https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-setup-EKS-quickstart.html I've managed to get the logs forwarded via Fluent Bit service, but that has also generated 170 Container Insights metrics channels. Not only are those metrics not required, but they also appear to cost a fair bit.
How can I disable the collection of cluster metrics such as cpu / memory / network / etc, and only keep forwarding container logs to CloudWatch? I'm having a very hard time finding any documentation on this.
I think I figured it out - the cloudwatch-agent daemonset from quickstart guide is what's sending the metrics, but it's not required for log forwarding. All the objects with names related to cloudwatch-agent in quickstart yaml file are not required for log forwarding.
As suggested by Toms Mikoss, you need to delete the metrics object in your configuration file. This file is the one that you pass to the agent when starting
This applies to "on-premises" "linux" installations. I havent tested this on windows, nor EC2 but I imagine it will be similar. The AWS Documentation here says that you can also distribute the configuration via SSM, but again, I imagine the answer here is still applicable.
Example of file with metrics:
{
"agent": {
"metrics_collection_interval": 60,
"run_as_user": "root"
},
"logs": {
"logs_collected": {
"files": {
"collect_list": [
{
"file_path": "/var/log/nginx.log",
"log_group_name": "nginx",
"log_stream_name": "{hostname}"
}
]
}
}
},
"metrics": {
"metrics_collected": {
"cpu": {
"measurement": [
"cpu_usage_idle",
"cpu_usage_iowait"
],
"metrics_collection_interval": 60,
"totalcpu": true
}
}
}
}
Example of file without metrics:
{
"agent": {
"metrics_collection_interval": 60,
"run_as_user": "root"
},
"logs": {
"logs_collected": {
"files": {
"collect_list": [
{
"file_path": "/var/log/nginx.log",
"log_group_name": "nginx",
"log_stream_name": "{hostname}"
}
]
}
}
}
}
For reference, the command to start for linux on-premises servers:
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config \
-m onPremise -s -c file:configuration-file-path
More details in the AWS Documentation here
I was able to setup AutoScaling events as rules in EventBridge to trigger SSM Commands, but I've noticed that with my chosen Target Value the event is passed to all my active EC2 Instances. My Target key is a tag shared by those instances, so my mistake makes sense now.
I'm pretty new to EventBridge, so I was wondering if there's a way to actually target the instance that triggered the AutoScaling event (as in extracting the "InstanceId" that's present in the event data and use that as my new Target Value). I saw the Input Transformer, but I think that just transforms the event data to pass to the target.
Thanks!
EDIT - help with js code for Lambda + SSM RunCommand
I realize I can achieve this by setting EventBridge to invoke a Lambda function instead of the SSM RunCommand directly. Can anyone help with the javaScript code to call a shell command on the ec2 instance specified in the event data (event.detail.EC2InstanceId)? I can't seem to find a relevant and up-to-date base template online, and I'm not familiar enough with js or Lambda. Any help is greatly appreciated! Thanks
Sample of Event data, as per aws docs
{
"version": "0",
"id": "12345678-1234-1234-1234-123456789012",
"detail-type": "EC2 Instance Launch Successful",
"source": "aws.autoscaling",
"account": "123456789012",
"time": "yyyy-mm-ddThh:mm:ssZ",
"region": "us-west-2",
"resources": [
"auto-scaling-group-arn",
"instance-arn"
],
"detail": {
"StatusCode": "InProgress",
"Description": "Launching a new EC2 instance: i-12345678",
"AutoScalingGroupName": "my-auto-scaling-group",
"ActivityId": "87654321-4321-4321-4321-210987654321",
"Details": {
"Availability Zone": "us-west-2b",
"Subnet ID": "subnet-12345678"
},
"RequestId": "12345678-1234-1234-1234-123456789012",
"StatusMessage": "",
"EndTime": "yyyy-mm-ddThh:mm:ssZ",
"EC2InstanceId": "i-1234567890abcdef0",
"StartTime": "yyyy-mm-ddThh:mm:ssZ",
"Cause": "description-text"
}
}
Edit 2 - my Lambda code so far
'use strict'
const ssm = new (require('aws-sdk/clients/ssm'))()
exports.handler = async (event) => {
const instanceId = event.detail.EC2InstanceId
var params = {
DocumentName: "AWS-RunShellScript",
InstanceIds: [ instanceId ],
TimeoutSeconds: 30,
Parameters: {
commands: ["/path/to/my/ec2/script.sh"],
workingDirectory: [],
executionTimeout: ["15"]
}
};
const data = await ssm.sendCommand(params).promise()
const response = {
statusCode: 200,
body: "Run Command success",
};
return response;
}
Yes, but through Lambda
EventBridge -> Lambda (using SSM api) -> EC2
Thank you #Sándor Bakos for helping me out!! My JavaScript ended up not working for some reason, so I ended up just using part of the python code linked in the comments.
1. add ssm:SendCommand permission:
After I let Lambda create a basic role during function creation, I added an inline policy to allow Systems Manager's SendCommand. This needs access to your documents/*, instances/* and managed-instances/*
2. code - python 3.9
import boto3
import botocore
import time
def lambda_handler(event=None, context=None):
try:
client = boto3.client('ssm')
instance_id = event['detail']['EC2InstanceId']
command = '/path/to/my/script.sh'
client.send_command(
InstanceIds = [ instance_id ],
DocumentName = 'AWS-RunShellScript',
Parameters = {
'commands': [ command ],
'executionTimeout': [ '60' ]
}
)
You can do this without using lambda, as I just did, by using eventbridge's input transformers.
I specified a new automation document that called the document I was trying to use (AWS-ApplyAnsiblePlaybooks).
My document called out the InstanceId as a parameter and is passed this by the input transformer from EventBridge. I had to pass the event into lambda just to see how to parse the JSON event object to get the desired instance ID - this ended up being
$.detail.EC2InstanceID
(it was coming from an autoscaling group).
I then passed it into a template that was used for the runbook
{"InstanceId":[<instance>]}
This template was read in my runbook as a parameter.
This was the SSM playbook inputs I used to run the AWS-ApplyAnsiblePlaybook Document, I just mapped each parameter to the specified parameters in the nested playbook:
"inputs": {
"InstanceIds": ["{{ InstanceId }}"],
"DocumentName": "AWS-ApplyAnsiblePlaybooks",
"Parameters": {
"SourceType": "S3",
"SourceInfo": {"path": "https://testansiblebucketab.s3.amazonaws.com/"},
"InstallDependencies": "True",
"PlaybookFile": "ansible-test.yml",
"ExtraVariables": "SSM=True",
"Check": "False",
"Verbose": "-v",
"TimeoutSeconds": "3600"
}
See the document below for reference. They used a document that was already set up to receive the variable
https://docs.aws.amazon.com/systems-manager/latest/userguide/automation-tutorial-eventbridge-input-transformers.html
This is the full automation playbook I used, most of the parameters are defaults from the nested playbook:
{
"description": "Runs Ansible Playbook on Launch Success Instances",
"schemaVersion": "0.3",
"assumeRole": "<Place your automation role ARN here>",
"parameters": {
"InstanceId": {
"type": "String",
"description": "(Required) The ID of the Amazon EC2 instance."
}
},
"mainSteps": [
{
"name": "RunAnsiblePlaybook",
"action": "aws:runCommand",
"inputs": {
"InstanceIds": ["{{ InstanceId }}"],
"DocumentName": "AWS-ApplyAnsiblePlaybooks",
"Parameters": {
"SourceType": "S3",
"SourceInfo": {"path": "https://testansiblebucketab.s3.amazonaws.com/"},
"InstallDependencies": "True",
"PlaybookFile": "ansible-test.yml",
"ExtraVariables": "SSM=True",
"Check": "False",
"Verbose": "-v",
"TimeoutSeconds": "3600"
}
}
}
]
}
I am trying to start EC2 instance automatically using crontab, but it doesn't work properly.
My crontab setting is like this.
0 1 * * * aws ec2 stop-instances --region=ap-northeast-1 --instance-ids=i-*******
0 8 * * * aws ec2 start-instances --region=ap-northeast-1 --instance-ids=i-*******
In this setting, 'ec2 stop-instances' works perfectly, but 'ec2 start-instances' don't.
It's very weird for me.
If both start and stop feature don't work, I could be convinced.
is there anyone who know what the problem is?
Thank you for your cooperation.
P.S.
If I run 'aws ec2 start-instances' manually, the message what I get is down below.
{
"StartingInstances": [
{
"InstanceId": "i-**********",
"CurrentState": {
"Code": 16,
"Name": "running"
},
"PreviousState": {
"Code": 16,
"Name": "running"
}
}
]
}
'aws ec2 stop-instances'
{
"StoppingInstances": [
{
"InstanceId": "i-024a225909929750a",
"CurrentState": {
"Code": 64,
"Name": "stopping"
},
"PreviousState": {
"Code": 16,
"Name": "running"
}
}
]
}
I solved this question by myself. When I change aws ec2 start-instances to /Users/UserName/.local/bin/aws ec2 start-instances, it works as I expected.
I have been struggling to get basic metrics from the CloudWatch agent. I've been getting this error and I have no idea what it means nor can I find resources online which talk much about it
refresh EC2 Instance Tags failed: SharedCredsLoad: failed to get profile, metrics will be dropped until it got fixed
I followed the instructions here and have read through the documentation carefully. Again, the goal is just to read in some basic metrics from my EC2 instance to CloudWatch. Here are the steps I have followed:
Followed instructions here "To create the IAM role necessary to run the CloudWatch agent on EC2 instances" and then assigned it to my instance.
wget https://s3.amazonaws.com/amazoncloudwatch-agent/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb
ami id is ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-20190628 (ami-0cfee17793b08a293)
Install the .deb with command sudo dpkg --install --skip-same-version ./amazon-cloudwatch-agent.deb
note --install and --skip-same-version is just -i -E as done in the docs
generated a config.json with the wizard, located here /opt/aws/amazon-cloudwatch-agent/bin/config.json. I pasted the contents under the error message below.
modify the /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml file to point to new credentials of cwagent (since not using root user) with the following:
root#ip-172-31-71-5:/opt/aws/amazon-cloudwatch-agent/etc# tail -n 4 common-config.toml
#### BEGIN ANSIBLE MANAGED BLOCK ####
[credentials]
shared_credential_file = "/home/cwagent/.aws/credentials"
#### END ANSIBLE MANAGED BLOCK ####
fetch config and start agent with sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json -s
here's the error I'm seeing in the logs now, and I'm assuming why this is why I can't see any metrics
root#ip-172-31-71-5:/opt/aws/amazon-cloudwatch-agent/logs# tail -n 20 amazon-cloudwatch-agent.log
2019/10/29 22:41:08 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json ...
2019/10/29 22:41:08 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_config.json ...
2019/10/29 22:41:08 I! Detected runAsUser: cwagent
2019/10/29 22:41:08 I! Change ownership to cwagent:cwagent
2019/10/29 22:41:08 I! Set HOME: /home/cwagent
2019-10-29T22:41:08Z I! will use file based credentials provider
2019-10-29T22:41:08Z I! cloudwatch: get unique roll up list []
2019-10-29T22:41:08Z I! Starting AmazonCloudWatchAgent (version 1.230621.0)
2019-10-29T22:41:08Z I! Loaded outputs: cloudwatch
2019-10-29T22:41:08Z I! cloudwatch: publish with ForceFlushInterval: 1m0s, Publish Jitter: 37s
2019-10-29T22:41:08Z I! Loaded inputs: disk mem
2019-10-29T22:41:08Z I! Tags enabled: host=ip-172-31-71-5
2019-10-29T22:41:08Z I! Agent Config: Interval:10s, Quiet:false, Hostname:"ip-172-31-71-5", Flush Interval:1s
2019-10-29T22:41:08Z I! will use file based credentials provider
2019-10-29T22:41:08Z E! refresh EC2 Instance Tags failed: SharedCredsLoad: failed to get profile, metrics will be dropped until it got fixed
2019-10-29T22:42:37Z E! refresh EC2 Instance Tags failed: SharedCredsLoad: failed to get profile, metrics will be dropped until it got fixed
2019-10-29T22:43:37Z E! refresh EC2 Instance Tags failed: SharedCredsLoad: failed to get profile, metrics will be dropped until it got fixed
2019-10-29T22:46:37Z E! refresh EC2 Instance Tags failed: SharedCredsLoad: failed to get profile, metrics will be dropped until it got fixed
2019-10-29T22:49:37Z E! refresh EC2 Instance Tags failed: SharedCredsLoad: failed to get profile, metrics will be dropped until it got fixed
2019-10-29T22:52:37Z E! refresh EC2 Instance Tags failed: SharedCredsLoad: failed to get profile, metrics will be dropped until it got fixed
and the config.json I used
root#ip-172-31-71-5:/opt/aws/amazon-cloudwatch-agent/bin# cat config.json
{
"agent": {
"metrics_collection_interval": 10,
"run_as_user": "cwagent"
},
"metrics": {
"namespace": "TestNamespace",
"append_dimensions": {
"AutoScalingGroupName": "${aws:AutoScalingGroupName}",
"ImageId": "${aws:ImageId}",
"InstanceId": "${aws:InstanceId}",
"InstanceType": "${aws:InstanceType}"
},
"metrics_collected": {
"disk": {
"measurement": [
"used_percent"
],
"metrics_collection_interval": 60,
"resources": [
"*"
]
},
"mem": {
"measurement": [
"mem_used_percent"
],
"metrics_collection_interval": 60
}
}
}
}
EDITS
I got it working after I removed the credentials modification
root#ip-172-31-71-5:/opt/aws/amazon-cloudwatch-agent/etc# tail -n 4 common-config.toml
#### BEGIN ANSIBLE MANAGED BLOCK ####
#[credentials]
#shared_credential_file = "/home/cwagent/.aws/credentials"
#### END ANSIBLE MANAGED BLOCK ####
and after I went ahead and copied the config file to the default location it checks (even though the docs say you can pass the file name as I did).
root#ip-172-31-71-5:/opt/aws/amazon-cloudwatch-agent/bin# cp config.json /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
root#ip-172-31-71-5:/opt/aws/amazon-cloudwatch-agent/bin# cd ../etc/
root#ip-172-31-71-5:/opt/aws/amazon-cloudwatch-agent/etc# chown cwagent:cwagent amazon-cloudwatch-agent.json
root#ip-172-31-71-5:/opt/aws/amazon-cloudwatch-agent/etc# ls -l
total 16
drwxr-xr-x 2 cwagent cwagent 4096 Oct 30 22:05 amazon-cloudwatch-agent.d
-rwxr-xr-x 1 cwagent cwagent 611 Oct 30 22:11 amazon-cloudwatch-agent.json
-rw-rw-r-- 1 cwagent cwagent 1144 Oct 30 22:05 amazon-cloudwatch-agent.toml
-rw-r--r-- 1 cwagent cwagent 1073 Oct 30 22:05 common-config.toml
The error appears to be related to accessing tags that are associated with Amazon EC2 instances.
The installation instructions you linked suggest creating an IAM Role with the CloudWatchAgentServerPolicy policy attached. This policy includes permission to describe tags:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"cloudwatch:PutMetricData",
"ec2:DescribeVolumes",
"ec2:DescribeTags",
"logs:PutLogEvents",
"logs:DescribeLogStreams",
"logs:DescribeLogGroups",
"logs:CreateLogStream",
"logs:CreateLogGroup"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"ssm:GetParameter"
],
"Resource": "arn:aws:ssm:*:*:parameter/AmazonCloudWatch-*"
}
]
}
It would appear that the CloudWatch Agent on that server is not receiving such permissions, and is therefore unable to list the tags.
Therefore:
Confirm that an IAM Role has been created and that it includes the CloudWatchAgentServerPolicy policy
Confirm that this IAM Role has been assigned to the Amazon EC2 instance that is running the CloudWatch Agent
If it is still failing, check whether there are any credentials stored locally on the instance that the Agent could be using instead of the IAM Role assigned to the instance
I'm trying to setup an ubuntu server and login with a non-default user. I've used cloud-config with the user data to setup an initial user, and packer to provision the server:
system_info:
default_user:
name: my_user
shell: /bin/bash
home: /home/my_user
sudo: ['ALL=(ALL) NOPASSWD:ALL']
Packer logs in and provisions the server as my_user, but when I launch an instance from the AMI, AWS installs the authorized_keys files under /home/ubuntu/.ssh/
Packer config:
{
"variables": {
"aws_profile": ""
},
"builders": [{
"type": "amazon-ebs",
"profile": "{{user `aws_profile`}}",
"region": "eu-west-1",
"instance_type": "c5.large",
"source_ami_filter": {
"most_recent": true,
"owners": ["099720109477"],
"filters": {
"name": "*ubuntu-xenial-16.04-amd64-server-*",
"virtualization-type": "hvm",
"root-device-type": "ebs"
}
},
"ami_name": "my_ami_{{timestamp}}",
"ssh_username": "my_user",
"user_data_file": "cloud-config"
}],
"provisioners": [{
"type": "shell",
"pause_before": "10s",
"inline": [
"echo 'run some commands'"
]}
]
}
Once the server has launched, both ubuntu and my_user users exist in /etc/passwd:
my_user:1000:1002:Ubuntu:/home/my_user:/bin/bash
ubuntu:x:1001:1003:Ubuntu:/home/ubuntu:/bin/bash
At what point does the ubuntu user get created, and is there a way to install the authorized_keys file under /home/my_user/.ssh at launch instead of ubuntu?
To persist the default user when using the AMI to launch new EC2 instances from it you have to change the value is /etc/cloud/cloud.cfg and update this part:
system_info:
default_user:
# Update this!
name: ubuntu
You can add your public keys when you create the user using cloud-init. Here is how you do it.
users:
- name: <username>
groups: [ wheel ]
sudo: [ "ALL=(ALL) NOPASSWD:ALL" ]
shell: /bin/bash
ssh-authorized-keys:
- ssh-rsa AAAAB3Nz<your public key>...
Addding additional SSH user account with cloud-init