CloudWatch auto alarm deletion executing multiple times

I have a Python script running in an AWS Lambda function that deletes a CloudWatch alarm when an EC2 instance enters the stopped state.
elif 'source' in event and event['source'] == 'aws.ec2' and event['detail']['state'] == 'stopped':
    instanceID = event['detail']['instance-id']
    GetAlarmNamePrefix = "AutoAlarm-" + instanceID
    print(GetAlarmNamePrefix)
    for instance in instanceID:
        print("deleting alarms for instance :" + instanceID)
        AlarmNamePrefix = GetAlarmNamePrefix
        response = cloudwatch.describe_alarms(AlarmNamePrefix=AlarmNamePrefix,)
        alarm_list = []
        if 'MetricAlarms' in response:
            for alarm in response['MetricAlarms']:
                alarm_name = alarm['AlarmName']
                alarm_list.append(alarm_name)
        print(alarm_list)
        cloudwatch.delete_alarms(AlarmNames=alarm_list)
This code deletes the alarms fine, but when I look at the Lambda function's execution logs in the CloudWatch log group, I see a huge number of log events repeating the same CloudWatch alarm over and over.
Please help me fix this code.

Take a look at these lines:
instanceID = event['detail']['instance-id']
GetAlarmNamePrefix = "AutoAlarm-" + instanceID
print(GetAlarmNamePrefix)
for instance in instanceID:
    print("deleting alarms for instance :" + instanceID)
In theory, it is looping through each instance. However:
The print() statement is printing instanceID rather than instance
Nothing in the loop actually refers to instance
In fact, instanceID is a string containing a single instance ID, as can be seen when GetAlarmNamePrefix is printed. Iterating over a string in Python yields one character at a time, so the loop body runs once for every character of the instance ID, which is why the same alarms are described and deleted repeatedly.
Therefore, you can remove the for loop.
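A minimal, loop-free sketch of that branch (assuming cloudwatch is the boto3 CloudWatch client created elsewhere in your function):
instanceID = event['detail']['instance-id']
alarm_name_prefix = "AutoAlarm-" + instanceID
print("deleting alarms for instance: " + instanceID)

# One describe call is enough; the prefix already identifies the instance
response = cloudwatch.describe_alarms(AlarmNamePrefix=alarm_name_prefix)
alarm_list = [alarm['AlarmName'] for alarm in response.get('MetricAlarms', [])]
print(alarm_list)

# Only call delete_alarms if anything matched the prefix
if alarm_list:
    cloudwatch.delete_alarms(AlarmNames=alarm_list)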
It is possible that multiple events are being passed to the Lambda function. However, the section of your code that extracts the event is not shown, so I can't comment on whether that should be changed.

Related

AWS Lambda function that automatically starts a stopped instance

I am having some trouble writing this function that will automatically start a stopped instance. Relatively new to this and just playing around.
I am able to start only a single instance by hardcoding the instance ID.
I am trying to filter for any instance with a state that is stopped.
This is my code:
import json
import boto3

region = 'us-east-1'
ec2 = boto3.resource('ec2')
instances = ec2.instances.filter(
    Filters=[{'Name': 'instance-state-name', 'Values': ['stopped']}])

def lambda_handler(event, context):
    ec2 = boto3.client('ec2', region_name=region)
    ec2.start_instances(InstanceIds=instances)
This Lambda function is triggered by an event.
I am getting an "invalid type for parameter InstanceIds" error; it must be a list or tuple. I tried to loop through it but had no luck. Is there a simpler way to do this, or do you have a suggestion?
You should iterate over the instances collection to get the IDs (instance.id):
all_instance_ids = []
for instance in instances:
    all_instance_ids.append(instance.id)
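Putting it together, a sketch of the full handler (reusing the module-level region and instances defined above):
def lambda_handler(event, context):
    # Collect the ID of every stopped instance matched by the filter
    all_instance_ids = [instance.id for instance in instances]

    # start_instances expects a list of instance IDs, which we now have
    if all_instance_ids:
        ec2_client = boto3.client('ec2', region_name=region)
        ec2_client.start_instances(InstanceIds=all_instance_ids)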

How to be alerted of AWS volumes with State=Available?

How can I get an email alert when there are 1 or more EBS volumes in AWS with a state of 'Available'?
In AWS we have a team of people who manage EC2 instances. Sometimes instances are deleted and redundant volumes are left, showing as State = Available (seen here https://eu-west-1.console.aws.amazon.com/ec2/v2/home?region=eu-west-1#Volumes:sort=state).
I would like to be notified by e-mail when this happens, so I can manually review and delete them as required. A scheduled check and alert (e-mail) of once per day would be ok.
I think this should be possible via Amazon CloudWatch, but I can't see how to do it...
This is what I am using in an AWS Lambda process:
import boto3

ec2 = boto3.resource('ec2')
sns = boto3.client('sns')

def chk_vols(event, context):
    vol_array = ec2.volumes.all()
    vol_avail = []
    for v in vol_array:
        if v.state == 'available':
            vol_avail.append(v.id)
    if vol_avail:
        sns.publish(
            TopicArn='arn:aws:sns:<your region>:<your account>:<your topic>',
            Message=str(vol_avail),
            Subject='AWS Volumes Available'
        )
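For the once-per-day check, trigger the function with a CloudWatch Events rule using a rate expression such as rate(1 day). As a variation, you can also let EC2 do the filtering server-side instead of scanning every volume; a sketch using the same ec2 resource and sns client as above:
def chk_vols(event, context):
    # Ask EC2 for only the volumes whose state is 'available'
    vol_avail = [v.id for v in ec2.volumes.filter(
        Filters=[{'Name': 'status', 'Values': ['available']}])]
    if vol_avail:
        sns.publish(
            TopicArn='arn:aws:sns:<your region>:<your account>:<your topic>',
            Message=str(vol_avail),
            Subject='AWS Volumes Available'
        )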

AWS Lambda - Copy monthly snapshots to another region

I am trying to run a Lambda function on a schedule that copies all snapshots taken the day prior to another region, for DR purposes. I have a bit of code, but it does not seem to work as intended.
Symptoms:
It grabs the same snapshots multiple times and copies them
It always errors out on 2 particular snapshots. I don't know enough coding to write a log to figure out why, although these snapshots copy fine when I do it manually.
import boto3
from datetime import date, timedelta

SOURCE_REGION = 'us-east-1'
DEST_REGION = 'us-west-2'

ec2_source = boto3.client('ec2', region_name=SOURCE_REGION)
ec2_destination = boto3.client('ec2', region_name=DEST_REGION)

snaps = ec2_source.describe_snapshots(OwnerIds=['self'])['Snapshots']
yesterday = date.today() - timedelta(days=1)
yesterday_snaps = [s for s in snaps if s['StartTime'].date() == yesterday]

for yester_snap in yesterday_snaps:
    DestinationSnapshot = ec2_destination.copy_snapshot(
        SourceSnapshotId=yester_snap['SnapshotId'],
        SourceRegion=SOURCE_REGION,
        Encrypted=True,
        KmsKeyId='REMOVED FOR SECURITY',
        DryRun=False
    )
    DestinationSnapshotID = DestinationSnapshot['SnapshotId']
    ec2_destination.create_tags(
        Resources=[DestinationSnapshotID],
        Tags=yester_snap['Tags']
    )
    waiter = ec2_destination.get_waiter('snapshot_completed')
    waiter.wait(
        SnapshotIds=[DestinationSnapshotID],
        DryRun=False,
        WaiterConfig={'Delay': 10, 'MaxAttempts': 123}
    )
Debugging
You can debug by simply putting print() statements in your code.
For example:
for yester_snap in yesterday_snaps:
    print('Copying:', yester_snap['SnapshotId'])
    DestinationSnapshot = ec2_destination.copy_snapshot(...)
The logs will appear in CloudWatch Logs. You can access the logs via the Monitoring tab in the Lambda function. Make sure the Lambda function has AWSLambdaBasicExecutionRole permissions so that it can write to CloudWatch Logs.
Today/Yesterday
Be careful about your definition of yesterday. Lambda functions run in the UTC timezone and snapshot StartTime values are reported in UTC, so your concept of "today" and "yesterday" might not match what is happening.
It might be better to add a tag to snapshots after they are copied (eg 'copied'), rather than relying on dates, to figure out which ones to copy.
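A sketch of that tag-based selection, using a hypothetical 'copied' tag key:
# Select only the snapshots that do not yet carry the 'copied' tag
snaps = ec2_source.describe_snapshots(OwnerIds=['self'])['Snapshots']
to_copy = [s for s in snaps
           if not any(t['Key'] == 'copied' for t in s.get('Tags', []))]

for snap in to_copy:
    # ... copy_snapshot and create_tags in the destination, as above ...
    # then mark the source snapshot so it is never selected again
    ec2_source.create_tags(
        Resources=[snap['SnapshotId']],
        Tags=[{'Key': 'copied', 'Value': 'true'}]
    )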
CloudWatch Events rule
Rather than running this program once per day, an alternative method would be:
Create an Amazon CloudWatch Events rule that triggers on Snapshot creation:
{
    "source": ["aws.ec2"],
    "detail-type": ["EBS Snapshot Notification"],
    "detail": {
        "event": ["createSnapshot"]
    }
}
Configure the rule to trigger an AWS Lambda function
In the Lambda function, copy the Snapshot that was just created
This way, each snapshot is copied immediately after it is created, and there is no need to search for them or figure out which snapshots to copy.
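A minimal sketch of the Lambda function in the last step, assuming the event detail carries the snapshot ARN in a snapshot_id field (the format used by the EBS Snapshot Notification):
import boto3

SOURCE_REGION = 'us-east-1'
DEST_REGION = 'us-west-2'

ec2_destination = boto3.client('ec2', region_name=DEST_REGION)

def lambda_handler(event, context):
    # The ARN looks like arn:aws:ec2::us-east-1:snapshot/snap-..., so the
    # last path segment is the plain snapshot ID
    snapshot_arn = event['detail']['snapshot_id']
    snapshot_id = snapshot_arn.split('/')[-1]

    ec2_destination.copy_snapshot(
        SourceSnapshotId=snapshot_id,
        SourceRegion=SOURCE_REGION
    )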

Extracting EC2InstanceId from SNS/SQS Auto Scaling message

I'm using Python Boto3 code. When an instance is terminated by an Auto Scaling group, it notifies SNS, which publishes the message to SQS. Lambda is also triggered when SNS is notified, and it executes a boto3 script to grab the message from SQS.
I am using reference code from Sending and Receiving Messages in Amazon SQS.
Here is the code snippet:
if messages.get('Messages'):
    m = messages.get('Messages')[0]
    body = m['Body']
    print('Received and deleted message: %s' % body)
The result is:
START RequestId: 1234-xxxxxxxx Version: $LATEST
{
"Type" : "Notification",
"MessageId" : "d1234xxxxxx",
"TopicArn" : "arn:aws:sns:us-east-1:xxxxxxxxxx:AutoScale-Topic",
"Subject" : "Auto Scaling: termination for group \"ASG\"",
"Message" : "{\"Progress\":50,\"AccountId\":\"xxxxxxxxx\",\"Description\":\"Terminating EC2 instance: i-123456\",\"RequestId\":\"db-xxxxx\",\"EndTime\":\"2017-07-13T22:17:19.678Z\",\"AutoScalingGroupARN\":\"arn:aws:autoscaling:us-east-1:360695249386:autoScalingGroup:fef71649-b184xxxxxx:autoScalingGroupName/ASG\",\"ActivityId\":\"db123xx\",\"EC2InstanceId\":\"i-123456\",\"StatusCode\"\"}",
"Timestamp" : "2017-07-",
"SignatureVersion" : "1",
"Signature" : "",
"SigningCertURL" : "https://sns.us-east-1.amazonaws.com/..",
"UnsubscribeURL" : "https://sns.us-east-1.amazonaws.com/
}
I only need the EC2InstanceId of the terminated instance, not the whole message. How can I extract the ID?
If your goal is to execute an AWS Lambda function (having the EC2 Instance ID as a parameter), there is no need to also publish the message to an Amazon SQS queue. In fact, this would be unreliable because you cannot guarantee that the message being retrieved from the SQS queue matches the invocation of your Lambda function.
Fortunately, when Auto Scaling sends an event to SNS and SNS then triggers a Lambda function, SNS passes the necessary information directly to the Lambda function.
Start your Lambda function with this code (or similar):
import json

def lambda_handler(event, context):
    # Dump the event to the log, for debugging purposes
    print("Received event: " + json.dumps(event, indent=2))

    # Extract the EC2 instance ID from the Auto Scaling event notification
    message = event['Records'][0]['Sns']['Message']
    autoscalingInfo = json.loads(message)
    ec2InstanceId = autoscalingInfo['EC2InstanceId']
Your code then has the EC2 Instance ID, without having to use Amazon SQS.
The instance ID is in the Message field. Both the SQS message body and the Message field inside it are raw JSON text, so you can parse them with the json package and get the information.
import json

if messages.get('Messages'):
    m = messages.get('Messages')[0]
    body = json.loads(m['Body'])  # the SQS body is the SNS envelope, as JSON text
    notification_message = json.loads(body['Message'])  # the inner Message is JSON too
    print('instance id is: %s' % notification_message['EC2InstanceId'])

Trying to disable all the Cloud Watch alarms in one shot

My organization is planning a maintenance window for the next 5 hours. During that time, I do not want CloudWatch to trigger alarms and send notifications.
Earlier, when I had to disable 4 alarms, I wrote the following code in AWS Lambda. This worked fine.
import boto3
import collections

client = boto3.client('cloudwatch')

def lambda_handler(event, context):
    response = client.disable_alarm_actions(
        AlarmNames=[
            'CRITICAL - StatusCheckFailed for Instance 456',
            'CRITICAL - StatusCheckFailed for Instance 345',
            'CRITICAL - StatusCheckFailed for Instance 234',
            'CRITICAL - StatusCheckFailed for Instance 123'
        ]
    )
But now I have been asked to disable all the alarms, which are 361 in number, so listing all those names by hand would take a lot of time. What should I do?
Use describe_alarms() to obtain a list of them, then iterate through and disable them:
import boto3

client = boto3.client('cloudwatch')

response = client.describe_alarms()
names = [alarm['AlarmName'] for alarm in response['MetricAlarms']]
disable_response = client.disable_alarm_actions(AlarmNames=names)
You might want some logic around the alarm name to only disable particular alarms.
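Note that with 361 alarms, a single describe_alarms() call will not return everything, because the results are paginated, and disable_alarm_actions() accepts a limited batch of names per call (100 at the time of writing). A sketch that handles both limits with boto3's paginator:
import boto3

client = boto3.client('cloudwatch')

# Walk every page of results to collect all alarm names
paginator = client.get_paginator('describe_alarms')
names = []
for page in paginator.paginate():
    names.extend(alarm['AlarmName'] for alarm in page['MetricAlarms'])

# Disable the alarm actions in batches of 100
for i in range(0, len(names), 100):
    client.disable_alarm_actions(AlarmNames=names[i:i + 100])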
If you do not have the specific alarm ARNs, you can use the logic in the previous answer. If you have a specific list of ARNs that you want to disable, you can fetch the names using this:
def get_alarm_names(alarm_arns):
    names = []
    response = client.describe_alarms()
    for i in response['MetricAlarms']:
        if i['AlarmArn'] in alarm_arns:
            names.append(i['AlarmName'])
    return names
Here's a full tutorial: https://medium.com/geekculture/terraform-structure-for-enabling-disabling-alarms-in-batches-5c4f165a8db7