How to verify volume successfully created/attached in boto3? - amazon-web-services

I'm using boto3 client.create_volume and client.attach_volume APIs, but the return values are dictionaries, and the key State within the dictionary is creating for create_volume, and attaching for attach_volume. Is there any way to check if the volume is successfully created/attached within boto3?

Fortunately, boto3 has a concept called Waiters that can do the waiting for you!
See: EC2.Waiter.VolumeInUse
Polls EC2.Client.describe_volumes() every 15 seconds until a successful state is reached. An error is returned after 40 failed checks.
For those using ec2 client (ec2 = boto3.client('ec2')), you can do
ec2.get_waiter('volume_available').wait(VolumeIds=[new_volume['VolumeId']])

See describe_volumes
Pass your volume_id and describe_volumes returns the information about:
Creation State:
'State': 'creating'|'available'|'in-use'|'deleting'|'deleted'|'error'
Attachment State:
'State': 'attaching'|'attached'|'detaching'|'detached'
and lot more information about your volume.

Related

Restart EC2 instance on Website unavailability

I have a website hosted on an EC2 server. I want to monitor the website endpoint and restart the EC2 instance if the website in unavailable for a certain time frame (say 60 seconds).
What tools do I use in AWS and how do I accomplish this?
This is not a recommended approach.
Firstly, if a website is unavailable, you would probably want to investigate the cause rather than just restarting the instance. Your goal should be to run a stable system by removing root causes of problems rather than just ignoring the problem by restarting all the time.
The recommended design would be to run in a Highly Available configuration with:
The application running on at least two servers across at least two Availability Zones (in case of failure of an AZ). This is not necessarily more expensive because each server can be smaller than a single, large server.
A load balancer in front of the instances, distributing the traffic to the instances. The load balancer also performs continuous health checks and stops sending requests to servers that fail the health check
An Auto Scaling group that can terminate unhealthy instances and automatically launch replacement servers. This also works well if an Availability Zone should fail.
In this design, an unhealthy instance would be terminated (stopped and destroyed) and a new instance created with a pre-defined disk image and startup script. Alternatively, you might choose to move bad instances out of the Auto Scaling group for investigation of the problem, with a new instance being launched to take its place.
If your application requires a database, the database should be external to the instances so that all instances can connect to the database and replacing application instances does not cause any data loss.
As to the speed of noticing problems on a server, the load balancer can perform checks every few seconds. Amazon CloudWatch, on the other hand, would need at least a minute to detect problems (probably longer since metrics are calculated over a period rather than being "now" metrics).
John's approach is the correct one, but at its simplest:
Write a lambda function that can query your website and see if it is running or not and if not have that lambda function restart the instance.
Setup a cloudwatch event rule that runs on a frequency you determine to call the lambda function
I'll leave to you the work of writing the code that determines if the website is functional and restarting the server - but that is pretty straightforward. You can use python, java, node, go or .net core in your lambda function - I would think python would be the easiest in this case, but that is an opinion.
It is clear that this is not a best practice in AWS but can make some sense - e.g. you are running a small personal web server with low demand where availability is a less issue than costs.
At least that was my reason why I built automation for it.
diagram
lambda code
import json
import os
import boto3
import time
env_vars = [
'ALARM_NAME',
'REGION',
'INSTANCE_ID',
'OUTPUT_SNS_ARN'
]
ENV = {}
for env_var in env_vars:
ENV[env_var] = os.environ.get(env_var, None)
if not ENV[env_var]:
raise Exception(f"Environment variable {env_var} must be set!")
def reboot_instance(instanceID, regionName) -> "instanceID":
"""
InstanceID
instanceID - ID of instance
regionName - name of region
return InstanceID or False in case of exception
"""
ec2 = boto3.resource('ec2', region_name=regionName)
instance = ec2.Instance(instanceID)
try:
instance.stop()
time.sleep(30)
instance.stop(Force=True)
except:
pass
for i in range(180): # wait 3 minutes
instance = ec2.Instance(instanceID)
if instance.state['Code'] == 80:
break
time.sleep(1)
else:
raise Exception('Unable to stop instance')
instance.start()
return instanceID
def notify_about_reboot(instanceID, snsarn) -> True:
"""
Put SNS message about reboot to snsarn
"""
client = boto3.client('sns', region_name='us-east-1')
client.publish(TopicArn=snsarn, Message=f'EC2 instance {instanceID} was rebooted!')
return True
def lambda_handler(event, context) -> "status about reboot":
"""
event: see events/event.json
"""
print('EVENT:')
print(event)
for record in event.get('Records', None):
sns = record.get('Sns', None)
message = json.loads(sns.get('Message', None))
msgalarm = message.get('AlarmName', None)
msgstatus = message.get('NewStateValue', None)
if not all([sns,message,msgalarm,msgstatus]):
continue
if (msgalarm == ENV['ALARM_NAME']) and (msgstatus == 'ALARM'):
notify_about_reboot(reboot_instance(ENV['INSTANCE_ID'], ENV['REGION']), ENV['OUTPUT_SNS_ARN'])
return 'rebooting'
else:
return 'nothing to do'
return 'no sns record found'
I have released whole tested automation with SAM template and installation instructions also on https://github.com/koss822/misc/tree/master/Aws/route53-healthcheck-instance-reboot

what's the proper way to check if terminating AWS EC2 instance is successful using boto3?

I use the following code to terminate an aws EC2 instance. What is the proper way to check whether the termination is successful?
s = boto3.Session(profile_name='dev')
ec2 = s.resource('ec2', region_name='us-east-1')
ins = ec2.Instance(instance_id)
res = ins.terminate()
Should I check whether
res['TerminatingInstances'][0]['CurrentState']['Name']=='shutting-down'
Or ignore res and describe the instance again to check?
The best way is to use the EC2.Waiter.InstanceTerminated waiter.
It polls EC2.Client.describe_instances() every 15 seconds until a successful state is reached. An error is returned after 40 failed checks.
import boto3
client = boto3.client('ec2')
waiter = client.get_waiter('instance_terminated')
client.terminate_instances(InstanceIds=['i-0974da9ff5318c395'])
waiter.wait(InstanceIds=['i-0974da9ff5318c395'])
The program exited once the instance was in the terminating state.

Can I schedule a lambda function execution with a lambda function?

I'm looking for the ability to programmatically schedule a lambda function to run a single time with another lambda function. For example, I made a request to myFirstFunction with date and time parameters, and then at that date and time, have mySecondFunction execute. Is that possible only with stateless AWS services? I'm trying to avoid an always-on ec2 instance.
Most of the results I'm finding for scheduling a lambda functions have to do with cloudwatch and regularly scheduled events, not ad-hoc events.
This is a perfect use case for aws step functions.
Use Wait state with SecondsPath or TimestampPath to add the required delay before executing the Next State.
What you're tring to do (schedule Lambda from Lambda) it's not possible with the current AWS services.
So, in order to avoid an always-on ec2 instance, there are other options:
1) Use AWS default or custom metrics. You can use, for example, ApproximateNumberOfMessagesVisible or CPUUtilization (if your app fires a big CPU utilization when process a request). You can also create a custom metric and fire it when your instance is idle (depending on the app that's running in your instance).
The problem with this option is that you'll waste already paid minutes (AWS always charge a full-hour, no matter if you used your instance for 15 minutes).
2) A better option, in my opinion, would be to run a Lambda function once per minute to check if your instances are idle and shut them down only if they are close to the full hour.
import boto3
from datetime import datetime
def lambda_handler(event, context):
print('ManageInstances function executed.')
environments = [['instance-id-1', 'SQS-queue-url-1'], ['instance-id-2', 'SQS-queue-url-2'], ...]
ec2_client = boto3.client('ec2')
for environment in environments:
instance_id = environment[0]
queue_url = environment[1]
print 'Instance:', instance_id
print 'Queue:', queue_url
rsp = ec2_client.describe_instances(InstanceIds=[instance_id])
if rsp:
status = rsp['Reservations'][0]['Instances'][0]
if status['State']['Name'] == 'running':
current_time = datetime.now()
diff = current_time - status['LaunchTime'].replace(tzinfo=None)
total_minutes = divmod(diff.total_seconds(), 60)[0]
minutes_to_complete_hour = 60 - divmod(total_minutes, 60)[1]
print 'Started time:', status['LaunchTime']
print 'Current time:', str(current_time)
print 'Minutes passed:', total_minutes
print 'Minutes to reach a full hour:', minutes_to_complete_hour
if(minutes_to_complete_hour <= 2):
sqs_client = boto3.client('sqs')
response = sqs_client.get_queue_attributes(QueueUrl=queue_url, AttributeNames=['All'])
messages_in_flight = int(response['Attributes']['ApproximateNumberOfMessagesNotVisible'])
messages_available = int(response['Attributes']['ApproximateNumberOfMessages'])
print 'Messages in flight:', messages_in_flight
print 'Messages available:', messages_available
if(messages_in_flight + messages_available == 0):
ec2_resource = boto3.resource('ec2')
instance = ec2_resource.Instance(instance_id)
instance.stop()
print('Stopping instance.')
else:
print('Status was not running. Nothing is done.')
else:
print('Problem while describing instance.')
UPDATE - I wouldn't recommend using this approach. Things changed in when TTL deletions happen and they are not close to TTL time. The only guarantee is that the item will be deleted after the TTL. Thanks #Mentor for highlighting this.
2 months ago AWS announced DynamoDB item TTL, which allows you to insert an item and mark when you wish for it to be deleted. It will be deleted automatically when the time comes.
You can use this feature in conjunction with DynamoDB Streams to achieve your goal - your first function inserts an item to a DynamoDB table. The record TTL should be when you want the second lambda triggered. Setup a stream that triggers your second lambda. In this lambda you will identify deletion events and if that's a delete then run your logic.
Bonus point - you can use the table item as a mechanism for the first lambda to pass parameters to the second lambda.
About DynamoDB TTL:
https://aws.amazon.com/blogs/aws/new-manage-dynamodb-items-using-time-to-live-ttl/
It does depend on your use case, but the idea that you want to trigger something at a later date is a common pattern. The way I do it serverless is I have a react application that triggers an action to store a date in the future. I take the date format like 24-12-2020 and then convert it using date(), having researched that the date format mentioned is correct, so I might try 12-24-2020 and see what I get(!). When I am happy I convert it to a Unix number in javascript React I use this code:
new Date(action.data).getTime() / 1000
where action.data is the date and maybe the time for the action.
I run React in Amplify (serverless), I store that to dynamodb (serverless). I then run a Lambda function (serverless) to check my dynamodb for any dates (I actually use the Unix time for now) and compare the two Unix dates now and then (stored) which are both numbers, so the comparison is easy. This seems to me to be super easy and very reliable.
I just set the crontab on the Lambda to whatever is needed depending on the approximate frequency required, in most cases running a lambda every five minutes is pretty good, although if I was only operating this in a certain time zone for a business weekday application I would control the Lambda a little more. Lambda is free for the first 1m functions per month and running it every few minutes will cost nothing. Obviously things change, so you will need to look that up in your area.
You will never get perfect timing in this scenario. It will, however, for the vast majority of use cases be close enough according to the timing settings of the Lambda function, you could set it up to check every minute or just once per day, it all depends on your application.
Alternatively, If I wanted an instant reaction to an event I might use SMS, SQS, or Kinesis to instantly stream a message, it all depends on your use case.
I'd opt for enqueuing deferred work to SQS using message timers in myFirstFunction.
Currently, you can't use SQS as a Lambda event source, but you can either periodically schedule mySecondFunction to check the queue via scheduled CloudWatch Events (somewhat of a variant of the other options you've found) or use a CloudWatch alarm on the ApproximateNumberOfMessagesVisible to fire an SNS message to a Lambda and avoid constant polling for queues that are frequently inactive for long periods.

Hello World PipeLine with ShelCommandlActivity

I'm trying to create a simple dataFlow pipeline with a single Activity of ShellCommandActivity type. I've attached the configuration of the activity and ec2 resource.
When I execute this the Ec2Resource sits in the WAITING_ON_DEPENDENCIES state then after sometime changes to TIMEDOUT. The ShellCommandActivity is always in the CANCELED state. I see the instance launch and very quicky changes to the terminated stated.
I've specified a s3 log file url, but that never gets updated.
Can anyone give me any pointers? Also is there any guidance out there on debugging this?
Thanks!!
You are currently forcing your instance to shut down after 1 minute which gives the TIMEOUT status if it can't execute in that time. Try increasing it to 50 minutes.
Also make sure you are using an AMI that runs Amazon Linux and that you are using full absolute paths in your scripts.
S3 log files are written as:
s3://bucket/folder/

EBS volume stuck on 'creating' when using Boto API

I'm attempting to create and attach a new EBS volume to an existing instance using Boto. The Boto script is running on the instance itself.
The problem is that the status continuously returns 'creating' much of the time. (Frustratingly, not always!) The code snippet is:
volume = conn.create_volume(args.ebs_volume_size, instance.placement)
status = ''
while status != 'available':
status = conn.get_all_volumes([volume.id])[0].status
print "Volume status: %s" % status
time.sleep(4)
Most of the time, it hangs on 'creating', even though the volume is created and available (it can be seen in the management console as ready to go). Sometimes, it works fine. I must be missing something obvious... but what?
Right after you run you create_volume method, call update on the newly created volume.
volume = conn.create_volume(args.ebs_volume_size, instance.placement)
while volume.status != 'available':
time.sleep(5)
volume.update()
print volume.status