So I have this scenario where an Amazon EC2 instance in an Auto Scaling group will be terminated. My problem is that I don’t want it terminated until it has finished whatever it’s doing.
If I hook up a Lambda function to the lifecycle hook, it would check a metric; while this metric is > 0, it needs to wait 60 seconds and check again.
I have this working already. The problem is that the instance may need longer than Lambda’s maximum timeout of 15 minutes to finish the jobs it’s processing.
If I read correctly, the lifecycle notification is only sent once, so this Lambda approach won’t work for me.
Is there any other way of doing this?
Here is how I would try to approach this problem (this needs a POC; the answer is theoretical):
Create an Auto Scaling Group
Put a lifecycle hook on this ASG like described here, sending notification to Amazon SNS
Create a launch script for instances which will do the following:
Subscribe to SNS on instance launch and start an SNS listener script
The SNS listener waits for the instance termination message, then does whatever is necessary to hold the instance until it is ready to terminate, including sending heartbeats if termination needs more than 1h, and finally completes the lifecycle hook (described here). It should also handle unsubscribing from SNS.
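The listener's termination handling could look roughly like the sketch below. It assumes boto3's Auto Scaling client and that the SNS message body has already been parsed into a dict; `is_busy` is a hypothetical callback standing in for "whatever is necessary" to decide the instance is still working, and the function name is made up for illustration.

```python
import time


def drain_then_complete(autoscaling, message, is_busy, poll_seconds=60, sleep=time.sleep):
    """Hold a terminating instance until is_busy() returns False.

    autoscaling: a boto3 Auto Scaling client (or any object with the same methods).
    message:     the parsed lifecycle notification delivered through SNS.
    is_busy:     hypothetical callback reporting whether work is still running.
    """
    if message.get("LifecycleTransition") != "autoscaling:EC2_INSTANCE_TERMINATING":
        return False  # not a termination notice (e.g. the hook's test notification)

    hook = {
        "LifecycleHookName": message["LifecycleHookName"],
        "AutoScalingGroupName": message["AutoScalingGroupName"],
        "LifecycleActionToken": message["LifecycleActionToken"],
    }
    while is_busy():
        # Each heartbeat restarts the hook's timeout, so the instance stays
        # in the Terminating:Wait state while work is still running.
        autoscaling.record_lifecycle_action_heartbeat(**hook)
        sleep(poll_seconds)

    # CONTINUE lets Auto Scaling proceed with the termination.
    autoscaling.complete_lifecycle_action(LifecycleActionResult="CONTINUE", **hook)
    return True
```

On an instance this would be called as `drain_then_complete(boto3.client("autoscaling"), parsed_message, is_busy)`. Because it runs on the instance itself, it is not bound by the Lambda timeout.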
Related
If a Lambda function has a concurrency>1, and there are several instances running, does a CloudWatch event Lambda trigger get sent to all the running instances?
The question wording is a little bit ambiguous. I will try my best to make it more clear.
If a Lambda function has a concurrency>1, and there are several instances running
I think OP is talking about reserved concurrency which is set to a value that's greater than 1. In other words, the function is not throttled by default and can run multiple instances in parallel.
does a CloudWatch event Lambda trigger get sent to all the running instances?
This part is ambiguous. @hephalump provided one interpretation in the question comment.
I have another interpretation. If you are asking whether the currently-running lambda containers will be reused after the job is done, then here is the answer:
Based on @hephalump's comment, it's now clear that one CloudWatch event will only trigger one Lambda instance to run. If multiple events come in during a short period of time, then multiple Lambda instances will be triggered to run in parallel. Back to the question: if all existing instances of that function are busy running, then no container will be reused, and a new Lambda instance will be spun up to handle the event. If one of the running instances has just finished its job, then that container, along with its execution environment, will be reused to handle the incoming event from CloudWatch.
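The reuse behaviour described above can be illustrated with a toy model. This is only a simulation of the reasoning (idle environment is reused, otherwise a new one starts), not Lambda's actual scheduler, which makes no promise about which environment it picks; the function name and the (start, duration) event format are made up for the example.

```python
def route_events(events):
    """Assign each (start, duration) event to an execution environment.

    An idle environment is reused; if all are busy, a new one is created.
    Returns the environment index used for each event, in start order.
    """
    busy_until = []   # busy_until[i] = time at which environment i becomes idle
    assignment = []
    for start, duration in sorted(events):
        for env, free_at in enumerate(busy_until):
            if free_at <= start:            # warm and idle: reuse it
                busy_until[env] = start + duration
                assignment.append(env)
                break
        else:                               # every environment is busy: cold start
            busy_until.append(start + duration)
            assignment.append(len(busy_until) - 1)
    return assignment


# Two overlapping events need two environments; the third arrives after the
# first has finished, so it reuses environment 0.
print(route_events([(0, 5), (1, 5), (6, 1)]))  # [0, 1, 0]
```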
Hope this helps.
I have an Auto Scaling group in AWS whose instances basically do the following:
Run a Python process, script.py
This script gets messages from an SQS queue to process
My Auto Scaling group is configured to start/terminate instances based on the number of available messages in the queue. But sometimes, while the machines are still processing something and the number of messages drops, the scaling policy triggers and terminates instances, so I end up losing messages in the middle of processing.
I started trying to handle signals, but that does not seem to be working.
My main goal is:
If I know that my instance will be terminated soon, I will finish my current processes (without getting any new messages) and then send an "OK" signal to AWS to shut down the instance.
Is there any way to achieve this? I'm not using load balancing because I manually get the messages from the queue.
You can use AWS Auto Scaling lifecycle hooks. They put your EC2 instance into a wait state before terminating it and deliver a notification to SNS or CloudWatch that the instance is about to be terminated; in the meantime you can finish the message you are already processing. I found an interesting blog post explaining a use case similar to yours.
AWS autoscaling lifecycle hooks
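A rough sketch of a worker loop built around such a hook is shown below. Instead of listening for the notification, it polls the instance's own lifecycle state before fetching each message, which is one possible design and not the only one. The clients are assumed to be boto3 SQS and Auto Scaling clients; `process` and the function name are hypothetical.

```python
def run_worker(sqs, autoscaling, queue_url, instance_id, hook_name, group_name, process):
    """Process SQS messages until Auto Scaling marks this instance for termination."""
    handled = 0
    while True:
        # Check our own lifecycle state before taking on new work.
        described = autoscaling.describe_auto_scaling_instances(InstanceIds=[instance_id])
        states = [i["LifecycleState"] for i in described["AutoScalingInstances"]]
        if states and states[0].startswith("Terminating"):
            break

        response = sqs.receive_message(
            QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=20
        )
        for message in response.get("Messages", []):
            process(message["Body"])  # hypothetical job handler
            # Delete only after successful processing, so a crash leaves the
            # message to reappear after its visibility timeout.
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
            handled += 1

    # Nothing is in progress at this point: tell Auto Scaling it may terminate.
    autoscaling.complete_lifecycle_action(
        LifecycleHookName=hook_name,
        AutoScalingGroupName=group_name,
        InstanceId=instance_id,
        LifecycleActionResult="CONTINUE",
    )
    return handled
```

With this design the hook's heartbeat timeout has to be longer than the slowest single job, since the state is only checked between messages.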
Cases to consider:
Given: An instance attached to an ELB with state InService
When: Its state changes from InService to OutOfService
Then: An AWS Lambda Function is invoked with the InstanceId as part of the event
Given: A fresh new instance, registered to an ELB for the first time
When: It is still in the process of starting up (and therefore not yet reporting positive in health checks)
Then: No AWS Lambda Function is invoked, since the desired state change has not occurred
Solutions I have attempted
I set up a Lambda Function to poll a hard-coded list of ELBs every minute. It was successfully invoked for every instance with state: OutOfService. However this was not desirable, since it does not support the second case in the list of cases above.
I modified the health check function which reports back to the ELB. If the health check failed, it checked the ELB instance state of the instance from which it was being called. But with this approach, too, it was difficult to establish whether the instance was still starting up.
There are further options available to explore in the second solution, but ideally I would prefer not to poll the ELB for information in order to trigger the Lambda Function. I would instead like the Lambda Function to receive an event for such a transition (either through CloudWatch, SNS or something else if available).
Any insight into options I have not yet considered?
I would agree that polling for the status is not the right answer. One option to explore would be using CloudWatch alarms to trigger your Lambda function.
I'm not 100% sure if there is a built-in metric or if you would need to create a custom metric for this, but from CloudWatch you can then trigger an SNS notification, which in turn could trigger your Lambda function.
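As a sketch of the Lambda side: Classic Load Balancers publish an UnHealthyHostCount metric in the AWS/ELB namespace, which an alarm could watch. The alarm notification names the load balancer but not the instance, so the function below looks the instances up itself. It assumes the alarm message arrives via SNS with the usual `NewStateValue` and `Trigger.Dimensions` fields and a boto3 classic ELB client; the function name is made up, and it does not by itself tell apart instances that are still starting up.

```python
import json


def out_of_service_instances(event, elb):
    """Return IDs of OutOfService instances for the load balancer named in an alarm.

    event: the Lambda event for an SNS-delivered CloudWatch alarm notification.
    elb:   a boto3 classic ELB client (boto3.client("elb")) or compatible object.
    """
    alarm = json.loads(event["Records"][0]["Sns"]["Message"])
    if alarm.get("NewStateValue") != "ALARM":
        return []  # ignore OK / INSUFFICIENT_DATA transitions

    # Alarm notifications list dimensions as {"name": ..., "value": ...} pairs.
    dimensions = {d["name"]: d["value"] for d in alarm["Trigger"]["Dimensions"]}
    health = elb.describe_instance_health(LoadBalancerName=dimensions["LoadBalancerName"])
    return [s["InstanceId"] for s in health["InstanceStates"] if s["State"] == "OutOfService"]
```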
I am trying to set up an EC2 Auto Scaling group that scales depending on how many items are in an SQS queue.
When the SQS queue has items visible I need the scaling group to have 1 instance available, and when the SQS queue is empty (i.e. there are no visible or non-visible messages) I want there to be 0 instances.
Desired instances is set to 0, min is set to 0 and max is set to 1.
I have set up CloudWatch alarms on my SQS queue: one triggers when visible messages are greater than zero, and another triggers when non-visible messages are less than one (i.e. no more work to do).
Currently the CloudWatch alarm triggers the creation of an instance, but then the scaling group automatically kills the instance to meet the desired setting. I expected the alarm to adjust the desired instance count within the min and max settings, but this seems not to be the case.
Yes, you can certainly have an Auto Scaling group with:
Minimum = 0
Maximum = 1
Alarm: When ApproximateNumberOfMessagesVisible > 0 for 1 minute, Add 1 Instance
This will cause Auto Scaling to launch an instance when there are messages waiting in the queue. It will keep trying to launch more instances, but the Maximum setting will limit it to 1 instance.
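For illustration, the scale-out alarm above could be expressed as arguments to boto3's `put_metric_alarm`. This is a sketch: the alarm name, the choice of the Maximum statistic, and the helper name are assumptions, and `policy_arn` stands for the ARN of an "add 1 instance" scaling policy created separately.

```python
def scale_out_alarm(queue_name, policy_arn):
    """Build keyword arguments for cloudwatch.put_metric_alarm()."""
    return {
        "AlarmName": f"{queue_name}-has-messages",    # hypothetical naming scheme
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",                       # assumption: any backlog counts
        "Period": 60,                                 # seconds, i.e. "for 1 minute"
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [policy_arn],                 # the "Add 1 Instance" policy
    }
```

It would be applied with `boto3.client("cloudwatch").put_metric_alarm(**scale_out_alarm("my-queue", policy_arn))`.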
Scaling-in when there are no messages is a little bit trickier.
Firstly, it can be difficult to actually know when to scale-in. If there are messages waiting to be processed, then ApproximateNumberOfMessagesVisible will be greater than zero. However, if there are no messages waiting, it doesn't necessarily mean you wish to scale-in, because messages might be currently processing ("in flight"), as indicated by ApproximateNumberOfMessagesNotVisible. So, you only want to scale-in if both of these are zero. Unfortunately, a basic CloudWatch alarm can only reference one metric, not two (CloudWatch has since added alarms on metric math expressions, which can combine the two).
Secondly, when an Amazon SQS queue is empty, it does not send metrics to Amazon CloudWatch. This sort of makes sense, because queues are mostly empty, so it would be continually sending a zero metric. However, it causes a problem that CloudWatch does not receive a metric when the queue is empty. Instead, the alarm will enter the INSUFFICIENT_DATA state.
Therefore, you could create your alarm as:
When ApproximateNumberOfMessagesVisible = 0 for 15 minutes, Remove 1 instance but set the action to trigger on INSUFFICIENT_DATA rather than ALARM
Note the suggested "15 minutes" delay to avoid thrashing instances. This is where instances are added and removed in rapid succession because messages are coming in regularly, but infrequently. Therefore, it is better to wait a while before deciding to scale-in.
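The "both must be zero" condition from the first point can also be checked directly against the queue, outside of CloudWatch, for example from a scheduled job. The sketch below assumes a boto3 SQS client; the function name is made up. Note that the queue attribute is called ApproximateNumberOfMessages, which corresponds to the ApproximateNumberOfMessagesVisible metric.

```python
def queue_is_idle(sqs, queue_url):
    """True only if nothing is waiting in the queue and nothing is in flight."""
    attrs = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=[
            "ApproximateNumberOfMessages",            # waiting (visible)
            "ApproximateNumberOfMessagesNotVisible",  # being processed (in flight)
        ],
    )["Attributes"]
    # Attribute values come back as strings.
    visible = int(attrs["ApproximateNumberOfMessages"])
    in_flight = int(attrs["ApproximateNumberOfMessagesNotVisible"])
    return visible == 0 and in_flight == 0
```

A caller could then scale in with the Auto Scaling client's `set_desired_capacity(AutoScalingGroupName=..., DesiredCapacity=0)` once the queue has been idle for long enough.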
This leaves the problem of having instances terminated while they are still processing messages. This can be avoided by taking advantage of Auto Scaling Lifecycle Hooks, which send a signal when an instance is about to be terminated, giving the application the opportunity to delay the termination until work is complete. Your application should then signal that it is ready for termination only when message processing has finished.
Bottom line
Much of the above depends upon:
How often your application receives messages
How long it takes to process a message
The cost savings involved
If your messages are infrequent and simple to process, it might be worthwhile to continuously run a t2.micro instance. At 2c/hour, the benefit of scaling-in is minor. Also, there is always the risk when adding and removing instances that you might actually pay more, because instances are charged by the hour -- running an instance for 30 minutes, terminating it, then launching another instance for 30 minutes will actually be charged as two hours.
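The arithmetic behind that warning, as a small worked example. It assumes the hourly billing granularity described above; the helper name is made up, and with finer-grained billing the rounding penalty shrinks accordingly.

```python
import math


def billed_hours(run_minutes, granularity_minutes=60):
    """Hours billed when each separate run is rounded up to the billing granularity."""
    units = sum(math.ceil(m / granularity_minutes) for m in run_minutes)
    return units * granularity_minutes / 60


print(billed_hours([30, 30]))  # two separate 30-minute runs: 2.0 hours
print(billed_hours([60]))      # one continuous hour: 1.0 hour
```

At 2c/hour, that is 4c for the two short runs against 2c for leaving the instance running for the same hour.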
Finally, you could consider using AWS Lambda instead of an Amazon EC2 instance. Lambda is ideal for short-lived code execution without requiring a server. It could totally remove the need to use Amazon EC2 instances, and you only pay while the Lambda function is actually running.
For a simple configuration with per-second billing (Amazon Linux/Ubuntu AMIs), don't worry about wasted startup/shutdown time: just terminate your EC2 instance yourself, without any ASG scale-down policy. Add a little bash script to the client startup code, or preinstall it in cron, and poll for process presence or CPU load; then terminate or shut down the instance after processing is done (termination is better if you attach volumes and need them to auto-destruct).
There is one annoying thing about an ASG defined as 0/0/1 (min/desired/max) with defaults and ApproximateNumberOfMessagesNotVisible on SQS: after the EC2 instance is fired up, it somehow switches to 1/0/1 and starts to loop, firing up EC2 instances even if there is nothing in SQS. (I'm doing video transcoding, queueing jobs to SNS/SQS and firing up ffmpeg nodes with an ASG triggered on a non-empty SQS queue.)
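The answer suggests bash; the same poll-then-terminate idea is sketched here in Python to stay consistent with the rest of the thread. It assumes a boto3 EC2 client; `worker_running` is a hypothetical callback and the function name is made up.

```python
import time


def terminate_when_idle(ec2, instance_id, worker_running, poll_seconds=30, sleep=time.sleep):
    """Poll until the worker process is gone, then terminate this instance.

    worker_running: hypothetical callback, e.g.
        lambda: subprocess.run(["pgrep", "-x", "ffmpeg"]).returncode == 0
    """
    polls = 0
    while worker_running():
        sleep(poll_seconds)
        polls += 1
    ec2.terminate_instances(InstanceIds=[instance_id])
    return polls
```

If the instance belongs to an Auto Scaling group, the Auto Scaling client's `terminate_instance_in_auto_scaling_group(InstanceId=..., ShouldDecrementDesiredCapacity=True)` may be the safer call, since it lowers the desired capacity at the same time and so should not prompt a replacement launch.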
I'm creating a bunch of EBS snapshots as part of an AWS Lambda function. I need to capture events when these snapshots complete so I can create an EC2 instance based on them.
I could use a snapshot waiter, but this polls, and sometimes snapshot creation can take a long time. I don't want Lambda to keep running for a while, plus the maximum time for Lambda seems to be five minutes. I looked at CloudWatch and AWS Config to see if I can capture snapshot events but had no luck.
There is now an event in Amazon CloudWatch Events that is emitted when snapshots are completed:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-cloud-watch-events.html
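A sketch of how a Lambda function might consume that event. The event pattern and field names (`detail-type`, `event`, `result`, `snapshot_id`) follow my reading of the linked documentation and should be checked against it; the constant and function names are made up.

```python
# Event pattern for a CloudWatch Events rule that matches completed snapshots.
SNAPSHOT_COMPLETED_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EBS Snapshot Notification"],
    "detail": {"event": ["createSnapshot"], "result": ["succeeded"]},
}


def completed_snapshot_id(event):
    """Return the snapshot ID from a successful createSnapshot event, else None."""
    if event.get("detail-type") != "EBS Snapshot Notification":
        return None
    detail = event.get("detail", {})
    if detail.get("event") != "createSnapshot" or detail.get("result") != "succeeded":
        return None
    # snapshot_id is an ARN such as arn:aws:ec2::us-west-2:snapshot/snap-01234567;
    # the plain ID is the part after the last slash.
    return detail["snapshot_id"].rsplit("/", 1)[-1]
```

The Lambda handler would call `completed_snapshot_id(event)` and launch the instance only when it returns an ID.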
You are correct -- there is no notification event that signifies completion of an EBS Snapshot. Instead, you would need to check the status until the status changes to completed.
You are also correct that AWS Lambda functions have a hard maximum run time (5 minutes at the time of writing, since raised to 15 minutes), and having a Lambda function wait on an external process is not a good architecture.
Instead, you could break-up the architecture:
Have your existing process trigger the EBS Snapshot(s) and then push a message into an SQS queue
Schedule a Lambda function (eg every 5 minutes) to check the SQS queue. If a message exists:
Retrieve details about the instance and snapshot(s) from the message
Check the status of the snapshot(s)
If the status is completed, perform the next step in the process
The down-side is that the scheduled Lambda function will trigger even when there are no messages in the queue. The Lambda function will exit very quickly (cost: 100ms).
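The scheduled check could be sketched as follows, assuming boto3 SQS and EC2 clients. The message body shape (`{"snapshot_ids": [...]}`, non-empty) and the `on_complete` callback are assumptions made for the example, as is the function name.

```python
import json


def check_pending_snapshots(sqs, ec2, queue_url, on_complete):
    """Check queued snapshot jobs once; return how many were finished."""
    response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10)
    finished = 0
    for message in response.get("Messages", []):
        job = json.loads(message["Body"])   # assumed shape: {"snapshot_ids": [...]}
        snapshots = ec2.describe_snapshots(SnapshotIds=job["snapshot_ids"])["Snapshots"]
        if all(s["State"] == "completed" for s in snapshots):
            on_complete(job)                # hypothetical next step in the process
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
            finished += 1
        # Otherwise leave the message alone: it becomes visible again after the
        # visibility timeout and is rechecked on a later scheduled run.
    return finished
```

When the queue is empty the loop body never runs, which matches the quick exit described above.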
The alternative is to run a cron script on an Amazon EC2 instance (or on any computer connected to the Internet). A t2.nano instance is about 15.6c per day, which might be more expensive than a scheduled Lambda function. If you already have an instance being used, then there would be no additional cost.