As the question states, is there any way to monitor when ECS is constantly registering and deregistering instances due to some error causing my instances to crash? I would love to be able to create an alarm or something that notifies me if this is the case.
I can't post a comment yet, so here are some thoughts as an answer.
I would run the ECS cluster's EC2 instances under an Auto Scaling group and, based on the ASG CloudWatch metrics, set up an SNS notification for when instances are being added or removed.
We can also have the logs of the AWS ecs-agent Docker container sent to CloudWatch and get SNS notifications based on errors or filtered events.
ECS also sends events to CloudWatch whenever a service's tasks are started or stopped, and we can subscribe to those as well.
References -
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/cloudwatch_event_stream.html
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_cwet.html
Example event entries are at the link below:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_cwe_events.html
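To make the event-subscription idea a bit more concrete, here is a minimal sketch in Python. The event pattern uses the `aws.ecs` source and the "ECS Container Instance State Change" detail type from the ECS event stream. The `is_flapping` helper, its window and its threshold are my own assumptions about what "constantly registering and deregistering" could mean; tune them to your cluster.

```python
import json
from datetime import datetime, timedelta

# Event pattern for a CloudWatch Events rule that matches ECS container
# instance state changes (instances registering, deregistering, etc.).
ECS_INSTANCE_PATTERN = {
    "source": ["aws.ecs"],
    "detail-type": ["ECS Container Instance State Change"],
}


def is_flapping(event_times, window=timedelta(minutes=10), threshold=5):
    """Return True if at least `threshold` events fall inside one sliding window.

    `event_times` is an iterable of datetime objects, e.g. the timestamps of
    the state-change events received so far. Window and threshold are
    illustrative defaults, not AWS recommendations.
    """
    times = sorted(event_times)
    start = 0
    for end, current in enumerate(times):
        # Shrink the window from the left until it spans at most `window`.
        while current - times[start] > window:
            start += 1
        if end - start + 1 >= threshold:
            return True
    return False


if __name__ == "__main__":
    # The pattern is what you would paste into the rule definition.
    print(json.dumps(ECS_INSTANCE_PATTERN, indent=2))
```

A Lambda function targeted by the rule could store the timestamps, call something like `is_flapping`, and publish to SNS when it returns True.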
Reference for setting an alarm based on custom metrics:
https://medium.com/#martatatiana/insufficient-data-cloudwatch-alarm-based-on-custom-metric-filter-4e41c1f82050
Please do let me know your thoughts as well :).
Related
I am using a CloudWatch alarm to stop an EC2 instance. In my case I push log data from the EC2 instance to CloudWatch via a log group, and a metric filter with a specific pattern detects error messages caused by failed authentication. So instead of the standard approach of stopping an EC2 instance based on CPU utilization, I am using a custom metric (figure). I then configure the alarm action to stop the EC2 instance (figure).
But my alarm stays in the "Insufficient data" state all the time. Can anyone help me solve the problem, so that my EC2 instance is stopped once the alarm fires (meaning the logs match the filter pattern)? Thanks a lot!
I have a Kubernetes cluster that relies on AWS EC2 spot requests.
I sometimes get this failure message from the AWS Auto Scaling group:
Could not launch Spot Instances. InsufficientInstanceCapacity - There is no Spot capacity available that matches your request. Launching EC2 instance failed.
I know the downsides of using spot requests, and that's not why I am here.
I'd like to track this kind of failed activity from my auto-scaling group and I did not find anything inside CloudWatch.
Is there any "legit" way of doing this?
The final aim is to have an alert where AWS does not have capacity for my instance request(s) so I can act appropriately.
I came across this question when I was looking for the same thing, and now I have found an answer!
You can detect this event by creating a CloudTrail trail that logs management events for your account, and looking for an event where eventName = RunInstances and the errorCode field is populated.
I have seen this particular event come through with errorCode: Server.InsufficientInstanceCapacity.
There are a variety of ways to consume and alert on the CloudTrail logs, including CloudWatch.
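As a rough illustration of the filtering described above, the following Python sketch picks failed RunInstances calls out of a list of CloudTrail records. It assumes the records have already been parsed from JSON into dictionaries; `eventName`, `errorCode` and `eventTime` are standard CloudTrail record fields, while the function name and the sample data are made up for the example.

```python
def failed_run_instances(records):
    """Return (eventTime, errorCode) pairs for RunInstances calls that failed.

    `records` is a list of CloudTrail records already parsed into dicts.
    A record counts as failed when its errorCode field is present and
    non-empty.
    """
    failures = []
    for record in records:
        if record.get("eventName") == "RunInstances" and record.get("errorCode"):
            failures.append((record.get("eventTime"), record["errorCode"]))
    return failures


if __name__ == "__main__":
    # Hypothetical sample data, shaped like CloudTrail records.
    sample = [
        {"eventName": "RunInstances", "eventTime": "2020-01-01T00:00:00Z",
         "errorCode": "Server.InsufficientInstanceCapacity"},
        {"eventName": "RunInstances", "eventTime": "2020-01-01T00:05:00Z"},
        {"eventName": "DescribeInstances", "eventTime": "2020-01-01T00:06:00Z",
         "errorCode": "Client.UnauthorizedOperation"},
    ]
    for when, code in failed_run_instances(sample):
        print(when, code)
```

The same condition could run inside a Lambda function subscribed to the trail's log group and publish to SNS on every match.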
I have a utility in Lambda which does automatic DNS registration via an intelligent automated process. Instances that are created with a Name tag including XXXXXX and an autodns tag set to true will have Route 53 records created and deleted when these instances are created and deleted.
Previously, I had been using an autoscaling event listener on targeted autoscaling groups, but this has the unfortunate side-effect of not catching events when autoscaling groups are initially created, as the ASG needs to be created before the subscription can be, so I'm missing instances. A workaround that I've used is to just schedule the Lambda execution every minute and have it search and apply actions, but this is severely limiting.
Is there a way for me to listen to EC2 to receive instance creation and deletion events for all EC2 instances? I have been digging around in CloudWatch and haven't found anything useful.
Yes, you can use Amazon CloudWatch Events to trigger an AWS Lambda function when an Amazon EC2 instance changes state.
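A minimal sketch of what that could look like, in Python. The event pattern matches the "EC2 Instance State-change Notification" events that EC2 emits, restricted here to the `running` and `terminated` states. The handler only decides what to do; the action names `create_record` and `delete_record` are placeholders I invented, and the Route 53 calls themselves are left out.

```python
# Event pattern for a CloudWatch Events rule that fires on EC2 state changes.
EC2_STATE_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["running", "terminated"]},
}


def lambda_handler(event, context):
    """Map an EC2 state-change event to a (hypothetical) DNS action."""
    detail = event.get("detail", {})
    instance_id = detail.get("instance-id")
    state = detail.get("state")

    if state == "running":
        action = "create_record"   # placeholder: register the DNS record here
    elif state == "terminated":
        action = "delete_record"   # placeholder: remove the DNS record here
    else:
        action = "ignore"

    return {"instance_id": instance_id, "action": action}
```

Because the rule is account-wide rather than tied to one Auto Scaling group, it also catches instances launched by groups created after the rule was set up.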
I am looking for a way to start/stop an AWS EC2 instance when CPU utilization increases or decreases on another EC2 instance. I know the Auto Scaling service is available in AWS, but I have a scenario where I can't take advantage of it.
So I am wondering whether this is possible, or whether anyone can help me with it.
To detail the problem: suppose I have two EC2 instances in my AWS account, named EC21 and EC22. By default, the EC22 instance is stopped.
Now I need to set up CloudWatch or some other service to check whether load/CPU utilization on the EC21 instance rises to 70%, and if so start the EC22 server; similarly, if load on the EC21 instance drops to 30%, stop the EC22 server.
Please advise!
When your CloudWatch alarm is triggered, it will notify an SNS topic. You can have that SNS topic then invoke a Lambda function, which can then start your EC2 instance.
Create an AWS Lambda function that starts your EC2 instance.
Configure your SNS topic to invoke your Lambda function when it receives messages. You can read about that here: Invoking Lambda functions using Amazon SNS notifications
Finally, ensure your CloudWatch alarm sends messages to the SNS topic.
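The steps above can be sketched as a small Lambda handler in Python. It assumes the SNS message is the JSON document CloudWatch sends for an alarm, which includes a `NewStateValue` field, and it uses the boto3 EC2 client's `start_instances` call. The instance ID is a placeholder, and the extra `ec2` and `instance_id` parameters are my own addition so the function can be exercised without AWS access.

```python
import json


def lambda_handler(event, context, ec2=None, instance_id="i-0123456789abcdef0"):
    """Start an EC2 instance when a CloudWatch alarm notification arrives via SNS.

    `instance_id` is a placeholder; replace it with the ID of the instance
    to start (EC22 in the question). `ec2` may be injected for testing.
    """
    if ec2 is None:
        import boto3  # imported lazily; available by default in the Lambda runtime
        ec2 = boto3.client("ec2")

    started = []
    for record in event.get("Records", []):
        # The alarm details arrive as a JSON string in the SNS message body.
        message = json.loads(record["Sns"]["Message"])
        if message.get("NewStateValue") == "ALARM":
            ec2.start_instances(InstanceIds=[instance_id])
            started.append(instance_id)
    return started
```

A second alarm for the low-CPU threshold, wired to a similar function that calls `stop_instances`, would cover the stop case.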
Yes, this is possible for certain types of EC2 instances. See this detailed guide, which shows how to set up actions on your EC2 instances based on AWS CloudWatch metrics.
http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/UsingAlarmActions.html
I think your problem might fit the scenario I'm also trying to solve now: I have some functionality that cannot be handled by Lambda functions because of their short maximum execution time, so I need a relatively short-lived EC2 instance to accomplish the task.
The solution is similar to the one described by Matt, but without SNS, using AWS triggers to launch a Lambda function that starts the instance. An added benefit is that the Lambda function can itself verify whether the EC2 start is really needed.
How do I stop and start Amazon EC2 instances at regular intervals using AWS Lambda?
Issue
I want to reduce my Amazon Elastic Compute Cloud (Amazon EC2) usage by
stopping and starting instances at predefined times or utilization
thresholds. Can I configure AWS Lambda and Amazon CloudWatch to help
me do that automatically?
Short Description
You can use a CloudWatch Event to trigger a Lambda function to start
and stop your EC2 instances at scheduled intervals.
Source: AWS Knowledge Center
I'm wanting to apply a CloudWatch alarm to stop instances which aren't being used in our pre-production environment. We often have instances being spun up, used and then left turned on which is really starting to cost us a fair amount of money.
CloudWatch alarms have a handy feature whereby we can stop an instance based on some metrics. This is awesome, and it's what I'd like to use to constantly keep an eye on the servers and let it tidy up the instances for me.
The problem with this is that it appears that the CloudWatch alarms need to be created individually against each instance. Is there a way in which I can create one alarm which would share values across all current and future instances which will be started?
ETA - Alternatively, tell me that these options are better than CloudWatch and I'll be happy at that.
AWS EC2 stop all through PowerShell/CMD tools
Add a startup script that creates the CloudWatch alarm to the base image you use to generate your VMs.
http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/CLIReference.html
I don't believe this is possible. CloudWatch seems designed to be 'very manual' or 'very automated', i.e. you can't set up one alarm which would go off if any one instance is idle; you have to set up individual alarms for each instance.
A couple of possible solutions, which are probably not what you want to hear:
Script your instance creation, and add a call to CloudWatch to create an alarm for each instance.
Run a service continually, which looks for instances, checks that there is an alarm for each instance, creates alarms for new instances, and removes alarms for instances which have been terminated.
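Either option comes down to creating one alarm per instance. The Python sketch below builds the arguments for the boto3 CloudWatch client's `put_metric_alarm` call, using the built-in EC2 stop action ARN (`arn:aws:automate:<region>:ec2:stop`). The alarm naming scheme, the 5% threshold and the one-hour evaluation window are assumptions of mine, not recommendations.

```python
def idle_stop_alarm_kwargs(instance_id, region, threshold=5.0,
                           period=300, evaluation_periods=12):
    """Build put_metric_alarm arguments that stop an idle instance.

    With the defaults, the alarm fires when average CPU stays below 5%
    for 12 consecutive 5-minute periods (one hour). All of these values
    are illustrative.
    """
    return {
        "AlarmName": f"idle-stop-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": period,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        "ComparisonOperator": "LessThanThreshold",
        # Built-in alarm action that stops the instance.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:stop"],
    }


def missing_alarms(instance_ids, existing_alarm_names):
    """Return the instance IDs that do not yet have an idle-stop alarm."""
    existing = set(existing_alarm_names)
    return [i for i in instance_ids if f"idle-stop-{i}" not in existing]
```

With a boto3 client, creating the alarm would then look like `cloudwatch.put_metric_alarm(**idle_stop_alarm_kwargs("i-0123456789abcdef0", "us-east-1"))`, run either from the instance creation script or from the periodic service for every ID returned by `missing_alarms`.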
I think what you are actually looking for would be auto-scaling:
https://aws.amazon.com/documentation/autoscaling/