I have a Kubernetes cluster that relies on AWS EC2 spot requests.
I sometimes have this failure message from the aws auto-scaling group:
Could not launch Spot Instances. InsufficientInstanceCapacity - There is no Spot capacity available that matches your request. Launching EC2 instance failed.
I knew the downfall of using spot requests and that's not why I am here.
I'd like to track this kind of failed activity from my auto-scaling group and I did not find anything inside CloudWatch.
Is there any "legit" way of doing this?
The final aim is to have an alert where AWS does not have capacity for my instance request(s) so I can act appropriately.
I came across this question when I was looking for the same thing, and now I have found an answer!
You can detect this event by creating a Cloud Trail that logs management events for your account, and looking for an event where the EventName = RunInstances, and the ErrorCode field is populated.
I have seen this particular event come through as ErrorCode: Server.InsufficientInstanceCapacity.
There are a variety of ways to consume and alert on the Cloud Trail logs, including CloudWatch.
Related
I am using CloudWatch alarm to stop ec2 instance. For my case I am pushing log information from my ec2 instance to CloudWatch via log grope. And I filter those information via filter with specific pattern that detect error messages due to failed authentication. Unlike standard ways to stop ec2 instance using CPU utilization. I am using a custom metric (figure). Then I am configuring the action to stop the ec2 instance (figure).
But my alarm appears with state "Insufficient data" all the time. can any one help me to solve the problem and stop my ec2 instance once it is in alarm (means that the logs match the pattern of the filter), Thanks a lot!
Just looking the way to start/stop a AWS EC2 instance in case of CPU utilization increase or decrease on another EC2 instacne. I know there is service available Auto Scaling in AWS but I have a scenario where I can't take advantage of this service.
So just looking if it is possible or anyone can help me on this.
Just detailing the concern like suppose I have 2 EC2 instance on AWS account by name EC21 and EC22. By default, EC22 instance is stopped.
Now I need to setup CloudWatch or any other service to check if load/CPU utilization increase on EC21 instance by 70% then need to start EC22 server and similarly if load decrease on EC21 instance by 30% then stop EC22 server.
Please advice!
When your CloudWatch alarm is triggered, it will notify an SNS topic. You can have that SNS topic then invoke a Lambda function, which can then start your EC2 instance.
Create an AWS Lambda function that starts your EC2 instance.
Configure your SNS topic to invoke your Lambda function when it receives messages. You can read about that here: Invoking Lambda functions using Amazon SNS notifications
Finally, ensure your CloudWatch alert sends messages to the SNS topic.
Yes this is possible for certain types of EC2 instances. Check this detailed guide using which you can set up the triggers in your EC2 instances based on AWS Cloud Watch metrics.
http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/UsingAlarmActions.html
I think your problem might fit the scenario which I'm also trying to solve now - I have some functionality which cannot be solved with Lambdas because of their low lifetime, so I need a relatively short-lived EC2 instance to accomplish the task.
The solution is similar to the one described by Matt, but without SNS, using AWS triggers to launch a lambda function to start the instance. Added benefit is that the lambda function can itself verify whether the EC2 start is really needed.
How do I stop and start Amazon EC2 instances at regular intervals using AWS Lambda?
Issue
I want to reduce my Amazon Elastic Cloud Compute (Amazon EC2) usage by
stopping and starting instances at predefined times or utilization
thresholds. Can I configure AWS Lambda and Amazon CloudWatch to help
me do that automatically?
Short Description
You can use a CloudWatch Event to trigger a Lambda function to start
and stop your EC2 instances at scheduled intervals.
Source: AWS Knowledge Center
As the question states, is there any way to monitor when ECS is constantly registering and deregistering instances due to some error causing my instances to crash? I would love to be able to create an alarm or something that notifies me if this is the case.
I am not able to put comment, so here are some thoughts.
I would run ECS cluster EC2 instances under an Auto-Scaling Group and based on ASG CloudWatch metrics, setup a SNS notification when instances are being added/removed.
We can have AWS ecs-agent docker container logs also sent to CloudWatch and get some SNS notifications based on errors or filtered events.
We can have subscription to CW from ECS as well when each service tasks being started/stopped.
References -
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/cloudwatch_event_stream.html
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_cwet.html
Example event entries are in below link –
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_cwe_events.html
Reference for setting alarm based on custom metrics.
https://medium.com/#martatatiana/insufficient-data-cloudwatch-alarm-based-on-custom-metric-filter-4e41c1f82050
Please do let me know your thoughts as well :).
I have launched an ec2 instance from an ami using lambda function.
I haven't enabled detailed monitoring. Now I want to keep track of the instances each time lambda set triggered to launch an instance. I want to get an email with instance id and status of that, when an instance is launched, stops/terminates and instances which running more than 2 hours. I tried cloudwatch, but is instance specific can't get configured for a newly launched instance. I can uses SNS, but how to keep track of these?
Use AWS Cloudtrail: http://aws.amazon.com/cloudtrail/
It gives more info than you are asking for. In cloudtrail, enable SNS Notifications of API activity and set a filter to notify you only when an instance creation/start/stop/terminate etc., For instances that are running for more than 2 hours, you can explore if cloudtrail provides it or it is very easy to write a script using Boto to fetch that information.
There are some AWS partners providing a similar service. Hope this helps.
I think you will be better off using an auto scale group and then use the auto scale life cycle hook to get notifications whenever any instance is added or removed.
http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/adding-lifecycle-hooks.html
When you use the life cycle hooks, you can have an SNS notification to not only get intimidated about the new instance but also take custom actions.
You can still use the aws lambda function to continue adding and removing new instances, only that you will increase and decrease the size of the auto scaling group.
Just wondering if the AWS cloudwatch runs on the same VPC where i have all my applications are running?
Is there any chance that AWS cloudwatch might go down and we may loose the monitoring capability?
Do we need to have a monitoring mechanism to check the Cloudwatch health?
Thanks
AWS Cloudwatch isn't run on your instances. Its infrastructure is fully managed by Amazon and independent from your VPC. You can see it as a SaaS (Software as a Service).
So you don't have to worry about that. For more informations, please see: https://aws.amazon.com/cloudwatch/
Cloudwatch collects data from the host OS, where your VMs are actually running.
If the physical server had a significant issue both cloudwatch and your VM would go down but in that case the VM would get started automatically on another physical server. In such a case, recovery would be usually quite quickly.
You don't need to check Cloudwatch at all because AWS handles that but you could add alerts for things such as CPU usage on your VMs.
Because Cloudwatch doesn't run on your machines it can't know some things such as memory usage, disk space usage or others so if you need more advanced monitoring capabilities you might consider running something like collectd inside your virtual machine.
Just wondering if the AWS cloudwatch runs on the same VPC where i have all my applications are running?
If you chose to install CloudWatch Agent on your EC2 then only it runs in your EC2 and thus in the VPC your EC2 is provisioned.
CloudWatch service that publishes/maintain logs, metrics, alarms etc is managed by AWS and runs outside your VPC.
CloudWatch has a SLA of 99.9%
https://aws.amazon.com/cloudwatch/sla/
Is there any chance that AWS cloudwatch might go down and we may loose the monitoring capability?
CloudWatch like any other service can have outages and it did have some in the past but I have never seen any data getting lost, only temporarily not being available or slow to retrieve during the outage.
Do we need to have a monitoring mechanism to check the Cloudwatch health?
SLA is already 99.9% for CloudWatch Service so chances of catching a blip is very rare on your own monitoring mechanism.
If you are using CloudWatch Agent then consider checking health of agent to make sure it is in running state (you can use AWS System Manager Run command).