CUstom Alarm used to stop ec2 instance - amazon-web-services

I am using CloudWatch alarm to stop ec2 instance. For my case I am pushing log information from my ec2 instance to CloudWatch via log grope. And I filter those information via filter with specific pattern that detect error messages due to failed authentication. Unlike standard ways to stop ec2 instance using CPU utilization. I am using a custom metric (figure). Then I am configuring the action to stop the ec2 instance (figure).
But my alarm appears with state "Insufficient data" all the time. can any one help me to solve the problem and stop my ec2 instance once it is in alarm (means that the logs match the pattern of the filter), Thanks a lot!

Related

Terminate specific ec2 instance in an autoscaling group

I've created aws cloudwatch alarm based on ASG's group metrics cpuutilization. It sends an email alert email whenever cpuutilization exceeds more than 99% for more than an hour.
Is there a way to execute an event/action that will terminate specific ec2 instances that triggered the alarm? These instances hang and has to be terminated.
I would create an additional alarm that would terminate any instance that reaches 99% cpu for an hour. This is directly supported by CloudWatch.
From Create Alarms to Stop, Terminate, Reboot, or Recover an Instance:
Using Amazon CloudWatch alarm actions, you can create alarms that automatically stop, terminate, reboot, or recover your EC2 instances. You can use the reboot and recover actions to automatically reboot those instances or recover them onto new hardware if a system impairment occurs.
See https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/UsingAlarmActions.html
I feel possible solution for this requirement is to write AWS Cli script which would run probably every 15 mins and get list of all EC2 instances running and then terminate if needed. Also, need historical info for ec2's w/c cpu is at 100% for more than 45mins

How to track InsufficientInstanceCapacity from AWS EC2?

I have a Kubernetes cluster that relies on AWS EC2 spot requests.
I sometimes have this failure message from the aws auto-scaling group:
Could not launch Spot Instances. InsufficientInstanceCapacity - There is no Spot capacity available that matches your request. Launching EC2 instance failed.
I knew the downfall of using spot requests and that's not why I am here.
I'd like to track this kind of failed activity from my auto-scaling group and I did not find anything inside CloudWatch.
Is there any "legit" way of doing this?
The final aim is to have an alert where AWS does not have capacity for my instance request(s) so I can act appropriately.
I came across this question when I was looking for the same thing, and now I have found an answer!
You can detect this event by creating a Cloud Trail that logs management events for your account, and looking for an event where the EventName = RunInstances, and the ErrorCode field is populated.
I have seen this particular event come through as ErrorCode: Server.InsufficientInstanceCapacity.
There are a variety of ways to consume and alert on the Cloud Trail logs, including CloudWatch.

monitoring aws ec2 instance ports

I have an application running in EC2 that listen to many ports, some external devices connect to those ports to send data to my application. This is fine, but my client has a requirement that i must monitoring those ports and if one of them stop listening, the instance must be terminated and a new one started.
I was reading about couldwatch, but i didn't found an alarm that i can customize like this (doing requests to ports). Is it possible to do this using cloudwatch ? i'm looking for a direction to create this monitoring, using internal aws services or develop a new solution (maybe a sheel script).
thanks!
I'm not aware of any AWS provided EC2 healthcheck monitoring system for custom checks.
You could write an AWS lambda function which sends requests to the ports on the EC2 instance you require. You can then schedule that lambda to run periodically with whatever frequency you want with Cloudwatch Events. The lambda function could publish this as a metric to cloudwatch which would then make it possible for you to use it in an alarm and thus take action when whatever threshold you deem reasonable to spin up a new replacement instance.
One part of AWS that does have basically what you are looking for built-in though is ECS. Instead of an EC2 instance, you'd have a Docker instance (running on an EC2 instance or Fargate) which can have healthchecks defined.
There are many ways to do what you are asking for.
Simplest solution: I will write a boto3/shell script to monitor the port and call TerminateInstance API or use AWS CLI to terminate the current instance. Needless to say, you need to pass AWS credentials or attach instance profile with sufficient privileges to terminate the instance.
Using Cloudwatch: Have a script to check port status and send 1 or 0 (Dimension: Count) to Cloudwatch. Set a threshold in Cloudwatch if there is consecutive 0s or NoData, then terminate the instance. Or do not send any data to Cloudwatch if the port is not available and NoData in Cloudwatch can trigger TerminateInstance. See: Cloudwatch - AddingTerminateActions

How to monitor ECS Task Cycling?

As the question states, is there any way to monitor when ECS is constantly registering and deregistering instances due to some error causing my instances to crash? I would love to be able to create an alarm or something that notifies me if this is the case.
I am not able to put comment, so here are some thoughts.
I would run ECS cluster EC2 instances under an Auto-Scaling Group and based on ASG CloudWatch metrics, setup a SNS notification when instances are being added/removed.
We can have AWS ecs-agent docker container logs also sent to CloudWatch and get some SNS notifications based on errors or filtered events.
We can have subscription to CW from ECS as well when each service tasks being started/stopped.
References -
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/cloudwatch_event_stream.html
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_cwet.html
Example event entries are in below link –
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_cwe_events.html
Reference for setting alarm based on custom metrics.
https://medium.com/#martatatiana/insufficient-data-cloudwatch-alarm-based-on-custom-metric-filter-4e41c1f82050
Please do let me know your thoughts as well :).

Using a stop alarm with a g2.2xlarge instance on Amazon's ec2 aws

While working with a g2.2xlarge spot instance, I have tried to set up an alarm that will notify me when the average CPU usage over a two hour period has dropped below 5% and will then automatically stop the instance. Here's a link to a nice article Amazon wrote up on how to use the stop/start instance feature. The AWS alarms seem to allow you to do this however after the trigger goes off I get this reply:
Dear AWS customer,
We are unable to execute the 'Stop' action on Amazon EC2 instance i-e60e21ec that you specified in the Amazon CloudWatch alarm awsec2-i-e60e21ec-Low-CPU-Utilization.
You may want to check the alarm configuration to ensure that it is compatible with your instance configuration. You can also attempt to execute the action manually.
These are some possible reasons for this failure and steps you can try to resolve it:
Incompatible action selected:
Your instance’s configuration may not be compatible with the selected action.
To execute the 'Terminate' action, your instance may have Termination Protection enabled. Disable this feature if you want to terminate your instance. Once you do that, the alarm will execute the action after the next applicable alarm state change.
To execute the 'Stop' action, your instance’s root device type must be an EBS volume. If the root device type is the instance store, select the 'Terminate' action instead. Once you do that, the alarm will execute the action after the next applicable alarm state change.
Temporary service interruption: There may have been an issue with Amazon CloudWatch or Amazon EC2. We have retried the action without success. You can try to execute the action manually, or wait for the next applicable alarm state change.
Sincerely, Amazon Web Services
Stop seems to be an option for the free micro instance but not for these other instances. When I try to change the shutdown behavior to stop in actions it says:
An error occurred while changing the shutdown behavior of this instance.
Modifying 'instanceInitiatedShutdownBehavior is not supported for spot instances.
Is there another way to get around this problem or will we have to wait until Amazon makes this feature available?
Use standard instances instead of spot instances. Spot instances allow you to bid on extra capacity within ec2. However, they may automatically shut down if the spot price exceeds your bid.
Its not really intended for an always on instance.