Stop traffic to unhealthy instances without Auto Scaling replacing them

Using an ELB with the TCP protocol and AWS Auto Scaling, I run into the following problem when scaling out:
- three EC2 instances, each with 2,000 connections
- scaling out, because that is my specified threshold
- a new instance gets added by Auto Scaling
How can I now stop traffic from going to the three EC2 instances that have too many connections?
1. Removing an instance from the ELB will mean that it gets terminated after a maximum of 1 hour of connection draining. Bad: TCP connections will get closed.
2. Marking the EC2 instance as unhealthy using CloudWatch. Bad: Auto Scaling will detect and replace unhealthy instances.
3. Detaching the EC2 instance from the Auto Scaling group manually via the AWS CLI. Bad: detaching it from Auto Scaling will also remove it from the ELB, see 1.
The only possible solution I can see here, and I am not sure if it is feasible:
Using CloudWatch, mark the EC2 instance as unhealthy. The ELB will stop distributing traffic to it. At the same time, update the instance's health for Auto Scaling manually:
aws autoscaling set-instance-health --instance-id i-123abc45d --health-status Healthy
This should override the health status in a way that the ELB continues to ignore the EC2 instance while AWS Auto Scaling does not try to replace it. Would that work, or is there a better solution?

Related

Does AWS Auto Scaling mark a "stopped" instance as unhealthy and spin up a new instance?

I am working on building a quick start and am trying to understand the statement below from https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Stop_Start.html:
"If your instance is in an Auto Scaling group, the Amazon EC2 Auto Scaling service marks the stopped instance as unhealthy, and may terminate it and launch a replacement instance. For more information, see Health Checks for Auto Scaling Instances in the Amazon EC2 Auto Scaling User Guide."
Can someone explain which cases may lead to termination, or is it always a termination?

How to debug EC2 instances where custom health checks fail

I have an auto-scaling group with EC2 instances that implement a custom health check.
From time to time, the health check fails and instances are terminated and replaced.
The health check itself is implemented as a shell script that runs on the instances. If it detects problems, it will inform the auto scaling group via the AWS API:
aws autoscaling set-instance-health --instance-id $instance --health-status Unhealthy
The only problem is that I have no information about which check failed, besides the notification:
Cause: At 2017-06-13T09:11:47Z an instance was taken out of service in response to a user health-check
What is the recommended way to debug this type of problem? Is there a way to make AWS only stop instances rather than terminate them, so that their disk state can be inspected?
(At first I thought about enabling termination protection, but from my understanding this will not make a difference here: the Auto Scaling group will still terminate the instances when the shutdown was requested by a failing custom health check.)
Using the set-instance-health command tells Auto Scaling that the instance is unhealthy and needs to be replaced. Auto Scaling will then terminate the unhealthy instance and launch a new instance to replace it.
If you wish to perform forensic analysis on an unhealthy instance, remove it from the Auto Scaling group with the aws autoscaling detach-instances command:
Removes one or more instances from the specified Auto Scaling group. After the instances are detached, you can manage them independent of the Auto Scaling group.
If you do not specify the option to decrement the desired capacity, Auto Scaling launches instances to replace the ones that are detached.
If there is a Classic Load Balancer attached to the Auto Scaling group, the instances are deregistered from the load balancer. If there are target groups attached to the Auto Scaling group, the instances are deregistered from the target groups.
So, instead of calling set-instance-health, call detach-instances (and optionally replace it). You can then debug the instance. If you wish to send it back into service, use aws autoscaling attach-instances.
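For illustration, a minimal sketch of that detach/attach workflow with the AWS CLI; the group name my-asg and the instance ID i-0abc12345 are placeholders:
aws autoscaling detach-instances \
    --auto-scaling-group-name my-asg \
    --instance-ids i-0abc12345 \
    --no-should-decrement-desired-capacity   # keep desired capacity, so a replacement is launched
# ...inspect the detached instance, pull logs, snapshot its volumes...
aws autoscaling attach-instances \
    --auto-scaling-group-name my-asg \
    --instance-ids i-0abc12345
Note that attaching the instance back increases the group's desired capacity by one, so you may prefer to simply terminate the instance once you are done debugging.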

AWS Beanstalk, how to reboot (or terminate) automatically an instance that is not responding

I have my Beanstalk environment with a "Scaling Trigger" using "CPUUtilization" and it works well.
The problem is that I cannot combine this with a system that automatically reboots (or terminates) instances that have been considered "OutOfService" for a certain amount of time.
Under "Scaling > Scaling Trigger > Trigger measurement" there is an "UnHealthyHostCount" option, but it won't solve my problem optimally: it will keep creating new instances as long as one is unhealthy, which makes my environment grow to its limit without a real reason. Also, I cannot combine two "Trigger measurements", and I need the CPU one.
The problem becomes crucial when there is only one instance in the environment and it becomes OutOfService: the whole environment dies and the Trigger measurement is never fired.
If you use a Classic Load Balancer in your Elastic Beanstalk environment, go to EC2 -> Auto Scaling Groups and change the Health Check Type of the environment's Auto Scaling group from EC2 to ELB.
By doing this, your Elastic Beanstalk instances will be terminated once they stop responding, and a new instance will be created to replace each terminated one.
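The same change can be made with the AWS CLI; a minimal sketch, assuming the (placeholder) name of the Auto Scaling group that Elastic Beanstalk created for the environment is awseb-e-example-AWSEBAutoScalingGroup:
aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name awseb-e-example-AWSEBAutoScalingGroup \
    --health-check-type ELB \
    --health-check-grace-period 300
Keep in mind that Elastic Beanstalk manages this group, so a manual change may be overwritten on the next environment update; the .ebextensions configuration shown in a later answer makes the setting part of the environment itself.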
AWS Elastic Beanstalk uses AWS Auto Scaling to manage the creation and termination of instances, including the replacement of unhealthy instances.
AWS Auto Scaling can integrate with the ELB (load balancer), which is also automatically created by Elastic Beanstalk, for health checks. ELB has health check functionality. If the ELB detects that an instance is unhealthy, and if Auto Scaling has been configured to rely on ELB health checks (instead of the default EC2-based health checks), then Auto Scaling automatically replaces the instance that the ELB deemed unhealthy.
So all you have to do is configure the ELB health check properly (you seem to have it correctly configured already, since you mentioned that you can see the instance being marked as OutOfService), and you also have to configure the Auto Scaling Group to use the ELB health check.
For more details on this subject, including the specific steps to configure all this, check these 2 links from the official documentation:
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.healthstatus.html#using-features.healthstatus.understanding
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/environmentconfig-autoscaling-healthchecktype.html
This should solve the problem. If you have trouble with that, please add a comment with any additional info that you might have after trying this.
Cheers!
You can set up a CloudWatch alarm to reboot the unhealthy instance using the StatusCheckFailed_Instance metric.
For detailed information on each step, go through the "Adding Reboot Actions to Amazon CloudWatch Alarms" section of the AWS documentation.
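As an illustration, a sketch of such an alarm created via the CLI; the instance ID, region, and thresholds below are assumptions to adapt:
aws cloudwatch put-metric-alarm \
    --alarm-name reboot-on-instance-status-check-failure \
    --namespace AWS/EC2 \
    --metric-name StatusCheckFailed_Instance \
    --dimensions Name=InstanceId,Value=i-0abc12345 \
    --statistic Maximum \
    --period 60 \
    --evaluation-periods 3 \
    --threshold 1 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --alarm-actions arn:aws:automate:eu-west-1:ec2:reboot   # built-in EC2 reboot action for this region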
If you want Auto Scaling to replace instances whose application has stopped responding, you can use a configuration file to configure the Auto Scaling group to use Elastic Load Balancing health checks. The following example sets the group to use the load balancer's health checks, in addition to the Amazon EC2 status check, to determine an instance's health.
Example .ebextensions/autoscaling.config
Resources:
  AWSEBAutoScalingGroup:
    Type: "AWS::AutoScaling::AutoScalingGroup"
    Properties:
      HealthCheckType: ELB
      HealthCheckGracePeriod: 300
See: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/environmentconfig-autoscaling-healthchecktype.html

Will setting health check to ELB instead of EC2 ignore EC2 metrics like CPU Utilization?

If the Auto Scaling group's health check type is set to ELB, then it will automatically remove any instances that fail the ELB health checks (set via the health check URL).
As long as the configured health check properly reports that an instance is bad (which sounds like it is the case, since you say the ELB is marking the instance as unhealthy), this should work. But does this mean that other auto scaling triggers like CPU Utilization (set in Configuration -> Scaling -> Scaling Trigger) will be ignored?
The Auto Scaling group's health checks and the ELB's health checks are separate mechanisms.
The ELB checks the health of its registered EC2 instances: it continuously pings each instance on a specific port and path (for example, port 80 and index.html) at a fixed interval, say every 30 or 60 seconds.
If a registered instance is unhealthy, the ELB stops sending traffic to it, but it will not terminate or stop the instance. The ELB keeps checking the health of every instance registered with it.
If an unhealthy instance becomes healthy again, the ELB resumes sending traffic to it.
The Auto Scaling group health-checks its EC2 instances in a similar way. However, if an instance in the group goes into the stopped state, the group terminates it and launches a new instance with the same configuration.
If the Auto Scaling group is integrated with an ELB, an instance newly added to the group is also registered with the ELB.
You cannot health-check the ELB itself. You can monitor the ELB through CloudWatch and through its access logs: enable the logging feature on the ELB and provide a target S3 bucket to store the logs.
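For reference, a sketch of how such an ELB health check could be configured on a Classic Load Balancer via the CLI; the load balancer name my-classic-elb and the thresholds are placeholder assumptions:
aws elb configure-health-check \
    --load-balancer-name my-classic-elb \
    --health-check Target=HTTP:80/index.html,Interval=30,Timeout=5,UnhealthyThreshold=2,HealthyThreshold=3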

Connection between auto scaling and ELB

I have read that we can configure Auto Scaling to use CloudWatch metrics to scale a pool of EC2 instances in or out.
I'm curious to understand how the ELB gets to know that an EC2 instance has been added to or removed from the Auto Scaling group, so that it starts sending workload to the newly added instance (or stops sending workload to the instance that has been removed).
Regards,
Pascal
When you set them up, you associate the Auto Scaling group with the Elastic Load Balancer. The Auto Scaling group then registers the instances it launches with the load balancer and deregisters the instances it removes, so the load balancer always knows the current set of instances.
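As an illustration, the association can also be created with the AWS CLI; the names below (my-asg, my-classic-elb, and the target group ARN) are placeholders:
# Classic Load Balancer
aws autoscaling attach-load-balancers \
    --auto-scaling-group-name my-asg \
    --load-balancer-names my-classic-elb
# Application or Network Load Balancer (attached through a target group)
aws autoscaling attach-load-balancer-target-groups \
    --auto-scaling-group-name my-asg \
    --target-group-arns arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/my-targets/0123456789abcdef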