I have 2 machines running under an Elastic Beanstalk environment.
One of them has been down since the last deployment.
I was hoping that the auto scaling configuration would launch a new machine, since only a single machine is left available.
That didn't happen, and I'm trying to figure out what's wrong with my auto scaling configuration:
The first thing I see is that your rules contradict each other. They say that if the number of unhealthy hosts is above 0, add a single host, and if it is below 2, remove a single host. That may explain why you aren't seeing anything happen with your trigger.
Scaling triggers are used to add or remove EC2 instances in your Auto Scaling group. This is useful for bringing in additional instances to maintain the same amount of computational power for your application while you investigate what caused the bad instance to fail, but it will not replace that instance.
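If you do want a trigger on unhealthy hosts without the two rules fighting each other, the thresholds can be adjusted through the aws:autoscaling:trigger namespace. A minimal sketch with the AWS CLI, where the environment name my-env is a placeholder and the values are just one non-contradictory choice (scale out by one instance when any host is unhealthy, and never scale in on this trigger):

aws elasticbeanstalk update-environment \
    --environment-name my-env \
    --option-settings \
        Namespace=aws:autoscaling:trigger,OptionName=MeasureName,Value=UnHealthyHostCount \
        Namespace=aws:autoscaling:trigger,OptionName=Statistic,Value=Maximum \
        Namespace=aws:autoscaling:trigger,OptionName=Unit,Value=Count \
        Namespace=aws:autoscaling:trigger,OptionName=UpperThreshold,Value=0 \
        Namespace=aws:autoscaling:trigger,OptionName=UpperBreachScaleIncrement,Value=1 \
        Namespace=aws:autoscaling:trigger,OptionName=LowerThreshold,Value=0 \
        Namespace=aws:autoscaling:trigger,OptionName=LowerBreachScaleIncrement,Value=0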
To set up your instances to terminate after a certain period of being unhealthy, you can follow the documentation here.
By default the ELB pings port 80 with TCP; this, together with the on-host EC2 instance status check, is what determines the "health" of the EC2 instance. You can specify an Application health check URL to set up a customized health check that your application responds to. Check out the more detailed customization of Beanstalk ELBs here.
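If you go that route, the health check URL can also be set without the console. A minimal sketch with the AWS CLI, where my-env and the /health path are placeholders for your environment name and whatever path your application actually serves:

# Point the ELB health check at an application URL instead of the default TCP ping on port 80
aws elasticbeanstalk update-environment \
    --environment-name my-env \
    --option-settings "Namespace=aws:elasticbeanstalk:application,OptionName=Application Healthcheck URL,Value=/health"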
Related
I have a backend server deployed on AWS in a single EC2 instance via Elastic Beanstalk. The server has IP whitelisting and hence does not respond to ALB health checks, so all target groups always remain unhealthy.
According to the official AWS docs on health checks,
If a target group contains only unhealthy registered targets, the load balancer nodes route requests across its unhealthy targets.
This is what keeps my application running even though the ALB target groups are always unhealthy.
This changed last night and I faced an outage where all requests started getting rejected with 503s, for reasons I'm not able to figure out. I was able to get things working again by provisioning another EC2 instance, by increasing the minimum capacity of the Elastic Beanstalk environment.
During the window of the outage, CloudWatch shows neither healthy nor unhealthy instances, even though nothing actually changed: there had been a single EC2 instance running, untouched, for the past few months.
In that gap, I can find metrics on TCP connections though:
I don't really understand what happened here. Can someone explain what happened, or how to debug this?
I need help in order to achieve Blue-Green Deployment.
What I have in my bucket -
One Blue environment hosted on Elastic Beanstalk.
One Green environment hosted on Elastic Beanstalk.
Both environments are created by a CloudFormation template, and both have their own ELB.
What I am looking for -
I need to switch traffic from Blue to Green.
First I need to know which environment is currently live, so that I can plan my app deployment to the other environment.
Once I know my current environment (Blue in this case), I deploy my app to the Green environment, and that environment is then ready to accept traffic.
I need to shift 25% of the traffic to Green and do a health check. If the health check is okay, I will shift another 25% and do a health check, and so on.
At any point, if the health check fails, I should be able to route the entire traffic back to the Blue environment.
I need to implement this solution in my CI/CD job. My CI job creates my package and deploys it to S3. My CD job provisions the infrastructure (Elastic Beanstalk) and uploads the package to the newly created environment.
You can't control deployment on AWS Elastic Beanstalk like that, since blue/green there involves having two live environments and doing a CNAME swap. Not exactly what you're trying to achieve, but something close to it, called immutable deployments, is available out of the box.
From the documentation:
To perform an immutable environment update, Elastic Beanstalk creates a second, temporary Auto Scaling group behind your environment's load balancer to contain the new instances. First, Elastic Beanstalk launches a single instance with the new configuration in the new group. This instance serves traffic alongside all of the instances in the original Auto Scaling group that are running the previous configuration.
When the first instance passes health checks, Elastic Beanstalk launches additional instances with the new configuration, matching the number of instances running in the original Auto Scaling group. When all of the new instances pass health checks, Elastic Beanstalk transfers them to the original Auto Scaling group, and terminates the temporary Auto Scaling group and old instances.
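If the all-at-once CNAME swap mentioned above is acceptable instead of a gradual 25% shift, it can be scripted in the CD job. A minimal sketch with the AWS CLI, where my-app, my-app-blue and my-app-green are placeholders for your application and environment names:

# Check which environment currently serves the production CNAME and whether it is healthy
aws elasticbeanstalk describe-environments \
    --application-name my-app \
    --query "Environments[].{Name:EnvironmentName,CNAME:CNAME,Health:Health}"

# Swap the CNAMEs so traffic moves from the Blue environment to the Green one
aws elasticbeanstalk swap-environment-cnames \
    --source-environment-name my-app-blue \
    --destination-environment-name my-app-green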
I have my Beanstalk environment with a "Scaling Trigger" using "CPUUtilization", and it works well.
The problem is that I cannot combine this with a system that automatically reboots (or terminates) instances that have been considered "OutOfService" for a certain amount of time.
Under "Scaling > Scaling Trigger > Trigger measurement" there is the option "UnHealthyHostCount". But this won't solve my problem optimally, because it will keep creating new instances as long as one is unhealthy, which will make my environment grow to its limit without a real reason. Also, I cannot combine two "Trigger measurements", and I need the CPU one.
The problem becomes crucial when there is only one instance in the environment and it becomes OutOfService. The whole environment dies, and the Trigger measurement is never fired.
If you use a Classic Load Balancer in your Elastic Beanstalk environment:
Go to EC2 -> Auto Scaling Groups.
Then change the Health Check Type of the Auto Scaling group from EC2 to ELB.
By doing this, the instances of your Elastic Beanstalk environment will be terminated once they stop responding, and a new instance will be created to replace each terminated one.
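The same change can be made from the command line. A minimal sketch with the AWS CLI, where the group name is a placeholder (Beanstalk generates a name of the form awseb-e-...-AWSEBAutoScalingGroup-..., so look it up first):

# Find the Auto Scaling group that Elastic Beanstalk created for the environment
aws autoscaling describe-auto-scaling-groups \
    --query "AutoScalingGroups[].AutoScalingGroupName"

# Switch the group from EC2 status checks to ELB health checks (group name is a placeholder)
aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name awseb-e-xxxxxxxxxx-stack-AWSEBAutoScalingGroup-XXXXXXXXXX \
    --health-check-type ELB \
    --health-check-grace-period 300

Keep in mind that a change made directly on the group can be reverted the next time Elastic Beanstalk updates the environment, which is why the .ebextensions approach shown in a later answer is the more durable option.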
AWS Elastic Beanstalk uses AWS Auto Scaling to manage the creation and termination of instances, including the replacement of unhealthy instances.
AWS Auto Scaling can integrate with the ELB (load balancer), which is also created automatically by Elastic Beanstalk, for health checks. ELB has health check functionality: if the ELB detects that an instance is unhealthy, and if Auto Scaling has been configured to rely on ELB health checks (instead of the default EC2-based health checks), then Auto Scaling automatically replaces the instance that ELB deemed unhealthy.
So all you have to do is configure the ELB health check properly (you seem to have it correctly configured already, since you mentioned that you can see the instance being marked as OutOfService), and you also have to configure the Auto Scaling Group to use the ELB health check.
For more details on this subject, including the specific steps to configure all this, check these 2 links from the official documentation:
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.healthstatus.html#using-features.healthstatus.understanding
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/environmentconfig-autoscaling-healthchecktype.html
This should solve the problem. If you have trouble with that, please add a comment with any additional info that you might have after trying this.
Cheers!
You can set up a CloudWatch alarm to reboot the unhealthy instance using the StatusCheckFailed_Instance metric.
For detailed information on each step, go through the Adding Reboot Actions to Amazon CloudWatch Alarms section in the AWS documentation.
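A minimal sketch of such an alarm with the AWS CLI; the instance ID and region are placeholders, and the reboot action ARN follows the arn:aws:automate:<region>:ec2:reboot form described in that documentation:

# Reboot the instance when its instance status check has been failing for 5 consecutive minutes
aws cloudwatch put-metric-alarm \
    --alarm-name reboot-on-status-check-failure \
    --namespace AWS/EC2 \
    --metric-name StatusCheckFailed_Instance \
    --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
    --statistic Maximum \
    --period 60 \
    --evaluation-periods 5 \
    --threshold 1 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --alarm-actions arn:aws:automate:us-east-1:ec2:reboot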
If you want Auto Scaling to replace instances whose application has stopped responding, you can use a configuration file to configure the Auto Scaling group to use Elastic Load Balancing health checks. The following example sets the group to use the load balancer's health checks, in addition to the Amazon EC2 status check, to determine an instance's health.
Example .ebextensions/autoscaling.config
Resources:
  AWSEBAutoScalingGroup:
    Type: "AWS::AutoScaling::AutoScalingGroup"
    Properties:
      HealthCheckType: ELB
      HealthCheckGracePeriod: 300
See: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/environmentconfig-autoscaling-healthchecktype.html
With an ELB setup there is a health check timeout, e.g. take a server out of the LB if it fails X health checks.
For a real zero-downtime deployment, I actually want to be able to avoid these extra 4-5 seconds of downtime.
Is there a simple way to do that on the ops side, or does this need to be handled at the level of the web server itself?
If you're doing continuous deployment, you should deregister the instance you're deploying to from the ELB (say, with aws elb deregister-instances-from-load-balancer), wait for the current connections to drain, deploy your app, and then register the instance with the ELB again.
http://docs.aws.amazon.com/cli/latest/reference/elb/deregister-instances-from-load-balancer.html
http://docs.aws.amazon.com/cli/latest/reference/elb/register-instances-with-load-balancer.html
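A minimal sketch of that sequence, where my-load-balancer, the instance ID and deploy_app.sh are placeholders for your own load balancer name, instance and deploy step:

# Take the instance out of rotation so the ELB stops sending it new requests
aws elb deregister-instances-from-load-balancer \
    --load-balancer-name my-load-balancer \
    --instances i-0123456789abcdef0

# Give in-flight connections time to drain (adjust to your connection draining setting)
sleep 30

# Deploy the new version to the instance (placeholder for your own deploy step)
./deploy_app.sh i-0123456789abcdef0

# Put the instance back into rotation; the ELB re-adds it once it passes health checks
aws elb register-instances-with-load-balancer \
    --load-balancer-name my-load-balancer \
    --instances i-0123456789abcdef0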
It is also a common strategy to deploy to another Auto Scaling group, then just switch the ASG on the load balancer.
We are using CodeDeploy to load code onto our instances as they boot up. Our intention was that they would not be added to the LB before the code was loaded. To do this, we set a health check that looks for one of the files being deployed. What we have found is that sometimes instances without code are created (I assume CodeDeploy failed), and these instances stay in the LB even when marked unhealthy. How is this possible? Is it related to the grace period? Shouldn't unhealthy instances be removed automatically?
I believe I have found a large part of my problem: my Auto Scaling group was set to use EC2 health checks and not my ELB health check. This resulted in the instance not being terminated. Traffic may also have continued to flow to this crippled instance for longer, because a very long unhealthy state is required before traffic is completely stopped.