EC2 instances getting removed from Elastic Beanstalk - amazon-web-services

EB dashboard:
Removed instance [i-0c6e4cba4392d1ace] from your environment.
And if I'm on the EC2 instance, I get these messages on the console:
Broadcast message from root@ip-172-31-20-119
(unknown) at 21:20 ...
The system is going down for power off NOW!
Connection to 54.186.171.133 closed by remote host.
Connection to 54.186.171.133 closed.
Any pointers on why this is happening and how I can debug it? Are there any logs I can look at after the instance has terminated?

It is likely that the Auto Scaling group associated with your Elastic Beanstalk application decided to scale in the number of instances.
You can go to Auto Scaling in the EC2 console, find the Auto Scaling group and look at the Activity History tab to determine why it happened (e.g. due to low CPU load).
It might also be because the instance failed a Health Check, so Auto Scaling removed it.
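If you prefer the command line, the same history is available there. A minimal sketch (the Auto Scaling group name below is a placeholder; the first command lists the real names):
# list the Auto Scaling groups so you can find the one Elastic Beanstalk created for the environment
aws autoscaling describe-auto-scaling-groups --query "AutoScalingGroups[].AutoScalingGroupName"

# show recent scaling activities and the reason ("Cause") for each one
aws autoscaling describe-scaling-activities \
  --auto-scaling-group-name awseb-e-xxxxxxxxxx-stack-AWSEBAutoScalingGroup-XXXXXXXXXXXX \
  --max-items 20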

Related

AWS auto scaling adds additional node resulting in 5xx errors

I am new to AWS. I have auto scaling enabled for my Elastic Beanstalk based server. For some reason the healthd process below is almost fully consuming the CPU, causing the auto scaler to add a new instance, as I have set the scaling policy to add a new instance to the Beanstalk environment when resource utilization is > 70%.
healthd 20 0 1024648 43660 9876 S 75.7 1.1 121:16.03 ruby
I have two questions:
How do I avoid 5xx (network) errors when a new instance is added?
What is the healthd process needed for? Why is it running when I didn't start it? And how can I prevent it from draining the CPU?
Maybe the load balancer starts sending traffic to the new instance before the application is completely up on that instance, and this is why I am getting the network errors. How can I verify this is the cause, and how can I avoid it?
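One way to check that last theory (only a sketch with placeholder names, not a confirmed fix) would be to point the load balancer health check at an application endpoint so an instance only receives traffic once the app itself responds:
# point the environment health check at an application path and require more consecutive passes
# (environment name, /health path and threshold value are placeholders)
aws elasticbeanstalk update-environment --environment-name my-env \
  --option-settings '[{"Namespace":"aws:elasticbeanstalk:application","OptionName":"Application Healthcheck URL","Value":"/health"},{"Namespace":"aws:elb:healthcheck","OptionName":"HealthyThreshold","Value":"5"}]'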

Why is Elastic Beanstalk Traffic Splitting deploy strategy ignoring HTTP errors?

I am using AWS Elastic Beanstalk. In there, I selected a Traffic Splitting deploy strategy, with a 100% split (so that 100% of new instances will have the new version and have their health evaluated).
Here's how (according to their documentation) that is supposed to work:
During a traffic-splitting deployment, Elastic Beanstalk creates a new set of instances in a separate temporary Auto Scaling group. Elastic Beanstalk then instructs the load balancer to direct a certain percentage of your environment's incoming traffic to the new instances. Then, for a configured amount of time, Elastic Beanstalk tracks the health of the new set of instances. If all is well, Elastic Beanstalk shifts remaining traffic to the new instances and attaches them to the environment's original Auto Scaling group, replacing the old instances. Then Elastic Beanstalk cleans up—terminates the old instances and removes the temporary Auto Scaling group.
And more specifically:
Rolling back the deployment to the previous application version is quick and doesn't impact service to client traffic. If the new instances don't pass health checks, or if you choose to abort the deployment, Elastic Beanstalk moves traffic back to the old instances and terminates the new ones.
However, it seems silly that it only looks at my internal /health health checks, and not the overall health status of the environment from the HTTP status codes it already has information on.
I tried the following scenario:
Deploy a new version.
As soon as the "health evaluation period" begins, flood the server with error 500s (from an endpoint I made specifically for this purpose).
AWS then moves all my instances into a "degraded" and then "unhealthy" state, but seems to ignore it and carries on anyway.
See the following two log dump screenshots (they are oldest-first).
Is there any way that I can make AWS respect the HTTP status based health checks that it already performs, during a traffic split? Or am I bound to only rely on custom-developed health checks entirely?
Update 1: Even weirder, I tried making my own health checks always fail too, but it still decides to deploy the new version with the failed health check!
Update 2: I noticed that the temporary Auto Scaling group it creates while assessing health only has an "EC2" type health check, not "ELB". I think that might be the root cause. If only I could get it to use "ELB" instead.
That is interesting! I do not know whether setting the health check type to "ELB" will do the job, because we use CodeDeploy, which has far better rollback capabilities than AWS Elastic Beanstalk.
However, there is a well-documented way in the docs [1] to apply the setting you are looking for:
[...] By default, the Auto Scaling group, created for your environment uses Amazon EC2 status checks. If an instance in your environment fails an Amazon EC2 status check, Auto Scaling takes it down and replaces it.
Amazon EC2 status checks only cover an instance's health, not the health of your application, server, or any Docker containers running on the instance. If your application crashes, but the instance that it runs on is still healthy, it may be kicked out of the load balancer, but Auto Scaling won't replace it automatically. [...]
If you want Auto Scaling to replace instances whose application has stopped responding, you can use a configuration file to configure the Auto Scaling group to use Elastic Load Balancing health checks. The following example sets the group to use the load balancer's health checks, in addition to the Amazon EC2 status check, to determine an instance's health.
Example .ebextensions/autoscaling.config
Resources:
  AWSEBAutoScalingGroup:
    Type: "AWS::AutoScaling::AutoScalingGroup"
    Properties:
      HealthCheckType: ELB
      HealthCheckGracePeriod: 300
It does not mention the new traffic splitting deployment feature, though.
Thus, I cannot confirm this is the actual solution, but at least you can give it a shot.
[1] https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/environmentconfig-autoscaling-healthchecktype.html
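For what it's worth, the traffic-splitting knobs themselves live in the aws:elasticbeanstalk:trafficsplitting namespace, so you could also try a longer evaluation window together with the ELB health check type above. A sketch only, with a placeholder environment name and values:
# switch to traffic splitting with a longer health evaluation window (values are placeholders)
aws elasticbeanstalk update-environment --environment-name my-env \
  --option-settings '[{"Namespace":"aws:elasticbeanstalk:command","OptionName":"DeploymentPolicy","Value":"TrafficSplitting"},{"Namespace":"aws:elasticbeanstalk:trafficsplitting","OptionName":"NewVersionPercent","Value":"100"},{"Namespace":"aws:elasticbeanstalk:trafficsplitting","OptionName":"EvaluationTime","Value":"10"}]'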
Once upon a time I thought that the Immutable Deployment option in Elastic Beanstalk was a panacea, but it only works when a deployment involves no changes to the application's database schema.
We've now resorted to blue-green deployments. However, this only works if you control the DNS. If you are a SaaS solution and you allow customers to create a CNAME, then B/G is often a spectacular failure, as the enterprise: a) sets a very high TTL, and/or b) has internal DNS or firewalls that cache the underlying IP addresses of the ALB (which are dynamic and, of course, replaced when you swap the URLs of the blue and green environments).
Traffic splitting is written as an option in the Elastic Beanstalk documentation.
But it's not actually an option in the configuration section in the console.
This wouldn't be the first time I've seen Elastic Beanstalk's docs be out of date, so it could be that AWS has removed that feature.
Since AWS introduced CodeStar, I suspect Elastic Beanstalk is getting the cold shoulder.

ELB backend connection errors when deregister ec2 instances

I've written a custom release script to manage releases for an EC2 auto-scaling application. The process works like this (a rough AWS CLI sketch of the scaling steps follows the list):
Create an AMI based on an application git tag.
Create launch config.
Configure ASG to use new launch config.
Find current desired capacity for ASG.
Set desired capacity to 2x previous capacity.
Wait for new instances to become healthy by querying ELB.
Set desired capacity back to previous value.
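To make the question concrete, here is a rough AWS CLI sketch of the scaling part of the script (the group and load balancer names are placeholders, not my exact code):
# placeholders for the Auto Scaling group and classic ELB names
asg=my-asg
elb=my-elb

# find the current desired capacity
desired=$(aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names "$asg" \
  --query "AutoScalingGroups[0].DesiredCapacity" --output text)

# double it so the new instances come up alongside the old ones
aws autoscaling set-desired-capacity --auto-scaling-group-name "$asg" --desired-capacity $((desired * 2))

# wait until the ELB reports the expected number of InService instances, then scale back down
while [ "$(aws elb describe-instance-health --load-balancer-name "$elb" \
    --query "length(InstanceStates[?State=='InService'])" --output text)" -lt $((desired * 2)) ]; do
  sleep 15
done
aws autoscaling set-desired-capacity --auto-scaling-group-name "$asg" --desired-capacity "$desired"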
This all works fairly well, except whenever I run this, the monitoring for the ELB is showing a lot of backend connection errors.
I don't know why this would be occurring, as it should (based on my understanding) still service current connections if the "Connection draining" option is enabled for the ELB (which it is).
I thought perhaps the ASG was terminating the instances before the connections could finish, so I changed my script to first deregister the instances from the ELB, and then wait a while before changing the desired capacity at the ASG. This however didn't make any difference. As soon as the instances were deregistered from the ELB (even though they're still running and healthy) the backend connection errors occur.
It seems as though it's ignoring the connection draining option and simply dropping connections as soon as the instance has been deregistered.
This is the command I'm using to deregister the instances...
aws elb deregister-instances-from-load-balancer --load-balancer-name $elb_name --instances $old_instances
Is there some preferred method to gracefully remove the instances from the ELB before removing them from the ASG?
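A couple of things I plan to check, sketched with the same variables as above plus a placeholder $asg_name:
# confirm connection draining is actually enabled on the ELB and check its timeout
aws elb describe-load-balancer-attributes --load-balancer-name $elb_name

# after deregistering, poll this until the old instances have left the InService state
aws elb describe-instance-health --load-balancer-name $elb_name --instances $old_instances

# alternatively, move the old instances to Standby so the ASG deregisters them from the ELB for you
aws autoscaling enter-standby --auto-scaling-group-name $asg_name \
  --instance-ids $old_instances --should-decrement-desired-capacity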
Further investigation suggests that the back-end connection errors are occurring because the new instances aren't yet ready to take the full load when the old instances are removed from the ELB. They're healthy, but seem to require a bit more warming.
I'm working on tweaking the health-check settings to give the instances a bit more time before they start trying to serve requests. I may also need to change the apache2 settings to get them ready quicker.
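This is the kind of health-check tweak I mean: make new instances pass more consecutive checks over a longer window before the ELB sends them traffic. The target path and values below are placeholders I'm still experimenting with:
# require several consecutive successful checks before a new instance is marked InService
aws elb configure-health-check --load-balancer-name $elb_name \
  --health-check Target=HTTP:80/health,Interval=15,Timeout=5,HealthyThreshold=5,UnhealthyThreshold=2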

EC2 Auto scaling

I have one EC2 instance and am running a Tomcat service on it. I know how to configure auto scaling when the CPU usage goes up, but I'm not sure how to configure auto scaling to launch a new instance when my Tomcat service goes down even though the EC2 instance is up. Also, how do I configure auto scaling when the Tomcat service is hung even though the Tomcat process is up and running?
If this is not possible with EC2 auto scaling, is it possible with ELB and Beanstalk?
If you go to the Auto Scaling page in the web console and click edit, you can choose either an EC2 or an ELB health check. EC2 checks monitor instance performance characteristics. ELB health checks can be used to monitor server response: as the name implies, the auto scaling health status is controlled by the response given to the load balancer. This could be a TCP check to port 80 that just checks that the server is there, listening and responding, all the way up to a custom HTTP check against a page you define, e.g. you could use hostname/myserverstatus and at that page have a script that checks server status, database availability etc, and then returns either a success or an error. See http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/as-add-elb-healthcheck.html
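If you'd rather do it from the CLI than the console, the equivalent should be something like this (the group name and grace period are placeholders):
# switch the Auto Scaling group from EC2 to ELB health checks (placeholder group name)
aws autoscaling update-auto-scaling-group --auto-scaling-group-name my-asg \
  --health-check-type ELB --health-check-grace-period 300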
Good Luck!
There are some standard Unix tools that do that for you. Upstart will watch your server and restart it if it goes down. I don't know about it hanging. If you run on Beanstalk, you can set up a call that the load balancer will make to see if your app is responsive, and it can then message you to let you know there is a problem. You can probably set it up to reboot the box or restart the process.

Elastic Beanstalk Availability Zones & Initial Instances -- volume not preserved?

I've spent a few days just going over Elastic Beanstalk trying to identify the benefits of using it. I'm new to this service but also don't have much experience with EC2, so I'm really just trying to make sense of everything. My main objective is to have auto scaling, and Elastic Beanstalk seemed to make sense.
Last night, my existing instance suddenly terminated and a new one was spun up (automatically). Upon SSH'ing to this new instance, all my files were gone. I expected the volume to be replicated over?
I'm just not understanding what took place and why an instance was spun up and my other one terminated, or more importantly, why the new instance didn't have my files.
Here is a log of what happened:
2013-12-26 23:03:23 UTC-0800 WARN Environment health has transitioned from YELLOW to RED
2013-12-26 23:01:21 UTC-0800 WARN Environment health has transitioned from GREEN to YELLOW
2013-12-26 23:01:21 UTC-0800 WARN Elastic Load Balancer awseb-e-i-AWSEBLoa-K5TNOB5OZNKU has zero healthy instances.
2013-12-26 23:00:36 UTC-0800 INFO Removed instance 'i-c75df99a' from your environment. (Reason: Instance is in 'shutting-down' state)
2013-12-26 22:55:14 UTC-0800 INFO Adding instance 'i-4d46d010' to your environment.
2013-12-26 22:54:14 UTC-0800 INFO Added EC2 instance 'i-4d46d010' to Auto Scaling Group 'awseb-e-ikszmdzite-stack-AWSEBAutoScalingGroup-TC41QI6DT3O0'.
Is this because I have 2 availability zones? I'm really confused.
Update
When I created my Elastic Beanstalk environment, I indicated that I wanted to use multiple availability zones and selected 2 zones to use. I set a minimum of 1 instance. I feel that this is where the problem happened: I should have set the minimum to the same number of zones I selected. But I can't confirm this without further testing... Still looking for insight.
The storage on an EC2 instance is ephemeral, and is gone when that instance terminates. Rather than uploading your codebase to that specific instance manually, you should let Elastic Beanstalk do it for you. That way, your application's code base, including previous versions of it, is stored with Elastic Beanstalk and is automatically deployed to new instances when they are spun up.
For example, for a PHP application, this link explains how it can be deployed using Elastic Beanstalk:
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create_deploy_PHP_eb.html
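As a rough sketch of that workflow with the EB CLI (the application name, environment name, platform and region below are placeholders, not specific to your setup):
# from the project root: register the app with Elastic Beanstalk, then deploy the current code
eb init my-app --platform php --region us-west-2
eb deploy my-env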