What's the difference between ELB health check and EC2 health check?

I'm a little confused about Elastic Load Balancer health check and Amazon EC2 health check.
In Adding Health Checks to Your Auto Scaling Group it says:
If you have attached one or more load balancers to your Auto Scaling group and an instance fails the load balancer health checks, Auto Scaling does not replace the instance by default.
If you enable load balancer health checks and an instance fails the health checks, Auto Scaling considers the instance unhealthy and replaces it.
So if I don't enable ELB health checks, EC2 health checks will still work: if an instance fails a health check, Auto Scaling will consider it unhealthy and replace it. And if I enable ELB health checks, the same thing happens. So what's the difference between ELB health checks and EC2 health checks?

The EC2 health check watches instance availability from the hypervisor and networking point of view. For example, in case of a hardware problem the check will fail. Also, if an instance is misconfigured and doesn't respond to network requests, it will be marked as faulty.
The ELB health check verifies that a specified TCP port on an instance is accepting connections OR that a specified web page returns a 2xx code. ELB health checks are thus a little smarter: they verify that the actual application works, not merely that the instance is up.
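For illustration, a minimal boto3 sketch of configuring a Classic ELB health check; the load balancer name and the /health path are hypothetical:

import boto3

elb = boto3.client('elb')

# Probe HTTP:80/health every 30 seconds: 2 consecutive failures mark the
# instance OutOfService, 3 consecutive successes bring it back InService.
elb.configure_health_check(
    LoadBalancerName='my-load-balancer',  # hypothetical name
    HealthCheck={
        'Target': 'HTTP:80/health',       # or 'TCP:80' for a plain port check
        'Interval': 30,
        'Timeout': 5,
        'UnhealthyThreshold': 2,
        'HealthyThreshold': 3,
    },
)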
That being said, there is a third check type: the custom health check. If your application can't be checked by a simple HTTP request and requires advanced test logic, you can implement a custom check in your code and set the instance health through the API:
Health Checks for Auto Scaling Instances
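A minimal sketch of that approach with boto3, assuming your own code has already decided the instance is bad (the instance ID is a placeholder):

import boto3

autoscaling = boto3.client('autoscaling')

# Report the result of a custom application-level test back to Auto Scaling.
# 'Unhealthy' causes the group to replace the instance.
autoscaling.set_instance_health(
    InstanceId='i-0123456789abcdef0',  # placeholder instance ID
    HealthStatus='Unhealthy',
    ShouldRespectGracePeriod=True,
)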

Related

Does an Application Load Balancer do automatic health checks on an unhealthy instance?

We have a private EC2 Linux instance running behind an ALB. There is only one instance running and no auto-scaling configured.
Sometimes the ALB marks the instance as unhealthy for some reason. This mostly happens when network traffic on the instance is high, which generally lasts one or two hours. This behavior is unpredictable. So when we try to access the web application deployed on the EC2 instance, we get a 502 Bad Gateway. We reboot the EC2 instance, and only then is the issue resolved.
Does an ALB perform a health check on a target group again after it marks it as unhealthy? Suppose an ALB marks a target group with one EC2 instance as unhealthy, and the ALB is configured to perform a health check every 30 seconds. Will it check the same target group for health again 30 seconds after marking it unhealthy, or will it only look for a new healthy instance?
I assume an auto-scaling configuration may resolve this problem, by setting an AS group of 1 to replace the instance when it goes unhealthy? Our AWS architect feels that Tomcat is creating a memory leak when too many requests come in at a time. Tomcat does not run in the EC2 instance.
What is the way to troubleshoot this problem? I searched the system logs and configured ALB access logs, but no clue is available.
In this link I see that the ALB routes requests to the unhealthy targets when no other healthy target is available:
https://docs.aws.amazon.com/elasticloadbalancing/latest/application/target-group-health-checks.html
My question is will ALB perform health check on the target group again after it marks it as unhealthy?
Indeed, even after a target is marked unhealthy, the ALB continues health checking it. You can configure a 'healthy threshold count', which indicates how many consecutive 'healthy' responses must be received before an unhealthy host is marked as healthy again.
According to the docs:
When the health checks exceed HealthyThresholdCount consecutive successes, the load balancer puts the target back in service.
If your health check interval is 60 seconds, and the healthy threshold count is 3, it takes a minimum of 3 minutes before an unhealthy host will be marked healthy again.
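As an illustration, a boto3 sketch with exactly those settings, plus a loop to watch the state flip back (the target group ARN is a placeholder):

import boto3

elbv2 = boto3.client('elbv2')
tg_arn = 'arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/0123456789abcdef'  # placeholder

# 60-second interval with a healthy threshold of 3: an unhealthy target
# needs at least 3 minutes of consecutive successes to recover.
elbv2.modify_target_group(
    TargetGroupArn=tg_arn,
    HealthCheckIntervalSeconds=60,
    HealthyThresholdCount=3,
)

# The ALB keeps checking unhealthy targets, so the state will move from
# 'unhealthy' back to 'healthy' once the checks start succeeding.
for t in elbv2.describe_target_health(TargetGroupArn=tg_arn)['TargetHealthDescriptions']:
    print(t['Target']['Id'], t['TargetHealth']['State'])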

Amazon ECS: Target Group Health Check vs Container Health Check

Amazon ECS supports two different types of health checks:
Target Group health checks make a configurable network request
Container health checks run in the docker container and can be configured to run any shell command that the container supports
If both health checks are configured, which one wins? If either fails is the Service marked as UNHEALTHY? Or both? Can I configure one to override the other?
I'd very much like the Target Group health status to not cause ECS to continually bounce the service, and I was hoping the container health check could be used to override it.
The AWS documentation is somewhat vague on this topic, but it does suggest a high degree of coupling between the ALB and ECS when it comes to health checks; see, for example, the documentation for healthCheckGracePeriodSeconds and minimumHealthyPercent for ECS health check behaviour that is influenced by the presence or absence of a load balancer.
The healthCheckGracePeriodSeconds setting may be useful to prevent a failed ALB health check from causing the ECS container to be restarted (during service startup, at least):
The period of time, in seconds, that the Amazon ECS service scheduler should ignore unhealthy Elastic Load Balancing target health checks, container health checks, and Route 53 health checks after a task enters a RUNNING state. This is only valid if your service is configured to use a load balancer. If your service has a load balancer defined and you do not specify a health check grace period value, the default value of 0 is used.
If your service's tasks take a while to start and respond to health checks, you can specify a health check grace period of up to 2,147,483,647 seconds during which the ECS service scheduler ignores the health check status. This grace period can prevent the ECS service scheduler from marking tasks as unhealthy and stopping them before they have time to come up.
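For instance, a minimal boto3 sketch of setting that grace period on an existing service (cluster and service names are hypothetical):

import boto3

ecs = boto3.client('ecs')

# Give tasks 5 minutes after reaching RUNNING before ELB or container
# health check failures can cause the scheduler to stop them.
ecs.update_service(
    cluster='my-cluster',    # hypothetical cluster name
    service='my-service',    # hypothetical service name
    healthCheckGracePeriodSeconds=300,
)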
In my experience, either one will cause your container to be decommissioned. I would say you probably don't need the container health check if you have a target group performing the check.
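If you do want the container-level check anyway, it lives in the task definition. A hypothetical Fargate sketch with boto3; the family, image, and curl-based command are assumptions, and the image must actually ship curl for this command to work:

import boto3

ecs = boto3.client('ecs')

ecs.register_task_definition(
    family='web',  # hypothetical task family
    networkMode='awsvpc',
    requiresCompatibilities=['FARGATE'],
    cpu='256',
    memory='512',
    containerDefinitions=[{
        'name': 'web',
        'image': 'nginx:latest',
        'essential': True,
        'portMappings': [{'containerPort': 80}],
        # Container health check: any shell command the image supports.
        'healthCheck': {
            'command': ['CMD-SHELL', 'curl -f http://localhost/ || exit 1'],
            'interval': 30,
            'timeout': 5,
            'retries': 3,
            'startPeriod': 60,  # seconds to wait before counting failures
        },
    }],
)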

Random failure of NLB health checks while registering ECS Fargate instances

I'm seeing random failures of NLB health checks when registering ECS Fargate instances; the health checks pass after a couple of failures. I have a wide-open SG attached to the Fargate instances. Has anyone seen similar behaviour while registering tasks under NLB target groups?
Your application can take some time before it starts responding to the health checks from the ELB.
When you create an ECS service, there is an option called healthCheckGracePeriodSeconds.
It governs for how many seconds the ECS scheduler will ignore health check information from the ELB. This option is only available if you use an ELB.
So I recommend you play with it and pick a time frame suitable for your application.
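As a sketch of where that option goes, here is a hedged boto3 example of creating a Fargate service behind a target group; every name, subnet, and ARN is a placeholder:

import boto3

ecs = boto3.client('ecs')

# The grace period only applies when the service is attached to a load
# balancer; here the scheduler ignores NLB health check results for the
# first 120 seconds after each task reaches RUNNING.
ecs.create_service(
    cluster='my-cluster',
    serviceName='my-service',
    taskDefinition='web:1',
    desiredCount=2,
    launchType='FARGATE',
    networkConfiguration={'awsvpcConfiguration': {
        'subnets': ['subnet-0123456789abcdef0'],
        'securityGroups': ['sg-0123456789abcdef0'],
        'assignPublicIp': 'ENABLED',
    }},
    loadBalancers=[{
        'targetGroupArn': 'arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/0123456789abcdef',
        'containerName': 'web',
        'containerPort': 80,
    }],
    healthCheckGracePeriodSeconds=120,
)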

AWS Beanstalk: how to automatically reboot (or terminate) an instance that is not responding

I have my Beanstalk environment with a "Scaling Trigger" using "CPUUtilization" and it works well.
The problem is that I cannot combine this with a system that automatically reboots (or terminates) instances that have been considered "OutOfService" for a certain amount of time.
Under "Scaling > Scaling Trigger > Trigger measurement" there is the option "UnHealthyHostCount". But this won't solve my problem optimally, because it will keep creating new instances as long as there is an unhealthy one, causing my environment to grow to its limit without a real reason. Also, I cannot combine two "Trigger measurements", and I need the CPU one.
The problem becomes crucial when there is only one instance in the environment and it becomes OutOfService: the whole environment dies, and the trigger measurement is never fired.
If you use a Classic Load Balancer in your Elastic Beanstalk environment, you can go to EC2 -> Auto Scaling Groups and change the Health Check Type of the Auto Scaling group from EC2 to ELB.
By doing this, the instances of your Elastic Beanstalk environment will be terminated once they stop responding, and new instances will be created to replace the terminated ones.
AWS Elastic Beanstalk uses AWS Auto Scaling to manage the creation and termination of instances, including the replacement of unhealthy instances.
AWS Auto Scaling can integrate with the ELB (load balancer), also automatically created by Elastic Beanstalk, for health checks. ELB has a health check functionality. If the ELB detects that an instance is unhealthy, and if Auto Scaling has been configured to rely on ELB health checks (instead of the default EC2-based health checks), then Auto Scaling automatically replaces that instance that was deemed unhealthy by ELB.
So all you have to do is configure the ELB health check properly (you seem to have it correctly configured already, since you mentioned that you can see the instance being marked as OutOfService), and you also have to configure the Auto Scaling Group to use the ELB health check.
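If you prefer to do this programmatically instead of through the console, a minimal boto3 sketch (the group name is a placeholder; Beanstalk generates its own):

import boto3

autoscaling = boto3.client('autoscaling')

# Switch the group from the default EC2 status checks to ELB health checks,
# with a grace period so new instances have time to boot before being judged.
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName='my-beanstalk-asg',  # placeholder group name
    HealthCheckType='ELB',
    HealthCheckGracePeriod=300,
)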
For more details on this subject, including the specific steps to configure all this, check these 2 links from the official documentation:
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.healthstatus.html#using-features.healthstatus.understanding
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/environmentconfig-autoscaling-healthchecktype.html
This should solve the problem. If you have trouble with that, please add a comment with any additional info that you might have after trying this.
Cheers!
You can set up a CloudWatch alarm to reboot the unhealthy instance using the StatusCheckFailed_Instance metric.
For detailed information on each step, go through the Adding Reboot Actions to Amazon CloudWatch Alarms section in the AWS documentation.
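If you'd rather script the alarm, a minimal boto3 sketch (instance ID and region are placeholders):

import boto3

cloudwatch = boto3.client('cloudwatch')

# Reboot the instance after 3 consecutive minutes of failed instance status
# checks; the 'automate' ARN is the built-in EC2 reboot alarm action.
cloudwatch.put_metric_alarm(
    AlarmName='reboot-on-status-check-failure',
    Namespace='AWS/EC2',
    MetricName='StatusCheckFailed_Instance',
    Dimensions=[{'Name': 'InstanceId', 'Value': 'i-0123456789abcdef0'}],  # placeholder
    Statistic='Maximum',
    Period=60,
    EvaluationPeriods=3,
    Threshold=1.0,
    ComparisonOperator='GreaterThanOrEqualToThreshold',
    AlarmActions=['arn:aws:automate:us-east-1:ec2:reboot'],  # placeholder region
)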
If you want Auto Scaling to replace instances whose application has stopped responding, you can use a configuration file to configure the Auto Scaling group to use Elastic Load Balancing health checks. The following example sets the group to use the load balancer's health checks, in addition to the Amazon EC2 status check, to determine an instance's health.
Example .ebextensions/autoscaling.config
Resources:
  AWSEBAutoScalingGroup:
    Type: "AWS::AutoScaling::AutoScalingGroup"
    Properties:
      HealthCheckType: ELB
      HealthCheckGracePeriod: 300
See: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/environmentconfig-autoscaling-healthchecktype.html

Load Balancer marks instances as unhealthy even though no health checks are enabled

I am trying to set up a TCP load balancer for ports 80 & 443. I have created an instance group, and I can curl the instances and verify they are running. However, the load balancer states the instances are unhealthy. I originally tried setting them up with health checks, but removed the health checks just to get the load balancer working. Even with the health checks removed, the load balancer says the nodes are unhealthy.
http://imgur.com/a/4Jefv
I'm also using Docker & Mesos, and changing the networking mode from Bridge to Host seems to have fixed the issue.