Load balancer health check vs docker health check? - amazon-web-services

I have an ECS cluster with multiple nodes (task defs) fronted by an application load balancer. Does it make sense to configure a health check at the load balancer and at the container level (within the task definition)?
The load balancer runs the configured health check against every registered target so it can unregister failing nodes. Setting the health check at the container level accomplishes the same thing: ECS will unregister any container that fails the health check (according to your configuration). ECS will always instantiate more instances of your task def to satisfy your desired count.
To me it sounds like if your task definition only has a single container, then only setting the health check at the load balancer (since it's required) is enough. Am I missing anything?

Unregistering means different things for a load balancer and for ECS. For a load balancer, unregistering means that no further traffic is sent to the container. For an ECS service, unregistering means that the container is killed and the ECS service will attempt to replace it with a healthy one.
Even if you have just a single container, in case of a failure the load balancer will stop sending traffic to it, but it is not the load balancer's job to restart the container. Replacing the container with a healthy one should be done by an ECS service scheduler.
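To make the split of responsibilities concrete, here is a minimal sketch in plain Python. It is only a toy model of the behaviour described above, not AWS code: the `Task` class and the `routable_targets` and `reconcile` functions are hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: int
    healthy: bool = True


def routable_targets(tasks):
    """Load balancer view: stop routing to unhealthy targets, nothing more."""
    return [t for t in tasks if t.healthy]


def reconcile(tasks, desired_count):
    """Scheduler view: drop unhealthy tasks and start replacements."""
    survivors = [t for t in tasks if t.healthy]
    next_id = max((t.task_id for t in tasks), default=0) + 1
    while len(survivors) < desired_count:
        survivors.append(Task(task_id=next_id))  # replacement task
        next_id += 1
    return survivors


tasks = [Task(1), Task(2, healthy=False), Task(3)]

# The load balancer only narrows where traffic goes.
print([t.task_id for t in routable_targets(tasks)])  # [1, 3]

# The scheduler is what restores the desired count.
tasks = reconcile(tasks, desired_count=3)
print([t.task_id for t in tasks])  # [1, 3, 4]
```

Note that after `routable_targets` the unhealthy task still exists; only `reconcile` removes it and brings the count back up.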

Related

Is there a way to configure health checks to an ECS service without a load balancer?

I have an ECS Cluster with 2 ECS Services (1 app-controller, 1 app-event-processor). Is there a way to get health checks on both while API traffic only goes to app-controller? I realize health checks normally come from the load balancer but if I configure the load balancer to hit app-event-processor then API traffic also starts flowing to app-event-processor which is undesirable since I want that to handle only messages from SQS for example.
As @jordanm mentioned in their comment, ECS provides a built-in health check mechanism that is orthogonal (and in addition) to the "outside" LB health check.
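As a rough illustration, the container-level check lives in the `healthCheck` block of a container definition inside the task definition. The sketch below builds such a fragment as a Python dict and prints it as JSON. The field names (`command`, `interval`, `timeout`, `retries`, `startPeriod`) are the task definition parameters; the image name, port, `/health` path, and the presence of `curl` in the image are all assumptions made up for this example.

```python
import json

# Hypothetical container definition fragment for the SQS worker service.
container_definition = {
    "name": "app-event-processor",
    "image": "example/app-event-processor:latest",  # placeholder image
    "essential": True,
    "healthCheck": {
        # Assumes curl is installed in the image and the worker
        # answers on http://localhost:8080/health (both hypothetical).
        "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
        "interval": 30,     # seconds between checks
        "timeout": 5,       # seconds before a single check counts as failed
        "retries": 3,       # consecutive failures before marking unhealthy
        "startPeriod": 60,  # seconds of startup during which failures are not counted
    },
}

print(json.dumps(container_definition, indent=2))
```

Because the command runs inside the container, no load balancer or target group is involved, so no API traffic gets routed to app-event-processor.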

Heartbeat/auto-wakeup a VM instance in a scale set

I need to deploy a workload in a VM as IaaS. The issue is that the legacy workload won't work in a clustered, multi-instance environment, so there can only be one instance in the entire VM scale set. Is there a way to heartbeat the VMSS so that if the VM instance is down or the service has crashed, another replica wakes up and starts taking the load? Is this a setting at the VMSS level or at the load balancer level? Thanks.
You could use either the Application Health extension or load balancer health probes to enable application health monitoring for instances. Only one of these can be enabled at a time. As the extension reports health from within a VM, the extension can be used in situations where external probes such as Application Health Probes (that utilize custom Azure Load Balancer probes) can’t be used.
You could also use automatic instance repairs. After the automatic repairs policy is enabled, when an instance is found to be unhealthy, the scale set will automatically delete the unhealthy instance and create a new one to replace it.
In my opinion, if there is no load balancer in front of your VMSS, you can deploy the Application Health extension directly on your VMSS instances; otherwise, it is preferable to use the load balancer health probe to monitor the health of your backend endpoint.

Amazon ECS: Target Group Health Check vs Container Health Check

Amazon ECS supports two different types of health checks:
Target Group health checks make a configurable network request
Container health checks run in the docker container and can be configured to run any shell command that the container supports
If both health checks are configured, which one wins? If either fails is the Service marked as UNHEALTHY? Or both? Can I configure one to override the other?
I'd very much like the Target Group health status to not cause ECS to continually bounce the service and I was hoping the container Health Check could be used to override it.
The AWS documentation is somewhat vague on this topic, but it does suggest a high degree of coupling between ALB and ECS when it comes to health checks. For example, see the documentation for healthCheckGracePeriodSeconds and minimumHealthyPercent, where ECS health check behaviour is influenced by the presence or absence of a load balancer.
The healthCheckGracePeriodSeconds may be useful to avoid a failed ALB health check from causing the ECS container to be restarted (during service startup at least):
The period of time, in seconds, that the Amazon ECS service scheduler should ignore unhealthy Elastic Load Balancing target health checks, container health checks, and Route 53 health checks after a task enters a RUNNING state. This is only valid if your service is configured to use a load balancer. If your service has a load balancer defined and you do not specify a health check grace period value, the default value of 0 is used.
If your service's tasks take a while to start and respond to health checks, you can specify a health check grace period of up to 2,147,483,647 seconds during which the ECS service scheduler ignores the health check status. This grace period can prevent the ECS service scheduler from marking tasks as unhealthy and stopping them before they have time to come up.
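The effect of the grace period can be pictured with a small sketch. This is only a simplified model of the quoted description, not the actual scheduler logic: the function name is hypothetical, and treating the boundary second as "acted on" is an assumption.

```python
# Upper bound taken from the documentation quoted above.
MAX_GRACE_PERIOD_SECONDS = 2_147_483_647


def scheduler_acts_on_failure(seconds_since_running, grace_period_seconds=0):
    """Return True if a failed health check would be acted on,
    False if it falls inside the grace period and is ignored.

    Simplified model; the exact boundary behaviour is an assumption.
    """
    if not 0 <= grace_period_seconds <= MAX_GRACE_PERIOD_SECONDS:
        raise ValueError("grace period out of range")
    return seconds_since_running >= grace_period_seconds


# Task has been RUNNING for 30 s and fails a check.
print(scheduler_acts_on_failure(30))                            # True (default grace period is 0)
print(scheduler_acts_on_failure(30, grace_period_seconds=120))  # False (still inside the grace period)
```

With the default of 0, a slow-starting container can be stopped before it ever answers a check, which is the bouncing behaviour the question describes.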
In my experience, either one will cause your container to be decommissioned. I would say you probably don't need the container health check if you have a target group performing the check.

How to configure health checks for containers deployed to AWS ECS

I am currently working with AWS ECS and I'm a little confused on how you should configure the health check for containers deployed to AWS ECS.
You can define the healthcheck on the TargetGroup but you can also define the health check on the TaskDefinition.
I wanted to know what is best practice and why. Currently I have defined it in the TargetGroup and it works as expected.
But I wanted clarity on why you would use one over the other? And would you ever define it in both places?
I am using an Application Load Balancer with ECS.
You should use the ALB health check if you are using an ALB.
If the ALB health check fails, the ALB will mark the target as unhealthy and, as a result, your container will be killed.
The most important part of the health check is the HTTP status code: it should be 200, 3xx, or 4xx, depending on your configuration. If the response code does not match the configured one, the target will be marked unhealthy.
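For illustration, a target group's success codes can be given as a single value, a comma-separated list such as "200,202", or a range such as "200-299". The sketch below shows how such a matcher string could be interpreted. It is my own approximation for explaining the idea, not the ALB's implementation, and the `status_matches` name is hypothetical.

```python
def status_matches(matcher, status_code):
    """Return True if status_code satisfies a matcher string such as
    "200", "200,202", or "200-399"."""
    for part in matcher.split(","):
        part = part.strip()
        if "-" in part:
            # Range form, inclusive on both ends.
            low, high = part.split("-", 1)
            if int(low) <= status_code <= int(high):
                return True
        elif part and int(part) == status_code:
            return True
    return False


print(status_matches("200", 200))      # True
print(status_matches("200", 302))      # False: a redirect fails a strict "200" matcher
print(status_matches("200-399", 302))  # True
```

A common surprise is an application that redirects the health check path (for example to a login page): with a matcher of "200" the target is marked unhealthy even though the application is up.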
The two checks have different purposes:
If you are using an ALB, you should use the ALB health check.
If you are using a scheduler-based task, then you can use Docker container health checks.
Amazon Elastic Container Service (ECS) now supports Docker container
health checks. This gives you more control over monitoring the health
of your tasks and improves the ability of the ECS service scheduler to
ensure your services are healthy.
Previously, the ECS service scheduler relied on the Elastic Load
Balancer (ELB) to report container health status and to restart
unhealthy containers. This required you to configure your ECS Service
to use a load balancer, and only supported HTTP and TCP health-checks.
ecs-supports-container-health-checks-and-task-health-mana
If a service's task fails the load balancer health check criteria, the
task is stopped and restarted. This process continues until your
service reaches the number of desired running tasks.
service-load-balancing-health

Should health-check for my Application Load Balancer be EC2 when using ECS?

I've been trying to configure a Cloudformation template for ECS along with Application Load Balancer (ALB) with dynamic ports.
Does the AutoScalingGroup's (ASG) health check type need to be EC2? The examples seem to use EC2 and when I set it to ELB the health check seems to fail.
If it does indeed need to be set to EC2 then does ECS manage the health of the containers itself and the ALB only manages the health of the container instances and not the containers?
Edit:
Having thought about this a bit more it probably makes sense to use EC2 health check since if I had multiple containers on the container instance then one unhealthy container shouldn't cause the whole container instance to go down. However if the ALB only monitors the instance then does ECS monitor the health of the containers?
Googling my question I came across this AWS blog but it references using ELB for health checks...
Your Auto Scaling Group health check is independent of the ECS/load balancer monitoring. I'm not exactly sure which ASG health check setting you are referring to.
In any case, for your ECS monitoring to be aware of the health of your container, you'll want to set the health check settings on your target groups that are connected to your services. ECS will use the information that's visible in the target group to kill containers that are not considered healthy.
The templates here are great:
http://templates.cloudonaut.io/en/stable/ecs/
The ECS templates for the cluster, and for the service on top of it, include everything you need: auto-scaling, load balancing, health checks, you name it.
They require a bit of tweaking but they should get you started well even out of the box.
Pay attention to the stack dependencies. Before running the ecs service template, you need to install the stacks for vpc, vpc-s3-endpoint, alert, nat-gateway (if you're building a service confined to private subnets), and the cluster layer itself.
Have fun!