ELB health check history / log - amazon-web-services

I have an ELB (Network Load Balancer with a couple of Auto Scaling Groups as Target Group) that has periodical health check fails (i.e. some instances would be marked as unhealthy and then recover after a few minutes). The health check is a simple static page (i.e. /health_check).
The timing seems to be at the same time when the host is having heavy network load (downloading large files from S3), but I want to have more information (e.g. they are failing the active health check or passive health check as mentioned in https://docs.aws.amazon.com/elasticloadbalancing/latest/network/target-group-health-checks.html).
However, I am not able to find the health check history or logs from ELB. All my search finds is about the Access Log for ELB (https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-access-logs.html) which is about the actual user requests.
Is this health check history / log accessible anywhere?

Related

AWS ALB health check delay

How can I delay AWS ALB health check until all services have been started on the newly created EC2 instance by autoscaling?
My current health check points to a login page on the app server but some services are not fully up when healt check starts returning success. Is there a way I can add a 2 mins delay before LB starts the health check allowing newly created instance to load all services?
I don't think there is a direct way to do this. You can config the interval time and healthy threshhold to meet your requirements.
For example, you can config interval to 30s and healthy threshold to 5, so that your target will be registered after 30s x 5 = 150s

AWS - Does Elastic Load Balancing actually prevent LOAD BALANCER failover?

I've taken this straight from some AWS documentation:
"As traffic to your application changes over time, Elastic Load Balancing scales your load balancer and updates the DNS entry. Note that the DNS entry also specifies the time-to-live (TTL) as 60 seconds, which ensures that the IP addresses can be remapped quickly in response to changing traffic."
Two questions:
1) I was under the impression originally that a single static IP address would be mapped to multiple instances of an AWS load balancer, thereby causing fault tolerance on the balancer level, if for instance one machine crashed for whatever reason, the static IP address registered to my domain name would simply be dynamically 'moved' to another balancer instance and continue serving requests. Is this wrong? Based on the quote above from AWS, it seems that the only magic happening here is that AWS's DNS servers hold multiple A records for your AWS registered domain name, and after 60 seconds of no connection from the client, the TTL expires and Amazon's DNS entry is updated to only start sending requests to active IP's. This still takes 60 seconds on the client side of failed connection. True or false? And why?
2) If the above is true, would it be functionally equivalent if I were using a host provider of say, GoDaddy, entered multiple "A" name records, and set the TTL to 60 seconds?
Thanks!
The ELB is assigned a DNS name which you can then assign to an A record as an alias, see here. If you have your ELB set up with multiple instances you define the health check. You can determine what path is checked, how often, and how many failures indicate an instance is down (for example check / every 10s with a 5s timeout and if it fails 2 times consider it unhealthy. When an instance becomes unhealthy all the remaining instances still serve requests just fine without delay. If the instance returns to a healthy state (for example its passes 2 checks in a row) then it returns as a healthy host in the load balancer.
What the quote is referring to is the load balancer itself. In the event it has an issue or an AZ becomes unavailable its describing what happens with the underlying ELB DNS record, not the alias record you assign to it.
Whether or not traffic is effected is partially dependent on how sessions are handled by your setup. Whether they are sticky or handled by another system like elasticache or your database.

AWS Health Check Restart API

I have an AWS load balancer with 2 ec2 instances serving an API in Python.
If I have 10K request come in at the same time, and the AWS health check comes in, the health check will fail, and there is a 502/504 gateway error because of instances restart due the to failed health check.
I check the instances CPU usage, max at 30%, and memory maxed at 25%.
What's the best option to fix this?
A few things to consider here:
Keep the health check API fairly light, but ensure that the health check API/URL indeed returns correct responses based on the health of the app.
You can configure the health check to mark the instance as failed only after X failed checks. You can tune this parameter and the Health check frequency to match your needs.
You can disable the EC2 restart from failed health-check by configuring your autoscaling group health-check type to EC2. This will prevent instances from being terminated due to a failed ELB health-check.

AWS load balancer IIS

I have configured app load balancer on amazon. Set up DNS LB to route 53 with alias for A. Behind LB i have 2 instances with IIS. If i set up 2 sites on both instances, balancer automatically balance client by rotation
(as i know round robin). But, if i turn off site on IIS in one instance, load balancer continue go to that instance and if i go to exapmle.com i will have one time worked site and if refresh the page i will have error (because site turned off in IIS). Could you please tell me, how can i set up load balance to route traffic in working instance if one of them not working. Thank you
Load balancers continue to distribute the traffic on healthy servers. If it is not happening in your case, I would recheck the health check configuration under Target Groups.
You need to modify the port/path so that health checks start failing once the site is turned off. Only then, the load balancer will pass all traffic to healthy host, not the unhealthy host
What does the LB health checks say? If the back-end instances are not listening on the health check port then LB marks it as unhealthy and stops forwarding requests to it. If you are using Application loadbalancer then I think you can get the health check status within the target groups associated with the loadbalancer.

AWS autoscale ELB status checks grace period

I'm running servers in a AWS auto scale group. The running servers are behind a load balancer. I'm using the ELB to mange the auto scaling groups healthchecks. When servers are been started and join the auto scale group they are currently immediately join to the load balancer.
How much time (i.e. the healthcheck grace period) do I need to wait until I let them join to the load balancer?
Should it be only after the servers are in a state of running?
Should it be only after the servers passed the system and the instance status checks?
There are two types of Health Check available for Auto Scaling groups:
EC2 Health Check: This uses the EC2 status check to determine whether the instance is healthy. It only operates at the hypervisor level and cannot see the health of an application running on an instance.
Elastic Load Balancer (ELB) Health Check: This causes the Auto Scaling group to delegate the health check to the Elastic Load Balancer, which is capable of checking a specific HTTP(S) URL. This means it can check that an application is correctly running on an instance.
Given that your system is using an ELB health check, Auto Scaling will trust the results of the ELB health check when determining the health of each EC2 instance. This can be slightly dangerous because, if the instance takes a while to start, the health check could incorrectly mark the instance as Unhealthy. This, in turn, would cause Auto Scaling to terminate the instance and launch a replacement.
To avoid this situation, there is a Health Check Grace Period setting (in seconds) in the Auto Scaling group configuration. This indicates how long Auto Scaling should wait until it starts using the ELB health check (which, in turn, has settings for how often to check and how many checks are required to mark an instance as Healthy/Unhealthy).
So, if your application takes 3 minutes to start, set the Health Check Grace Period to a minimum of 180 seconds (3 minutes). The documentation does not state whether the timing starts from the moment that an instance is marked as "Running" or whether it is when the Status Checks complete, so perform some timing tests to avoid any "bounce" situations.
In fact, I would recommend setting the Health Check Grace Period to a significantly higher value (eg double the amount of time required). This will not impact the operation of your system since a Healthy Instance will start serving traffic as soon as the ELB Health Check is satisfied, which sooner than the Auto Scaling grace period. The worst case is that a genuinely unhealthy instance will be terminated a few minutes later, but this should be a rare occurrence.
the documentation (now) states "The grace period starts after the instance passes the EC2 system status check and instance status check."
So, at least according to the mid-2015 AWS documentation, the answer is "after the servers passed the system and the instance status checks." This is how we've set up our environment, and although I haven't done precise timings it appears to be correct.
If you closely monitor your cloudformation stack events, you will get success signal that your ASG got updated.
The time difference between ASG started updating and ASG received success signal is the health check grace period.
This health check grace period is always recommended to have double than the application startup time. Suppose, your application takes 10 min to start, you should put health check grace period for 20 min.
The reason is you never know your application might throw some kind of error and go for several retry.