AWS ELB zero-downtime deploy

With an ELB setup, there is a health-check timeout, e.g. take a server out of the LB if it fails X health checks.
For a truly zero-downtime deployment, I actually want to be able to avoid these extra 4-5 seconds of downtime.
Is there a simple way to do that on the ops side, or does this need to be handled at the level of the web server itself?

If you're doing continuous deployment you should deregister the instance you're deploying to from the ELB (say, with aws elb deregister-instances-from-load-balancer), wait for the current connections to drain, deploy your app and then register the instance with the ELB again.
http://docs.aws.amazon.com/cli/latest/reference/elb/deregister-instances-from-load-balancer.html
http://docs.aws.amazon.com/cli/latest/reference/elb/register-instances-with-load-balancer.html
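As a rough sketch of one step of that loop (the load balancer name, instance ID, host address and deploy script are hypothetical placeholders; the drain wait should match your connection-draining timeout):

```bash
# One step of a rolling deploy against a Classic ELB.
aws elb deregister-instances-from-load-balancer \
  --load-balancer-name my-classic-elb \
  --instances i-0123456789abcdef0

# Give in-flight connections time to drain.
sleep 30

# Deploy the new version on the instance by whatever mechanism you use.
ssh ec2-user@10.0.1.15 './deploy.sh'

# Put the instance back behind the load balancer.
aws elb register-instances-with-load-balancer \
  --load-balancer-name my-classic-elb \
  --instances i-0123456789abcdef0
```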
It is also a common strategy to deploy to another Auto Scaling group, then just switch the ASG on the load balancer.
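A rough sketch of that switch with the CLI (ASG and load balancer names are hypothetical; the same idea works for ALBs/NLBs via attach-load-balancer-target-groups):

```bash
# Attach the new (green) Auto Scaling group to the Classic ELB,
# then detach the old (blue) one once the new instances are healthy.
aws autoscaling attach-load-balancers \
  --auto-scaling-group-name my-app-green \
  --load-balancer-names my-classic-elb

aws autoscaling detach-load-balancers \
  --auto-scaling-group-name my-app-blue \
  --load-balancer-names my-classic-elb
```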

Related

ECS ELB Health Checks

My main issue is trying to work out why my health checks are failing on ECS.
My setup
I have successfully set up an ECS cluster using an EC2 Auto Scaling group. All the EC2 instances are in private subnets with NAT gateways.
I have a load balancer connected up to the target group, which is linked to ECS.
When I try to get an HTTP response from the load balancer from my local machine, it times out, so I am obviously not getting responses back from the containers.
I have been able to ssh into the EC2 instances and confirmed the following:
ECS is deploying containers onto the EC2 instances, then after some time killing them and then firing them up again
I can curl the healthcheck endpoint from the EC2 instance (localhost) and it runs successfully
I can reach the internet from the EC2 instance, e.g. curl google.com returns an HTML response
My question is that there seem to be two different types of health check going on, and I can't figure out which is which.
ELB health-checks
The ELB seems, as far as I can tell, to use the health-checks defined in the target group.
The target group is defined as a list of EC2 instances. So does that mean the ELB is sending requests to the instances to see if they are running?
This would of course fail because we cannot guarantee that ECS will have deployed a container to each instance.
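(For reference, the targets the target group is actually checking, and their individual health state, can be listed with the CLI; the target group ARN below is a placeholder:)

```bash
# Show every registered target (instance or IP) and why it is healthy/unhealthy.
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/my-tg/0123456789abcdef
```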
ECS health-checks
ECS however is responsible for deploying containers into these instances, in what could turn out to be a many-to-many relationship.
So surely ECS would be querying the actual running containers to find out if they are healthy and then killing them if required.
My confusion / question
I don't really understand what role the ELB has in managing the EC2 instances in this context.
It doesn't seem like the EC2 instances are being stopped and started. However, the docs seem to indicate that the ASG / ELB will manage the EC2 instances and restart them if they fail the health check.
Does ECS somehow override this default behaviour and take responsibility for running the healthchecks instead of the ELB?
And if not, won't the health check just fail on any EC2 instance that happens not to have a container running on it?

ECS Fargate with ALB challenges

I am currently exploring AWS ECS with Fargate. At first, I thought it was a good option that can autoscale and save cost, without server maintenance overhead.
But the more I look into it, the more pain I see in it:
(1) Fargate does not provide a static IP, and the recommendation is to use an ALB. The ALB alone costs plenty of money.
(2) The ALB runs health checks at an interval, so Fargate rarely "sleeps" because of the ALB (and thus incurs more charges than it should).
My questions are:
Does anyone have an alternative for dealing with issues (1) and (2)? A way to overcome them for cost-saving purposes while still retaining an elastic IP?
If the ALB forwards to Fargate by IP address, what happens if the ECS (Fargate) task gets restarted and its IP changes? Will the ALB auto-detect it, or what is the way to detect Fargate IP changes and update the ALB automatically? And if not, how do I handle the situation where the IP is recycled?
Does anyone have an alternative for dealing with issues (1) and (2)? A way to overcome them for cost-saving purposes while still retaining an elastic IP?
A load balancer (ALB or NLB) is really your only option. Your statement about health checks preventing Fargate from "sleeping" is incorrect, though; there is no "sleep" option in Fargate.
If you want something very similar, but with a sleep option, I suggest looking into AWS App Runner.
If the ALB forwards to Fargate by IP address, what happens if the ECS (Fargate) task gets restarted and its IP changes? Will the ALB auto-detect it, or what is the way to detect Fargate IP changes and update the ALB automatically? And if not, how do I handle the situation where the IP is recycled?
ECS integrates directly with the load balancer target group. You configure the ECS service with the load balancer and target group you are using, and ECS will keep the load balancer's target group up to date with your Fargate IP address(es) automatically.
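A minimal sketch of that wiring when creating the service (cluster, service, task definition, subnets, security group, target group ARN, container name and port are all placeholders):

```bash
# ECS keeps the target group in sync with the task IPs it launches;
# you only declare the target group, container name and port here.
aws ecs create-service \
  --cluster my-cluster \
  --service-name my-service \
  --task-definition my-task:1 \
  --desired-count 2 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-0123456789abcdef0],securityGroups=[sg-0123456789abcdef0],assignPublicIp=DISABLED}" \
  --load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/my-tg/0123456789abcdef,containerName=web,containerPort=8080"
```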

AWS ALB in front of one server

I have a server (Apache/PHP) running the front end of a SaaS platform.
It will not receive high traffic and therefore does not need load balancing.
Does it make sense to add a load balancer and an Auto Scaling group (with a count of one server) for security reasons?
It would allow the server to be isolated in the VPC, and it would allow services such as WAF that increase security. The extra cost is not a problem.
It does make sense, in the following ways:
It can help you in configuring health checks for your instance. If your instance fails for some reason, the Auto Scaling group (acting on the load balancer's health checks) will launch another EC2 instance for you, minimizing the downtime of your application.
It naturally makes your instance more secure by hiding it in a VPC (as you suggested).
Lastly, it will future-proof your architecture and will enable you to quickly scale up your infrastructure if need be.
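A hedged sketch of such a self-healing single-instance setup (launch template, subnet and target group ARN are placeholders):

```bash
# An Auto Scaling group of exactly one instance, replaced automatically
# when it fails the load balancer's health checks.
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name single-web-asg \
  --launch-template "LaunchTemplateName=my-web-template,Version=\$Latest" \
  --min-size 1 --max-size 1 --desired-capacity 1 \
  --vpc-zone-identifier "subnet-0123456789abcdef0" \
  --target-group-arns arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/my-tg/0123456789abcdef \
  --health-check-type ELB \
  --health-check-grace-period 300
```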
As you said, you have a single server and do not get much traffic, but it can still make sense to add a load balancer in front of it.
You can enable health checks and, by integrating them with SNS, get notified if a health check fails (server unhealthy).
By adding WAF to your Application Load Balancer you can monitor HTTP/S requests and control access to your web application.
It depends on your requirements; for example, with WAF you can:
Block or allow traffic to your application from a specific region
Block or allow traffic to your application from a specified IP range
Specify a maximum number of requests to your application within a 5-minute window; if that limit is exceeded, you can block or count the requests.
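For example, once such rules exist in a WAFv2 web ACL, associating it with the ALB is a single call (both ARNs below are placeholders):

```bash
# Attach an existing regional WAFv2 web ACL (e.g. with geo-match,
# IP-set and rate-based rules) to the Application Load Balancer.
aws wafv2 associate-web-acl \
  --web-acl-arn arn:aws:wafv2:eu-west-1:123456789012:regional/webacl/my-web-acl/11111111-2222-3333-4444-555555555555 \
  --resource-arn arn:aws:elasticloadbalancing:eu-west-1:123456789012:loadbalancer/app/my-alb/0123456789abcdef
```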

AWS ECS Fargate Target Group Failing HealthChecks

The Spring Boot application is running as an ECS task in an ECS service of an AWS Fargate cluster. The ECS service is load balanced, so the tasks spawned by the service are automatically registered to a target group.
I am able to call the Health endpoint via API Gateway => VPC Link => Network ELB => Application ELB => ECS Task, as shown below:
However, the health checks seem to be failing, and as a result the tasks are being deregistered continuously, resulting in a totally unusable setup.
I have made sure to configure the health check of the target group to point to the right endpoint URL, as shown below:
I also made sure that the security group the Fargate tasks belong to allows traffic from the Application Load Balancer, as shown below:
But somehow the health checks kept failing and the tasks kept being deregistered, and I'm very confused!
Your help is much appreciated!
The problem actually was with the health check interval (30 seconds) and threshold (2 checks), which is too aggressive when the task is just starting up and is not yet able to respond to the HTTP request.
So, I increased the interval and the threshold, and everything is fine now!
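Something along these lines (target group ARN, cluster/service names and the numeric values are placeholders; the ECS grace period is an optional extra safeguard):

```bash
# Relax the target group health check so a slow-starting task is not
# marked unhealthy before it can answer on its health endpoint.
aws elbv2 modify-target-group \
  --target-group-arn arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/my-tg/0123456789abcdef \
  --health-check-interval-seconds 60 \
  --healthy-threshold-count 2 \
  --unhealthy-threshold-count 5

# Optionally ignore ELB health check results for new tasks while they boot.
aws ecs update-service \
  --cluster my-cluster \
  --service my-service \
  --health-check-grace-period-seconds 120
```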

AWS Elastic Beanstalk Auto Scaling configuration

I have 2 machines running under an Elastic Beanstalk environment.
One of them has been down since the last deployment.
I was hoping that the auto scaling configuration would launch a new machine, given that only a single machine is available.
That didn't happen and I'm trying to figure out what's wrong with my auto scaling configuration:
The first thing I see is that your rules contradict each other: if the number of unhealthy hosts is above 0, add a single host; if it is below 2, remove a single host. That may explain why you aren't seeing anything happen with your trigger.
Scaling triggers are used to bring in, or reduce, EC2 instances in your Auto Scaling group. This would be useful to bring in additional instances to maintain the same amount of computational power for your application while you investigate what caused the bad instance to fail. But this will not replace the instance.
To set up your instances to terminate after a certain period of being unhealthy, you can follow the documentation here.
By default the ELB pings port 80 with TCP; this, along with the on-host EC2 instance status check, is what determines the "health" of the EC2 instance. You can specify an Application Healthcheck URL to set up a customized health check that your application responds to. Check out the more detailed customization of Beanstalk ELBs here.
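A hedged example of setting that health check URL from the CLI (the environment name and /health path are placeholders; the same option can also be set via .ebextensions):

```bash
# Point the Beanstalk-managed ELB health check at an application endpoint
# instead of the default TCP check on port 80.
aws elasticbeanstalk update-environment \
  --environment-name my-env \
  --option-settings "Namespace=aws:elasticbeanstalk:application,OptionName=Application Healthcheck URL,Value=/health"
```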