AWS ECS Fargate with an Application Load Balancer

I am using ECS Fargate to execute Java code in threads. Is it possible to have an ALB stop sending requests to a Fargate task once a certain number of Java threads are running in that task, and then, as threads finish, allow the ALB to send further requests?
I see that an ALB performs health checks and, if a task fails a health check, the task is temporarily taken out of the target group. But I don't see a way to utilize this, as these health checks are timed (not performed before every request) and there is no way to manually send a fail signal.
Does anyone know a way to force the ALB to stop sending requests to a task once my thread limit for that task is met?
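For what it's worth, the "fail signal" can come from the application itself: the task's health check endpoint can deliberately return a non-200 status once the thread limit is reached, which makes the ALB mark the target unhealthy. A minimal sketch (MAX_THREADS, busy_threads, and the port are illustrative, and, as noted above, this only takes effect on the health check interval, not per request):

```python
# Illustrative sketch: a /health endpoint that returns 503 once the task's
# worker-thread limit is reached, so the ALB marks the target unhealthy and
# stops routing to it until the check passes again. MAX_THREADS and
# busy_threads are placeholder names, not part of any AWS API.
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

MAX_THREADS = 10      # illustrative hard limit on concurrent worker threads
busy_threads = 0      # the worker code would increment/decrement this
lock = threading.Lock()

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_response(404)
            self.end_headers()
            return
        with lock:
            saturated = busy_threads >= MAX_THREADS
        # A 503 fails the ALB health check; a 200 passes it.
        self.send_response(503 if saturated else 200)
        self.end_headers()

if __name__ == "__main__":
    ThreadingHTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```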

Related

How does ALB distribute requests to Fargate service during rolling update deployment?

I deploy a Fargate service in a cluster and use the rolling update deployment type. I configured an ALB in front of my service, and it performs health checks as well. During the upgrade, I can see that my current task is marked as INACTIVE and the new task is deployed. Both tasks are in the RUNNING state.
I understand that the ALB is doing a health check on the newly deployed tasks, but it keeps two tasks running for 5 minutes.
I have a few questions about this deployment window:
1. Does the ALB distribute user requests to my new tasks before they pass health checks?
2. If the answer to the first question is no, does the ALB distribute user requests to the new tasks after they pass health checks but before the old tasks are stopped?
3. If the second answer is yes, then two versions of tasks will be running inside my service, serving user requests, for 5 minutes. Is this true? How can I make sure it only sends requests to one version at a time?
I don't want to change the deployment method to BLUE/GREEN. I want to keep the rolling update at the moment.
The ALB will not send traffic to a task that is not yet passing health checks, so no to #1. The ALB will send traffic to both old and new tasks while deploying, so yes to #2. As soon as a replacement task is available, the ALB will start to drain the task it is replacing; the default time for that is 5 minutes. During that time the draining task will not receive traffic, so sort of no to #3. The "sort of" part is that there will be a period when versions A and B of your service are both deployed. How long that lasts depends on the number of tasks and how long it takes for them to start receiving traffic.
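If five minutes of draining is longer than you need, the deregistration delay is configurable on the target group. A minimal boto3 sketch, with a placeholder ARN:

```python
# Minimal sketch: shorten the ALB target group's deregistration (draining)
# delay from the 300-second default to 60 seconds. The ARN is a placeholder.
import boto3

elbv2 = boto3.client("elbv2")
elbv2.modify_target_group_attributes(
    TargetGroupArn="arn:aws:elasticloadbalancing:...:targetgroup/my-tg/...",
    Attributes=[
        {"Key": "deregistration_delay.timeout_seconds", "Value": "60"},
    ],
)
```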
The only way I can think of to send all traffic to one version and then hard-cut over to the other is to create a completely new target group each time, keeping the old one active. Then, once the new target group is healthy, switch to it. You'd have to change the listener rules in the ALB as you do that.
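A rough sketch of that cutover, assuming the new target group already exists and the listener has a single default forward action (all ARNs are placeholders):

```python
# Rough sketch: hard cutover from the old target group to a new one by
# rewriting the listener's default forward action. All ARNs are placeholders.
import boto3

elbv2 = boto3.client("elbv2")
elbv2.modify_listener(
    ListenerArn="arn:aws:elasticloadbalancing:...:listener/app/my-alb/.../...",
    DefaultActions=[
        {
            "Type": "forward",
            "TargetGroupArn": "arn:aws:elasticloadbalancing:...:targetgroup/new-tg/...",
        }
    ],
)
```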
By the way, what is happening now is what I would call a rolling deployment.

AWS - Load balancing for ECS service with hard connections limit per container

I have a container deployed on ECS Fargate as a service.
The container serves long-lived HTTP WebSocket connections and performs real-time processing. Each connection can live from a few minutes to a few hours, depending on the use case.
Each container can serve at most a constant number of connections simultaneously (e.g. max 10 connections) so that it can process the input in real time.
An AWS Application Load Balancer sits in front of this service.
Under the regular auto scaling rules, the number of containers can be scaled out or in by monitoring CPU.
The Application Load Balancer uses a round-robin routing algorithm for each incoming request.
My questions:
Having the requirement of a constant HARD limit on connections per container, how can I prevent the ALB from routing a new connection to a container with no available connection slot?
Can the service inside the container tell the ALB that it is closed to new connections, perhaps via a specific HTTP response?
Is there any other good practice to handle this requirement?
You will need to write your own code for this.
A possible solution is to combine:
Auto Scaling
Lifecycle hooks
Container Instance Draining.
Your code will need to detect how many connections it is processing. When the number hits your limit of 10, remove the container from the auto scaling group. By using Lifecycle hooks, you can keep the container alive. Once your 10 connections reach 0, complete the termination of the container.
Note this will cause a new container to be launched while you are draining the container that has reached its peak.
I don't know of another method to tell the ALB to stop sending traffic to a specific container without removing it. The key is the draining and termination lifecycle part, since you want the container to keep its existing client connections.
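For the ALB side specifically, the same "take yourself out of rotation, then drain" effect can be approximated by having the task deregister its own target when the limit is hit: deregistration stops new connections while existing ones stay open for the target group's deregistration delay. A hedged boto3 sketch (the ARN and port are placeholders, the self-deregistering pattern is an adaptation rather than anything from the answer above, and note that an ECS service that manages target registration itself may conflict with manual deregistration):

```python
# Hedged sketch: a task takes itself out of rotation once it reaches its
# connection limit by deregistering its own target. Existing connections
# stay open for the target group's deregistration delay; new ones stop.
# The ARN and port are placeholders.
import boto3

elbv2 = boto3.client("elbv2")

def stop_accepting_new_connections(task_ip: str) -> None:
    elbv2.deregister_targets(
        TargetGroupArn="arn:aws:elasticloadbalancing:...:targetgroup/ws-tg/...",
        Targets=[{"Id": task_ip, "Port": 8080}],  # Fargate targets are IPs
    )
```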

Amazon AWS WebSocket Load Balancing Scale-In

We are in the process of developing a WebSocket application that will run on the same application servers that serve our APIs, which are all within a target group of a new Amazon Application Load Balancer.
I'm not certain that a sticky session would even be needed once the socket is upgraded; in any case, listener forwarding rules should make easy work of that.
My concern comes from the scaling actions performed during auto scaling of the target groups, specifically scale-in. Scaling actions are currently based on RequestCountPerTarget, so when instances get terminated because that metric isn't above the threshold, there is no guarantee that the instance has no active WebSocket connections.
I'm assuming this means that when the instance is shut down and terminated, those socket connections would be abruptly interrupted.
What would be the best way to combat this?
Is there another metric I could use to auto scale the group based on the number of active connections per target, to better facilitate WebSocket scaling in addition to API requests?
I thought about creating an SNS topic and a LifecycleHook for instance termination on the auto scaling group, which I could handle in the WebSocketHandler to tell all sockets on that server to disconnect and reconnect to another server behind the load balancer.
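A hedged sketch of what handling that lifecycle hook might look like (the hook and group names are illustrative, and drain_all_sockets stands in for the WebSocketHandler broadcast described above):

```python
# Hedged sketch of the lifecycle-hook idea above: the hook pauses instance
# termination, the application drains its sockets, and the handler then
# completes the action so termination can proceed. Names are illustrative.
import boto3

autoscaling = boto3.client("autoscaling")

def drain_all_sockets() -> None:
    """Placeholder for the WebSocketHandler broadcast described above."""
    ...

def on_termination_notice(instance_id: str) -> None:
    # Called when the SNS notification for the TERMINATING transition arrives.
    drain_all_sockets()  # tell clients to reconnect to another server
    autoscaling.complete_lifecycle_action(
        LifecycleHookName="drain-websockets",    # illustrative name
        AutoScalingGroupName="websocket-asg",    # illustrative name
        InstanceId=instance_id,
        LifecycleActionResult="CONTINUE",        # allow termination to proceed
    )
```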

Grace Period? - AWS EC2 Container Service and Elastic Load Balancers

When an elastic load balancer (ELB) is associated with an auto-scaling group, it is possible to specify a grace period during which new EC2 instances will not be terminated even if they are marked as unhealthy by the ELB. Is it possible to specify a similar grace period, during which new ECS tasks will not be killed and restarted by their associated ECS service, even if the ECS instance on which a task is running has been marked unhealthy by the ELB?
Update:
In our current use case, the docker container being run as an ECS task contains a JBoss instance that loads a number of caches on startup. These caches can take several minutes to load. However, the ECS service registers the container instance with the ELB, as soon as the container has started. This means that traffic can be routed to the new container before it is ready to accept it. We could increase the health check interval and the "healthy/unhealthy thresholds" on the ELB to prevent the ELB from routing traffic to the instance and the ECS service from restarting the container until the caches have been loaded. However, increasing the health check interval and thresholds is not desirable, because if an instance is marked as unhealthy after the caches have been loaded, the ECS service should restart the container as soon as possible (which necessitates a shorter health check interval and smaller thresholds).
Thus, is it possible to apply a grace period during which traffic will not be routed to a new container by the ELB and the ECS service will not restart the container (even if it fails the health checks)? Or failing that, are there any suggestions regarding a solution for our use case?
In case anyone else finds themselves here via Google: as noted in the linked support thread, this capability has since been added to AWS as healthCheckGracePeriodSeconds: https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_CreateService.html#ECS-CreateService-request-healthCheckGracePeriodSeconds
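For reference, a minimal boto3 sketch of setting that grace period when creating a service (all names, counts, and ARNs are placeholders):

```python
# Minimal sketch: give new tasks a grace period during which ELB health
# check failures are ignored by the ECS service. Names are placeholders.
import boto3

ecs = boto3.client("ecs")
ecs.create_service(
    cluster="my-cluster",
    serviceName="my-service",
    taskDefinition="my-task:1",
    desiredCount=2,
    healthCheckGracePeriodSeconds=300,  # e.g. time for JBoss caches to load
    loadBalancers=[{
        "targetGroupArn": "arn:aws:elasticloadbalancing:...:targetgroup/...",
        "containerName": "app",
        "containerPort": 8080,
    }],
)
```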
After a discussion with the support team, it turns out that ECS cannot support our current use case.
There is a workaround that solves one of the issues we are facing. That workaround is to create a separate, essential health-check container in the same ECS task as the actual application container. The purpose of the health-check container is to monitor the application container and determine when the application has started completely. If it detects that the application has failed to start, it exits, causing the ECS service to cycle the task. The ELB is then configured to perform its health checks against the health-check container, which will always report that it is up on the relevant port. This workaround prevents the ECS service from cycling the ECS task due to failed health checks.
However, the ELB will begin routing traffic to the application container immediately, even if the application container is not yet ready to receive traffic (for example, because it is still waiting for a cache to load). Currently, there is no way to delay the ELB from sending traffic to the application container, as the ECS service provides no support for a grace period. We have managed to work around this issue by delivering messages to our application containers via SQS and only having them pull from the queue once their caches are fully loaded. However, we have future use cases (such as serving web requests) where this is not a feasible option. To this end, I intend to raise a feature request for the grace period.
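A small sketch of that SQS gating pattern, assuming the application exposes its own readiness check (the queue URL, caches_loaded, and handle are illustrative placeholders):

```python
# Small sketch of the workaround described above: only start polling SQS
# once the application's caches are loaded, so no work arrives before the
# container is ready. The queue URL and helper functions are placeholders.
import time
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"

def caches_loaded() -> bool:
    """Placeholder for the application's own readiness check."""
    ...

def handle(msg) -> None:
    """Placeholder for the application's message processing."""
    ...

def consume_when_ready() -> None:
    while not caches_loaded():
        time.sleep(5)   # registered with the ELB, but doing no work yet
    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            handle(msg)
            sqs.delete_message(QueueUrl=QUEUE_URL,
                               ReceiptHandle=msg["ReceiptHandle"])
```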
As an aside, both Kubernetes (http://kubernetes.io/v1.0/docs/user-guide/walkthrough/k8s201.html#application-health-checking) and Marathon (https://mesosphere.github.io/marathon/docs/health-checks.html) already support this option for health checking, if someone reading this is happy not to use a managed service.
Use the env var ECS_CONTAINER_STOP_TIMEOUT.
See https://github.com/aws/amazon-ecs-agent/issues/126
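For context, ECS_CONTAINER_STOP_TIMEOUT is an ECS agent setting configured in /etc/ecs/ecs.config on the container instance. As a related, hedged sketch, newer ECS task definitions also accept a per-container stopTimeout that serves a similar purpose (names and image are placeholders):

```python
# Hedged sketch for context: the per-container stopTimeout in a task
# definition plays a similar role to the agent-level
# ECS_CONTAINER_STOP_TIMEOUT variable (set in /etc/ecs/ecs.config).
# Names and the image URI are placeholders.
import boto3

ecs = boto3.client("ecs")
ecs.register_task_definition(
    family="my-task",
    containerDefinitions=[{
        "name": "app",
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/app:latest",
        "memory": 512,
        "stopTimeout": 120,  # seconds between SIGTERM and SIGKILL
    }],
)
```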

Prevent machine on Amazon from shutting down before all users finished tasks

I'm planning a server environment on AWS with auto scaling over VPC.
My application has a process that is performed in several steps on the server, and the user should stick to the same server by using the ELB's sticky sessions.
The problem is that when the auto scaling group is about to shut down a server, some users may be in the middle of the process (the process takes multiple requests), for example:
1. create an album
2. upload photos to the album, one at a time
3. convert the photos to a movie and delete the photos
4. store the movie on S3
Is it possible to configure the ELB to stop passing NEW users to a server that is about to shut down, while still passing previous users (who have the sticky session set)? And is it possible to tell the server to wait, say, 10 minutes after the shutdown rule applies before it actually shuts down?
Thank you very much
This feature wasn't available in Elastic Load Balancing at the time of your question; however, AWS has since addressed the main part of it by adding ELB Connection Draining, which avoids breaking open network connections while taking an instance out of service, updating its software, or replacing it with a fresh instance that contains updated software.
Please note that you still need to specify a sufficiently large timeout based on the maximum time you expect users to need to finish their activity; see Connection Draining:
When you enable connection draining for your load balancer, you can set a maximum time for the load balancer to continue serving in-flight requests to the deregistering instance before the load balancer closes the connection. The load balancer forcibly closes connections to the deregistering instance when the maximum time limit is reached.
[...]
If your instances are part of an Auto Scaling group and if connection draining is enabled for your load balancer, Auto Scaling will wait for the in-flight requests to complete or for the maximum timeout to expire, whichever comes first, before terminating instances due to a scaling event or health check replacement. [...] [emphasis mine]
The emphasized part confirms that it is not possible to specify an additional timeout that only applies after the last connection has been drained.
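For completeness, a rough boto3 sketch of enabling Connection Draining with a 10-minute timeout, matching the question's "wait about 10 minutes" requirement (the load balancer name is a placeholder):

```python
# Rough sketch: enable Connection Draining on a classic ELB with a
# 10-minute timeout, so in-flight sticky sessions can finish before the
# instance is taken out of service. The load balancer name is a placeholder.
import boto3

elb = boto3.client("elb")  # classic ELB API
elb.modify_load_balancer_attributes(
    LoadBalancerName="my-elb",
    LoadBalancerAttributes={
        "ConnectionDraining": {"Enabled": True, "Timeout": 600}
    },
)
```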