Problem: Fargate tasks are being shut down before the processes inside them have completed when the service scales in. (Auto Scaling is implemented.)
Is there a way for a Fargate task to exit gracefully, i.e. to complete all the processes within the task before it is shut down?
In EC2 this can be handled through lifecycle hooks, but I'm not sure whether there is anything similar for Fargate tasks in an Amazon ECS cluster.
Capture the SIGTERM signal and do your cleanup in there. You can trap it in your application, using whatever programming language that you want, or trap it in a shell entrypoint script.
For more information see this blogpost from AWS.
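In a Java worker the same idea looks roughly like the sketch below: the JVM runs registered shutdown hooks when it receives SIGTERM, and ECS waits for the task's stopTimeout (30 seconds by default) before sending SIGKILL, so the cleanup has to fit in that window. The GracefulWorker class and processNextJob method are illustrative only, not part of any AWS API.

// Minimal sketch: finish in-flight work when ECS sends SIGTERM on task stop.
// The JVM runs shutdown hooks on SIGTERM; ECS then waits up to the task's
// stopTimeout (30 seconds by default) before sending SIGKILL, so the cleanup
// must fit in that window unless stopTimeout is raised.
public class GracefulWorker {

    private static volatile boolean shuttingDown = false;

    public static void main(String[] args) {
        Thread mainThread = Thread.currentThread();

        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            shuttingDown = true;            // stop picking up new work
            try {
                mainThread.join();          // wait for the current job to finish
            } catch (InterruptedException ignored) {
            }
        }));

        while (!shuttingDown) {
            processNextJob();               // hypothetical unit of work
        }
    }

    private static void processNextJob() {
        // poll a queue, handle one message, commit the result, etc.
    }
}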
In 2022 AWS introduced the Task scale-in protection endpoint.
The following task scale-in protection endpoint path is available to containers: $ECS_AGENT_URI/task-protection/v1/state
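As a rough sketch of how a container could use it: enable protection before starting a long job and clear it once the job is done. The example below assumes Java 11+ and uses the ProtectionEnabled and ExpiresInMinutes request fields documented for the endpoint; adjust it to whatever runtime you use.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: toggle ECS task scale-in protection from inside the container
// before and after a long-running job. Assumes the task runs on ECS with
// the agent endpoint exposed via the ECS_AGENT_URI environment variable.
public class TaskProtection {

    private static final String ENDPOINT =
            System.getenv("ECS_AGENT_URI") + "/task-protection/v1/state";

    private static final HttpClient HTTP = HttpClient.newHttpClient();

    static void setProtection(boolean enabled) throws Exception {
        // ExpiresInMinutes caps how long the task stays protected.
        String body = enabled
                ? "{\"ProtectionEnabled\": true, \"ExpiresInMinutes\": 60}"
                : "{\"ProtectionEnabled\": false}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(ENDPOINT))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response =
                HTTP.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Task protection response: " + response.body());
    }
}

Call setProtection(true) before a job and setProtection(false) afterwards, so the scheduler only terminates the task while it is idle.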
Related
I have an ECS service running some tasks. The server running inside a task may take 1 to 10 minutes to complete one request.
I am using SQS for job queuing. When the number of queued jobs exceeds a certain threshold, the service scales out the ECS tasks, and it scales in when the queue drops below a certain size.
However, as there is no lifecycle hook feature for ECS tasks, during scale-in the ECS tasks are shut down while processing is still running, and there is no way to delay task termination because that feature is missing.
According to our specification, we can't use the timeout feature, as we don't know in advance how long a job will take to finish.
Please suggest how to solve the problem.
There is no general solution to this problem, especially if you don't want to use a timeout. In fact, there is a long-standing, still open GitHub issue dedicated to this:
[ECS] [request]: Control which containers are terminated on scale in
You can get some control over this by running your service on EC2 instances (using EC2 instance scale-in protection) rather than on Fargate. So either re-architect your solution, or scale your service out and in manually.
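If you do go the EC2 route, one possible shape (a sketch, assuming the AWS SDK for Java v2) is for the worker to set scale-in protection on its own instance before a job and remove it afterwards. The instance ID and Auto Scaling group name below are placeholders; in practice they would come from instance metadata and your own configuration.

import software.amazon.awssdk.services.autoscaling.AutoScalingClient;
import software.amazon.awssdk.services.autoscaling.model.SetInstanceProtectionRequest;

// Sketch: mark the instance running a job as protected from scale-in,
// then release the protection once the job is done.
public class InstanceProtection {

    public static void main(String[] args) {
        try (AutoScalingClient autoScaling = AutoScalingClient.create()) {
            setScaleInProtection(autoScaling, "i-0123456789abcdef0", "my-asg", true);
            // ... run the long job ...
            setScaleInProtection(autoScaling, "i-0123456789abcdef0", "my-asg", false);
        }
    }

    static void setScaleInProtection(AutoScalingClient autoScaling,
                                     String instanceId,
                                     String asgName,
                                     boolean protect) {
        autoScaling.setInstanceProtection(SetInstanceProtectionRequest.builder()
                .autoScalingGroupName(asgName)
                .instanceIds(instanceId)
                .protectedFromScaleIn(protect)
                .build());
    }
}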
I want to run Druid on EKS but I am concerned about using EC2 Auto Scaling groups to scale my middle managers. If every middle manager is running an ingestion task but AWS decides to scale down, will a middle manager be terminated, or is there termination protection in place? If a middle manager would be terminated, what other approaches to scaling do people suggest?
A signal will be sent to your containers to give them an opportunity to shut down gracefully. This is part of Kubernetes lifecycle management.
By default, Kubernetes waits 30 seconds before forcefully stopping the container. You can adjust this by setting terminationGracePeriodSeconds on the pod. You can also add lifecycle hooks such as postStart or preStop to run extra operations and keep your system consistent.
See also: EC2 Autoscaling lifecycle hooks
Is there a way to ensure an AWS ECS container instance doesn't shut down in the middle of running a critical task?
I have an auto-scaling AWS ECS service that scales the number of instances based on CPU usage. These instances process long-running batch jobs that may take anywhere from 5 to 30 minutes.
The problem is that sometimes, during a scale-down, an instance that's actively running a critical job gets shut down which ultimately causes the job to fail.
You can use a feature called managed termination protection.
When the scaling policy reduces the number of instances, it has no control over which instances are actually terminated. By default, the Auto Scaling group may well terminate instances that are running tasks even though other instances are idle. This is where managed termination protection comes into the picture: with this option enabled, ECS dynamically manages instance termination protection on your behalf.
Please have a look at Controlling which Auto Scaling instances terminate during scale in and specifically the section Instance scale-in protection in the AWS documentation.
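Managed termination protection is configured on the capacity provider that fronts the Auto Scaling group, and the group itself must also have instance scale-in protection enabled for it to take effect. A minimal sketch with the AWS SDK for Java v2 follows; the capacity provider name and the group ARN are placeholders.

import software.amazon.awssdk.services.ecs.EcsClient;
import software.amazon.awssdk.services.ecs.model.AutoScalingGroupProvider;
import software.amazon.awssdk.services.ecs.model.CreateCapacityProviderRequest;
import software.amazon.awssdk.services.ecs.model.ManagedScaling;
import software.amazon.awssdk.services.ecs.model.ManagedScalingStatus;
import software.amazon.awssdk.services.ecs.model.ManagedTerminationProtection;

// Sketch: create a capacity provider with managed termination protection
// enabled so ECS protects instances that are running tasks during scale-in.
public class CapacityProviderSetup {

    public static void main(String[] args) {
        try (EcsClient ecs = EcsClient.create()) {
            ecs.createCapacityProvider(CreateCapacityProviderRequest.builder()
                    .name("batch-capacity-provider")
                    .autoScalingGroupProvider(AutoScalingGroupProvider.builder()
                            .autoScalingGroupArn(
                                    "arn:aws:autoscaling:us-east-1:123456789012:autoScalingGroup:placeholder")
                            .managedScaling(ManagedScaling.builder()
                                    .status(ManagedScalingStatus.ENABLED)
                                    .targetCapacity(100)
                                    .build())
                            .managedTerminationProtection(
                                    ManagedTerminationProtection.ENABLED)
                            .build())
                    .build());
        }
    }
}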
There is a lot of discussion at work about how our services will be shut down when they are running in an Auto Scaling group. The main concern is allowing services to perform some tasks before the instance is terminated. For instance, if I add a destroy method to a Spring service in Java, is it reasonable to expect that method to be called before the instance terminates?
<bean class="com.github.moaxcp.service.Service" destroy-method="destroy"/>
In this case, the Service will stop accepting data and save its current state to S3.
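For illustration, the class behind that bean definition might look like the sketch below: Spring invokes destroy() when the application context is closed cleanly. The bucket, key, and serialization details are placeholders, and the S3 call assumes the AWS SDK for Java v2.

package com.github.moaxcp.service;

import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

// Sketch of the Service referenced by the bean definition above.
public class Service {

    private volatile boolean acceptingData = true;

    public void destroy() {
        acceptingData = false;                     // stop accepting new data

        try (S3Client s3 = S3Client.create()) {    // persist current state to S3
            s3.putObject(PutObjectRequest.builder()
                            .bucket("my-service-state")
                            .key("service-state.json")
                            .build(),
                    RequestBody.fromString(serializeState()));
        }
    }

    private String serializeState() {
        return "{}";                               // placeholder for real state
    }
}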
Or what if there is a systemd shutdown script that should run before an instance is terminated, for example to send any remaining logs to S3?
The Auto Scaling guide mentions that when a scale-in policy is met, an EC2 instance is picked and terminated. Does this mean the instance is not shut down gracefully? Would our services be able to finish some tasks before the instance is shut down?
The EC2 instance lifecycle documentation gives some detail about what termination does: it first shuts down the instance and then terminates it. In this case the services may be able to finish some tasks before being stopped.
From the documentation it seems as if the instance shuts down gracefully when it is terminated directly, but not when it is terminated by the ASG. Is this true? Is there any documentation about this behavior?
You have got something called lifecycle hooks that let you perform custom actions as an Auto Scaling group launches or terminates instances. The hooks respond to scale-out events and scale-in events.
Check out https://docs.aws.amazon.com/autoscaling/ec2/userguide/lifecycle-hooks.html for more information.
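As a sketch of the pattern (assuming the AWS SDK for Java v2): register a hook on the terminating transition so the instance is held in the Terminating:Wait state, and have the instance call CompleteLifecycleAction once it has drained its work. The hook name, group name, and timeout below are placeholders.

import software.amazon.awssdk.services.autoscaling.AutoScalingClient;
import software.amazon.awssdk.services.autoscaling.model.CompleteLifecycleActionRequest;
import software.amazon.awssdk.services.autoscaling.model.PutLifecycleHookRequest;

// Sketch: hold terminating instances until they finish their work.
public class TerminationHook {

    static void registerHook(AutoScalingClient autoScaling) {
        autoScaling.putLifecycleHook(PutLifecycleHookRequest.builder()
                .autoScalingGroupName("my-asg")
                .lifecycleHookName("drain-before-terminate")
                .lifecycleTransition("autoscaling:EC2_INSTANCE_TERMINATING")
                .heartbeatTimeout(3600)          // give the instance up to an hour
                .defaultResult("CONTINUE")       // terminate anyway if no response
                .build());
    }

    static void finishDraining(AutoScalingClient autoScaling, String instanceId) {
        autoScaling.completeLifecycleAction(CompleteLifecycleActionRequest.builder()
                .autoScalingGroupName("my-asg")
                .lifecycleHookName("drain-before-terminate")
                .instanceId(instanceId)
                .lifecycleActionResult("CONTINUE")
                .build());
    }
}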
I have an application deployed to Tomcat 8 that is hosted in an Elastic Beanstalk environment with auto scaling enabled. The application runs long-running jobs that must be finished, and all changes must be committed to a database.
The problem is that AWS might kill any EC2 instance during scale-in, and then some jobs might not be finished as expected. By default, AWS waits just 30 seconds and then kills the Tomcat process.
I've already changed the /etc/tomcat8/tomcat8.conf file: I set the SHUTDOWN_WAIT parameter to 3600 (the default is 60). But it didn't fix the issue: the whole instance is killed after 20-25 minutes.
Then I tried to configure a lifecycle hook via an .ebextensions file (as explained here). But I couldn't confirm that the lifecycle hook really postpones termination of the instance (I'm still waiting for an answer from AWS support about that).
So the question is: do you know any "legal" ways to postpone or cancel instance termination when the Auto Scaling group scales in?
I want to have something like this:
AWS starts to scale in the autoscaling group
autoscaling group sends shutdown signal to the EC2 instance
EC2 instance starts to stop all active processes
Tomcat process receives a signal to shutdown, but waits until the active job is finished
the application commits the job result (it might take even 60 minutes)
Tomcat process is terminated
the EC2 instance is terminated
Elastic Beanstalk consists of two parts: an API (web) tier and a worker tier. The API tier is auto scaled, so its instances can go down at any time. The worker tier is meant for longer-running work, and the two communicate through SQS. That is how it was designed.
You can certainly tweak the system. It is a platform as a service, so you can force the Auto Scaling group not to scale in by setting the minimum number of instances equal to the maximum. You can also turn off health checks, which can otherwise kill an instance. But both are hacks, and the latter can come back to bite you.