AWS ECS upgrade to new task definitions kills long-running outgoing connections

We are using Celery for asynchronous tasks that keep a connection open to a remote server. These Celery jobs can run for up to 10 minutes.
When we deploy a new version of our code, AWS ECS doesn't wait for these jobs to finish; it kills the instances running the Celery workers before the jobs complete.
One solution is to have Celery retry a task if it fails, but that could potentially cause other problems.
Is there a way to avoid this? Can we instruct AWS ECS to wait for completion of outgoing connections? Any other way to approach this?
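A hedged sketch of the retry angle: ECS sends SIGTERM and, after the container's stopTimeout grace period, SIGKILL; a Celery worker does a warm shutdown on SIGTERM and tries to finish in-flight tasks, but a 10-minute job will usually outlive the grace period (on Fargate it is capped at two minutes). With late acknowledgement the broker redelivers whatever the killed worker never finished, which is the "tell Celery to retry" option made concrete. The settings below are standard Celery config; the broker URL is hypothetical.

    # A minimal sketch, not the poster's actual config; the broker URL is hypothetical.
    from celery import Celery

    app = Celery("worker", broker="redis://redis.internal.example:6379/0")

    app.conf.update(
        # Acknowledge a task only after it finishes, so work running on a
        # worker that gets killed mid-deploy is redelivered by the broker.
        task_acks_late=True,
        # Also redeliver tasks whose worker process was lost outright.
        task_reject_on_worker_lost=True,
        # Don't let a soon-to-die worker prefetch a batch it can never finish.
        worker_prefetch_multiplier=1,
    )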

Related

How to avoid Redis queue loss if a Redis task ends

I have a service that always runs one task, which loads a Redis Docker image. This service runs in an ECS cluster.
Now my question is: given this configuration, if the Redis task ends, a new Redis task will be started. However, as I understand it, the queue Redis holds will be lost. Is there any way to keep the Redis queue so the next Redis task picks it up? Do I have to use another Amazon service (ElastiCache?)
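A hedged sketch of the ElastiCache route: if the queue lives in an ElastiCache Redis endpoint outside the task (endpoint name hypothetical below), replacing the task no longer loses the data; the next task simply reconnects to the same endpoint. Using redis-py:

    # Hypothetical ElastiCache primary endpoint; the data lives outside the ECS
    # task, so a replaced task reconnects and finds the queue intact.
    import redis

    ELASTICACHE_URL = "redis://my-queue.abc123.use1.cache.amazonaws.com:6379/0"  # hypothetical

    r = redis.Redis.from_url(ELASTICACHE_URL)
    r.rpush("jobs", "job-1")              # enqueued before the task is replaced
    print(r.lrange("jobs", 0, -1))        # still readable by the next task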

Does AWS ECS internally maintain a task queue?

We are in the development phase, so we are using an AWS ECS cluster consisting of 2 EC2 instances. We are submitting tasks to the ECS cluster using Airflow's ECSOperator. We are looking to scale this process, so we are going to use Airflow's CeleryExecutor, which is used to concurrently submit and schedule tasks on Airflow.
So the question is: should we care about the number of tasks submitted to ECS, or will ECS, irrespective of how many tasks are submitted, service all of them without failure through some internal queuing mechanism?
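For illustration, a minimal boto3 sketch (cluster and task-definition names are hypothetical): at least with the classic EC2 launch type, RunTask does not queue work for you; when the cluster has no free capacity the call returns a failure entry (e.g. RESOURCE:MEMORY) instead of waiting, so the submitter has to back off and retry, or the cluster has to be scaled.

    # A minimal boto3 sketch (hypothetical names): RunTask does not hold work in
    # an internal queue -- if the cluster lacks capacity, the call comes back
    # with a "failures" entry instead of waiting.
    import boto3

    ecs = boto3.client("ecs")
    resp = ecs.run_task(
        cluster="dev-cluster",            # hypothetical
        taskDefinition="airflow-job:3",   # hypothetical
        count=1,
    )
    for failure in resp.get("failures", []):
        # The submitter must retry later (or scale the cluster).
        print("placement failed:", failure.get("reason"))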

Using AWS ECS service tasks as disposable/consumable workers?

Right now I have a web app running on ECS and have a pretty convoluted method of running background jobs:
I have a single task service that polls an SQS queue. When it reads a message, it attempts to place the requested task on the cluster. If this fails due to lack of available resources, the service backs off/sleeps for a period before trying again.
What I'd like to move to instead is as follows:
Run a multi-task worker service. Each task periodically polls the queue. When a message is received, the task runs the job itself (as opposed to trying to schedule a new task) and then exits. The AWS service scheduler would then replenish the service with a new task. This is analogous to gunicorn's prefork model.
My only concern is that I may be abusing the concept of services - are planned and frequent service task exits well supported, or should service tasks only exit when something bad happens, like an error?
Thanks
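A minimal sketch of the worker loop described above, assuming boto3 and a hypothetical queue URL: each service task takes one message, runs the job in-process, and exits, after which the ECS service scheduler starts a replacement task to keep the desired count.

    # Disposable-worker sketch (hypothetical queue URL): process one message,
    # then exit; the service scheduler replaces the stopped task.
    import sys
    import boto3

    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs"  # hypothetical

    def handle(body: str) -> None:
        # Run the actual job here instead of scheduling another ECS task.
        print("processing", body)

    def main() -> None:
        sqs = boto3.client("sqs")
        while True:
            resp = sqs.receive_message(
                QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20
            )
            for msg in resp.get("Messages", []):
                handle(msg["Body"])
                sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
                sys.exit(0)  # planned exit; the scheduler starts a fresh task

    if __name__ == "__main__":
        main()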

Celery Beat on Amazon ECS

I am using Amazon Web Services ECS (Elastic Container Service).
My task definition contains the application, Redis, and Celery containers. Automatic scaling is set up, so at the moment there are three instances with the same mirrored infrastructure. However, there is now a need for scheduled tasks, and Celery Beat would be a great fit, since Celery is already in my infrastructure.
But here is the problem: if I add a Celery Beat container to the task definition together with the other containers, it will be mirrored as well, and multiple instances will execute the same scheduled tasks at the same moment. What would be a solution to this infrastructure problem? Should I create a separate service?
We use single-beat to solve this problem and it works like a charm:
Single-beat is a nice little application that ensures only one instance of your process runs across your servers. Such as celerybeat (or some kind of daily mail sender, orphan file cleaner etc...) needs to be running only on one server, but if that server gets down, well, you go and start it at another server etc.
You should still set the number of desired tasks for the service to 1.
You can use an ECS task placement strategy to place your Celery Beat task and choose "One Task Per Host". Make sure to set the desired count to 1. This way, your Celery Beat task will run in only one container in your cluster.
Ref:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_run_task.html
The desired count is the number of tasks you want to run in the cluster. You can set the "Number of tasks" while configuring the service or in the Run Task section. You can refer to the links below for reference.
Configuring service:
Ref:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/create-service.html
Run Task:
Ref:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_run_task.html
Let me know if you find any issue with it.
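A minimal boto3 sketch of the separate-service idea, with hypothetical names: Celery Beat gets its own service whose desired count is pinned at 1, so the scheduler runs in exactly one container regardless of how the main application service scales.

    import boto3

    ecs = boto3.client("ecs")
    ecs.create_service(
        cluster="production",              # hypothetical cluster name
        serviceName="celery-beat",         # its own service, separate from the app
        taskDefinition="celery-beat:1",    # task definition containing only the beat container
        desiredCount=1,                    # exactly one scheduler
        placementConstraints=[
            {"type": "distinctInstance"},  # mirrors "One Task Per Host"
        ],
    )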

Django + Celery on Amazon AWS - Using separate EC2 instances as workers

I have a Django application. I am using Celery to run long-running processes in the background. Both the application and the Celery workers run on the same machine.
Now we are moving our servers to AWS. On AWS, we want to create a setup like the following:
We have n EC2 instances that run the app servers, and we have m EC2 instances as workers. When we need to run a long-running process, the app server sends the job to a worker, and the worker processes it. But the job depends on Django models and the database.
How can we set up the workers so that they can run these Django-model-dependent jobs?
This is not AWS-specific.
You have to:
make sure every server has the same version of the app code
have all workers, across all servers, use the same task broker and result backend (see the sketch below)
make sure the workers can connect to your DB (if that's needed)
More detailed advice on the configuration would need additional info :)
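A minimal sketch of the shared configuration those points imply, assuming a Redis broker/result backend and a Postgres database (all endpoint names are hypothetical):

    # proj/settings.py -- the same settings (and the same app code) are deployed
    # to both the app-server and the worker EC2 instances.
    CELERY_BROKER_URL = "redis://broker.internal.example:6379/0"      # same broker everywhere
    CELERY_RESULT_BACKEND = "redis://broker.internal.example:6379/1"  # same result backend everywhere
    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "HOST": "db.internal.example",  # reachable from the worker instances too
            "NAME": "app",
            "USER": "app",
            "PASSWORD": "change-me",
        }
    }

    # proj/celery.py -- workers on the worker instances start with: celery -A proj worker
    import os
    from celery import Celery

    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "proj.settings")
    app = Celery("proj")
    app.config_from_object("django.conf:settings", namespace="CELERY")
    app.autodiscover_tasks()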
Another approach would be to use EC2 Container Service with two different Docker containers running: one for the app and one for the worker.