I am a beginner to ECS. My team runs a RESTful API service that processes asynchronous jobs on top of ECS with EC2 instances. The main API creates some assets n times and, depending on the size of n, a request can take up to many hours. During a deployment these processes are terminated abruptly, so we currently have to do the manual work of noting down which jobs were in progress and re-sending those requests after the deployment so they are not lost. I read the article "Graceful shutdowns with ECS", and it seems one option is to listen for the SIGTERM signal, make a note of the assets that remain to be created, and then let the hosts terminate. After the deployment, when the hosts start up, they can check whether any requests were pending before the deployment and process those first. Could someone please tell me if there is a better way to handle this? If not, could someone give an example, some references, or more details on how the above approach can be implemented?
Thanks
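In case it helps, here is a minimal sketch of the SIGTERM approach described above, assuming the remaining work can be checkpointed to a durable store (the checkpoint helpers and `create_asset` are hypothetical placeholders, and this is only one way to do it):

```python
import signal
import sys

shutting_down = False

def handle_sigterm(signum, frame):
    # ECS sends SIGTERM first and SIGKILL after the task's stop timeout
    # elapses, so just flip a flag and let the work loop checkpoint.
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

def save_checkpoint(job_id, remaining_assets):
    # Placeholder: persist the remaining assets somewhere durable
    # (DynamoDB, S3, or re-queue them onto SQS).
    ...

def load_checkpoint(job_id):
    # Placeholder: return assets left over from a previous deployment, if any.
    return None

def create_asset(asset):
    # Placeholder for the long-running unit of work.
    ...

def process_job(job_id, assets):
    # On startup, resume whatever a previous task left unfinished.
    remaining = load_checkpoint(job_id) or list(assets)
    for i, asset in enumerate(remaining):
        if shutting_down:
            # Persist what is left so the replacement task can pick it up.
            save_checkpoint(job_id, remaining[i:])
            sys.exit(0)
        create_asset(asset)
    save_checkpoint(job_id, [])  # job finished cleanly
```

If a single asset takes longer than the default 30-second stop timeout, you would probably also want to raise the container's stopTimeout in the task definition so the checkpoint has time to complete.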
I've read the docs about Services and in particular Worker Service. I am confused about when to use the Worker Service.
Say my worker consumes messages from SNS+SQS that arrive once every three days, but once they arrive it may take 15-20 minutes to process all of them (no single message takes more than 60 seconds to process). So my worker is idle most of the time. Does that mean I need to pay for my worker service even when it is idle? Lambdas seem more suitable in this case, but the Worker Service does not use Lambdas.
Note: Spots do not work since the jobs are NOT fault-tolerant.
Does it mean I need to pay for my worker service even when it is idle?
Yes. ECS charges are based on the amount of CPU/RAM you have reserved. You are reserving physical resources for your container, and you will be paying for them regardless of how much use you make of them. Besides which, your container won't be entirely idle; it will have to be constantly polling SQS.
If you are deploying to EC2 targets then this may not be a big deal, since you would simply be running another container on an EC2 instance you are already paying for, but if you are deploying to Fargate then it is definitely an added expense.
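To make that concrete, the worker's main loop would typically just be a long poll against SQS, along these lines (the queue URL and `handle` are placeholders):

```python
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder

def handle(body):
    ...  # placeholder: the actual per-message work

while True:
    # Long polling: each call blocks for up to 20 seconds waiting for messages,
    # but the container (and its reserved CPU/RAM) keeps running the whole time.
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,
    )
    for msg in resp.get("Messages", []):
        handle(msg["Body"])
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```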
Lambdas seem to be more suitable in this case but the worker service does not use lambdas.
Then I suggest deploying this portion of your application as a Lambda function. Just because you are using Copilot to deploy some containers to AWS doesn't mean you should limit yourself to only the services supported by Copilot.
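For reference, with an SQS event source mapping the Lambda version can be as small as this sketch (the `process` function is a placeholder); you then pay per invocation rather than for idle reserved capacity:

```python
def handler(event, context):
    # Lambda polls SQS for you via the event source mapping and invokes
    # this handler with a batch of records.
    for record in event["Records"]:
        process(record["body"])

def process(body):
    ...  # placeholder: the per-message work (well under the 15-minute Lambda limit)
```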
I've been experiencing this with my ECS service for a few months now. Previously, when we would update the service with a new task definition, it would perform the rolling update correctly, deregistering the old tasks from the target group and draining all HTTP connections to them before eventually stopping them. However, lately ECS is going straight to stopping the old tasks before draining connections or removing them from the target group. This results in 8-12 seconds of API downtime for us while new HTTP requests continue to be routed to the now-stopped tasks that are still in the target group. This happens whether we trigger the service update via the CLI or the console - same behaviour. Shown here is a screenshot with a sample sequence of Events from ECS demonstrating the issue, as well as the corresponding ECS agent logs for the same instance.
Of particular note when reviewing these ECS agent logs against the sequence of events is that the logs do not have an entry at 21:04:50 when the task was stopped. This feels like a clue to me, but I'm not sure where to go from here with it. Has anyone experienced something like this, or have any insights as to why the tasks wouldn't drain and be removed from the target group before being stopped?
For reference, the service is behind an AWS application load balancer. Happy to provide additional details if someone thinks of anything else that may be relevant.
It turns out that ECS changed the timing of when the events would be logged in the UI in the screenshot. In fact, the targets were actually being drained before being stopped. The "stopped n running task(s)" message is now logged at the beginning of the task shutdown lifecycle steps (before deregistration) instead of at the end (after deregistration) like it used to.
That said, we were still getting brief downtime spikes on our service at the load balancer level during deployments. This ultimately turned out to be caused by the high startup overhead of the new task versions: while they spun up, they briefly pegged the CPU of the instances in the cluster at 100%, and when there was also enough traffic during the deployment, some requests got dropped.
A good-enough-for-now solution was to adjust our minimum healthy deployment percentage up to 100% and set the maximum deployment percentage to 150% (as opposed to the old 200% setting), which forces the deployments to "slow down", only launching 50% of the intended new tasks at a time and waiting until they are stable before launching the rest. This spreads the high task startup overhead over two smaller CPU spikes rather than one large one and has so far successfully prevented any more downtime during deployments. We'll also be looking into reducing the startup overhead itself. Figured I'd update this in case it helps anyone else out there.
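In case anyone wants to apply the same change programmatically, the deployment percentages can be set on an existing service, for example with boto3 (the cluster and service names below are placeholders):

```python
import boto3

ecs = boto3.client("ecs")

# Never drop below the current desired count, and only launch 50% extra
# capacity at a time during a rolling deployment.
ecs.update_service(
    cluster="my-cluster",   # placeholder
    service="my-service",   # placeholder
    deploymentConfiguration={
        "minimumHealthyPercent": 100,
        "maximumPercent": 150,
    },
)
```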
I have an Elastic Beanstalk setup where I want to do two things:
Have all workers prioritize certain jobs (premium > free)
Have some workers only do specific jobs (enterprise worker does only enterprise jobs)
The workers use the SQS daemon that fetches from the queue and I'm not sure if and how to modify them.
How would you achieve this using Elastic Beanstalk?
The main EB advantage is that it is an out-of-the-box system you can set up in minutes. The disadvantage is that you have limited control over it.
What you describe could be achieved in the worker environment. I think you could disable the worker daemon and handle all the message processing yourself in your app, according to your own criteria.
You could also create multiple queues if you want, using the Resources setup options.
However, the further you deviate from its default behavior, the more management you will have to do yourself. At some point it may simply be easier to build your own environment for processing your messages outside of EB.
With SQS this is usually accomplished by having multiple queues. You could have one for Enterprise, one for Premium, and one for Free, and have your workers check them in that order (and, depending on your application, perhaps have some workers that only check Enterprise/Premium/Free; this will depend on how long your jobs take and what your users' expectations are).
I do not know exactly how to set this up in Elastic Beanstalk, but hopefully this is enough to get you started.
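A rough sketch of that priority-ordered polling, assuming one SQS queue per tier (the queue URLs and `handle` are placeholders); an enterprise-only worker would simply use a shorter queue list:

```python
import boto3

sqs = boto3.client("sqs")

# Checked in priority order: enterprise first, then premium, then free.
QUEUE_URLS = [
    "https://sqs.us-east-1.amazonaws.com/123456789012/enterprise-jobs",  # placeholder
    "https://sqs.us-east-1.amazonaws.com/123456789012/premium-jobs",     # placeholder
    "https://sqs.us-east-1.amazonaws.com/123456789012/free-jobs",        # placeholder
]

def handle(body):
    ...  # placeholder: the actual job

def poll_once():
    for url in QUEUE_URLS:
        resp = sqs.receive_message(QueueUrl=url, MaxNumberOfMessages=1, WaitTimeSeconds=1)
        for msg in resp.get("Messages", []):
            handle(msg["Body"])
            sqs.delete_message(QueueUrl=url, ReceiptHandle=msg["ReceiptHandle"])
            return True  # processed one message; start over from the highest-priority queue
    return False

while True:
    poll_once()
```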
I am building infrastructure primitives to support workers and http services.
workers are standalone
http services have a web server and a load balancer
The way I understand it, a worker generally pulls from an external resource to consume tasks, while a service handles inbound requests and talks to upstream services.
Celery is an obvious worker and a web app is an obvious service. The lines can get blurry though and I'm not sure what the best approach is:
Is the worker/service primitive a good idea?
What if there's a service that consumes tasks like a worker but also handles some http requests to add tasks? Is this a worker or a service?
What about services that don't go through nginx, does that mean a third "network" primitive with an NLB is the way to go?
What about instances of a stateful service that a master service connects to? The master has to know the individual agent instances, so we cannot hide them behind a LB. How would you go about representing that?
Is the worker/service primitive a good idea?
IMO, the main difference between a service and a worker is that a worker should be dedicated to a single kind of task, while a service can perform multiple tasks. A service can utilize a worker or a chain of workers to process a user request.
What if there's a service that consumes tasks like a worker but also handles some http requests to add tasks?
Services can take different forms, such as a web service, FTP service, SNMP service, or processing service. Writing the processing logic inside the service may not be a good idea unless the service is effectively taking the form of a worker.
What about services that don't go through nginx, does that mean a third "network" primitive with an NLB is the way to go?
I believe you are assuming a service to be HTTP-based only, but as I mentioned in the previous answer, services can be of different types. Yes, you could write a TCP service for a particular protocol implementation and attach it behind an NLB.
What about instances of a stateful service that a hub service connects to? The hub has to know the individual instances so we cannot hide them behind a LB.
Not sure what you mean by hub here. But good practice for a scalable architecture is to use stateless servers/services behind the load balancer. Session state should not be stored in the service's memory, but should be serialized to a data store like DynamoDB.
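For illustration, a small sketch of externalizing that state with boto3 (the table name and key schema are made up):

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("session-state")  # placeholder table, partition key "session_id"

def save_state(session_id, state):
    # Any instance behind the load balancer can write the state...
    table.put_item(Item={"session_id": session_id, **state})

def load_state(session_id):
    # ...and any other instance can read it back, so requests need not
    # stick to a particular server.
    return table.get_item(Key={"session_id": session_id}).get("Item")
```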
One way to see the difference is to look at their names: workers do what their name says, they perform (typically) heavy tasks (work), something you do not want your service to be bothered with, especially if it is a microservice. For this particular reason you will rarely see 32-core machines with hundreds of GBs of RAM running services, but you will very often see them running workers. Finally, they complement each other well: services off-load heavy tasks to the workers. This aligns with the well-known UNIX philosophy, "do one thing, and do it well".
I am trying to create a certain kind of networking infrastructure and have been looking at Amazon ECS and Kubernetes. However, I am not quite sure whether these systems do what I am actually seeking or whether I am contorting them into something else. If I describe my task at hand, could someone please verify whether Amazon ECS or Kubernetes will actually aid me in this effort, and whether this is the right way to think about it?
What I am trying to do is on-demand single-task processing on an AWS instance. What I mean by this is: I have a resource-heavy application that I want to run in the cloud to process a chunk of data submitted by a user. I want to submit this data for processing by the application, have an EC2 instance spin up, process the data, upload the results to S3, and then shut the EC2 instance down.
I have already put together a functioning solution for this using Simple Queue Service, EC2, and Lambda, but I am wondering whether ECS or Kubernetes would make this simpler. I have been going through the ECS documentation and it seems like it is not very concerned with starting up and shutting down instances. It seems like it wants an instance that is constantly running, with Docker images fed to it as tasks to run. Can Amazon ECS be configured so that if there are no tasks running, it automatically shuts down all instances?
Also, I am not understanding exactly how I would submit a specific chunk of data to be processed. It seems like "Tasks" as defined in Amazon ECS really correspond to a single Docker container, not so much to what data that Docker container will process. Is that correct? So would I still need to feed the data-to-be-processed to the instances via Simple Queue Service, or other? And then use Lambda to poll those queues to see if they should submit tasks to ECS?
This is my naive understanding of this right now, if anyone could help me understand the things I've described better, or point me to better ways of thinking about this it would be appreciated.
This is a complex subject and many details for a good answer depend on the exact requirements of your domain / system. So the following information is based on the very high level description you gave.
A lot of the features of ECS, Kubernetes, etc. are geared towards allowing a distributed application to act as a single service and be horizontally scalable, upgradeable, and maintainable. This means they help with unifying service interfaces, load balancing, service reliability, zero-downtime maintenance, scaling the number of worker nodes up/down based on demand (or other metrics), etc.
The following describes a high level idea for a solution for your use case with kubernetes (which is a bit more versatile than AWS ECS).
So for your use case you could set up a kubernetes cluster that runs a distributed event queue, for example an Apache Pulsar cluster, as well as an application cluster that is being sent queue events for processing. Your application cluster size could scale automatically with the number of unprocessed events in the queue (custom pod autoscaler). The cluster infrastructure would be configured to scale automatically based on the number of scheduled pods (pods reserve capacity on the infrastructure).
You would have to make sure your application can run in a stateless form in a container.
The main benefit I see over your current solution would be cloud provider independence as well as some general benefits from running a containerized system: 1. not having to worry about the exact setup of your EC2-Instances in terms of operating system dependencies of your workload. 2. being able to address the processing application as a single service. 3. Potentially increased reliability, for example in case of errors.
Regarding your exact questions:
Can Amazon ECS be configured so that if there are no tasks running, it automatically shuts down all instances?
The keyword here is autoscaling. Note that there are two levels of scaling: 1. infrastructure scaling (the number of EC2 instances) and 2. application service scaling (the number of application containers/tasks deployed). ECS infrastructure scaling works on top of EC2 Auto Scaling groups; for more info see this link. For application service scaling and serverless ECS (Fargate), see this link.
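As a rough illustration of the infrastructure side, a capacity provider with managed scaling can be attached to the cluster so the Auto Scaling group grows with scheduled tasks and can shrink back down (to zero, if the group's minimum size allows it) when nothing is running. The names and ARN below are placeholders:

```python
import boto3

ecs = boto3.client("ecs")

# Let ECS manage the Auto Scaling group size based on the tasks it schedules.
ecs.create_capacity_provider(
    name="batch-capacity-provider",  # placeholder
    autoScalingGroupProvider={
        "autoScalingGroupArn": "arn:aws:autoscaling:us-east-1:123456789012:autoScalingGroup:...:autoScalingGroupName/batch-asg",  # placeholder
        "managedScaling": {"status": "ENABLED", "targetCapacity": 100},
        "managedTerminationProtection": "DISABLED",
    },
)

# Make it the default capacity provider for the cluster.
ecs.put_cluster_capacity_providers(
    cluster="processing-cluster",  # placeholder
    capacityProviders=["batch-capacity-provider"],
    defaultCapacityProviderStrategy=[
        {"capacityProvider": "batch-capacity-provider", "weight": 1}
    ],
)
```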
Also, I am not understanding exactly how I would submit a specific chunk of data to be processed. It seems like "Tasks" as defined in Amazon ECS really correspond to a single Docker container, not so much to what data that Docker container will process. Is that correct?
A "Task Definition" in ECS describes how one or more Docker containers should be deployed for a given purpose and what their environment/limits should be. A task is a single running instance of a task definition, and a "Service" can deploy and maintain one or multiple such tasks. The similar concepts in Kubernetes are Pod and Service/Deployment.
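To connect this to the question of submitting a specific chunk of data: one common pattern is to start a one-off task and pass a pointer to the data (e.g. an S3 key) as an environment variable override, roughly like the sketch below (the cluster, task definition, container, and variable names are all placeholders):

```python
import boto3

ecs = boto3.client("ecs")

def submit_chunk(s3_key):
    # Runs one instance of the task definition; the container reads
    # INPUT_S3_KEY, processes that chunk, uploads results to S3, and exits.
    ecs.run_task(
        cluster="processing-cluster",      # placeholder
        taskDefinition="chunk-processor",  # placeholder family name
        overrides={
            "containerOverrides": [
                {
                    "name": "processor",   # container name in the task definition
                    "environment": [{"name": "INPUT_S3_KEY", "value": s3_key}],
                }
            ]
        },
    )
```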
So would I still need to feed the data-to-be-processed to the instances via Simple Queue Service, or other? And then use Lambda to poll those queues to see if they should submit tasks to ECS?
A queue is always helpful for decoupling the service requests from the processing and for making sure you don't lose requests. It is not strictly required if your application service cluster can offer a service interface and process incoming requests directly in a reliable fashion, but if your application cluster has to scale up/down frequently, that may impact its ability to process reliably.