How to define rate limiting for an ECS Fargate service?

I am developing a service using ECS on Fargate. This service calls AWS Textract to process documents and listens to an SQS queue for processing requests.
As per the Textract documentation (https://docs.aws.amazon.com/general/latest/gr/textract.html), there is a per-account TPS limit at which the Textract APIs can be called.
Since there are "no visible instances" in Fargate and everything from launching instances to auto scaling them is managed by Fargate, I am confused about how I should define a rate limiter when calling the Textract APIs.
For instance, suppose 1000 message requests arrive in the SQS queue; the service might (internally) spawn multiple instances to process them, and the TPS at which Textract is called could easily exceed the limits.
Is there any way to define a rate limiter for such a scenario so that it blocks requests if the limits are breached?

Fargate doesn't handle autoscaling for you at all. Your description of how Fargate works sounds more like Lambda than Fargate. ECS handles the autoscaling, not Fargate. Fargate just runs the containers ECS tells it to run. In ECS you would have complete control over the autoscaling settings, such as the maximum number of Fargate tasks that you want to run in your ECS service.
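As a concrete illustration of that control, here is a minimal sketch (the cluster and service names are placeholders) that registers the ECS service with Application Auto Scaling and sets a hard MaxCapacity, so ECS can never run more worker tasks than you allow:

```python
import boto3

# Hypothetical names; substitute your own cluster and service.
CLUSTER = "document-processing-cluster"
SERVICE = "textract-worker-service"

autoscaling = boto3.client("application-autoscaling")

# Register the ECS service as a scalable target with a hard upper bound.
# ECS will never run more than MaxCapacity tasks for this service, which
# caps the total rate at which the workers can call Textract.
autoscaling.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId=f"service/{CLUSTER}/{SERVICE}",
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=1,
    MaxCapacity=5,  # chosen so MaxCapacity x per-task TPS stays under the Textract limit
)
```

Each task should still throttle its own Textract calls (for example with a simple token bucket, plus exponential backoff on throttling errors) so that MaxCapacity × per-task TPS stays below the account limit.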

Related

How can I understand `Nodes` in EKS Fargate?

I deployed an EKS cluster and a Fargate profile, then deployed a few applications to this cluster. I can see that Fargate instances are launched.
When I click each of these instances, it shows me some information like OS, image, etc., but it doesn't tell me the CPU and memory. When I look at the Fargate pricing (https://aws.amazon.com/fargate/pricing/), it is calculated based on CPU and memory.
I have used ECS, where it is very clear that I need to provision CPU/memory at the service/task level, but I can't find anything like that in EKS.
How do I know how many resources they are consuming?
With Fargate you don't have to provision, configure, or scale virtual machines to run your containers; the containers themselves become the fundamental compute primitive.
This model is called serverless: you are charged only for the compute and storage resources needed to execute your code. It does not mean that there are no servers involved; it's just that you don't need to care about them.
To monitor those resources you can use CloudWatch. The documents below describe how this can be achieved:
How do I troubleshoot high CPU utilization on an Amazon ECS task on Fargate?
How can I monitor high memory utilization for Amazon ECS tasks on Fargate?
It is worth mentioning that Fargate is just a launch type for ECS (the other one is EC2). Please have a look at the diagram in this document for a clear picture of how they are connected. The CloudWatch metrics are collected automatically for Fargate. If you are using EKS with Fargate, you can monitor the pods with the metrics add-on or with Prometheus inside your Kubernetes cluster.
Here's an example of monitoring Fargate with Prometheus. Notice that it scrapes the metrics from CloudWatch.
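To make the CloudWatch route concrete, here is a minimal sketch, assuming Container Insights is enabled on the cluster (the cluster and service names are placeholders; the ECS/ContainerInsights namespace applies to ECS, while EKS publishes under ContainerInsights), that pulls the average CPU used by Fargate tasks over the last hour:

```python
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

# Hypothetical cluster/service names; Container Insights must be enabled
# for these metrics to be published.
response = cloudwatch.get_metric_statistics(
    Namespace="ECS/ContainerInsights",
    MetricName="CpuUtilized",
    Dimensions=[
        {"Name": "ClusterName", "Value": "my-fargate-cluster"},
        {"Name": "ServiceName", "Value": "my-service"},
    ],
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,            # 5-minute datapoints
    Statistics=["Average"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])
```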

How does the Fargate lifecycle handle multiple requests?

We have a Lambda that will fire requests to another system, and I am thinking about using Fargate for this system. What I want to know is whether Fargate will spin up with every request sent to it (like a Lambda), or whether it will spin up once and stay alive to handle subsequent requests from the Lambda.
Each Lambda invocation will only fire one request to the Fargate system.
There will be many requests, but the system will be dormant at night. How does Fargate handle spinning up and down between requests?
I've found an extended answer in the article Concurrency Compared: AWS Lambda, AWS App Runner, and AWS Fargate:
AWS Fargate is similar to AWS App Runner in that each container can serve many concurrent requests.
This means that if your load balancer receives a large spike of traffic then the requests will be distributed across all the available containers.
However, it is up to you to use the metrics to define your own scaling rules. You can create scaling rules based on metrics that ECS captures, such as application CPU or memory consumption. Or you can create scaling rules based on metrics from the load balancer, such as concurrent requests or request latency. You can even create custom scaling metrics powered by your application itself. This gives you maximum control over the scaling and concurrency of your application.
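As one illustration of such a scaling rule, here is a minimal sketch of a target-tracking policy on an ECS service (the names are placeholders and the service is assumed to already be registered as a scalable target); the same call works with the ALB request-count-per-target predefined metric instead of CPU:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Target tracking: ECS adds or removes tasks to keep average CPU near 50%.
autoscaling.put_scaling_policy(
    PolicyName="cpu-target-tracking",             # hypothetical policy name
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",   # placeholder cluster/service
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 50.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleInCooldown": 60,
        "ScaleOutCooldown": 60,
    },
)
```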

How to shut down EC2 instances backing ECS to save cost for staging/QA

We have hosted a Docker container on AWS ECS with EC2 instances and would like to terminate/shut down these EC2 instances at night and on weekends for Staging/QA to save cost.
Thanks in advance :)
The AWS Instance Scheduler is a simple AWS-provided solution that enables customers to easily configure custom start and stop schedules for their Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Relational Database Service (Amazon RDS) instances. The solution is easy to deploy and can help reduce operational costs for both development and production environments.
https://aws.amazon.com/solutions/implementations/instance-scheduler/
If you run the instances in an Auto Scaling group (ASG), you could use a scheduled policy to set the desired capacity of the ASG to zero for the off-peak times. A second policy would bring it back up for working hours.
An alternative would be to set up a CloudWatch Events scheduled rule (cron) with a Lambda function as the target. The function would do the same as the scaling policy, but because it is a Lambda function you could also do other things there, for example some pre-shutdown checks or post-shutdown cleanup.
This will work because, if your tasks run in a service, ECS will automatically relaunch your tasks when the instances are back.
You could also manage the number of tasks using the scheduling capability of Amazon ECS.
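For the ASG route, a minimal sketch (the group name and the UTC cron times are assumptions) using scheduled actions to drop capacity to zero on weekday evenings and bring it back in the morning could look like this:

```python
import boto3

autoscaling = boto3.client("autoscaling")
ASG_NAME = "staging-ecs-asg"  # hypothetical Auto Scaling group name

# Scale to zero every weekday evening (recurrence is a UTC cron expression).
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName=ASG_NAME,
    ScheduledActionName="stop-staging-nightly",
    Recurrence="0 19 * * 1-5",
    MinSize=0,
    DesiredCapacity=0,
)

# Bring the instances back before the workday starts; ECS will then
# relaunch the service's tasks on the new instances.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName=ASG_NAME,
    ScheduledActionName="start-staging-morning",
    Recurrence="0 7 * * 1-5",
    MinSize=1,
    DesiredCapacity=2,
)
```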

Is AWS Fargate true serverless like Lambda? Does it automatically shut down when it finishes the task?

This really wasn't clear to me in the docs, and the console configuration is very confusing.
Will a Docker Cluster running in Fargate mode behind a Load Balancer shutdown and not charge me while it's not being used?
What about cold starts? Do I need to care about this in Fargate like in Lambda?
Is it less horizontal than Lambda? A lambda hooked to API Gateway will spawn a new function for every concurrent request, will Fargate do this too? Or will the load balancer decide it?
I've been running Flask/Django applications in Lambda for some time (Using Serverless/Zappa), are there any benefits in migrating them to Fargate?
It seems to be more expensive than Lambda, but if the Lambda limitations are not a problem then Lambda should always be the better choice, right?
Will a Docker Cluster running in Fargate mode behind a Load Balancer shutdown and not charge me while it's not being used?
This will depend on how you configure your AutoScaling Group. If you allow it to scale down to 0 then yes.
What about cold starts? Do I need to care about this in Fargate like in Lambda?
Some good research has been done on this here: https://blog.cribl.io/2018/05/29/analyzing-aws-fargate/
But the takeaway is that for smaller task sizes you shouldn't see much more than ~40 seconds to get to a running state. For bigger ones this will take longer.
Is it less horizontal than Lambda? A lambda hooked to API Gateway will spawn a new function for every concurrent request, will Fargate do this too? Or will the load balancer decide it?
ECS will not create a new instance for every concurrent request; any scaling will be done by the Auto Scaling group. The load balancer doesn't have any control over scaling, it exclusively balances load. However, the metrics it provides can be used to help determine whether scaling is needed.
I've been running Flask/Django applications in Lambda for some time (Using Serverless/Zappa), are there any benefits in migrating them to Fargate?
I haven't used Flask or Django, but the main reason people tend to migrate to serverless is to remove the need to maintain the scaling of servers; this includes managing instance types, cluster scheduling, and optimizing cluster utilization.
#abdullahkhawer I agree with his view on sticking to Lambdas. Unless you require something to always be running and it is being used 99% of the time, Lambdas will be cheaper than running a VM.
For a pricing example:
1 t2.medium on-demand EC2 instance = ~$36/month
2 million invocations of a 256 MB, 3-second-running Lambda = $0.42/month
With AWS Fargate, you pay only for the amount of vCPU and memory resources that your containerized application requests, from the time your container images are pulled until the AWS ECS task (running in Fargate mode) terminates. A minimum charge of 1 minute applies. So, you pay for as long as your task (a group of containers) is running, more like AWS EC2 but on a per-minute basis, and unlike AWS Lambda where you pay per request/invocation.
AWS Fargate doesn't spawn containers on every request as in AWS Lambda. AWS Fargate works by simply running containers on a fleet of AWS EC2 instances internally managed by AWS.
AWS Fargate now supports the ability to run tasks on a scheduled basis and in response to AWS CloudWatch Events. This makes it easier to launch and stop container services that you need to run only at a certain time to save money.
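As a sketch of that scheduled capability (all ARNs, names, and the subnet below are hypothetical), a CloudWatch Events / EventBridge rule can launch a Fargate task on a cron schedule; billing stops once the task exits:

```python
import boto3

events = boto3.client("events")

# Hypothetical identifiers; substitute your own cluster, task definition, role and subnet.
RULE_NAME = "nightly-report-task"
CLUSTER_ARN = "arn:aws:ecs:us-east-1:123456789012:cluster/my-cluster"
TASK_DEF_ARN = "arn:aws:ecs:us-east-1:123456789012:task-definition/report:1"
ROLE_ARN = "arn:aws:iam::123456789012:role/ecsEventsRole"

# Run once per day at 02:00 UTC.
events.put_rule(Name=RULE_NAME, ScheduleExpression="cron(0 2 * * ? *)")

# Target: launch one Fargate task in the cluster when the rule fires.
events.put_targets(
    Rule=RULE_NAME,
    Targets=[{
        "Id": "run-report-task",
        "Arn": CLUSTER_ARN,
        "RoleArn": ROLE_ARN,
        "EcsParameters": {
            "TaskDefinitionArn": TASK_DEF_ARN,
            "TaskCount": 1,
            "LaunchType": "FARGATE",
            "NetworkConfiguration": {
                "awsvpcConfiguration": {
                    "Subnets": ["subnet-0123456789abcdef0"],
                    "AssignPublicIp": "ENABLED",
                }
            },
        },
    }],
)
```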
Keeping in mind your use case, if your applications are not causing any problems in the production environment due to AWS Lambda limitations, then AWS Lambda is the better choice. If AWS Lambda is being invoked too heavily (e.g., more than 1K concurrent invocations at any point in time) in the production environment, then go for AWS EKS or AWS Fargate, as AWS Lambda might cost you more.

Correct way to scale AWS ECS

I'm currently architecting AWS ECS infrastructure.
To scale in/out automatically, I used auto scaling.
My system is running on AWS ECS (deployed with docker-compose).
Assume that we have 1 cluster and 1 service with 2 EC2 instances.
I defined a scaling policy via CloudWatch that triggers when CPU utilization goes above 50%.
For auto scaling, we have to apply our policy to both the ECS service and the Auto Scaling group.
When the CloudWatch policy is attached to the ECS service, it automatically increases the number of running tasks when CPU utilization goes above 50%.
When the CloudWatch policy is attached to the Auto Scaling group, it automatically increases the EC2 instance count when CPU utilization goes above 50%.
After testing it, everything works fine.
But in my service event logs, errors appear like this:
service v1 was unable to place a task because no container instance met all of its requirements. The closest matching container-instance 8bdf994d-9f73-42ec-8299-04b0c5e7fdd3 has insufficient memory available.
I think it occurred because service scaling starts before EC2 instance scaling (service scaling, i.e. scaling the task count in/out, needs an EC2 instance to run the new tasks on).
But it works fine in the end, maybe because it retries automatically several times (I'm not sure).
I wonder: is this a normal configuration for AWS ECS auto scaling?
Or is there a missing point in my flow?
Thanks.
ECS can only place a service's tasks if a container instance is available that matches the containers' CPU/memory requirements. Ensure you have this headroom available to guarantee smooth auto scaling.
The EC2 ASG scaling should happen before the service auto scaling to ensure a container instance is available for the task scheduler.
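One way to verify that this headroom exists, sketched below with a placeholder cluster name, is to inspect the remaining CPU and memory that ECS reports for each registered container instance:

```python
import boto3

ecs = boto3.client("ecs")
CLUSTER = "my-cluster"  # hypothetical cluster name

# List the container instances registered in the cluster...
instance_arns = ecs.list_container_instances(cluster=CLUSTER)["containerInstanceArns"]

# ...and print the CPU/memory each one still has free for task placement.
details = ecs.describe_container_instances(cluster=CLUSTER, containerInstances=instance_arns)
for instance in details["containerInstances"]:
    remaining = {r["name"]: r.get("integerValue") for r in instance["remainingResources"]}
    print(instance["ec2InstanceId"], "CPU:", remaining.get("CPU"), "MEMORY:", remaining.get("MEMORY"))
```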