How does the Fargate lifecycle handle multiple requests? - amazon-web-services

We have a Lambda that will fire requests to another system, and I am thinking about using Fargate for this system. What I want to know is whether Fargate will spin up with every request sent to it (like a Lambda), or whether it will spin up once and stay alive to handle subsequent requests from the Lambda.
Each Lambda invocation will only fire one request to the Fargate system.
There will be many requests, but the system will be dormant at night. How does Fargate handle spinning up and down between requests?

I found an extended answer in the article Concurrency Compared: AWS Lambda, AWS App Runner, and AWS Fargate:
AWS Fargate is similar to AWS App Runner in that each container can serve many concurrent requests.
This means that if your load balancer receives a large spike of traffic then the requests will be distributed across all the available containers.
However, it is up to you to use the metrics to define your own scaling rules. You can create scaling rules based on metrics that ECS captures, such as application CPU or memory consumption. Or you can create scaling rules based on metrics from the load balancer, such as concurrent requests or request latency. You can even create custom scaling metrics powered by your application itself. This gives you maximum control over the scaling and concurrency of your application.
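For concreteness, here is a minimal sketch of such a scaling rule using boto3 and the Application Auto Scaling API; the cluster name, service name, capacities, and target value are placeholders, not recommendations:

    # Sketch: target-tracking auto scaling for an ECS (Fargate) service via boto3.
    # "my-cluster"/"my-service" and the numbers below are illustrative placeholders.
    import boto3

    autoscaling = boto3.client("application-autoscaling")

    # Register the service's desired task count as a scalable target.
    autoscaling.register_scalable_target(
        ServiceNamespace="ecs",
        ResourceId="service/my-cluster/my-service",
        ScalableDimension="ecs:service:DesiredCount",
        MinCapacity=1,
        MaxCapacity=10,
    )

    # Scale on average CPU utilization across the service's tasks.
    autoscaling.put_scaling_policy(
        PolicyName="cpu-target-tracking",
        ServiceNamespace="ecs",
        ResourceId="service/my-cluster/my-service",
        ScalableDimension="ecs:service:DesiredCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": 70.0,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
            "ScaleOutCooldown": 60,
            "ScaleInCooldown": 120,
        },
    )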

Related

How to define rate limiting for an ECS Fargate service?

I am developing a service using ECS on Fargate. This service calls AWS Textract to process documents. The service listens to an SQS queue for processing requests.
As per the Textract documentation: https://docs.aws.amazon.com/general/latest/gr/textract.html, there is a per-account TPS limit at which the Textract APIs can be called.
Since there are "no visible instances" in Fargate and the whole architecture, from launching instances to auto scaling them, is managed by Fargate, I am confused about how I should define a rate limiter when calling the AWS Textract APIs.
For instance, suppose 1,000 message requests arrive in the SQS queue; the service might (internally) spawn multiple instances to process them, and the TPS at which Textract is called could easily exceed the limits.
Is there any way to define a rate limiter for such a scenario so that requests are blocked if the limits are breached?
Fargate doesn't handle autoscaling for you at all. Your description of how Fargate works sounds more like Lambda than Fargate. ECS handles the autoscaling, not Fargate. Fargate just runs the containers ECS tells it to run. In ECS you would have complete control over the autoscaling settings, such as the maximum number of Fargate tasks that you want to run in your ECS service.
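One way to combine that control with the Textract TPS limit is to cap each task's call rate with a simple token bucket, so total throughput stays under (maximum tasks) times (per-task rate). The sketch below assumes each task consumes SQS messages in a single loop; the class, function names, and 2 TPS figure are illustrative, not part of any AWS API:

    # Sketch: per-task token-bucket rate limiter around Textract calls.
    import time
    import threading
    import boto3

    class TokenBucket:
        """Thread-safe token bucket: allows at most `rate_per_second` calls per second."""
        def __init__(self, rate_per_second, burst=1):
            self.rate = rate_per_second
            self.capacity = burst
            self.tokens = float(burst)
            self.updated = time.monotonic()
            self.lock = threading.Lock()

        def acquire(self):
            # Block until a token is available, refilling based on elapsed time.
            while True:
                with self.lock:
                    now = time.monotonic()
                    self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
                    self.updated = now
                    if self.tokens >= 1:
                        self.tokens -= 1
                        return
                time.sleep(0.05)

    textract = boto3.client("textract")
    limiter = TokenBucket(rate_per_second=2)  # illustrative per-task cap of 2 TPS

    def process_document(s3_bucket, key):
        limiter.acquire()
        return textract.detect_document_text(
            Document={"S3Object": {"Bucket": s3_bucket, "Name": key}}
        )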

Is AWS Fargate truly serverless like Lambda? Does it automatically shut down when it finishes the task?

This really wasn't clear to me in the docs, and the console configuration is very confusing.
Will a Docker cluster running in Fargate mode behind a load balancer shut down and not charge me while it's not being used?
What about cold starts? Do I need to care about this in Fargate like in Lambda?
Is it less horizontal than Lambda? A Lambda hooked to API Gateway will spawn a new function instance for every concurrent request; will Fargate do this too, or will the load balancer decide?
I've been running Flask/Django applications in Lambda for some time (Using Serverless/Zappa), are there any benefits in migrating them to Fargate?
It seems to be more expensive than Lambda, but if the Lambda limitations are not a problem then Lambda should always be the better choice, right?
Will a Docker cluster running in Fargate mode behind a load balancer shut down and not charge me while it's not being used?
This will depend on how you configure your AutoScaling Group. If you allow it to scale down to 0 then yes.
What about cold starts? Do I need to care about this in Fargate like in Lambda?
Some good research has been done on this here: https://blog.cribl.io/2018/05/29/analyzing-aws-fargate/
But the takeaway is that for smaller instances you shouldn't notice much more than ~40 seconds to get to a running state. For bigger ones this will take longer.
Is it less horizontal than Lambda? A Lambda hooked to API Gateway will spawn a new function instance for every concurrent request; will Fargate do this too, or will the load balancer decide?
ECS will not create a new instance for every concurrent request; any scaling will be done via the Auto Scaling group. The load balancer doesn't have any control over scaling, it exclusively balances load. However, the metrics it provides can be used to help determine whether scaling is needed.
I've been running Flask/Django applications in Lambda for some time (Using Serverless/Zappa), are there any benefits in migrating them to Fargate?
I haven't used Flask or Django, but the main reason people tend to migrate over to serverless is to remove the need to maintain the scaling of servers; this includes managing instance types, cluster scheduling, and optimizing cluster utilization.
#abdullahkhawer I agree with his view on sticking to Lambdas. Unless you require something to always be running and in use 99% of the time, Lambdas will be cheaper than running a VM.
For a pricing example:
1 t2.medium on-demand EC2 instance = ~$36/month
2 million invocations of a 256MB, 3-second Lambda = $0.42/month
With AWS Fargate, you pay only for the amount of vCPU and memory resources that your containerized application requests, from the time your container images are pulled until the AWS ECS Task (running in Fargate mode) terminates. A minimum charge of 1 minute applies. So, you pay for as long as your Task (a group of containers) is running, more like AWS EC2 but on a per-minute basis, and unlike AWS Lambda where you pay per request/invocation.
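As a rough, back-of-the-envelope illustration of that pricing model (the per-hour rates below are assumed placeholders; actual Fargate rates vary by region and change over time):

    # Illustrative Fargate cost estimate: you pay for requested vCPU/memory
    # only while the task runs. Rates below are placeholders, not real prices.
    VCPU_PER_HOUR = 0.04   # assumed example rate, USD per vCPU-hour
    GB_PER_HOUR = 0.004    # assumed example rate, USD per GB-hour

    task_vcpu = 0.5
    task_memory_gb = 1.0
    hours_running_per_month = 8 * 30  # e.g. the task only runs 8 hours a day

    monthly_cost = hours_running_per_month * (
        task_vcpu * VCPU_PER_HOUR + task_memory_gb * GB_PER_HOUR
    )
    print(f"~${monthly_cost:.2f}/month")  # ~$5.76 with these example numbers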
AWS Fargate doesn't spawn containers on every request as in AWS Lambda. AWS Fargate works by simply running containers on a fleet of AWS EC2 instances internally managed by AWS.
AWS Fargate now supports the ability to run tasks on a scheduled basis and in response to AWS CloudWatch Events. This makes it easier to launch and stop container services that you need to run only at a certain time to save money.
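A minimal sketch of such a scheduled Fargate task, assuming boto3 and using placeholder names, ARNs, and subnet IDs:

    # Sketch: run a Fargate task on a schedule via a CloudWatch Events (EventBridge) rule.
    import boto3

    events = boto3.client("events")

    events.put_rule(
        Name="nightly-batch",
        ScheduleExpression="cron(0 2 * * ? *)",  # 02:00 UTC every day
        State="ENABLED",
    )

    events.put_targets(
        Rule="nightly-batch",
        Targets=[{
            "Id": "run-fargate-task",
            "Arn": "arn:aws:ecs:us-east-1:123456789012:cluster/my-cluster",
            "RoleArn": "arn:aws:iam::123456789012:role/ecsEventsRole",
            "EcsParameters": {
                "TaskDefinitionArn": "arn:aws:ecs:us-east-1:123456789012:task-definition/my-task:1",
                "TaskCount": 1,
                "LaunchType": "FARGATE",
                "NetworkConfiguration": {
                    "awsvpcConfiguration": {
                        "Subnets": ["subnet-0123456789abcdef0"],
                        "AssignPublicIp": "ENABLED",
                    }
                },
            },
        }],
    )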
Keeping in mind your use case, if your applications are not causing any problems in the production environment due to AWS Lambda limitations, then AWS Lambda is the better choice. If your AWS Lambda is invoked too much (e.g., more than 1K concurrent invocations at any point in time) in the production environment, then go for AWS EKS or AWS Fargate, as AWS Lambda might cost you more.

EC2 to Lambda forwarding based on current usage

I am having long cold start times on our Lambda functions. We have tried "pinging" the Lambdas to keep them warm but that can get costly and seems like a poor way to keep performance up. We also have an EC2 instance running 24/7. I could theoretically "mirror" all of our Lambda functions to our EC2 instance to respond with the same data for our API call. Our Lambdas are on https://api.mysite.com and our EC2 is https://dev.mysite.com.
My question is, could we "load balance" traffic between the two? (Create a new subdomain to do the following.) Have our dev subdomain (EC2) respond to all requests until a certain "requests per minute" threshold is hit, then start to route traffic to our Lambdas (the api subdomain) since we have enough traffic coming in to keep the Lambdas hot. Once traffic slows down, we move the traffic back over to our EC2. Is this possible?
No, you cannot do that with an AWS load balancer. What you can do is set up a CloudWatch trigger for a Lambda function that will remap api.mysite.com to the load balancer DNS when traffic is high. Similarly, you can add a trigger for low traffic that will redo the DNS entry; a sketch of this idea follows below.
If you are aware of rises in traffic in advance, you can schedule instances. Otherwise you can also try AWS Fargate.
Hope this helps you.
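A rough sketch of that DNS-switch idea, assuming a Route 53 hosted zone and placeholder record names, IDs, and endpoints; the CloudWatch alarm wiring and thresholds are left out:

    # Sketch: a Lambda (triggered by a CloudWatch alarm on request volume) repoints
    # a subdomain between the EC2 endpoint and the Lambda/API endpoint.
    import boto3

    route53 = boto3.client("route53")

    def handler(event, context):
        # Decide the target based on the alarm state passed in the event.
        alarm_state = event.get("detail", {}).get("state", {}).get("value", "OK")
        target = "lambda-api.example.com" if alarm_state == "ALARM" else "ec2.example.com"

        route53.change_resource_record_sets(
            HostedZoneId="Z0000000EXAMPLE",
            ChangeBatch={
                "Changes": [{
                    "Action": "UPSERT",
                    "ResourceRecordSet": {
                        "Name": "traffic.mysite.com",
                        "Type": "CNAME",
                        "TTL": 60,
                        "ResourceRecords": [{"Value": target}],
                    },
                }]
            },
        )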
CloudWatch allows you to choose the number of triggers you want to set for a Lambda, i.e., whenever the API is called, CloudWatch will trigger that many Lambdas. This is one way you could achieve it. There's no way yet to load balance between Lambdas and instances.
You can configure your API to use Step Functions, which will orchestrate your Lambdas.

Connecting Cassandra from AWS Lambda

We are checking the feasibility of migrating one of our applications to Amazon Web Services (AWS). We decided to use AWS API Gateway to expose the services and AWS Lambda (Java) for back-end data processing. The Lambda function has to fetch a large amount of data from our database.
We are currently using Cassandra for data storage, which has been set up within an EC2 instance and has no public IP.
Can anyone suggest a way to access Cassandra (EC2) from AWS Lambda using the private IP (10.0.x.x)?
Is it a right choice to use AWS Lambda for large scale applications?
Since your Cassandra instance is using a private IP, you will need to configure your AWS Lambda networking to use a VPC. It could be the VPC you are running Cassandra in, or a VPC you create for your Lambdas and VPC-peer to your Cassandra VPC. A few things to note from the documentation (a configuration sketch follows these notes):
When your Lambda runs in a VPC, it doesn't have internet access by default; you will need to configure a NAT for that.
There is additional latency due to the configuration of the ENI (you only pay that penalty on cold start).
You need to make sure your Lambda has the right permissions to manage the ENI; you should use the AWSLambdaVPCAccessExecutionRole managed policy.
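A minimal sketch of attaching an existing function to that VPC with boto3; the function name, subnet IDs, and security group IDs are placeholders:

    # Sketch: put the Lambda into the VPC that can reach Cassandra's private IP.
    # The execution role needs the AWSLambdaVPCAccessExecutionRole managed policy.
    import boto3

    lambda_client = boto3.client("lambda")

    lambda_client.update_function_configuration(
        FunctionName="my-cassandra-reader",
        VpcConfig={
            "SubnetIds": ["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"],
            "SecurityGroupIds": ["sg-0123456789abcdef0"],
        },
    )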
Your plan to use API Gateway / AWS Lambda has at least 3 potential issues which you need to consider carefully:
Cost. API Gateway's per-request cost is higher than AWS Lambda's per-request cost. Make sure you are familiar with the pricing.
Cold start. When AWS starts an underlying container to execute your Lambda, you pay a cold start latency (which gets worse when using a VPC due to the management of the ENI). If you execute your Lambda concurrently, there will be multiple underlying containers, and each of them will have this cold start the first time. AWS tends to keep the underlying containers ready for a warm start for a few minutes (users report 5 to 40 minutes). You might try to keep your containers warm by pinging your AWS Lambda; obviously, if you have multiple containers in parallel, it gets tricky.
Cassandra session. You will probably want to avoid creating and destroying your Cassandra session each time you invoke your Lambda (costly). I haven't tried it yet, but there are reports of keeping the session alive in a warm container; you might want to check this SO answer. A sketch of the pattern follows below.
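A minimal sketch of that session-reuse pattern, shown in Python with the DataStax driver (the question uses Java, where the same module-scope trick applies with the Java driver); the contact point, keyspace, and table are placeholders:

    # Sketch: create the Cassandra cluster/session at module scope so it survives
    # between invocations in the same warm container.
    from cassandra.cluster import Cluster

    # Created once per container (cold start), reused on warm invocations.
    cluster = Cluster(["10.0.1.23"])          # private IP of the Cassandra EC2 instance
    session = cluster.connect("my_keyspace")

    def handler(event, context):
        rows = session.execute("SELECT id, value FROM my_table LIMIT 10")
        return [dict(row._asdict()) for row in rows]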
Having said all that, currently the biggest limitations in using AWS Lambda are concurrent execution and cold start latency. For data processing, that's usually fine. For user-facing usage, the percentage of slow cold starts might affect your user experience.

Automatic Scale down on AWS of Spring Boot Application

I want to deploy a Spring Boot application on Amazon AWS Elastic Beanstalk. The application is a back-end service which exposes a REST API. My concern is what happens when AWS has automatic scaling enabled. Let me try with an example:
An instance starts with the application.
I call the REST API, which activates a process (it can take 20 minutes to complete).
AWS scales out, creating a new instance so that new requests can be processed by this new instance.
After a while, AWS decides to reduce the number of instances (scale in) because the memory (or CPU, or network out, or...) usage is lower than the lower limit.
Does AWS check whether the application is still working? I'd like to avoid it killing an instance while the application is working (loss of data, job interruption, ...).
You should investigate AWS Auto Scaling Lifecycle Hooks, which will allow your application to be notified (via CloudWatch, SNS, or SQS) when AWS wants to terminate an instance and to take the appropriate action, including delaying the termination.
http://docs.aws.amazon.com/autoscaling/latest/userguide/lifecycle-hooks.html
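A minimal sketch of such a hook with boto3; the group name, hook name, ARNs, and timeout are placeholders, and your application decides when to call complete_lifecycle_action:

    # Sketch: pause instance termination with a lifecycle hook until the in-flight
    # job finishes, then tell Auto Scaling to proceed.
    import boto3

    autoscaling = boto3.client("autoscaling")

    # One-time setup: hold terminating instances until the hook completes (or times out).
    autoscaling.put_lifecycle_hook(
        LifecycleHookName="drain-before-terminate",
        AutoScalingGroupName="my-asg",
        LifecycleTransition="autoscaling:EC2_INSTANCE_TERMINATING",
        HeartbeatTimeout=1800,  # up to 30 minutes to finish the running job
        DefaultResult="CONTINUE",
        NotificationTargetARN="arn:aws:sns:us-east-1:123456789012:scale-in-events",
        RoleARN="arn:aws:iam::123456789012:role/asg-lifecycle-role",
    )

    # In the application, after the long-running request completes:
    def release_instance(instance_id):
        autoscaling.complete_lifecycle_action(
            LifecycleHookName="drain-before-terminate",
            AutoScalingGroupName="my-asg",
            LifecycleActionResult="CONTINUE",
            InstanceId=instance_id,
        )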