Hi, I am new to Fargate and confused about how its cost is calculated.
How is the 'Average duration' calculated and charged? Is it charged only for the time between a request arriving and the response being returned, or are pods running continuously and charged 24*7*365?
Also, does Fargate fetch the image from ECR every time a request arrives?
Does Fargate cost anything when there are no requests and nothing is processing?
What is the correct way to fill in the Average duration section?
This can make a huge difference in cost.
You can learn more from the AWS Fargate Pricing page and from the AWS Pricing Calculator. The pricing page explains how duration is measured and includes three worked examples.
How is the 'Average duration' calculated and charged? Is it charged only for the time between a request arriving and the response being returned, or are pods running continuously and charged 24*7*365?
Fargate is not a request-based service. Fargate runs your pod for the entire time you ask it to run that pod. It doesn't deploy pods when a request comes in; the pods are running 24/7 (or for as long as you have configured them to run).
Fargate is "serverless" in the sense that you don't have to manage the EC2 server the container(s) are running on yourself. Amazon manages the EC2 server for you.
Also, does Fargate fetch the image from ECR every time a request arrives?
Fargate pulls from ECR when a pod is deployed. The pod has to be deployed and running already in order to accept requests. Fargate does not deploy a pod when a request comes in, as you are suggesting.
Does Fargate cost anything when there are no requests and nothing is processing?
Fargate charges for the amount of RAM and CPU you have allocated to your pod, regardless of whether it is actively processing requests or not. Fargate does not care about the number of requests. You could even use Fargate for things like back-end processing services that don't accept requests at all.
If you want an AWS service that only runs (and charges) when a request comes in, then you would have to use AWS Lambda.
You could also look at AWS App Runner, which sits somewhere between Lambda and Fargate. It works like Fargate, but it suspends your containers when requests aren't coming in, in order to save some money on the CPU charges.
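To make the "charged for allocated CPU and RAM, not for requests" point concrete, here is a minimal sketch of the arithmetic. The two hourly rates are assumptions used for illustration only; take the real values for your region from the AWS Fargate Pricing page. The function name and the pod size are likewise made up for the example.

```python
# Hourly rates are ASSUMED placeholders; look up the current values for your
# region on the AWS Fargate Pricing page before relying on the result.
VCPU_RATE_PER_HOUR = 0.04048  # USD per vCPU-hour (assumption)
GB_RATE_PER_HOUR = 0.004445   # USD per GB-hour (assumption)


def fargate_cost(vcpu: float, memory_gb: float, hours: float, tasks: int = 1) -> float:
    """Cost of `tasks` identical pods, each running for `hours` hours.

    The number of requests served does not appear anywhere in the formula:
    only the allocated resources and the running time matter.
    """
    per_task_hour = vcpu * VCPU_RATE_PER_HOUR + memory_gb * GB_RATE_PER_HOUR
    return tasks * hours * per_task_hour


# One pod with 0.25 vCPU and 0.5 GB, running around the clock for 30 days.
always_on = fargate_cost(vcpu=0.25, memory_gb=0.5, hours=24 * 30)
# The same pod, but only running 8 hours a day.
business_hours = fargate_cost(vcpu=0.25, memory_gb=0.5, hours=8 * 30)

print(f"24/7: ${always_on:.2f}/month, 8h/day: ${business_hours:.2f}/month")
```

So the "Average duration" that matters is how long the pod is up, not how long each request takes.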
We have a Lambda that will fire requests to another system, and I am thinking about using Fargate for this system. What I want to know is whether Fargate will spin up with every request sent to it (like a Lambda), or whether it will spin up once and stay alive to handle subsequent requests from the Lambda.
Each Lambda invocation will only fire one request to the Fargate system.
There will be many requests, but the system will be dormant at night. How does Fargate handle spinning up and down between requests?
I found a more detailed answer in the article Concurrency Compared: AWS Lambda, AWS App Runner, and AWS Fargate:
AWS Fargate is similar to AWS App Runner in that each container can serve many concurrent requests.
This means that if your load balancer receives a large spike of traffic then the requests will be distributed across all the available containers.
However, it is up to you to use the metrics to define your own scaling rules. You can create scaling rules based on metrics that ECS captures, such as application CPU or memory consumption. Or you can create scaling rules based on metrics from the load balancer, such as concurrent requests or request latency. You can even create custom scaling metrics powered by your application itself. This gives you maximum control over the scaling and concurrency of your application.
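As a rough illustration of what such a scaling rule computes, here is a sketch of a simple proportional rule: scale the task count so that the per-task metric moves back toward a target value. This is a simplified model written for this answer, not the algorithm ECS or Application Auto Scaling actually runs, and the function and parameter names are hypothetical.

```python
import math


def desired_task_count(current_tasks: int, metric_value: float, target_value: float,
                       min_tasks: int = 1, max_tasks: int = 10) -> int:
    """Return how many tasks would bring the average metric back to the target.

    `metric_value` is the current per-task average (for example CPU percent or
    concurrent requests per task); `target_value` is the value to hold it at.
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    raw = math.ceil(current_tasks * metric_value / target_value)
    # Clamp to the configured bounds so the service never scales to an
    # unexpected size in either direction.
    return max(min_tasks, min(max_tasks, raw))


# 4 tasks averaging 90% CPU with a 60% target: scale out.
print(desired_task_count(4, 90, 60))
# 4 tasks averaging 30% CPU with a 60% target: scale in.
print(desired_task_count(4, 30, 60))
```

Setting `min_tasks` to 0 in a rule like this is what would let the service go fully dormant at night, at the cost of a startup delay when traffic returns.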
It would be great if I could get answers to the following questions about Google Cloud Run:
If I create a cluster with more than 1 vCPU of resources, will those extra vCPUs be used by my Cloud Run service, or is it always capped at 1 vCPU irrespective of my cluster configuration? In the docs, this line has me confused: "Cloud Run allocates 1 vCPU per container instance, and this cannot be changed." I know this holds for managed Cloud Run, but does it also hold for Cloud Run on GKE?
If the resources specified for the cluster actually get used (say, I create a node pool of 2 n1-standard-4 nodes with 15 GB of memory), then why am I asked to choose a memory size again when creating/deploying to Cloud Run on GKE? What is its significance?
(Screenshot: the "Memory allocated" dropdown)
If Cloud Run autoscales from 0 to N according to traffic, why can't I set the number of nodes in my cluster to 0? (I tried and started seeing error messages about unscheduled pods.)
I followed the docs on custom domain mapping and set it up. Can I restrict which requests cause a container instance to handle them, based on the domain name or IP they come from (even if that is only set up artificially by specifying a Host header, as in the Run docs)?
curl -v -H "Host: hello.default.example.com" YOUR-IP
So that I don't incur charges if I get HTTP requests from anywhere but my verified domain?
Any help will be very much appreciated. Thank you.
1: The managed Cloud Run platform always allocates 1 vCPU per revision. On GKE that is also the default, but on GKE only, you can override it with the --cpu parameter:
https://cloud.google.com/sdk/gcloud/reference/beta/run/deploy#--cpu
2: Can you clarify what is being asked, and during which operation?
3: Cloud Run is built on top of Kubernetes, thanks to Knative. Cloud Run is in charge of scaling pods up and down based on traffic, while Kubernetes is in charge of scaling pods and nodes based on CPU and memory usage. The two mechanisms aren't the same. Moreover, node scaling is "slow" and can't keep up with spiky traffic. Finally, something has to run on your cluster to listen for incoming requests and to serve and scale your pods correctly, and that component needs a cluster with at least one node.
4: Cloud Run doesn't allow you to configure this, and I think Knative can't either. But you can deploy an ESP in front to route requests to a specific Cloud Run service. That way you split the traffic upstream and direct it to different services, which then scale independently. Each service can have its own max scale and concurrency parameters, and ESP can implement rate limiting.
This really wasn't clear for me in the Docs. And the console configuration is very confusing.
Will a Docker cluster running in Fargate mode behind a load balancer shut down and not charge me while it's not being used?
What about cold starts? Do I need to care about this in Fargate like in Lambda?
Is it less horizontal than Lambda? A Lambda hooked to API Gateway will spawn a new function for every concurrent request. Will Fargate do this too, or will the load balancer decide?
I've been running Flask/Django applications in Lambda for some time (using Serverless/Zappa). Are there any benefits in migrating them to Fargate?
It seems to me that it is more expensive than Lambda, but if the Lambda limitations are not a problem then Lambda should always be the better choice, right?
Will a Docker cluster running in Fargate mode behind a load balancer shut down and not charge me while it's not being used?
This will depend on how you configure auto scaling for the service. If you allow it to scale down to 0, then yes.
What about cold starts? Do I need to care about this in Fargate like in Lambda?
Some good research has been done on this here: https://blog.cribl.io/2018/05/29/analyzing-aws-fargate/
The takeaway is that for smaller instances you shouldn't see more than ~40 seconds to reach a running state. For bigger ones this will take longer.
Is it less horizontal than Lambda? A Lambda hooked to API Gateway will spawn a new function for every concurrent request. Will Fargate do this too, or will the load balancer decide?
ECS will not create a new instance for every concurrent request; any scaling is driven by the auto scaling configuration. The load balancer doesn't have any control over scaling, it only balances load. However, the metrics it provides can be used to help determine whether scaling is needed.
I've been running Flask/Django applications in Lambda for some time (Using Serverless/Zappa), are there any benefits in migrating them to Fargate?
I haven't used Flask or Django, but the main reason people tend to migrate to serverless is to remove the need to manage the scaling of servers. This includes managing instance types, cluster scheduling, and optimizing cluster utilization.
@abdullahkhawer I agree with his view on sticking with Lambda. Unless you need something that is always running and in use 99% of the time, Lambda will be cheaper than running a VM.
As a pricing example:
1 t2.medium on demand EC2 instance = ~$36/month
2 Million invocations of a 256MB 3 second running lambda = $0.42/month
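The Lambda side of this comparison can be reproduced with a short calculation. This is a sketch under assumed rates (a per-request charge plus a per-GB-second charge, with no free tier applied); both rate constants and the function name are assumptions for illustration, so check them against the AWS Lambda pricing page.

```python
# ASSUMED rates; check the AWS Lambda pricing page for your region.
REQUEST_RATE = 0.20 / 1_000_000  # USD per request (assumption)
GB_SECOND_RATE = 0.0000166667    # USD per GB-second of duration (assumption)


def lambda_monthly_cost(invocations: int, memory_mb: int, duration_s: float) -> float:
    """Request charge plus duration charge, ignoring any free tier."""
    gb_seconds = invocations * duration_s * (memory_mb / 1024)
    return invocations * REQUEST_RATE + gb_seconds * GB_SECOND_RATE


# 2 million invocations at 256 MB, for two different average durations.
three_seconds = lambda_monthly_cost(2_000_000, 256, 3.0)
three_millis = lambda_monthly_cost(2_000_000, 256, 0.003)

print(f"3 s per invocation:  ${three_seconds:.2f}/month")
print(f"3 ms per invocation: ${three_millis:.2f}/month")
```

Under these assumed rates the duration charge dominates for a 3-second function, and a total around $0.42 corresponds to an average duration closer to 3 milliseconds, so it is worth plugging your own duration into the AWS Pricing Calculator before comparing against the EC2 figure.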
With AWS Fargate, you pay only for the amount of vCPU and memory resources that your containerized application requests, from the time your container images are pulled until the AWS ECS Task (running in Fargate mode) terminates. Usage is billed per second, and a minimum charge of 1 minute applies. So, you pay for as long as your Task (a group of containers) is running, more like AWS EC2 and unlike AWS Lambda, where you pay per request/invocation.
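The billing window described above can be sketched as follows. It assumes that usage is rounded up to the nearest second with a one-minute minimum; the function name and the use of plain epoch-style timestamps are choices made for this example.

```python
import math

MINIMUM_BILLED_SECONDS = 60  # the 1-minute minimum charge


def billed_seconds(image_pull_start: float, task_stopped: float) -> int:
    """Seconds billed for one task, given two timestamps in seconds.

    Billing runs from the start of the image pull until the task terminates,
    rounded up to a whole second (assumption), never less than one minute.
    """
    elapsed = math.ceil(task_stopped - image_pull_start)
    return max(MINIMUM_BILLED_SECONDS, elapsed)


print(billed_seconds(0, 12.3))     # short task: the minimum applies
print(billed_seconds(0, 90.2))     # rounded up to the next second
print(billed_seconds(100, 3700))   # a one-hour task
```

Multiplying the result by the per-second vCPU and memory rates for the task size gives the charge for that task.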
AWS Fargate doesn't spawn containers on every request as in AWS Lambda. AWS Fargate works by simply running containers on a fleet of AWS EC2 instances internally managed by AWS.
AWS Fargate now supports the ability to run tasks on a scheduled basis and in response to AWS CloudWatch Events. This makes it easier to launch and stop container services that you need to run only at a certain time to save money.
Keeping your use case in mind, if your applications are not causing any problems in the production environment due to AWS Lambda limitations, then AWS Lambda is the better choice. If the Lambda is being invoked very heavily (e.g., more than 1K concurrent invocations at any given time) in the production environment, then go for AWS EKS or AWS Fargate, as AWS Lambda might cost you more.
We are checking the feasibility of migrating one of our applications to Amazon Web Services (AWS). We have decided to use AWS API Gateway to expose the services and AWS Lambda (Java) for back-end data processing. The Lambda function has to fetch a large amount of data from our database.
We currently use Cassandra for data storage. It has been set up on an EC2 instance and has no public IP.
Can anyone suggest a way to access Cassandra (EC2) from AWS Lambda using the private IP (10.0.x.x)?
Is AWS Lambda the right choice for large-scale applications?
Since your Cassandra instance uses a private IP, you will need to configure your Lambda's network settings to use a VPC. It could be the VPC you are running Cassandra in, or a VPC you create for your Lambdas and peer with your Cassandra VPC. A few things to note from the documentation:
When your Lambda runs in a VPC, it doesn't have internet access by default. You will need to configure a NAT for that.
There is additional latency due to the configuration of the ENI (you only pay that penalty on cold start).
You need to make sure your Lambda has the right permissions to manage the ENI. You should use this role: AWSLambdaVPCAccessExecutionRole
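For illustration, attaching an existing function to a VPC could look roughly like the sketch below, using boto3's `update_function_configuration` with a `VpcConfig`. The function name, subnet IDs and security group ID are hypothetical placeholders, and the helper names are made up for this example. The subnets should be ones that can route to the Cassandra instance, and the security group must be allowed to reach Cassandra's port.

```python
def vpc_config_kwargs(function_name, subnet_ids, security_group_ids):
    """Build the arguments for Lambda's UpdateFunctionConfiguration call."""
    if not subnet_ids or not security_group_ids:
        raise ValueError("at least one subnet and one security group are required")
    return {
        "FunctionName": function_name,
        "VpcConfig": {
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        },
    }


def attach_to_vpc(kwargs):
    """Apply the configuration. Needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so the helper above works without boto3

    return boto3.client("lambda").update_function_configuration(**kwargs)


# All identifiers below are hypothetical placeholders.
kwargs = vpc_config_kwargs(
    "cassandra-reader",
    ["subnet-0example1", "subnet-0example2"],
    ["sg-0example"],
)
print(kwargs)
```

Calling `attach_to_vpc(kwargs)` would then apply the change, provided the function's execution role carries the ENI permissions mentioned above.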
Your plan to use API Gateway / AWS Lambda has at least 3 potential issues which you need to consider carefully:
Cost. The API Gateway per-request cost is higher than the AWS Lambda per-request cost. Make sure you are familiar with the cost.
Cold start. When AWS starts an underlying container to execute your Lambda, you pay a cold start latency (which gets worse when using a VPC, due to the management of the ENI). If you execute your Lambda concurrently, there will be multiple underlying containers, and each of them will incur this cold start the first time. AWS tends to keep the underlying containers ready for a warm start for a few minutes (users report 5 to 40 minutes). You might try to keep your container warm by pinging your Lambda, but with multiple containers running in parallel this gets tricky.
Cassandra session. You will probably want to avoid creating and destroying your Cassandra session each time you invoke your Lambda (costly). I haven't tried it yet, but there are reports of keeping the session alive in a warm container. You might want to check this SO answer.
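The "keep the session alive in a warm container" idea usually comes down to caching the session at module level, outside the handler. The question uses Java, but the pattern is the same in any runtime; here is a minimal Python sketch of it. The environment variable `CASSANDRA_HOST`, the function names, and the injectable `connect` argument are assumptions made for this example, and `connect_to_cassandra` needs the `cassandra-driver` package when it is actually called.

```python
import os

# Module-level state survives between invocations while the container is warm.
_session = None


def get_session(connect):
    """Return the cached session, creating it only on a cold start."""
    global _session
    if _session is None:
        _session = connect()
    return _session


def connect_to_cassandra():
    # Requires the cassandra-driver package. CASSANDRA_HOST is a hypothetical
    # environment variable holding the instance's private IP.
    from cassandra.cluster import Cluster

    return Cluster([os.environ["CASSANDRA_HOST"]]).connect()


def handler(event, context, connect=connect_to_cassandra):
    session = get_session(connect)
    # A real handler would run its queries here, e.g. session.execute(...).
    return {"session_id": id(session)}
```

Each concurrent container still opens its own session once, so this removes the per-invocation cost but not the per-container one.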
Having said all that, currently the biggest limitations in using AWS Lambda are concurrent execution and cold start latency. For data processing, that's usually fine. For user-facing usage, the percentage of slow cold starts might affect your user experience.
I am using an Amazon EC2 instance with instance type m3.medium and an Amazon RDS database instance.
In my working hours the website goes down because CPU utilization reaches 100%, and at night (not working hours) the CPU utilization is 60%.
Please suggest the right solution for this site-down issue. I am not sure why I am experiencing this problem.
I once had a cron job set to run every minute, but I removed it because it slowed things down. I still have the site-down issue.
When I run the "top" command, it shows the CPU usage in the images below. The httpd process consumes most of the CPU, so any suggestions for settings that reduce httpd's CPU usage would be welcome.
With no users on the website (two images below):
http://screencast.com/t/1jV98WqhCLvV
http://screencast.com/t/PbXF5EYI
After 5 users access the website simultaneously:
http://screencast.com/t/QZgZsiNgdCUl
If your CPU utilization is reaching 100%, you have two options:
Increase your EC2 instance type to large.
Use Auto Scaling to launch one more EC2 instance of the same instance type.
It looks like you need some scheduled actions, as you do not need that much capacity during non-working hours.
The best option is to use AWS Auto Scaling with scheduled actions.
http://docs.aws.amazon.com/autoscaling/latest/userguide/schedule_time.html
AWS Auto Scaling can launch new EC2 instances based on your CPU utilization (or other metrics such as network load, disk reads/writes, etc.). This way you can always keep your site alive.
Using Auto Scaling scheduled actions, you can set the group size by time of day, so that the extra instances are stopped during non-working hours and the group scales out during working hours according to CPU utilization (or other metrics).
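As a sketch of what such scheduled actions could look like, the snippet below builds the parameters for boto3's `put_scheduled_update_group_action`. The group name, action names, sizes and cron times are hypothetical values for illustration (recurrence is interpreted in UTC unless a time zone is given), and the helper names are made up for this example.

```python
def scheduled_action(group, name, cron, min_size, max_size, desired):
    """Build arguments for Auto Scaling's PutScheduledUpdateGroupAction."""
    if not min_size <= desired <= max_size:
        raise ValueError("desired capacity must lie between min and max size")
    return {
        "AutoScalingGroupName": group,
        "ScheduledActionName": name,
        "Recurrence": cron,  # standard 5-field cron expression
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": desired,
    }


def apply_actions(actions):
    """Register the actions. Needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so the builder works without boto3

    client = boto3.client("autoscaling")
    for action in actions:
        client.put_scheduled_update_group_action(**action)


# Hypothetical schedule: more capacity on weekday mornings, less in the evening.
actions = [
    scheduled_action("web-asg", "scale-out-workday", "0 8 * * 1-5", 2, 4, 2),
    scheduled_action("web-asg", "scale-in-evening", "0 20 * * 1-5", 1, 2, 1),
]
print(actions)
```

A CPU-based scaling policy can still run alongside these actions, and it then operates within the min and max sizes the schedule sets.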
You can even stop your servers if you do not need them at some point in time.
If you are not familiar with AWS Auto Scaling, you can follow the documentation, which is precise and easy to follow.
http://docs.aws.amazon.com/autoscaling/latest/userguide/GettingStartedTutorial.html
If the CPU utilization reaches 100% because of the number of visitors your site has, you should consider changing the instance type, using Auto Scaling, or putting AWS CloudFront in front in order to cache as many HTTP requests as possible (static and dynamic content).
If visitors are not the problem and there are other scheduled tasks on the EC2 instance, I strongly recommend decoupling these workloads via AWS SQS and an AWS Elastic Beanstalk worker environment.