I want to deploy a Spring Boot application on AWS Elastic Beanstalk. The application is a back-end service which exposes a REST API. My concern is what happens when automatic scaling is enabled. Let me walk through an example:
An instance starts with the application
I call the REST API which activates the process (it can take 20 minutes to be completed)
AWS scales out creating a new instance so that new requests can be processed by this new instance
After a while, AWS decides to reduce the number of instances (scale in) because the memory (or CPU, or network out, or ...) usage is below the lower threshold
Does AWS check whether the application is busy? I'd like to avoid it killing an instance while the application is still working (loss of data, job interruption, ...).
You should investigate AWS Auto Scaling Lifecycle Hooks, which allow your application to be notified (via CloudWatch Events, SNS, or SQS) when Auto Scaling wants to terminate one of its instances, and to take the appropriate action, including delaying the termination until the work in progress finishes.
http://docs.aws.amazon.com/autoscaling/latest/userguide/lifecycle-hooks.html
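As a rough illustration of the mechanism, a lifecycle hook on the terminating transition can hold the instance in a wait state until the in-flight job finishes. The hook name, Auto Scaling group name, and instance ID below are placeholders, and this is only a sketch of the boto3 calls involved, not a complete worker:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Create a hook that pauses instances entering the Terminating state.
autoscaling.put_lifecycle_hook(
    LifecycleHookName="wait-for-long-job",          # placeholder name
    AutoScalingGroupName="my-spring-boot-asg",      # placeholder name
    LifecycleTransition="autoscaling:EC2_INSTANCE_TERMINATING",
    HeartbeatTimeout=1800,                          # give the job up to 30 minutes
    DefaultResult="CONTINUE",
)

# Later, once the application has been notified (e.g. via SQS/SNS) that its
# instance is being terminated, it can keep extending the wait while busy...
autoscaling.record_lifecycle_action_heartbeat(
    LifecycleHookName="wait-for-long-job",
    AutoScalingGroupName="my-spring-boot-asg",
    InstanceId="i-0123456789abcdef0",               # placeholder instance ID
)

# ...and signal completion when the job is done, letting termination proceed.
autoscaling.complete_lifecycle_action(
    LifecycleHookName="wait-for-long-job",
    AutoScalingGroupName="my-spring-boot-asg",
    InstanceId="i-0123456789abcdef0",
    LifecycleActionResult="CONTINUE",
)
```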
Related
We have a Lambda that fires requests to another system, and I am thinking about using Fargate for this system. What I want to know is whether Fargate will spin up with every request sent to it (like a Lambda), or whether it will spin up once and stay alive to handle subsequent requests from the Lambda.
Each lambda invocation will only fire one request to the Fargate system.
There will be many requests, but the system will be dormant at night. How does Fargate handle spinning up and down between requests?
I found an extended answer in the article Concurrency Compared: AWS Lambda, AWS App Runner, and AWS Fargate:
AWS Fargate is similar to AWS App Runner in that each container can serve many concurrent requests.
This means that if your load balancer receives a large spike of traffic then the requests will be distributed across all the available containers.
However, it is up to you to use the metrics to define your own scaling rules. You can create scaling rules based on metrics that ECS captures, such as application CPU or memory consumption. Or you can create scaling rules based on metrics from the load balancer, such as concurrent requests or request latency. You can even create custom scaling metrics powered by your application itself. This gives you maximum control over the scaling and concurrency of your application.
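As one possible concrete form of such scaling rules (a sketch only; the cluster/service names, capacities, and target value are made up), an ECS service can be given a target-tracking policy on average CPU via Application Auto Scaling:

```python
import boto3

aas = boto3.client("application-autoscaling")

# Make the ECS service's desired task count scalable between 2 and 10 tasks.
aas.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",      # placeholder names
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=10,
)

# Keep average service CPU around 60% by adding/removing tasks.
aas.put_scaling_policy(
    PolicyName="cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 120,
    },
)
```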
This really wasn't clear to me in the docs, and the console configuration is very confusing.
Will a Docker Cluster running in Fargate mode behind a Load Balancer shutdown and not charge me while it's not being used?
What about cold starts? Do I need to care about this in Fargate like in Lambda?
Is it less horizontal than Lambda? A lambda hooked to API Gateway will spawn a new function for every concurrent request, will Fargate do this too? Or will the load balancer decide it?
I've been running Flask/Django applications in Lambda for some time (Using Serverless/Zappa), are there any benefits in migrating them to Fargate?
It seems to be more expensive than Lambda, but if the Lambda limitations are not a problem, then Lambda should always be the better choice, right?
Will a Docker Cluster running in Fargate mode behind a Load Balancer shutdown and not charge me while it's not being used?
This will depend on how you configure your AutoScaling Group. If you allow it to scale down to 0 then yes.
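If the traffic pattern is predictable (dormant at night, as in the question), one option is scheduled scaling of the service's desired count down to zero and back. The sketch below assumes the service has already been registered as a scalable target; the names and cron expressions (UTC) are illustrative only:

```python
import boto3

aas = boto3.client("application-autoscaling")

# Scale the Fargate service down to 0 tasks every evening...
aas.put_scheduled_action(
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",      # placeholder names
    ScalableDimension="ecs:service:DesiredCount",
    ScheduledActionName="scale-to-zero-at-night",
    Schedule="cron(0 22 * * ? *)",                   # 22:00 UTC
    ScalableTargetAction={"MinCapacity": 0, "MaxCapacity": 0},
)

# ...and back up every morning.
aas.put_scheduled_action(
    ServiceNamespace="ecs",
    ResourceId="service/my-cluster/my-service",
    ScalableDimension="ecs:service:DesiredCount",
    ScheduledActionName="scale-up-in-the-morning",
    Schedule="cron(0 6 * * ? *)",                    # 06:00 UTC
    ScalableTargetAction={"MinCapacity": 1, "MaxCapacity": 4},
)
```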
What about cold starts? Do I need to care about this in Fargate like in Lambda?
Some good research has been done on this here: https://blog.cribl.io/2018/05/29/analyzing-aws-fargate/
But the takeaway is that for smaller instances you shouldn't see much more than ~40 seconds to get to a running state; for bigger ones it will take longer.
Is it less horizontal than Lambda? A lambda hooked to API Gateway will spawn a new function for every concurrent request, will Fargate do this too? Or will the load balancer decide it?
ECS will not create a new instance for every concurrent request; any scaling is done by the Auto Scaling group. The load balancer doesn't have any control over scaling; it exclusively balances load. However, the metrics it provides can be used to help determine whether scaling is needed.
I've been running Flask/Django applications in Lambda for some time (Using Serverless/Zappa), are there any benefits in migrating them to Fargate?
I haven't used Flask or Django, but the main reason people tend to migrate to serverless is to remove the need to manage server scaling; this includes managing instance types, cluster scheduling, and optimizing cluster utilization.
@abdullahkhawer: I agree with his view on sticking to Lambdas. Unless you require something that is always running and always in use, 99% of the time Lambdas will be cheaper than running a VM.
For a pricing example
1 t2.medium on demand EC2 instance = ~$36/month
2 Million invocations of a 256MB 3 second running lambda = $0.42/month
With AWS Fargate, you pay only for the amount of vCPU and memory resources that your containerized application requests, from the time your container images are pulled until the AWS ECS Task (running in Fargate mode) terminates. A minimum charge of 1 minute applies. So you pay for as long as your Task (a group of containers) is running, more like AWS EC2 but on a per-minute basis, and unlike AWS Lambda where you pay per request/invocation.
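As a back-of-the-envelope illustration of that pricing model (the per-vCPU-hour and per-GB-hour rates below are example figures only; check the current Fargate pricing page for your region):

```python
# Rough Fargate cost estimate for one task running continuously for a month.
VCPU_HOUR = 0.04048    # USD per vCPU per hour (assumed example rate)
GB_HOUR = 0.004445     # USD per GB of memory per hour (assumed example rate)

task_vcpu = 0.5        # 0.5 vCPU requested by the task
task_memory_gb = 1.0   # 1 GB memory requested by the task
hours_running = 24 * 30

monthly_cost = hours_running * (task_vcpu * VCPU_HOUR + task_memory_gb * GB_HOUR)
print(f"~${monthly_cost:.2f}/month")   # roughly $17-18 at these example rates
```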
AWS Fargate doesn't spawn containers on every request as in AWS Lambda. AWS Fargate works by simply running containers on a fleet of AWS EC2 instances internally managed by AWS.
AWS Fargate now supports the ability to run tasks on a scheduled basis and in response to AWS CloudWatch Events. This makes it easier to launch and stop container services that you need to run only at a certain time to save money.
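A minimal sketch of such a scheduled Fargate task, assuming an existing cluster, task definition, and events IAM role (all ARNs and names below are placeholders):

```python
import boto3

events = boto3.client("events")

# Run the batch task every night at 02:00 UTC.
events.put_rule(
    Name="nightly-batch",
    ScheduleExpression="cron(0 2 * * ? *)",
    State="ENABLED",
)

events.put_targets(
    Rule="nightly-batch",
    Targets=[{
        "Id": "run-batch-task",
        "Arn": "arn:aws:ecs:eu-west-1:123456789012:cluster/my-cluster",
        "RoleArn": "arn:aws:iam::123456789012:role/ecsEventsRole",
        "EcsParameters": {
            "TaskDefinitionArn": "arn:aws:ecs:eu-west-1:123456789012:task-definition/batch-job:1",
            "TaskCount": 1,
            "LaunchType": "FARGATE",
            "NetworkConfiguration": {
                "awsvpcConfiguration": {
                    "Subnets": ["subnet-0123456789abcdef0"],
                    "AssignPublicIp": "ENABLED",
                }
            },
        },
    }],
)
```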
Keeping your use case in mind, if your applications are not causing any problems in the production environment due to AWS Lambda limitations, then AWS Lambda is the better choice. If AWS Lambda is being invoked very heavily (e.g., more than 1K concurrent invocations at any point in time) in the production environment, then go for AWS EKS or AWS Fargate, as AWS Lambda might cost you more.
I am trying to create a certain kind of networking infrastructure, and have been looking at Amazon ECS and Kubernetes. However I am not quite sure if these systems do what I am actually seeking, or if I am contorting them to something else. If I could describe my task at hand, could someone please verify if Amazon ECS or Kubernetes actually will aid me in this effort, and this is the right way to think about it?
What I am trying to do is on-demand single-task processing on an AWS instance. What I mean by this is: I have a resource-heavy application which I want to run in the cloud to process a chunk of data submitted by a user. I want to submit this data to the application, have an EC2 instance spin up, process the data, upload the results to S3, and then shut down the EC2 instance.
I have already put together a functioning solution for this using Simple Queue Service, EC2 and Lambda, but I am wondering whether ECS or Kubernetes would make this simpler. I have been going through the ECS documentation and it seems like it is not very concerned with starting up and shutting down instances. It seems like it wants an instance that is constantly running, with Docker images fed to it as tasks to run. Can Amazon ECS be configured so that if there are no tasks running it automatically shuts down all instances?
Also, I don't understand exactly how I would submit a specific chunk of data to be processed. It seems like a "Task" as defined in Amazon ECS really corresponds to a single Docker container, not so much to what data that Docker container will process. Is that correct? So would I still need to feed the data-to-be-processed into the instances via Simple Queue Service or something similar, and then use Lambda to poll those queues to see whether it should submit tasks to ECS?
This is my naive understanding of this right now, if anyone could help me understand the things I've described better, or point me to better ways of thinking about this it would be appreciated.
This is a complex subject and many details for a good answer depend on the exact requirements of your domain / system. So the following information is based on the very high level description you gave.
A lot of the features of ECS, kubernetes, etc. are geared towards allowing a distributed application that acts as a single service and is horizontally scalable, upgradeable and maintainable. This means they help with unifying service interfacing, load balancing, service reliability, zero-downtime maintenance, scaling the number of worker nodes up/down based on demand (or other metrics), etc.
The following describes a high level idea for a solution for your use case with kubernetes (which is a bit more versatile than AWS ECS).
So for your use case you could set up a kubernetes cluster that runs a distributed event queue, for example an Apache Pulsar cluster, as well as an application cluster that is being sent queue events for processing. Your application cluster size could scale automatically with the number of unprocessed events in the queue (custom pod autoscaler). The cluster infrastructure would be configured to scale automatically based on the number of scheduled pods (pods reserve capacity on the infrastructure).
You would have to make sure your application can run in a stateless form in a container.
The main benefit I see over your current solution would be cloud provider independence, as well as some general benefits of running a containerized system:
1. Not having to worry about the exact setup of your EC2 instances in terms of operating system dependencies of your workload.
2. Being able to address the processing application as a single service.
3. Potentially increased reliability, for example in case of errors.
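To make the queue-driven autoscaling idea a bit more concrete, here is a rough sketch of a naive scaler that reads the backlog of a Pulsar subscription via the admin REST API and patches the replica count of a worker Deployment. The tenant/namespace/topic, subscription name, Deployment name, and the one-worker-per-100-messages rule are all assumptions for illustration; in practice you would likely run a proper custom-metrics-based pod autoscaler instead:

```python
import requests
from kubernetes import client, config

# Hypothetical names: Pulsar admin service at pulsar-admin:8080, topic "jobs"
# under the default tenant/namespace, subscription "workers", and a worker
# Deployment called "processor" in the "default" namespace.
PULSAR_STATS = ("http://pulsar-admin:8080/admin/v2/persistent/"
                "public/default/jobs/stats")

def queue_backlog() -> int:
    """Return the number of unprocessed messages on the workers subscription."""
    stats = requests.get(PULSAR_STATS, timeout=5).json()
    return stats["subscriptions"]["workers"]["msgBacklog"]

def scale_deployment(replicas: int) -> None:
    """Patch the worker Deployment to the requested replica count."""
    config.load_incluster_config()          # assumes it runs inside the cluster
    apps = client.AppsV1Api()
    apps.patch_namespaced_deployment_scale(
        name="processor",
        namespace="default",
        body={"spec": {"replicas": replicas}},
    )

if __name__ == "__main__":
    backlog = queue_backlog()
    # Illustrative rule: one worker per 100 unprocessed messages, capped at 20.
    scale_deployment(min(20, max(1, backlog // 100)))
```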
Regarding your exact questions:
Can Amazon ECS be configured so if there are no task running it
automatically shuts down all instances?
The keyword here is autoscaling. Note that there are two levels of scaling: 1. infrastructure scaling (number of EC2 instances) and 2. application service scaling (number of application containers/tasks deployed). ECS infrastructure scaling works based on EC2 Auto Scaling groups. For more info see this link. For application service scaling and serverless ECS (Fargate) see this link.
Also I am not understanding how exactly I would submit a specific
chunk of data to be processed. It seems like "Tasks" as defined in
Amazon ECS really correspond to a single Docker container, not so much
what kind of data that Docker container will process. Is that correct?
A "Task Definition" in ECS is describing how one or multiple docker containers can be deployed for a purpose and what its environment / limits should be. A task is a single instance that is run in a "Service" which itself can deploy a single or multiple tasks. Similar concepts are Pod and Service/Deployment in kubernetes.
So would I still need to feed the data-to-be-processed into the
instances via simple queue service, or other? Then use Lambda to poll
those queues to see if they should submit tasks to ECS?
A queue is always helpful for decoupling the service requests from processing and for making sure you don't lose requests. It is not required if your application service cluster can offer a service interface and process incoming requests directly in a reliable fashion. But if your application cluster has to scale up/down frequently, that may impact its ability to process reliably.
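For the Lambda-plus-queue variant mentioned in the question, the glue code can stay very small. Here is a rough sketch (cluster, task definition, container, and subnet identifiers are placeholders) of a Lambda handler that starts one Fargate task per queue message and passes the data location in as an environment variable:

```python
import boto3

ecs = boto3.client("ecs")

def handler(event, context):
    # Triggered by SQS; each record body is assumed to carry the S3 key
    # of the chunk of data to process.
    for record in event["Records"]:
        ecs.run_task(
            cluster="processing-cluster",          # placeholder cluster name
            taskDefinition="data-processor",       # placeholder task definition
            launchType="FARGATE",
            count=1,
            networkConfiguration={
                "awsvpcConfiguration": {
                    "subnets": ["subnet-0123456789abcdef0"],
                    "assignPublicIp": "ENABLED",
                }
            },
            overrides={
                "containerOverrides": [{
                    "name": "processor",           # container name in the task definition
                    "environment": [
                        {"name": "INPUT_S3_KEY", "value": record["body"]}
                    ],
                }]
            },
        )
```

The task itself then downloads the input from S3, uploads its results, and exits, so nothing is billed between jobs.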
I'm currently architecting AWS ECS infrastructure.
To scale in/out automatically, I use Auto Scaling.
My system runs on AWS ECS (deployed via docker-compose).
Assume that we have 1 cluster and 1 service with 2 EC2 instances.
I defined a scaling policy via CloudWatch that triggers when CPU utilization goes above 50%.
For auto scaling, we have to apply our policy to both the ECS service and the Auto Scaling group.
When the CloudWatch policy is attached to the ECS service, it automatically increases the task count when CPU utilization goes above 50%.
When the CloudWatch policy is attached to the Auto Scaling group, it automatically increases the EC2 instance count when CPU utilization goes above 50%.
After testing it, everything works fine.
But errors like this appear in my service event log:
service v1 was unable to place a task because no container instance met all of its requirements. The closest matching container-instance 8bdf994d-9f73-42ec-8299-04b0c5e7fdd3 has insufficient memory available.
I think it occurred because service scaling starts before EC2 instance scaling. (Service scaling, i.e. increasing/decreasing the task count, needs an EC2 instance to run the tasks on.)
But it still works in the end; maybe it automatically retries several times (I'm not sure).
I wonder: is this a normal configuration for AWS ECS auto scaling?
Or is there a missing point in my flow?
Thanks.
ECS can only place a service's tasks if a container instance is available that matches the container's CPU/memory requirements. Ensure you have this spare capacity available to guarantee smooth auto scaling.
The EC2 ASG scaling should happen before service auto scaling to ensure a container instance is available for the task scheduler.
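One way to have that ordering handled for you, if ECS capacity providers are available in your setup, is managed scaling on the Auto Scaling group: ECS then grows the ASG whenever tasks cannot be placed. A sketch with placeholder names and ARNs:

```python
import boto3

ecs = boto3.client("ecs")

# Let ECS manage the Auto Scaling group, keeping ~10% spare capacity
# so new tasks usually have an instance to land on.
ecs.create_capacity_provider(
    name="asg-capacity",
    autoScalingGroupProvider={
        "autoScalingGroupArn": (
            "arn:aws:autoscaling:eu-west-1:123456789012:autoScalingGroup:"
            "11111111-2222-3333-4444-555555555555:autoScalingGroupName/ecs-asg"
        ),
        "managedScaling": {
            "status": "ENABLED",
            "targetCapacity": 90,
        },
    },
)

# Attach the capacity provider to the cluster and make it the default.
ecs.put_cluster_capacity_providers(
    cluster="my-cluster",
    capacityProviders=["asg-capacity"],
    defaultCapacityProviderStrategy=[
        {"capacityProvider": "asg-capacity", "weight": 1}
    ],
)
```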
What is the best way to deal with traffic spikes on Elastic Beanstalk? In my experience it does not seem to scale quickly enough, i.e. the new instances take a few minutes to get going.
Should I be doing some more calculations to optimise the scaling process?
Is there a formula for working these thing out?
Yeah, it takes 5-10 minutes (depending on the stack you're using; not counting Windows instances) to launch a new Beanstalk instance via CloudFormation, install and configure the environment software, add the instance to the load balanced cluster, deploy your application code, and run any of your .ebextensions. (All of which you can follow along with by watching the event log for the environment, or the log for the CloudFormation template executing in the background.)
There are a couple of ways to handle this:
Use larger instances that won't need to scale out so quickly.
Tweak the Auto Scaling triggers for your environment (via the AWS Console, web service API, or CLI tools) so that scaling out happens sooner. That way you'll have the extra capacity by the time the existing servers get maxed out (one way to set these options is sketched below).
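A minimal sketch of updating those trigger settings with boto3; the environment name and thresholds are placeholders to tune for your own load:

```python
import boto3

eb = boto3.client("elasticbeanstalk")

# Scale out at 60% CPU, scale in below 20%, after a 1-minute breach.
eb.update_environment(
    EnvironmentName="my-api-prod",            # placeholder environment name
    OptionSettings=[
        {"Namespace": "aws:autoscaling:trigger",
         "OptionName": "MeasureName", "Value": "CPUUtilization"},
        {"Namespace": "aws:autoscaling:trigger",
         "OptionName": "Unit", "Value": "Percent"},
        {"Namespace": "aws:autoscaling:trigger",
         "OptionName": "UpperThreshold", "Value": "60"},
        {"Namespace": "aws:autoscaling:trigger",
         "OptionName": "LowerThreshold", "Value": "20"},
        {"Namespace": "aws:autoscaling:trigger",
         "OptionName": "BreachDuration", "Value": "1"},
    ],
)
```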