Scaling EC2 on demand, Lambda style

I am new to AWS and I am completely lost. I am losing hope.
What I want to do seems quite easy, but I cannot find any clear streamlined documentation.
I would like to have my current web server call some API with an HTTP request and get some value returned from the API. The values to return are computed by some Python script.
Now Lambda would do the job, but I need to run this on EC2 because of the computational demand (16 cores needed).
On top of that, this needs to scale: each API request should be processed on a different machine with its own 16 cores. If I have 100 calls, I should get 100 EC2 instances running in parallel.
When there are no API calls, I should have 0 EC2 instances running and $0.00 charged.
Normally, for a given session, several calls will come in over a period of about 10 minutes, and these calls should be answered quickly. What I am thinking is maybe having a special first call that starts an EC2 instance for 20 minutes. That first call can take 20 seconds, but the subsequent ones should be answered very quickly.
Summary of what I need AWS to do:
- Receive "start" API Call (Amazon API Gateway ?)
- Start an EC2 specifically for that session for 20 mins
- Return EC2 address (?)
- Call script on EC2 through http requests
- Stop EC2 after 20 mins or after the last call
What would be the services to set up and configure for that?
I can create a Docker image with the script if needed. (Is there actually another way?)
Is it possible? How?
Thanks a lot
Edit: modified the question a bit to reflect time constraints.

It seems to me that you should look into AWS Batch. It lets you run a batch of jobs, use machines with more vCPUs, and benefit from per-second billing.
https://aws.amazon.com/batch/
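
If you go the Batch route, a common pattern is to put a small Lambda behind API Gateway that only submits the job and returns a job id, while the heavy Python script runs inside the Batch container. Here is a minimal sketch, assuming a job queue and job definition named compute-queue and compute-job already exist (both names are placeholders for resources you would create yourself):

```python
# Hedged sketch of a Lambda handler behind API Gateway that submits an AWS Batch job.
# "compute-queue" and "compute-job" are placeholder names for a pre-created job
# queue and job definition (the job definition would point at your Docker image).
import json
import boto3

batch = boto3.client("batch")

def handler(event, context):
    # With the API Gateway proxy integration, the request body arrives as a JSON string
    body = json.loads(event.get("body") or "{}")

    response = batch.submit_job(
        jobName="session-compute",
        jobQueue="compute-queue",
        jobDefinition="compute-job",
        containerOverrides={
            # Pass the request payload to your 16-core Python script
            "command": ["python", "script.py", json.dumps(body)],
        },
    )

    # Return the job id so the caller can poll for the result later
    return {
        "statusCode": 202,
        "body": json.dumps({"jobId": response["jobId"]}),
    }
```

The asynchronous shape (submit, then poll or get notified) is what makes the "0 instances when idle" requirement workable, since Batch only spins up compute while jobs are in the queue.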

Related

Best way to run a script that takes a long time (48 hours) to run on AWS?

I am an academic researcher. I need to get data from a social media platform for a large number of users. Due to API restrictions, it takes a very long time (~48 hours) to get this data for all users. As of now, I write this data to a CSV file as I go with one line per user.
My lab has AWS access and many credits. Assuming this script just needs to be run once a week, what is the best way to do it in AWS? And I assume I should use a database instead of a CSV file -- what options are there for setting that up?
There are several ways you can achieve this:
You could set up a cron job on an instance you keep deployed throughout. This could be a very small, inexpensive instance like t2.small or t3.medium. The job runs once a week.
If you don't want to keep the instance deployed, write a small script that creates an EC2 instance every week, puts your script on the instance, and runs it there. The script itself can then send AWS the request to terminate the instance on a successful run (see the sketch after this list; not recommended).
If you can break your task into steps, AWS Lambda is the way to go. Look at Step Functions here.
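For the second option, here is a minimal sketch of how the script could terminate its own instance after a successful run, assuming the instance has an IAM role that allows ec2:TerminateInstances. run_weekly_export is a placeholder for your actual data-collection code, and depending on your instance settings you may need an IMDSv2 token rather than the plain metadata call shown here:

```python
# Rough sketch: an instance that terminates itself after a successful run.
# Assumes an instance profile granting ec2:TerminateInstances.
import urllib.request
import boto3

def terminate_self():
    # The instance metadata service returns this instance's own id
    instance_id = urllib.request.urlopen(
        "http://169.254.169.254/latest/meta-data/instance-id", timeout=2
    ).read().decode()
    boto3.client("ec2").terminate_instances(InstanceIds=[instance_id])

def run_weekly_export():
    ...  # placeholder for the ~48-hour data collection and CSV/DB writes

if __name__ == "__main__":
    run_weekly_export()
    terminate_self()
```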
I would recommend looking at other solutions, e.g. AWS Fargate. It is serverless and allows running arbitrary long tasks in Docker containers.
Furthermore, a handy workaround (usually) to the API restriction is to ping from multiple clients or IPs and merge the results later.

Django + Gunicorn on Google Cloud Run, how are different parameters of Gunicorn and GCR related?

For deploying a Django web app to GCR, I would like to understand the relationships between various autoscaling related parameters of Gunicorn and GCR.
Gunicorn has flags like:
workers
threads
timeout
Google Cloud Run has these configuration options:
CPU limit
Min instances
Max instances
Concurrency
My understanding so far:
Number of workers set in Gunicorn should match the CPU limit of GCR.
We set timeout to 0 in Gunicorn to allow GCP to autoscale the GCR instance.
GCP will always keep some instances alive, this number is Min instances.
When more traffic comes, GCP will autoscale up to a certain number, this number is Max instances.
I want to know the role of threads (Gunicorn) and concurrency (GCR) in autoscaling. More specifically:
How does the number of threads in Gunicorn affect autoscaling?
I think this should not affect autoscaling at all. Threads are useful for background tasks such as file operations, making async calls, etc.
How does the Concurrency setting of GCR affect autoscaling?
If the number of workers is set to 1, then a particular instance should be able to handle only one request at a time, so setting this value to anything more than 1 does not help. In fact, we should set the CPU limit, concurrency, and workers to match each other. Please let me know if this is correct.
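To make the numbers concrete, here is an illustrative gunicorn.conf.py under the assumption that the Cloud Run CPU limit is 2 and the Cloud Run concurrency is set to workers * threads (the values are examples, not a recommendation):

```python
# gunicorn.conf.py -- illustrative values only.
# Assumes a Cloud Run CPU limit of 2 and Cloud Run concurrency = workers * threads (2 * 4 = 8).
import os

# Cloud Run tells the container which port to listen on via $PORT
bind = f"0.0.0.0:{os.environ.get('PORT', '8080')}"

workers = int(os.environ.get("WEB_CONCURRENCY", 2))    # matched to the CPU limit
threads = int(os.environ.get("GUNICORN_THREADS", 4))   # concurrent requests per worker
timeout = 0   # disable Gunicorn's own worker timeout; Cloud Run enforces its request timeout
```

With these values one container can serve up to 8 requests at a time, so a Cloud Run concurrency higher than 8 would simply queue requests inside the instance instead of triggering a scale-out.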
Edit 1:
Adding some details in response to John Hanley's comment.
We expect to have up to 100 req/s. This is based on what we've seen in GCP console. If our business grows we'll get more traffic. So I would like to understand how the final decision changes if we're to expect say 200 or 500 req/s.
We expect requests to arrive in bursts. Users are groups of people who perform some activities on our web app during a given time window. There can be only one such event on a given day, but the event will see 1000 or more users using our services for a 30 minute window. On busy days, we can have multiple events, some of them may overlap. The service will be idle outside of the event times.
How many simultaneous requests can a Cloud Run instance handle? I am trying to understand this one myself. Without Cloud Run, I could have deployed this with x workers and then the answer would have been x. But with Cloud Run, I don't know if the number of Gunicorn workers has the same meaning.
Edit 2: more details.
The application is stateless.
The web app reads and writes to DB.

AWS Container (ECS) vs AMI & Spot instances

The core of my question is whether or not there are downsides to using an Amazon Machine Image + Micro Spot instances to run a task, vs using the Elastic Container Service (ECS).
Here's my situation: I have the need to run a task on demand that is triggered by a remote web hook.
There is the possibility this task can get triggered 10 times in a row, or go weeks w/o ever executing, so I definitely want a service that only runs (and bills) on demand.
My plan is to point the webhook to a Lambda function, but then the question is what to have the Lambda function do.
Though it doesn't take very long, this task requires several different runtimes (PowerShell Core, Python, PHP, Git) to get its job done, so Lambda isn't really a possibility, as I'd hit the deployment package size limit. But I can use Lambda to kick off the job.
What I started doing was creating an AMI that has all the necessary runtimes and code, then using a Spot request to launch an instance, have it execute the operation via a startup script passed in via userdata, then shut itself down when it's done. I'd have to put in some rate control logic to prevent two from running at once, but that's a solvable problem.
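Roughly, that launch step could look like the following with boto3; the AMI id, instance profile name, and task entry point are placeholders, and the rate-control logic mentioned above would still sit in front of this:

```python
# Rough sketch: launch a one-time spot instance from the pre-built AMI and let
# the user-data script run the task and shut the machine down when it is done.
# "ami-0123456789abcdef0" and "task-instance-profile" are placeholders.
import boto3

ec2 = boto3.client("ec2")

user_data = """#!/bin/bash
/opt/task/run-task.sh   # placeholder for the entry point baked into the AMI
shutdown -h now         # a one-time spot instance is terminated when the OS shuts down
"""

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="t3a.micro",
    MinCount=1,
    MaxCount=1,
    UserData=user_data,
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {"SpotInstanceType": "one-time"},
    },
    IamInstanceProfile={"Name": "task-instance-profile"},
)
```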
I hesitated half way through developing this solution when I realized I could probably do this with a docker container on ECS using Fargate.
I just don't know if there is any benefit of putting in the additional development time of switching to a docker container, when I am not a docker pro and already have the AMI configured. Plus ECS/Fargate is actually more expensive than just running a micro instance.
Are there any concerns about spinning up short-lived (<5 min) spot requests (t3a.micro) when there could be a dozen fired off in a single day? Are there rate limits on this? Will I get an angry email from AWS telling me to knock it off? Are there other reasons ECS is the only right answer? Something else entirely?
Your solution using a spot instance and an AMI is a valid one, though I've experienced slow times getting a spot instance in the past. You also incur the AMI startup time.
As mentioned in the comments, you will incur a minimum one-hour charge for the instance, so you should leave the instance up for the hour before terminating it, in case more requests come in during the same hour.
IMHO you should build it all with Lambda. By splitting the workload for each runtime into its own Lambda you can make it work.
AWS supports Python and PowerShell runtimes, and you can create a custom PHP one. Chain them together with your glue of choice, SNS, SQS, direct invocation, or Step Functions, and you have the most cost-effective solution. You also get the benefit of better and independent maintenance for each function/runtime.
Put the initial Lambda behind API Gateway and you will get rate-limiting capability too.
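For the "glue" part, the simplest variant is direct asynchronous invocation, where each function hands its result to the next one. A minimal sketch, with "task-python-step" as a placeholder name for the next function in the chain and do_my_step standing in for this function's own work:

```python
# Minimal sketch of chaining Lambdas by direct asynchronous invocation.
# "task-python-step" is a placeholder name for the next Lambda in the chain.
import json
import boto3

lambda_client = boto3.client("lambda")

def do_my_step(event):
    ...  # placeholder for this runtime's part of the workload
    return {"stage": "done", "input": event}

def handler(event, context):
    result = do_my_step(event)

    # Hand the intermediate result to the next function and return immediately
    lambda_client.invoke(
        FunctionName="task-python-step",
        InvocationType="Event",   # asynchronous, fire-and-forget
        Payload=json.dumps(result).encode(),
    )
    return {"status": "queued"}
```

Step Functions buys you retries, error handling, and a visual view of the chain, at the cost of a little more setup than direct invocation.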

AWS Lambda Provisioning Business Logic

In AWS Lambda, there is no need for provisioning to be done by us. But I was wondering how AWS Lambda might be provisioning machines to run the requests. Is it creating an EC2 server for each request, executing the request, and then killing the server? Or does it keep some EC2 servers always on to serve requests by executing the Lambda function? If it's doing the former, then I am wondering whether it would also affect the performance of AWS Lambda in serving requests. Can anyone guide me on this?
Lambdas run inside a Docker-like container, on EC2 servers (using Firecracker) that are highly, highly optimized. AWS has thousands of servers running full time to serve all of the Lambda functions that are running.
A cold start Lambda (one that's never been run before) starts up in a few seconds, depending on how big it is. An EC2 server takes 30+ seconds to startup. If it had to startup an EC2 server, you'd never be able to use a Lambda through API gateway (because API Gateway has a 30 second timeout). But obviously you can.
If you want your Lambdas to startup super duper fast (100ms), use Provisioned Concurrency.
AWS Lambda is known to reuse its resources. It will not create an EC2 server for each request, so that will not be a performance concern.
But you should note that, as some users have reported, the disk space provided for your function is sometimes not cleaned up properly.
You can read more on the execution life cycle of Lambda here: https://docs.aws.amazon.com/lambda/latest/dg/running-lambda-code.html
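One small way to observe the reuse described above: anything initialized outside the handler survives across invocations that land on the same warm execution environment. A sketch (not from the AWS docs, just an illustration):

```python
# Sketch: module-level code runs once per execution environment (cold start),
# so this counter grows across requests served by the same warm container.
import os

invocation_count = 0                    # initialized on cold start only
container_marker = os.urandom(4).hex()  # identifies this particular environment

def handler(event, context):
    global invocation_count
    invocation_count += 1
    # Repeated calls returning the same marker with an increasing count were
    # served by a reused environment, not a freshly provisioned one.
    return {"container": container_marker, "invocations": invocation_count}
```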

JMeter load test with 30K users on AWS

My scenario is mentioned below, please provide the solution.
I need to run 17 HTTP REST APIs for 30K users.
I will create 6 AWS instances (Slaves) for running 30K (6 Instances*5000 Users) users.
Each AWS instance (Slave) needs to handle 5K Users.
I will create 1 AWS instance (Master) for controlling 6 AWS slaves.
1) For the Master AWS instance, what instance type and storage do I need to use?
2) For the Slave AWS instances, what instance type and storage do I need to use?
3) The main objective is for a single AWS instance to handle 5,000 (5K) users; what instance type and storage do I need for this? This also needs to be solved at low cost (pricing).
The answer is: I don't know. How many users you will be able to simulate on this or that AWS instance is something you need to find out yourself, as it depends on the nature of your test: what it is doing, response sizes, the number of post-processors/assertions, etc.
So I would recommend the following approach:
First of all, make sure you are following the recommendations from 9 Easy Solutions for a JMeter Load Test “Out of Memory” Failure.
Start with a single AWS server, e.g. t2.large, and a single virtual user. Gradually increase the load while monitoring the AWS health (CPU, RAM, disk, etc.) using either Amazon CloudWatch or the JMeter PerfMon Plugin. Once one of the monitored metrics runs short (e.g. CPU usage exceeds 90%), stop your test and note the number of virtual users at that point (you can use, for example, the Active Threads Over Time listener for this).
Depending on the outcome, either switch to another instance family (e.g. Compute Optimized if there is a lack of CPU, or Memory Optimized if there is a lack of RAM) or go for a higher-spec instance of the same tier (e.g. t2.xlarge).
Once you get the number of users you can simulate on a single host you should be able to extrapolate it to other hosts.
The JMeter master host doesn't need to be as powerful as the slave machines; just make sure it has enough memory to handle the incoming results.
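If you go with the CloudWatch option for the monitoring step, a small boto3 sketch that pulls the average CPU of one load-generator instance during the ramp-up might look like this (the instance id is a placeholder; with basic monitoring CloudWatch only has 5-minute datapoints, so enable detailed monitoring if you want per-minute resolution):

```python
# Sketch: fetch average CPU for a JMeter load-generator instance over the last
# 30 minutes to see roughly where it saturates during the ramp-up.
# "i-0123456789abcdef0" is a placeholder instance id.
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")

now = datetime.now(timezone.utc)
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=now - timedelta(minutes=30),
    EndTime=now,
    Period=60,               # one datapoint per minute (needs detailed monitoring)
    Statistics=["Average"],
)

for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"].strftime("%H:%M"), f"{point['Average']:.1f}%")
```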