I have a web application running 24/7 on an AWS micro instance, and it works just fine.
Occasionally (10 to 50 times a day) I need to process large amounts of data (stored on RDS) in a CPU-intensive task. That's too much for my micro instance.
Starting an EC2 server for these tasks doesn't seem like a good idea, because the tasks must be executed on demand when a user asks for them, and I need low latency (less than 10 seconds).
Is there any Amazon service where I can submit my task and take advantage of higher CPU capacity?
Keep in mind that my task needs to read a big amount of data from RDS.
You can queue up those tasks (10 to 50 a day) until a particular cut-off time, then launch an instance to process them and terminate it when the processing is done. The micro instance can handle the scheduling.
Once the micro instance has started the high-compute instance, the rest can be carried out by the high-compute instance; once the queue of pending tasks is empty, you can terminate that instance.
This is like scaling from 0 instances to 1 instance on a schedule.
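A rough sketch of the decision the micro instance would make on each scheduled check. The function name and the return values are made up for illustration; the actual launch and terminate calls are left out.

```javascript
// Hypothetical helper run periodically by the micro instance.
// pendingJobs: number of queued jobs.
// workerRunning: whether the high-compute instance is currently up.
function nextAction(pendingJobs, workerRunning) {
  if (pendingJobs > 0 && !workerRunning) return "launch";
  if (pendingJobs === 0 && workerRunning) return "terminate";
  return "none"; // nothing to change
}

console.log(nextAction(12, false)); // launch
console.log(nextAction(0, true)); // terminate
```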
It depends on the cost that you are willing to pay, the complexity of the processing, the future scale of your service, the ability to pre-calculate your results and so on.
One option is to have a larger instance (or a pool of instances, if your service grows) ready for processing, which you can trigger on demand. You can lower the cost of this machine by using Reserved Instance pricing (http://aws.amazon.com/ec2/purchasing-options/reserved-instances/), or even better, Spot Instances (http://aws.amazon.com/ec2/purchasing-options/spot-instances/). With Spot you face the risk that sometimes you won't be able to get the instances up and running, so you need to back them with on-demand instance(s).
This is probably a more expensive solution, but as you run more and more jobs like that, the "cost per job" decreases dramatically.
Another option is to off-load the processing to a different service. If you can express your calculation in the query syntax of an external service such as DynamoDB or Redis, you can keep using your micro instance to trigger the query. For example, Redis on ElastiCache supports complex data manipulations such as sorted set intersection. You do need to make sure that your data is also in the other data store, and write the query.
Another option is to pre-calculate these results. It really depends on the type of jobs you need to run. If you can prepare the outputs in advance and only update them with the data that arrived between the time they were calculated and the time they are requested, it might also be easier on your machines, without these unpredictable CPU peaks.
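As a rough sketch of the pre-calculation idea, assuming the result is a simple running total: the shapes `snapshot = { total, computedAt }` and `rows = [{ value, createdAt }]` are made up for illustration, and a real job would likely need something more involved than a sum.

```javascript
// Combine a pre-calculated total with only the rows that arrived after it
// was computed. Both object shapes are hypothetical.
function currentTotal(snapshot, rows) {
  const delta = rows
    .filter((row) => row.createdAt > snapshot.computedAt)
    .reduce((sum, row) => sum + row.value, 0);
  return snapshot.total + delta;
}

const snapshot = { total: 100, computedAt: 10 };
const rows = [
  { value: 5, createdAt: 5 }, // already included in the snapshot
  { value: 7, createdAt: 11 },
  { value: 3, createdAt: 12 },
];
console.log(currentTotal(snapshot, rows)); // 110
```

Only the small delta is computed at request time, which is what keeps the micro instance out of trouble.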
We used AWS Batch for this, basically because it stands head and shoulders above the other options out there. Besides AWS Batch, we considered additional servers, AWS Lambda, using a separate server for each task, and so on.
But we finally chose AWS Batch for a number of reasons:
In AWS Batch, all processes are completely isolated. This means that tasks will not impact each other or break the workflow.
You can set the minimum and maximum RAM you want to use.
AWS Batch supports containers, which makes it easier to integrate if you already use containers.
You can also create queues for different tasks to gain more control over your resources and expenses.
AWS Batch is very cost-effective, meaning you pay for what you use.
It's also pretty easy to set up. Here's a quick code snippet showing how to launch rockets for each individual user. For more info go here: https://fulcrum.rocks/blog/cpu-intensive-tasks
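`const AWS = require("aws-sdk"); // AWS SDK for JavaScript v2
// Region and credentials are taken from the environment.
const batch = new AWS.Batch();

// Each token of the command must be its own array element.
const command = ["npm", "run", "rocket"];

// Run this inside an async function.
const newJob = await new Promise((resolve, reject) => {
  batch.submitJob(
    {
      jobName: "your_important_job",
      jobDefinition: "killer_process",
      jobQueue: "night_users",
      timeout: {
        attemptDurationSeconds: 600
      },
      retryStrategy: {
        attempts: 1
      },
      containerOverrides: {
        vcpus: 2,
        memory: 2048,
        command
      }
    },
    (err, data) => {
      if (err) {
        console.error(err.message);
        reject(err);
        return; // do not fall through to resolve()
      }
      resolve(data);
    }
  );
});
`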
I have a lightweight server that runs cron jobs at a given time. As I understand Google Cloud Run only processes incoming requests and then becomes idle after a short time if there is no other request to process. Hence, it is not advisable to deploy that cron service to Cloud Run.
Out of curiosity, I deployed the following server that starts up and then prints a log every hour.
const express = require('express');
const app = express();
setInterval(() => console.log('ping!'), 1000 * 60 * 60);
app.listen(process.env.PORT, () => {
  console.log('server listening');
});
I deployed it with a minimum and maximum instance count of 1. It has not received any requests, and when I checked back the next day, it was printing the log precisely every hour. Was this a coincidence, or can I use this setup for production?
If you set the min instances to 1 and the CPU always on option to true, then yes, you can perform background compute-intensive processing without CPU throttling (in your hello world case, the few CPU % allowed to an idle instance are enough even without the CPU always on option).
BUT, and the but is very important, you will pay for 1 Cloud Run instance that is always up. In addition, if you receive requests, the service can scale up and have more than 1 instance up and running. Does it make sense to have several instances with the same cron scheduling? (Unless you set the max instances to 1.)
In the end, the best pattern is to host the scheduling outside, on Cloud Scheduler, and have it call your instance to perform the task. It's serverless, you can handle several tasks in parallel, and it's scalable.
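A minimal sketch of that pattern, assuming Cloud Scheduler is configured to send a POST to a `/run-job` path on the service. The path and the `runHourlyJob` function are placeholders I made up. It uses only Node's built-in `http` module, and it sends the response only after the job has finished, so the work happens while a request is being handled.

```javascript
const http = require("http");

// Hypothetical job; replace the body with the real work.
async function runHourlyJob() {
  return "done";
}

// Builds the server without starting it, so the caller decides when to listen.
function createJobServer(job) {
  return http.createServer(async (req, res) => {
    if (req.method === "POST" && req.url === "/run-job") {
      try {
        const result = await job();
        res.writeHead(200, { "Content-Type": "text/plain" });
        res.end(String(result)); // respond only once the job is finished
      } catch (err) {
        res.writeHead(500, { "Content-Type": "text/plain" });
        res.end("job failed");
      }
      return;
    }
    res.writeHead(404);
    res.end();
  });
}

// Usage: createJobServer(runHourlyJob).listen(process.env.PORT || 8080);
```

With this shape there is no `setInterval` in the service at all; the schedule lives in Cloud Scheduler and the service can scale to zero between runs.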
From my understanding no.
In the documentation here, Google indicates that the CPU of idle instances is throttled to nearly zero. I suppose this means that very simple operations can still be performed (e.g. logging a string every hour). I guess you could test it more extensively by running some more complex operations and measuring their processing time.
Either way, I would not count on it in a production environment. There is no guarantee that the CPU "throttled to nearly zero" will be able to complete the operations you need in a reasonable time delay.
I'm new to using Cloud Run and the idea of scaling down to zero is very appealing to me, but I have questions about a few usage scenarios:
If I have a Cloud Run instance querying an external API endpoint, would the instance wind down while waiting for the response if no additional requests come in (i.e. I set the query timeout to 60 min, and no requests are received in those 60 min)?
If the Cloud Run instance is running a computation that lasts longer than 24 hours, or perhaps even days, without receiving requests, can it be trusted to carry out the computation until it's done without being randomly shut down or restarted for servicing or other purposes? (I ask this because Cloud Run is primarily intended for stateless applications, but I have infrequent computation jobs that may take a long time and may be considered "stateful" in a short-term context.)
Does CPU utilization impact auto-scaling (e.g. if I have a computationally intensive job not configured for distributed computing running on one instance, would this trigger Cloud Run to spawn additional instances?)
If you dive deep into the documentation, I'm quite sure you can find your answers. Here is a summary:
Cloud Run instances are shut down only when they aren't in use (usually after about 15 minutes without handling a request; this is only an observation, not a commitment, and can change at any time). In your case, if you are in a request handling context, no worries: your instance won't be killed, because it is in use. Note: don't send the HTTP response before the end of the processing. Background processes/jobs aren't considered part of a request context. The context runs from the receipt of the request until the response (OK or KO) is sent back. Partial responses/streaming are accepted.
A Cloud Run instance can potentially live more than 24h, but nothing is guaranteed. And because request handling is limited to 1h, you can't run a process longer than that. I recommend having a look at GKE Autopilot, or running a container on Compute Engine and stopping the VM at the end of the processing to save resources and money (or, as a hack, running your container on AI Platform custom training; even if you train nothing, you run a custom container on a serverless platform!). If you can, I recommend designing your workload so that it can be split into several small, parallelizable jobs.
Yes, it's described here. But keep in mind that a request is processed on only one instance. If you send a request that triggers an intensive compute job, it will be processed only on that instance (which can have several CPUs, if your workload can make use of them). And if another request comes in during the intensive processing, another Cloud Run instance will be spawned to handle it, but only the new request.
I have written a function which queries data, processes that data and calls two external APIs. My function works fine if the number of records is 2000, but anything more than that causes a timeout error after 900 seconds. I have allocated 4GB for this function.
What else can be done in this case?
If you have a monolithic application that you need to run serverless and requires an execution time greater than 15 minutes, you could consider using ECS instead:
Create a Docker image with your function
Upload the Docker image to ECR
Create an ECS Task Definition to run the container image
Run an ECS task
Lambda is great and super easy to use, but you have a time limit of 15 minutes that you cannot increase in any way. You also have a limit of 10GB of memory (CPU is scaled accordingly), so if you are thinking of increasing performance, keep this in mind. I had the same issue and I am moving to Fargate, where you can define a task which runs a Docker container uploaded to ECR. You have no timeout, you can have multi-CPU environments and you can invoke the task from a Lambda. It's a similar approach to what @Paolo described; look here for the differences between the two services.
It looks like the maximum time limit for Lambda is 15 min, from this: AWS Lambda Time limit
Try to redesign your solution to be more efficient: you can make the two API calls concurrent and use batch gets or parallel scans. Here is a good guide: Best Practices for Querying and Scanning Data
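A small sketch of making the two API calls concurrent, assuming the second call does not depend on the result of the first. `callApiA` and `callApiB` are hypothetical async functions standing in for the two external APIs.

```javascript
// Start both calls at once and wait for both, instead of awaiting them
// one after the other. callApiA and callApiB are placeholders.
async function processRecord(record, callApiA, callApiB) {
  const [a, b] = await Promise.all([callApiA(record), callApiB(record)]);
  return { id: record.id, a, b };
}

// Usage with stub API functions:
processRecord(
  { id: 1 },
  async () => "result A",
  async () => "result B"
).then((result) => console.log(result));
```

Per record, the wait becomes roughly the slower of the two calls rather than their sum. If either call fails, `Promise.all` rejects, so error handling still needs to be added around it.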
You could use the initial lambda execution to trigger other asynchronous lambda calls. You would loop through your 2000 records and for each one trigger another lambda, passing in details about the record to be processed. Each asynchronously-triggered lambda would process just the single record it got sent. That way you essentially process records in parallel instead of in a serial fashion.
These resources explain things a bit more:
https://docs.aws.amazon.com/lambda/latest/dg/invocation-async.html
https://docs.aws.amazon.com/lambda/latest/dg/API_Invoke.html
https://docs.aws.amazon.com/lambda/latest/dg/invocation-scaling.html
With async invocation, your initial Lambda does little more than loop through the records and trigger an async Lambda call for each one. You will need to think about concurrency to ensure you don't get throttled by having too many Lambdas executing concurrently.
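A sketch of the dispatch loop with a simple cap on how many invocations are in flight at once. `invokeWorker` is a hypothetical async function; in a real setup it would wrap the Lambda Invoke API with `InvocationType` set to `"Event"`. The group size is an assumption to tune against your concurrency limits.

```javascript
// Split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Dispatch one worker invocation per record, at most `limit` at a time.
// `invokeWorker` is a placeholder for the real async invocation.
async function fanOut(records, invokeWorker, limit) {
  let dispatched = 0;
  for (const group of chunk(records, limit)) {
    // Wait for the whole group before starting the next one.
    await Promise.all(group.map((record) => invokeWorker(record)));
    dispatched += group.length;
  }
  return dispatched;
}

console.log(chunk([1, 2, 3, 4, 5], 2)); // [ [ 1, 2 ], [ 3, 4 ], [ 5 ] ]
```

Waiting group by group is the simplest form of throttling. It only limits how fast invocations are dispatched, not how long the workers themselves run.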
For a website I'm developing on AWS, a user can submit a large job (ex. select a large number of items and ask to update them all in some way). We don't want to limit the size of the jobs these users submit, so a job can in theory run for a very long time and require a large amount of memory (this rules out AWS Lambda as a compute engine option). We want jobs to be as independent from one another as possible, so we chose to run each job in its own container in Amazon ECS. What we currently do when a user submits a job request is send a message with a job id/reference to an SQS queue and have AWS Lambda poll that queue; upon receiving a message, Lambda starts an ECS task (SQS -> Lambda -> ECS). This has the problem that a new ECS task is started with each request, so a new container must be booted up, which can take minutes. This latency is directly visible to the user and is particularly unacceptable if the user's job is not even particularly large, yet they still wait minutes for the container to boot up. Additionally, the cost of constantly running a container or two would not be too problematic.
I've been toying with some ideas for updating this flow.
Attempt 1:
In this updated flow we'd create an ECS task that looks like the following:
message = null;
while (message == null) {
  message = pollForMessages();
}
processMessage(message);
// task finishes, and container can be brought down
We remove the Lambda from the flow and just have SQS -> ECS rather than SQS -> Lambda -> ECS. In this case there would be no cold start, assuming a container is already up and polling for messages. We could set the minimum number of running tasks to a number > 0 to ensure all messages are processed at some point. However, this suffers from the problem that it would not auto-scale as the number of messages in the queue increases, so something needs to spawn more containers when traffic increases.
Attempt 2:
In this updated flow we'd create an ECS task that looks like the following:
message = null;
while (message == null) {
  message = pollForMessages();
}
if (number of running tasks < number of messages in queue) {
  spawnMoreContainers();
}
processMessage(message);
// task finishes, and container can be brought down
This comes with the issue that we could end up over-provisioning containers if multiple containers see that there are more messages in the queue than tasks running. Since these tasks run forever until a message is processed, this could result in a large unnecessary cost. It could also under-provision containers: if the task sees that the number of running tasks >= the number of messages, but those running tasks are already busy processing messages, they will not take any of these messages out of the queue, and we may end up with messages that have to wait a very long time to be processed.
Attempt 3:
message = null;
while (message == null) {
  message = pollForMessages();
  if (number of containers > min provisioned && this particular container has been running longer than some timeout) {
    // finish this task so this container can be brought down
    return;
  }
}
if (number of running tasks < number of messages in queue) {
  spawnMoreContainers();
}
processMessage(message);
// task finishes, and container can be brought down
While this may save us some cost compared to Attempt 2, so that over-provisioning isn't as much of an issue, we could still under-provision containers, in which case certain job requests would have to wait for potentially long periods before being processed.
Attempt 4:
We can introduce locking (ex. https://aws.amazon.com/blogs/database/building-distributed-locks-with-the-dynamodb-lock-client/) to mitigate some of the race conditions, however we'll always have the issue that a task running does not necessarily mean a task that is available to pick up messages and Fargate gives us no way of distinguishing between these, which makes it difficult to determine how many containers to provision (ex. we see there are 5 running containers and 5 messages, but we don't know whether to provision more containers or not because we don't know if those containers are already processing a message or if they're waiting). Alternatively we could introduce some mechanism, either an external orchestrator or some logic within the containers and some data store, to manage the state of these containers.
Essentially to deal with each of these problems, the architecture becomes more and more complex and implementation would be difficult and error prone.
It also seems to me like these solutions are reinventing the wheel, and I feel there must be some service out there that has solved this problem already, but I can’t seem to find it.
The suggestions I’ve seen to deal with this are:
Maybe AWS Batch is more suited for this use case - Indeed, AWS Batch might be the more recommended approach for a workload like this, but we don't remove any of the cold start problem by switching. AWS Batch would still create a new container for each job.
Run the ECS tasks on EC2 rather than Fargate, then cache the container image on the host - With this, we’d be managing our own infrastructure and ideally we’d like this to be serverless.
Have an alarm on the number of messages in the queue and have this alarm trigger a lambda that then boots up more containers - alarms on the /AWS log group have a minimum period of 1 minute. This means the alarm would not be triggered until a minute after we’d received more requests than our provisioned containers can handle. Additionally we'd have to set up many alarms to scale at different numbers of messages.
I’m wondering if anyone is aware of potential services/frameworks that could make doing this more feasible? Or if anyone has suggestions on alternative architectures?
If you don't mind a slightly slower response to bursts, you may create an autoscaling group (I assume there is something similar for ECS). This group can be governed by a custom metric, e.g. queue length divided by the number of workers. A detailed guide is here: https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-using-sqs-queue.html
In any case, I'd decouple the scaling decision from the worker code, because there is a varying number of workers that you would need to synchronize. It's much easier to have one overseer that controls how many workers there should be. Because the overseer is not on the critical path to task processing, you don't need to care that much about its uptime. It's OK if it takes a few minutes before it recovers after a failure - the workers are still there, processing at least at some capacity.
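A sketch of the calculation such an overseer could run, based on the "queue length per worker" idea. The function and parameter names are made up, and the target backlog per worker is something you would have to pick from how long a job takes and how long users can wait.

```javascript
// How many workers should be running for the current queue length?
// targetBacklogPerWorker: acceptable number of queued messages per worker.
// minWorkers / maxWorkers: bounds that keep cost and cold starts in check.
function desiredWorkers(queueLength, targetBacklogPerWorker, minWorkers, maxWorkers) {
  const needed = Math.ceil(queueLength / targetBacklogPerWorker);
  return Math.min(maxWorkers, Math.max(minWorkers, needed));
}

console.log(desiredWorkers(25, 5, 1, 10)); // 5
console.log(desiredWorkers(0, 5, 1, 10)); // 1 (never below the minimum)
console.log(desiredWorkers(1000, 5, 1, 10)); // 10 (capped at the maximum)
```

Because only the overseer runs this, the workers never need to coordinate with each other about how many of them should exist.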
I am trying to set up scalable background image processing using Elastic Beanstalk.
My setup is the following:
Application server (running on Elastic Beanstalk) receives a file, puts it on S3 and sends a request to process it over SQS.
Worker server (also running on Elastic Beanstalk) polls the SQS queue, takes the request, loads the original image from S3, processes it into 10 different variants and stores them back on S3.
These upload events are happening at a rate of about 1-2 batches per day, 20-40 pics each batch, at unpredictable times.
Problem:
I am currently using one micro instance for the worker. Generating one variant of a picture can take anywhere from 3 seconds to 25-30 (it seems the first ones are done in 3, but then the micro instance slows down; I think this is due to its bursty-workload design). Anyway, when I upload 30 pictures, that means the job takes: 30 pics * 10 variants each * 30 seconds = 9000 seconds = 2.5 hours to process??!?!
Obviously this is unacceptable. I tried using a "small" instance for that; the performance is consistent there, but it's about 5 seconds per variant, so still 30*10*5 = 1500 seconds = 25 minutes per batch. Still not really acceptable.
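For reference, the arithmetic as a small helper, with an extra `workers` parameter to estimate what parallel workers would buy. The helper name is made up, and the estimate assumes every variant takes the same time and the work divides evenly across workers.

```javascript
// Estimated wall-clock seconds to process one batch.
// Each worker processes its share of the variants one after another.
function batchSeconds(pictures, variantsPerPicture, secondsPerVariant, workers = 1) {
  const totalTasks = pictures * variantsPerPicture;
  const tasksPerWorker = Math.ceil(totalTasks / workers);
  return tasksPerWorker * secondsPerVariant;
}

console.log(batchSeconds(30, 10, 30) / 3600); // 2.5 (hours, micro at its slowest)
console.log(batchSeconds(30, 10, 5) / 60); // 25 (minutes, one small instance)
console.log(batchSeconds(30, 10, 5, 10) / 60); // 2.5 (minutes, ten small instances)
```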
What is the best way to attack this problem which will get fastest results and will be price efficient at the same time?
Solutions I can think of:
Rely on Beanstalk auto-scaling. I've tried that, setting up auto scaling based on CPU utilization. That seems very slow to react and unreliable. I've tried setting the measurement period to 1 minute and the breach duration to 1 minute, with thresholds of 70% to scale up and 30% to scale down, in increments of 1 instance. It takes the system a while to scale up and then a while to scale down. I can probably fine-tune it, but it still feels weird. Ideally I would like to get a faster machine than a micro (small, medium?) to use for these spikes of work, but with Beanstalk that means I need to run at least one all the time; since the system is idle most of the time, that doesn't make any sense price-wise.
Abandon Beanstalk for the worker, implement my own monitor of the SQS queue running on a micro, and let it fire up a larger machine (or a group of larger machines) when there are enough pending messages in the queue, then terminate them the moment we detect the queue is idle. That seems like a lot of work, unless there is a ready-made solution for this out there. In any case, I lose the benefits of Beanstalk: deploying the code through git, managing environments etc.
I don't like either of these two solutions.
Is there any other nice approach I am missing?
Thanks
CPU utilization on a micro instance is probably not the best metric to use for autoscaling in this case.
Length of the SQS queue would probably be the better metric to use, and the one that makes the most natural sense.
Needless to say, if you can budget for a bigger base-line machine everything would run that much faster.