I have a set of long-running tasks that have to run on Compute Engine and have to scale. Each task takes approximately 3 hours. To handle this I thought about using the architecture described here:
https://cloud.google.com/solutions/using-cloud-pub-sub-long-running-tasks
While it works fine, there is one huge problem: on scale-down, I'd really like to avoid it scaling down a VM whose task is currently running, since I'd potentially lose 3 hours' worth of processing.
Is there a way to ensure that the autoscaler doesn't scale down a VM that is running a long task?
EDIT: A few people have asked me to elaborate on the task. It's similar to what's described in the link above: many long-running tasks that need to run on a GPU. A chunk of data needs to be processed (video encoding), and once completed the output goes to a bucket. A job can take anywhere from 1 to 6 hours depending on the length of the video. Just like the architecture above, it would be nice to have the cluster scale up based on queue size, but when scaling down I'd like to ensure it doesn't kill currently running tasks, which is what is happening now. Because the workload is GPU-bound, I can't use the CPU-utilization metric.
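In the linked pattern the workers pull their own tasks, so one thing that protects the hours of work regardless of what the autoscaler decides is lease extension: keep renewing the Pub/Sub ack deadline while the job runs, and an unfinished task is simply redelivered to another worker if a VM is reclaimed. A minimal sketch, assuming a pull subscription named encode-tasks and a hypothetical process_video function:

```python
# Worker sketch (not the exact code from the linked article): pull one task
# at a time and keep renewing the lease while the long job runs.
# PROJECT_ID, SUBSCRIPTION, and process_video() are placeholders.
import time
from concurrent.futures import ThreadPoolExecutor

from google.cloud import pubsub_v1

PROJECT_ID = "my-project"
SUBSCRIPTION = "encode-tasks"

def process_video(data: bytes) -> None:
    """Placeholder for the 1-6 hour GPU encoding job."""

subscriber = pubsub_v1.SubscriberClient()
sub_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION)
executor = ThreadPoolExecutor(max_workers=1)

while True:
    response = subscriber.pull(subscription=sub_path, max_messages=1)
    if not response.received_messages:
        time.sleep(30)
        continue
    msg = response.received_messages[0]
    job = executor.submit(process_video, msg.message.data)
    while not job.done():
        # Renew the lease every 5 minutes; 600 s is the per-extension maximum,
        # so a killed worker's task is redelivered within ~10 minutes.
        subscriber.modify_ack_deadline(
            subscription=sub_path, ack_ids=[msg.ack_id], ack_deadline_seconds=600
        )
        time.sleep(300)
    subscriber.acknowledge(subscription=sub_path, ack_ids=[msg.ack_id])
```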
I think you should add more details about what kind of task you are running. However, as Jhon Hanley suggested, it is worth taking a look at Cloud Tasks, and also at the following documentation, which talks about the scaling risks.
Related
Let me briefly describe my use case. Assume I want to spin up a cluster with 10 workers on AWS:
In the past I always used the initial_workers: 10, min_workers: 0, max_workers: 10 options (cluster.yaml) to initially spin up the cluster to full capacity and then exploit the automated downscaling based on idle time. So at the end of a job, when almost all trials have terminated and the full capacity of the cluster is no longer needed, nodes are automatically removed.
Now, with the initial_workers option gone (#12444), it is not really clear to me how to accomplish the same downscaling behavior.
I experimented with the programmatic way to request resources (ray.autoscaler.sdk.request_resources) before and after tune.run, but this seems to be the same as setting the min_workers field, and I can only downscale the cluster after all jobs have terminated.
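For reference, this is roughly the pattern I mean; a minimal sketch, where my_trainable and the exact resource shapes are placeholders:

```python
# Sketch of the programmatic workaround described above: reserve GPU capacity
# up front, run the trials, then overwrite the request so idle nodes can be
# removed by the normal idle-timeout downscaling.
import ray
from ray import tune
from ray.autoscaler.sdk import request_resources

def my_trainable(config):
    """Placeholder for the real training function."""

ray.init(address="auto")

# Ask the autoscaler to scale out to ~10 single-GPU workers before tuning.
request_resources(bundles=[{"GPU": 1}] * 10)

tune.run(my_trainable, num_samples=100, resources_per_trial={"gpu": 1})

# Calls to request_resources overwrite the previous request, so this clears
# the reservation and lets idle nodes scale down again.
request_resources(num_cpus=0)
```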
I also tried to set upscaling_speed, but for some reason upscaling is very slow and seems to add only one node at a time (I am requesting GPUs). There is also always only one pending task, which I do not really understand yet (unfortunately I also do not have the time to investigate this fully).
Currently I am using the programmatic way described above, which works fine, but then I have a lot of idle resources at the end of the job that run for hours before I can downscale.
It would be great if someone could point me in the right direction.
Thanks!
With Ray 1.3.0 the autoscaler issues I observed seem to be resolved, and the cluster now scales with the pending trials as expected (using AWS EC2 g4dn instances). So there is no need for the initial_workers option anymore.
As far as I can tell, by default on Google Cloud (and presumably elsewhere) each vCPU = 1 hyperthread (third paragraph in the intro). From my perspective, that would suggest that unless one raises this setting to 2 or 4 vCPUs, concurrency in the code running on the Docker image achieves nothing. Is there some multithreading knowledge I'm missing that means concurrency on a single hyperthread accomplishes something? Scaling up the vCPU count isn't very attractive, as the minimum memory setting is already forced to 2 GB for 4 vCPUs.
This question is framed in terms of the Google Cloud tech stack, but is meant to cover all providers.
Do Serverless solutions ever really benefit from concurrency?
EDIT:
The accepted answer is a great first look, but I realized my assumptions above ignored context-switching idle time. For example:
If we write a backend that talks to a database, a lot of our compute time might be spent idling while waiting on database results. Context switching to the next request in that case lets us fill the CPU more efficiently.
Therefore, depending on the use case, even on a single-threaded vCPU our serverless app can benefit from concurrency.
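A toy illustration of the point (the 0.5 s sleep stands in for a database round trip; nothing provider-specific is assumed):

```python
# Eight I/O-bound "requests" on one CPU: the threads overlap their waits, so
# the batch finishes in roughly 0.5 s instead of the ~4 s a serial loop needs.
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(i: int) -> int:
    time.sleep(0.5)   # simulated database wait; the CPU is free meanwhile
    return i * i      # tiny amount of actual compute

start = time.time()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(handle_request, range(8)))
print(f"handled {len(results)} requests in {time.time() - start:.2f}s")
```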
I wrote this. From my experience: YES, you can handle several threads in parallel, and your performance increases with the number of CPUs. However, you need a process that supports multithreading.
In the case of Cloud Run, each request can be processed in a thread, so parallelization is easy.
I have a script that takes two hours to run, and I want to run it every 15 minutes as a cron job on a cloud VM.
I noticed that my CPU is often at 100% usage. Should I resize memory and/or the number of cores?
Each time you execute your cron job, a new process will be created.
So if your job takes 120 minutes (2 hours) to complete and you start a new one every 15 minutes, you will have 8 jobs running at the same time (120/15).
Thus, if the jobs are resource-intensive, you will observe issues such as 100% CPU usage.
So whether to scale up really depends on the nature of these jobs: what do they do, and how much CPU and memory do they take? Based on your description you are already running at 100% CPU often, so an upgrade would be warranted in my view.
It would depend on your cron job, but beyond resourcing your server/application, the following issues should be considered (a minimal overlap guard is sketched after this list):
Is there overlap in data? I.e., do you retrieve a pool of data that will be processed multiple times?
Will critical actions be duplicated? I.e., will a customer receive an email multiple times, or a payment be processed multiple times?
Is there a chance of a race condition that causes the script to exit early?
Will there be any collisions in the processing, i.e., duplicate bookings made, etc.?
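One common way to rule out overlapping runs entirely is a non-blocking lockfile at the top of the script; a minimal sketch (the lock path is a placeholder):

```python
# If a previous invocation still holds the lock, exit instead of piling up
# overlapping two-hour jobs. POSIX-only; /tmp/myjob.lock is a placeholder.
import fcntl
import sys

lock_file = open("/tmp/myjob.lock", "w")
try:
    fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
except BlockingIOError:
    sys.exit("previous run still in progress; skipping this invocation")

# ... the two-hour job runs here; the lock is released when the process exits
```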
You will need to increase the CPU and memory specification of your VM instance (in GCP) due to the high CPU load. Document [1] covers upgrading the machine type of your VM instance; to do this you need to shut down the VM instance and change its machine type.
To learn about the different machine types in GCP, please see link [2].
On the other hand, you can autoscale based on average CPU utilization if you use a managed instance group (MIG) [3]. Using this policy tells the autoscaler to collect the CPU utilization of the instances in the group and determine whether it needs to scale. You set the target CPU utilization the autoscaler should maintain, and the autoscaler works to keep it at that level (a rough sketch follows the links below).
[1] https://cloud.google.com/compute/docs/instances/changing-machine-type-of-stopped-instance
[2] https://cloud.google.com/compute/docs/machine-types
[3] https://cloud.google.com/compute/docs/autoscaler/scaling-cpu-load-balancing#scaling_based_on_cpu_utilization
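For a rough idea of what [3] amounts to in code, here is a hedged sketch using the google-cloud-compute client; the project, zone, and resource names are placeholders, and the field names reflect my reading of that library rather than anything stated above:

```python
# Attach an autoscaler targeting 75% average CPU to an existing MIG.
# Project, zone, and resource names below are placeholders.
from google.cloud import compute_v1

project, zone = "my-project", "us-central1-a"
mig_url = (
    f"https://www.googleapis.com/compute/v1/projects/{project}"
    f"/zones/{zone}/instanceGroupManagers/my-mig"
)

autoscaler = compute_v1.Autoscaler(
    name="my-mig-autoscaler",
    target=mig_url,
    autoscaling_policy=compute_v1.AutoscalingPolicy(
        min_num_replicas=1,
        max_num_replicas=10,
        cpu_utilization=compute_v1.AutoscalingPolicyCpuUtilization(
            utilization_target=0.75  # the CPU level the autoscaler maintains
        ),
    ),
)

compute_v1.AutoscalersClient().insert(
    project=project, zone=zone, autoscaler_resource=autoscaler
)
```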
I've got an application built in Node.js that is primarily used to post photos (up to 25 MB each). The app resizes each photo to thumbnail size and moves both the thumbnail and the full-size image to S3. Uploads usually come in bursts of 10-15 pictures, rinse, wash, repeat, in 5-minute stretches. I'm seeing a lot of scaling, and the trigger is the default 6 MB NetworkOut trigger. My question is: does moving the photos to S3 count as NetworkOut? Or should I consider a different scaling trigger? So far the app hasn't stuttered, so I'm hesitant to fix what ain't broken, but I am seeing quite a bit of scaling, so I thought I would investigate. Thanks for any help!
The short answer: scale whenever a resource is constrained. E.g., if your instances can't keep up with network I/O, or CPU is above 80%, then scale. And yes, sending any data from your EC2 instance counts as NetworkOut traffic; you've got to get that data from point A to B somehow :)
As you go up in size on EC2 instances you get more memory and CPU, along with more network I/O. If you don't see issues with transfers, you may want to switch the auto scaling trigger over to watch CPU or memory. In an app I'm working on, users can start jobs that require a fair bit of CPU, so I have my auto scaling set to trigger when CPU is over 80%. But you might have a process that consumes a lot of memory and not much CPU...
On a side note, you may want to think about having your uploads go directly to your S3 bucket and using a Lambda to trigger the resize routine. This has several advantages over your current design: http://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html
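To make the direct-to-S3 idea concrete, here's a hedged sketch of the server-side half using boto3 (the bucket name and limits are placeholders; the linked AWS example covers the Lambda resize side):

```python
# Hand the browser a presigned POST so photo bytes go straight to S3 instead
# of through the app server; an S3 event notification can then invoke the
# resize Lambda. Bucket name and key prefix are placeholders.
import boto3

s3 = boto3.client("s3")

presigned = s3.generate_presigned_post(
    Bucket="my-photo-bucket",
    Key="uploads/${filename}",  # S3 substitutes the client's file name
    Conditions=[["content-length-range", 0, 25 * 1024 * 1024]],  # 25 MB cap
    ExpiresIn=300,  # the form is valid for 5 minutes
)
# Return presigned["url"] and presigned["fields"] to the client, which then
# uploads with a multipart/form-data POST directly to S3.
```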
I suggest getting familiar with the instance metrics. You can then recognize your app-specific bottlenecks on the current instance type and count.
https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/health-enhanced-metrics.html
I'd like to be able to create a "job" that will execute at an arbitrary time in the future... let's say 1 year from now. I'm trying to come up with a stable, distributed system that doesn't rely on me maintaining a server and scheduling code. (Obviously, I'll have to maintain the servers that execute the job.)
I realize I can poll SimpleDB every few seconds and check whether there's anything that needs to be executed, but this seems very inefficient. Ideally I could create an Amazon SNS topic that would fire at the appropriate time, but I don't think that's possible.
Alternatively, I could create a message in Amazon SQS that would not be visible for 1 year. After 1 year it becomes visible, and my polling code picks it up and executes it.
It would seem this is a topic, like singletons or inversion of control, that PhDs have discussed and come up with best practices for. I can't find the articles, if there are any.
Any ideas?
Cheers!
The easiest way for most people to do this would be to run an EC2 server with a cron job on it to trigger the action. However, the cost of running an EC2 server 24 hours a day for a year just to trigger an action would be around $170 at the cheapest (t1.micro with a Heavy Utilization Reserved Instance). Plus, you have to monitor that server and recover from failures.
I have sketched out a different approach to running jobs on a schedule that uses AWS resources entirely. It's a bit more work, but it does not have the expense or maintenance issues of running a dedicated EC2 instance.
You can set up an Auto Scaling schedule (in cron format) to start an instance at some point in the future, or on a recurring schedule (e.g., nightly). When you set this up, you specify the job to be run in a user-data script in the launch configuration.
I've written out sample commands in the following article, along with special settings you need to take care of for this to work with Auto Scaling:
Running EC2 Instances on a Recurring Schedule with Auto Scaling
http://alestic.com/2011/11/ec2-schedule-instance
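As a rough illustration of the scheduled action itself (not the full recipe from the article), here is a hedged boto3 sketch; the group name, action name, and schedule are placeholders:

```python
# Scale a normally-empty Auto Scaling group up to one instance on a recurring
# schedule; the user-data job shuts the instance down when it finishes.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="nightly-job-asg",   # placeholder group name
    ScheduledActionName="start-nightly-job",
    Recurrence="0 3 * * *",                   # cron format: 03:00 UTC daily
    MinSize=0,
    MaxSize=1,
    DesiredCapacity=1,
)
```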
With this approach, you only pay for EC2 instance hours while the job is actually running, and the server can shut itself down afterwards.
This wouldn't be a reasonable way to schedule tens of thousands of emails, each with its own timer, but it can make a lot of sense for large, infrequent jobs (from a few times a day to once per year).
I think it really depends on what kind of job you want to execute in 1 year, and whether that value (1 year) is actually hypothetical. There are many ways to schedule a task; Windows and Linux both offer a scheduling service: Task Scheduler on Windows and crontab on Linux. In addition to those operating-system-specific solutions, you can use maintenance tasks on MS SQL Server, and I'm sure many of the larger databases have similar features.
Without knowing more about what you plan on doing, it's hard to suggest further alternatives, since many of the other solutions would be specific to the technologies and platforms you plan on using. If you provide more insight into what you're going to be doing with these tasks, I'd be more than happy to expand my answer to be more helpful.