I am relatively new to programming and a beginner at AWS cloud. I was wondering if there is a service/way such that my code normally runs on a CPU instance like t3.medium, but when a certain condition is true (like a file becoming available in a bucket), my function runs on a GPU instance like g4dn.xlarge. I want to do this so that I don't incur charges for continuously running my code on a GPU instance. The function that needs to be run is taxing: it takes up to 7 minutes on Google Colab, and I might be running it 20-30 times a day. Thanks.
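One common pattern for this is an S3 event notification that invokes a small Lambda function, which in turn submits an AWS Batch job to a job queue backed by GPU instances; the GPU instance only exists while jobs are running. Below is a minimal sketch of the Lambda handler, assuming you have already created a Batch job queue and job definition (the names here are placeholders):

```python
import re

import boto3

# Placeholder names -- substitute your own job queue and job definition.
JOB_QUEUE = "gpu-job-queue"          # queue backed by a g4dn.xlarge compute environment
JOB_DEFINITION = "my-gpu-function"   # container image that runs your function

batch = boto3.client("batch")

def handler(event, context):
    """Lambda entry point, invoked by an S3 ObjectCreated event."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Batch job names only allow letters, numbers, '-' and '_'.
        job_name = ("process-" + re.sub(r"[^A-Za-z0-9_-]", "-", key))[:128]
        # Submit a Batch job; Batch launches a GPU instance, runs the
        # container against the new file, then scales back down.
        batch.submit_job(
            jobName=job_name,
            jobQueue=JOB_QUEUE,
            jobDefinition=JOB_DEFINITION,
            parameters={"bucket": bucket, "key": key},
        )
```

With the compute environment's minimum vCPUs set to 0, you pay for the g4dn.xlarge only for the minutes each job actually runs.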
I have a query that runs against a SQL Server instance and takes anywhere from 5 minutes to 75 minutes to complete. The response size is anywhere from a few rows to 1GB of data. I have a parquet writer that only has to wait until the query completes and the results are sent back, and then it writes the results to Google Cloud Storage.
What would be the best product to accomplish this, and is there one that would have roughly zero startup time? The two that came to mind for me were Cloud Functions and Cloud Run, but I've never used either.
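For what it's worth, the workload itself fits in a few lines of Python; a rough sketch, assuming pandas with a SQLAlchemy/pyodbc connection for the query and the gcsfs package so pandas can write Parquet directly to GCS (the connection string, table, and bucket below are all made up):

```python
import pandas as pd
import sqlalchemy

# Placeholder connection string -- substitute your own server and driver.
engine = sqlalchemy.create_engine(
    "mssql+pyodbc://user:password@myserver/mydb"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)

# The query may run anywhere from 5 to 75 minutes; this call simply
# blocks until SQL Server returns the full result set.
df = pd.read_sql("SELECT * FROM big_table", engine)

# Requires pyarrow and gcsfs; pandas writes the Parquet file to GCS.
df.to_parquet("gs://my-results-bucket/results.parquet")
```

Note that a 1GB result set held in a DataFrame also sets a floor on the memory whatever service you pick must provide.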
Neither service meets your requirement of 75 minutes.
Cloud Functions times out at 540 seconds.
Cloud Functions Time Limits
Cloud Run times out at 60 minutes.
Cloud Run Request Timeout
For that sort of runtime, I would launch a container on a Compute Engine instance running Container-Optimized OS.
Container-Optimized OS
There is also the possibility of configuring Cloud Run's CPU allocation (turning off CPU throttling) so that you can run tasks in the background.
Run more workloads on Cloud Run with new CPU allocation controls
Note that you will then be paying for the service on a constant basis, as it is no longer running services (containers) on demand.
I've made a docker container containing an R script which applies a random forest model to a data sample. This data sample is only 0.1% of the total data.
When running the same script on my local PC it takes 90 seconds.
I was hoping to prove that I could run this faster on AWS Batch, and then scale up the process.
However, I've tried running AWS Batch jobs on both Fargate and ECS (EC2), and the job seems to take 5 minutes even with fairly high vCPU and memory allocations.
Does anyone know how I can improve the speed of this process?
Am I using the right service to be performing this computation?
I have a large amount of images stored in an AWS S3 bucket.
Every week, I run a classification task on all these images. The way I'm doing it currently is by downloading all the images to my local PC, processing them, then making database changes once the process is complete.
I would like to reduce the amount of time spent downloading images to increase the overall speed of the classification task.
EDIT2:
I am actually required to process 20,000 images at a time to increase the performance of the classification engine. This means I can't use Lambda, since the maximum RAM available is 3GB and I need 16GB to process all 20,000 images.
The classification task uses about 16GB of RAM. What AWS service can I use to automate this task? Is there a service that can be put on the same VLAN as the S3 Bucket so that images transfer very quickly?
The entire process takes about 6 hours to do. If I spin up an EC2 instance with 16GB of RAM, it would be very cost-ineffective, as it would finish after 6 hours and then spend the remainder of the week sitting there doing nothing.
Is there a service that can automate this task in a more efficient manner?
EDIT:
Each image is around 20-40KB. The classification is a neural network, so I need to download each image so I can feed it through the network.
Multiple images are processed at the same time (batches of 20,000), but the processing part doesn't actually take that long. The longest part of the whole process is the downloading. For example, downloading takes about 5.7 hours, while processing takes about 0.3 hours in total. That is why I'm trying to reduce the downloading time.
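Since the objects are tiny (20-40KB), the 5.7 hours is mostly per-request latency rather than bandwidth, so two things tend to help: running the downloader inside AWS in the same region as the bucket, and issuing many requests concurrently. A rough sketch with boto3 and a thread pool (bucket name and prefix are made up):

```python
import os
from concurrent.futures import ThreadPoolExecutor

import boto3

BUCKET = "my-image-bucket"   # placeholder bucket name
PREFIX = "week-01/"          # placeholder prefix for this batch
LOCAL_DIR = "/data/images"

s3 = boto3.client("s3")
os.makedirs(LOCAL_DIR, exist_ok=True)

def download(key):
    s3.download_file(BUCKET, key, os.path.join(LOCAL_DIR, os.path.basename(key)))

# List the keys for this batch; list_objects_v2 returns at most
# 1,000 keys per call, so use the paginator.
keys = []
for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=PREFIX):
    keys.extend(obj["Key"] for obj in page.get("Contents", []))

# Small objects are latency-bound: dozens of concurrent downloads
# can shrink the transfer time by an order of magnitude.
with ThreadPoolExecutor(max_workers=64) as pool:
    list(pool.map(download, keys))
```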
For your purpose you can still use an EC2 instance. And if you have a large amount of data to be downloaded from S3, you can attach an EBS volume to the instance.
You need to set up the instance with all the tools and software required for running your job. When you don't have any process to run, you can shut down the instance, and boot it up again when you want to run the process.
EC2 instances are not charged for the time they are in the stopped state. You will be charged for the EBS volume and any Elastic IP attached to the instance.
You will also be charged for the storage of the EC2 image (AMI) on S3.
But I think these costs will be less than the cost of running the EC2 instance all the time.
You can schedule starting and stopping the instance using the AWS Instance Scheduler.
https://www.youtube.com/watch?v=PitS8RiyDv8
You could also use Auto Scaling, but that would be a more complex solution than using the Instance Scheduler.
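If you would rather drive the start/stop cycle yourself (from a cron job, a Lambda, or the job runner itself), it is only a few boto3 calls. A minimal sketch with a placeholder instance ID, leaving open how the job itself is kicked off:

```python
import boto3

INSTANCE_ID = "i-0123456789abcdef0"  # placeholder instance ID

ec2 = boto3.client("ec2")

def run_weekly_job():
    # Start the stopped instance; you pay for compute only while it
    # runs (EBS volumes and Elastic IPs still bill while stopped).
    ec2.start_instances(InstanceIds=[INSTANCE_ID])
    ec2.get_waiter("instance_running").wait(InstanceIds=[INSTANCE_ID])

    # ... kick off the classification job here, e.g. via SSM Run
    # Command or a startup script baked into the AMI ...

    # Stop (not terminate) the instance once the job reports done,
    # so the EBS volume and its software survive for next week.
    ec2.stop_instances(InstanceIds=[INSTANCE_ID])
```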
I would look into Kinesis streams for this, but it's hard to tell, because we don't know exactly what processing you are doing to the images.
I have set up a Batch environment with
Managed Compute environment
Job Queue
Job Definitions
The actual job (a Docker container) does a lot of video encoding and hence uses up most of the CPU. The process itself takes a few minutes (close to 5 minutes to get all the encoders initialized). Ideally I would want one job per instance so that the encoders are not CPU-starved.
My issue is that when I launch multiple jobs at the same time, or close enough, AWS Batch decides to launch both of them on the same instance, as the first container is still initializing and has not started using CPUs yet.
It seems like a race condition to me where both jobs see the instance created as available.
Is there a way I can launch one instance for each job without looking for instances that are already running? Or any other solution to lock an instance once it is designated for a particular job?
Thanks a lot for your help.
You shouldn't have to worry about separating the jobs onto different instances because the containers the jobs run in are limited in how many vCPUs they can use. For example, if you launch two jobs that each require 4 vCPUs, Batch might spin up an instance that has 8 vCPUs and run both jobs on the same instance. Each job will have access to only 4 of the vCPUs, so performance should be identical to a job running on its own with no other jobs on the instance.
However, if you still want to separate the jobs onto separate instances, you can do so by matching the vCPUs of the job with the instance type in the compute environment. For example, if you have a job that requires 4 vCPUs, you can configure your compute environment to only allow c5.xlarge instances, so each instance can run only one job. However, if you want to run other jobs with higher vCPU requirements, you would have to run them in a different compute environment.
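If you do pin one job per instance this way, the restriction lives in the compute environment's instanceTypes list. A sketch of such an environment in boto3 (the role ARNs, subnet, and security group are placeholders):

```python
import boto3

batch = boto3.client("batch")

batch.create_compute_environment(
    computeEnvironmentName="one-job-per-instance",
    type="MANAGED",
    computeResources={
        "type": "EC2",
        "minvCpus": 0,
        "maxvCpus": 16,
        # c5.xlarge has exactly 4 vCPUs, so a 4-vCPU job fills the
        # whole instance and no second job can be placed on it.
        "instanceTypes": ["c5.xlarge"],
        "subnets": ["subnet-aaaa1111"],        # placeholder
        "securityGroupIds": ["sg-bbbb2222"],   # placeholder
        "instanceRole": "ecsInstanceRole",     # placeholder
    },
    serviceRole="arn:aws:iam::123456789012:role/AWSBatchServiceRole",  # placeholder
)
```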
I'm relatively new to using AWS Batch, and have been noticing it takes a LONG time to spin up EC2 instances in a managed compute environment.
My jobs will go from Submitted > Pending > Runnable within 1 minute.
But sometimes they will sit in Runnable anywhere from 15 minutes to 1 hour before an EC2 instance finally gets around to spinning up.
Any tips and tricks on getting AWS Batch to spin up instances more quickly?
Ideally I'd like an instance the moment something's in the Runnable state.
For some more context, I am using AWS Batch essentially like Lambda, but where you choose your own instance type and storage. I can't use Lambda because the jobs need a lot more resources (GPUs) and time to process.
It would appear the scheduler takes its time based on non-transparent load at the data center.
I would love it if creating a Batch job returned an estimated TTL.
But anyways, sometimes I get machines instantly, sometimes it takes up to 15 minutes, and sometimes it will take an hour or more for newer GPU instance types, because there are not any available.
There doesn't appear to be any way to control the scheduler. Oh well.
Note: the settings below might help reduce provisioning time, but will incur additional costs.
Compute environments -> Compute resources -> Minimum vCPUs
Setting this to 1 (or more) will keep a single instance running at all times.
Compute environments -> Compute resources -> Allocation strategy
Changing this from BEST_FIT to BEST_FIT_PROGRESSIVE will also help.
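For reference, the vCPU floor can also be raised on an existing environment from boto3, while the allocation strategy is normally chosen when the compute environment is created (as in a create_compute_environment call). A small sketch with a placeholder environment name:

```python
import boto3

batch = boto3.client("batch")

# Keep at least one instance warm at all times. This removes the
# cold-start wait but bills for the instance even when idle.
batch.update_compute_environment(
    computeEnvironment="my-compute-env",   # placeholder name
    computeResources={"minvCpus": 1},
)
```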