How to use AWS Batch to increase computation speed - amazon-web-services

I've made a docker container containing an R script which applies a random forest model to a data sample. This data sample is only 0.1% of the total data.
When running the same script on my local PC it takes 90 seconds.
I was hoping to prove that I could run this faster on AWS Batch and then scale up the process.
However, I've tried running both Fargate and ECS AWS Batch jobs, and the job takes about 5 minutes even with fairly high vCPU and memory settings.
Does anyone know how I can improve the speed of this process?
Am I using the right service to be performing this computation?

Related

Run a single job on GPU else use CPU in AWS

I am relatively new to programming and a beginner with the AWS cloud. I was wondering if there is a service or way for my code to normally run on a CPU instance like t3.medium in AWS, but when a certain condition is true (like a file becoming available in a bucket), to run my function on a GPU instance like g4dn.xlarge. I want to do this so that I don't incur charges for continuously running my code on a GPU instance. The function that needs to be run is taxing, as it takes up to 7 minutes on Google Colab. I might be using it 20-30 times a day. Thanks.

Suggested way to write a queryset to Parquet in GCP

I have a query that runs against a SQL Server instance and takes anywhere from 5 minutes to 75 minutes to complete. The response size is anywhere from a few rows to 1GB of data. I have a Parquet writer that only has to wait until the query completes and the results are sent back, and then it writes the results to Google Cloud Storage.
What would be the best product to accomplish this, and is there one that would have roughly zero startup time? The two that came to mind for me were Cloud Functions and Cloud Run, but I've never used either.
Neither service meets your requirement of 75 minutes.
Cloud Functions times out at 540 seconds.
Cloud Functions Time Limits
Cloud Run times out at 60 minutes.
Cloud Run Request Timeout
For that sort of runtime, I would launch a container in Compute Engine Container-Optimized OS.
Container-Optimized OS
It is also possible to configure Cloud Run CPU throttling so that CPU stays allocated and you can run tasks in the background.
Run more workloads on Cloud Run with new CPU allocation controls
Note that in this mode you pay for the service continuously, because the containers are no longer run only on demand.
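To make the comparison above concrete, here is a minimal sketch that checks a worst-case runtime against the two limits quoted in this answer (540 seconds for Cloud Functions, 60 minutes for a Cloud Run request). The function name and the returned labels are made up for illustration, and the limits are simply the figures stated above, so check them against the current documentation before relying on them.

```python
# Limits as quoted in the answer above (assumed, not queried from any API).
CLOUD_FUNCTIONS_MAX_SECONDS = 540
CLOUD_RUN_REQUEST_MAX_SECONDS = 60 * 60


def services_that_fit(worst_case_runtime_seconds):
    """Return the options whose time limit covers the worst-case runtime."""
    options = []
    if worst_case_runtime_seconds <= CLOUD_FUNCTIONS_MAX_SECONDS:
        options.append("Cloud Functions")
    if worst_case_runtime_seconds <= CLOUD_RUN_REQUEST_MAX_SECONDS:
        options.append("Cloud Run")
    # A container on a Compute Engine VM has no request timeout of this kind.
    options.append("Compute Engine (Container-Optimized OS)")
    return options
```

For the 75-minute worst case in the question, `services_that_fit(75 * 60)` leaves only the Compute Engine option, which matches the recommendation above.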

AWS Batch permits approx 25 concurrent jobs in array configuration while compute environment allows using 256 CPU

I am running a job array on AWS Batch using a Fargate Spot compute environment.
The main goal is to do some work as quickly as possible. So, when I run 100 jobs I expect that all of these jobs will be run simultaneously.
But only approximately 25 of them start immediately; the rest of the jobs wait in RUNNABLE status.
The jobs run on a compute environment with a maximum of 256 vCPUs. Each job uses 1 vCPU or even less.
I haven't found any limits or quotas that can influence the process of running jobs.
What could be the cause?
I've talked with AWS Support, and they advised me not to use Fargate when I need to process a lot of jobs as quickly as possible.
For large-scale job processing, an On-Demand compute environment is recommended.
So, after I changed the provisioning model to On-Demand, the number of concurrent jobs grew up to the CPU limit set in the compute environment, which was what I needed.
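For reference, a job array like the one in the question is submitted with a single `submit_job` call that carries `arrayProperties`. Below is a minimal sketch using boto3; the helper names are hypothetical, and the queue and job definition are placeholders you would replace with your own. How many children actually run at once still depends on the compute environment behind the queue, as described above.

```python
import os


def build_array_job_request(job_name, job_queue, job_definition, size):
    """Build keyword arguments for the AWS Batch SubmitJob call."""
    # AWS Batch accepts array sizes from 2 to 10,000.
    if not 2 <= size <= 10000:
        raise ValueError("array size must be between 2 and 10,000")
    return {
        "jobName": job_name,
        "jobQueue": job_queue,            # placeholder: your job queue
        "jobDefinition": job_definition,  # placeholder: your job definition
        "arrayProperties": {"size": size},
    }


def submit_array_job(request):
    """Submit the array job and return its job ID (needs AWS credentials)."""
    import boto3  # imported here so the request builder works without boto3

    batch = boto3.client("batch")
    return batch.submit_job(**request)["jobId"]


def child_index():
    """Inside a child job, AWS Batch sets this variable to 0..size-1."""
    return int(os.environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
```

Each child can use `child_index()` to pick its own slice of the work, so the 100 jobs do not need 100 separate submissions.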

What AWS service can I use to efficiently process large amounts of S3 data on a weekly basis?

I have a large number of images stored in an AWS S3 bucket.
Every week, I run a classification task on all these images. The way I'm doing it currently is by downloading all the images to my local PC, processing them, then making database changes once the process is complete.
I would like to reduce the amount of time spent downloading images to increase the overall speed of the classification task.
EDIT2:
I am actually required to process 20,000 images at a time to increase the performance of the classification engine. This means I can't use Lambda, since the maximum RAM option available is 3GB and I need 16GB to process all 20,000 images.
The classification task uses about 16GB of RAM. What AWS service can I use to automate this task? Is there a service that can be put on the same VLAN as the S3 Bucket so that images transfer very quickly?
The entire process takes about 6 hours to do. If I spin up an EC2 with 16GB of RAM it would be very cost ineffective as it would finish after 6 hours then spend the remainder of the week sitting there doing nothing.
Is there a service that can automate this task in a more efficient manner?
EDIT:
Each image is around 20-40KB. The classification is a neural network, so I need to download each image so I can feed it through the network.
Multiple images are processed at the same time (batches of 20,000), but the processing part doesn't actually take that long. The longest part of the whole process is the downloading part. For example, downloading takes about 5.7 hours, processing takes about 0.3 hours in total. Hence why I'm trying to reduce the amount of downloading time.
For your purpose you can still use an EC2 instance. And if you have a large amount of data to download from S3, you can attach an EBS volume to the instance.
You need to set up the instance with all the tools and software required for running your job. When you don't have any process to run, you can shut down the instance, and boot it up again when you want to run the process.
EC2 instances are not charged for the time they are in the stopped state. You will be charged for the EBS volume and any Elastic IP attached to the instance.
You also will be charged for the storage of the EC2 image on S3.
But I think these costs will be less than the cost of running the EC2 instance all the time.
You can schedule starting and stopping the instance using the AWS Instance Scheduler.
https://www.youtube.com/watch?v=PitS8RiyDv8
You can also use Auto Scaling, but that would be a more complex solution than using the Instance Scheduler.
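If you would rather script the start and stop yourself than use a scheduler, the pattern described above can be sketched with boto3 roughly as follows. The function name is hypothetical, `ec2` is expected to be a `boto3.client("ec2")`, and `job` is a placeholder for however you trigger the classification on the instance and wait for it to finish.

```python
def run_job_on_stopped_instance(ec2, instance_id, job):
    """Start a stopped EC2 instance, run job(), then always stop it again.

    ec2: an EC2 client, e.g. boto3.client("ec2")
    instance_id: ID of the pre-configured instance
    job: hypothetical callable that triggers the work and blocks until done
    """
    ec2.start_instances(InstanceIds=[instance_id])
    try:
        # Block until the instance reports the "running" state.
        ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
        return job()
    finally:
        # Stop the instance even if the job failed, so it does not keep billing.
        ec2.stop_instances(InstanceIds=[instance_id])
```

The `finally` block is the important part: the instance is stopped whether the weekly job succeeds or raises, so compute charges end with the job.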
I would look into Kinesis streams for this, but it's hard to tell because we don't know exactly what processing you are doing to the images.

Using ECS Fargate to run long processes

I'm migrating software that I use to extract images from documents from iron.io to ECS Fargate, and the start of a container in EC2 is very slow; sometimes it takes 3 minutes for a container to change state from PENDING to RUNNING. Is it possible to improve this speed? I've searched on this subject, and there is a lack of information about why it sometimes takes so much time while at other times it is faster (but still very slow).
The majority of the time spent transitioning an ECS Fargate task from PENDING to RUNNING will be spent pulling images over the network from a Docker repository. Improving pull speed will most likely involve examining the images being pulled and trying to reduce layer sizes if at all possible.
If your service is behind an ALB, you can also try adjusting the health check period and the healthy threshold counts to a lower number.
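As a rough illustration of the health check point, a new target is only marked healthy after a number of consecutive successful checks, so the wait is approximately the check interval multiplied by the healthy threshold. The sketch below assumes that "health check period" maps to the target group's `HealthCheckIntervalSeconds` and "healthy threshold" to `HealthyThresholdCount`, both parameters of the `modify_target_group` call in boto3's `elbv2` client. The helper names and example values are hypothetical.

```python
def approx_seconds_to_healthy(interval_seconds, healthy_threshold):
    """Rough lower bound on how long a new target waits to be marked healthy."""
    return interval_seconds * healthy_threshold


def build_health_check_update(target_group_arn, interval_seconds, healthy_threshold):
    """Build keyword arguments for elbv2.modify_target_group."""
    return {
        "TargetGroupArn": target_group_arn,  # placeholder: your target group
        "HealthCheckIntervalSeconds": interval_seconds,
        "HealthyThresholdCount": healthy_threshold,
    }


def apply_health_check_update(update):
    """Apply the update (needs AWS credentials and a real target group ARN)."""
    import boto3  # imported here so the helpers above work without boto3

    boto3.client("elbv2").modify_target_group(**update)
```

For example, an interval of 30 seconds with a threshold of 5 gives about 150 seconds before traffic is routed, while 10 seconds with a threshold of 2 gives about 20 seconds. Note that the health check timeout has to stay below the interval when you lower it.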