AWS batch - how to limit number of concurrent jobs - amazon-web-services

I am looking for a way to limit the number of batch jobs that are running by holding the remaining jobs in the queue. Is it possible with aws batch?

Limiting the maximum number of vCPUs (maxvCpus) of the managed compute environment the queue is tied to will effectively limit the number of Batch jobs running concurrently on that queue.
However, this comes with the caveat that, if you have other queues sharing this compute environment, they would also be limited accordingly. Moreover, if you have multiple compute environments associated with that queue you are attempting to limit, Batch will eventually begin scheduling jobs on the secondary compute environments if there are enough jobs waiting in the RUNNABLE state.
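As a minimal sketch of this approach, the vCPU cap can be set with boto3's `update_compute_environment` call. The compute environment name, the cap value, and the per-job vCPU count below are placeholder assumptions:

```python
def build_ce_update(compute_env_name, max_vcpus):
    """Build the update_compute_environment request that caps concurrency.

    If each job definition requests e.g. 2 vCPUs, maxvCpus=16 allows at
    most 8 jobs to run at once; the rest wait in the queue in RUNNABLE.
    """
    return {
        "computeEnvironment": compute_env_name,
        "computeResources": {"maxvCpus": max_vcpus},
    }

# "my-compute-env" is a placeholder name
request = build_ce_update("my-compute-env", 16)
# To apply it (requires AWS credentials):
# import boto3
# boto3.client("batch").update_compute_environment(**request)
```

Note that this caps vCPUs, not jobs directly, so the effective job limit depends on how many vCPUs each job definition requests.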

Related

Running multiple jobs in SageMaker

I was wondering if it is possible to run a large number of "jobs" (or "pipelines", or whatever the right construct is) to execute some modelling tasks in parallel.
My plan is to run an ETL process and EDA first, and once the data is ready, fire 2,000 modelling jobs. We have 2,000 products, and each job starts from its own slice of the data (SELECT * FROM DATA WHERE PROD_ID='xxxxxxxxx'). My idea is to run these training jobs in parallel, since there are no dependencies between them.
First of all: 1) Can it be done in AWS SageMaker? 2) What would be the right approach? 3) Are there any special considerations I need to be aware of?
Thanks a lot in advance!
It's possible to run this on SageMaker with SageMaker Pipelines, which will orchestrate a SageMaker Processing job followed by a Training job. You can define PROD_ID as a string parameter of the SageMaker pipeline, then run multiple pipeline executions concurrently (the default soft limit is 200 concurrent executions).
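A sketch of the per-product fan-out, building one `StartPipelineExecution` request per product for boto3's SageMaker client. The pipeline name and the parameter name "ProdId" are assumptions; substitute the names from your own pipeline definition:

```python
def build_pipeline_executions(pipeline_name, prod_ids):
    """One StartPipelineExecution request per product, passing the product
    id as the pipeline's string parameter ("ProdId" is a placeholder name)."""
    return [
        {
            "PipelineName": pipeline_name,
            "PipelineParameters": [{"Name": "ProdId", "Value": pid}],
        }
        for pid in prod_ids
    ]

requests = build_pipeline_executions("modelling-pipeline", ["P001", "P002"])
# To launch (mind the ~200 concurrent-execution soft limit):
# import boto3
# sm = boto3.client("sagemaker")
# for req in requests:
#     sm.start_pipeline_execution(**req)
```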
As you have a very high number of jobs (2,000) which you want to run in parallel, and perhaps want to optimize compute usage, you might also want to look at AWS Batch, which lets you queue up tasks for a fleet of instances that starts containers to perform these jobs. AWS Batch also supports Spot Instances, which could reduce your instance cost by 70%-90%. Another advantage of AWS Batch is that jobs reuse the same running instance (only the container stops and starts), while in SageMaker there is a ~2 minute overhead to start an instance per job. Additionally, AWS Batch takes care of retries and allows you to chain all 2,000 jobs together and run a "finisher" job when all of them have completed.
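One way to express the 2,000-jobs-plus-finisher pattern in Batch is an array job with a dependent job, sketched here as boto3 `submit_job` payloads. The queue and job definition names are placeholders:

```python
def build_array_job(job_queue, job_def, size):
    """SubmitJob request for an array job: Batch launches `size` child jobs,
    each seeing its own AWS_BATCH_JOB_ARRAY_INDEX to pick its product."""
    return {
        "jobName": "train-all-products",
        "jobQueue": job_queue,
        "jobDefinition": job_def,
        "arrayProperties": {"size": size},
    }

def build_finisher(job_queue, job_def, array_job_id):
    """SubmitJob request for a job depending on the whole array job:
    it starts only after all children have completed successfully."""
    return {
        "jobName": "finisher",
        "jobQueue": job_queue,
        "jobDefinition": job_def,
        "dependsOn": [{"jobId": array_job_id}],
    }

# To submit (placeholder queue/definition names):
# import boto3
# batch = boto3.client("batch")
# resp = batch.submit_job(**build_array_job("train-queue", "train-def", 2000))
# batch.submit_job(**build_finisher("train-queue", "finish-def", resp["jobId"]))
```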
Limits increase - For any of these services, you'll need to increase your service quota limits. This can be done from the Service Quotas console for most services, or by contacting AWS Support. Some services have hard limits.

It may take a long time to start a data flow job

When I start a Dataflow job, it sometimes waits for more than 30 minutes without being allocated an instance.
What is happening?
Your Dataflow job is slow to start because the time needed to start the VMs on Google Compute Engine grows with the number of VMs you start, and in general VM startup and shutdown performance can have high variance.
You can look at the Cloud Logs for your job ID to see if there is any logging going on, and you can also check the Dataflow monitoring interface for your Dataflow job.[1]
You can enable autoscaling[2] instead of specifying a large number of instances manually; it should gradually scale to the appropriate number of VMs at the appropriate moment in the job's lifetime.
Without autoscaling, you have to choose a fixed number of workers to execute your pipeline. As the input workload varies over time, this number can become either too high or too low. Provisioning too many workers results in unnecessary extra cost, and provisioning too few workers results in higher latency for processed data. By enabling autoscaling, resources are used only as they are needed.
The objective of autoscaling is to minimize backlog while maximizing worker utilization and throughput, and quickly react to spikes in load.
[1] https://cloud.google.com/dataflow/docs/guides/using-monitoring-intf
[2] https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline#streaming-autoscaling
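Concretely, autoscaling is switched on through pipeline options rather than code changes. A minimal sketch of the relevant flags for a Beam Python pipeline (the worker cap of 20 is an arbitrary example value):

```python
def dataflow_autoscaling_args(max_workers):
    """Pipeline options enabling Dataflow's throughput-based autoscaling
    instead of pinning a fixed worker count with --num_workers."""
    return [
        "--runner=DataflowRunner",
        "--autoscaling_algorithm=THROUGHPUT_BASED",
        f"--max_num_workers={max_workers}",
    ]

args = dataflow_autoscaling_args(20)
# To use (requires apache-beam[gcp] plus project/region/staging options):
# import apache_beam as beam
# from apache_beam.options.pipeline_options import PipelineOptions
# with beam.Pipeline(options=PipelineOptions(args)) as p:
#     ...
```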

How are shards from a Kinesis stream assigned to multiple instances of a Kinesis consumer?

I have a setup with a Kinesis stream with 20 shards that is consumed by a KCL-based consumer. The consumer is deployed in ECS with 20 instances (meaning multiple KCL instances?).
What I believed would happen in this scenario is:
Each instance would create 20 worker threads, one for each shard, independently of the other instances.
So at any given time, a shard would have 20 separate threads connecting to it.
The same set of records would get processed by each instance (ie: duplicate record processing will not be handled across the instances)
This would also exceed the per-shard consumer rate limit (5 transactions per second).
Running a single instance of my consumer is sufficient. In other words, scaling the consumer across multiple instances will not have any benefits at all.
This answer seems to suggest that the "shard lease" would ensure that a shard is only processed by a single instance. However, the second answer there says that "A KCL instance will only start one process per shard, but you can have another KCL instance consuming the same stream (and shard), assuming the second one has permission.".
Further this documentation suggests "Increasing the number of instances up to the maximum number of open shards" as a possible scale-up approach which contradicts some of the above points.
How do the consumer instances actually function in this scenario?
What would happen in the scenario you describe is that each of the 20 workers will eventually only process 1 shard.
At startup, each worker will try to claim as many shards as possible by creating leases for them. When all 20 workers start simultaneously, they will all try to create leases for the 20 shards, but not all of them will succeed. One worker may end up with e.g. 5 shards, and others with 2 or 3. After a few iterations of lease taking, though, each worker should hold only 1 shard. This way the AWS rate limits are respected.
While this balancing process happens, it is possible for two workers to briefly process the same records twice. This happens in the window between one worker stealing a lease from another and the original worker trying to update the lease, either by periodic refreshing or by checkpointing, and discovering that it has been taken.
After this initial lease division, though, this will not happen anymore. When the workers are restarted, they resume the leases they had previously. But when a worker is down for a long time, other workers will take over its leases.
Kinesis has an at-least-once processing model because of this. It is best to design your application so that operations on the data are idempotent.
Scaling is useful if you want to be fault-tolerant (other workers will take over from a failed worker) or your data processing is so time-consuming that one worker would not be able to cope with 20 shards. Scaling beyond the number of shards is indeed only useful for fault-tolerance purposes.
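The lease-taking dynamic above can be illustrated with a toy model. This is not the actual KCL algorithm (which coordinates through a DynamoDB lease table), just a sketch of the idea that under-loaded workers steal leases from over-loaded ones until the distribution evens out:

```python
import random

def balance_leases(num_workers, num_shards, rounds=10):
    """Toy model of lease balancing: each round, a worker holding fewer
    than its fair share steals one lease from the busiest worker."""
    leases = {w: set() for w in range(num_workers)}
    # Startup scramble: shards land unevenly on workers.
    for shard in range(num_shards):
        leases[random.randrange(num_workers)].add(shard)
    target = num_shards // num_workers  # fair share: 20 shards / 20 workers = 1
    for _ in range(rounds):
        for w in leases:
            if len(leases[w]) < target:
                busiest = max(leases, key=lambda x: len(leases[x]))
                if len(leases[busiest]) > target:
                    leases[w].add(leases[busiest].pop())
    return leases

random.seed(0)
final = balance_leases(20, 20)
# After balancing, every worker holds exactly one shard.
```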

Scheduling long-running tasks using AWS services

My application relies heavily on AWS services, and I am looking for an optimal solution based on them. The web application triggers a scheduled job (assume it repeats indefinitely) which requires a certain amount of resources to perform. A single run of the task normally takes at most 1 minute.
Current idea is to pass jobs via SQS and spawn workers on EC2 instances depending on the queue size. (this part is more or less clear)
But I struggle to find a proper solution for actually triggering the jobs at certain intervals. Assume we are dealing with 10,000 jobs. Having a scheduler run 10k cron jobs (each job itself is quite simple: just passing the job description via SQS) at the same time seems like a crazy idea. So the actual question is: how do I autoscale the scheduler itself (given scenarios where the scheduler is restarted, a new instance is created, etc.)?
Or is the scheduler redundant as an app, and is it wiser to rely on AWS Lambda functions (or other services providing scheduling)? The problem with using Lambda functions is their limitations: the minimum 128 MB of memory provided by a single function is actually too much (20 MB seems like more than enough).
Alternatively, the worker itself can wait for a certain amount of time and notify the scheduler that it should trigger the job one more time. Let's say if the frequency is 1 hour:
1. Scheduler sends job to worker 1
2. Worker 1 performs the job and after one hour sends it back to Scheduler
3. Scheduler sends the job again
The issue here, however, is the possibility that the worker gets scaled in.
Bottom line: I am trying to achieve a lightweight scheduler which would not require autoscaling and would serve as a hub with the sole purpose of transmitting job descriptions. And it certainly should not get throttled on service restart.
Lambda is perfect for this. You have a lot of short-running processes (~1 minute), and Lambda is made for short processes (up to five minutes nowadays). It is very important to know that CPU speed is coupled linearly to RAM: a 1 GB Lambda function is roughly equivalent to a t2.micro instance if I recall correctly, and 1.5 GB of RAM means 1.5x more CPU speed. The cost of these functions is so low that you can just execute them. A 128 MB function has 1/8 the CPU speed of a micro instance, so I do not actually recommend using those.
As a queueing mechanism you can use S3 (yes you read that right). Create a bucket and let the Lambda worker trigger when an object is created. When you want to schedule a job, put a file inside the bucket. Lambda starts and processes it immediately.
Now you have to respect some limits. This way you can only have 100 workers at the same time (the default limit on concurrent Lambda executions), but you can ask AWS to increase this.
The costs are as follows:
$0.005 per 1,000 PUT requests, so $5 per million job requests (this is more expensive than SQS).
The Lambda runtime. Assuming normal t2.micro CPU speed (1 GB RAM), this costs $0.0001 per job (60 seconds; the first 300,000 seconds are free = 5,000 jobs).
The Lambda requests. $0.20 per million triggers (first million is free)
This setup does not require any servers on your part. This cannot go down (only if AWS itself does).
(don't forget to delete the job from S3 when you're done)
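The S3-triggered worker described above could look roughly like the handler below. This is a sketch under assumptions: the object body is JSON, the bucket/key names are placeholders, and the S3 client is injectable so the handler can be exercised without AWS (in Lambda you would pass `boto3.client("s3")`):

```python
import json
import urllib.parse

def handler(event, context=None, s3_client=None):
    """Sketch of the S3-triggered Lambda worker: read the job description
    from the newly created object, run the ~1 minute job, then delete the
    object so it leaves the "queue"."""
    results = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        if s3_client is not None:
            body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
            job = json.loads(body)
            # ... perform the job described by `job` here ...
            s3_client.delete_object(Bucket=bucket, Key=key)  # done: remove from queue
        results.append((bucket, key))
    return results
```

To schedule a job, you would `put_object` a JSON job description into the bucket; S3's object-created notification then invokes this handler.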

What happens if the number of workers is > number of shards when using KCL with AWS Kinesis streams?

The AWS Kinesis stream documentation mentions
Typically, when you use the KCL, you should ensure that the number of instances does not exceed the number of shards
What would be the consequence if the number of instances exceeds the number of shards? I plan on running one worker per Web server (separate thread). So I want to know whether it is required to check and compare the number of shards and running workers when a new web server instance is started. Or can one just start another worker without any side effect if the number of workers exceeds the number of shards.
TL;DR: There can only be one worker per shard. Any additional workers will sit idle.
If you have a Kinesis stream with two shards and you run an app on a single instance that leverages the KCL, the app will run two workers in separate threads: one worker per shard (per thread).
If you run two instances, your app will run a single worker on each instance, in a thread: two instances, one worker each, for one Kinesis stream with two shards.
Each worker takes out a lease against a shard in the stream so that no other worker of the same app can read the same shard. The worker stores the lease information in DynamoDB so other workers can read it.
If you were to run 3 instances in this scenario, one of the instances would sit around waiting for a worker on one of the other instances to lose its lease. Once that happens, the third worker could pick up the shard and begin processing.