Run multi-threaded processes on AWS instances - amazon-web-services

I have a node group in my EKS cluster with m5.2xlarge instances (6 desired instances, each with 8 cores and 32 GB RAM).
I want to run multi-threaded Gradle test processes in this node group.
The Gradle build detects the available cores and assigns threads, so tests run in parallel according to the number of cores.
On macOS and Windows my tests run in parallel as expected, but on Kubernetes, running on these instances, only one test runs at a time; there is no parallelism.
htop from inside the pod itself shows 8 cores, with average CPU usage around 3% and average memory usage around 5%!
Any suggestion on how to run multi-threaded processes on these AWS instances would be much appreciated.
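One thing worth checking (not something stated in the question) is how many CPUs the container is actually allowed to use: if the pod spec carries a CPU limit, the cgroup quota can make the JVM, and therefore Gradle's test parallelism, believe far fewer cores are available than htop shows. A minimal Python sketch of that check, run from inside the pod; the cgroup paths are the standard v1/v2 locations, nothing EKS-specific:

```python
# Minimal sketch: compare the host's visible core count with the cgroup CPU
# quota the container is actually allowed to use (cgroup v1 and v2 paths).
import os

def cgroup_cpu_limit():
    # cgroup v2: a single "cpu.max" file containing "<quota> <period>" or "max <period>"
    try:
        with open("/sys/fs/cgroup/cpu.max") as f:
            quota, period = f.read().split()
            if quota != "max":
                return float(quota) / float(period)
    except FileNotFoundError:
        pass
    # cgroup v1: separate quota/period files; quota == -1 means "no limit"
    try:
        with open("/sys/fs/cgroup/cpu/cpu.cfs_quota_us") as f:
            quota = int(f.read())
        with open("/sys/fs/cgroup/cpu/cpu.cfs_period_us") as f:
            period = int(f.read())
        if quota > 0:
            return quota / period
    except FileNotFoundError:
        pass
    return None  # no CPU limit detected

print("os.cpu_count():", os.cpu_count())        # what htop reports (host cores)
print("cgroup CPU limit:", cgroup_cpu_limit())  # what the pod may actually get
```

If the computed limit comes out much lower than 8, raising the pod's CPU request/limit or setting the fork count explicitly on the Gradle test task (maxParallelForks) would be the first things to try.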

Related

AWS Batch permits approx. 25 concurrent jobs in an array configuration while the compute environment allows up to 256 vCPUs

I am running a job array on AWS Batch using a Fargate Spot environment.
The main goal is to get some work done as quickly as possible, so when I run 100 jobs I expect all of them to run simultaneously.
But only about 25 of them start immediately; the rest of the jobs wait in RUNNABLE status.
The jobs run on a compute environment with a maximum of 256 vCPUs, and each job uses 1 vCPU or less.
I haven't found any limits or quotas that could be affecting how the jobs are scheduled.
What could be the cause?
I've talked with AWS Support and they advised me not to use Fargate when I need to process a lot of jobs as quickly as possible.
For large-scale job processing, an On-Demand solution is recommended.
So, after changing the provisioning model to On-Demand, the number of concurrent jobs grew up to the vCPU limit defined in the compute environment settings, which was exactly what I needed.
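For reference, a minimal boto3 sketch of what that change amounts to, assuming "On-Demand" here means an EC2 On-Demand compute environment; every identifier below (names, subnet, security group, roles) is a placeholder rather than anything from the question:

```python
# Minimal sketch (boto3): a managed compute environment backed by EC2 On-Demand
# instead of Fargate Spot, capped at the same 256 vCPUs. All identifiers are placeholders.
import boto3

batch = boto3.client("batch")

batch.create_compute_environment(
    computeEnvironmentName="jobs-on-demand",
    type="MANAGED",
    state="ENABLED",
    computeResources={
        "type": "EC2",                              # On-Demand EC2 rather than FARGATE_SPOT
        "allocationStrategy": "BEST_FIT_PROGRESSIVE",
        "minvCpus": 0,
        "maxvCpus": 256,
        "instanceTypes": ["optimal"],
        "subnets": ["subnet-aaaa1111"],
        "securityGroupIds": ["sg-bbbb2222"],
        "instanceRole": "ecsInstanceRole",
    },
    serviceRole="arn:aws:iam::123456789012:role/AWSBatchServiceRole",
)
```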

AWS BATCH - how to run more concurrent jobs

I have just started working with AWS BATCH for my deep learning workload. I have created a compute environment with the following config:
min vCPUs: 0
max vCPUs: 16
Instance type: g4dn family, g3s family, g3 family, p3 family
allocation strategy: BEST_FIT_PROGRESSIVE
The maximum vCPU limit for my account is 16, and each of my jobs requires 16 GB of memory. I observe that at most 2 jobs can run concurrently at any point in time. I was using allocation strategy BEST_FIT before and changed it to BEST_FIT_PROGRESSIVE, but I still see that only 2 jobs run concurrently. This limits the amount of experimentation I can do in a given time. What can I do to increase the number of jobs that can run concurrently?
I figured it out myself just now. I'm posting an answer here in case anyone finds it helpful in the future. It turns out that the instances assigned to each of my jobs are g4dn.2xlarge. Each of these instances takes up 8 vCPUs, and as my vCPU limit is 16, only 2 jobs can run concurrently. One solution is to ask AWS to increase the vCPU limit by creating a support case. Another solution could be to modify the compute environment to use GPU instances that consume 4 vCPUs (the lowest possible on AWS), in which case a maximum of 4 jobs can run concurrently.
There are two kinds of solutions:
Configure your compute environment with EC2 instance types whose vCPU count is a multiple of your job definition's vCPUs. For example, a compute environment with an 8-vCPU EC2 instance type and a limit of 128 vCPUs will let a job definition with 8 vCPUs execute up to 16 concurrent jobs, because 16 concurrent jobs x 8 vCPUs = 128 vCPUs (also take into account the allocation strategy and the memory of your instances, which matters if your jobs consume a lot of memory). A quick sizing check of this arithmetic is sketched right after this answer.
Multi-node parallel jobs. This is a very interesting solution because in this scenario the instance vCPU count does not need to be a multiple of the vCPUs used in your job definition, and jobs can span multiple Amazon EC2 instances.
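As a quick sanity check of the arithmetic in the first option, using only the numbers from the example above and from the previous question (nothing here is account-specific):

```python
# Rough sizing check: how many jobs of a given size fit under the compute
# environment's vCPU ceiling (ignoring memory and instance packing details).
def max_concurrent_jobs(max_vcpus: int, vcpus_per_job: int) -> int:
    return max_vcpus // vcpus_per_job

print(max_concurrent_jobs(128, 8))  # 16 concurrent jobs, as in the example above
print(max_concurrent_jobs(16, 8))   # 2 jobs -- the situation in the previous question
```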

AWS Batch: always launch a new EC2 instance for each job

I have set up a Batch environment with
Managed Compute environment
Job Queue
Job Definitions
The actual job (a Docker container) does a lot of video encoding and hence uses up most of the CPU. The process itself takes a few minutes (close to 5 minutes to get all the encoders initialized). Ideally I would want one job per instance so that the encoders are not CPU-starved.
My issue is that when I launch multiple jobs at the same time, or close enough together, AWS Batch decides to launch both of them on the same instance, because the first container is still initializing and has not started using CPU yet.
It seems like a race condition to me, where both jobs see the newly created instance as available.
Is there a way to launch one instance for each job without considering instances that are already running? Or any other solution to lock an instance once it has been assigned to a particular job?
Thanks a lot for your help.
You shouldn't have to worry about separating the jobs onto different instances because the containers the jobs run in are limited in how many vCPUs they can use. For example, if you launch two jobs that each require 4 vCPUs, Batch might spin up an instance that has 8 vCPUs and run both jobs on the same instance. Each job will have access to only 4 of the vCPUs, so performance should be identical to a job running on its own with no other jobs on the instance.
However, if you still want to separate the jobs onto separate instances, you can do so by matching the vCPUs of the job with the instance type in the compute environment. For example, if you have a job that requires 4 vCPUs, you can configure your compute environment to only allow c5.xlarge instances, so each instance can run only one job. However, if you want to run other jobs with higher vCPU requirements, you would have to run them in a different compute environment.
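A rough boto3 sketch of that second approach, pairing a 4-vCPU job definition with a compute environment that only allows c5.xlarge (4 vCPUs per instance); all names, the image URI, subnet, security group, and roles are placeholders:

```python
# Minimal sketch (boto3): force one job per instance by matching the job's vCPU
# requirement to the only allowed instance type (c5.xlarge = 4 vCPUs).
import boto3

batch = boto3.client("batch")

batch.create_compute_environment(
    computeEnvironmentName="one-job-per-instance",
    type="MANAGED",
    computeResources={
        "type": "EC2",
        "minvCpus": 0,
        "maxvCpus": 64,
        "instanceTypes": ["c5.xlarge"],   # 4 vCPUs per instance
        "subnets": ["subnet-aaaa1111"],
        "securityGroupIds": ["sg-bbbb2222"],
        "instanceRole": "ecsInstanceRole",
    },
    serviceRole="arn:aws:iam::123456789012:role/AWSBatchServiceRole",
)

batch.register_job_definition(
    jobDefinitionName="video-encode",
    type="container",
    containerProperties={
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/encoder:latest",
        "resourceRequirements": [
            {"type": "VCPU", "value": "4"},       # fills the whole c5.xlarge
            {"type": "MEMORY", "value": "7168"},  # MiB; leave headroom for the ECS agent
        ],
    },
)
```

The memory value is kept below the instance's 8 GiB because part of the instance memory is reserved for the ECS agent and the OS; requesting the full amount would leave the job unplaceable.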

What is the number of cores in the aws.data.highio.i3 Elastic Cloud instance given for the 14-day trial period?

I wanted to make some performance calculations, hence I need to know the number of cores that this aws.data.highio.i3 instance deployed by Elastic Cloud on AWS has. I know that it has 4 GB of RAM, so if anyone can help me with the number of cores, that would be very helpful.
I am working with Elasticsearch deployed on Elastic Cloud, and my use case requires roughly 40 million writes per day, so if you can also suggest which machines I should use that fit this use case and are I/O optimized, that would help as well.
The instance used by Elastic Cloud for aws.data.highio.i3 in the background is i3.8xlarge, see here. That means it has 32 virtual CPUs or 16 cores, see here.
But you don't own the instance in Elastic Cloud. From the reference hardware page:
Host machines are shared between deployments, but containerization and
guaranteed resource assignment for each deployment prevent a noisy
neighbor effect.
Each ES process runs on a large multi-tenant server with resources carved out using cgroups, and ES scales its thread pool sizes automatically. You can see the number of times the CPU was throttled by the cgroups if you go to Stack Monitoring -> Advanced and scroll down to the Cgroup CPU Performance and Cgroup CFS Stats graphs.
That being said, if you need full CPU availability all the time, you are better off with the AWS Elasticsearch service or hosting your own cluster.
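If you would rather read the same cgroup throttling counters programmatically than through Stack Monitoring, the Elasticsearch nodes stats API exposes them under os.cgroup. A minimal sketch, with the endpoint and credentials as placeholders:

```python
# Minimal sketch: read the per-node cgroup CPU throttling counters that
# Elasticsearch reports (the data behind the Cgroup CPU Performance graphs).
# Endpoint and credentials below are placeholders.
import requests

resp = requests.get(
    "https://my-deployment.es.us-east-1.aws.found.io:9243/_nodes/stats/os",
    auth=("elastic", "changeme"),
)
resp.raise_for_status()

for node_id, node in resp.json()["nodes"].items():
    cpu = node["os"].get("cgroup", {}).get("cpu", {})
    stat = cpu.get("stat", {})
    print(
        node["name"],
        "cfs_quota_micros:", cpu.get("cfs_quota_micros"),
        "times_throttled:", stat.get("number_of_times_throttled"),
        "time_throttled_nanos:", stat.get("time_throttled_nanos"),
    )
```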

What do vCPUs in Fargate really mean?

I was trying to get answers to my question here and here, but I understood that I need to know specifically how Fargate implements vCPUs. So my question is:
If I allocate 4 vCPUs to my task, does that mean that my single-threaded app running in a container in this task will be able to fully use all of these vCPUs, given that they are essentially only a portion of a processor core's time that I can use?
Let's say I assigned 4 vCPUs to my task; on a technical level, I have then assigned 4 vCPUs' worth of a physical core that can freely process one thread (or even more with hyperthreading). Is my logic correct for the Fargate case?
P.S. It's a Node.js app that runs a session with multiple players interacting with each other, so I do want to give a single Node.js process maximum capacity.
Fargate uses ECS (Elastic Container Service) in the background to orchestrate Fargate containers. ECS in turn relies on the compute resources provided by EC2 to host containers. According to the AWS Fargate FAQs:
Amazon Elastic Container Service (ECS) is a highly scalable, high performance container management service that supports Docker containers and allows you to easily run applications on a managed cluster of Amazon EC2 instances
...
ECS uses containers provisioned by Fargate to automatically scale, load balance, and manage scheduling of your containers
This means that a vCPU is essentially the same as an EC2 instance vCPU. From the docs:
Amazon EC2 instances support Intel Hyper-Threading Technology, which enables multiple threads to run concurrently on a single Intel Xeon CPU core. Each vCPU is a hyperthread of an Intel Xeon CPU core, except for T2 instances.
So to answer your questions:
If you allocate 4 vCPUs to a single threaded application - it will only ever use one vCPU, since a vCPU is simply a hyperthread of a single core.
When you select 4 vCPUs you are essentially assigning 4 hyperthreads to a single physical core. So your single threaded application will still only use a single core.
If you want more fine grained control of CPU resources - such as allocating multiple cores (which can be used by a single threaded app) - you will probably have to use the EC2 Launch Type (and manage your own servers) rather than use Fargate.
Edit 2021: It has been pointed out in the comments that most EC2 instances in fact have 2 hyperthreads per CPU core. Some specialised instances, such as the c6g and m6g, have 1 thread per core, but the majority of EC2 instances have 2 threads per core. It is therefore likely that the instances used by ECS/Fargate also have 2 threads per core. For more details, see the documentation.
You can inspect which physical CPU your ECS task runs on by looking at the model name field in /proc/cpuinfo. You can just cat this file in your ENTRYPOINT / CMD script, or use ECS Exec to open a terminal session with your container.
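For example, a minimal Python version of that check (it only reads the standard model name field; nothing here is specific to ECS or to the tasks listed below):

```python
# Minimal sketch: print the physical CPU model the container landed on,
# using the same /proc/cpuinfo field mentioned above.
def cpu_model_name(path="/proc/cpuinfo"):
    with open(path) as f:
        for line in f:
            if line.startswith("model name"):
                return line.split(":", 1)[1].strip()
    return "unknown"

print(cpu_model_name())
```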
I've actually done this recently, because we've been observing some weird performance drops in some of our ECS services. Out of the 84 ECS tasks we ran, this was the distribution:
Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz (10 tasks)
Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz (22 tasks)
Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz (10 tasks)
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz (25 tasks)
Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz (17 tasks)
Interesting that it's 2022 and AWS is still running CPUs from 2016 (the E5-2686 v4). All of these tasks are fully paid On-Demand ECS Fargate. When running some tasks on Spot, I even got an E5-2666 v3, which is from 2015, I think.
While getting random CPUs for our ECS tasks was somewhat expected, the differences between them are so significant that I observed one of my services report either 25% or 45% CPU utilization while idle, depending on which CPU it drew in the "ECS instance type lottery".