I am using ECS Fargate to deploy my web application, and I'd like to support 200 requests per second. I see there is a task size setting where I can configure CPU and memory. If I configure 1024 CPU units with 2048 MB of memory, how many threads can my app support? Can I say this configuration supports opening up to 1024 threads in my process?
1 vCPU = 1024 CPU units. Source:
You can determine the number of CPU units that are available per Amazon EC2 instance type by multiplying the number of vCPUs listed for that instance type on the Amazon EC2 Instances detail page by 1,024.
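In task-definition terms, the size the question describes is just two fields. A minimal sketch using boto3 (the family, container name, and image below are placeholders):

    # Sketch: registering a Fargate task definition sized at 1 vCPU / 2 GB.
    import boto3

    ecs = boto3.client("ecs")
    ecs.register_task_definition(
        family="my-web-app",                 # placeholder family name
        requiresCompatibilities=["FARGATE"],
        networkMode="awsvpc",                # required for Fargate tasks
        cpu="1024",                          # 1024 CPU units = 1 vCPU
        memory="2048",                       # MB; must be a pairing Fargate supports
        containerDefinitions=[{
            "name": "web",                   # placeholder container name
            "image": "example/my-web-app:latest",
            "essential": True,
        }],
    )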
If I configure 1024 CPU units with 2048 MB of memory, how many threads can my app support?
It's impossible to say; you have to run a load test and measure it.
Can I say this configuration supports opening up to 1024 threads in my process?
This would generally depend on which technology you use and what exactly a "thread" is.
But most probably you won't be able to keep 1024 threads busy on 1024 CPU units (which is just one vCPU).
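That said, the number of threads a process can open is bounded by memory and OS limits, not by CPU units; what the CPU units bound is how much of those threads' work actually runs in parallel. A minimal Python sketch, assuming memory and OS limits permit: 1024 mostly-idle (I/O-bound) threads will start fine even on one vCPU, whereas 1024 CPU-bound threads would just time-slice.

    # Sketch: thread *count* is limited by memory/OS settings, not CPU units.
    # 1024 sleeping (I/O-bound) threads coexist happily on a single vCPU.
    import threading
    import time

    def io_bound_worker():
        time.sleep(10)  # stands in for waiting on a socket or a database

    threads = [threading.Thread(target=io_bound_worker) for _ in range(1024)]
    for t in threads:
        t.start()
    print(threading.active_count(), "threads alive")  # ~1025 on any core count
    for t in threads:
        t.join()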
Related
I have an Ethereum PoA node set up on a VM with the below-mentioned configuration. Using the NodeJS Web3 client, I am trying to create new wallets using the web3.personal.importRawKey function.
VM configuration: Azure VM - Standard D2s v3 (2 vCPUs, 8 GiB memory)
As part of our stress testing, I tried concurrently creating wallets for 5-10 users, and it worked. But when I try to create 15-20 wallets concurrently, the geth process gets killed abruptly and the node stops. On a 1 vCPU, 4 GB memory VM, I was able to create at most 4 concurrent wallets, while on the 2 vCPU, 8 GiB memory VM I could process at most 10-12 concurrent users.
My concern is that the number of concurrent wallet creations seems very low compared to the RAM, and I can't understand why the geth process gets killed. One thing I observed was that the CPU percentage goes to 200% and then the geth node process is killed.
How would I be able to handle at least 1000 concurrent requests to the above-mentioned function to create Blockchain wallets?
Any help will be appreciated.
Thanks in advance!
A c5.2xlarge instance has 8 vCPU. If I run os.cpu_count() (Python) or std::thread::hardware_concurrency() (C++) they each report 8 on this instance. I assume the underlying hardware is probably a much bigger machine, but they are telling me what I have available to me, and that seems useful and correct.
However, if my ECS task requests only 2048 CPU (2 vCPU), then it will still get 8 from the above queries on a c5.2xlarge machine. My understanding is Docker is going to limit my task to only using "2 vCPU worth" of CPU, if other busy tasks are running. But it's letting me see the whole instance.
It seems like this would lead to tasks creating too many threads/processes.
For example, if I'm running 2048 CPU tasks on a c5.18xlarge instance, each task will think it has 72 cores available. They will all create way too many threads/processes overall; it will work but be inefficient.
What is the best practice here? Should programs somehow know their ECS task reservation? And create threads/processes according to that? That seems good except then you might be under-using an instance if it's not full of busy tasks. So I'm just not sure what's optimal there.
I guess the root issue is Docker is going to throttle the total amount of CPU used. But it cannot adjust the number of threads/processes you are using. And using too many or too few threads/processes is inefficient.
See the discussion of CPU usage in the ECS docs.
See also this long blog post: https://goldmann.pl/blog/2014/09/11/resource-management-in-docker/
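To make the mismatch concrete, this is what a 2048-CPU-unit task would print on a c5.2xlarge (a sketch; the C++ std::thread::hardware_concurrency() call behaves the same way):

    # Sketch: run inside a 2048-unit ECS task on a c5.2xlarge.
    # Both calls report the host's 8 vCPUs, not the task's 2-vCPU share.
    import os
    import multiprocessing

    print(os.cpu_count())               # 8
    print(multiprocessing.cpu_count())  # 8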
There is a huge difference between virtualization technologies and containers, and having a clear understanding of both will help. That being said, an application should be configurable if you want to deploy it in different environments.
I would suggest creating an optional config value that tells the application it can only use a certain number of CPU cores. If that value is not provided, it falls back to auto-detection.
Once you have this option, you can provide it when defining the ECS task, which will fix the problem you are facing.
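A sketch of that pattern in Python (MAX_WORKERS is a made-up variable name; the ECS task metadata endpoint, injected by the agent as ECS_CONTAINER_METADATA_URI_V4, is one way to auto-detect the task's own limit when no override is given, assuming the task definition declares a task-level CPU size):

    # Sketch: explicit override first, then the task's own CPU limit from the
    # ECS task metadata endpoint (if declared), then whatever the host reports.
    import json
    import os
    import urllib.request

    def worker_count():
        override = os.environ.get("MAX_WORKERS")  # hypothetical config knob
        if override:
            return int(override)
        base = os.environ.get("ECS_CONTAINER_METADATA_URI_V4")
        if base:                                  # running under ECS
            with urllib.request.urlopen(base + "/task") as resp:
                cpu = json.load(resp).get("Limits", {}).get("CPU")
            if cpu:
                return max(1, int(cpu))           # reported in vCPUs
        return os.cpu_count()                     # plain auto-detect

In the task definition, MAX_WORKERS would then just be an entry in the container's environment list, so each service can be sized independently of the host it lands on.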
At the moment, I have a single c4.large (3.75 GB RAM, 2 vCPUs) instance in my workers cluster, currently running 21 tasks across 16 services. These tasks range from image processing to data transformation, and most send HTTP requests too. As you can see, the instance is quite well utilised.
My question is: how do I know how many tasks to place on an instance? I am placing up to 8 tasks for a single service, but I'm unsure whether this results in a speed increase, given that they share the same underlying instance. How do I find the optimal placement?
Should I put many chefs in my kitchen, or will just two get the food out to customers faster?
We typically run lots of smaller-sized servers in our clusters, like 4-6 t2.smalls for our workers, with 6-7 tasks placed on each. The main reason for this is not to speed up processing but to reduce the blast radius when a server goes down.
We've quite often seen a server simply fail an instance health check and get taken down by AWS. Having the workers spread out reduces the effect on the system.
I agree with the other answers' 80% rule, but you never want a single host for any kind of critical application; if that goes down, you're screwed. I also think it's better to use larger-sized servers because of their increased network performance. You should look into a host with enhanced networking, especially since you say you do a lot of HTTP work.
Another thing to consider is disk I/O. If you pile too many tasks onto a host and it fails, the scheduler is going to try to place all of those tasks somewhere else. I have had servers crash because of too many tasks being scheduled and burning through disk credits.
I currently use a T2.Micro RDS instance with SQL Express.
Due to a heavy-load application, there are times when a single visitor's request might take 30 seconds to complete. This pushes the RDS instance to 100% CPU. The result is that for any other visitor who hits the website during that 100% CPU load, the site takes much longer to respond.
T2.micro has 1 vCPU.
I'm thinking of upgrading to T2.medium, which has 2 vCPUs.
The question is, if I have 2 vCPUs, will I avoid the bottleneck?
For example, if the first visitor's 30-second request uses vCPU #1 and a second visitor arrives at the same time, will they use vCPU #2? Will that help my situation?
Also, I did not see any option in AWS RDS to tell what CPU it is. Is there an option to choose a faster vCPU somehow?
Thank you.
The operating system's scheduler automatically handles the distribution of running threads across all the available cores, to get as much work done as possible in the least amount of time.
So, yes, a multi-core machine should improve performance as long as more than one query is running. If a single, CPU-intensive, long-running query -- and nothing else -- is running on a 2-core machine, the maximum CPU utilization you'd probably see would be about 50%... but as long as there is more than one query running, each of them will be running on one of the cores at a time, and the system can actually move a thread among the available cores as the workload shifts, to put each thread on the optimum core.
A t2.micro is a very small server, but t2 is a good value proposition. With all the t2-class machines, you aren't allowed to run 100% CPU continuously, regardless of the number of cores, unless you have a sufficient CPU credit balance available. This is why the t2 is so inexpensive. You need to keep an eye on this metric as well. CPU credits are earned automatically over time, and spent by using CPU. A second motivation for upscaling a t2 machine is that larger t2 instances earn these credits at a faster rate than smaller ones.
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/t2-instances.html
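If you want to watch that credit balance programmatically rather than in the console, a sketch with boto3 ("mydb" is a placeholder DB instance identifier):

    # Sketch: pull the CPUCreditBalance metric for an RDS t2 instance.
    import datetime
    import boto3

    cw = boto3.client("cloudwatch")
    stats = cw.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="CPUCreditBalance",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "mydb"}],
        StartTime=datetime.datetime.utcnow() - datetime.timedelta(hours=3),
        EndTime=datetime.datetime.utcnow(),
        Period=300,                     # one datapoint per 5 minutes
        Statistics=["Average"],
    )
    for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], point["Average"])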
Suppose I specify that I want my worker role to run on a 4-core virtual machine. How do I make use of all the cores?
It looks like there's a RoleEntryPoint.Run() method that I override to install my request handler.
How do I handle requests so as to make use of all cores? Do I manually spawn threads and delegate processing to them, or is there some ready-made abstraction that will do it automatically?
You should add multiple workers in WorkerRole::OnStart(), as described here: http://www.31a2ba2a-b718-11dc-8314-0800200c9a66.com/2010/12/running-multiple-threads-on-windows.html
Spawn threads or use .Net 4 Tasks to have .Net schedule your jobs using a thread pool.
+1 to Oliver: spawn TPL Tasks on each request; the framework runtime should take care of everything from there.
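The answers above are .NET-specific, but the pattern is the same in any language: size a pool to the core count and hand each request's work to it. A rough Python analogue of the pool-based approach (handle_request is a stand-in for your actual processing; a process pool is used so CPU-bound work can occupy all cores):

    # Rough analogue: a pool sized to the visible cores, with each request's
    # CPU-heavy work submitted as one job for the runtime to schedule.
    from concurrent.futures import ProcessPoolExecutor
    import os

    def handle_request(n):
        return sum(i * i for i in range(n))  # stand-in for real work

    if __name__ == "__main__":
        with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
            results = list(pool.map(handle_request, [100_000] * 8))
            print(results[:2])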
Another way to keep all 4 cores busy is to scale your application out to multiple instances of your Worker Role, such that each instance runs on a single core (note that in Windows Azure, each instance runs in its own virtual machine). Since in Windows Azure you pay by the hour for each core, using one core on each of 4 Worker Role instances will cost the same as running 4 cores on a single Worker Role instance.
The benefit of using 4 Worker Role instances is that you can adjust more conveniently to 3, or 2 - or 10 - instances, depending on the amount of compute you need to bring to bear at any point in time. Changing the number of running instances is easy to do - you do not need to redeploy the application. To change the size of the instances, you need to redeploy. Also, you have less granularity with just instance size: partial, 1, 2, 4, and 8 cores. There is no instance size with, say, 6 cores.
Note also that the Windows Azure SLA is not in effect if you have a single instance. A minimum of 2 instances is required before the various SLAs kick in. This is in part so that Azure's Fabric Controller can update parts of your application (such as with an O/S patch) without taking down your whole application.
Caveat: for legacy code that is not designed with the cloud in mind, it is possible to have code that will not function correctly with more than one instance of it running. In other words, it can't "scale out" effectively; in this case, you can "scale up" by running it on a larger instance size (such as with 4 cores).