AWS RDS 100% CPU with 2 vCores

I currently use a T2.micro RDS instance with SQL Server Express.
Due to a heavy-load application, a single visitor's request can sometimes take 30 seconds to complete, which pushes the RDS instance to 100% CPU. As a result, any other visitor hitting the website during that 100% CPU load finds the site takes much longer to respond.
T2.micro has 1 vCPU.
I'm thinking of upgrading to a T2.medium, which has 2 vCPUs.
The question is: with 2 vCPUs, will I avoid the bottleneck?
For example, if the first visitor's 30-second request uses vCPU #1 and a second visitor arrives at the same time, will that request use vCPU #2? Will that help my situation?
Also, I did not see any option in AWS RDS showing which CPU it is. Is there some way to choose a faster vCPU?
Thank you.

The operating system's scheduler automatically handles the distribution of running threads across all the available cores, to get as much work done as possible in the least amount of time.
So, yes, a multi-core machine should improve performance as long as more than one query is running. If a single, CPU-intensive, long-running query -- and nothing else -- is running on a 2-core machine, the maximum CPU utilization you'd probably see would be about 50%... but as long as there is more than one query running, each of them will be running on one of the cores at a time, and the system can actually move a thread among the available cores as the workload shifts, to put it on the optimum core.
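To see why a second core helps concurrent work, here is a minimal Python sketch (illustrative only -- it has nothing to do with RDS or SQL Server internals) that runs two CPU-bound tasks in parallel; on a 2-core machine they finish in roughly the time of one, while on a single core they take about twice as long:

import multiprocessing
import time

def busy_work(_):
    # CPU-bound loop standing in for a long-running query.
    total = 0
    for i in range(50_000_000):
        total += i
    return total

if __name__ == "__main__":
    start = time.time()
    with multiprocessing.Pool(processes=2) as pool:
        pool.map(busy_work, [1, 2])  # two "visitors" arriving at once
    print(f"elapsed: {time.time() - start:.1f}s")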
A t2.micro is a very small server, but t2 is a good value proposition. With all the t2-class machines, you aren't allowed to run 100% CPU continuously, regardless of the number of cores, unless you have a sufficient CPU credit balance available. This is why the t2 is so inexpensive. You need to keep an eye on this metric as well. CPU credits are earned automatically over time, and spent by using CPU. A second motivation for upscaling a t2 machine is that larger t2 instances earn these credits at a faster rate than smaller ones.
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/t2-instances.html
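If you want to keep an eye on the credit balance programmatically, here is a minimal boto3 sketch (Python is an assumption here, and "my-db" is a placeholder DB instance identifier); RDS publishes a CPUCreditBalance CloudWatch metric for t2 instances:

import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

# Average CPU credit balance over the last 6 hours.
# "my-db" is a placeholder for your DB instance identifier.
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="CPUCreditBalance",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "my-db"}],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=6),
    EndTime=datetime.now(timezone.utc),
    Period=300,
    Statistics=["Average"],
)
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])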

Related

E2 CPU Usage Goes Up Over Time on Google Compute Engine

It is quite strange that all six of my e2-small VM instances (all Debian 10) show CPU usage increasing over time. Is this a bug from Google?
I can verify that this does not happen on an N1 machine (g1-small, Debian 10, orange line): [chart]
I restarted the E2 instance (blue line) before the end of January and created a new N1 instance (orange line). Neither VM is under load yet, and you can see that the E2's CPU usage increases over time.
Here's the top output on the E2: [screenshot]
Here are 3 more VMs (used in production) whose CPU slowly creeps up over time (restarted Jan 26): [chart]
Is this a Google bug?
This is a bug with the google-osconfig-agent, fixed by:
sudo apt-get update && sudo apt-get upgrade google-osconfig-agent -y
Confirmed that after several days, none of the affected VMs show increasing CPU usage anymore.
After updating and restarting on Feb 18, the CPU usage is now stable.
No, this is not a bug; e2-small machines are shared-core machines.
Shared-core machine types use context-switching to share a physical core between vCPUs for the purpose of multitasking. Different shared-core machine types sustain different amounts of time on a physical core. Review the following sections to learn more.
In general, shared-core instances can be more cost-effective for running small, non-resource intensive applications than standard, high-memory or high-CPU machine types.
CPU Bursting
Shared-core machine types offer bursting capabilities that allow instances to use additional physical CPU for short periods of time. Bursting happens automatically when your instance requires more physical CPU than originally allocated. During these spikes, your instance will opportunistically take advantage of available physical CPU in bursts. Note that bursts are not permanent and are only possible periodically. Bursting doesn't incur any additional charges. You are charged the listed on-demand price for f1-micro, g1-small, and e2 shared-core machine types.
E2 shared-core machine types
E2 shared-core machines are cost-effective, have a virtio memory balloon device, and are ideal for small workloads. When you use E2 shared-core machine types, your VM runs two vCPUs simultaneously, shared on one physical core, for a specific fraction of time, depending on the machine type.
* e2-micro sustains 2 vCPUs, each for 12.5% of CPU time, totaling 25% vCPU time.
* e2-small sustains 2 vCPUs, each for 25% of CPU time, totaling 50% of vCPU time.
* e2-medium sustains 2 vCPUs, each for 50% of CPU time, totaling 100% vCPU time.
Each vCPU can burst up to 100% of CPU time for short periods before returning to the limits above.
Whether the instance bursts and its usage increases depends on the processes running on it.

AWS EC2 Performance explanation

I have a REST API web server, built in .NET Core, that serves data-heavy APIs.
It is hosted on AWS EC2. I have noticed that the average response time for certain APIs is ~4 seconds, and if I turn up the EC2 specs, it drops to a few milliseconds. I guess this is expected; what I don't understand is that even when I load-test the APIs on the lower-end instance, the server never crosses 50% utilization of memory/CPU. So what is the correct technical explanation for the APIs performing faster, if the lower-end instance never reaches 100% memory/CPU utilization?
There is no simple answer; there are so many EC2 variations that you first need to figure out what is slowing down your API.
When you 'turn up' your EC2 instance, you are getting some combination of more memory, faster CPU, faster disk, and more network bandwidth - and we can't tell which of those 'more' features is improving your performance. Different instance classes are optimized for different problems.
It could be as simple as the better network bandwidth, or it could be that your application is disk-bound and the better instance you chose is optimized for I/O performance.
Knowing which resource your instance lacks would help you decide which type of instance to upgrade to - or, as you have found out, you can just upgrade to something 'bigger' and be happy with the performance (at the tradeoff of higher cost).
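As a starting point for narrowing that down, here is a hedged sketch using Python's psutil (Python and psutil are assumptions -- the stack in the question is .NET Core, so treat this as illustrative) that samples CPU, disk, and network counters while a load test runs:

import psutil

# Sample system-wide counters once per second during a load test
# to see which resource is actually saturating.
prev_disk = psutil.disk_io_counters()
prev_net = psutil.net_io_counters()

for _ in range(30):  # 30 one-second samples
    cpu = psutil.cpu_percent(interval=1)
    disk = psutil.disk_io_counters()
    net = psutil.net_io_counters()
    print(
        f"cpu={cpu:5.1f}%  "
        f"disk_read={disk.read_bytes - prev_disk.read_bytes:>10}B  "
        f"disk_write={disk.write_bytes - prev_disk.write_bytes:>10}B  "
        f"net_out={net.bytes_sent - prev_net.bytes_sent:>10}B"
    )
    prev_disk, prev_net = disk, net

If CPU stays low while the disk or network deltas climb and then flatten, the workload is I/O-bound and a compute-optimized upgrade won't help much.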

How many threads/processes to create in an ECS task

A c5.2xlarge instance has 8 vCPU. If I run os.cpu_count() (Python) or std::thread::hardware_concurrency() (C++) they each report 8 on this instance. I assume the underlying hardware is probably a much bigger machine, but they are telling me what I have available to me, and that seems useful and correct.
However, if my ECS task requests only 2048 CPU (2 vCPU), then it will still get 8 from the above queries on a c5.2xlarge machine. My understanding is Docker is going to limit my task to only using "2 vCPU worth" of CPU, if other busy tasks are running. But it's letting me see the whole instance.
It seems like this would lead to tasks creating too many threads/processes.
For example, if I'm running 2048 CPU tasks on a c5.18xlarge instance, each task will think it has 72 cores available. They will all create way too many threads/processes overall; it will work but be inefficient.
What is the best practice here? Should programs somehow know their ECS task reservation? And create threads/processes according to that? That seems good except then you might be under-using an instance if it's not full of busy tasks. So I'm just not sure what's optimal there.
I guess the root issue is Docker is going to throttle the total amount of CPU used. But it cannot adjust the number of threads/processes you are using. And using too many or too few threads/processes is inefficient.
See the discussion of CPU usage in the ECS docs.
See also this long blog post: https://goldmann.pl/blog/2014/09/11/resource-management-in-docker/
There is a huge difference between virtualization technologies and containers, and having a clear understanding of them will help. That being said, an application should be configurable if you want to deploy it in different environments.
I would suggest creating an optional config value which tells the application that it may only use a certain number of CPU cores. If that value is not provided, it falls back to auto-detection.
Once you have this option, you can provide it when defining the ECS task, which will fix the problem you are facing (see the sketch below).
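A minimal sketch of that pattern in Python: APP_CPU_LIMIT is a hypothetical variable you would set in the ECS task definition, and the cgroup v1 paths are an assumption about how the container runtime exposes a hard CPU quota:

import os

def effective_cpu_count() -> int:
    # 1. Explicit config wins. "APP_CPU_LIMIT" is a hypothetical
    #    env var you would set in the ECS task definition.
    env = os.environ.get("APP_CPU_LIMIT")
    if env:
        return int(env)
    # 2. Try the cgroup v1 CFS quota, which is set when the container
    #    has a hard CPU limit (paths are an assumption; cgroup v2
    #    exposes this differently).
    try:
        with open("/sys/fs/cgroup/cpu/cpu.cfs_quota_us") as f:
            quota = int(f.read())
        with open("/sys/fs/cgroup/cpu/cpu.cfs_period_us") as f:
            period = int(f.read())
        if quota > 0:
            return max(1, quota // period)
    except OSError:
        pass
    # 3. Fall back to whatever the host reports.
    return os.cpu_count() or 1

# Size worker pools from the effective limit, not the host's core count.
num_workers = effective_cpu_count()
print(f"using {num_workers} workers")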

AWS Elasticache CPU usage exceeding 100%

We have been using AWS ElastiCache for our applications. We initially set a CPU alarm threshold of 22% (a 4-core node, so effectively ~90% usage of the single Redis core), based on the recommended thresholds. But we often see the CPU utilization crossing well over 25%, to values like 28% or 34%.
What I am trying to understand is how this is theoretically possible, considering Redis is single-threaded. The only way I can think this can happen is if maintenance operations run on other cores, which could bump CPU usage above 25%. Even if the cluster is highly loaded, it should cap CPU usage at 25% and probably start timing out clients. Can someone help me understand under what scenarios the CPU usage of a single-threaded Redis instance can exceed 100% of one core?
The Redis event loop is single-threaded, but the Redis process itself is not. There are a couple of extra threads to offload some I/O-bound operations. Normally, these threads should not consume much CPU.
However, Redis also forks child processes to take care of heavy-duty operations like AOF rewrites or RDB saves. Each forked process generally consumes 100% of a CPU core (except when the operation is slowed down by I/O), on top of the Redis event loop's consumption.
If you find the CPU consumption regularly high, it may be due to a wrong AOF/RDB configuration (i.e. the Redis instance rewrites the AOF or generates a dump too frequently).
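To check whether a background save or AOF rewrite is running when the CPU spikes, a small redis-py sketch (the endpoint is a placeholder) can poll the persistence section of INFO:

import redis

# Placeholder endpoint: point this at your ElastiCache node.
r = redis.Redis(host="my-cluster.example.cache.amazonaws.com", port=6379)

info = r.info("persistence")
print("RDB bgsave in progress: ", info["rdb_bgsave_in_progress"])
print("AOF rewrite in progress:", info["aof_rewrite_in_progress"])
print("Last bgsave status:     ", info["rdb_last_bgsave_status"])

If either in-progress flag is 1 whenever utilization exceeds one core, the forked child process described above is the likely cause.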

ec2 instance running locust.io issues

I'm trying to run a locust.io load test on an EC2 instance - a t2.micro. I fire up 50 concurrent users, and initially everything works fine, with the CPU load reaching ~15%. After an hour or so, though, network out shows a drop of about 80%: [chart]
Any idea why this is happening? It's certainly not due to CPU credits. Maybe I reached the network limits of a t2.micro instance?
Thanks
Are you sure it's not a CPU credit issue? Can you check your CPU credits over that same time period to see how they look?
Or better yet, run the same test on a non-t2 instance - one that isn't limited in its CPU usage.
t2.micros consume CPU credits whenever usage is above roughly 10% of CPU (the baseline).
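As a rough sanity check of the credit math (per the t2 docs, a t2.micro earns 6 credits per hour, and one credit is one vCPU-minute at 100%), here is a small sketch; the starting balance of 30 is an assumption:

# Rough t2.micro credit arithmetic (earn rate per the t2 docs).
EARN_RATE = 6.0        # credits earned per hour on a t2.micro
cpu_util = 0.15        # sustained CPU utilization during the test

# 1 credit = 1 vCPU-minute at 100%, so an hour at cpu_util
# spends cpu_util * 60 credits.
spend_rate = cpu_util * 60           # 9 credits/hour at 15%
net_drain = spend_rate - EARN_RATE   # 3 credits/hour net

initial_balance = 30.0  # assumed starting balance (placeholder)
print(f"balance empties after ~{initial_balance / net_drain:.0f} hours")

So a sustained 15% drains the balance slowly; how soon it empties depends entirely on the starting balance, which is why graphing CPUCreditBalance over the same window is the quickest check.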