Google Compute Engine - Low on Resource Utilisation

I use a VM Instance provided by Google Compute Engine.
Machine Type: n1-standard-8 (8 vCPUs, 30 GB memory).
When I check the CPU utilisation, it never goes above 12%. I use my VM to run Jupyter Notebook. I have tried loading dataframes that take 7.5 GiB of memory (and even simple operations on them take a long time to process), but the utilisation stays the same.
How can I utilise ~100% of the CPU power?
Or does my program use only 1 of the 8 CPUs ((1/8) * 100 = 12.5%)?

You can run the stress command to impose a configurable amount of CPU, memory, I/O, and disk stress on the system.
Example to stress 4 cores for 90 seconds:
stress --cpu 4 --timeout 90
In the meantime, go to the Google Cloud Console in your browser to check the CPU usage of your VM, or open a new SSH connection to the VM and run the top command to see its CPU status.
If your CPU can reach over 99% while those commands run, your instance is working fine, and you need to look at your application to find out why it is restricted to ~12% CPU.
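To answer the second question directly: most pandas operations are single-threaded, so one fully busy core out of eight shows up as roughly (1/8) * 100 = 12.5% total utilisation, which matches what you're seeing. A quick way to confirm, assuming a Debian/Ubuntu guest where you can install the sysstat package, is to watch per-core usage while the notebook is busy:
sudo apt-get install -y sysstat   # provides mpstat (assumes a Debian/Ubuntu image)
mpstat -P ALL 2 5                 # per-core utilisation: 5 samples at 2-second intervals
If one core sits near 100% while the others stay idle, the workload is single-threaded, and getting to ~100% overall means parallelising the work across processes (e.g. with multiprocessing or a library such as Dask) rather than resizing the VM.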

Related

E2 CPU Usage Goes Up Over Time on Google Compute Engine

It is quite strange that all 6 of my e2-small VM instances (all Debian 10) are increasing in CPU usage over time. Is this a bug from Google?
I can verify that this does not happen on an N1 CPU (g1-small, Debian 10, the orange line in my monitoring chart).
I restarted the E2 instance (blue line) before the end of January and created a new N1 instance (orange line). Neither VM is utilised yet, and you can see that the E2 instance's CPU usage increases over time.
Here's the output of top on the E2 (screenshot omitted).
Here are 3 more VMs (utilised in production) whose CPU usage slowly creeps up over time (restarted Jan 26; chart omitted).
Is this a Google bug?
This is a bug with the google-osconfig-agent; it is fixed by upgrading the agent:
sudo apt-get update && sudo apt-get upgrade google-osconfig-agent -y
Confirmed that, after several days, none of the affected VMs is increasing in CPU usage anymore.
After updating and restarting on Feb 18, the CPU usage is now stable.
No, this is not a bug; e2-small machines are shared-core machines.
Shared-core machine types use context-switching to share a physical core between vCPUs for the purpose of multitasking. Different shared-core machine types sustain different amounts of time on a physical core. Review the following sections to learn more.
In general, shared-core instances can be more cost-effective for running small, non-resource intensive applications than standard, high-memory or high-CPU machine types.
CPU Bursting
Shared-core machine types offer bursting capabilities that allow instances to use additional physical CPU for short periods of time. Bursting happens automatically when your instance requires more physical CPU than originally allocated. During these spikes, your instance will opportunistically take advantage of available physical CPU in bursts. Note that bursts are not permanent and are only possible periodically. Bursting doesn't incur any additional charges. You are charged the listed on-demand price for f1-micro, g1-small, and e2 shared-core machine types.
E2 shared-core machine types
E2 shared-core machines are cost-effective, have a virtio memory balloon device, and are ideal for small workloads. When you use E2 shared-core machine types, your VM runs two vCPUs simultaneously, shared on one physical core, for a specific fraction of time, depending on the machine type.
- e2-micro sustains 2 vCPUs, each at 12.5% of CPU time, totaling 25% of vCPU time.
- e2-small sustains 2 vCPUs, each at 25% of CPU time, totaling 50% of vCPU time.
- e2-medium sustains 2 vCPUs, each at 50% of CPU time, totaling 100% of vCPU time.
Each vCPU can burst up to 100% of CPU time, for short periods, before returning to the time limitations here.
Whether the instance bursts and increases its usage depends on the processes running on it.
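If the slow creep is just shared-core scheduling and you would rather have full-time vCPUs, you can move the VM to a non-shared-core type. A minimal sketch, assuming a hypothetical instance named my-vm in zone us-central1-a (the instance must be stopped first):
gcloud compute instances stop my-vm --zone=us-central1-a
gcloud compute instances set-machine-type my-vm --zone=us-central1-a --machine-type=e2-standard-2   # e2-standard-2 is not a shared-core type
gcloud compute instances start my-vm --zone=us-central1-a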

How to minimize Google Cloud launch latency

I have a persistent server that unpredictably receives new data from users; it then needs about 10 GPU instances to crank on the problem for about 5 minutes before I send the answer back to the users. The server itself is a cheap, always-on, single-CPU Google Cloud instance. When a user request comes in, my code launches my 10 created-but-stopped Google Cloud GPU instances with
gcloud compute instances start (instance list)
In the rare case that the stopped instances don't exist (sometimes they get wiped), that's detected and they're recreated with
gcloud beta compute instances create (...)
This system all works fine. My only complaint is that even with created-but-stopped instances, the launch time before my GPU code finally starts to run is about 5 minutes. Most of this is just the time for the instance itself to boot its Ubuntu host and call my code; the delay from Ubuntu running to the GPU starting is only about 10 seconds.
How can I reduce this 5-minute delay? I imagine most of it comes from Google having to copy over the 4 GB of instance data to the target machine, and the startup time of (vanilla) Ubuntu probably adds 1 more minute. I'm not sure I could even quantify these two numbers independently; I can only measure the combined 3-7 minute delay from launch until my code starts responding.
I don't think the Ubuntu OS startup time is the major latency contributor, since I timed an actual machine on my desk with the same Ubuntu and the same GPU: from power-on it began running my GPU code in 46 seconds.
My goal is to get results back to my users as soon as possible, and that 5 minute startup delay is a bottleneck.
Would making a smaller instance SIZE of, say, 2 GB help? What else can I do to reduce the latency?
2GB is large. That's a heckuva big image. You should be able to cut that down to 100MB, perhaps using Alpine instead of Ubuntu.
Copying 4GB of data is also less than ideal. Given that, I suspect the solution will be more of an architecture change than a code change.
But if you want to take a whack at everything that is NOT about your 4 GB of data, you can prepare a custom image for your VMs. If you can build a slim custom image, that will help.
There are good resources for learning more; the two I would start with:
- Improve GCE Boot Times with Custom Images
- Three steps to Compute Engine startup-time bliss: Google Cloud Performance Atlas
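As a rough sketch of the custom-image route, assuming you've already prepared a slimmed-down boot disk named builder-disk in zone us-central1-a (all names here are hypothetical):
# Capture the prepared boot disk as a reusable image
gcloud compute images create my-slim-image --source-disk=builder-disk --source-disk-zone=us-central1-a
# Launch future GPU workers from that image so they boot pre-configured
gcloud compute instances create gpu-worker-1 --image=my-slim-image --zone=us-central1-a
Everything baked into the image (drivers, your code, dependencies) is work the instance no longer has to do at boot.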

AWS RDS 100% CPU with 2 vCores

I currently use a t2.micro RDS instance with SQL Server Express.
Due to a heavy-load application, a single visitor's request can sometimes take 30 seconds to complete, which drives the RDS instance to 100% CPU. As a result, any other visitor who hits the website during that 100% CPU load finds the site takes much longer to respond.
A t2.micro has 1 vCPU.
I'm thinking of upgrading to t2.medium, which has 2 vCPUs.
The question is: if I have 2 vCPUs, will I avoid the bottleneck?
For example, if the 1st visitor's 30-second request uses vCPU #1 and a second visitor arrives at the same time, will they use vCPU #2? Will that help my situation?
Also, I did not see any option in AWS RDS to see what CPU it is. Is there an option to choose a faster vCPU somehow?
Thank you.
The operating system's scheduler automatically handles the distribution of running threads across all the available cores, to get as much work done as possible in the least amount of time.
So, yes, a multi-core machine should improve performance as long as more than one query is running. If a single CPU-intensive, long-running query -- and nothing else -- is running on a 2-core machine, the maximum CPU utilization you'd probably see would be about 50%... but as long as there is more than one query running, each of them runs on one of the cores at a time, and the system can move a thread among the available cores as the workload shifts, to put each on the optimum core.
A t2.micro is a very small server, but t2 is a good value proposition. With all the t2-class machines, you aren't allowed to run 100% CPU continuously, regardless of the number of cores, unless you have a sufficient CPU credit balance available. This is why the t2 is so inexpensive. You need to keep an eye on this metric as well. CPU credits are earned automatically over time, and spent by using CPU. A second motivation for upscaling a t2 machine is that larger t2 instances earn these credits at a faster rate than smaller ones.
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/t2-instances.html
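If you do decide to upgrade, the instance class can be changed in place with the AWS CLI. A minimal sketch, assuming a hypothetical DB instance identifier of mydb (note the instance restarts briefly during the change):
aws rds modify-db-instance --db-instance-identifier mydb --db-instance-class db.t2.medium --apply-immediately
# Without --apply-immediately, the change waits for the next maintenance window.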

What type of EC2 instance is better suited for WSO2 CEP standalone?

Should I use a compute-optimized (c3, c4) or a memory-optimized (r3) instance for running a standalone WSO2 CEP server?
I searched the documentation but could not find anything about running this server on EC2.
As per the WSO2 SA recommendations:
Hardware Recommendation
Physical:
3 GHz dual-core Xeon/Opteron (or later), 4 GB RAM (minimum: 2 GB for the JVM and 2 GB for the OS), 10 GB free disk space (minimum); size the disk based on the expected storage requirements (calculated by considering file uploads and backup policies). (E.g., if 3 Carbon instances run on one machine, it requires 4 CPUs, 8 GB RAM, and 30 GB of free space.)
Virtual Machine:
2 compute units minimum (each unit having a 1.0-1.2 GHz Opteron/Xeon processor), 4 GB RAM, 10 GB free disk space; one compute unit for the OS and one for the JVM. (E.g., 3 Carbon instances require a VM with 4 compute units, 8 GB RAM, and 30 GB of free space.)
EC2: a c3.large instance to run one Carbon instance (e.g., for 3 Carbon instances, an EC2 extra-large instance). Note: given the I/O performance of the c3.large instance, it is recommended to run multiple Carbon instances on a larger instance (c3.xlarge or c3.2xlarge).
NoSQL Data Nodes:
4 cores, 8 GB (http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningHardware_c.html)
Example
Let's say a customer needs 87 Carbon instances. They therefore need 87 CPU cores, 174 GB of memory, and 870 GB of free space.
This is calculated without considering the resources for the OS. For each machine, they need an extra 1 CPU core and 2 GB of memory for the OS.
Let's say they want to buy 10 machines; the total requirement will then be 97 CPU cores (10 for the OS + 87 for Carbon), 194 GB of memory (20 GB for the OS + 174 GB for Carbon), and 870 GB of free space for Carbon (normally storage will have more than this).
This means each machine will have 1/10 of the above and can run about 9 Carbon instances, i.e. roughly 10 CPU cores, 20 GB of memory, and 100 GB of free storage.
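A quick sanity check of that arithmetic as a shell sketch (the per-instance figures of 1 core / 2 GB / 10 GB and the 1 core / 2 GB per-machine OS overhead come from the guidelines above):
CARBON=87; MACHINES=10
echo "cores:  $(( CARBON * 1 + MACHINES * 1 ))"      # 87 for Carbon + 10 for the OS = 97
echo "memory: $(( CARBON * 2 + MACHINES * 2 )) GB"   # 174 GB + 20 GB = 194 GB
echo "disk:   $(( CARBON * 10 )) GB"                 # 870 GB for Carbon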
Reference : https://docs.wso2.com/display/CLUSTER44x/Production+Deployment+Guidelines
Note:
However, everything depends on what you're going to process using CEP; therefore, please refer to Tharik's answer as well.
It depends on the type of processing the CEP node does. A CEP node requires a lot of memory if the event size is large, if the event throughput is high, or if there are time windows in the queries. For those cases, memory-optimized EC2 instances are better, as they provide the lowest price for RAM size. If there is a lot of computation in algorithms you have extended, you might need the processing capabilities of compute-optimized instances.

EC2 instance running locust.io issues

I'm trying to run a locust.io load test on an EC2 instance - a t2.micro. I fire up 50 concurrent users, and initially everything works fine, with the CPU load reaching ~15%. After an hour or so, though, the network-out metric shows a drop of about 80%.
Any idea why this is happening? It's certainly not due to CPU credits. Maybe I reached the network limits of a t2.micro instance?
Thanks
Are you sure it's not a CPU credit issue? Can you check your CPU credits over that same time period to see how they look?
Or better yet, run the same test on a non-t2 instance - one that isn't limited in its CPU usage.
t2.micro instances consume CPU credits whenever usage is above their ~10% CPU baseline.
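To check the credit balance over the same window as the network-out drop, a minimal AWS CLI sketch (the instance ID and timestamps are placeholders):
aws cloudwatch get-metric-statistics --namespace AWS/EC2 --metric-name CPUCreditBalance --dimensions Name=InstanceId,Value=i-0123456789abcdef0 --start-time 2019-01-01T00:00:00Z --end-time 2019-01-01T02:00:00Z --period 300 --statistics Average
# A balance trending toward zero as network out drops points to CPU-credit throttling.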