I have a persistent server that unpredictably receives new data from users; it needs about 10 GPU instances to crank on the problem for about 5 minutes, and then I send the answer back to the users. The server itself is a cheap, always-on, single-CPU Google Cloud instance. When a user request comes in, my code launches my 10 created-but-stopped Google Cloud GPU instances with
gcloud compute instances start (instance list)
In the rare case that the stopped instances don't exist (sometimes they get wiped), that's detected and they're recreated with
gcloud beta compute instances create (...)
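Put together, the launch logic amounts to something like this sketch (the instance names, zone, and create flags below are illustrative placeholders, not my real values):

#!/usr/bin/env bash
# Hypothetical worker names and zone, for illustration only.
# INSTANCES is deliberately unquoted below so it expands to separate names.
INSTANCES="gpu-worker-0 gpu-worker-1 gpu-worker-2"   # ...through gpu-worker-9
ZONE="us-central1-a"

# Try to start the stopped instances; if that fails (e.g. they were
# wiped), recreate them from scratch.
if ! gcloud compute instances start $INSTANCES --zone "$ZONE"; then
  gcloud beta compute instances create $INSTANCES \
    --zone "$ZONE" \
    --image-family ubuntu-1804-lts --image-project ubuntu-os-cloud \
    --accelerator type=nvidia-tesla-k80,count=1 \
    --maintenance-policy TERMINATE
fi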
This system all works fine. My only complaint is that even with created-but-stopped instances, the launch time before my GPU code finally starts to run is about 5 minutes. Most of this is just the time for the instance itself to boot its Ubuntu host and call my code; the delay between Ubuntu running and the GPU code starting is only about 10 seconds.
How can I reduce this 5 minute delay? I imagine most of it comes from Google having to copy the 4GB of instance data over to the target machine, but the startup time of (vanilla) Ubuntu probably adds 1 more minute. I'm not even sure I could quantify these two numbers independently; I can only measure the combined 3-7 minute delay from launch until my code starts responding.
I don't think Ubuntu OS startup time is the major latency contributor, since I timed an actual machine with the same Ubuntu and the same GPU on my desk: from power-on it booted and began running my GPU code in 46 seconds.
My goal is to get results back to my users as soon as possible, and that 5 minute startup delay is a bottleneck.
Would making the instance image smaller, say 2GB, help? What else can I do to reduce the latency?
2GB is large. That's a heckuva big image. You should be able to cut that down to 100MB, perhaps using Alpine instead of Ubuntu.
Copying 4GB of data is also less than ideal. Given that, I suspect the solution will be more of an architecture change than a code change.
But if you want to take a whack at everything that is NOT about your 4GB of data, there is a capability to prepare a custom image for your VMs. If you can build a slim custom image, that will help.
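For example, once you have a trimmed-down boot disk prepared, you can bake it into a reusable image and launch from it (a sketch; the disk/image names and zone are placeholders):

# Turn a prepared boot disk into a custom image...
gcloud compute images create slim-gpu-image \
  --source-disk slim-template-disk \
  --source-disk-zone us-central1-a
# ...then create instances from that image instead of stock Ubuntu.
gcloud compute instances create gpu-worker-0 \
  --image slim-gpu-image \
  --zone us-central1-a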
There are good resources for learning more; the two I would start with are:
- Improve GCE Boot Times with Custom Images
- Three steps to Compute Engine startup-time bliss: Google Cloud Performance Atlas
Related
I have a use case where I need to start EC2 instances on-demand, so starting fast is relevant to our users. Currently our startup time is 2 minutes on average, varying with the time of day and the instance type.
We are launching them using the NodeJS SDK and straight from our custom AMI, not using launch templates. We noted that smaller image sizes launch faster, but unfortunately we are unable to reduce ours.
When the instance starts, an @reboot cron job runs an application that notifies an API that the instance is ready, as sketched below. Everything is installed in the AMI, which was built for this purpose based on Ubuntu 18, and no heavy work is done at startup. We measure the startup time as the difference between the time the instance was started and the time it notified that it was ready.
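The notification itself is trivial; the cron entry amounts to something like this (the endpoint URL is a placeholder, not our real API):

# user crontab baked into the AMI: ping our API as soon as the box is up
@reboot curl -s -X POST "https://api.example.com/instances/ready?host=$(hostname)"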
The startup time is higher when no instances with that AMI have been launched recently, suggesting that AWS has some kind of cold start in this case. We also noticed that increasing the disk size from 30 GB to 45 GB increased the startup time from 1 minute to the 2 minute average I mentioned.
What strategies may I try to reduce this startup time?
I've written a simple batch file that starts Apache and sends a curl request to my server at start time. I am using Windows Server 2016 and an n-4 Compute Engine instance.
I've noticed that 2 identical machines require vastly different startup times. One sends its message in just 40s; the other takes almost 80s. In the console both seem to start at the same time, but the reality is different, since the slower one is inaccessible via RDP tools for 80s.
The second machine is made from a disk image of the first one. What factors contribute to the start time? Where should I trim the fat?
The delay could occur if the instances are in different regions, or if the second instance has some additional memory-intensive applications or extra customizations. The boot disk type of the instance also contributes to the boot time. Are you getting any information from the logs about this delay during startup? You could also compare traceroute results from both instances to see if there is a delay at some point in the network.
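For example, you could pull the boot logs of both machines and diff them to see where the slow one spends its extra ~40 seconds (instance names and zone are placeholders):

# Fetch the serial console output of both instances and compare.
gcloud compute instances get-serial-port-output fast-instance --zone us-central1-a > fast.log
gcloud compute instances get-serial-port-output slow-instance --zone us-central1-a > slow.log
diff fast.log slow.log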
I use a VM Instance provided by Google Compute Engine.
Machine Type: n1-standard-8 (8 vCPUs, 30 GB memory).
When I check the CPU utilisation, it never goes above 12%. I use my VM for running Jupyter Notebook. I have tried loading dataframes that take 7.5 GiB of memory (and it takes a long time to process the data with even simple operations), but the utilisation stays the same.
How can I utilise ~100% of the CPU power?
Or does my program use only 1 of the 8 CPUs, i.e. (1/8) × 100 = 12.5%?
You can run the stress command to impose a configurable amount of CPU, memory, I/O, and disk stress on the system.
Example to stress 4 cores for 90 seconds:
stress --cpu 4 --timeout 90
In the meantime, go to the Google Cloud Console in your browser to check the CPU usage of your VM, or open a new SSH connection to your VM and run the top command to see your CPU status.
If your CPU can reach over 99% after running those commands, your instance is working fine, and you have to check your application to find out why it is restricted and cannot use more than 12% of the CPU.
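If the suspicion in the question is right and the workload is single-threaded, per-core statistics will show it directly. As an extra check (not part of the original answer), mpstat from the sysstat package prints per-core utilisation:

# One report per second, per core: a single-threaded job pins one core
# near 100% while the other seven sit idle, averaging ~12% overall.
sudo apt-get install -y sysstat
mpstat -P ALL 1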
I'm using a GitLab shared runner with Docker (current runner version: 10.0.2, Docker storage driver: overlay2), running on an AWS t2.small instance. I started experiencing issues with builds slowing down after some time (it's hard to say exactly when they become slow) - they take ~10x longer to finish than before. After killing the instance the problem disappears for a while, and after some time it slows down again.
Things I already checked:
- CPU usage on the machine is around 20% the whole time
- RAM usage is around 1.5 GB during the heaviest build
- IOPS on EBS are not exhausting the Burst Balance (e.g. right now the burst balance is around 80%)
- Download speed
What else might be causing this?
Just in case, jobs that are running on this runner are mostly yarn install and yarn build of a medium-sized front-end React application.
You mention you're using a t2.small; what's the CPU credit balance on your instance when you see this slowdown?
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/t2-instances.html#t2-instances-cpu-credits
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/t2-instances.html#t2-instances-monitoring-cpu-credits
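As a concrete check (the instance ID and time window below are placeholders), the credit balance can be pulled from CloudWatch with the AWS CLI:

# Average CPUCreditBalance in 5-minute buckets; if it bottoms out near 0
# when builds slow down, the t2 is being throttled to its baseline.
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUCreditBalance \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --start-time 2017-10-20T00:00:00Z \
  --end-time 2017-10-20T12:00:00Z \
  --period 300 \
  --statistics Average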
We are running a Docker container on AWS Elastic Beanstalk, which ran fine for a few weeks but suddenly started to experience very sudden CPU spikes (from ~5% to ~60% in a matter of minutes), which sometimes drop back down quickly and sometimes stay high long enough to trigger an autoscaling event and spin up a few extra instances (which are terminated some time later, when the CPU spike dies down).
The funny thing is, I wanted to investigate the problem today, so I SSHed into every instance (4 in total) and ran top on all of them, trying to locate the CPU-consuming process, and was surprised to discover that all instances were at ~15% CPU (system + user combined), while the Elastic Beanstalk monitoring page still showed the servers at 60% CPU.
I measured these figures for the better part of an hour, making sure the reported CPU load stayed high while the top command still showed low values.
I've also tried to measure CPU for a while using the advice found here - https://askubuntu.com/questions/22021/how-to-log-cpu-load - and got the same very low CPU stats when querying the server directly.
My question is: is it possible the AWS monitoring system is not showing me accurate data? Is there any way to verify the data displayed on the monitoring page?
Any help would be appreciated.
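One way to cross-check the console graph (a sketch; the instance ID and time window are placeholders) is to query the raw per-instance CloudWatch metric and compare it with top on the same machine:

# Per-instance CPUUtilization in 5-minute buckets. If this also reports
# ~60% while top says ~15%, the discrepancy is real; if it reports ~15%,
# the dashboard is aggregating or showing a different dimension.
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --start-time 2017-06-01T10:00:00Z \
  --end-time 2017-06-01T11:00:00Z \
  --period 300 \
  --statistics Average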