speed up boot time of compute engine instances - google-cloud-platform

I've written a simple batch file that starts Apache and sends a curl request to my server at startup. I am using Windows Server 2016 on an n-4 Compute Engine instance.
I've noticed that two identical machines require vastly different startup times. One sends the message in just 40s; the other takes almost 80s. In the console both seem to start at the same time, but in reality the second one is inaccessible via RD tools for 80s.
The second machine was made from a disk image of the first one. What factors contribute to the start time? Where should I trim the fat?

The delay could occur if the instances are in different regions, or if the second instance runs additional memory-intensive applications or has other customizations. The boot disk type of the instance also contributes to the boot time. Do the logs give any information about this delay during startup? You could also compare traceroute results on both instances to see whether there is a delay at some point in the network.

Related

How to measure speed of an execution via IPC of two applications

1. I have a sending application that sends a command via an anonymous pipe.
2. I have a receiving application that receives the command, handles it, and returns a result.
3. The sending application receives the result.
I can time the length of the complete operation in the first application (steps 1-3). I can also time the handling in the second application that executes the command (step 2).
Looking at both logs, I can subtract the time for step 2 from the time for steps 1-3 and, assuming that the sending application doesn't waste any time, I then know how much time was used for the transfer.
Is there any way to sync both logs so that they show the same timestamp in milliseconds? Or to sync all operations so that I can see all the timing in the log of the first application?
I know that the applications would need some special commands to sync them. I have no clue whether this is possible at all.
Or, in other words: is it possible to time everything in application 1 without looking at both logs individually?
Best possible result:
- Time used for the I/O (pipe)
- Time of code executed in app 2 (without I/O)
- Time of code executed in app 1 (without I/O)
Further information: The applications run on different machines. Even in different networks connected via VPN.
Similar to what Joseph Larson suggested, but without needing the clocks to be synchronized: could you just append the time spent in app 2 to the result it returns? Then log it from app 1, together with the total time.
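A minimal sketch of that idea, assuming the reply can carry an extra field. The names `handle_command`, `timed_request`, and `handler_ms` are hypothetical, and the `send` callable stands in for whatever pipe or network transport the two applications actually use. Each side only measures durations on its own clock, so the clocks never need to agree.

```python
import time


def handle_command(command):
    """App 2: do the work and report how long the handling took."""
    start = time.perf_counter()
    result = command.upper()  # placeholder for the real command handling
    handler_ms = (time.perf_counter() - start) * 1000.0
    # The duration travels back with the result, so app 1 can log it.
    return {"result": result, "handler_ms": handler_ms}


def timed_request(send, command):
    """App 1: time the round trip and split it into handling and transfer."""
    start = time.perf_counter()
    reply = send(command)  # hypothetical transport: pipe, socket, VPN link...
    total_ms = (time.perf_counter() - start) * 1000.0
    return {
        "result": reply["result"],
        "total_ms": total_ms,
        "handler_ms": reply["handler_ms"],
        # Whatever was not spent in app 2 is attributed to the transfer
        # (plus any overhead in app 1 around the send call).
        "transfer_ms": total_ms - reply["handler_ms"],
    }


if __name__ == "__main__":
    # Here app 2 is called directly; in practice `send` would cross the pipe.
    print(timed_request(handle_command, "ping"))
```

Because only elapsed times are exchanged, this works even when the machines sit in different networks with unsynchronized clocks.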

How to improve AWS EC2 startup time?

I have a use case where I need to start EC2 instances on demand, so starting fast matters to our users. Currently our startup time is 2 minutes on average, varying with the time of day and the instance type.
We are launching the instances with the NodeJS SDK, straight from our custom AMI, without using launch templates. We noted that smaller images launch faster, but unfortunately we are unable to reduce ours.
When the instance starts, an @reboot cronjob runs an application that notifies an API that the instance is ready. Everything is installed in the AMI, which was built for this purpose on Ubuntu 18, and no heavy work is done at startup. We measure the startup time as the difference between the time the instance was started and the time it reported that it was ready.
The startup time is higher when no instances with that AMI were launched recently, suggesting that AWS has some kind of cold start in this case. We also noticed that increasing the disk size from 30GB to 45GB increased the startup time from 1 minute to the 2 minute average that I mentioned.
What strategies may I try to reduce this startup time?

How to minimize Google Cloud launch latency

I have a persistent server that unpredictably receives new data from users, needing about 10 GPU instances to crank at the problem for about 5 minutes, and I send the answer back to the users. The server itself is a cheap always-persistent single CPU Google Cloud instance. When a user request comes in, my code launches my 10 created but stopped Google Cloud GPU instances with
gcloud compute instances start (instance list)
In the rare case that the stopped instances don't exist (sometimes they get wiped), that's detected and they're recreated with
gcloud beta compute instances create (...)
This system all works fine. My only complaint is that even with created but stopped instances, the launch time before my GPU code finally starts to run is about 5 minutes. Most of this is just the time for the instance itself to launch its Ubuntu host and call my code. The delay from Ubuntu running to the GPU code starting is only about 10 seconds.
How can I reduce this 5 minute delay? I imagine most of it comes from Google having to copy over the 4GB of instance data to the target machine, but the startup time of (vanilla) Ubuntu probably adds 1 more minute. I'm not even sure if I could quantify these two numbers independently; I can only measure the combined 3-7 minute delay from the launch until my code starts responding.
I don't think Ubuntu startup time is the major contributor to the latency, since I timed a physical machine on my desk with the same Ubuntu and the same GPU from power-on, and it began running my GPU code in 46 seconds.
My goal is to get results back to my users as soon as possible, and that 5 minute startup delay is a bottleneck.
Would making a smaller instance SIZE of say 2GB help? What else can I do to reduce the latency?
2GB is large. That's a heckuva big image. You should be able to cut that down to 100MB, perhaps using Alpine instead of Ubuntu.
Copying 4GB of data is also less than ideal. Given that, I suspect the solution will be more of an architecture change than a code change.
But if you want to take a whack at everything which is NOT about your 4GB of data, you can prepare a custom image for your VMs. If you can build a slim custom image, that will help.
There are good resources for learning more; the two I would start with are:
- Improve GCE Boot Times with Custom Images
- Three steps to Compute Engine startup-time bliss: Google Cloud Performance Atlas
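The question notes that the provisioning delay and the OS boot time are hard to quantify independently. One way to separate them, sketched here under the assumptions that the instance runs Linux (so `/proc/uptime` is available) and that the launching server records when it issued the start command: the GPU code reads the uptime when it begins and reports it back, and the server subtracts it from the total latency it measured. The function names are hypothetical.

```python
def read_uptime_seconds(path="/proc/uptime"):
    """On the instance: seconds since the kernel booted (Linux only)."""
    with open(path) as f:
        # The first field of /proc/uptime is the uptime in seconds.
        return float(f.read().split()[0])


def split_launch_latency(total_seconds, uptime_seconds):
    """On the launching server: split the measured latency into two parts.

    total_seconds:  time from issuing the start command until the code responded
    uptime_seconds: uptime the instance reported when its code started
    """
    # Time before the kernel started is attributed to provisioning
    # (scheduling, disk attach, firmware); clamp in case of measurement noise.
    provisioning = max(total_seconds - uptime_seconds, 0.0)
    return {"provisioning_s": provisioning, "os_boot_s": uptime_seconds}


if __name__ == "__main__":
    # Example with made-up numbers: 300 s total, 60 s reported uptime.
    print(split_launch_latency(300.0, 60.0))
```

If most of the time turns out to be spent before the kernel starts, a slimmer image is unlikely to help much on its own; if the OS boot dominates, the custom image route above is the place to look.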

RethinkDB struggles with 1.5k writes on i3.Metal instance

I have been benchmarking RethinkDB's performance on an i3.metal EC2 instance, with all 8 NVMe SSDs in RAID 0, and am finding the performance to be lacking.
The setup I have is 3 clients subscribed to the changes of a table. One client then progressively increases its writes per second, which get sent to the other two clients via a changefeed.
This setup can reach about 1.5k writes a second before it starts to get erratic and lag. If I don't have any clients and no changefeed subscriptions, I can get about 2.5k writes a second.
There were benchmarks which showed RethinkDB getting roughly 8k reads/writes per node on inferior hardware. Is there some special way to configure the i3.metal to get better performance? Or is there an entirely different cloud service that will provide better RethinkDB performance?

AWS EC2 ECS - How many tasks should I place on a single instance?

At the moment, I have a single c4.large (3.75GB RAM, 2 vCPU) instance in my workers cluster, currently running 21 tasks for 16 services. These tasks range from image processing to data transformation, and most send HTTP requests too. As you can see, the instance is quite well utilised.
My question is, how do I know how many tasks to place on an instance? I am placing up to 8 tasks for a service, but I'm unsure as to whether this results in a speed increase, given they are using the same underlying instance. How do I find the optimal placement?
Should I put many chefs in my kitchen, or will just two get the food out to customers faster?
We typically run lots of smaller servers in our clusters, for example 4-6 t2.small instances for our workers with 6-7 tasks placed on each. The main reason for this is not to speed up processing but to reduce the blast radius of servers going down.
We've quite often seen a server simply fail an instance health check and AWS take it down. Having the workers spread out reduces the effect on the system.
I agree with the other people’s 80% rule. But you never want a single host for any kind of critical application. If that goes down you’re screwed. I also think it’s better to use larger servers because of their increased network performance. You should look into a host with enhanced networking, especially because you say you have a lot of HTTP work.
Another thing to consider is disk I/O. If you are piling too many tasks on a host and there is a failure, it’s going to try to schedule those all somewhere else. I have had servers crash because of too many tasks being scheduled and burning through disk credits.