While studying basic ML algorithms on the MNIST database, I noticed that my netbook is too weak for the purpose. I started a free trial on Google Cloud and successfully set up a VM instance with 1 vCPU. However, it only gives about a 3x performance boost, and I need much more computing power for some specific algorithms.
I want to do the following:
use 1 vCPU to set up an algorithm
switch to plenty of vCPUs to run a single expensive algorithm
go back to 1 vCPU
Unfortunately, I am not sure how Google will charge me for such a maneuver. I am afraid that it will drain the $300 I have on my account. It is my very first day playing with VMs and using the cloud for computing purposes, so I really need good advice from someone with experience.
Question. How do I manage the number of vCPUs on Google Cloud Compute Engine to run a single expensive algorithm?
COSTS
The quick answer is that you pay for what you use: if you use 16 vCPUs for 1 hour, you pay for 16 vCPUs for 1 hour.
In order to get a rough idea of the cost, I would advise you to take a look at the Price Calculator and create your own estimate with the resources you are going to use.
Running a machine with 1 vCPU and 3.75 GB of RAM for one day costs around $0.80 (if it is not a preemptible instance and without any committed use discounts); a machine with 32 vCPUs and 120 GB of RAM, on the other hand, costs around $25/day.
Remember the rule: when it is running, you are paying for it. You can change the machine type as many times as you want according to your needs, and while the machine is stopped you pay only for the persistent disk. Therefore it makes sense to switch the machine off whenever you are not using it.
Consider that you will also have to pay for networking and storage, but in your use case those costs are fairly marginal; for example, 100 GB of storage for one day costs about $0.13.
Notice that since September 2017 Google has extended per-second billing, with a one-minute minimum, to Compute Engine. I believe this is how most cloud providers work.
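To put per-second billing into perspective, here is a back-of-the-envelope calculation (a rough sketch using the approximate $25/day figure above, not an official price):
$ # ~$25/day for the 32 vCPU machine is roughly $1.04/hour, so 45 minutes
$ # of heavy computation on it costs on the order of:
$ echo "scale=2; 25 / 24 * 0.75" | bc   # prints .78, i.e. well under a dollar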
ADDING VCPU
When the machine is off, you can modify the number of vCPUs and the amount of memory from the edit menu; here you can find a step-by-step official guide that walks you through the process. You can also change the machine type through the command line, for example setting a custom machine type with 4 vCPUs and 1 GB of memory:
$ gcloud compute instances set-machine-type INSTANCE-NAME --machine-type custom-4-1024
As soon as you are done with your computation, stop the instance and reduce the size of the machine (or simply leave it off).
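As a minimal sketch of the whole workflow from the question (the instance name and machine types below are placeholders; the instance has to be stopped before its type can be changed):
$ # stop the instance: while it is stopped you pay only for the persistent disk
$ gcloud compute instances stop my-instance
$ # scale up for the expensive computation, e.g. 16 vCPUs / 60 GB of RAM
$ gcloud compute instances set-machine-type my-instance --machine-type n1-standard-16
$ gcloud compute instances start my-instance
$ # ... run the expensive algorithm ...
$ # scale back down (or simply leave the instance stopped)
$ gcloud compute instances stop my-instance
$ gcloud compute instances set-machine-type my-instance --machine-type n1-standard-1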
Related
I'm choosing instances to run microservices on an AWS EKS cluster.
Reading this article and taking a look at the AWS docs, it seems that choosing many small instances instead of one larger instance results in a better deal.
There seems to be no downside to taking, for instance, 2 t3.nano (2 vCPU / 0.5 GiB each) vs 1 t3.micro (2 vCPU / 1 GiB each). The price and the total memory are the same, but the total CPU you get grows considerably with the number of instances.
I assume there are some processes running on each machine by default, but I found no place mentioning their impact on the machine's resources or usage. Is it negligible? Is there any advantage to taking one big instance instead?
The issue is whether or not your computing task can be completed on the smaller instances; there is also overhead involved in instance-to-instance communication that isn't present in intra-instance communication.
So, it is all about fitting your solution onto the instances and your requirements.
There is no right answer to this question. The answer depends on your specific workload, and you have to try out both approaches to find out what works best for your case. There are advantages and disadvantages to both approaches.
For example, if the OS takes 200 MB on each instance, you will be left with only 600 MB across both nano instances combined vs 800 MB on the single micro instance.
When the cluster scales out, initializing 2 nano instances might take roughly twice as long as initializing one micro instance to provide the same additional capacity to handle the extra load.
Also, as noted by Cargo23, inter-instance communication might increase the latency of your application.
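If you want to see that per-node overhead concretely, Kubernetes reports both a node's total capacity and what is actually allocatable to pods, so you can compare the two figures on a nano versus a micro node (the node name below is a placeholder):
$ # capacity vs. allocatable resources for one node
$ kubectl describe node ip-10-0-1-23.ec2.internal | grep -A 6 -E "^Capacity:|^Allocatable:"
$ # or just the allocatable memory for every node in the cluster
$ kubectl get nodes -o custom-columns=NAME:.metadata.name,ALLOC_MEM:.status.allocatable.memory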
I am trying to understand Google Cloud costs. Suppose I allocate 256 MB for the function, there is 1 minimum instance running at all times, and the maximum number of instances is set to 2; what will the monthly cost be? I am also wondering whether setting minimum instances to 0 will reduce the bill significantly, or whether it is fine to set it to 1 without affecting the cost too much. I could not understand the idle pricing, so it would help if someone has a real-life cost example to share.
Per this note:
Note: A minimum number of instances kept running incur billing costs at idle rates. Typically, to keep one idle function instance warm costs less than $6.00 a month.
Also, regarding how long Google Functions instances stay idle, I could not find any direct Google document on this:
Instances are recycled after 15 minutes of inactivity.
but another answer says it is 30 minutes
Your question lacks important details, such as the execution time of your code or the amount of data moving out from your function, but you can use the price calculator to get an estimate.
Keep in mind that since the charge for each million invocations is $0.40, the price of the invocations themselves might not be as important as the price of your networking.
For example, I did an estimate for a function with a minimum of 1 instance, 3 million executions a month, 256 MB of memory, 100 ms of execution time and 1 MB of outbound bandwidth per execution, and for us-central1 you would be charged around $354.10.
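If you want to experiment with those settings yourself, they can be set at deploy time; the function name, runtime and region below are placeholders, and the exact flags may differ between 1st and 2nd gen functions:
$ gcloud functions deploy my-function --region us-central1 --runtime python310 --trigger-http --memory 256MB --min-instances 1 --max-instances 2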
In any case, it would be better to discuss this kind of question with Google Cloud Billing Support, as they will be able to give you the most accurate answer.
Keep in mind that Stack Overflow is more focused on code and development support.
My team is using a GPU instance to run TensorFlow-based machine learning, YOLO and computer vision applications, and we also use it for training machine learning models. It costs $7 an hour and has 8 GPUs. I was trying to reduce its cost. We need 8 GPUs for faster training, and sometimes several people use different GPUs at the same time.
For our use case, we sometimes don't use the GPUs at all for at least 1-2 weeks of a month, although a need for them may or may not come up during that time. So I wanted to know whether there is a way to restructure the code so that all CPU-intensive operations run on a low-cost CPU instance when the GPUs are not needed, and to turn on the GPU instance only when it is needed, use it, and then stop it when the work is done.
I thought of using EFS to put the code on a shared file system and run it from there, but I read an article (https://www.jeffgeerling.com/blog/2018/getting-best-performance-out-amazon-efs) which says you should never run code from network-based drives because it can become really slow. So I don't know whether it is a good idea to run a machine learning application from an EFS file system. I was also thinking of creating virtual environments in folders on EFS, but I don't think that is a good idea either.
Could anyone suggest good ways of achieving this and reducing costs? I have considered using an instance with a lower number of GPUs, but we sometimes need 8 GPUs for faster training, while for 1-2 weeks we don't use the GPUs at all and the costs are still incurred.
Please suggest a way to keep costs low for this use case without using spot or reserved instances.
Thanks in advance
A few thoughts:
GPU instances now allow hibernation, so when launching your GPU instance select the new stop behavior 'hibernate', which will let you turn it off for those 2 weeks but spin it back up quickly if necessary (a minimal CLI sketch is at the end of this answer).
If you only have one instance, look into using EBS for data storage with a high number of provisioned IOPS so you can move data on and off your instance quickly.
Alternatively, move your training to SageMaker to ensure you are only charged for GPU use while you are actively training your model.
If you are applying your model (inference), move that workload to a cheap instance. A trained YOLO model can run inference on very small CPU instances; there is no need for a GPU for that part of the workload at all.
To reduce inference costs, you can use Elastic Inference which supports pay-per-use functionality:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-inference.html
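As a concrete sketch of the on/off pattern from the first point (the instance ID is a placeholder, and the instance must have been launched with hibernation enabled):
$ # hibernate the GPU instance while nobody needs it (RAM is saved to the EBS root volume)
$ aws ec2 stop-instances --instance-ids i-0123456789abcdef0 --hibernate
$ # bring it back when a training job shows up
$ aws ec2 start-instances --instance-ids i-0123456789abcdef0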
I'm running a GCP Compute Engine instance with 4 vCPUs and 15 GB of memory at a cost of, let's say for this example, USD 100/month.
If I purchase a 'Committed Use Discount' with the same specs as above, do I need to apply it to the VM, or will the system automatically know where the discount is supposed to be used?
I'm only asking because you cannot cancel these things after you make the purchase.
After you purchase a commitment, you set your machine specs by following these steps. As you can see, you choose a machine type and the quantities of vCPU and memory, just as you would for a new instance.
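In other words, you don't attach the commitment to a specific VM: committed use discounts are applied automatically to matching vCPU and memory usage in that project and region. For your 4 vCPU / 15 GB example, an instance along these lines would be covered (name, zone and machine type are placeholders):
$ gcloud compute instances create my-committed-vm --zone us-central1-a --machine-type n1-standard-4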
The instance details page of my f1-micro instance shows a graph of CPU utilisation fluctuating between 8% and 15%, but what is the scale? The f1-micro has 0.2 of a CPU, so is my maximum 20%? Or does the 100% mark in the graph correspond to my 20% share of the CPU? Occasionally the graph has gone above 20%; is the instance bursting then? Or does bursting start at 100% in the graph?
The recommendation to increase performance is always displayed. Is it just a sales tactic? The VM is a watchdog, so it is not doing much.
I put together a small test in order to answer your question; if you are interested, you can do the same to double-check.
TEST
I created two instances, one f1-micro and one n1-standard-1, and then I forced a CPU burst using stress, but you can use any tool of your choice.
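In case you want to reproduce the setup, the two test instances can also be created from the command line (names and zone are placeholders):
$ gcloud compute instances create test-f1-micro --zone us-central1-a --machine-type f1-micro
$ gcloud compute instances create test-n1-standard --zone us-central1-a --machine-type n1-standard-1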
$ sudo apt-get install stress
$ stress --cpu 1 & top
In this way we can compare the output of top on the two instances with the graph shown in the dashboard. Since the operating system is not aware that it is sharing the CPU, we expect to see 100% from inside the machine.
RESULTS
While the output of top on both instances showed, as expected, that 99.9% of the CPU was in use, the output of the dashboard is more interesting.
The n1-standard-1 showed a stable value of around 100% the whole time.
The f1-micro showed an initial spike to 250% (because it was using a bigger share of the physical CPU than the one assigned, i.e. it was running in bursting mode) and then dropped back to 100%.
I repeated the test several times and each time I got the same behaviour; therefore, the percentage refers to the share of CPU assigned to you, and values above 100% indicate that the instance is bursting.
This feature is documented here:
"f1-micro machine types offer bursting capabilities that allow instances to use additional physical CPU for short periods of time. Bursting happens automatically when your instance requires more physical CPU than originally allocated"
On the other hand, if you want to know more about those recommendations and how they work, you can check the official documentation.