I tried running my job with BASIC_GPU scale tier but I got an out of memory error. So then I tried running it with a custom configuration but I can't find a way of just using 1 Nvidia K80 with additional memory. All examples and predefined options use a number of GPUs, CPUs and workers and my code is not optimized for that. I just want 1 GPU and additional memory. How can I do that?
GPU memory is not currently extensible (until something like Pascal becomes accessible).
Reducing the batch size resolves some of the out-of-memory issues (see the sketch below).
Adding GPUs to workers doesn't help either, as the model is deployed to each worker separately (there is no memory pooling between workers).
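To make the batch-size point concrete, here is a minimal Keras sketch (a stand-in model and dataset, not your actual training job): the only change that matters for GPU memory here is the batch_size argument, which you can keep halving until the OOM goes away.

```python
import tensorflow as tf

# Hypothetical stand-in model and data; substitute your own training job's objects.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

# A smaller batch size means fewer activations held in GPU memory at once.
# Try halving it until the OOM disappears (e.g. 256 -> 128 -> 64).
model.fit(x_train, y_train, batch_size=64, epochs=1)
```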
I'm trying out QuestDB using the binaries, running them in an Ubuntu container under Proxmox. The docs for the binaries don't say what resources you need, so I guesstimated. Looking at the performance metrics for the container when running some of the CRUD examples with 10,000,000 rows, I still managed to over-provision — by a lot.
I provisioned the container with 4 CPU cores, 4 GB RAM plus swap, and 8 GB of SSD storage. It would probably be fine with a fraction of that: CPU usage during queries is <1%, RAM usage is <1.25 GB, and storage usage is <25%.
There is some good info in the capacity planning section of the QuestDB docs (e.g. 8 GB RAM for light workloads), but my question is really about the low end of the scale — what’s the least you can get away with and still be performant when getting started with the examples from the docs?
(I don't mind creating a pull request with this and some other docs additions. Most likely, 2 cores, 2 GB of RAM and 4 GB of storage would be plenty and still give you a nice 'wow, this is quick' factor, with the proviso that this is for evaluation purposes only.)
In QuestDB, ingestion and querying are separated by design: if you plan to ingest medium- or high-throughput data while running queries, you want a dedicated core for ingestion and another for the shared pool.
The shared pool is used for queries, but also for internal tasks QuestDB needs to run. If you are just running a demo, you can probably do well with just one core for the shared pool, but for production scenarios you would likely want to increase that depending on your access patterns.
Regarding disk capacity and memory, it all depends on the size of the data set. QuestDB queries will be faster if the working dataset fits in memory. 2GB of RAM and 4GB of disk storage as you suggested should be more than enough for the examples, but for most production scenarios you would probably want to increase both.
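For a small evaluation box, the sizing above mostly comes down to a couple of worker-count entries in server.conf. The snippet below is only a sketch: shared.worker.count is the knob for the shared pool, while the ingestion-worker key shown is an assumption you should verify against the QuestDB configuration reference before relying on it.

```ini
# server.conf sketch for a small evaluation instance (not a tuned production config).
# Double-check property names against the QuestDB configuration docs.

# One core for the shared pool (queries + internal jobs) is enough for a demo.
shared.worker.count=1

# If you ingest over ILP while querying, give ingestion its own worker
# (key name assumed here; verify the exact line.tcp.* worker settings in the docs).
line.tcp.writer.worker.count=1
```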
As far as I can tell, by default on Google Cloud (and presumably elsewhere) each vCPU = 1 hyperthread (third paragraph in the intro). From my perspective, that would suggest that unless one changes this setting to 2 or 4 vCPUs, concurrency in the code running on the Docker image achieves nothing. Is there some multithreading knowledge I'm missing that means concurrency on a single hyperthread accomplishes something? Scaling up the vCPU count isn't very attractive, as the minimum memory setting is already forced to 2 GB for 4 vCPUs.
This question is framed around the Google Cloud tech stack, but it is meant to apply to all providers.
Do Serverless solutions ever really benefit from concurrency?
EDIT:
The accepted answer is a great first look, but I realized my assumptions above ignored the idle time that context switching can fill. For example:
If we write a backend that talks to a database, a lot of our time might be spent idling while waiting for database request results. Context switching to the next request in that case would let us use the CPU more efficiently.
Therefore, depending on the use case, even on a single-threaded vCPU our serverless app can benefit from concurrency. The toy sketch below illustrates this.
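This is only a toy illustration (asyncio.sleep stands in for the database round trip, and the numbers are made up), but it shows why overlapping requests helps even when only one hyperthread is doing the work:

```python
import asyncio
import time

async def handle_request(i: int) -> None:
    # Stand-in for a database round trip: the CPU is idle while we wait,
    # so the event loop can switch to another request on the same vCPU.
    await asyncio.sleep(0.1)   # pretend DB latency
    _ = sum(range(10_000))     # a little CPU work per request

async def main() -> None:
    start = time.perf_counter()
    await asyncio.gather(*(handle_request(i) for i in range(50)))
    # Run sequentially, the waits alone would take ~5s; overlapped, the total
    # is ~0.1s plus the CPU work, even though only one hyperthread is busy.
    print(f"50 overlapped requests took {time.perf_counter() - start:.2f}s")

asyncio.run(main())
```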
I wrote this. From my experience, yes, you can handle several threads in parallel, and performance increases with the number of CPUs. However, you need a process that supports multithreading.
In the case of Cloud Run, each request can be processed in a thread, so parallelization is easy.
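As a rough illustration of that one-thread-per-request model (this is plain Python standard library, not Cloud Run-specific code; a real service would normally sit behind a production server such as gunicorn):

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
import time

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Simulate a slow downstream call; other requests are still served
        # concurrently because each connection gets its own thread.
        time.sleep(0.5)
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"hello\n")

if __name__ == "__main__":
    # ThreadingHTTPServer (Python 3.7+) spawns a thread per connection.
    ThreadingHTTPServer(("0.0.0.0", 8080), Handler).serve_forever()
```

Cloud Run's container concurrency setting then controls how many of these requests are sent to a single container instance at once.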
I'm trying to get a GCP "Deep Learning VM" instance running with a GPU, following these instructions. I'm being hit with: "You've gone over GPUs (all regions) quota by 1 GPU. Please increase your quota in the quotas page." However, when I look at the quotas, I do have a limit of 1 GPU for "NVIDIA V100". I have a limit of 0 for the Committed NVIDIA ***.
When you create a "Deep Learning VM" instance and select GPUs, are you selecting committed GPUs?
When you request a GPU quota, you must request a quota for the GPU models that you want to create in each region, plus an additional global quota for the total number of GPUs of all types in all zones. You can request a GPU quota increase from here.
Since you already have an NVIDIA_V100_GPUS quota limit of 1 in a region (for example, us-west1), all you need to do now is request a "GPUs (all regions)" quota increase through your Quotas page. The value you request depends on the number of GPUs you want to deploy. This should get rid of the error you are seeing.
If you want to use committed GPUs, you need to create a reservation based on your GPU type when purchasing the commitment. When you then create a Deep Learning VM, its GPU type must match your committed GPU type for the machine to use the commitment. For example, if you want to reserve 4 V100 GPUs, you must also commit to 4 V100 GPUs; when you create a Deep Learning VM with one of those V100 GPUs, the reservation section will show that 1 V100 GPU is in use. If you choose another GPU type, it will not be drawn from the committed GPUs. Committed GPUs are only used to get discounts on GPU resource usage.
Please request an increase of your GPU quota first.
Go here and increase the quota you need from 0 to 1 (for example, you can search for 'GPU' and request an increase for P100, V100, K80, etc.). After receiving approval you can deploy your VM with GPU=1.
I created a VM instance on Google Compute Engine with a single GPU and would like to add a second GPU of the same type. The VM instance is running Windows Server 2016 and has 8 CPUs and 52 GB of memory. I have followed the steps to add a GPU at the following location:
https://cloud.google.com/compute/docs/gpus/add-gpus
When using the updated VM instance, performance is very slow (click on a window or button and it opens 10 seconds later). The CPU does not appear to be heavily utilized. If I remove the second GPU so that only a single GPU is used, performance goes back to normal.
Am I missing a step (maybe updating something in Windows)?
My team is using a GPU instance to run TensorFlow-based machine learning, YOLO, and computer vision applications, and we also use it for training machine learning models. It costs $7 an hour and has 8 GPUs. I'm trying to reduce its cost. We need 8 GPUs for faster training, and sometimes several people use different GPUs at the same time.
For our use case, we sometimes don't use the GPUs at all for at least 1-2 weeks of a month, although a need for them may or may not come up during that time. So I wanted to know: is there a way to restructure the code so that all CPU-intensive operations run on a low-cost CPU instance when the GPUs aren't needed, and to turn on the GPU instance only when it is needed, use it, and then stop it when the work is done?
I thought of putting the code on an EFS shared file system and running it from there, but I read an article (https://www.jeffgeerling.com/blog/2018/getting-best-performance-out-amazon-efs) which says you should never run code from network-based drives because they can become really slow. So I don't know whether it's a good idea to run a machine learning application from an EFS file system. I was also thinking of creating virtual environments in folders on EFS, but I don't think that is a good idea either.
Could anyone suggest good ways of achieving this and reducing costs? I have considered using an instance with a lower number of GPUs, but we sometimes need 8 GPUs for faster training, and even though we don't use the GPUs at all for 1-2 weeks, the costs are still incurred.
Please suggest a way to achieve a low cost for this use case without using spot or reserved instances.
Thanks in advance
A few thoughts:
GPU instances now allow hibernation, so when launching your GPU instance select the new 'hibernate' stop-instance behavior, which will let you turn it off for two weeks but spin it up quickly if necessary (see the boto3 sketch at the end of this answer)
If you only have one instance, look into using EBS for data storage with a high number of provisioned IOPS so you can move data on and off your instance quickly
Alternatively, move your training to SageMaker to ensure you are only charged for GPU use while you are actively training your model
If you are applying your model (inference), move that workload to a cheap instance. A trained YOLO model can run inference on very small CPU instances; there is no need for a GPU for that part of the workload at all.
To reduce inference costs, you can use Elastic Inference, which supports pay-per-use pricing:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/elastic-inference.html
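To make the stop-and-start (or hibernate) suggestion concrete, here is a minimal boto3 sketch. The instance ID and region are placeholders, and Hibernate=True assumes the instance was launched with hibernation enabled; treat this as an illustration rather than a drop-in script.

```python
import boto3

# Hypothetical instance ID and region; replace with your GPU instance's values.
INSTANCE_ID = "i-0123456789abcdef0"
ec2 = boto3.client("ec2", region_name="us-east-1")

def pause_gpu_box() -> None:
    # Hibernate=True only works if the instance was launched with hibernation
    # enabled; otherwise drop the flag for a normal stop.
    ec2.stop_instances(InstanceIds=[INSTANCE_ID], Hibernate=True)

def resume_gpu_box() -> None:
    ec2.start_instances(InstanceIds=[INSTANCE_ID])
    ec2.get_waiter("instance_running").wait(InstanceIds=[INSTANCE_ID])

if __name__ == "__main__":
    resume_gpu_box()   # spin the GPU instance up just before training
    # ... submit training work ...
    pause_gpu_box()    # stop paying for the GPUs when training is done
```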