Exhausted Compute Units Google Colab - google-cloud-platform

I'm planning to subscribe to Google Colab Pro to get more GPU memory for some research. But I was wondering: if I exhaust my 100 compute units on the first day through continuous GPU usage, can I still use a GPU in Google Colab?
If anyone knows, or has already tried to use a GPU after reaching 0 compute units, is it still possible to use a GPU? Please kindly share your experience.

From what I understood reading here: Google Colab subscription
What happens if I don't have computing units? All users can access Colab resources based on availability. If you do not have computing units, you can only use Colab resources reserved for non-paying users.
So, if you use up your compute units you'll be downgraded to the free tier of Colab. In that case you can still be assigned a GPU as a free user, but with the usual free-tier limitations.
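If you want to see what you actually got after being bumped down, a quick check from a notebook cell (nothing Colab-specific beyond the standard nvidia-smi tool that ships in the runtime) is something like:

```python
# Quick way to check which GPU, if any, the runtime assigned you
# (works the same on a paid or a free Colab session).
import subprocess

try:
    print(subprocess.check_output(["nvidia-smi"]).decode())
except (FileNotFoundError, subprocess.CalledProcessError):
    print("No NVIDIA GPU is attached to this runtime.")
```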
Another useful link for understanding how computing units are consumed is the following: What exactly is a compute unit?
I'll be trying the paid version in a few days for at least a month to see if it is really worth it.
Hope it helps!

Related

option to add gpu on gcloud not available?

I never had this issue until recently, but now, when creating a VM, the option to add a GPU is always greyed out.
This is what it looks like (screenshot omitted). What is the cause of this?
It is not because there are no GPUs in my region; I checked a lot of them. I also don't think it's an issue with my account, since I CAN create GPU instances through the Marketplace.
It is probably because of the current machine type that has been set. You can only attach GPUs to general-purpose N1 machine types. GPUs are not supported for other machine types. Feel free to check this documentation for reference.
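To make that constraint concrete, here is a rough, hypothetical sketch using the google-cloud-compute Python client. The zone, instance name, n1-standard-4 type, and T4 accelerator are placeholders, and disk/network config is omitted, so this only builds the request objects rather than actually creating a VM:

```python
# Hypothetical sketch: pairing a GPU accelerator config with an N1 machine type.
from google.cloud import compute_v1

zone = "us-central1-a"  # placeholder zone

instance = compute_v1.Instance()
instance.name = "gpu-dev-vm"  # placeholder name
instance.machine_type = f"zones/{zone}/machineTypes/n1-standard-4"  # N1 family

accelerator = compute_v1.AcceleratorConfig()
accelerator.accelerator_count = 1
accelerator.accelerator_type = f"zones/{zone}/acceleratorTypes/nvidia-tesla-t4"
instance.guest_accelerators = [accelerator]

# GPU VMs cannot live-migrate, so host maintenance must terminate the VM.
instance.scheduling = compute_v1.Scheduling(on_host_maintenance="TERMINATE")

print(instance)  # disks and network interfaces would still be needed for a real insert()
```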

Error symbol log running GPU on google cloud ML

I'm trying to use Google Cloud ML in GPU mode.
When I train with the BASIC_GPU scale tier, I get many error logs, but training itself still runs.
I'm not sure whether training actually ran on the GPU.
This is the error log history (screenshot omitted), along with part of the output from printing config.log_device_placement (screenshot omitted).
I also tried training with the complex_model_m_gpu tier and got error logs similar to BASIC_GPU.
But when I print config.log_device_placement, I can only see /gpu:0; I can't see /gpu:1, /gpu:2, or /gpu:3.
The important thing is that BASIC_GPU and complex_model_m_gpu take the same time to run.
I wonder whether training is really working in GPU mode or whether something is wrong.
Sorry for my English. If anyone knows the problem, please help me.
Thank you.
Please refer to TensorFlow's performance guide for optimizing for GPUs for tips on how to make the most of your GPUs.
A couple of things to note:
You can turn on logging of device placement to see which ops get assigned to which devices. This is a great way to check that ops are actually assigned to GPUs and that you are using all GPUs when you have more than one.
TensorBoard also shows device placement, so that is another way to check that you are using all GPUs.
When using multiple GPUs, you need to make sure you are assigning ops to all of them; the TensorFlow guide provides more information on this topic.
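I don't know your exact training code, but here is a toy TF 1.x sketch (the API generation matching Cloud ML Engine's scale tiers) that shows both ideas at once: device placement logging turned on, and ops explicitly pinned to each of the four GPUs that complex_model_m_gpu provides. The constant ops are just placeholders for your model's per-GPU towers.

```python
import tensorflow as tf

# allow_soft_placement lets TF fall back to CPU if a GPU is missing, and
# log_device_placement prints where every op actually ran.
config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)

tower_outputs = []
for gpu_id in range(4):  # complex_model_m_gpu exposes four GPUs: /gpu:0 .. /gpu:3
    with tf.device('/gpu:%d' % gpu_id):
        a = tf.constant([1.0, 2.0], name='a_%d' % gpu_id)
        b = tf.constant([3.0, 4.0], name='b_%d' % gpu_id)
        tower_outputs.append(a + b)  # each "tower" is pinned to its own GPU

with tf.Session(config=config) as sess:
    print(sess.run(tower_outputs))
```

If the placement log only ever mentions /gpu:0, the other GPUs are sitting idle, which would also explain why the two scale tiers run at the same speed.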

Speeding up model training using MITIE with Rasa

I'm training a model for recognizing short, one to three sentence strings of text using the MITIE back-end in Rasa. The model trains and works using spaCy, but it isn't quite as accurate as I'd like. Training on spaCy takes no more than five minutes, but training for MITIE ran for several days non-stop on my computer with 16GB of RAM. So I started training it on an Amazon EC2 r4.8xlarge instance with 255GB RAM and 32 threads, but it doesn't seem to be using all the resources available to it.
In the Rasa config file, I have num_threads: 32 and set max_training_processes: 1, which I thought would help use all the memory and computing power available. But now that it has been running for a few hours, CPU usage is sitting at 3% (100% usage but only on one thread), and memory usage stays around 25GB, one tenth of what it could be.
Do any of you have any experience with trying to accelerate MITIE training? My model has 175 intents and a total of 6000 intent examples. Is there something to tweak in the Rasa config files?
I am going to try to address this from several angles. First, specifically from the Rasa NLU angle, the docs say:
Training MITIE can be quite slow on datasets with more than a few intents.
and provide two alternatives:
Use the mitie_sklearn pipeline, which trains using sklearn.
Use the MITIE fork where Tom B from Rasa has modified the code to run faster in most cases.
Given that you're only getting a single core used, I doubt this will have much impact, but it has been suggested by Alan from Rasa that num_threads should be set to 2-3x your number of cores.
If you haven't evaluated both of those possibilities then you probably should.
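For reference, a hypothetical config sketch in the old rasa_nlu JSON format; num_threads and max_training_processes are the keys you already mentioned, while the pipeline/mitie_file keys and the values shown here are illustrative assumptions, not recommendations:

```python
import json

# Hypothetical rasa_nlu-style JSON config; adjust values to your setup.
config = {
    "pipeline": "mitie_sklearn",   # or "mitie"
    "language": "en",
    "mitie_file": "data/total_word_feature_extractor.dat",
    "num_threads": 64,             # ~2x the 32 hardware threads, per the suggestion above
    "max_training_processes": 1,
}

with open("config_mitie.json", "w") as f:
    json.dump(config, f, indent=2)
```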
Not all aspects of MITIE are multi-threaded. See this issue opened by someone else using Rasa on the MITIE GitHub page and quoted here:
Some parts of MITIE aren't threaded. How much you benefit from the threading varies from task to task and dataset to dataset. Sometimes only 100% CPU utilization happens and that's normal.
Specifically regarding training data, I would recommend that you look at the evaluate tool recently introduced into the Rasa repo. It includes a confusion matrix that can help you identify trouble areas.
This may also let you switch to spaCy, use a portion of your 6000 examples as an evaluation set, and add examples back into the intents that aren't performing well.
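If you want a quick confusion matrix outside of Rasa's evaluate script, a rough sklearn-based sketch looks something like this. The toy examples and the predict_intent stub are placeholders; in practice you'd substitute your real (text, intent) pairs and the parse call of your trained pipeline:

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Toy stand-in data: in practice these would be your ~6000 (text, intent) examples.
examples = [
    ("book a table for two", "book_table"),
    ("reserve a table tonight", "book_table"),
    ("what's on the menu", "ask_menu"),
    ("show me the menu", "ask_menu"),
    ("hi there", "greet"),
    ("hello", "greet"),
] * 10  # repeated so the stratified split has enough rows per intent

texts, intents = map(list, zip(*examples))
train_x, eval_x, train_y, eval_y = train_test_split(
    texts, intents, test_size=0.2, stratify=intents, random_state=0
)

# Placeholder prediction step: substitute your trained pipeline's parse call.
def predict_intent(text):
    return "ask_menu" if "menu" in text else "greet"

predicted = [predict_intent(t) for t in eval_x]
labels = sorted(set(eval_y))
print(labels)
print(confusion_matrix(eval_y, predicted, labels=labels))
```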
I have more questions about where the 6000 examples came from, whether they're balanced, how different the intents are from each other, whether you've verified that the words from your training examples appear in the corpus you're using, etc., but I think the above is enough to get started.
It will be no surprise to the Rasa team that MITIE is taking forever to train; it would be more of a surprise that you can't get good accuracy out of another pipeline.
As a last resort, I would encourage you to open an issue on the Rasa NLU GitHub page and engage the team there for further support, or join the Gitter conversation.

AWS and Nielsen's Law

This isn't exactly a programming question but it's relevant to programmers, and I'm looking for a data-backed, specific answer.
I'm working in a field where the size of the files we create is doubling roughly every 100 days. Nielsen's Law suggests that connection speeds are increasing by about 50% every year, which would mean doubling every 600 days. There is some talk of doing our processing on AWS or some other cloud computing service.
To me, this seems implausible, since the time required to upload the data will soon dwarf the time for processing. However, Nielsen's Law (original article by Nielsen) was made for end user connection speeds, so I'm not sure I can make my point with that.
Does anyone know of a public resource on AWS connection speeds, or institutional (e.g. university or corporation, not residential) connection speeds, over time? I want to know whether institutional bandwidth is simply larger than residential but still increasing at the same rate, or whether, for some reason, connection speeds to institutional customers are increasing faster than Nielsen's Law predicts. Any help in finding evidence on this trend over time is appreciated.
I'd think just coming up with a nice graph plotting estimated data transfer time into the future ought to make your point.
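As a rough illustration of what that graph would show (the doubling periods come from your numbers; the starting data size and bandwidth are made-up placeholders):

```python
# Rough projection of transfer time under the growth rates given above:
# data volume doubling every 100 days, bandwidth doubling every ~600 days.
data_doubling_days = 100
bw_doubling_days = 600

initial_data_gb = 100.0   # assumed starting dataset size (GB)
initial_bw_gbps = 1.0     # assumed starting connection speed (gigabits/s)

for day in range(0, 1801, 300):
    data_gb = initial_data_gb * 2 ** (day / data_doubling_days)
    bw_gbps = initial_bw_gbps * 2 ** (day / bw_doubling_days)
    transfer_hours = (data_gb * 8) / bw_gbps / 3600  # GB -> gigabits -> seconds -> hours
    print(f"day {day:4d}: {data_gb:12.0f} GB, {bw_gbps:6.2f} Gbit/s, "
          f"transfer ~{transfer_hours:10.1f} h")
```

Because the data doubles six times faster than the bandwidth, transfer time itself keeps doubling roughly every 120 days, which is the crux of the argument.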
I can't speak for Nielsen's law, but your question in terms of AWS is nearly impossible to answer (in practical, "how should we spend this money" terms).
AWS's connection speeds vary depending on who you are, how much money you have to spend, and where you're located. For example, can you colocate a rack in Reston, VA? How about another datacenter provider used by Amazon? There are three per AZ. If you can, you can likely negotiate much more bandwidth between your rack and theirs than would be available normally. Are you a customer the size of Foursquare? Are you planning on running your jobs on 20,000 instances? I'm sure Amazon network engineering will help you squeeze out every drop of bandwidth you can, and probably write a white paper about it, too. There's rumor of dedicated, non-public-internet pipes between eu-west and us-east. The network map of today's cloud won't likely resemble 2017's at all.
This isn't to suggest your question is wrong or bad, just that it's difficult to answer. As a thought experiment, it's fascinating. As an argument in a discussion about long-term capacity planning and capital outlay, I'm not sure it's as useful.

What are the recommended HW specs for virtualization?

We are a startup company and haven't yet invested in HW resources to prepare our dev and testing environments. The suggestion is to buy a high-end server, install VMware ESX, and deploy multiple VMs for build, TFS, database, etc. for the testing, staging, and dev environments.
We are still not sure what specs to go with, e.g. RAM, whether a SAN is needed, HD, processor, etc.
Please advise.
You haven't really given much information to go on. It all depends on what type of applications you're developing, resource usage, need to configure different environments, etc.
Virtualization provides cost savings when you're looking to consolidate underutilized hardware. If each environment is sitting idle most of the time, then it makes sense to virtualize them.
However, if each of your build/TFS/testing/staging/dev environments will be heavily used by all developers simultaneously during the working day, there might not be as many cost savings from virtualizing everything.
My advice would be if you're not sure, then don't do it. You can always virtualize later and reuse the hardware.
Your hardware requirements will somewhat depend on what kind of reliability you want for this stuff. If you're using this to run everything, I'd recommend having at least two machines you split the VMs over, and if you're using N servers normally, you should be able to get by on N-1 of them for the time it takes your vendor to replace the bad parts.
At the low end, that's 2 servers. If you want higher reliability (i.e. less downtime), then a SAN of some kind to store the data on is going to be required (all the live migration setups I've seen are SAN-based). If you can live with the 'manual' method (power down both servers, move drives from server1 to server2, power up server2, reconfigure the VMs to use less memory, and start them up), then you don't really need to go the SAN route.
At the end of the day, your biggest sizing requirement will be HD and RAM. Your HD footprint will be relatively fixed (at least in most kinds of a dev/test environment), and your RAM footprint should be relatively fixed as well (though extra here is always nice). CPU is usually one thing you can skimp on a little bit if you have to, so long as you're willing to wait for builds and the like.
The other nice thing about going all virtualized is that you can start with a pair of big servers and grow out as your needs change. Need to give your dev environment more power? Get another server and split the VMs up. Need to simulate a 4-node cluster? Lower the memory usage of the existing node and spin up 3 copies.
At this point, unless I needed very high-end performance (i.e. I needed to consider clustering high-end physical servers), I'd go with a virtualized environment. With the virtualization extensions on modern CPUs and OS/hypervisor support for them, the performance hit is not that big if done correctly.
This is a very open ended question that really has a best answer of ... "It depends".
If you have the money to get individual machines for everything you need then go that route. You can scale back a little on the hardware with this option.
If you don't have the money to get individual machines, then you may want to look at a top-end server for this. If you go this route, I would look at a quad machine with at least 8GB of RAM and multiple NICs. You can go with a server box that has multiple hard drive bays so you can set up multiple RAID arrays; I recommend RAID 5 so that you have redundancy.
With something like this you can run multiple VMware sessions without much of a problem.
I set up a 10TB box at my last job. It had 2 NICs, 8GB of RAM, and was a quad machine. Everything included, it cost about $9.5K.
If you can't afford to buy the individual machines, then you're probably not in a good position to start usefully with virtualisation either.
One way you can do it is to take the minimum requirements for all your systems (i.e. TFS, mail, web, etc.) and add them all together; that gives you roughly half the minimum server you need to host all those systems. Double it and you'll be near what will get you by; if you have spare cash, double or triple the RAM, since most OSes run better with more RAM up to a particular ceiling. Think about buying expandable storage of some kind and aim for it to be half populated to start with, which keeps the initial cost per GB down and leaves room for expansion at lower cost in the future.
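As a trivial illustration of that rule of thumb (all the per-guest numbers here are made up; plug in your own minimums):

```python
# Back-of-the-envelope sizing: sum the minimum requirements of each guest,
# then double the total for the host, as suggested above.
guests = {
    "TFS":      {"ram_gb": 4, "disk_gb": 100},
    "build":    {"ram_gb": 4, "disk_gb": 80},
    "database": {"ram_gb": 8, "disk_gb": 200},
    "web":      {"ram_gb": 2, "disk_gb": 40},
    "mail":     {"ram_gb": 4, "disk_gb": 120},
}

total_ram = sum(g["ram_gb"] for g in guests.values())
total_disk = sum(g["disk_gb"] for g in guests.values())

print(f"sum of guest minimums: {total_ram} GB RAM, {total_disk} GB disk")
print(f"suggested host target: {total_ram * 2} GB RAM, {total_disk * 2} GB disk")
```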
You can also buy servers that take multiple CPUs but only populate the minimum number of sockets to start with. Also, go for as many cores per CPU as you can, for thermal, physical, and licensing efficiency.
I appreciate this is a very late reply, but as I didn't see many ESX answers here I wanted to post one, though my answer equally applies to Hyper-V etc.