I'm trying to use Google Cloud ML in GPU mode.
When I train with the BASIC_GPU scale tier, I get a lot of error logs.
However, training still seems to run fine.
I'm not sure whether training actually ran well on the GPU.
This is the error log history:
[screenshot of the error logs]
This is part of the printed config.log_device_placement output:
[screenshot of the device placement log]
I also tried training with the complex_model_m_gpu scale tier.
I get error logs similar to BASIC_GPU.
However, when I print config.log_device_placement I can only see /gpu:0; /gpu:1, /gpu:2, and /gpu:3 never appear.
The important thing is that BASIC_GPU and complex_model_m_gpu have the same running time.
I wonder whether training is actually working in GPU mode or whether something is wrong.
Sorry for my English. If anyone knows what the problem is, please help me.
Thank you.
Please refer to TensorFlow's performance guide, particularly the section on optimizing for GPUs, for tips on how to make the most of your GPUs.
A couple of things to note:
You can turn on logging of device placement to see which ops get assigned to which devices. This is a great way to check that ops are actually assigned to GPUs and that you are using all GPUs when you have multiple GPUs.
TensorBoard should also provide information about device placement so that is another way to check that you are using all GPUs.
When using multiple GPUs, you need to make sure you are assigning ops to all GPUs. The TensorFlow guide provides more information on this topic.
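For illustration, here is a minimal sketch (assuming TensorFlow 1.x, not the asker's actual training code) that turns on device-placement logging and explicitly pins one op to each of the four GPUs a complex_model_m_gpu machine exposes (matching the /gpu:0 through /gpu:3 devices mentioned in the question):

```python
import tensorflow as tf

# Pin one small op to each GPU; adjust the range to however many GPUs you have.
sums = []
for gpu_id in range(4):
    with tf.device('/gpu:%d' % gpu_id):
        a = tf.random_normal([1000, 1000])
        sums.append(tf.reduce_sum(tf.matmul(a, a)))

total = tf.add_n(sums)

config = tf.ConfigProto(
    log_device_placement=True,   # prints which device each op is assigned to
    allow_soft_placement=True,   # fall back to CPU if no GPU kernel exists
)
with tf.Session(config=config) as sess:
    # Writing the graph also lets TensorBoard display device placement.
    writer = tf.summary.FileWriter('/tmp/placement_demo', sess.graph)
    print(sess.run(total))
    writer.close()
```

If /gpu:1 through /gpu:3 never show up in this log for your own graph, your ops are only being placed on the first GPU, which would also explain why BASIC_GPU and complex_model_m_gpu run at the same speed.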
I'm planning to subscribe to Google Colab Pro to get more GPU memory for some research. But I was wondering: if I exhaust my 100 compute units in the first day due to continuous GPU usage, can I still use a GPU in Google Colab?
If anyone knows, or has already tried to use a GPU after reaching 0 compute units, is it still possible to use a GPU? Please kindly share your experience.
From what I understood reading here: Google Colab subscription
What happens if I don't have computing units? All users can access Colab resources based on availability. If you do not have computing units, you can only use Colab resources reserved for non-paying users.
So, if you finish your compute units you'll be downgraded to the free-user version of Colab. In that case, you can be assigned another GPU as a free user, but with the usual limitations.
Another useful link for understanding compute-unit usage a little better is the following: What exactly is a compute unit?
I'll be trying the paid version in a few days for at least a month to see if it is really worth it.
Hope it helps!
I never had this issue until recently, but now, when creating a VM, the option to add a GPU is always grayed out (not clickable).
Here is what it looks like: [screenshot of the disabled GPU option]. What is the cause of this?
This is not because there are no GPUs in my region; I checked a lot of them. I also don't think it's an issue with my account, since I CAN create GPU instances through the Marketplace.
It is probably because of the current machine type that has been set. You can only attach GPUs to general-purpose N1 machine types. GPUs are not supported for other machine types. Feel free to check this documentation for reference.
I'm training a model for recognizing short, one to three sentence strings of text using the MITIE back-end in Rasa. The model trains and works using spaCy, but it isn't quite as accurate as I'd like. Training on spaCy takes no more than five minutes, but training for MITIE ran for several days non-stop on my computer with 16GB of RAM. So I started training it on an Amazon EC2 r4.8xlarge instance with 255GB RAM and 32 threads, but it doesn't seem to be using all the resources available to it.
In the Rasa config file, I have num_threads: 32 and set max_training_processes: 1, which I thought would help use all the memory and computing power available. But now that it has been running for a few hours, CPU usage is sitting at 3% (100% usage but only on one thread), and memory usage stays around 25GB, one tenth of what it could be.
Do any of you have any experience with trying to accelerate MITIE training? My model has 175 intents and a total of 6000 intent examples. Is there something to tweak in the Rasa config files?
I am going to try to address this from several angles. First, specifically from the Rasa NLU angle, the docs say:
Training MITIE can be quite slow on datasets with more than a few intents.
and provide two alternatives:
Use the mitie_sklearn pipeline, which trains using sklearn (see the sketch after this list).
Use the MITIE fork where Tom B from Rasa has modified the code to run faster in most cases.
Given that you're only getting a single core used, I doubt this will have an impact, but Alan from Rasa has suggested that num_threads should be set to 2-3x your number of cores.
If you haven't evaluated both of those possibilities then you probably should.
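As a rough sketch only (Rasa NLU's module paths and config keys have changed across versions, so treat the exact names below as assumptions for a 0.12/0.13-style install), switching to the mitie_sklearn pipeline looks something like this:

```python
from rasa_nlu.training_data import load_data
from rasa_nlu.model import Trainer
from rasa_nlu import config

# nlu_config.yml would contain something like:
#   language: "en"
#   pipeline: "mitie_sklearn"
# plus the path to your MITIE total_word_feature_extractor.dat.
training_data = load_data("data/nlu_examples.json")   # your 6000 examples
trainer = Trainer(config.load("nlu_config.yml"))
trainer.train(training_data)
model_directory = trainer.persist("./models/")
print("Model saved to", model_directory)
```

The file names here are hypothetical; the point is only that the pipeline switch is a small config change, so it is cheap to evaluate.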
Not all aspects of MITIE are multi-threaded. See this issue opened by someone else using Rasa on the MITIE GitHub page and quoted here:
Some parts of MITIE aren't threaded. How much you benefit from the threading varies from task to task and dataset to dataset. Sometimes only 100% CPU utilization happens and that's normal.
Specifically regarding the training data, I would recommend that you look at the evaluate tool recently introduced into the Rasa repo. It includes a confusion matrix that could help identify trouble areas.
This may allow you to switch to spaCy, use a portion of your 6000 examples as an evaluation set, and add examples back in to the intents that aren't performing well.
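If you go that route, a hedged sketch (plain Python and scikit-learn, not a Rasa API) of carving out a stratified evaluation set from your examples might look like this; the file name and record layout are assumptions:

```python
import json
from sklearn.model_selection import train_test_split

# Hypothetical layout: a list of {"text": ..., "intent": ...} records.
with open("intent_examples.json") as f:
    examples = json.load(f)

texts = [ex["text"] for ex in examples]
intents = [ex["intent"] for ex in examples]

# Stratify on intent so each of the 175 intents appears in both splits.
train_texts, eval_texts, train_intents, eval_intents = train_test_split(
    texts, intents, test_size=0.2, random_state=42, stratify=intents
)
print(len(train_texts), "training examples,", len(eval_texts), "evaluation examples")
```

Per-intent accuracy on the held-out set then tells you which intents need more (or more distinctive) examples.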
I have more questions about where the 6000 examples came from, whether they're balanced, how different each intent is, whether you have verified that words from the training examples are in the corpus you are using, etc., but I think the above is enough to get started.
It will be no surprise to the Rasa team that MITIE is taking forever to train; it will be more of a surprise that you can't get good accuracy out of another pipeline.
As a last resort I would encourage you to open an issue on the Rasa NLU GitHub page and engage the team there for further support. Or join the Gitter conversation.
I am trying to run any of the services from the GATE web services in NeOn 2.3.
Even ANNIE, which runs so well in GATE, doesn't run; or rather, it keeps processing indefinitely, for something that should take no more than a couple of seconds. I run the wizard, set the input directory, leave the file pattern as the default, and set a folder and name for the output ontology. Shouldn't that be enough? Shouldn't I get something, even an error?
I think it's the service location that's giving me problems:
http://safekeeper1.dcs.shef.ac.uk/neon/services/sardine
http://safekeeper1.dcs.shef.ac.uk/neon/services/sprat
http://safekeeper1.dcs.shef.ac.uk/neon/services/annie
http://safekeeper1.dcs.shef.ac.uk/neon/services/termraider
How can I confirm this? Can I run it offline?
Can anyone give me a hand?
Also, I've seen SPRAT running on GATE, in "SPRAT: a tool for automatic semantic pattern-based ontology population".
Can anyone teach me how, and with what versions?
Thanks,
Celso Costa
I just downloaded and built the libraries/executables of Google Performance Tools. Before I run the CPU profiler on the application that I want to investigate, I want to learn how to use the tools properly, perhaps on a sample application. What would be a good example to run the Google CPU profiler on? Thanks in advance.
The following paragraph appears in the README.windows file distributed with perftools 1.3:
The heap-profiler has had a preliminary port to Windows. It has not been well tested, and probably does not work at all when Frame Pointer Optimization (FPO) is enabled -- that is, in release mode. The other features of perftools, such as the cpu-profiler and leak-checker, have not yet been ported to Windows at all.
In my experience, for performance tuning, stack-sampling is the method of choice.
Google perftools contains a stack-sampler, and I believe its visual analyzer can be made to show the cost of individual statements, not just functions.
What you need to know is the percent of time the stack contains that statement, because that is how much time would be saved if the statement were removed.
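To make the idea concrete, here is a toy sketch in Python (emphatically not gperftools, just an illustration of stack sampling): it periodically captures a thread's stack, counts how often each source line appears on it, and reports each line's share of samples.

```python
import collections
import sys
import threading
import time
import traceback


def sample_stacks(target_thread_id, counts, stats, stop_event, interval=0.005):
    """Sample the target thread's stack until stop_event is set."""
    while not stop_event.is_set():
        frame = sys._current_frames().get(target_thread_id)
        if frame is not None:
            stats["samples"] += 1
            for entry in traceback.extract_stack(frame):
                counts[(entry.filename, entry.lineno, entry.line)] += 1
        time.sleep(interval)


def busy_work():
    total = 0
    for i in range(5_000_000):
        total += i * i  # expect most samples to show this line on the stack
    return total


if __name__ == "__main__":
    counts = collections.Counter()
    stats = {"samples": 0}
    stop = threading.Event()
    sampler = threading.Thread(
        target=sample_stacks,
        args=(threading.main_thread().ident, counts, stats, stop),
        daemon=True,
    )
    sampler.start()
    busy_work()
    stop.set()
    sampler.join()

    for (filename, lineno, line), n in counts.most_common(5):
        share = 100.0 * n / max(stats["samples"], 1)
        print(f"{share:5.1f}%  {filename}:{lineno}  {line}")
```

A real profiler samples with far less overhead and at the machine-code level, but the interpretation is the same: a statement that appears on the stack in 60% of samples accounts for roughly 60% of the running time.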