Is there any way to ensure Caffe is using the GPU? I compiled Caffe after installing the CUDA driver and without the CPU_ONLY flag in CMake, and during configuration CMake logged that it had detected CUDA 8.0.
But while training a sample, I doubt it is using the GPU, judging by the nvidia-smi output. How can I make sure?
For future caffe wanderers scouring around, this finally did the trick for me:
caffe.set_mode_gpu()
caffe.set_device(0)
I did have solver_mode: GPU, and it would show the process on the GPU, but the 'GPU Memory Usage' shown by nvidia-smi was not enough to fit my model (so I knew something was wrong...)
The surest way I know is to properly configure the solver.prototxt file.
Include the line
solver_mode: GPU
If you have any specifications for the engine to use in each layer of your model, you'll also want to make sure they refer to GPU implementations (for example, CUDNN).
You can call Caffe::set_mode(Caffe::GPU); in your program explicitly.
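For example, a minimal C++ sketch (assuming the usual caffe headers and that GPU 0 is the device you want):

#include "caffe/caffe.hpp"

int main()
{
    caffe::Caffe::SetDevice(0);                 // select GPU 0
    caffe::Caffe::set_mode(caffe::Caffe::GPU);  // run all layers on the GPU
    // ... load your solver/net and train or run inference as usual ...
    return 0;
}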
To make sure the process is using the GPU, you can use the nvidia-smi command in Ubuntu to see which processes are using the GPU.
In my case, I use MTCNN to do face detection (implemented with Caffe):
I use the nvidia-smi command to show the processes that are using the GPU; if you want it refreshed at an interval, use watch nvidia-smi.
As the image below shows, the process mtcnn_c (which uses the Caffe backend) is using the GPU.
Related
I am trying to run the code in the PyTorch tutorial on the autograd module. However, when I run the .backward() call, I get the error:
cuda runtime error (38) : no CUDA-capable device is detected at torch/csrc/autograd/engine.cpp:359
I admittedly have no CUDA-capable device set up at the moment, but it was my understanding that this wasn't strictly necessary (at least I didn't find it specified anywhere in the tutorial). So I was wondering if there is a way to still run the code without a CUDA-enabled GPU.
You should transfer your network, inputs, and labels onto the CPU using: net.cpu(), Variable(inputs.cpu()), Variable(labels.cpu())
I want to use the second GPU as a dedicated device under Linux, in order to benchmark a kernel.
The kernel I am testing is a SIMD compute kernel without reductions, and no X server is attached to the GPU. The device is a GeForce GTX 480, so I suppose the compute capability is 2.0; therefore, advanced features such as dynamic parallelism are unavailable.
Using the nvidia-smi utility, there are various modes the GPU can be set to:
"Default" means multiple contexts are allowed per device.
"Exclusive Process" means only one context is allowed per device, usable from multiple threads at a time.
"Prohibited" means no contexts are allowed per device (no compute apps).
Which is the best mode to set for the GPU in order to obtain benchmark results that are as faithful as possible?
What command should I use to make such a setting permanent?
I am compiling the kernel using the following flags:
nvcc --ptxas-options=-v -O3 -w -arch=sm_20 -use_fast_math -c -o
Is there a better combination of flags to get more help from the compiler and faster execution times?
Any suggestion will be greatly appreciated.
My question is about what is more appropriate: setting the GPU to a compute-exclusive mode or not.
It should not matter whether you set the GPU to exclusive-process or Default, as long as there is only one process attempting to use that GPU.
You generally would not want to use exclusive-thread except in specific situations, because exclusive-thread could prevent multi-threaded GPU apps from running correctly, and may also interfere with other functions such as profiler functions.
What command should I use to make such a setting permanent?
If you refer to the nvidia-smi command line help (nvidia-smi --help) or the nvidia-smi man page (man nvidia-smi), you can determine the command to make the change. Any changes you make will be permanent until they are explicitly changed again.
I wrote a simple application that checks if NVIDIA CUDA is available on the computer. It simply displays true if a CUDA-capable device is found.
I sent the app to a second PC, and the application didn't run - a dialog box showed up saying that cudart.dll was not found. I want to check whether CUDA is present, but that check itself requires CUDA :)
I am using CUDA 5.0, VS2012, VC++11, Windows 7.
Can I compile the application in a way, that all CUDA libraries are inside the executable?
So the scenario is:
1. My app is compiled & sent to a computer
2. The computer can:
2.1. be running Windows or Linux (my app is compatible with the system)
2.2. have a GPU or not
2.3. have an NVIDIA GPU or not
2.4. have CUDA installed or not
3. My app should return true only if 2.3 and 2.4 are positive (a GPU with CUDA)
As an opening comment, I think the order and number of steps in your edit are incorrect. It should be:
1. The program starts and attempts to load the runtime API library.
2. If the runtime library is present, attempt to use it to enumerate devices.
If step 1 fails, you do not have the necessary runtime support and CUDA cannot be used. If step 2 fails, there is no compatible driver and GPU present in the system and CUDA cannot be used. If both steps pass, you are good to go.
In step 1 you want to use something like dlopen on Linux and handle the return status. On Windows, you probably want to use the DLL delay-loading mechanism (sorry, not a Windows programmer, I can't tell you more than that).
In both cases, if the library loads, then fetch the address of cudaGetDeviceCount via the appropriate host OS API and call it. That tells you whether there are compatible GPUs which can be enumerated. What you do after you find an apparently usable GPU is up to you. I would check for compute status and try establishing a context on it. That will ensure that a fully functional runtime/driver combination is present and everything works.
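For illustration, a minimal Linux-only sketch of those two steps might look like the following; it assumes the runtime library is visible as libcudart.so, and real code would want more careful error handling:

#include <dlfcn.h>
#include <cstdio>

typedef int (*cudaGetDeviceCount_t)(int *);

int main()
{
    // Step 1: try to load the CUDA runtime library at run time.
    void *cudart = dlopen("libcudart.so", RTLD_NOW);
    if (!cudart) {
        std::printf("no CUDA runtime support\n");
        return 1;
    }

    // Step 2: fetch cudaGetDeviceCount and try to enumerate devices.
    // It returns 0 (cudaSuccess) only if a compatible driver and GPU
    // are present; otherwise it reports an error code.
    cudaGetDeviceCount_t getCount =
        (cudaGetDeviceCount_t) dlsym(cudart, "cudaGetDeviceCount");
    int count = 0;
    if (!getCount || getCount(&count) != 0 || count == 0) {
        std::printf("no compatible driver/GPU found\n");
        dlclose(cudart);
        return 1;
    }

    std::printf("%d CUDA device(s) available\n", count);
    dlclose(cudart);
    return 0;
}

Compile with -ldl; on Windows the dlopen/dlsym pair would be replaced by LoadLibrary/GetProcAddress or the delay-load mechanism mentioned above.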
Linking to a different post on Stack Overflow: detecting-nvidia-gpus-without-cuda
This shows the whole sequence for checking whether the CUDA API is available and accessible.
I think that, using software alone, there is no reliable way to determine whether a GPU is CUDA-capable, especially if we consider that CUDA is a driver-based technology: as far as the OS is concerned, CUDA doesn't exist if the driver says it doesn't exist.
I think the best way to do this is the old-fashioned way: consider checking this simple web page and you will get a much more reliable answer.
Create a plugin for your application that dynamically links to the relevant CUDA libraries and performs the check.
Then try loading the plugin and running its check.
If the plugin fails to load, then you don't have the CUDA libraries installed, so you can assume false.
If the plugin loads successfully, then you have the CUDA libraries installed and can check whether the hardware supports CUDA as well.
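A rough Linux-only sketch of the host side of that idea (the plugin name cuda_check_plugin.so and its exported has_cuda_device function are made up for illustration; the plugin itself would link against cudart and call cudaGetDeviceCount):

#include <dlfcn.h>

bool cudaAvailable()
{
    // The plugin links against the CUDA libraries, so dlopen fails
    // here if those libraries are not installed on the machine.
    void *plugin = dlopen("./cuda_check_plugin.so", RTLD_NOW);
    if (!plugin)
        return false;

    // The plugin exports (with extern "C") a check that queries the hardware.
    typedef bool (*check_fn)();
    check_fn hasCudaDevice = (check_fn) dlsym(plugin, "has_cuda_device");
    bool ok = (hasCudaDevice != nullptr) && hasCudaDevice();

    dlclose(plugin);
    return ok;
}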
As a late additional answer:
I am struggling with the same problem (detecting a CUDA installation without using it), and my solution so far is:
ensuring LoadLibraryA("nvcuda.dll") != nullptr (though this pretty much only tells you that there is an NVIDIA card installed)
checking for the environment variable CUDA_PATH (or in my case, CUDA_PATH_V8_0), since that seems to be set by the CUDA installation: const char * szCuda8Path = std::getenv("CUDA_PATH_V8_0"); (must be != nullptr)
Use cudaGetDeviceCount() to know if the computer is CUDA-capable.
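For example, a sketch of that check (this assumes the runtime library itself can be loaded, which, as discussed above, is the part that fails on machines without CUDA installed):

#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);

    // err is non-zero (e.g. cudaErrorNoDevice or cudaErrorInsufficientDriver)
    // when no usable GPU or driver is present, and count stays 0.
    bool cudaCapable = (err == cudaSuccess && count > 0);
    std::printf(cudaCapable ? "true\n" : "false\n");
    return 0;
}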
According to this thread, you cannot statically link cudart.dll.
There are workarounds: embed the CUDA runtime as a resource in your executable, then extract it when your program runs, then dynamically link.
You can also use nvidia-smi to see if CUDA is installed on a machine.
DISCLAIMER:
I see that some suggestions for the exact same question come up; however, that (similar) post was migrated to Super User and seems to have been removed. I would still like to post my question here because I consider it software/programming related enough not to post on Super User (the line between what is a software issue and what is a hardware issue is sometimes vague).
I am running a very simple OpenGL program in Code::Blocks in VirtualBox with Ubuntu 11.10 installed on an SSD. Whenever I build & run a program I get these errors:
OpenGL Warning: XGetVisualInfo returned 0 visuals for 0x232dbe0
OpenGL Warning: Retry with 0x802 returned 0 visuals
Segmentation fault
From what I have gathered myself so far, this is VirtualBox related. I need to set
LIBGL_ALWAYS_INDIRECT=1
In other words, enabling indirect rendering via X.org rather than communicating directly with the hardware. This issue is probably not related to the fact that I have an ATI card, as I have a laptop with an ATI card that runs the same program flawlessly.
Still, I don't dare to say that the fact that my GPU is an ATI doesn't play any role at all. Nor am I sure if the drivers are correctly installed (it says under System info -> Graphics -> graphics driver: Chromium.)
Any help on HOW to set LIBGL_ALWAYS_INDIRECT=1 would be greatly appreciated. I simply lack the knowledge of where to put this command or where/how to execute it in the terminal.
Sources:
https://forums.virtualbox.org/viewtopic.php?f=3&t=30964
https://www.virtualbox.org/ticket/6848
EDIT: in the terminal, type:
export LIBGL_ALWAYS_INDIRECT=1
To verify that direct rendering is off:
glxinfo | grep direct
However, the problem persists. I still get mentioned OpenGL warnings and the segmentation fault.
I ran into this same problem running the Bullet Physics OpenGL demos on Ubuntu 12.04 inside VirtualBox. Rather than using indirect rendering, I was able to solve the problem by modifying the glut window creation code in my source as described here: https://groups.google.com/forum/?fromgroups=#!topic/comp.graphics.api.opengl/Oecgo2Fc9Zc.
This entailed replacing the original
...
glutCreateWindow(title);
...
with
...
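// glutGet(GLUT_DISPLAY_MODE_POSSIBLE) reports whether the requested
// display mode can actually be provided; bail out if it cannot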
if (!glutGet(GLUT_DISPLAY_MODE_POSSIBLE))
{
exit(1);
}
glutCreateWindow(title);
...
as described in the link. It's not clear to me why this should correct the segfault issue; apparently glutGet has some side effects beyond retrieving state values. It could be a quirk of freeglut's implementation of glut.
If you look at the /etc/environment file, you can see a couple of variables exposed there - this will give you an idea of how to expose that environment variable across the entire system. You could also try putting it in either ~/.profile or ~/.bash_profile, depending on your needs.
The real question in my mind is: did you install the Guest Additions for Ubuntu? You shouldn't need to install any ATI drivers in your guest, as VirtualBox won't expose the actual physical graphics hardware to your VM. You can configure your guest to support 3D acceleration in the virtual machine settings (make sure you turn off the VM first) under the Display section. You will probably want to boost the allocated video memory - 64MB or 128MB should be plenty, depending on your needs.
I've got an NVIDIA Tesla S2050 and a host with an NVIDIA Quadro card, running CentOS 5.5 with CUDA 3.1.
When I run a CUDA app, I want to use the 4 Tesla C2050 devices, but not the Quadro in the host, so that splitting the job equally five ways doesn't drag down the whole performance. Is there any way to implement this?
I'm assuming you have four processes and four devices, although your question suggests you have five processes and four devices, which means that manual scheduling may be preferable (with the Tesla devices in "shared" mode).
The easiest is to use nvidia-smi to specify that the Quadro device is "compute prohibited". You would also specify that the Teslas are "compute exclusive" meaning only one context can attach to each of these at any given time.
Run man nvidia-smi for more information.
Yes. Check CUDA Support/Choosing a GPU
Problem
Running your code on a machine with multiple GPUs may result in your code executing on an older and slower GPU.
Solution
If you know the device number of the GPU you want to use, call cudaSetDevice(N). For a more robust solution, include the code shown below at the beginning of your program to automatically select the best GPU on any machine.
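The code from that page is not reproduced here, but a rough sketch of such an automatic selection (this one simply picks the device with the most multiprocessors, which is only a heuristic) could look like:

#include <cuda_runtime.h>

// Select the CUDA device with the most multiprocessors and make it current.
void selectBestGpu()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0)
        return;  // no usable CUDA device; fall back to default behaviour

    int best = 0;
    int bestSMs = -1;
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, d) == cudaSuccess &&
            prop.multiProcessorCount > bestSMs) {
            best = d;
            bestSMs = prop.multiProcessorCount;
        }
    }
    cudaSetDevice(best);
}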
Check their website for further explanation.
You may also find this post very interesting.