Pytorch .backward() method without CUDA - gradient

I am trying to run the code in the Pytorch tutorial on the autograd module. However, when I run the .backwards() call, I get the error:
cuda runtime error (38) : no CUDA-capable device is detected at torch/csrc/autograd/engine.cpp:359
I admittedly have no CUDA-capable device set up at the moment, but it was my understanding that this wasn't strictly necessary (at least I didn't find it specified anywhere in the tutorial). So I was wondering if there is a way to still run the code without a CUDA-enabled GPU.

You should transfer you network, inputs, and labels onto the cpu using: net.cpu(), Variable(inputs.cpu()), Variable(labels.cpu())

Related

Why isn't my colab notebook using the GPU?

When I run code on my colab notebook after having selected the GPU, I get a message saying "You are connected to a GPU runtime, but not utilizing the GPU". Now I understand similar questions have been asked before, but I still don't understand why. I am running PCA on a dataset over hundreds of iterations, for multiple trials. Without a GPU it takes about as long as it does on my laptop, which can be >12 hours, resulting in a time out on colab. Is colab's GPU restricted to machine learning libraries like tensorflow only? Is there a way around this so I can take advantage of the GPU to speed up my analysis?
Colab is not restricted to Tensorflow only.
Colab offers three kinds of runtimes: a standard runtime (with a CPU), a GPU runtime (which includes a GPU) and a TPU runtime (which includes a TPU).
"You are connected to a GPU runtime, but not utilizing the GPU" indicates that the user is conneted to a GPU runtime, but not utilizing the GPU, and so a less costly CPU runtime would be more suitable.
Therefore, you have to use a package that utilizes the GPU, such as Tensorflow or Jax. GPU runtimes also have a CPU, and unless you are specifically using packages that exercise the GPU, it will sit idle.

How can I make ensure caffe using GPU?

Is there any way to ensure caffe using GPU? I was compiled caffe after installing CUDA driver and without CPU_ONLY flag in cmake and while compiling cmake logged detection of CUDA 8.0.
But while train a sample, I doubt it using GPU according nvidia-smi result. How can I ensure?
For future caffe wanderers scouring around, this finally did the trick for me:
caffe.set_mode_gpu()
caffe.set_device(0)
I did have solver_mode: GPU, and it would show the process on the gpu, but the 'GPU Memory Usage' as seen using nvidia-smi was not enough to fit my model (so I knew something was wrong...)
The surest way I know is to properly configure the solver.prototxt file.
Include the line
solver_mode: GPU
If you have any specifications for the engine to use in each layer of your model, you'll want to also make sure they refer to GPU software.
You can use Caffe::set_mode(Caffe::GPU); in you program explicitly.
To make sure the process is using GPU, you can use nvidia-smi command in ubuntu to which process use GPU.
As to me, I use MTCNN to do face detection(implement by caffe):
I use nvidia-smi command to show processes who use GPU, if you want to see it by interval use watch nvidia-smi.
As below image, we can see that the process mtcnn_c(use caffe backend) is using GPU.

DirectX11 Desktop duplication not working with NVIDIA

I'm trying too use DirectX desktop duplication API.
I tried running exmaples from
http://www.codeproject.com/Tips/1116253/Desktop-Screen-Capture-on-Windows-via-Windows-Desk
And from
https://code.msdn.microsoft.com/windowsdesktop/Desktop-Duplication-Sample-da4c696a
Both of these are examples of screen capture using DXGI.
I have NVIDIA GeForce GTX 1060 with Windows 10 Pro on the machine. It has Intelâ„¢ Core i7-6700HQ processor.
These examples work perfectly fine when NVIDIA Control Panel > 3D Settings is selected to Auto select processor.
However if I set the setting manually to NVIDIA Graphics Card the samples stop working.
Error occurs at the following line.
//IDXGIOutput1* DxgiOutput1
hr = DxgiOutput1->DuplicateOutput(m_Device, &m_DeskDupl);
Error in hr(HRESULT) is DXGI_ERROR_UNSUPPORTED 0x887A0004
I'm new to DirectX and I don't know the issue here, is DirectX desktop duplication not supported on NVIDIA ?
If that's the case then is there a way to select a particular processor at the start of program so that program can run with any settings ?
#Edit
After looking around I asked the developer (Evgeny Pereguda) of the second sample project on codeproject.com
Here's a link to the discussion
https://www.codeproject.com/Tips/1116253/Desktop-Screen-Capture-on-Windows-via-Windows-Desk?msg=5319978#xx5319978xx
Posting the screenshot of the discussion on codeproject.com in case original link goes down
I also found an answer on stackoverflow which unequivocally suggested that it could not be done with the desktop duplication API referring to support ticket at microsoft's support site https://support.microsoft.com/en-us/help/3019314/error-generated-when-desktop-duplication-api-capable-application-is-ru
Quote from the ticket
This issue occurs because the DDA does not support being run against
the discrete GPU on a Microsoft Hybrid system. By design, the call
fails together with error code DXGI_ERROR_UNSUPPORTED in such a
scenario.
However there are some applications which are efficiently duplicating desktop on windows in both modes (integrated graphics and discrete) on my machine. (https://www.youtube.com/watch?v=bjE6qXd6Itw)
I have looked into the installation folder of the Virtual Desktop on my machine and can see following DLLs of interest
SharpDX.D3DCompiler.dll
SharpDX.Direct2D1.dll
SharpDX.Direct3D10.dll
SharpDX.Direct3D11.dll
SharpDX.Direct3D9.dll
SharpDX.dll
SharpDX.DXGI.dll
SharpDX.Mathematics.dll
Its probably an indication that this application is using DXGI to duplicate desktop, or may be the application is capable of selecting a specific processor before it starts.
Anyway the question remains, is there any other efficient method of duplicating desktop in both modes?
The likely cause is certain internal limitation for Desktop Duplication API, described in Error generated when Desktop Duplication API-capable application is run against discrete GPU:
... when the application tries to duplicate the desktop image against the discrete GPU on a Microsoft Hybrid system, the application may not run correctly, or it may generate one of the following errors:
Failed to create windows swapchain with 0x80070005
CDesktopCaptureDWM: IDXGIOutput1::DuplicateOutput failed: 0x887a0004
The article does not suggest any other workaround except use of a different GPU (without more specific detail as for whether it is at all achievable programmatically):
To work around this issue, run the application on the integrated GPU instead of on the discrete GPU on a Microsoft Hybrid system.
Microsoft introduced a registry value that can be set programmatically to control which GPU an application runs on. Full answer here.

How can I run a code directly into a processor with a File System?

I have a simple anisotropic filter c/c++ code that will process an .pgm image which is an text file with greyscale information for each pixel, and after done processing, it will generate an output image with the filter applied.
This program takes up to some seconds in order for it to do about 10 iterations on a x86 CPU running windows.
Me and an academic finishing his master degree on applied computing, we need to run the code under FPGA (Altera DE2-115) to see if there is considerable results of performance gain when running the code directly on the processor (NIOS 2).
We have successfully booted up the S.O uClinux under the FPGA, but there are some errors with device hardware, and by that we can't access SD-Card not even Ethernet, so we can't get the code and image into the FPGA in order to test its performance.
So I am here asking to an alternative way to test our code performance directly into an CPU with a file system so the code can read the image and generate another one.
The alternative can be either with an product that has low cost and easy to use (I was thinking raspberry PI), or either if I could upload the code somewhere that runs automatically for me and give me the reports.
Thanks in advance.
what you're trying to do is benchmarking some software on a multi GHz x86 Processor vs. a soft-core processor running 50MHz? (as much as I can tell from Altera docs)
I can guarantee that it will be even slower on the FPGA! Since it is also running an OS (even embedded Linux) it also has threading overhead and what not. This can not be considered running it "directly" on CPU (whatever you mean by this)
If you really want to leverage the performance of an FPGA you should "convert" your C-Code into a HDL and run it directly in hardware. Accessing the data should be possible. I don't know how it's done with an Altera board but Xilinx has some libraries accessing data from a SD card with FAT.
You can use on board SRAM or DDR2 RAM to run OS and your application.
Hardware design in your FPGA must have memory controller in it. In SOPC or Qsys select external memory as reset vector and compile design.
Then open NioSII build tools for Eclipse.
In Eclipse create new project by selecting NiosII Application and BSP project.
Once the project is created, go to BSP properties and type offset of external memory in the linker tab and generate BSP.
Compile project and Run as Nios II hardware.
This will run you application on through external memory.
You wont be able to see the image but 2-D array representing image in memory can be
printed on console.

Program to check CUDA presence needs CUDA?

I wrote a simple application that checks if NVIDIA CUDA is available on the computer. It simply displays true if a CUDA-capable device is found.
I send the app to a second PC, and the application didn't run - a dialog box showed up that cudart.dll was not found. I want to check if CUDA is present and it requires CUDA to do that :)
I am using CUDA 5.0, VS2012, VC++11, Windows 7.
Can I compile the application in a way, that all CUDA libraries are inside the executable?
So the scenario is:
My app is compiled & sent to a computer
The computer can:
be running windows, linux (my app is compatible with the system)
have a gpu or not
have an nvidia gpu or not
have CUDA installed or not
My app should return true only if 2.3 and 2.4 are positive (GPU with CUDA)
As an opening comment, I think the order and number of steps in your edit is incorrect. It should be:
Programs starts and attempts to load the runtime API library
If the runtime library is present, attempt to use it to enumerate devices.
If step 1 fails, you do not have the necessary runtime support, and CUDA cannot be used. If 2 fails, there is not a compatible driver and GPU present in the system and CUDA cannot be used. If they both pass, you are good to go.
In step 1 you want to use something like dlopen on Linux and handle the return status. On Windows, you probably want to use the DLL delay loading mechanism (Sorry, not a Windows programmer, can't tell you more than that).
In both cases, if the library loads, then fetch the address of cudaGetDeviceCount via the appropriate host OS API and call it. That tells you whether there are compatible GPUs which can be enumerated. What you do after you find an apparently usable GPU is up to you. I would check for compute status and try establishing a context on it. That will ensure that a fully functional runtime/driver combination is present and everything works.
Linking to a different post on stackoverflow: detecting-nvidia-gpus-without-cuda
This shows the whole sequence to check if the cuda api is available and accessible.
I think that using only the software there is no reliable way to ensure that a GPU is Cuda-capable or not, especially if we consider that Cuda is a driver-based technology and for the OS Cuda doesn't exist if the driver says that Cuda doesn't exist.
I think that the best way to do this is the old fashion way, consider checking this simple web page and you will get a much more reliable answer.
create a plugin for your application that dynamically links to the relevant CUDA-libraries and performs the check.
then try loading the plugin and run it's check.
if the plugin fails to load, then you don't have the CUDA-libraries installed, so you can assume False
if the plugin succeeds to load, then you have CUDA-libs installed and can perform the check, whether the hardware supports CUDA as well.
As a late andditional answer:
I am struggling with the same problem (detecting cuda installation without using it) and my solution so far is
ensuring LoadLibraryA("nvcuda.dll") != nullptr (tells you pretty much only if there is an nvidia card installed, though)
checking for environment variable CUDA_PATH (or in my case, CUDA_PATH_V8_0), since that seems to be set by the cuda installation: const char * szCuda8Path = std::getenv("CUDA_PATH_V8_0"); (must be != nullptr)
Use cudaGetDeviceCount() to know if the computer is CUDA-capable.
According to this thread, you cannot statically link cudart.dll.
There are workarounds: embed the CUDA runtime as a resource in your executable, then extract it when your program runs, then dynamically link.
You can also use nvidia-smi to see if CUDA is installed on a machine.