-1001 error in OpenCL with Nvidia card in Ubuntu Linux - c++

I am trying to run this OpenCL Example in Ubuntu 10.04.
My graphics card is an NVIDIA GeForce GTX 480. I have installed the latest NVIDIA driver and CUDA toolkit manually.
The program compiles without any errors. Thus linking with libOpenCL works. The application also runs but the output is very strange (mostly zeros and some random numbers). Debugging shows that
clGetPlatformIDs(1, &platform_id, &ret_num_platforms);
returns -1001.
Google and Stack Overflow told me that the reason may be a missing nvidia.icd in /etc/OpenCL/vendors. It was not there, so I created /etc/OpenCL/vendors/nvidia.icd with the following line:
libnvidia-opencl.so.1
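Since that file just names a shared library for the ICD loader to dlopen, a quick sanity check (a minimal sketch, nothing vendor-specific) is to try loading that library directly:
// icd_check.cpp -- verify the library named in nvidia.icd is loadable.
// Build: g++ icd_check.cpp -o icd_check -ldl
#include <dlfcn.h>
#include <cstdio>

int main()
{
    // Same name as the single line in /etc/OpenCL/vendors/nvidia.icd.
    void* h = dlopen("libnvidia-opencl.so.1", RTLD_NOW);
    if (!h) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    std::puts("libnvidia-opencl.so.1 loaded OK");
    dlclose(h);
    return 0;
}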
I have also tried some variants (absolute paths, etc.), but nothing solved the problem. Right now I have no idea what else I can try. Any suggestions?
EDIT: I have installed the Intel OpenCL SDK and copied its ICD into /etc/OpenCL/vendors, and the application now works fine for
clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_DEFAULT, 1, &device_id, &ret_num_devices);
For
clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU, 1, &device_id, &ret_num_devices);
I get the error -1.
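(Error -1 is CL_DEVICE_NOT_FOUND, which is consistent with the Intel platform exposing a CPU but no GPU. A small enumeration loop makes it easy to see which platforms and devices are actually visible; a minimal sketch:)
// enumerate.cpp -- list every visible OpenCL platform and its GPU count.
// Build: g++ enumerate.cpp -o enumerate -lOpenCL
#include <CL/cl.h>
#include <cstdio>

int main()
{
    cl_uint np = 0;
    if (clGetPlatformIDs(0, NULL, &np) != CL_SUCCESS || np == 0) {
        std::fprintf(stderr, "no OpenCL platforms visible\n");
        return 1;
    }
    cl_platform_id platforms[16];
    clGetPlatformIDs(np > 16 ? 16 : np, platforms, NULL);
    for (cl_uint i = 0; i < np && i < 16; ++i) {
        char name[256] = {0};
        clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME, sizeof(name), name, NULL);
        cl_uint nd = 0;
        cl_int err = clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_GPU, 0, NULL, &nd);
        std::printf("platform %u: %s, GPU devices: %u (err %d)\n", i, name, nd, err);
    }
    return 0;
}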
EDIT:
I have noticed one thing in the console when executing the application. After execution of line
cl_int ret = clGetPlatformIDs(1, &platform_id, &ret_num_platforms);
the application gives me the output
modprobe: ERROR: ../libkmod/libkmod-module.c:809 kmod_module_insert_module() could not find module by name='nvidia_331_uvm'
modprobe: ERROR: could not insert 'nvidia_331_uvm': Function not implemented
There seems to be a conflict with an older driver version since I am using 340.
cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 340.32 Tue Aug 5 20:58:26 PDT 2014
Maybe I should try removing Ubuntu's own NVIDIA drivers once more and reinstalling the latest driver manually?
EDIT:
The old driver was the problem. Somehow it wasn't removed properly, so I removed it once more with
apt-get remove nvidia-331 nvidia-opencl-icd-331 nvidia-libopencl1-331
and now it works. I hope this helps someone who has similar problems.

The above-mentioned problems occurred due to a driver conflict. If you have a similar problem, read the edits above for the solution.

Related

Linux BTF: bpftool: Failed to get EHDR from /sys/kernel/btf/vmlinux

I am trying to get started with BPF CO-RE development.
Using Ubuntu 20.04 LTS in a VM, I needed to recompile the kernel and install pahole (from apt install dwarves) so that BTF is enabled (I set CONFIG_DEBUG_FS=y and CONFIG_DEBUG_INFO_BTF=y).
So my setup is:
Ubuntu 20.04
Kernel 5.4.0-90-generic
bpftool --version: /usr/lib/linux-tools/5.4.0-90-generic/bpftool v5.4.148
/sys/kernel/btf/vmlinux exists and can be read out with cat.
But bpftool shows the following error:
$ sudo bpftool btf dump file /sys/kernel/btf/vmlinux format c
libbpf: failed to get EHDR from /sys/kernel/btf/vmlinux
Error: failed to load BTF from /sys/kernel/btf/vmlinux: Unknown error -4001
From https://github.com/libbpf/libbpf/blob/master/src/libbpf.h
it looks like this is LIBBPF_ERRNO__FORMAT (/* BPF object format invalid */), but I cannot figure out what's wrong.
Does anybody know where the mistake might be?
Thanks in advance!
EDIT: Added bpftool version
You need to update bpftool so that it supports falling back to reading BTF as raw data when the input file is not an ELF object. The minimum bpftool version required is v5.5, as that's the Linux release where the patch landed. In general, I would recommend always using the latest bpftool version, as there are no backports.
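For the curious, that fallback essentially does what libbpf exposes as raw BTF parsing. A minimal sketch (assuming libbpf >= 1.0, where parse failures return NULL and set errno):
// btf_raw.cpp -- parse /sys/kernel/btf/vmlinux as raw (non-ELF) BTF.
// Build: g++ btf_raw.cpp -o btf_raw -lbpf
#include <bpf/btf.h>
#include <cstdio>

int main()
{
    struct btf *btf = btf__parse_raw("/sys/kernel/btf/vmlinux");
    if (!btf) {
        std::perror("btf__parse_raw");
        return 1;
    }
    std::printf("parsed %u BTF types\n", btf__type_cnt(btf));
    btf__free(btf);
    return 0;
}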
Update:
It looks like my version of bpftool only accepts an ELF file containing the compiled running kernel, but my /sys/kernel/btf/vmlinux is not one:
$ file /sys/kernel/btf/vmlinux
/sys/kernel/btf/vmlinux: data
Same for /boot/vmlinuz:
$ sudo file /boot/vmlinuz-5.4.0-90-generic
/boot/vmlinuz-5.4.0-90-generic: Linux kernel x86 boot executable bzImage, version 5.4.0-90-generic (root@elde-dev) #101+test1 SMP Tue Nov 23 16:38:41 UTC 2021, RO-rootFS, swap_dev 0xD, Normal VGA
Does anybody know why my /sys/kernel/btf/vmlinux does not show the right format?
I found this workaround:
Using this script (https://elixir.bootlin.com/linux/latest/source/scripts/extract-vmlinux), as suggested here (https://unix.stackexchange.com/questions/610672/where-is-the-linux-kernel-elf-file-located), I could get a "working" vmlinux file which bpftool could then read. But this can't really be the right way to do BPF CO-RE, I guess... Also, in all the tutorials, bpftool is used directly on /sys/kernel/btf/vmlinux.
So why do I get the wrong format?
EDIT: As suggested above, just download the newest Linux kernel source, compile bpftool from there, and use that.

OpenGL program with tensorflow C++ gives failed call to cuInit : CUDA_ERROR_OUT_OF_MEMORY

I have trained a model with no issues using TensorFlow in Python. I am now trying to integrate inference for this model into pre-existing OpenGL-enabled software. However, I get a CUDA_ERROR_OUT_OF_MEMORY during cuInit (that is, even earlier than model loading, right at session creation). It does seem that OpenGL has taken some memory (around 300 MiB), as shown by gpustat or nvidia-smi.
Is it possible there is a clash because both TF and OpenGL are trying to allocate GPU memory? Has anyone encountered this problem before? Most references I found while googling concern model loading time, not session/CUDA initialization. Is this completely unrelated to OpenGL, and am I just barking up the wrong tree? A simple TF C++ inference example works. Any help is appreciated.
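(Not what the eventual fix turned out to be, but for anyone who does hit a genuine memory clash: the TF C++ API can be told at session creation not to grab nearly all GPU memory up front. A minimal sketch, where make_session is a hypothetical helper:)
// Sketch: session options that allocate GPU memory on demand.
#include <memory>
#include "tensorflow/core/public/session.h"

std::unique_ptr<tensorflow::Session> make_session()  // hypothetical helper
{
    tensorflow::SessionOptions options;
    auto* gpu = options.config.mutable_gpu_options();
    gpu->set_allow_growth(true);                    // grow as needed
    gpu->set_per_process_gpu_memory_fraction(0.5);  // or hard-cap at 50%
    return std::unique_ptr<tensorflow::Session>(
        tensorflow::NewSession(options));
}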
Here is the tensorflow logging output, for completeness:
2018-01-08 12:11:38.321136: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-01-08 12:11:38.379100: E tensorflow/stream_executor/cuda/cuda_driver.cc:406] failed call to cuInit: CUDA_ERROR_OUT_OF_MEMORY
2018-01-08 12:11:38.379388: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: rosenblatt
2018-01-08 12:11:38.379413: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: rosenblatt
2018-01-08 12:11:38.379508: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 384.98.0
2018-01-08 12:11:38.380425: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:369] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 384.98 Thu Oct 26 15:16:01 PDT 2017 GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.5)"""
2018-01-08 12:11:38.380481: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 384.98.0
2018-01-08 12:11:38.380497: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:300] kernel version seems to match DSO: 384.98.0
EDIT: Removing all references to OpenGL resulted in the same problem, so it has nothing to do with a clash between the libraries.
OK, the problem was the use of the sanitizer in the debug version of the binary. The release version, or the debug version with no sanitizer, works as expected.
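(If the sanitized build is still needed: AddressSanitizer's huge shadow-memory reservation is a known cause of cuInit failing this way, and running the binary with the ASan runtime flag ASAN_OPTIONS=protect_shadow_gap=0 is a commonly suggested workaround.)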

OpenCV 3.0 install WITH_OPENCL_SVM=ON on Machine with OpenCL 1.1

So I installed OpenCV with WITH_OPENCL_SVM=ON, thinking that I would eventually get a GPU with OpenCL 2.0 (currently I only have 1.1). However, now when I try to run any program with OpenCV I get an
Error on line 2629 (ocl.cpp): CL_DEVICE_SVM_CAPABILITIES via clGetDeviceInfo failed: -30
I believe this happens when I create a cv::UMat. The line of code in ocl.cpp is only compiled when HAVE_OPENCL_SVM is defined; I assume it then queries the SVM capability and fails that check because I don't have 2.0. I tried:
#undef HAVE_OPENCL_SVM
in my code, and I also modified cvconfig.h (honestly, I don't know how/when that file is referenced) so that it is not defined there either... and the error persists.
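For what it's worth, -30 is CL_INVALID_VALUE, which is what a 1.1 driver returns when asked about a parameter (CL_DEVICE_SVM_CAPABILITIES) it doesn't know about. Independent of OpenCV, the device's actual version can be confirmed with a plain OpenCL query, roughly like this:
// version_check.cpp -- print the raw OpenCL version of the default device.
// Build: g++ version_check.cpp -o version_check -lOpenCL
#include <CL/cl.h>
#include <cstdio>

int main()
{
    cl_platform_id platform;
    cl_device_id device;
    if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS ||
        clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, NULL) != CL_SUCCESS) {
        std::fprintf(stderr, "no OpenCL platform/device found\n");
        return 1;
    }
    char version[256] = {0};
    clGetDeviceInfo(device, CL_DEVICE_VERSION, sizeof(version), version, NULL);
    std::printf("%s\n", version);  // e.g. "OpenCL 1.1 ..."
    return 0;
}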
Thanks!

OpenCL not finding platforms?

I am trying to utilize the C++ API for OpenCL. I have installed my NVIDIA drivers and I have tested that I can run the simple vector addition program provided here. I can compile this program with the following gcc call, and it runs without problems.
gcc main.c -o vectorAddition -l OpenCL -I/usr/local/cuda-6.5/include
However, I would very much prefer to use the C++ API as opposed to the very verbose host code needed in C.
I downloaded the C++ bindings from Khronos from here and placed the cl.hpp file in the same location as my cl.h file. The code uses some C++11, so I compile it with:
g++ main.cpp -o vectorAddition_cpp -std=c++11 -l OpenCL -I/usr/local/cuda-6.5/include
but when I try to run the program I get the error:
clGetPlatformIDs(-1001)
I also tried the example provided here, which gave a more helpful error message.
No platforms found. Check OpenCL installation!
The particular code that produces this error is this:
std::vector<cl::Platform> all_platforms;
cl::Platform::get(&all_platforms);
if (all_platforms.size() == 0) {
    std::cout << " No platforms found. Check OpenCL installation!\n";
    exit(1);
}
This seems so strange given that the C implementation runs without problem. Any insights would be sincerely appreciated.
EDIT
The C implementation actually isn't running correctly either. Each addition is printed as zero, and checking ret_num_platforms also gives 0. For some reason my setup is failing to find my GPU. What could I have missed? My install consists of the nvidia-340 driver and cuda-6.5, installed via apt-get and the .run file respectively.
My sincerest thanks to @pasternak for helping me troubleshoot this problem. To solve it, however, I ended up needing to avoid essentially all Ubuntu apt-get installs and just use the CUDA run file for the full installation. Here is what fixed the problem.
1. Purge existing NVIDIA and CUDA installations (sudo apt-get purge cuda* nvidia-*)
2. Download the cuda-6.5 toolkit from the CUDA toolkit archive
3. Reboot the computer
4. Switch to tty1 (Ctrl-Alt-F1)
5. Stop the X server (sudo stop lightdm)
6. Run the cuda run file (sh cuda_6.5.14_linux_64.run)
7. Select 'yes' and accept all defaults
8. Reboot (required)
9. Switch to tty1, stop the X server, and run the cuda run file again; select 'yes' and the default for everything (including the driver again)
10. Update PATH to include /usr/local/cuda-6.5/bin and LD_LIBRARY_PATH to include /usr/local/cuda-6.5/lib64
11. Reboot again
12. Compile the main.c program (gcc main.c -o vectorAddition -l OpenCL -I/usr/local/cuda-6.5/include)
13. Verify it works with ./vectorAddition
C++ API
1. Download the cl.hpp file from Khronos here, noting that it is version 1.1
2. Place the cl.hpp file in /usr/local/cuda-6.5/include/CL with the other CL headers
3. Compile main.cpp (g++ main.cpp -o vectorAddition_cpp -std=c++11 -l OpenCL -I/usr/local/cuda-6.5/include)
4. Verify it works (./vectorAddition_cpp)
All output from both programs show the correct output for addition between vectors.
I personally find it interesting that Ubuntu's NVIDIA drivers don't seem to play well with the CUDA toolkits. Possibly just for the older versions, but still very unexpected.
It is hard to say without running the specific code on your machine, but looking at the difference between the example C code you said was working and cl.hpp might give us a clue. In particular, notice that the C example uses the following lines to simply read a single platform ID:
cl_platform_id platform_id = NULL;
cl_int ret = clGetPlatformIDs(1, &platform_id, &ret_num_platforms);
Notice that it passes 1 as the first argument. This assumes that at least one OpenCL platform exists and requests that the first one found be placed in platform_id. Additionally, note that even though the return code is assigned to "ret", it is not actually used to check whether an error was returned.
Now if we look at the implementation of the static method used to query the set of platforms in cl.hpp, i.e. cl::Platform::get:
static cl_int get(VECTOR_CLASS<Platform>* platforms)
{
    cl_uint n = 0;
    cl_int err = ::clGetPlatformIDs(0, NULL, &n);
    if (err != CL_SUCCESS) {
        return detail::errHandler(err, __GET_PLATFORM_IDS_ERR);
    }

    cl_platform_id* ids = (cl_platform_id*) alloca(n * sizeof(cl_platform_id));
    err = ::clGetPlatformIDs(n, ids, NULL);
    if (err != CL_SUCCESS) {
        return detail::errHandler(err, __GET_PLATFORM_IDS_ERR);
    }

    platforms->assign(&ids[0], &ids[n]);
    return CL_SUCCESS;
}
we see that it first calls
::clGetPlatformIDs(0, NULL, &n);
Notice that the first parameter is 0, which tells the OpenCL runtime to return the number of platforms in "n". If this is successful, it then goes on to request the actual "n" platform IDs.
So the difference here is that the C version does not check that at least one platform exists and simply assumes that one does, while the cl.hpp variant does check, and it may be this check that is failing.
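In other words, a more defensive version of the C host code would look roughly like this (same headers as the original example):
cl_uint num_platforms = 0;
cl_platform_id platform_id = NULL;
cl_int ret = clGetPlatformIDs(0, NULL, &num_platforms);  /* count first */
if (ret != CL_SUCCESS || num_platforms == 0) {
    fprintf(stderr, "clGetPlatformIDs: ret=%d, platforms=%u\n", ret, num_platforms);
    exit(1);  /* no usable OpenCL platform -> likely an ICD problem */
}
ret = clGetPlatformIDs(1, &platform_id, NULL);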
The most likely reason for all this is that the ICD is not correctly installed. You can see this thread for an example of how to fix this issue:
ERROR: clGetPlatformIDs -1001 when running OpenCL code (Linux)
I hope this helps.

OpenGL version trouble with glew.h

I am developing an OpenGL application and need to use the glew library. I am using Visual Studio C++ 2008 Express.
I compiled a program using gl.h, glu.h, and glut.h just fine, and it does what it's supposed to do. After including glew.h it still compiles just fine, but when I try:
glewInit();
if (glewIsSupported("GL_VERSION_2_0"))
    printf("Ready for OpenGL 2.0\n");
else {
    printf("OpenGL 2.0 not supported\n");
}
It keeps printing:
"OpenGL 2.0 not supported".
I tried changing it to glewIsSupported("GL_VERSION_1_3") or even glewIsSupported("GL_VERSION_1_0"), and it still returns false, implying that no OpenGL version at all is supported.
I have a Radeon HD 5750, so it should support OpenGL 3.1 and some of the features of 3.2. I know that the device drivers are installed properly, since I was able to run all the programs in the Radeon SDK provided by ATI.
I also installed OpenGL Extensions Viewer 3.15, and it reports OpenGL version 3.1, ATI driver 6.14.10.9116. I tried all of GLEW_VERSION_1_1, GLEW_VERSION_1_2, GLEW_VERSION_1_3, GLEW_VERSION_2_0, GLEW_VERSION_2_1, and GLEW_VERSION_3_0, and all of them return false.
Any other suggestions? I even tried GLEW_ARB_vertex_shader && GLEW_ARB_fragment_shader, and this returns false as well.
glewIsSupported is meant to check whether specific features are supported. You want something more like...
if (GLEW_VERSION_1_3)
{
    /* Yay! OpenGL 1.3 is supported! */
}
There may be some missing initialization. I encountered the same problem, and here is how I solved it: you need to create the window first (e.g. with glutCreateWindow()) before initializing GLEW. Call that function first and try again.
Firstly, you should check whether GLEW has initialized properly:
if (glewInit() != GLEW_OK)
{
    // something is wrong
}
Secondly, you need to create the context before calling glewInit().
Thirdly, you can also try setting
glewExperimental = true;
before calling glewInit().
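Putting those three points together, the usual initialization order looks roughly like this (a minimal sketch using GLUT for context creation; any windowing library works):
// glew_init.cpp -- create a context first, then initialize GLEW.
// Build: g++ glew_init.cpp -o glew_init -lGLEW -lglut -lGL
#include <GL/glew.h>   // must come before other GL headers
#include <GL/glut.h>
#include <cstdio>

int main(int argc, char** argv)
{
    glutInit(&argc, argv);
    glutCreateWindow("glew test");   // glewInit needs a current context

    glewExperimental = GL_TRUE;      // expose all available extensions
    if (glewInit() != GLEW_OK) {
        std::fprintf(stderr, "glewInit failed\n");
        return 1;
    }
    std::printf("OpenGL 2.0 %s\n",
                GLEW_VERSION_2_0 ? "supported" : "not supported");
    return 0;
}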
I encountered the same problem when running a program through Windows RDP. Then I noticed that my video card may not be working properly over RDP, so I tried TeamViewer instead, and both glewinfo.exe and my program then started to work normally.
The OP's problem has likely long been solved; this is just for others' information.