I've looked around and been unable to find the solution to what I think is a relatively simple OpenCL-related question.
The thing is, I just started using double precision in my OpenCL kernels, as my current project requires that much precision. Furthermore, I'm trying to keep everything manageable, so that all kernels have the same #defines available to them.
Then I came to the extensions. Per OpenCL, I'll have to include
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
How do I include this in the build-options for clBuildProgram?
You can check the extensions supported by a device from the host by calling clGetDeviceInfo with CL_DEVICE_EXTENSIONS (section 4.2 of the OpenCL 1.1 spec). The returned string will contain 'cl_khr_fp64' if the extension is supported.
When compiling OpenCL code with clBuildProgram, the compiler defines 'cl_khr_fp64' if the extension is supported (section 9.1 of the OpenCL 1.1 spec).
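For example, the kernel source can test that compiler-defined macro directly:

#ifdef cl_khr_fp64
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#endif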
To enable the extension, the OpenCL code has to include that pragma line. Alternatively, you can control the use of the extension from the host code by passing an option to clBuildProgram, like -D USE_FP64=1, and then test it in the OpenCL code:
#if USE_FP64
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#endif
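Putting both halves together, here is a minimal host-side sketch (error checking omitted; the program and device handles are assumed to already exist, and build_with_fp64_check is just an illustrative name):

#include <string.h>
#include <CL/cl.h>

/* Build the program, defining USE_FP64 only if the device actually
   reports the cl_khr_fp64 extension. */
void build_with_fp64_check(cl_program program, cl_device_id device)
{
    char extensions[4096] = "";
    clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS,
                    sizeof(extensions), extensions, NULL);

    const char *options = strstr(extensions, "cl_khr_fp64")
                              ? "-D USE_FP64=1"
                              : "-D USE_FP64=0";
    clBuildProgram(program, 1, &device, options, NULL, NULL);
}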
I have some code that is heavily dependent on Eigen. I would like to optimize it with CUDA, but when I am compiling I get:
[tcai4#golubh4 Try1]$ nvcc conv_parallel.cu -I /home/tcai4/project-cse/Try1 -lfftw3 -o conv.o
In file included from Eigen/Dense:1,
from Eigen/Eigen:1,
from functions.h:8,
from conv_parallel.cu:10:
Eigen/Core:44:34: error: math_functions.hpp: No such file or directory
I think math_functions.hpp is a file from CUDA. Can someone help me figure out why nvcc cannot find it?
Edit: I am using CUDA 5.5 and Eigen 3.3. Apart from including Eigen and linking the fftw3 library, I did not use any other flags (as you can see from my command).
I encountered this issue while building TensorFlow 1.4.1 with CUDA 9.1, and strangely math_functions.hpp existed only in include/crt.
Creating a symlink from cuda/include/math_functions.hpp to cuda/include/crt/math_functions.hpp fixed the issue:
ln -s /usr/local/cuda/include/crt/math_functions.hpp /usr/local/cuda/include/math_functions.hpp
The reason nvcc cannot find the file in question is that it is part of the CUDA Math library, which was introduced in CUDA 6. Your almost four-year-old version of CUDA predates the release of the Math library, so it simply does not contain that file.
You should therefore assume that what you are trying to do cannot work without first updating to a newer version of the CUDA toolkit.
Creating a symlink sometimes causes other complications.
You can try replacing
// We need math_functions.hpp to ensure that the EIGEN_USING_STD_MATH macro
// works properly on the device side
#include <math_functions.hpp>
with
// We need cuda_runtime.h to ensure that the EIGEN_USING_STD_MATH macro
// works properly on the device side
#include <cuda_runtime.h>
in
/usr/include/eigen3/Eigen/Core,
which works for me.
The reason "math_functions.hpp" cannot be found is that it has been renamed to "math_functions.h". So you just need to open
/usr/include/eigen3/Eigen/Core
and change "math_functions.hpp" to "math_functions.h".
I tried compiling OpenCV 2.4.13.1 against OpenCL 1.1, with headers from https://github.com/KhronosGroup/OpenCL-Headers.
I had to change #ifdef CL_VERSION_1_2 to #ifdef CL_VERSION_1_1 in opencv/cmake/checks/opencl.cpp.
Also, http://docs.opencv.org/2.4.13/modules/ocl/doc/introduction.html suggests that it should work with OpenCL 1.1.
But I still get errors like cl_runtime_opencl.hpp:294:61: error: 'cl_device_partition_property' does not name a type when building.
Do I have to go back to an older version to get OpenCL 1.1 working, or have I missed something?
Edit:
I don't mind an answer for OpenCV 3.0
As far as I know, CUDA supports C and C++, but I can't use C++ in my kernel.
I tried a simple example like this:
__global__ void simple(){
    cout<<"abc";
}
That gives an error, but if I change it to printf("abc"); it works.
Can you explain this for me? Thank you very much!
From NVIDIA's CUDA 7.5 slides:
C++11 Supported features:
auto
lambdas
std::initializer_list
variadic templates
static_asserts
constexpr
rvalue references
range based for loops
C++11 features not supported:
thread_local
Standard libraries: std::*
std::cout is defined in the C++ standard library, which is not supported in CUDA device code. Use C's printf instead.
Since CUDA 6.5, the 'compute_11', 'compute_12', 'compute_13', 'sm_11', 'sm_12', and 'sm_13' architectures are deprecated, so nvcc compiles by default for CC 2.0, which enables printf support.
More info here and here
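For completeness, here is a minimal sketch of a kernel that does work (compile with nvcc; device-side printf requires CC >= 2.0):

#include <cstdio>
#include <cuda_runtime.h>

// printf is supported in device code on compute capability >= 2.0
__global__ void simple()
{
    printf("abc\n");
}

int main()
{
    simple<<<1, 1>>>();
    cudaDeviceSynchronize(); // flush the device-side printf buffer
    return 0;
}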
CUDA doesn't link the libraries and header files that are required to use std::cout. However, you can enable the use of printf().
This answer explains the process which enables this feature:
printing from cuda kernels
quoted here for easier access:
To enable use of plain printf() on devices of Compute Capability >= 2.0, it's important to compile for at least CC 2.0 and disable the default, which includes a build for CC 1.0.
Right-click the .cu file in your project, select Properties, select Configuration Properties | CUDA C/C++ | Device. Click on the Code Generation line, click the triangle, select Edit. In the Code Generation dialog box, uncheck Inherit from parent or project defaults, type compute_20,sm_20 in the top window, click OK.
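If you build with nvcc from the command line instead of Visual Studio, the equivalent is to pass the architecture flags directly (kernel.cu being a placeholder for your source file):

nvcc -gencode arch=compute_20,code=sm_20 kernel.cu -o kernel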
I want to make sure that my application conforms to OpenGL 2.1.
How can I check this?
Because my computer supports GL 4.4, even if I use, for example, glGenVertexArrays(), it will work successfully. But glGenVertexArrays() is only available in GL 3.0+.
So, I want to verify that my app only uses GL 2.1 functionality.
One way is to run it on my old PC that supports only GL 2.1, but I'm looking for an easier way.
If you find an extension loader that supports generating version-specific headers, as described by @datenwolf, that's probably your easiest solution. There's another option you can try if necessary.
The official OpenGL headers you can find at https://www.opengl.org/registry contain the definitions grouped by version, and enclosed in preprocessor conditions. The layout looks like this:
...
#ifndef GL_VERSION_2_1
#define GL_VERSION_2_1 1
// GL 2.1 definitions
#endif
#ifndef GL_VERSION_3_0
#define GL_VERSION_3_0 1
// GL 3.0 definitions
#endif
#ifndef GL_VERSION_3_1
#define GL_VERSION_3_1 1
// GL 3.1 definitions
#endif
...
You should be able to include the official header at least for a version test. If you disable the versions you do not want to use by defining the corresponding preprocessor symbols, you will get compile errors if you try to use features from those versions. For example, for GL 2.1:
#define GL_VERSION_3_0 1
#define GL_VERSION_3_1 1
#define GL_VERSION_3_2 1
#define GL_VERSION_3_3 1
#define GL_VERSION_4_0 1
#define GL_VERSION_4_1 1
#define GL_VERSION_4_2 1
#define GL_VERSION_4_3 1
#define GL_VERSION_4_4 1
#define GL_VERSION_4_5 1
#include <GL/gl.h>   // glext.h relies on the base GL types declared here
#include <GL/glext.h>
// your code
Try https://github.com/cginternals/glbinding. It's an OpenGL wrapper library which supports exactly what you ask for:
Feature-Centered Header Design
The OpenGL API is iteratively developed and released in versions,
internally (for the API specification) named features. The latest
feature/version of OpenGL is 4.5. The previous versions are 1.0, 1.1,
1.2, 1.3, 1.4, 1.5, 2.0, 2.1, 3.0, 3.1, 3.2, 3.3, 4.0, 4.1, 4.2, 4.3, and 4.4. OpenGL uses a deprecation model for removing outdated parts
of its API which results in compatibility (with deprecated API) and
core (without deprecated API) usage that is manifested in the targeted
OpenGL context. On top of that, new API concepts are suggested as
extensions (often vendor specific) that might be integrated into
future versions. All this results in many possible specific
manifestations of the OpenGL API you can use in your program.
One tough task is to adhere to one agreed set of functions in your own
OpenGL program (e.g., OpenGL 3.2 Core if you want to develop for every
Windows, macOS, and Linux released in the last 4 years). glbinding
facilitates this by providing per-feature headers by means of
well-defined/generated subsets of the OpenGL API.
All-Features OpenGL Headers
If you do not use per-feature headers the OpenGL program can look like
this:
#include <glbinding/gl/gl.h>
// draw code
gl::glClear(gl::GL_COLOR_BUFFER_BIT | gl::GL_DEPTH_BUFFER_BIT);
gl::glUniform1i(u_numcubes, m_numcubes);
gl::glDrawElementsInstanced(gl::GL_TRIANGLES, 18, gl::GL_UNSIGNED_BYTE, 0, m_numcubes * m_numcubes);
Single-Feature OpenGL Headers
When developing your code on Windows with latest drivers installed,
the code above is likely to compile and run. But if you want to port
it to systems with less mature driver support (e.g., macOS or Linux
using open source drivers), you may wonder if glDrawElementsInstanced
is available. In this case, just switch to per-feature headers of
glbinding and choose the OpenGL 3.2 Core headers (as you know that at
least this version is available on all target platforms):
#include <glbinding/gl32core/gl.h>
// draw code
gl32core::glClear(gl32core::GL_COLOR_BUFFER_BIT | gl32core::GL_DEPTH_BUFFER_BIT);
gl32core::glUniform1i(u_numcubes, m_numcubes);
gl32core::glDrawElementsInstanced(gl32core::GL_TRIANGLES, 18, gl32core::GL_UNSIGNED_BYTE, 0, m_numcubes * m_numcubes);
If the code compiles, then you can be sure it is OpenGL 3.2 Core
compliant. Using functions that are not yet available or relying on
deprecated functionality is prevented.
You can compile it in an environment in which only the OpenGL-2.1 symbols are available. Depending on which extension wrapper / loader you use this can be easy or hard.
For example if you use the glloadgen OpenGL loader generator you can generate a header file and compilation unit that will cover only exactly the OpenGL-2.1 symbols and tokens. If you then compile your project using this, the compiler will error out on anything that's not covered.
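Going from memory of the glloadgen documentation (verify the exact invocation against the project wiki), generating a GL 2.1-only header looks roughly like this:

lua LoadGen.lua -style=pointer_c -spec=gl -version=2.1 core_2_1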
I'm writing a small hello-world OpenCL program using the Khronos Group's cl.hpp for OpenCL 1.2 and NVIDIA's OpenCL libraries. The drivers and ICD I have support OpenCL 1.1. Since the NVIDIA side doesn't support 1.2 yet, I get some errors on functions required by OpenCL 1.2.
On the other side, cl.hpp for OpenCL 1.2 has a flag, CL_VERSION_1_1 to be exact, to run the header in 1.1 mode, but it's not working. Has anybody had a similar experience, or found a solution?
Note: cl.hpp for version 1.1 works but generates many warnings during compilation. This is why I'm trying to use the 1.2 version.
Unfortunately NVIDIA distributes an old version of the OpenCL ICD (the library that dispatches API calls to the appropriate driver). Your best options are to either
Get hold of a more up to date version of the ICD (if you're using Linux, this is libOpenCL.so, and you can find a newer copy in AMD's APP SDK). The downside is that if you distribute your compiled code, it will also require the 1.2 ICD.
Use the OpenCL 1.1 header files, except that you can use the latest cl.hpp. It should (in theory) detect that it is being combined with OpenCL 1.1 headers and disable all the OpenCL 1.2 code (that doesn't get tested much though). The advantage of using the latest cl.hpp is that there are a lot of bug fixes that don't get back-ported to the 1.1 version of cl.hpp.
You can do this:
#include <CL/cl.h>
#undef CL_VERSION_1_2
#include <CL/cl.hpp>
I've just implemented that in my code and it seems to do the trick.
You can define the flag CL_USE_DEPRECATED_OPENCL_1_1_APIS, which will make the 1.2 hpp file 1.1-compatible:
#define CL_USE_DEPRECATED_OPENCL_1_1_APIS
This is what I have done on NVIDIA and AMD. Works like a charm.
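Note that the define has to appear before the header is included, e.g.:

#define CL_USE_DEPRECATED_OPENCL_1_1_APIS
#include <CL/cl.hpp>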
I was fed up with downloading the multi-gigabyte OpenCL SDKs from Intel, NVIDIA, and AMD, each with different issues:
Intel requires registration and has a temporary license.
The NVIDIA SDK does not support OpenCL 2.0, and you have to download cl.hpp anyway.
AMD's cl.hpp file defines min and max macros which can conflict with MSVC's min and max macros (I spent too much time figuring out how to fix this with e.g. NOMINMAX). The header is not even the same as the one defined by Khronos (which does not have the min/max problem).
Therefore, I downloaded the source code and includes from Khronos, as suggested by this SO answer, and compiled the OpenCL.lib file myself. The includes and OpenCL.lib files are a couple of MB, which is a lot smaller than all the extra stuff in the Intel/NVIDIA/AMD SDKs! I can include the OpenCL includes and OpenCL.lib files in my project and no longer have to tell others to download an SDK.
The includes for OpenCL 2.0 from the Khronos registry have a new C++ binding file, cl2.hpp. Looking at this file, I have determined that the correct way to support the deprecated functions with OpenCL 2.0 is something like this:
#define CL_HPP_MINIMUM_OPENCL_VERSION 110
#define CL_HPP_TARGET_OPENCL_VERSION 120
#define CL_HPP_CL_1_2_DEFAULT_BUILD
#include "CL/cl2.hpp"
This is because the cl2.hpp file has this code
#if CL_HPP_MINIMUM_OPENCL_VERSION <= 100 && !defined(CL_USE_DEPRECATED_OPENCL_1_0_APIS)
# define CL_USE_DEPRECATED_OPENCL_1_0_APIS
#endif
#if CL_HPP_MINIMUM_OPENCL_VERSION <= 110 && !defined(CL_USE_DEPRECATED_OPENCL_1_1_APIS)
# define CL_USE_DEPRECATED_OPENCL_1_1_APIS
#endif
#if CL_HPP_MINIMUM_OPENCL_VERSION <= 120 && !defined(CL_USE_DEPRECATED_OPENCL_1_2_APIS)
# define CL_USE_DEPRECATED_OPENCL_1_2_APIS
#endif
#if CL_HPP_MINIMUM_OPENCL_VERSION <= 200 && !defined(CL_USE_DEPRECATED_OPENCL_2_0_APIS)
# define CL_USE_DEPRECATED_OPENCL_2_0_APIS
#endif
Notice that you no longer need to (and should not) include <CL/opencl.h>.
Lastly, in order to get my code to work with Boost/Compute, I had to add the following after #include "CL/cl2.hpp":
#undef CL_VERSION_2_0
My own OpenCL code works without this but Boost/Compute does not. It appears I'm not the only one having this issue. My GPU does not support OpenCL 2.0.
It looks like the only way is to use the OpenCL 1.1 headers when working with 1.1-capable devices.
You can set the options of clBuildProgram as follows:
const char options[] = "-cl-std=CL1.1";
clBuildProgram( program, 1, &devices, options, NULL, NULL );
This forces the compiler to use OpenCL 1.1, no matter which version is supported by your device.