Cannot compile `enqueue_kernel` on OpenCL 2.1 NEO device

I have the following code running on the device Intel(R) Gen9 HD Graphics NEO (OpenCL 2.1 NEO):
__kernel void update(
    const __global uint* positions,
    const __global float3* offsets,
    const int size,
    __global int* cost
) {
    int global_id = get_global_id(0);
    if (global_id >= size) {
        return;
    }
    int3 update_index = position_from_index(grid_centroid_positions[global_id], SIZE) -
        offset_grid;
    ndrange_t ndrange = ndrange_3d(size, size, size);
    enqueue_kernel(get_default_queue(), ndrange,
        ^{update_surrouding_cells(offsets, global_id, update_index, update_edge_size, size, cost)});
}
But I get the following compiler errors:
6:158:5: error: use of undeclared identifier 'ndrange_t'
ndrange_t ndrange = ndrange_3d(size, size, size);
^
6:161:5: error: implicit declaration of function 'enqueue_kernel' is invalid in OpenCL
enqueue_kernel(get_default_queue(), ndrange,
^
6:161:20: error: implicit declaration of function 'get_default_queue' is invalid in OpenCL
enqueue_kernel(get_default_queue(), ndrange,
^
6:162:163: error: expected ';' after expression
^{update_surrouding_cells(offsets, global_id, update_index, update_edge_size, size, cost)});
^
;
6:161:41: error: use of undeclared identifier 'ndrange'
enqueue_kernel(get_default_queue(), ndrange,
Compilation options are as follows:
-I "/home/development/cl" -g
-D SIZE=256
The device supports OpenCL 2.1, yet when compiling, it seems none of the things needed for enqueue_kernel exist. Do I need a special extension or something? I am reading the spec, but it doesn't seem to say anything about how to actually compile the dynamic-parallelism examples.

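When compiling, the OpenCL version the device supports is not the only thing that matters. The OpenCL C language version used for the program is selected through the build options, and when -cl-std is not specified, the compiler uses the highest OpenCL C 1.x version the device supports. OpenCL C 1.x has no ndrange_t, enqueue_kernel or get_default_queue, which is exactly what the errors report. In other words, the build options used when compiling the OpenCL program (the kernel code) should include:
-cl-std=CL2.0
or whichever specific standard you are targeting.
Separately, the "expected ';' after expression" error is a genuine syntax error: the call inside the block literal needs its own semicolon, i.e. ^{ update_surrouding_cells(...); }.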

Related

nvcc generating invalid error compiling JNI code

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
The error is:
MyFile.cu(231): error: expression must have pointer type
The relevant code:
JNIEXPORT jboolean JNICALL Java_MyFile_convergeMatrixCuda (
    JNIEnv *env, jclass clazz, jfloatArray fxnMatrixJ, jfloatArray mulMatrixJ, jfloatArray addMatrixJ,
    jfloatArray resultsJ, jint numRowsJ, jint numColsJ, jint maxIterations, jfloat epsilonJ)
{
    int numRows = (int) numRowsJ;
    int numCols = (int) numColsJ;
    int maxIter = (int) maxIterations;
    float epsilon = (float) epsilonJ;
    float *fxnMatrixH = (*env)->GetFloatArrayElements (env, fxnMatrixJ, NULL);
GetFloatArrayElements returns a float*. Replacing "(*env)->GetFloatArrayElements" with "env->GetFloatArrayElements" produces these errors:
float *fxnMatrixH = env->GetFloatArrayElements (env, fxnMatrixJ, NULL);
MyFile.cu(231): error: argument of type "JNIEnv *" is incompatible with parameter of type "jfloatArray"
MyFile.cu(231): error: argument of type "jfloatArray" is incompatible with parameter of type "jboolean *"
MyFile.cu(231): error: too many arguments in function call
nvcc does work correctly when compiling non-JNI code.
NVIDIA's documentation states that
Source files for CUDA applications consist of a mixture of conventional C++ host code, plus GPU device functions.
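(*env)->GetFloatArrayElements(env, fxnMatrixJ, NULL); is how you invoke a JNI function in C. nvcc compiles the host code of a .cu file as C++, though, and in C++ jni.h defines JNIEnv as a struct whose member functions pass env for you, so the call becomes env->GetFloatArrayElements(fxnMatrixJ, NULL); This also explains both sets of errors: in C++, *env is a JNIEnv object rather than a pointer (hence "expression must have pointer type"), and env->GetFloatArrayElements(env, ...) passes one argument too many.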

nvcc problems compiling object oriented code due to missing class template

I have an issue with compiling CUDA code using nvcc. To demonstrate it I created a dummy class to represent a surface in 3D space.
Here goes the file surface.h:
#ifndef SURFACE_H
#define SURFACE_H
class surface
{
private:
    float dX;     // grid resolution in x [m]
    float dY;     // grid resolution in y [m]
    int nX;       // number of elements in x
    int nY;       // number of elements in y
    float* depth; // pointer to depth array [m]
public:
    __host__ __device__ void set_dim(const int _nX, const int _nY);
    __host__ __device__ void set_res(const float _dX, const float _dY);
    __host__ __device__ float get_surface_mm(const int iX, const int iY);
};
#endif
And here is the corresponding surface.cpp file:
#include "surface.h"
__host__ __device__ void surface::set_dim(const int _nX, const int _nY){
    nX = _nX;
    nY = _nY;
    return;
}
__host__ __device__ void surface::set_res(const float _dX, const float _dY){
    dX = _dX;
    dY = _dY;
    return;
}
__host__ __device__ float surface::get_surface_mm(const int iX, const int iY){
    float surfLvl = (float) iX * iY;
    return surfLvl;
}
I am trying to compile it with nvcc -x cu -arch=sm_50 -I. -dc surface.cpp -o surface.o but get the following errors:
surface.h(4): error: argument list for class template "surface" is missing
surface.cpp(7): error: argument list for class template "surface" is missing
surface.cpp(8): error: identifier "nX" is undefined
surface.cpp(9): error: identifier "nY" is undefined
surface.cpp(13): error: argument list for class template "surface" is missing
surface.cpp(14): error: identifier "dX" is undefined
surface.cpp(15): error: identifier "dY" is undefined
surface.cpp(19): error: argument list for class template "surface" is missing
8 errors detected in the compilation of "/tmp/tmpxft_000bedf2_00000000-6_surface.cpp1.ii".
I really don't get the reason for this error because, in my opinion, the class is fully defined and the argument list should be known to the compiler. Has anyone already experienced a similar issue? If I remove the __device__ and __host__ qualifiers and compile it with gcc, everything works fine.
nvcc --version output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
I am aware that CUDA does not necessarily support every single feature of object oriented programming but double-checked that the stuff I am trying to compile here should be compatible.
I appreciate every hint :). Thanks a lot in advance.
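The only problem with this code was that the name surface is already taken: the CUDA headers, which nvcc includes automatically, declare a class template called surface. That is why nvcc complains that the argument list for class template "surface" is missing. After renaming the class, everything compiles without errors.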

getting ERROR C4013 - __readcr() undefined; assuming extern returning int

I'm trying to compile a driver and I'm getting these errors:
Error C2220 warning treated as error - no 'object' file generated
Error C4013 '__writecr0' undefined; assuming extern returning int
Error C4013 '__readcr0' undefined; assuming extern returning int
I've included intrin.h too; the solution platform is ARM64.
UINT64 cr0 = __readcr0();
__writecr0( cr0 &
*(PUSHORT)place =
*(PULONG)(place + 2) =
*(PVOID**)(place + 6) =
__writecr0( cr0 );
The docs state that these intrinsics are only available on x86 and x86-64, and even there only in kernel mode. They will therefore not work when cross-compiling for ARM64.

Mex file building with Octave (issue with wrappers)

I am currently porting some code from Matlab to Octave. Some of the functions of the Matlab code use Piotr's Computer Vision Matlab Toolbox, which includes some mex files.
In Matlab, everything is working like a charm but when I run my codes with Octave, it throws this error:
error: 'imResampleMex' undefined near line 50 column 5
However, all the internal paths within the toolbox should be correct. I figured out that Matlab and Octave handle mex files differently, so I tried to build one of the mex files from its C++ source within Octave like this:
mkoctfile --mex imResampleMex.cpp
It fails with the following error messages related to the C++ wrapper functions:
In file included from imResampleMex.cpp:6:0:
wrappers.hpp:21:24: error: 'wrCalloc' declared as an 'inline' variable
inline void* wrCalloc( size_t num, size_t size ) { return calloc(num,size);
^
wrappers.hpp:21:24: error: 'size_t' was not declared in this scope
wrappers.hpp:21:36: error: 'size_t' was not declared in this scope
inline void* wrCalloc( size_t num, size_t size ) { return calloc(num,size);
^
wrappers.hpp:21:48: error: expression list treated as compound expression in initializer [-fpermissive]
inline void* wrCalloc( size_t num, size_t size ) { return calloc(num,size);
^
wrappers.hpp:21:50: error: expected ',' or ';' before '{' token
inline void* wrCalloc( size_t num, size_t size ) { return calloc(num,size);
^
wrappers.hpp:22:24: error: 'wrMalloc' declared as an 'inline' variable
inline void* wrMalloc( size_t size ) { return malloc(size); }
^
wrappers.hpp:22:24: error: 'size_t' was not declared in this scope
wrappers.hpp:22:38: error: expected ',' or ';' before '{' token
inline void* wrMalloc( size_t size ) { return malloc(size); }
^
wrappers.hpp: In function 'void wrFree(void*)':
wrappers.hpp:23:44: error: 'free' was not declared in this scope
inline void wrFree( void * ptr ) { free(ptr); }
^
wrappers.hpp: At global scope:
wrappers.hpp:28:17: error: 'size_t' was not declared in this scope
void* alMalloc( size_t size, int alignment ) {
^
wrappers.hpp:28:30: error: expected primary-expression before 'int'
void* alMalloc( size_t size, int alignment ) {
^
wrappers.hpp:28:44: error: expression list treated as compound expression in initializer [-fpermissive]
void* alMalloc( size_t size, int alignment ) {
^
wrappers.hpp:28:46: error: expected ',' or ';' before '{' token
void* alMalloc( size_t size, int alignment ) {
^
imResampleMex.cpp: In function 'void resampleCoef(int, int, int&, int*&, int*&, T*&, int*, int)':
imResampleMex.cpp:21:39: error: 'alMalloc' cannot be used as a function
wts = (T*)alMalloc(nMax*sizeof(T),16);
^
imResampleMex.cpp:22:43: error: 'alMalloc' cannot be used as a function
yas = (int*)alMalloc(nMax*sizeof(int),16);
^
imResampleMex.cpp:23:43: error: 'alMalloc' cannot be used as a function
ybs = (int*)alMalloc(nMax*sizeof(int),16);
^
imResampleMex.cpp: In function 'void resample(T*, T*, int, int, int, int, int, T)':
imResampleMex.cpp:48:43: error: 'alMalloc' cannot be used as a function
T *C = (T*) alMalloc((ha+4)*sizeof(T),16); for(y=ha; y<ha+4; y++) C[y]=0;
^
warning: mkoctfile: building exited with failure status
The wrappers.hpp file looks like this:
#ifndef _WRAPPERS_HPP_
#define _WRAPPERS_HPP_
#ifdef MATLAB_MEX_FILE
// wrapper functions if compiling from Matlab
#include "mex.h"
inline void wrError(const char *errormsg) { mexErrMsgTxt(errormsg); }
inline void* wrCalloc( size_t num, size_t size ) { return mxCalloc(num,size); }
inline void* wrMalloc( size_t size ) { return mxMalloc(size); }
inline void wrFree( void * ptr ) { mxFree(ptr); }
#else
// wrapper functions if compiling from C/C++
inline void wrError(const char *errormsg) { throw errormsg; }
inline void* wrCalloc( size_t num, size_t size ) { return calloc(num,size); }
inline void* wrMalloc( size_t size ) { return malloc(size); }
inline void wrFree( void * ptr ) { free(ptr); }
#endif
// platform independent aligned memory allocation (see also alFree)
void* alMalloc( size_t size, int alignment ) {
    const size_t pSize = sizeof(void*), a = alignment-1;
    void *raw = wrMalloc(size + a + pSize);
    void *aligned = (void*) (((size_t) raw + pSize + a) & ~a);
    *(void**) ((size_t) aligned-pSize) = raw;
    return aligned;
}
// platform independent alignned memory de-allocation (see also alMalloc)
void alFree(void* aligned) {
    void* raw = *(void**)((char*)aligned-sizeof(void*));
    wrFree(raw);
}
#endif
I imagine I need to modify this file, but since my knowledge of C++ and of mex files is close to non-existent, I am struggling to find a way out of this mountain of errors. I don't even know whether I'm going in the right direction... Any help is welcome!
Edit 1:
I modified my wrappers.hpp file by adding #include <stdlib.h> to it. A mex file is now created. However, when running the function that calls the file, I now get the following error:
error: failed to install .mex file function 'imResampleMex'
error: called from
imResample at line 50 column 4
Here are the steps I used to create the mex files of Piotr's Computer Vision Matlab Toolbox for Octave (this concerns only the .cpp files in the channels folder).
The original mex files that come with the toolbox are built for Matlab and do not work with Octave. When building them from Octave, the sources include the file wrappers.hpp.
This file has to be modified by adding these two lines at the beginning of the file: #include <stdlib.h> and #include "mex.h".
To build the mex file, type the following at the Octave prompt (from the directory containing the .cpp file):
mkoctfile --mex -DMATLAB_MEX_FILE the_file.cpp
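Defining MATLAB_MEX_FILE makes wrappers.hpp take its mex branch (mexErrMsgTxt, mxCalloc, mxMalloc, mxFree) instead of the plain C/C++ one. This way the Octave-compatible mex file is created.
Edit 23/05/2017: after receiving more questions from people having trouble generating these files, I released the generated files on GitHub: https://github.com/Eskapp/Piotr_vision_extrafiles_Octave. Feel free to use them.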
You'll need to add them manually to the Computer Vision toolbox.

"error: expected primary-expression before 'volatile'"

I'm trying to compile the nRF driver with MinGW 4.8.2, and I'm getting this error:
In file included from b:/nrfdriver/sdk/nRF51_SDK_8.1.0_b6ed55f/components/device/nrf51.h:119:0,
                 from b:/nrfdriver/sdk/nRF51_SDK_8.1.0_b6ed55f/components/softdevice/s130/headers/nrf_soc.h:50,
                 from b:/nrfdriver/pc-ble-driver-0.5.0/driver/inc_override/nrf_soc.h:21,
                 from b:/nrfdriver/pc-ble-driver-0.5.0/driver/inc_override/app_util_platform.h:26,
                 from b:/nrfdriver/sdk/nRF51_SDK_8.1.0_b6ed55f/components/drivers_nrf/uart/app_uart.h:27,
                 from b:\nrfdriver\pc-ble-driver-0.5.0\driver\src\app_uart_pc.c:13:
C:/MinGW/mingw64/lib/gcc/x86_64-w64-mingw32/4.8.2/include/xmmintrin.h: In function 'void _mm_setcsr(unsigned int)':
b:/nrfdriver/sdk/nRF51_SDK_8.1.0_b6ed55f/components/toolchain/gcc/core_cm0.h:164:21: error: expected primary-expression before 'volatile'
 #define __I volatile /*!< Defines 'read only' permissions */
             ^
driver\CMakeFiles\s130_nrf51_ble_driver.dir\build.make:297: recipe for target 'driver/CMakeFiles/s130_nrf51_ble_driver.dir/src/app_uart_pc.c.obj' failed
Error comes from this line:
#define __I volatile /*!< Defines 'read only' permissions */
How can such a simple #define lead to a compiler error, and any idea how I should fix it? (Upgrading the compiler is not an option, as this version of the driver is supposed to be built with this version of MinGW.)
Note that the driver is meant to be built as 32-bit and I'm trying to build it targeting 64-bit; I don't know if that could be the cause of the problem...
Let's also look in the xmmintrin.h header:
/* Set the control register to I. */
extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_setcsr (unsigned int __I)
{
    __builtin_ia32_ldmxcsr (__I);
}
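Ouch, another use of __I. Because core_cm0.h has already defined __I as a macro by the time xmmintrin.h is processed, the preprocessor turns _mm_setcsr into _mm_setcsr (unsigned int volatile) with the body __builtin_ia32_ldmxcsr (volatile);, and volatile is not an expression, hence "expected primary-expression before 'volatile'". The #define itself is fine; it is the clash with the compiler's own header that breaks the build.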