OpenCL Compiler Weird Condition - c++

I'm a new one working on OpenCL. I have some weird trouble when I try to compile the kernel.
On Nvidia platform, no matter what code in the source, it always show me cl_success and the log is only "\n"; On Intel platform, no matter what code in the source, clBuildProgram returns CL_INVALID_BINARY, clGetProgramBuildInfo with CL_PROGRAM_BUILD_STATUS returns CL_ERROR and the log looks find no mistake:
fcl build 1 succeeded.\n fcl build 2 succeeded.\n bcl build succeeded.\n.
Due to this is my first piece of complicated kernel codes, I know it fills tons of mistakes. However, this doesn't look like an error of codes. Why the compiler shows some contradictory information?
Here is my code:
The Codes are long, I just post parts that might be associated with. "..." means something skipped. Ask the rest if you need.
DrawProcess.ccp
#include <stdlib.h>
#include "Console.h"
#include "Renderer.h"
#include "Object.h"
#include "TertiaryArithmeticAlgorithms.h"
#define CL_USE_DEPRECATED_OPENCL_2_0_APIS
#if defined(__APPLE__) || defined(__MACOSX)
#include <OpenCL/cl.hpp>
#else
#include <CL/cl.h>
#endif
#include "Camera.h"
cl_command_queue CommandQueue;
cl_mem BufIdx[8];
cl_kernel Rasterization;
bool Initialization()
{
ConWrite("======== OpenCL Initializing ========\n");
//
cl_platform_id ThePlatformID=NULL;
cl_uint NumPlatforms;
cl_int status;
if(CL_INVALID_VALUE==clGetPlatformIDs(NULL,NULL,&NumPlatforms))
{
ConWrite("ERROR: Fail to Get the Number of Available Items in Platform List! The Number of Available Items in Platform List Equal to 0 and Platform List is NULL OR Both Platform List and the Exact Number of Items in Platform List are NULL.\n");
ConWrite("=== OpenCL Initialization Failed! ===\n");
return 1;
}
else
{
ConWrite("The Number of Items in Platform List is ");
ConWrite(&NumPlatforms);
ConWrite(".\n");
}
//
cl_platform_id *PlatformList;
if(NumPlatforms>0)
{
PlatformList=(cl_platform_id*)malloc(NumPlatforms*sizeof(cl_platform_id));
if(CL_INVALID_VALUE==clGetPlatformIDs(NumPlatforms,PlatformList,NULL))
{
ConWrite("ERROR: Fail to Get the Platform List! The Number of Available Items in Platform List Equal to 0 and Platform List is NULL OR Both Platform List and the Exact Number of Items in Platform List are NULL.\n");
ConWrite("=== OpenCL Initialization Failed! ===\n");
return 1;
}
else
{
ConWrite("Platform List Obtained.\n");
}
}
else
{
ConWrite("ERROR: The Number of Available Items in Platform List is not Greater than 0!\n");
ConWrite("=== OpenCL Initialization Failed! ===\n");
return 1;
}
...
cl_program VertexProgram=clCreateProgramWithSource(Context,1,Cartography,NULL,NULL);
status=clBuildProgram(VertexProgram,LengthOfDevices/sizeof(cl_device_id*),DeviceList,NULL,NULL,NULL);
if(CL_SUCCESS==status)
{
ConWrite("CODE: CL_SUCCESS. OpenCL Program Built.\n");
}
else
{
switch(status)
{
case CL_INVALID_PROGRAM:
ConWrite("CODE: CL_INVALID_PROGRAM. ERROR: The Program is an Invalid Program Object!\n");
break;
case CL_INVALID_VALUE:
ConWrite("CODE: CL_INVALID_VALUE. ERROR: Device List is Unavailable and the Number of Devices is Greater Than Zero, OR Device List is NOT NULL and the Number of Devices is Zero, OR the Pointer to Notify is NULL But User Data is NOT NULL!\n");
break;
case CL_INVALID_DEVICE:
ConWrite("CODE: CL_INVALID_DEVICE. ERROR: OpenCL Devices listed in the Device List are NOT in the List of Devices Associated with the Program!\n");
break;
case CL_INVALID_BINARY:
ConWrite("CODE: CL_INVALID_BINARY. ERROR: The Program was Created with Binary and Devices Listed in the Device List do NOT Have a Valid Binary Program!\n");
break;
case CL_INVALID_BUILD_OPTIONS:
ConWrite("CODE: CL_INVALID_BUILD_OPTIONS. ERROR: The Build Options Specified by Options are Invalid!\n");
break;
case CL_INVALID_OPERATION:
ConWrite("CODE: CL_INVALID_OPERATION. ERROR: The Build of the Program Executable for Any of the Devices Listed in the Device List by a Previous Call to the Function for the Program has NOT Completed!\n");
break;
//case CL_COMPILER_NOT_AVAILABLE: if program is created with clCreateProgramWithSource and a compiler is not available i.e. CL_DEVICE_COMPILER_AVAILABLE specified in the table of OpenCL Device Queries for clGetDeviceInfo is set to CL_FALSE.
//case CL_BUILD_PROGRAM_FAILURE: if there is a failure to build the program executable. This error will be returned if clBuildProgram does not return until the build has completed.
//case CL_INVALID_OPERATION: if there are kernel objects attached to program.
//case CL_OUT_OF_HOST_MEMORY: if there is a failure to allocate resources required by the OpenCL implementation on the host.
}
}
cl_build_status *BudStat;
size_t StatusSize;
clGetProgramBuildInfo(VertexProgram,DeviceList[0],CL_PROGRAM_BUILD_STATUS,0,NULL,&StatusSize);
BudStat=(cl_build_status*)malloc(StatusSize);
clGetProgramBuildInfo(VertexProgram,DeviceList[0],CL_PROGRAM_BUILD_STATUS,StatusSize,BudStat,NULL);
switch (*BudStat)
{
case CL_BUILD_NONE:
ConWrite("CODE: CL_BUILD_NONE.\n");
break;
case CL_BUILD_ERROR:
ConWrite("CODE: CL_BUILD_ERROR.\n");
break;
case CL_BUILD_SUCCESS:
ConWrite("CODE: CL_BUILD_SUCCESS.\n");
break;
case CL_BUILD_IN_PROGRESS:
ConWrite("CODE: CL_BUILD_IN_PROGRESS.\n");
default:
break;
}
char *Log;
size_t LogSize;
status=clGetProgramBuildInfo(VertexProgram,DeviceList[0],CL_PROGRAM_BUILD_LOG,0,NULL,&LogSize);
if(status==CL_SUCCESS)
{
ConWrite("CODE: CL_SUCCESS. OpenCL Program Build Infomation Obtained.\n");
}
else
{
switch(status)
{
case CL_INVALID_DEVICE:
ConWrite("CODE: CL_INVALID_DEVICE. ERROR: The Device is NOT in the List of Devices Associated with the Program.\n");
break;
case CL_INVALID_VALUE:
ConWrite("CODE: CL_INVALID_VALUE. ERROR: The Parameter Name is Invalid, OR the Size in Bytes Specified by Parameter's Value Size is Less Than Size of Return Type and Parameter Value is NOT NULL.\n");
break;
case CL_INVALID_PROGRAM:
ConWrite("CODE: CL_INVALID_PROGRAM. ERROR: The Program is an Invalid Program Object.\n");
break;
}
}
Log=(char*)malloc(LogSize+1);
Log[LogSize]='0';
clGetProgramBuildInfo(VertexProgram,DeviceList[0],CL_PROGRAM_BUILD_LOG,LogSize+1,Log,NULL);
ConWrite(Log);
Rasterization=clCreateKernel(VertexProgram,"VertexRenderer",NULL);
...
And Here is my kernel:
Renderer.h
#ifndef _1174_Renderer
#define _1174_Renderer
//------------------------------
const char *Cartography[]=
{
"#define COUNTER IdxVert\n",
"__kernel void VertexRenderer(",
"global float4 CamPos,", //X coordinate, Y coordinate, Z coordinate, SectorID
"global float4 CamAng,", //Horizontal Angle, Vertical Angle, Inclined Angle, Sight Angle
"global float4 CamNorV1,", //W represents horizontal resolution.
"global float4 CamNorV2,", //W represents vertical resolution.
"global float4 CamNorV3,", //W represents diagonal resolution.
"global float4 *Vertex,", //
"global uint IdxVert,",
"global uchar2 *ScrPos)\n", //
"{",
" half4 CpToV[COUNTER];", //CpToV.w is useless.
" int GID=(int)get_global_id(0);",
" mem_fence(CLK_GLOBAL_MEM_FENCE);",
" CpToV[GID].xyz=Vertex[GID].xyz-CamPos.xyz;",
" half Distance[COUNTER];",
" mem_fence(CLK_GLOBAL_MEM_FENCE);",
" Distance[GID]=tan(acos((CamNorV1.x*CpToV[GID].x+CamNorV1.y*CpToV[GID].y+CamNorV1.z*CpToV[GID].z)*rsqrt(CamNorV1.x*CamNorV1.x+CamNorV1.y*CamNorV1.y+CamNorV1.z*CamNorV1.z)*rsqrt(CpToV[GID].x*CpToV[GID].x+CpToV[GID].y*CpToV[GID].y+CpToV[GID].z*CpToV[GID].z)))/tan(CamAng.w)*CamNorV3.w;",
" half Scale[COUNTER];",
" mem_fence(CLK_GLOBAL_MEM_FENCE);",
" Scale[GID]=(CamNorV1.x*CpToV[GID].x+CamNorV1.y*CpToV[GID].y+CamNorV1.z*CpToV[GID].z)/(CamNorV1.x*CamNorV1.x+CamNorV1.y*CamNorV1.y+CamNorV1.z*CamNorV1.z);",
" half4 MapVect[COUNTER];",
" mem_fence(CLK_GLOBAL_MEM_FENCE);",
" MapVect[GID].xyz=CpToV[GID].xyz-Scale*CamNorV1.xyz;",
" half Theta1[COUNTER];",
" half Theta2[COUNTER];",
" mem_fence(CLK_GLOBAL_MEM_FENCE);",
" Theta1[GID]=acos((CamNorV2.x*MapVect[GID].x+CamNorV2.y*MapVect[GID].y+CamNorV2.z*MapVect[GID].z)*rsqrt(CamNorV2.x*CamNorV2.x+CamNorV2.y*CamNorV2.y+CamNorV2.z*CamNorV2.z)*rsqrt(MapVect[GID].x*MapVect[GID].x+MapVect[GID].y*MapVect[GID].y+MapVect[GID].z*MapVect[GID].z));",
" Theta2[GID]=acos((CamNorV3.x*MapVect[GID].x+CamNorV3.y*MapVect[GID].y+CamNorV3.z*MapVect[GID].z)*rsqrt(CamNorV3.x*CamNorV3.x+CamNorV3.y*CamNorV3.y+CamNorV3.z*CamNorV3.z)*rsqrt(MapVect[GID].x*MapVect[GID].x+MapVect[GID].y*MapVect[GID].y+MapVect[GID].z*MapVect[GID].z));",
" half Theta[COUNTER];",
" constant half Pi=(half)3.1415926f;",
" mem_fence(CLK_GLOBAL_MEM_FENCE);",
" (Theta1[GID]<=Pi/2)?(Theta[GID]=Theta2[GID]):(Theta[GID]=2*Pi-Theta2[GID]);",
" mem_fence(CLK_GLOBAL_MEM_FENCE);",
" ScrPos[GID].x=(uchar)cos(Theta[GID])*Distance[GID]+CamNorV1.w;",
" ScrPos[GID].y=(uchar)sin(Theta[GID])*Distance[GID]+CamNorV2.w;",
"}"
"#define COUNTER Dlt\n",
"__kernel void Polarization(",
"global float4 *NmVect,",
"global float4 *AllVert,",
"global ushort4 *DltIdx,", //W represents the index of planar vectors of primarch.
"global uint Dlt)\n",
"{",
" int GID=(int)get_global_id(0);",
" half4 SPToCam[COUNTER];",
" mem_fence(CLK_GLOBAL_MEM_FENCE);",
" SPToCam[GID].xyz=CamPos.xyz-AllVert[DltIdx[GID].x].xyz;",
" half m[COUNTER];",
" mem_fence(CLK_GLOBAL_MEM_FENCE);",
" m[GID]=SPToCam[GID].x*NmVect[DltIdx[GID].w].x+SPToCam[GID].y*NmVect[DltIdx[GID].w].y+SPToCam[GID].z*NmVect[DltIdx[GID].w].z;",
" bool Polar[COUNTER];",
" mem_fence(CLK_GLOBAL_MEM_FENCE);",
" (m>0)?(Polar=true):(Polar=false);",
" mem_fence(CLK_GLOBAL_MEM_FENCE);",
" ",
"}",
"__kernel void Hierarchization(",
"global ",
")\n",
"{",
" for(uint i=0;i<NumOfObj;i++){",
" for(uint k=0;k<NumOfLvInObj[IdxOfObj[i]];k++){",
" for(uint j=0;j<NumOfVtxInLv[k+IdxOfLv[IdxOfObj[i]]]-1;j++){",
" uint m=0;",
" (k==0)?():()",
" "
};
//------------------------------
#endif
Don't need care too much about kernel. All wrong...
And my hardware:
my desktop:
NVIDIA GeForce GTX 770
Intel(R) Core(TM) i7-4770 CPU #3.40GHz
Window 10
my laptop:
NVIDIA GeForce GT 750M
Intel(R) HD Graphics 4600
Intel(R) Core(TM) i7-4712HQ CPU #2.30GHz
Windows 8.1
Another question: When I run the program on my desktop, only Nvidia platform can be detected. OpenCL is also able to run on CPU, isn't it? Why Intel platform cannot be detected?

I am not sure, however, the second argument in clCreateProgramWithSource looks strange:
cl_program VertexProgram=clCreateProgramWithSource(Context,1,Cartography,NULL,NULL);
It should be a number of lines in your source code, so I suggest trying
cl_program VertexProgram=clCreateProgramWithSource(Context,sizeof(Cartography)/sizeof(Cartography[0]),Cartography,NULL,NULL);

Related

OpenCL could not found Intel HD 4000

I'll warn you in advance my written english it is not good, so please have some patience because I'll do a lot of errors.
I need to expose the graphic card in order to do some benchmark with parallel algorithms on finite element analysis. I downloaded the intel sdk at this link https://software.intel.com/en-us/intel-opencl .
I am using Ubuntu 16.10, so i followed all the instruction as explained in this post https://streamcomputing.eu/blog/2011-06-24/install-opencl-on-debianubuntu-orderly/ .
When i run a simple algorithm wich checks all the device, it only recognizes the cpu, failing to find the graphic card. The same program works well on a mac (because OpenCL is in the stack of course).
// includes...
int main(int argc, const char * argv[])
{
// See what standard OpenCL sees
std::vector<cl::Platform> platforms;
// Get platform
cl::Platform::get(&platforms);
// Temp
std::string s;
// Where the GPU lies
cl::Device gpudevice;
// Found a GPU
bool gpufound = false;
std::cout << "**** OPENCL ****" << std::endl;
// See if we have a GPU
for (auto p : platforms)
{
std::vector<cl::Device> devices;
p.getDevices(CL_DEVICE_TYPE_ALL, &devices);
for (auto d : devices)
{
std::size_t i = 4;
d.getInfo(CL_DEVICE_TYPE, &i);
std::cout << "> Device type " <<
(i & CL_DEVICE_TYPE_CPU ? "CPU" : "") <<
(i & CL_DEVICE_TYPE_GPU ? "GPU" : "") <<
(i & CL_DEVICE_TYPE_ACCELERATOR ? "ACCELERATOR" : "");
if (i & CL_DEVICE_TYPE_GPU)
{
gpudevice = d;
gpufound = true;
}
std::cout << " Version " << s << std::endl;
}
}
if (!gpufound)
{
std::cout << "NO GPU FOUND. ABORTING." << std::endl;
return 1;
}
// Do other things...
the output is:
/home/andrea/Dropbox/fem/SiNDy/clfem/cmake-build-debug/vector_sycl
**** OPENCL ****
> Device type CPU Version
NO GPU FOUND. ABORTING.
Process finished with exit code 1
I tried to add the current user in the video group, i also tried to install Intel Media Server Studio following the instructions coming with the package but I could not build the kernel because of some compile errors.
I also updated all the drivers with the automatic software update of Ubuntu, but still the GC is not found.
Maybe you want to try beignet, which is an OpenCL implementation for IvyBridge+ iGPUs. There are packages of beignet for Ubuntu 16.10. To be more precise, I think you are looking for the packages beignet-dev and beignet-opencl-icd. Test it yourself since I have no Ubuntu installation currently available. (However, beignet itself works pretty well on my Intel HD Graphics 520 and Antergos/Arch Linux)

OpenCV gives Assertion failed error when running on GPU using OpenCL

I have an Nvidia GTX 970M GPU & I am trying to run a face detection algorithm in c++ that runs on the GPU using OpenCL.
The function where this error occurs is :
ocl::OclCascadeClassifier::detectMultiScale()
The error I get is :
OpenCV Error: Assertion failed (localThreads[0] * localThreads[1] * localThreads[2] <= kernelWorkGroupSize) in cv::ocl::openCLVerifyKernel
I know that this problem is related to the GPU of the device but I do not know how to fix this. I have tried using OpenCV versions 2 and 3 but both give the same problem.
The problem was that it was trying to use the Intel HD Graphics GPU instead of the Nvidia GPU. I solved this by choosing the Nvidia GPU as the OpenCL Device.
The code I used was :
cv::ocl::DevicesInfo devInfo;
int res = cv::ocl::getOpenCLDevices(devInfo);
if (res == 0)
{
std::cerr << "There is no OPENCL Here !" << std::endl;
}
else
{
for (unsigned int i = 0; i < devInfo.size(); ++i)
{
std::cout << "Device : " << devInfo[i]->deviceName << " is present" << std::endl;
}
}
cv::ocl::setDevice(devInfo[1]);

Can't run any available OpenCL sample with recent API CL/cl2.hpp

I am trying to get started with OpenCL. After installing it, I have found a little oddity compared to almost all online tutorials, there was no cl.hpp header, only cl2.hpp. I have learned that it is a new version. A new version of API with little to no tutorials available.
The tutorials I have found failed to compile. This one (http://github.khronos.org/OpenCL-CLHPP/) for example failed to compile because of an undefined variable and if I avoided it, it reported that I had no OpenCL 2.0 devices. I was able to get this one past the device check (http://simpleopencl.blogspot.cz/2013/06/tutorial-simple-start-with-opencl-and-c.html), but it crashes when I try to create context (device was found).
Code I am trying:
std::vector<cl::Platform> all_platforms;
cl::Platform::get(&all_platforms);
if(all_platforms.size()==0){
std::cout<<" No platforms found. Check OpenCL installation!\n";
exit(1);
}
cl::Platform default_platform=all_platforms[0];
std::cout << "Using platform: "<<default_platform.getInfo<CL_PLATFORM_NAME>()<<"\n";
//get default device of the default platform
std::vector<cl::Device> all_devices;
default_platform.getDevices(CL_DEVICE_TYPE_ALL, &all_devices);
if(all_devices.size()==0){
std::cout<<" No devices found. Check OpenCL installation!\n";
exit(1);
}
cl::Device default_device=all_devices[0];
std::cout<< "Using device: "<<default_device.getInfo<CL_DEVICE_NAME>()<<"\n";
cl::Context context({default_device}); // null pointer read here
cl::Program::Sources sources;
// kernel calculates for each element C=A+B
std::string kernel_code=
" void kernel simple_add(global const int* A, global const int* B, global int* C){ "
" C[get_global_id(0)]=A[get_global_id(0)]+B[get_global_id(0)]; "
" } ";
sources.push_back({kernel_code.c_str(),kernel_code.length()});
cl::Program program(context,sources);
if(program.build({default_device})!=CL_SUCCESS){
std::cout<<" Error building: "<<program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(default_device)<<"\n";
exit(1);
}
// create buffers on the device
cl::Buffer buffer_A(context,CL_MEM_READ_WRITE,sizeof(int)*10);
cl::Buffer buffer_B(context,CL_MEM_READ_WRITE,sizeof(int)*10);
cl::Buffer buffer_C(context,CL_MEM_READ_WRITE,sizeof(int)*10);
int A[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
int B[] = {0, 1, 2, 0, 1, 2, 0, 1, 2, 0};
//create queue to which we will push commands for the device.
cl::CommandQueue queue(context,default_device);
//write arrays A and B to the device
queue.enqueueWriteBuffer(buffer_A,CL_TRUE,0,sizeof(int)*10,A);
queue.enqueueWriteBuffer(buffer_B,CL_TRUE,0,sizeof(int)*10,B);
//run the kernel
cl::KernelFunctor simple_add(cl::Kernel(program,"simple_add"),queue,cl::NullRange,cl::NDRange(10),cl::NullRange);
simple_add(buffer_A,buffer_B,buffer_C);
//alternative way to run the kernel
cl::Kernel kernel_add=cl::Kernel(program,"simple_add");
kernel_add.setArg(0,buffer_A);
kernel_add.setArg(1,buffer_B);
kernel_add.setArg(2,buffer_C);
queue.enqueueNDRangeKernel(kernel_add,cl::NullRange,cl::NDRange(10),cl::NullRange);
queue.finish();
int C[10];
//read result C from the device to array C
queue.enqueueReadBuffer(buffer_C,CL_TRUE,0,sizeof(int)*10,C);
std::cout<<" result: \n";
for(int i=0;i<10;i++){
std::cout<<C[i]<<" ";
}
I am using Intel Gen OCL Driver on Intel(R) HD Graphics IvyBridge M GT2, on Ubuntu 16.04.
Any idea what am I doing wrong?
If the driver doesn't support CL2.0, you need to "lower the expectation" of cl2.hpp by setting the minimum and target versions of CL to the relevant version (e.g. 1.2):
#define CL_HPP_MINIMUM_OPENCL_VERSION 120
#define CL_HPP_TARGET_OPENCL_VERSION 120
#include <CL/cl2.hpp>
That way, the code inside cl2.hpp will compile for a CL1.2 environment.

cuModuleLoadDataEx ignores all options

This question is similar to cuModuleLoadDataEx options but I would like to bring the topic up again and in addition provide more information.
When loading a PTX string with the NV driver via cuModuleLoadDataEx it seems to ignore all options all together. I provide full working examples so that anyone interested can directly and with no effort reproduce this. First a small PTX kernel (save this as small.ptx) then the C++ program that loads the PTX kernel.
.version 3.1
.target sm_20, texmode_independent
.address_size 64
.entry main()
{
ret;
}
main.cc
#include<cstdlib>
#include<iostream>
#include<fstream>
#include<sstream>
#include<string>
#include<map>
#include "cuda.h"
int main(int argc,char *argv[])
{
CUdevice cuDevice;
CUcontext cuContext;
CUfunction func;
CUresult ret;
CUmodule cuModule;
cuInit(0);
std::cout << "trying to get device 0\n";
ret = cuDeviceGet(&cuDevice, 0);
if (ret != CUDA_SUCCESS) { exit(1);}
std::cout << "trying to create a context\n";
ret = cuCtxCreate(&cuContext, 0, cuDevice);
if (ret != CUDA_SUCCESS) { exit(1);}
std::cout << "loading PTX string from file " << argv[1] << "\n";
std::ifstream ptxfile( argv[1] );
std::stringstream buffer;
buffer << ptxfile.rdbuf();
ptxfile.close();
std::string ptx_kernel = buffer.str();
std::cout << "Loading PTX kernel with driver\n" << ptx_kernel;
const unsigned int jitNumOptions = 3;
CUjit_option *jitOptions = new CUjit_option[jitNumOptions];
void **jitOptVals = new void*[jitNumOptions];
// set up size of compilation log buffer
jitOptions[0] = CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES;
int jitLogBufferSize = 1024*1024;
jitOptVals[0] = (void *)&jitLogBufferSize;
// set up pointer to the compilation log buffer
jitOptions[1] = CU_JIT_INFO_LOG_BUFFER;
char *jitLogBuffer = new char[jitLogBufferSize];
jitOptVals[1] = jitLogBuffer;
// set up wall clock time
jitOptions[2] = CU_JIT_WALL_TIME;
float jitTime = -2.0;
jitOptVals[2] = &jitTime;
ret = cuModuleLoadDataEx( &cuModule , ptx_kernel.c_str() , jitNumOptions, jitOptions, (void **)jitOptVals );
if (ret != CUDA_SUCCESS) { exit(1);}
std::cout << "walltime: " << jitTime << "\n";
std::cout << std::string(jitLogBuffer) << "\n";
}
Build (assuming CUDA is installed under /usr/local/cuda, I use CUDA 5.0):
g++ -I/usr/local/cuda/include -L/usr/local/cuda/lib64/ main.cc -o main -lcuda
If someone is able to extract any sensible information from the compilation process that would be great! The documentation of CUDA driver API where cuModuleLoadDataEx is explained (and which options it is supposed to accept) http://docs.nvidia.com/cuda/cuda-driver-api/index.html
If I run this, the log is empty and jitTime wasn't even touched by the NV driver:
./main small.ptx
trying to get device 0
trying to create a context
loading PTX string from file empty.ptx
Loading PTX kernel with driver
.version 3.1
.target sm_20, texmode_independent
.address_size 64
.entry main()
{
ret;
}
walltime: -2
EDIT:
I managed to get the JIT compile time. However it seems that the driver expects an array of 32bit values as OptVals. Not as stated in the manual as an array of pointers (void *) which are on my system 64 bits. So, this works:
const unsigned int jitNumOptions = 1;
CUjit_option *jitOptions = new CUjit_option[jitNumOptions];
int *jitOptVals = new int[jitNumOptions];
jitOptions[0] = CU_JIT_WALL_TIME;
// here the call to cuModuleLoadDataEx
std::cout << "walltime: " << (float)jitOptions[0] << "\n";
I believe that it is not possible to do the same with an array of void *. The following code does not work:
const unsigned int jitNumOptions = 1;
CUjit_option *jitOptions = new CUjit_option[jitNumOptions];
void **jitOptVals = new void*[jitNumOptions];
jitOptions[0] = CU_JIT_WALL_TIME;
// here the call to cuModuleLoadDataEx
// here I also would have a problem casting a 64 bit void * to a float (32 bit)
EDIT
Looking at the JIT compilation time jitOptVals[0] was misleading. As mentioned in the comments, the JIT compiler caches previous translations and won't update the JIT compile time if it finds a cached compilation. Since I was looking whether this value has changed or not I assumed that the call ignores the options all together. Which it doesn't. It's works fine.
Your jitOptVals should not contain pointers to your values, instead cast the values to void*:
// set up size of compilation log buffer
jitOptions[0] = CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES;
int jitLogBufferSize = 1024*1024;
jitOptVals[0] = (void *)jitLogBufferSize;
// set up pointer to the compilation log buffer
jitOptions[1] = CU_JIT_INFO_LOG_BUFFER;
char *jitLogBuffer = new char[jitLogBufferSize];
jitOptVals[1] = jitLogBuffer;
// set up wall clock time
jitOptions[2] = CU_JIT_WALL_TIME;
float jitTime = -2.0;
//Keep jitOptVals[2] empty as it only an Output value:
//jitOptVals[2] = (void*)jitTime;
and after cuModuleLoadDataEx, you get your jitTime like jitTime = (float)jitOptions[2];

glutWarpPointer crash

I'm following these tutorials on modern OpenGL. I've done them up to number 15 "Camera Control - Part 2". The tutorial suggests using glutWarpPointer(). The problem is, my program crashes at that call. This is my code:
c_camera::c_camera(int width, int height, const c_vector3f& Pos, const c_vector3f& Target, const c_vector3f& Up){
m_windowWidth = width;
m_windowHeight = height;
m_pos = Pos;
m_target = Target;
m_target.Normalize();
m_up = Up;
m_up.Normalize();
Init();
}
void c_camera::Init(){
c_vector3f HTarget(m_target.x, 0.0, m_target.z);
HTarget.Normalize();
if (HTarget.z >= 0.0f){
if (HTarget.x >= 0.0f){
m_AngleH = 360.0f - (asin(HTarget.z) TO_DEG);
} else {
m_AngleH = 180.0f + (asin(HTarget.z) TO_DEG);
}
} else {
if (HTarget.x >= 0.0f){
m_AngleH = (asin(-HTarget.z) TO_DEG);
} else {
m_AngleH = 90.0f + (asin(-HTarget.z) TO_DEG);
}
}
m_AngleV = -(asin(m_target.y) TO_DEG);
m_OnUpperEdge = false;
m_OnLowerEdge = false;
m_OnLeftEdge = false;
m_OnRightEdge = false;
m_mousePos.x = m_windowWidth / 2;
m_mousePos.y = m_windowHeight / 2;
cout << "this gets printed just fine" << endl;
glutWarpPointer(500,400); //program crashes
cout << "this doesn't get printed" << endl;
}
I'm not sure if I'm doing something weird here, or if I just have a bad glut version (seems unlikely to me) or if the tutorial is just wrong... Do I need to set up something glut specific before I can call glutWarpPointer()? I am new to glut, and new to modern OpenGL (I learned immediate mode first).
A quick google search didn't help me much. Any help would be appreciated.
Edit: I am on windows, and I'm using mingw 4.5
Edit2: These are the details windows gives me about the crash:
Problem Event Name: APPCRASH
Application Name: modern_opengl.exe
Application Version: 0.0.0.0
Application Timestamp: 51044575
Fault Module Name: glut32.dll
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 3bea4ff3
Exception Code: c0000005
Exception Offset: 0000a879
OS Version: 6.2.9200.2.0.0.256.48
Locale ID: 1043
Additional Information 1: 5861
Additional Information 2: 5861822e1919d7c014bbb064c64908b2
Additional Information 3: f3d5
Additional Information 4: f3d5be0cad2787556264647dc02181c3
Edit3: This is my call stack:
0 1000A879 glutWarpPointer() (C:\Windows\system\glut32.dll:??)
1 004033FB c_camera::Init(this=0x4aa0e0) (C:\CodeBlocks\projects\modern_opengl\c_camera.cpp:50)
2 00403164 c_camera::c_camera(this=0x4aa0e0, width=800, height=600, Pos=..., Target=..., Up=...) (C:\CodeBlocks\projects\modern_opengl\c_camera.cpp:18)
3 00402F4B __static_initialization_and_destruction_0(__initialize_p=1, __priority=65535) (C:\CodeBlocks\projects\modern_opengl\main.cpp:55)
4 00403004 GLOBAL_sub_I_vertices() (C:\CodeBlocks\projects\modern_opengl\main.cpp:177)
5 0043595B __do_global_ctors() (../mingw/gccmain.c:59)
6 00401098 __mingw_CRTStartup() (../mingw/crt1.c:236)
7 00401284 mainCRTStartup() (../mingw/crt1.c:264)
Your function seems to be in c_camera::Init, which seems to be called before main probably due to it being instantiated as a global object (globals are constructed before main is entered). You should delay glut calls till after you enter main and called glutInit is called.