We have a working library that uses LibTorch 1.5.0 built with CUDA 10.0, and it runs as expected.
We are upgrading to CUDA 10.2 for reasons unrelated to PyTorch. When we run inference with the newly compiled LibTorch (built exactly the same way, except for the switch to CUDA 10.2), the runtime is about 20x slower.
We also checked it using the precompiled binaries.
This was tested on 3 different machines with 3 different GPUs (Tesla T4, GTX 980 and P1000), on both Windows 10 and Ubuntu 16.04, all with the latest drivers, and with 3 different TorchScript modules (of the same architecture). All of them are consistently ~20x slower on CUDA 10.2.
I've simplified the code to be extremely minimal, with no external dependencies other than Torch:
int main(int argc, char** argv)
{
// Initialize CUDA device 0
cudaSetDevice(0);
std::string networkPath = DEFAULT_TORCH_SCRIPT;
if (argc > 1)
{
networkPath = argv[1];
}
auto jitModule = std::make_shared<torch::jit::Module>(torch::jit::load(networkPath, torch::kCUDA));
if (jitModule == nullptr)
{
std::cerr << "Failed creating module" << std::endl;
return EXIT_FAILURE;
}
// Meaningless data, just something to pass to the module to run on
// PATCH_HEIGHT & WIDTH are defined as 256
uint8_t* data = new uint8_t[PATCH_HEIGHT * PATCH_WIDTH * 3];
memset(data, 0, PATCH_HEIGHT * PATCH_WIDTH * 3);
auto stream = at::cuda::getStreamFromPool(true, 0);
bool res = infer(jitModule, stream, data, PATCH_WIDTH, PATCH_HEIGHT);
std::cout << "Warmed up" << std::endl;
res = infer(jitModule, stream, data, PATCH_WIDTH, PATCH_HEIGHT);
delete[] data;
return 0;
}
// Inference function
bool infer(std::shared_ptr<JitModule>& jitModule, at::cuda::CUDAStream& stream, const uint8_t* inputData, int width, int height)
{
std::vector<torch::jit::IValue> tensorInput;
// This function simply uses cudaMemcpy to copy to device and create a torch::Tensor from that data
// I can paste it if it's relevant; it is omitted here to keep the example as clean as possible
if (!prepareInput(inputData, width, height, tensorInput, stream))
{
return false;
}
// Reduce memory usage, without gradients
torch::NoGradGuard noGrad;
{
at::cuda::CUDAStreamGuard streamGuard(stream);
auto totalTimeStart = std::chrono::high_resolution_clock::now();
jitModule->forward(tensorInput);
// The synchronize here is only for timing; it is not used in production
cudaStreamSynchronize(stream.stream());
auto totalTimeStop = std::chrono::high_resolution_clock::now();
printf("forward sync time = %.3f milliseconds\n",
std::chrono::duration<double, std::milli>(totalTimeStop - totalTimeStart).count());
}
return true;
}
With Torch built against CUDA 10.0 we get a runtime of 18 ms; with Torch built against CUDA 10.2 we get a runtime of 430 ms.
Any thoughts on that?
This issue was also posted on the PyTorch Forums and as an issue on GitHub.
UPDATE
I profiled this small program under both CUDA versions.
The two builds use very different kernels.
In the 10.2 run, 96.5% of the compute time is spent in conv2d_grouped_direct_kernel, which takes ~60-100 ms on my P1000,
whereas the top kernels in the 10.0 run are:
47.1% - cudnn::detail::implicit_convolve_sgemm (~1.5 ms)
23.1% - maxwell_scudnn_winograd_128x128_ldg1_ldg4_tile148n_nt (~0.4 ms)
8.5% - maxwell_scudnn_128x32_relu_small_nn (~0.4 ms)
So it's easy to see where the time difference comes from. The question now is why.
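For anyone reproducing these numbers: a single timed call can be dominated by one-off costs, so it may help to time several iterations after a warm-up and compare medians between the two builds. The following is a minimal sketch using only the standard library. `median_ms` and its parameters are hypothetical names, not part of LibTorch, and the callable is assumed to block until the GPU work has finished (for example by calling forward() followed by the stream synchronize used above).

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of LibTorch): runs `fn` `warmup` times untimed,
// then `iters` times timed, and returns the median wall-clock time in milliseconds.
// For an even number of samples the upper of the two middle values is returned.
template <typename Fn>
double median_ms(Fn&& fn, std::size_t warmup, std::size_t iters)
{
    assert(iters > 0 && "need at least one timed iteration");
    for (std::size_t i = 0; i < warmup; ++i)
    {
        fn();
    }
    std::vector<double> samples;
    samples.reserve(iters);
    for (std::size_t i = 0; i < iters; ++i)
    {
        const auto start = std::chrono::steady_clock::now();
        fn();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::milli>(stop - start).count());
    }
    // nth_element moves the middle sample to the position it would have after sorting.
    auto mid = samples.begin() + static_cast<std::ptrdiff_t>(samples.size() / 2);
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}
```

In the program above, the lambda passed to `median_ms` would wrap the `jitModule->forward(tensorInput)` call and the `cudaStreamSynchronize(stream.stream())` call.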
I'll warn you in advance that my written English is not good, so please be patient, because I will make a lot of mistakes.
I need to expose the graphics card in order to benchmark parallel algorithms for finite element analysis. I downloaded the Intel SDK from https://software.intel.com/en-us/intel-opencl .
I am using Ubuntu 16.10, so I followed all the instructions in this post: https://streamcomputing.eu/blog/2011-06-24/install-opencl-on-debianubuntu-orderly/ .
When I run a simple program which enumerates all the devices, it only recognizes the CPU and fails to find the graphics card. The same program works well on a Mac (because OpenCL is part of the system stack there, of course).
// includes...
int main(int argc, const char * argv[])
{
// See what standard OpenCL sees
std::vector<cl::Platform> platforms;
// Get platform
cl::Platform::get(&platforms);
// Temp
std::string s;
// Where the GPU lies
cl::Device gpudevice;
// Found a GPU
bool gpufound = false;
std::cout << "**** OPENCL ****" << std::endl;
// See if we have a GPU
for (auto p : platforms)
{
std::vector<cl::Device> devices;
p.getDevices(CL_DEVICE_TYPE_ALL, &devices);
for (auto d : devices)
{
std::size_t i = 4;
d.getInfo(CL_DEVICE_TYPE, &i);
std::cout << "> Device type " <<
(i & CL_DEVICE_TYPE_CPU ? "CPU" : "") <<
(i & CL_DEVICE_TYPE_GPU ? "GPU" : "") <<
(i & CL_DEVICE_TYPE_ACCELERATOR ? "ACCELERATOR" : "");
if (i & CL_DEVICE_TYPE_GPU)
{
gpudevice = d;
gpufound = true;
}
std::cout << " Version " << s << std::endl;
}
}
if (!gpufound)
{
std::cout << "NO GPU FOUND. ABORTING." << std::endl;
return 1;
}
// Do other things...
The output is:
/home/andrea/Dropbox/fem/SiNDy/clfem/cmake-build-debug/vector_sycl
**** OPENCL ****
> Device type CPU Version
NO GPU FOUND. ABORTING.
Process finished with exit code 1
I tried adding the current user to the video group. I also tried to install Intel Media Server Studio following the instructions that come with the package, but I could not build the kernel because of some compile errors.
I also updated all the drivers with Ubuntu's automatic software update, but the graphics card is still not found.
Maybe you want to try beignet, which is an OpenCL implementation for IvyBridge+ iGPUs. There are packages of beignet for Ubuntu 16.10. To be more precise, I think you are looking for the packages beignet-dev and beignet-opencl-icd. Test it yourself since I have no Ubuntu installation currently available. (However, beignet itself works pretty well on my Intel HD Graphics 520 and Antergos/Arch Linux)
I get this error when I try to execute my program:
libGL error: unable to load driver: i965_dri.so
libGL error: driver pointer missing
libGL error: failed to load driver: i965
libGL error: unable to load driver: swrast_dri.so
libGL error: failed to load driver: swrast
X Error of failed request: GLXBadFBConfig
Major opcode of failed request: 154 (GLX)
Minor opcode of failed request: 34 ()
Serial number of failed request: 42
Current serial number in output stream: 41
My code (taken from the book "OpenGL Development Cookbook"):
#include <GL/glew.h>
#include <GL/freeglut.h>
#include <iostream>
const int WIDTH = 640;
const int HEIGHT = 480;
void OnInit()
{
glClearColor(1, 0, 0, 0);
std::cout << "Initialization successfull" << std::endl;
}
void OnShutdown()
{
std::cout << "Shutdown successfull" << std::endl;
}
void OnResize(int nw, int nh)
{
}
void OnRender()
{
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glutSwapBuffers();
}
int main(int argc, char** argv)
{
glutInit(&argc, argv);
glutInitDisplayMode(GLUT_DEPTH | GLUT_DOUBLE | GLUT_RGBA);
glutInitContextVersion(3, 3);
glutInitContextFlags(GLUT_CORE_PROFILE | GLUT_DEBUG);
glutInitContextProfile(GLUT_FORWARD_COMPATIBLE);
glutInitWindowSize(WIDTH, HEIGHT);
glutCreateWindow("OpenGL");
glewExperimental = GL_TRUE;
GLenum err = glewInit();
if(GLEW_OK != err) {std::cerr << "Error: " << glewGetErrorString(err) << std::endl; }
else{if(GLEW_VERSION_3_3) {std::cout << "Driver supports OpenGL 3.3\n Details: " << std::endl; }}
std::cout << "\tUsing glew: " << glewGetString(GLEW_VERSION) << std::endl;
std::cout << "\tVendor: " << glGetString(GL_VENDOR) << std::endl;
std::cout << "\tRenderer: " << glGetString(GL_RENDERER) << std::endl;
std::cout << "\tGLSL: " << glGetString(GL_SHADING_LANGUAGE_VERSION) << std::endl;
OnInit();
glutCloseFunc(OnShutdown);
glutDisplayFunc(OnRender);
glutReshapeFunc(OnResize);
glutMainLoop();
return 0;
}
I checked whether my driver supports the OpenGL version I am using with the glxinfo | grep "OpenGL" command:
OpenGL vendor string: Intel Open Source Technology Center
OpenGL renderer string: Mesa DRI Intel(R) Sandybridge Mobile
OpenGL core profile version string: 3.3 (Core Profile) Mesa 10.5.9
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 10.5.9
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:
I am using Ubuntu 14.04.3.
I'm not sure, but I think I get this error because I am using Intel and not Nvidia.
It's hard to tell from a distance, but the errors you have there look like a damaged OpenGL client library installation. glxinfo queries the GLX driver loaded into the Xorg server, which is somewhat independent from the installed libGL (as long as only indirect rendering calls are made). The errors indicate that the installed libGL either doesn't match the DRI drivers or the DRI libraries are damaged.
Either way, the best course of action is to do a clean reinstall of everything related to OpenGL on your system. I.e. do a forced reinstall of xorg-server, xf86-video-…, mesa, libdri… and so on.
I faced a very similar error:
X Error of failed request: GLXBadFBConfig
Major opcode of failed request: 154 (GLX)
Minor opcode of failed request: 34 ()
Serial number of failed request: 42
Current serial number in output stream: 41
Removing the following line solved it:
glutInitContextVersion(3, 3);
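A possible explanation for why dropping the version request helps, offered as an assumption rather than a confirmed diagnosis: in the question's code GLUT_CORE_PROFILE is passed to glutInitContextFlags and GLUT_FORWARD_COMPATIBLE to glutInitContextProfile, which looks swapped, since freeglut groups GLUT_DEBUG and GLUT_FORWARD_COMPATIBLE as context flags and GLUT_CORE_PROFILE and GLUT_COMPATIBILITY_PROFILE as profiles. If the constants have the values assumed below, the calls as written would request a 3.3 compatibility profile, while the glxinfo output in the question reports 3.3 only for the core profile. The stand-alone sketch decodes both argument orders; the constant values and all names in it are assumptions to be checked against the installed freeglut_ext.h.

```cpp
#include <cassert>

// Assumption: numeric values mirrored from freeglut_ext.h; verify against your header.
constexpr int kGlutDebug = 0x0001;                 // GLUT_DEBUG (a context flag)
constexpr int kGlutForwardCompatible = 0x0002;     // GLUT_FORWARD_COMPATIBLE (a context flag)
constexpr int kGlutCoreProfile = 0x0001;           // GLUT_CORE_PROFILE (a profile)
constexpr int kGlutCompatibilityProfile = 0x0002;  // GLUT_COMPATIBILITY_PROFILE (a profile)

// Hypothetical summary of what a (flags, profile) pair asks for.
struct ContextRequest
{
    bool debug;
    bool forwardCompatible;
    bool coreProfile;
    bool compatibilityProfile;
};

// Interprets `flags` with the flag bits and `profile` with the profile bits.
ContextRequest decodeRequest(int flags, int profile)
{
    assert(flags >= 0 && profile >= 0);
    return ContextRequest{
        (flags & kGlutDebug) != 0,
        (flags & kGlutForwardCompatible) != 0,
        (profile & kGlutCoreProfile) != 0,
        (profile & kGlutCompatibilityProfile) != 0,
    };
}
```

Under these assumptions, `decodeRequest(kGlutCoreProfile | kGlutDebug, kGlutForwardCompatible)` corresponds to the calls in the question, and `decodeRequest(kGlutForwardCompatible | kGlutDebug, kGlutCoreProfile)` corresponds to the swapped order. Keeping glutInitContextVersion(3, 3) and swapping the two arguments may therefore be worth trying as an alternative to removing the version request.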
I'm now working on feature matching, and I wrote the code below.
void CLPS::ExtractKeyPoints()
{
vector<DMatch> init_matches;
ULONGLONG timer;
timer = GetTickCount64();
Ptr<BRISK> fd_a = BRISK::create(20);
fd_a->detectAndCompute(S, Mat(), spoints, desc_s, false);
fd_a->detectAndCompute(T, Mat(), tpoints, desc_t, false);
timer = GetTickCount64() - timer;
cout << "Detect and compute: " << timer << " milliseconds complete. Output: " << spoints.size() << " points. \n";
timer = GetTickCount64();
FlannBasedMatcher matcher = FlannBasedMatcher(cv::makePtr<cv::flann::LshIndexParams>(10, 32, 2));
matcher.match(desc_s, desc_t, init_matches);
timer = GetTickCount64() - timer;
cout << "Matching time: " << timer << " milliseconds complete. " << endl;
Point2f spt, tpt;
int num_matches = 0;
vector<DMatch>::iterator iter_matches;
coord_matches.create(1, 3, CV_32FC1);
Mat cm_row;
cm_row.create(1, 3, CV_32FC1);
if (!init_matches.empty())
{
for (iter_matches = init_matches.begin(); iter_matches != init_matches.end(); ++iter_matches)
{
spt = spoints[iter_matches->queryIdx].pt;
tpt = tpoints[iter_matches->trainIdx].pt;
if (std::abs(spt.y - tpt.y) < VERTICAL_EPSILON && tpt.x - spt.x < upperbound && tpt.x - spt.x > lowerbound)
{
num_matches++;
coord_matches.reserve(num_matches);
cm_row.at<float>(0, 0) = spt.x;// float corresponds to CV_32FC1
cm_row.at<float>(0, 1) = spt.y;
cm_row.at<float>(0, 2) = tpt.x - spt.x;
coord_matches.push_back(cm_row);
}
}
cout << coord_matches.rows << " matches found." << endl;
}
else
{
cerr << "No match is found. Validate your source data or report to us if there is a bug. " << endl;
}
}
Then I just call it like this:
CLPS lps;
lps.ExtractKeyPoints();
However, it triggers an exception with the following message when the function returns:
Unhandled exception at 0x00007FFC0797D328 (ucrtbase.dll) in dars_lps.exe: An invalid parameter was passed to a function that considers invalid parameters fatal.
where dars_lps.exe is the name of my application. The debugger then jumps to some destructors in <vector>.
I'm using Visual Studio 2015 on Windows 8.1 Update 1, and my OpenCV version is 3.1. I have confirmed that I'm linking against the correct version of the library files (i.e. vc14).
I previously worked with Visual Studio 2010 on Windows 7 SP1 and OpenCV 2.4.9, and it never reported such an exception.
I know this question may be similar to other questions elsewhere (e.g. this one on the OpenCV site) that occur when calling different functions, but all of them remain unsolved. I suspect that the problem lies in BRISK or FlannBasedMatcher, but I can't comment out that code (the function would be virtually empty and, obviously, there would be no more exceptions).
I also noticed that the problem started appearing after OpenCV 3.0 was released, and most of the similar reports involve newer versions of Visual Studio or Windows. Both 64-bit and 32-bit platforms are affected. There is a report of this problem with Visual Studio 2015 here, but it used OpenCV 3.0, at a time when no library had been built for Visual Studio 2015 yet.
Is this still a bug to be fixed, or a mistake of my own?
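For readers who want to test the filtering rule from ExtractKeyPoints() in isolation, here is a minimal sketch with plain structs in place of the OpenCV types. `Pt`, `MatchRow` and `filter_matches` are hypothetical names. The threshold parameters stand for VERTICAL_EPSILON, lowerbound and upperbound, whose actual values are not shown in the question. One difference from the original is deliberate: the original appears to start from a preallocated 1x3 Mat and then appends rows, whereas this sketch starts from an empty container, so every row in the result comes from an accepted match.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Pt { float x; float y; };                          // stand-in for a keypoint position
struct MatchRow { float x; float y; float disparity; };   // one accepted match

// Keeps a match (source index, target index) only if the two points lie on
// nearly the same image row and the horizontal offset is strictly inside (lower, upper).
std::vector<MatchRow> filter_matches(
    const std::vector<Pt>& src,
    const std::vector<Pt>& dst,
    const std::vector<std::pair<std::size_t, std::size_t>>& matches,
    float verticalEpsilon, float lower, float upper)
{
    std::vector<MatchRow> rows;
    for (const auto& m : matches)
    {
        assert(m.first < src.size() && m.second < dst.size());
        const Pt& s = src[m.first];
        const Pt& t = dst[m.second];
        const float dx = t.x - s.x;
        if (std::abs(s.y - t.y) < verticalEpsilon && dx > lower && dx < upper)
        {
            rows.push_back({s.x, s.y, dx});
        }
    }
    return rows;
}
```

Because it has no OpenCV dependency, this version can be compiled and run separately, which may help to rule the filtering logic in or out as the source of the exception.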
I'm new to OpenCL, and I have run into some weird trouble when trying to compile my kernel.
On the Nvidia platform, no matter what code is in the source, the build always reports CL_SUCCESS and the log is only "\n". On the Intel platform, no matter what code is in the source, clBuildProgram returns CL_INVALID_BINARY, clGetProgramBuildInfo with CL_PROGRAM_BUILD_STATUS returns CL_BUILD_ERROR, and yet the log shows no errors:
fcl build 1 succeeded.\n fcl build 2 succeeded.\n bcl build succeeded.\n.
Since this is my first complicated kernel, I know it is full of mistakes. However, this doesn't look like an error in the code. Why does the compiler give contradictory information?
Here is my code:
The code is long, so I only post the parts that might be relevant. "..." means something was skipped. Ask for the rest if you need it.
DrawProcess.ccp
#include <stdlib.h>
#include "Console.h"
#include "Renderer.h"
#include "Object.h"
#include "TertiaryArithmeticAlgorithms.h"
#define CL_USE_DEPRECATED_OPENCL_2_0_APIS
#if defined(__APPLE__) || defined(__MACOSX)
#include <OpenCL/cl.hpp>
#else
#include <CL/cl.h>
#endif
#include "Camera.h"
cl_command_queue CommandQueue;
cl_mem BufIdx[8];
cl_kernel Rasterization;
bool Initialization()
{
ConWrite("======== OpenCL Initializing ========\n");
//
cl_platform_id ThePlatformID=NULL;
cl_uint NumPlatforms;
cl_int status;
if(CL_INVALID_VALUE==clGetPlatformIDs(NULL,NULL,&NumPlatforms))
{
ConWrite("ERROR: Fail to Get the Number of Available Items in Platform List! The Number of Available Items in Platform List Equal to 0 and Platform List is NULL OR Both Platform List and the Exact Number of Items in Platform List are NULL.\n");
ConWrite("=== OpenCL Initialization Failed! ===\n");
return 1;
}
else
{
ConWrite("The Number of Items in Platform List is ");
ConWrite(&NumPlatforms);
ConWrite(".\n");
}
//
cl_platform_id *PlatformList;
if(NumPlatforms>0)
{
PlatformList=(cl_platform_id*)malloc(NumPlatforms*sizeof(cl_platform_id));
if(CL_INVALID_VALUE==clGetPlatformIDs(NumPlatforms,PlatformList,NULL))
{
ConWrite("ERROR: Fail to Get the Platform List! The Number of Available Items in Platform List Equal to 0 and Platform List is NULL OR Both Platform List and the Exact Number of Items in Platform List are NULL.\n");
ConWrite("=== OpenCL Initialization Failed! ===\n");
return 1;
}
else
{
ConWrite("Platform List Obtained.\n");
}
}
else
{
ConWrite("ERROR: The Number of Available Items in Platform List is not Greater than 0!\n");
ConWrite("=== OpenCL Initialization Failed! ===\n");
return 1;
}
...
cl_program VertexProgram=clCreateProgramWithSource(Context,1,Cartography,NULL,NULL);
status=clBuildProgram(VertexProgram,LengthOfDevices/sizeof(cl_device_id*),DeviceList,NULL,NULL,NULL);
if(CL_SUCCESS==status)
{
ConWrite("CODE: CL_SUCCESS. OpenCL Program Built.\n");
}
else
{
switch(status)
{
case CL_INVALID_PROGRAM:
ConWrite("CODE: CL_INVALID_PROGRAM. ERROR: The Program is an Invalid Program Object!\n");
break;
case CL_INVALID_VALUE:
ConWrite("CODE: CL_INVALID_VALUE. ERROR: Device List is Unavailable and the Number of Devices is Greater Than Zero, OR Device List is NOT NULL and the Number of Devices is Zero, OR the Pointer to Notify is NULL But User Data is NOT NULL!\n");
break;
case CL_INVALID_DEVICE:
ConWrite("CODE: CL_INVALID_DEVICE. ERROR: OpenCL Devices listed in the Device List are NOT in the List of Devices Associated with the Program!\n");
break;
case CL_INVALID_BINARY:
ConWrite("CODE: CL_INVALID_BINARY. ERROR: The Program was Created with Binary and Devices Listed in the Device List do NOT Have a Valid Binary Program!\n");
break;
case CL_INVALID_BUILD_OPTIONS:
ConWrite("CODE: CL_INVALID_BUILD_OPTIONS. ERROR: The Build Options Specified by Options are Invalid!\n");
break;
case CL_INVALID_OPERATION:
ConWrite("CODE: CL_INVALID_OPERATION. ERROR: The Build of the Program Executable for Any of the Devices Listed in the Device List by a Previous Call to the Function for the Program has NOT Completed!\n");
break;
//case CL_COMPILER_NOT_AVAILABLE: if program is created with clCreateProgramWithSource and a compiler is not available i.e. CL_DEVICE_COMPILER_AVAILABLE specified in the table of OpenCL Device Queries for clGetDeviceInfo is set to CL_FALSE.
//case CL_BUILD_PROGRAM_FAILURE: if there is a failure to build the program executable. This error will be returned if clBuildProgram does not return until the build has completed.
//case CL_INVALID_OPERATION: if there are kernel objects attached to program.
//case CL_OUT_OF_HOST_MEMORY: if there is a failure to allocate resources required by the OpenCL implementation on the host.
}
}
cl_build_status *BudStat;
size_t StatusSize;
clGetProgramBuildInfo(VertexProgram,DeviceList[0],CL_PROGRAM_BUILD_STATUS,0,NULL,&StatusSize);
BudStat=(cl_build_status*)malloc(StatusSize);
clGetProgramBuildInfo(VertexProgram,DeviceList[0],CL_PROGRAM_BUILD_STATUS,StatusSize,BudStat,NULL);
switch (*BudStat)
{
case CL_BUILD_NONE:
ConWrite("CODE: CL_BUILD_NONE.\n");
break;
case CL_BUILD_ERROR:
ConWrite("CODE: CL_BUILD_ERROR.\n");
break;
case CL_BUILD_SUCCESS:
ConWrite("CODE: CL_BUILD_SUCCESS.\n");
break;
case CL_BUILD_IN_PROGRESS:
ConWrite("CODE: CL_BUILD_IN_PROGRESS.\n");
break;
default:
break;
}
char *Log;
size_t LogSize;
status=clGetProgramBuildInfo(VertexProgram,DeviceList[0],CL_PROGRAM_BUILD_LOG,0,NULL,&LogSize);
if(status==CL_SUCCESS)
{
ConWrite("CODE: CL_SUCCESS. OpenCL Program Build Infomation Obtained.\n");
}
else
{
switch(status)
{
case CL_INVALID_DEVICE:
ConWrite("CODE: CL_INVALID_DEVICE. ERROR: The Device is NOT in the List of Devices Associated with the Program.\n");
break;
case CL_INVALID_VALUE:
ConWrite("CODE: CL_INVALID_VALUE. ERROR: The Parameter Name is Invalid, OR the Size in Bytes Specified by Parameter's Value Size is Less Than Size of Return Type and Parameter Value is NOT NULL.\n");
break;
case CL_INVALID_PROGRAM:
ConWrite("CODE: CL_INVALID_PROGRAM. ERROR: The Program is an Invalid Program Object.\n");
break;
}
}
Log=(char*)malloc(LogSize+1);
Log[LogSize]='\0';
clGetProgramBuildInfo(VertexProgram,DeviceList[0],CL_PROGRAM_BUILD_LOG,LogSize+1,Log,NULL);
ConWrite(Log);
Rasterization=clCreateKernel(VertexProgram,"VertexRenderer",NULL);
...
And here is my kernel:
Renderer.h
#ifndef _1174_Renderer
#define _1174_Renderer
//------------------------------
const char *Cartography[]=
{
"#define COUNTER IdxVert\n",
"__kernel void VertexRenderer(",
"global float4 CamPos,", //X coordinate, Y coordinate, Z coordinate, SectorID
"global float4 CamAng,", //Horizontal Angle, Vertical Angle, Inclined Angle, Sight Angle
"global float4 CamNorV1,", //W represents horizontal resolution.
"global float4 CamNorV2,", //W represents vertical resolution.
"global float4 CamNorV3,", //W represents diagonal resolution.
"global float4 *Vertex,", //
"global uint IdxVert,",
"global uchar2 *ScrPos)\n", //
"{",
" half4 CpToV[COUNTER];", //CpToV.w is useless.
" int GID=(int)get_global_id(0);",
" mem_fence(CLK_GLOBAL_MEM_FENCE);",
" CpToV[GID].xyz=Vertex[GID].xyz-CamPos.xyz;",
" half Distance[COUNTER];",
" mem_fence(CLK_GLOBAL_MEM_FENCE);",
" Distance[GID]=tan(acos((CamNorV1.x*CpToV[GID].x+CamNorV1.y*CpToV[GID].y+CamNorV1.z*CpToV[GID].z)*rsqrt(CamNorV1.x*CamNorV1.x+CamNorV1.y*CamNorV1.y+CamNorV1.z*CamNorV1.z)*rsqrt(CpToV[GID].x*CpToV[GID].x+CpToV[GID].y*CpToV[GID].y+CpToV[GID].z*CpToV[GID].z)))/tan(CamAng.w)*CamNorV3.w;",
" half Scale[COUNTER];",
" mem_fence(CLK_GLOBAL_MEM_FENCE);",
" Scale[GID]=(CamNorV1.x*CpToV[GID].x+CamNorV1.y*CpToV[GID].y+CamNorV1.z*CpToV[GID].z)/(CamNorV1.x*CamNorV1.x+CamNorV1.y*CamNorV1.y+CamNorV1.z*CamNorV1.z);",
" half4 MapVect[COUNTER];",
" mem_fence(CLK_GLOBAL_MEM_FENCE);",
" MapVect[GID].xyz=CpToV[GID].xyz-Scale*CamNorV1.xyz;",
" half Theta1[COUNTER];",
" half Theta2[COUNTER];",
" mem_fence(CLK_GLOBAL_MEM_FENCE);",
" Theta1[GID]=acos((CamNorV2.x*MapVect[GID].x+CamNorV2.y*MapVect[GID].y+CamNorV2.z*MapVect[GID].z)*rsqrt(CamNorV2.x*CamNorV2.x+CamNorV2.y*CamNorV2.y+CamNorV2.z*CamNorV2.z)*rsqrt(MapVect[GID].x*MapVect[GID].x+MapVect[GID].y*MapVect[GID].y+MapVect[GID].z*MapVect[GID].z));",
" Theta2[GID]=acos((CamNorV3.x*MapVect[GID].x+CamNorV3.y*MapVect[GID].y+CamNorV3.z*MapVect[GID].z)*rsqrt(CamNorV3.x*CamNorV3.x+CamNorV3.y*CamNorV3.y+CamNorV3.z*CamNorV3.z)*rsqrt(MapVect[GID].x*MapVect[GID].x+MapVect[GID].y*MapVect[GID].y+MapVect[GID].z*MapVect[GID].z));",
" half Theta[COUNTER];",
" constant half Pi=(half)3.1415926f;",
" mem_fence(CLK_GLOBAL_MEM_FENCE);",
" (Theta1[GID]<=Pi/2)?(Theta[GID]=Theta2[GID]):(Theta[GID]=2*Pi-Theta2[GID]);",
" mem_fence(CLK_GLOBAL_MEM_FENCE);",
" ScrPos[GID].x=(uchar)cos(Theta[GID])*Distance[GID]+CamNorV1.w;",
" ScrPos[GID].y=(uchar)sin(Theta[GID])*Distance[GID]+CamNorV2.w;",
"}"
"#define COUNTER Dlt\n",
"__kernel void Polarization(",
"global float4 *NmVect,",
"global float4 *AllVert,",
"global ushort4 *DltIdx,", //W represents the index of planar vectors of primarch.
"global uint Dlt)\n",
"{",
" int GID=(int)get_global_id(0);",
" half4 SPToCam[COUNTER];",
" mem_fence(CLK_GLOBAL_MEM_FENCE);",
" SPToCam[GID].xyz=CamPos.xyz-AllVert[DltIdx[GID].x].xyz;",
" half m[COUNTER];",
" mem_fence(CLK_GLOBAL_MEM_FENCE);",
" m[GID]=SPToCam[GID].x*NmVect[DltIdx[GID].w].x+SPToCam[GID].y*NmVect[DltIdx[GID].w].y+SPToCam[GID].z*NmVect[DltIdx[GID].w].z;",
" bool Polar[COUNTER];",
" mem_fence(CLK_GLOBAL_MEM_FENCE);",
" (m>0)?(Polar=true):(Polar=false);",
" mem_fence(CLK_GLOBAL_MEM_FENCE);",
" ",
"}",
"__kernel void Hierarchization(",
"global ",
")\n",
"{",
" for(uint i=0;i<NumOfObj;i++){",
" for(uint k=0;k<NumOfLvInObj[IdxOfObj[i]];k++){",
" for(uint j=0;j<NumOfVtxInLv[k+IdxOfLv[IdxOfObj[i]]]-1;j++){",
" uint m=0;",
" (k==0)?():()",
" "
};
//------------------------------
#endif
Don't worry too much about the kernel itself; it is full of mistakes...
And my hardware:
My desktop:
NVIDIA GeForce GTX 770
Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
Windows 10
My laptop:
NVIDIA GeForce GT 750M
Intel(R) HD Graphics 4600
Intel(R) Core(TM) i7-4712HQ CPU @ 2.30GHz
Windows 8.1
Another question: when I run the program on my desktop, only the Nvidia platform is detected. OpenCL can also run on the CPU, can't it? Why is the Intel platform not detected?
I am not sure, but the second argument to clCreateProgramWithSource looks strange:
cl_program VertexProgram=clCreateProgramWithSource(Context,1,Cartography,NULL,NULL);
It should be the number of strings in the Cartography array (one per source line here), so I suggest trying:
cl_program VertexProgram=clCreateProgramWithSource(Context,sizeof(Cartography)/sizeof(Cartography[0]),Cartography,NULL,NULL);
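To see why the count matters: clCreateProgramWithSource treats its `strings` argument as an array of `count` source fragments that together make up the program. With a count of 1, only the first element ("#define COUNTER IdxVert\n") would be handed to the compiler, which may explain why the build can report success regardless of what the rest of the kernel contains. The stand-alone sketch below mimics that joining step with the standard library only. `join_sources` is a hypothetical helper, not an OpenCL function, and `kFragments` is a shortened stand-in for the Cartography array.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for how `count` null-terminated fragments form one source text.
std::string join_sources(const char* const* strings, std::size_t count)
{
    assert(strings != nullptr || count == 0);
    std::string source;
    for (std::size_t i = 0; i < count; ++i)
    {
        source += strings[i];
    }
    return source;
}

// Shortened stand-in for the Cartography array from Renderer.h.
const char* const kFragments[] = {
    "#define COUNTER IdxVert\n",
    "__kernel void VertexRenderer(",
    "global float4 *Vertex)\n",
    "{ }\n",
};

// Number of elements in the array, computed at compile time.
constexpr std::size_t kFragmentCount = sizeof(kFragments) / sizeof(kFragments[0]);
```

Calling `join_sources(kFragments, 1)` yields only the #define line, while `join_sources(kFragments, kFragmentCount)` yields the whole kernel text. Note also that adjacent string literals without a comma between them are merged by the C++ compiler, so the "}" followed directly by "#define COUNTER Dlt\n" in Renderer.h becomes a single array element.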