OpenCL crashing while dynamic linking? - c++

I am trying to load the OpenCL library at run time so that the same executable can run on platforms that do not have OpenCL drivers, instead of failing with unresolved symbols. I am using Qt to do this, but I don't think my problem is caused by Qt. Here is the function that checks whether OpenCL 1.1 is installed:
QLibrary *MyOpenCL::openCLLibrary = NULL;

bool MyOpenCL::loadOpenCL()
{
    if(openCLLibrary)
        return true;
    QLibrary *lib = new QLibrary("OpenCL");
    if(!lib->load())
        return false;
    bool result = false;
    typedef cl_int (*MyPlatorms)(cl_uint, cl_platform_id *, cl_uint *);
    MyPlatorms pobj = (MyPlatorms) lib->resolve("clGetPlatformIDs");
    if(pobj)
    {
        cl_uint nplatforms = 0;
        cl_int myerr = pobj(0, NULL, &nplatforms);
        if((myerr == CL_SUCCESS) && (nplatforms > 0))
        {
            cl_platform_id *mplatforms = new cl_platform_id[nplatforms];
            myerr = pobj(nplatforms, mplatforms, NULL);
            typedef cl_int (*MyPlatformInfo)(cl_platform_id, cl_platform_info, size_t, void *, size_t *);
            MyPlatformInfo pinfoobj = (MyPlatformInfo) lib->resolve("clGetPlatformInfo");
            if(pinfoobj)
            {
                size_t size;
                for(unsigned int i = 0; i < nplatforms; i++)
                {
                    size = 0;
                    myerr = pinfoobj(mplatforms[i], CL_PLATFORM_VERSION, 0, NULL, &size); //size = 27
                    if(size < 1)
                        continue;
                    char *ver = new char[size];
                    myerr = pinfoobj(mplatforms[i], CL_PLATFORM_VERSION, size, ver, NULL);
                    qDebug() << endl << ver; //segmentation fault at this line
                    ...
}
As can be seen, Qt successfully resolved clGetPlatformIDs(). It even reported that one platform is available. But when I pass the array that should receive the cl_platform_id values, it crashes.
Why is this happening?
EDIT:
I am using Qt 4.8.1 with the MinGW compiler and the OpenCL APP SDK 2.9.
I am using the OpenCL 1.1 header from the Khronos website.
My laptop, which runs 64-bit Windows 7, has an ATI Radeon 7670M GPU with OpenCL 1.1 drivers.

The first parameter to clGetPlatformIDs is the number of elements the driver is allowed to write to the array pointed to by the second parameter.
In the first call, you are passing INT_MAX and NULL for these. I'd expect a crash here, because you are telling the driver to go ahead and write through your NULL pointer.
You should pass 0 for the first parameter, since all you are interested in is the count returned through the third parameter.
In the second call you at least pass valid memory for the second parameter, but you again pass INT_MAX. Here you should pass nplatforms, since that is how much memory you allocated. For the third parameter, pass NULL, since you don't need the count returned again.

Related

What is the preferred way to get a device path for CreateFile() in a UWP C++ App?

I am converting a project to a UWP app, and thus have been following the guidelines outlined in the MSDN post here. The existing project relies heavily on CreateFile() to communicate with connected devices.
There are many posts on SO that show how to get a CreateFile()-accepted device path using SetupAPI's SetupDiGetDeviceInterfaceDetail(). Is there an alternative way to do this using the PnP Configuration Manager API? Or an alternative user-mode way at all?
I had some hope when I saw this example in the Windows Driver Samples GitHub repository, but quickly became dismayed when I saw that the function used in the sample is ironically not intended for developer use, as noted on this MSDN page.
The GetDevicePath function is correct in general and can be used as is. About the difference between CM_*(..) and CM_*_Ex(.., HMACHINE hMachine): CM_*(..) simply calls CM_*_Ex(.., NULL), so for the local computer the versions with and without the _Ex suffix are the same.
About the concrete GetDevicePath code: calling CM_Get_Device_Interface_List_Size and then CM_Get_Device_Interface_List only once is not 100% correct, because between these two calls a new device with this interface can arrive in the system, and the buffer size returned by CM_Get_Device_Interface_List_Size may no longer be enough for CM_Get_Device_Interface_List. Of course, the probability of this is very low, and you can ignore it, but I prefer to make code as theoretically correct as possible and call this in a loop until we receive an error other than CR_BUFFER_SMALL. You also need to understand that CM_Get_Device_Interface_List returns multiple NULL-terminated Unicode strings, so we need to iterate here. The [example] always uses only the first returned symbolic link name of an interface instance, but there can be more than one, or none at all (empty). So a better name for the function would be GetDevicePaths — note the s at the end. I would use code like this:
ULONG GetDevicePaths(LPGUID InterfaceClassGuid, PWSTR* pbuf)
{
    CONFIGRET err;
    ULONG len = 1024; // first try with some reasonable buffer size, without calling *_List_SizeW
    for(PWSTR buf;;)
    {
        if (!(buf = (PWSTR)LocalAlloc(0, len * sizeof(WCHAR))))
        {
            return ERROR_NO_SYSTEM_RESOURCES;
        }
        switch (err = CM_Get_Device_Interface_ListW(InterfaceClassGuid, 0, buf, len, CM_GET_DEVICE_INTERFACE_LIST_PRESENT))
        {
        case CR_BUFFER_SMALL:
            // query a fresh size, then fall through to free the buffer and retry
            err = CM_Get_Device_Interface_List_SizeW(&len, InterfaceClassGuid, 0, CM_GET_DEVICE_INTERFACE_LIST_PRESENT);
        default:
            LocalFree(buf);
            if (err)
            {
                return CM_MapCrToWin32Err(err, ERROR_UNIDENTIFIED_ERROR);
            }
            continue;
        case CR_SUCCESS:
            *pbuf = buf;
            return NOERROR;
        }
    }
}
and usage example:
void example()
{
    PWSTR buf, sz;
    if (NOERROR == GetDevicePaths((GUID*)&GUID_DEVINTERFACE_VOLUME, &buf))
    {
        sz = buf;
        while (*sz)
        {
            DbgPrint("%S\n", sz);
            HANDLE hFile = CreateFile(sz, FILE_GENERIC_READ, FILE_SHARE_VALID_FLAGS, 0, OPEN_EXISTING, 0, 0);
            if (hFile != INVALID_HANDLE_VALUE)
            {
                // do something
                CloseHandle(hFile);
            }
            sz += 1 + wcslen(sz);
        }
        LocalFree(buf);
    }
}
So we must not use only the first string in the returned device paths (sz), but iterate over all of them:
while (*sz)
{
    // use sz
    sz += 1 + wcslen(sz);
}
I got a valid device path to a USB hub device and used it successfully to get various device descriptors by sending some IOCTLs, using the function I posted in my own answer to another question.
I'm reporting the same function below:
This function returns a list of NULL-terminated device paths (that's what we get from CM_Get_Device_Interface_List()).
You need to pass it the DEVINST and the wanted interface GUID.
Since both the DEVINST and the interface GUID are specified, it is highly likely that CM_Get_Device_Interface_List() will return a single device path for that interface, but technically you should be prepared to get more than one result.
It is the responsibility of the caller to delete[] the returned list if the function returns successfully (return code 0).
int GetDevInstInterfaces(DEVINST dev, LPGUID interfaceGUID, wchar_t** outIfaces, ULONG* outIfacesLen)
{
    CONFIGRET cres;
    if (!outIfaces)
        return -1;
    if (!outIfacesLen)
        return -2;

    // Get system device ID
    WCHAR sysDeviceID[256];
    cres = CM_Get_Device_ID(dev, sysDeviceID, sizeof(sysDeviceID) / sizeof(sysDeviceID[0]), 0);
    if (cres != CR_SUCCESS)
        return -11;

    // Get list size
    ULONG ifaceListSize = 0;
    cres = CM_Get_Device_Interface_List_Size(&ifaceListSize, interfaceGUID, sysDeviceID, CM_GET_DEVICE_INTERFACE_LIST_PRESENT);
    if (cres != CR_SUCCESS)
        return -12;

    // Allocate memory for the list
    wchar_t* ifaceList = new wchar_t[ifaceListSize];

    // Populate the list
    cres = CM_Get_Device_Interface_List(interfaceGUID, sysDeviceID, ifaceList, ifaceListSize, CM_GET_DEVICE_INTERFACE_LIST_PRESENT);
    if (cres != CR_SUCCESS) {
        delete[] ifaceList;
        return -13;
    }

    // Return list
    *outIfaces = ifaceList;
    *outIfacesLen = ifaceListSize;
    return 0;
}
Please note that, as RbMm already said in his answer, you may get a CR_BUFFER_SMALL error from the last CM_Get_Device_Interface_List() call, since the device list may have changed between the CM_Get_Device_Interface_List_Size() and CM_Get_Device_Interface_List() calls.

How can I get my total GPU memory using Qt's native OpenGL?

I'm trying to get the total amount of GPU memory from my video card using Qt's native OpenGL. I have tried hundreds of methods, but none of them work.
This is what I have at the moment:
QOpenGLContext context;
context.create();

QOffscreenSurface surface;
surface.setFormat(context.format());
surface.create();

QOpenGLFunctions func;
context.makeCurrent(&surface);
func.initializeOpenGLFunctions();

GLint total_mem_kb = 0;
func.glGetIntegerv(GL_GPU_MEM_INFO_TOTAL_AVAILABLE_MEM_NVX, &total_mem_kb);
qDebug() << total_mem_kb;
The problem is that the variable total_mem_kb is always 0; it never receives a value from glGetIntegerv. Running this code, I get 0. What could be the problem? Can you please give me a hint?
First and foremost, check whether the NVX_gpu_memory_info extension is supported.
Note that the extension requires at least OpenGL 2.0.
GLint count;
glGetIntegerv(GL_NUM_EXTENSIONS, &count);
for (GLint i = 0; i < count; ++i)
{
    const char *extension = (const char*)glGetStringi(GL_EXTENSIONS, i);
    if (!strcmp(extension, "GL_NVX_gpu_memory_info"))
        printf("%d: %s\n", i, extension);
}
I know you just said that you have an Nvidia graphics card, but this doesn't by itself guarantee support. Additionally, if you have an integrated graphics card, then make sure you are actually using your dedicated graphics card.
If you have an Nvidia GeForce graphics card, then the following should result in something along the lines of "Nvidia" and "GeForce":
glGetString(GL_VENDOR);
glGetString(GL_RENDERER);
If it returns anything but "Nvidia" then you need to open your Nvidia Control Panel and set the preferred graphics card to your Nvidia graphics card.
After you've verified that the Nvidia graphics card is being used and that the extension is supported, you can try getting the total and currently available memory:
GLint totalMemoryKb = 0;
glGetIntegerv(GL_GPU_MEMORY_INFO_TOTAL_AVAILABLE_MEMORY_NVX, &totalMemoryKb);
GLint currentMemoryKb = 0;
glGetIntegerv(GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX, &currentMemoryKb);
I would also like to point out that the NVX_gpu_memory_info extension defines it as:
GL_GPU_MEMORY_INFO_TOTAL_AVAILABLE_MEMORY_NVX
and not
GL_GPU_MEM_INFO_TOTAL_AVAILABLE_MEM_NVX
Note the MEMORY vs MEM difference.
So I suspect that you've defined GL_GPU_MEM_INFO_TOTAL_AVAILABLE_MEM_NVX yourself, or are relying on something else that defines it. Either way, it could be wrongly defined or referring to something else.
I use the following:
/////////////////////////////////////////////////////////////////////////////////////////////////////////////////
LONG __stdcall glxGpuTotalMemory()
{
    GLint total_mem_kb = 0;
    glGetIntegerv(GL_GPU_MEM_INFO_TOTAL_AVAILABLE_MEM_NVX, &total_mem_kb);
    if (total_mem_kb == 0 && wglGetGPUIDsAMD)
    {
        UINT n = wglGetGPUIDsAMD(0, 0);
        UINT *ids = new UINT[n];
        size_t total_mem_mb = 0;
        wglGetGPUIDsAMD(n, ids);
        wglGetGPUInfoAMD(ids[0], WGL_GPU_RAM_AMD, GL_UNSIGNED_INT, sizeof(size_t), &total_mem_mb);
        delete[] ids; // free the id list once queried
        total_mem_kb = total_mem_mb * 1024;
    }
    return total_mem_kb;
}
/////////////////////////////////////////////////////////////////////////////////////////////////////////////////
LONG __stdcall glxGpuAvailMemory()
{
    GLint cur_avail_mem_kb = 0;
    glGetIntegerv(GL_GPU_MEM_INFO_CURRENT_AVAILABLE_MEM_NVX, &cur_avail_mem_kb);
    if (cur_avail_mem_kb == 0 && wglGetGPUIDsAMD)
    {
        glGetIntegerv(GL_TEXTURE_FREE_MEMORY_ATI, &cur_avail_mem_kb);
    }
    return cur_avail_mem_kb;
}

clCreateContextFromType ends up in a SEGFAULT during execution

I am trying to create an OpenCL context on the platform that contains my graphics card, but when I call clCreateContextFromType() a SEGFAULT is thrown.
int main(int argc, char** argv)
{
    /*
    ...
    */
    cl_platform_id* someValidPlatformId;
    //creating heap space using malloc to store all platform ids
    getCLPlatforms(someValidPlatformId);
    //error handling for getCLPlatforms()
    //OCLPlatform(cl_platform_id platform)
    OCLPlatform platform = OCLPlatform(someValidPlatformId[0]);
    //OCLContext::OCL_GPU_DEVICE == CL_DEVICE_TYPE_GPU
    OCLContext context = OCLContext(platform, OCLContext::OCL_GPU_DEVICE);
    /*
    ...
    */
}
cl_platform_id* getCLPlatforms(cl_platform_id* platforms)
{
    cl_int errNum;
    cl_uint numPlatforms;
    //returns the platform count using clGetPlatformIDs()
    //as described in the Khronos API
    numPlatforms = (cl_uint) getCLPlatformsCount();
    if(numPlatforms == 0)
        return NULL;
    errNum = clGetPlatformIDs(numPlatforms, platforms, NULL);
    if(errNum != CL_SUCCESS)
        return NULL;
    return platforms;
}
OCLContext::OCLContext(OCLPlatform platform, unsigned int type)
{
    this->initialize(platform, type);
}

void OCLContext::initialize(OCLPlatform platform, unsigned int type)
{
    cl_int errNum;
    cl_context_properties contextProperties[] =
    {
        CL_CONTEXT_PLATFORM,
        (cl_context_properties)platform.getPlatformId(),
        0
    };
    cout << "a" << endl; std::flush(cout);
    this->context = clCreateContextFromType(contextProperties,
                                            (cl_device_type)type,
                                            &pfn_notify,
                                            NULL, &errNum);
    if(errNum != CL_SUCCESS)
        throw OCLContextException();
    cout << "b" << endl; std::flush(cout);
    /*
    ...
    */
}
The given type is CL_DEVICE_TYPE_GPU and also the platform contained by the cl_context_properties array is valid.
To debug the error I implemented the following pfn_notify() function described by the Khronos API:
static void pfn_notify(const char* errinfo,
                       const void* private_info,
                       size_t cb, void* user_data)
{
    fprintf(stderr, "OpenCL Error (via pfn_notify): %s\n", errinfo);
    flush(cout);
}
Here is the output shown by the shell:
$ ./OpenCLFramework.exe
a
Segmentation fault
The machine I am working with has the following properties:
Intel Core i5 2500 CPU
NVIDIA Geforce 210 GPU
OS: Windows 7
AMD APP SDK 3.0 Beta
IDE: Eclipse with gdb
It would be great if somebody knew an answer to this problem.
The problem seems to be solved now.
Injecting a valid cl_platform_id through gdb fixed the SEGFAULT. So I dug a little deeper, and the cause of the error was that I saved the value as a standard primitive type. When I called a function with this value cast to cl_platform_id, some functions failed to handle it. So it looks like a mix-up of types led to this failure.
Now I store the value as a cl_platform_id and cast it to a primitive when needed, not vice versa.
Thank you for your answers, and I apologize for the long radio silence on my part.

GetFileVersionInfoSize() succeeds but returns incorrect size

I am trying to get the version of a file. I want to look at the version number of this file to determine which OS is installed on a non-booted drive (I'll actually be doing this from a Win PE environment, trying to determine whether the main drive has Windows XP or Windows 7 installed). Anyway, I have the following:
wchar_t *fileName;
fileName = new wchar_t[255];
lstrcpy(fileName, hdds[HardDriveIndexes::SystemDrive].driveLetter.c_str());
lstrcat(fileName, L"Windows\\System32\\winload.exe");

TCHAR *versionInfoBuffer;
DWORD versionDataSize;

if (versionDataSize = GetFileVersionInfoSize(fileName, NULL) != 0)
{
    versionInfoBuffer = new TCHAR[versionDataSize];
    BOOL versionInfoResult = FALSE;
    versionInfoResult = GetFileVersionInfo(fileName, NULL, versionDataSize, versionInfoBuffer);
    if (versionInfoResult == FALSE)
    {
        wprintf(L"The last error associated with getting version info is: %d\n", GetLastError());
    }
}
else
{
    wprintf(L"The last error associated with getting version info size is: %d\n", GetLastError());
}
The problem is that GetFileVersionInfoSize() succeeds but always returns 1 as the size. This causes GetFileVersionInfo() to fail with error 122. So far I have only tested this on a Windows 7 system. There is another function GetFileVersionInfoSizeEx() that works as expected, but it is only supported from Vista onwards. I would like to keep XP support if possible (some of our old Win PE images are still based on XP).
Is GetFileVersionInfoSize() deprecated and I somehow can't find that information, am I using it incorrectly, etc.?
The problem isn't with the call, it's with your assignment; you need parens around it:
if ( ( versionDataSize = GetFileVersionInfoSize(fileName, NULL) ) != 0)
What you had written assigns versionDataSize the value of the expression GetFileVersionInfoSize(fileName, NULL) != 0, which is 1 for true, because != binds more tightly than =.

Reading Shared Memory from x86 to x64 and vice versa on OSX

If I create a shared memory object in a 64-bit application and open it in a 32-bit application, it fails.
// for 64 bit
shared_memory_object(create_only, "test", read_write);
// for 32 bit
shared_memory_object(open_only, "test", read_write);
The file created by the 64-bit application is at the path:
/private/tmp/boost_interprocess/AD21A54E000000000000000000000000/test
whereas the file searched for by the 32-bit application is at:
/private/tmp/boost_interprocess/AD21A54E00000000/test
Thus the 32-bit application cannot read the file.
I am using boost 1.47.0 on Mac OS X.
Is it a bug? Do I have to change some settings or use some macros to fix it? Has anyone encountered this problem before?
Is it important that the shared memory be backed by a file? If not, you might consider using the underlying Unix shared memory APIs: shmget, shmat, shmdt, and shmctl, all declared in sys/shm.h. I have found them to be very easy to use.
// create some shared memory
int id = shmget(0x12345678, 1024 * 1024, IPC_CREAT | 0666);
if (id >= 0)
{
    void* p = shmat(id, 0, 0);
    if (p != (void*)-1)
    {
        initialize_shared_memory(p);
        // detach from the shared memory when we are done;
        // it will still exist, waiting for another process to access it
        shmdt(p);
    }
    else
    {
        handle_error();
    }
}
else
{
    handle_error();
}
Another process would use something like this to access the shared memory:
// access the shared memory
int id = shmget(0x12345678, 0, 0);
if (id >= 0)
{
    // find out how big it is
    struct shmid_ds info = { { 0 } };
    if (shmctl(id, IPC_STAT, &info) == 0)
        printf("%d bytes of shared memory\n", (int)info.shm_segsz);
    else
        handle_error();
    // get its address
    void* p = shmat(id, 0, 0);
    if (p != (void*)-1)
    {
        do_something(p);
        // detach from the shared memory; it still exists, but we can't get to it
        shmdt(p);
    }
    else
    {
        handle_error();
    }
}
else
{
    handle_error();
}
Then, when all processes are done with the shared memory, use shmctl(id, IPC_RMID, 0) to release it back to the system.
You can use the ipcs and ipcrm tools on the command line to manage shared memory. They are useful for cleaning up mistakes when first writing shared memory code.
All that being said, I am not sure about sharing memory between 32-bit and 64-bit programs. I recommend trying the Unix APIs and if they fail, it probably cannot be done. They are, after all, what Boost uses in its implementation.
I found the solution to the problem, and as expected it is a bug.
The bug is present in the tmp_dir_helpers.hpp file:
inline void get_bootstamp(std::string &s, bool add = false)
{
    ...
    std::size_t char_counter = 0;
    long fields[2] = { result.tv_sec, result.tv_usec };
    for(std::size_t field = 0; field != 2; ++field){
        for(std::size_t i = 0; i != sizeof(long); ++i){
            const char *ptr = (const char *)&fields[field];
            bootstamp_str[char_counter++] = Characters[(ptr[i]&0xF0)>>4];
            bootstamp_str[char_counter++] = Characters[(ptr[i]&0x0F)];
        }
    }
    ...
}
Whereas it should have been something like this:
long long fields[2] = { result.tv_sec, result.tv_usec };
for(std::size_t field = 0; field != 2; ++field){
    for(std::size_t i = 0; i != sizeof(long long); ++i){
(the fix is long long in place of long, so that 32-bit and 64-bit processes encode the same field width)
I have created a ticket in boost for this bug.
Thank you.