I've started writing a new CUDA application. However I hit a funny detour along the way.
Calling the first cudaMalloc on a variable x, fails the first time. However when I call it the second time it returns cudaSuccess. Recently upgraded to CUDA 4.0 SDK, it's a really weird bug.
I even did some testing and it seems the first call of cudaMalloc fails.
The very first call to any of the cuda library functions launches an initialisation subroutine. It can happen that somehow the initialisation fails and not the cudaMalloc itself. (CUDA Programming Guide, section 3.2.1)
Somehow, later, however it seems it works, despite the initial failure. I don't know your setting and your code so I can't really help you further. Check the Programming Guide!
I would strongly recommend using the CUDA_SAFE_CALL macro if you aren't -- to force the thread synchronisation, at least while you're debugging the code:
CUDA_SAFE_CALL(cudaMalloc((void**) &(myVar), mem_size_N ));
Update: As per #talonmies, you don't need the cutil library. So let's rewrite the solution:
/* Allocate Data */
cudaMalloc((void**) &(myVar), mem_size_N );
/* Force Thread Synchronization */
cudaError err = cudaThreadSynchronize();
/* Check for and display Error */
if ( cudaSuccess != err )
{
fprintf( stderr, "Cuda error in file '%s' in line %i : %s.\n",
__FILE__, __LINE__, cudaGetErrorString( err) );
}
And as noted in the other answer -- you may want to include the synch & check before you allocation memory just to make sure the API initialized correctly.
Related
For an obscure reason my call to IDXGIOutput5::DuplicateOutput1() fail with error 0x887a0004 (DXGI_ERROR_UNSUPPORTED) after I added cudart.lib in my project.
I work on Visual Studio 2019, my code for monitor duplication is the classic :
hr = output5->DuplicateOutput1(this->dxgiDevice, 0, sizeof(supportedFormats) / sizeof(DXGI_FORMAT), supportedFormats, &this->dxgiOutputDuplication);
And the only thing I tried to do with cuda at the moment is simply to list the Cuda devices :
int nDevices = 0;
cudaError_t error = cudaGetDeviceCount(&nDevices);
for (int i = 0; i < nDevices; i++) {
cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, i);
LOG_FUNC_DEBUG("Graphic adapter : Descripion: %s, Memory Clock Rate : %d kHz, Memory Bus Width : %u bits",
prop.name,
prop.memoryClockRate,
prop.memoryBusWidth
);
}
Moreover this piece of code is called far later after I try to start monitor duplication with DXGI.
Every thing seems correct in my application : I do a call to SetProcessDpiAwarenessContext(DPI_AWARENESS_CONTEXT_PER_MONITOR_AWARE_V2), and I'm not running on e discrete GPU (see [https://support.microsoft.com/en-us/help/3019314/error-generated-when-desktop-duplication-api-capable-application-is-ru][1])
And by the way it used to work, and it works again if I just remove the "so simple" Cuda call and the cudart.lib from the linker input !
I really don't understand what can cause this strange behavior, any idea ?
...after I added cudart.lib in my project
When you link CUDA library you force your application to run on discrete GPU. You already know this should be avoided, but you still force it through this link.
...and I'm not running on e discrete GPU...
You are, static link to CUDA is a specific case which hints to use dGPU.
There are systems where Desktop Duplication is not working against dGPU and yours seems to be one of those. Even though unobvious, you are seeing behavior by [NVIDIA] design.
(There are however also other systems where Desktop Duplication is working against dGPU and is not working against iGPU).
Your potential solution is along this line:
Application is not directly linked against cuda.lib or cudart.lib or LoadLibrary to dynamically load the nvcuda.dll or cudart*.dll and uses GetProcAddress to retrieve function addresses from nvcuda.dll or cudart*.dll.
I'm trying to run an opengl application on a remote computing cluster. I'm using osmesa as I intend to execute off-screen software rendering (no x11 forwarding etc). I want to use glew (to make life dealing with shaders and other extension related calls easier), and I seem to have built and linked both mesa and glew fine.
When I call mesa-create-context, glewinit gives a OPENGL Version not available output, which probably means the context has not been created. When I call glGetString(GL_EXTENSIONS) i dont get any output, which confirms this. This also shows that glew is working fine on its own. (Other glew commands like glew version etc also work).
Now when I (as shown below), add the mesa-make-context-current function, glewinit crashes with a segfault. Running glGetString(GL_EXTENSIONS) gives me a list of extensions now however (which means context creation is successful!)
I've spent hours trying to figure this out, tried tinkering but nothing works. Would greatly appreciate any help on this. Maybe some of you has experienced something similar before?? Thanks again!
int Height = 1; int Width = 1;
OSMesaContext ctx; void *buffer;
ctx = OSMesaCreateContext( OSMESA_RGBA, NULL );
buffer = malloc( Width * Height * 4 * sizeof(GLfloat) );
if (!OSMesaMakeCurrent( ctx, buffer, GL_UNSIGNED_BYTE, Width, Height )) {
printf("OSMesaMakeCurrent failed!\n");
return 0;
}
-- glewinit() crashes after this.
Just to add, osmesa and glew actually did not compile initially. Because glew undefines GLAPI in it's last line and since osmesa will not include gl.h again, GLAPI remains undefined and causes an error in osmesa.h (119). I got around this by adding an extern to GLAPI, not sure if this is relevant though.
Looking at the source to glewInit in glew.c if glewContextInit succeeds it returns GLEW_OK, GLEW_OK is defined to 0, and so on Linux systems it will always call glxewContextInit which calls glX functions that in the case of OSMesa will likely not be ready for use. This will cause a segfault (as I see), and it seems that the glewInit function has no capability to handle this case unfortunately without patching the C source and recompiling the library.
If others have already solved this I would be interested, I have seen some patched versions of the glew.c sources that workaround this. It isn't clear if there is any energy in the GLEW community to merge changes in that address this use case.
I am working with HANDLES, the first one, nextColorFrameEvent is an event handler and the second one is a stream handler. They are being initialized in the following piece of code:
nextColorFrameEvent = CreateEvent( NULL, TRUE, FALSE, NULL );
hr = nui->NuiImageStreamOpen(
NUI_IMAGE_TYPE_COLOR,
NUI_IMAGE_RESOLUTION_640x480,
0,
2,
nextColorFrameEvent,
&videoStreamHandle);
I want to properly deal with them on destruction, while not creating errors at the same time. Sometimes the initializer wont be called, so both HANDLEs are still NULL when the software comes to an end. Thats why I want to check first if the HANDLEs are properly initialized etc. and if they are, I want to close them. I got my hands on the following piece of code for this:
if (nextColorFrameEvent && nextColorFrameEvent != INVALID_HANDLE_VALUE)CloseHandle(nextColorFrameEvent);
#ifdef QT_DEBUG
DWORD error = GetLastError();
qDebug()<< error;
#endif
if (videoStreamHandle && videoStreamHandle != INVALID_HANDLE_VALUE)CloseHandle(videoStreamHandle);
#ifdef QT_DEBUG
error = GetLastError();
qDebug()<< error;
#endif
But this is apperently incorrect: if I do not run the initializer and then close the software this piece of code runs and gives me a 6:
Starting C:\...\Qt\build-simpleKinectController-Desktop_Qt_5_0_2_MSVC2012_64bit-Debug\debug\simpleKinectController...
6
6
C:\...\Qt\build-simpleKinectController-Desktop_Qt_5_0_2_MSVC2012_64bit-Debug\debug\simpleKinectController exited with code 0
which means:
ERROR_INVALID_HANDLE 6 (0x6) The handle is invalid.
Which means that closeHandle ran anyway despite the tests. What tests should I do to prevent closing when the handle is not a valid HANDLE?
Bonus question: if I run the initializer this error will no longer appear when only closing colorFrameEvent, but will still appear when closing videoStreamHandle:
Starting C:\...\Qt\build-simpleKinectController-Desktop_Qt_5_0_2_MSVC2012_64bit-Debug\debug\simpleKinectController...
0
6
C:\...\Qt\build-simpleKinectController-Desktop_Qt_5_0_2_MSVC2012_64bit-Debug\debug\simpleKinectController exited with code 0
Do I need a diffent function to close a stream handler?
nui->NuiImageStreamOpen(...) does not create a valid Windows handle for the stream but instead it creates an internal handle inside the driver side.
So you can not use windows API to release/close stream handle !!!
To do that just call nui->NuiShutdown(). I have not yet used the callback event but I think its a valid windows handle and should be closed normally.
if you need just to change settings you can always call nui->NuiImageStreamOpen(...) with new settings. No need to shutdown ...
I would also welcome function nui->NuiImageStreamClose(...); because current state of API complicates things for long term running aps with changing sensor configurations.
CreateEvent (http://msdn.microsoft.com/en-us/library/windows/desktop/ms682396(v=vs.85).aspx) returns NULL if an event was not created.
You are checking against INVALID_HANDLE_VALID which is not NULL.
You are probably trying to double-close a handle. That is likely to generate ERROR_INVALID_HANDLE 6. You can't detect this with your test, because the first CloseHandle(nextColorFrameEvent); did not change nextColorFrameEvent.
The solution is to use C++ techniques, in particular RAII. There are plenty of examples around how to use shared_ptr with HANDLE. shared_ptr is the standard solution to run cleanup code at most once, after everyone is done, and only if anybody actually allocated a resource.
there is a good way of debugging that I'm particularly fond of, despite being all writen in macros, which are nasty, but in this case they work wonders.
Zed's Awesome Debug Macros
there are a couple of things I like to change though. They make extensive use of goto, which I tend to avoid, specially in c++ projects, because otherwise you wouldn't be able to declare variables mid-code. This is why I use exit(-1) instead, or, in some projects, I mod the code to try, throw, catch c++. Since you are working with Handles, a good thing would be setting a variable and telling the program to close itself.
here is what I mean. Take this piece of code from the macros (i assume you would read the exercise and familiarize with the macros):
#define check(A, M, ...) if(!(A)) { log_err(M, ##__VA_ARGS__); errno=0; goto error; }
i'd change
goto error;
to something like
error = true;
the syntax inside the program would be something like, and I took it from a multithread program I'm writing myself:
pRSemaphore = CreateSemaphore(NULL, 0, MAX_TAM_ARQ, (LPCWSTR) "p_read_semaphore");
check(pRSemaphore, "Impossible to create semaphore: %d\n", GetLastError());
As you can see, GetLastError is only called when pRSemaphore is set to null. There are somewhat fancy mechanisms behind the macro (at least they are fancy for me), but they are hidden inside the "check" mask, so you needn't worry about them.
next step is to treat the error with something like:
inline void ExitWithError(bool &err) {
//close all handles
//tell other related process to do the same if necessary
exit(-1);
}
or you could just call it inside the macro like
#define check(A, M, ...) if(!(A)) { log_err(M, ##__VA_ARGS__); errno=0; ExitWithError(); }
hope I could be of any help
Sometimes it takes a long time of inserting conditional prints and checks to glGetError() to narrow down using a form of binary search where the first function call is that an error is first reported by OpenGL.
I think it would be cool if there is a way to build a macro which I can wrap around all GL calls which may fail which will conditionally call glGetError immediately after. When compiling for a special target I can have it check glGetError with a very high granularity, while compiling for typical release or debug this wouldn't get enabled (I'd check it only once a frame).
Does this make sense to do? Searching for this a bit I find a few people recommending calling glGetError after every non-draw gl-call which is basically the same thing I'm describing.
So in this case is there anything clever that I can do (context: I am using GLEW) to simplify the process of instrumenting my gl calls this way? It would be a significant amount of work at this point to convert my code to wrap a macro around each OpenGL function call. What would be great is if I can do something clever and get all of this going without manually determining which are the sections of code to instrument (though that also has potential advantages... but not really. I really don't care about performance by the time I'm debugging the source of an error).
Try this:
void CheckOpenGLError(const char* stmt, const char* fname, int line)
{
GLenum err = glGetError();
if (err != GL_NO_ERROR)
{
printf("OpenGL error %08x, at %s:%i - for %s\n", err, fname, line, stmt);
abort();
}
}
#ifdef _DEBUG
#define GL_CHECK(stmt) do { \
stmt; \
CheckOpenGLError(#stmt, __FILE__, __LINE__); \
} while (0)
#else
#define GL_CHECK(stmt) stmt
#endif
Use it like this:
GL_CHECK( glBindTexture2D(GL_TEXTURE_2D, id) );
If OpenGL function returns variable, then remember to declare it outside of GL_CHECK:
const char* vendor;
GL_CHECK( vendor = glGetString(GL_VENDOR) );
This way you'll have debug checking if _DEBUG preprocessor symbol is defined, but in "Release" build you'll have just the raw OpenGL calls.
BuGLe sounds like it will do what you want:
Dump a textual log of all GL calls made.
Take a screenshot or capture a video.
Call glGetError after each call to check for errors, and wrap glGetError so that this checking is transparent to your program.
Capture and display statistics (such as frame rate)
Force a wireframe mode
Recover a backtrace from segmentation faults inside the driver, even if the driver is compiled without symbols.
I'm writing OpenCL using the c++ bindings, trying to make a small library.
NDRange offset(0);
NDRange global_size(numWorkItems);
NDRange local_size(1);
//this call fails with error code -56
err = queue.enqueueNDRangeKernel(kernelReduction, offset, global_size, local_size);
//this call works:
err = queue.enqueueTask(kernelReduction);
Now, Error code -56 is CL_INVALID_GLOBAL_OFFSET. And I have no clue why the first call would fail. Any suggestions?
If you are using OpenCL 1.0, you cannot use global offsets afaik (you need to work around by using a constant memory counter or something). Try updating the bindings to OpenCL 1.1 if they don't automatically adapt and make sure you update your drivers as well.
global_work_offset must be NULL. Any value here should produce CL_INVALID_GLOBAL_OFFSET.
check it out: clEnqueueNDRangeKernel