Linking Cuda (cudart.lib) makes DXGI DuplicateOutput1() fail - c++

For an obscure reason, my call to IDXGIOutput5::DuplicateOutput1() fails with error 0x887a0004 (DXGI_ERROR_UNSUPPORTED) after I added cudart.lib to my project.
I work in Visual Studio 2019; my code for monitor duplication is the classic:
hr = output5->DuplicateOutput1(this->dxgiDevice, 0, sizeof(supportedFormats) / sizeof(DXGI_FORMAT), supportedFormats, &this->dxgiOutputDuplication);
And the only thing I do with CUDA at the moment is simply to list the CUDA devices:
int nDevices = 0;
cudaError_t error = cudaGetDeviceCount(&nDevices);
for (int i = 0; i < nDevices; i++) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, i);
    LOG_FUNC_DEBUG("Graphic adapter : Description: %s, Memory Clock Rate : %d kHz, Memory Bus Width : %d bits",
        prop.name,
        prop.memoryClockRate,
        prop.memoryBusWidth
    );
}
Moreover, this piece of code runs much later, well after I try to start monitor duplication with DXGI.
Everything seems correct in my application: I call SetProcessDpiAwarenessContext(DPI_AWARENESS_CONTEXT_PER_MONITOR_AWARE_V2), and I'm not running on a discrete GPU (see https://support.microsoft.com/en-us/help/3019314/error-generated-when-desktop-duplication-api-capable-application-is-ru).
And by the way, it used to work, and it works again if I just remove the "so simple" CUDA call and cudart.lib from the linker input!
I really don't understand what can cause this strange behavior. Any ideas?

...after I added cudart.lib in my project
When you link the CUDA library you force your application to run on the discrete GPU. You already know this should be avoided, yet you still force it through this link.
...and I'm not running on a discrete GPU...
You are: a static link to CUDA is one of the specific cases which hints the system to use the dGPU.
There are systems where Desktop Duplication does not work against the dGPU, and yours seems to be one of those. Even though it is not obvious, you are seeing behavior that is by [NVIDIA] design.
(There are, however, also other systems where Desktop Duplication works against the dGPU and does not work against the iGPU.)
Your potential solution is along this line:
The application is not directly linked against cuda.lib or cudart.lib; instead it uses LoadLibrary to dynamically load nvcuda.dll or cudart*.dll and uses GetProcAddress to retrieve function addresses from nvcuda.dll or cudart*.dll.
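A minimal sketch of that approach, assuming the runtime DLL is named cudart64_110.dll (the exact file name depends on the installed CUDA version) and loading only cudaGetDeviceCount:

#include <windows.h>

// cudaError_t is an int-sized enum; cudaSuccess is 0.
typedef int (*cudaGetDeviceCount_t)(int*);

int CountCudaDevices()
{
    // Load the CUDA runtime only when it is actually needed, instead of
    // linking cudart.lib and tagging the whole process as a CUDA application.
    HMODULE cudart = LoadLibraryA("cudart64_110.dll");   // adjust to the installed CUDA version
    if (!cudart)
        return 0;                                        // no CUDA runtime present

    cudaGetDeviceCount_t pGetDeviceCount =
        (cudaGetDeviceCount_t)GetProcAddress(cudart, "cudaGetDeviceCount");

    int nDevices = 0;
    if (!pGetDeviceCount || pGetDeviceCount(&nDevices) != 0)  // 0 == cudaSuccess
        nDevices = 0;

    FreeLibrary(cudart);
    return nDevices;
}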

Related

D3D11: Reference Rasterizer vs WARP

I have a test for a pixel shader that does some rendering and compares the result to a reference image to verify that the shader produces an expected output. When this test is run on a CI machine, it is on a VM without a GPU, so I call D3D11CreateDevice with D3D_DRIVER_TYPE_REFERENCE to use the reference rasterizer. We have been doing this for years without issue on a Windows 7 VM.
We are now trying to move to a Windows 10 VM for our CI tests. When I run the test there, various API calls start failing after some number of successful tests (on the order of 5000-10000) with DXGI_ERROR_DEVICE_REMOVED, and calling GetDeviceRemovedReason returns DXGI_ERROR_DRIVER_INTERNAL_ERROR. After some debugging I've found that the failure originates during a call to ID3D11DeviceContext::PSSetShader (yes, this returns void, but I found this via a breakpoint in KernelBase.dll!RaiseException). This call looks exactly like the thousands of previous calls to PSSetShader as far as I can tell. It doesn't appear to be a resource issue: the process is only using 8MB of memory when the error occurs, and the handle count is not growing.
I can reproduce the issue on multiple Win10 systems, and it succeeds on multiple Win7 systems. The big difference between the two is that on Win7 the API calls go through d3d11ref.dll, and on Win10 they go through d3d10warp.dll. I am not really familiar with what the differences are or why one or the other would be chosen, and MSDN's documentation is quite opaque on the subject. I know that d3d11ref.dll and d3d10warp.dll are present on both the failing and passing systems; I don't know what the logic is for one or the other being loaded for the same set of calls, or why the d3d10warp library fails.
So, can someone explain the difference between the two, and/or suggest how I could get d3d11ref.dll to load in Windows 10? As far as I can tell it is a bug in d3d10warp.dll and for now I would just like to side-step it.
In case it matters, I am calling D3D11CreateDevice with the desired feature level set to D3D_FEATURE_LEVEL_11_0, and I verify that the same level is returned as achieved. I am passing 0 for creationFlags, and my D3D11_SDK_VERSION is defined as 7 in d3d11.h. Below is the call stack above PSSetShader when the failure occurs. This seems to be the first call that fails, and every call after it with a return code also fails.
KernelBase.dll!RaiseException()
KernelBase.dll!OutputDebugStringA()
d3d11.dll!CDevice::RemoveDevice(long)
d3d11.dll!NDXGI::CDevice::RemoveDevice()
d3d11.dll!CContext::UMSetError_()
d3d10warp.dll!UMDevice::MSCB_SetError(long,enum UMDevice::DDI_TYPE)
d3d10warp.dll!UMContext::SetShaderWithInterfaces(enum PIXELJIT_SHADER_STAGE,struct D3D10DDI_HSHADER,unsigned int,unsigned int const *,struct D3D11DDIARG_POINTERDATA const *)
d3d10warp.dll!UMDevice::PsSetShaderWithInterfaces(struct D3D10DDI_HDEVICE,struct D3D10DDI_HSHADER,unsigned int,unsigned int const *,struct D3D11DDIARG_POINTERDATA const *)
d3d11.dll!CContext::TID3D11DeviceContext_SetShaderWithInterfaces_<1,4>(class CContext *,struct ID3D11PixelShader *,struct ID3D11ClassInstance * const *,unsigned int)
d3d11.dll!CContext::TID3D11DeviceContext_SetShader_<1,4>()
MyTest.exe!MyFunctionThatCallsPSSetShader()
Update: With the D3D Debug layers enabled, I get the following additional output when the error occurs:
D3D11: Removing Device.
D3D11 WARNING: ID3D11Device::RemoveDevice: Device removal has been triggered for the following reason (DXGI_ERROR_DRIVER_INTERNAL_ERROR: There is strong evidence that the driver has performed an undefined operation; but it may be because the application performed an illegal or undefined operation to begin with.). [ EXECUTION WARNING #379: DEVICE_REMOVAL_PROCESS_POSSIBLY_AT_FAULT]
D3D11 ERROR: ID3D11DeviceContext::Map: Returning DXGI_ERROR_DEVICE_REMOVED, when a Resource was trying to be mapped with READ or READWRITE. [ RESOURCE_MANIPULATION ERROR #2097214: RESOURCE_MAP_DEVICEREMOVED_RETURN]
The third line, about the call to Map, happens after my test fails to notice and handle the device removal and later tries to map a texture, so I don't think that's related. The other output is about what I expected: there's an error in the driver, and possibly my test is doing something bad to cause it. I still don't know what that might be, or why it worked in Windows 7.
Update 2: I have found that if I run my tests on Windows 10 in Windows 7 compatibility mode, there is no device removed error and all of my tests pass. It is still using d3d10warp.dll instead of d3d11ref.dll, so that wasn't exactly the problem. I'm not sure how to investigate "what am I doing that's incompatible with Windows 10 or its WARP device"; this might need to be a Microsoft support ticket.
The problem is that you haven't enabled the Windows 10 optional feature "Graphics Tools" on that system. That is how you install the DirectX 11/12 Debug Runtime on Windows 10 including Direct3D 11's reference device, WARP for DirectX 12, the DirectX SDK debug layer for DX11/DX12, etc.
WARP for DirectX 11 is available on all systems, not just those with the "Graphics Tools" feature. Generally speaking, most people have switched to using WARP instead of the software reference driver since it is a lot faster. If you are having crashes under WARP, you should investigate the source of those crashes by enabling the DEBUG device.
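Roughly, that means something like the following sketch when creating the device (the D3D11_CREATE_DEVICE_DEBUG flag only works once the debug layer from the "Graphics Tools" feature is installed, and you can swap in D3D_DRIVER_TYPE_REFERENCE if you specifically want d3d11ref.dll):

#include <d3d11.h>
#pragma comment(lib, "d3d11.lib")

HRESULT CreateDebugWarpDevice(ID3D11Device** device, ID3D11DeviceContext** context)
{
    const D3D_FEATURE_LEVEL requested = D3D_FEATURE_LEVEL_11_0;
    D3D_FEATURE_LEVEL achieved = {};
    return D3D11CreateDevice(
        nullptr,                    // default adapter
        D3D_DRIVER_TYPE_WARP,       // software rasterizer (or D3D_DRIVER_TYPE_REFERENCE)
        nullptr,                    // no software rasterizer module handle
        D3D11_CREATE_DEVICE_DEBUG,  // enable the debug layer to surface errors like the ones above
        &requested, 1,
        D3D11_SDK_VERSION,
        device, &achieved, context);
}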
See this blog post.

How to fix processor affinity for Qt 5.9 Thread on Windows with Ryzen 7

I have been developing a program for my Master's thesis with OpenSceneGraph 3.4.0 and a GUI from Qt 5.9 (otherwise in Visual Studio 2015 and 2017). At work everything works fine, but now that I have a new computer at home I have tried to get it running there.
However, when I call the frame() method for the viewer, I get a Read Access Violation in QtThread.cpp at the setProcessorAffinity(unsigned int cpunum), specifically in the following line:
QtThreadPrivateData* pd = static_cast<QtThreadPrivateData*>(_prvData);
Here is the complete function (QtThread.cpp is part of OpenThreads of OSG):
// Description: set processor affinity for the thread
//
// Use: public
//
int Thread::setProcessorAffinity(unsigned int cpunum)
{
    QtThreadPrivateData* pd = static_cast<QtThreadPrivateData*>(_prvData);
    pd->cpunum = cpunum;
    if (!pd->isRunning) return 0;
    // FIXME:
    // Qt doesn't have a platform-independent thread affinity method at present.
    // Does it automatically configure threads on different processors, or we have to do it ourselves?
    return -1;
}
The viewer in OSG is set to osgViewer::Viewer::SingleThreaded, but if I remove that line I get an error "Cannot make QOpenGLContext current in a different thread" in GraphicsWindowQt.cpp (which is part of osgQt), so that's probably a dead end.
Edit for clarification
I call frame() on the osgViewer::Viewer object.
In this function, the viewer calls realize() (which is a member function of the Viewer class).
In there, setUpThreading() is called (which is a member function of the ViewerBase class).
This in turn calls OpenThreads::SetProcessorAffinityOfCurrentThread(0).
In there, the following code is executed:
Thread* thread = Thread::CurrentThread();
if (thread)
    return thread->setProcessorAffinity(cpunum);
thread (after the first line) has the value 0x00000000fdfdfdfd, which looks like an error to me.
In any case, the last call is the one I posted in my original question.
I don't even have an idea of where to start fixing this. I assume it's some processor-related problem. My processor is a Ryzen 7 1700 (at work it's an Intel i7 3770K), so maybe that helps.
Otherwise, at home I'm using Windows 10, whereas at work it's Windows 7.
I'd be thankful for any help at all.
So in the end, it seems to be a problem with OpenThreads (and thus the OpenSceneGraph part, which I can do nothing about). When running CMake on the OpenSceneGraph source, there is an option "BUILD_OPENTHREADS_WITH_QT" that needs to be disabled.
I found the solution in this thread in the OSG forum, so thanks to this guy.
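If you are rebuilding OSG from source the same way, the option can be turned off directly on the CMake command line (the source path here is just a placeholder):

cmake C:\path\to\OpenSceneGraph -DBUILD_OPENTHREADS_WITH_QT=OFF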

Visual Studio 2013 Floating Point Support fix?

I have a DLL which is commercial software, so I cannot show the code here...
I get the error "R6002 - floating point support not loaded", but only in some applications.
This DLL is hooked into several applications without problems.
I tried everything that I found on Google, like reinstalling VC++, a clean PC, registry fixes, everything.
So my conclusion is that either there is another DLL compiled with another version of Visual Studio (2010) that is somehow conflicting with my DLL,
or I have some memory leak which I cannot find.
I use the following functions in my DLL, which (I think) are the issue:
sscanf(); // DISABLED IT FOR TEST, BUT STILL GET ERROR
fprintf_s();
DetourTransactionBegin();
DetourUpdateThread(GetCurrentThread());
DisableThreadLibraryCalls(hModule);
DetourRestoreAfterWith();
_beginthreadex();
strtoul();
Example function I use for logging:
void ##SOFTWARE-CLASS##::write_log(char *text)
{
    FILE *log_file;
    char dateStr[9];
    char timeStr[9];
    _strdate(dateStr);
    _strtime(timeStr);
    log_file = fopen("##SOFTWARE-FILE##", "a+");
    if (!log_file) return;  // don't pass a NULL stream to fprintf_s
    fprintf_s(log_file, "[%s %s] %s \n", dateStr, timeStr, text);
    fclose(log_file);
}
Nothing else is used that might cause floating point errors...
The DLL is hooked properly; I have no warnings and no errors.
I must mention that I created an empty DLL with just a MessageBox; at the first run I was getting the same error, but after switching to /fp:strict the error disappeared. So I did the same thing in my project, but the error is still there. I even recoded the whole project just to see if that would fix the problem, but no.
Please give me advice on how I can solve this problem, as this is the third day that I have been testing...
From the MSDN page for R6002: the documentation says that a program will only load floating point support if needed. What this means is that the detoured code is being injected into binaries which did not initialize the floating point subsystem. The solution would be to relink your commercial product without requiring the floating point code, or to relink with an alternative floating point library.
You could try and detect the missing initialization and perform it yourself, but that would have a larger impact on the injected system, and possibly create instabilities.
Use a debugger with the failing executable, and you should be able to get a call stack which identifies where the failure occurs.

glewInit() crashing (segfault) after creating osmesa (off-screen mesa) context

I'm trying to run an OpenGL application on a remote computing cluster. I'm using OSMesa as I intend to do off-screen software rendering (no X11 forwarding etc.). I want to use GLEW (to make life easier when dealing with shaders and other extension-related calls), and I seem to have built and linked both Mesa and GLEW fine.
When I just call OSMesaCreateContext, glewInit() gives an "OpenGL Version not available" output, which probably means the context has not been created. When I call glGetString(GL_EXTENSIONS) I don't get any output, which confirms this. It also shows that GLEW is working fine on its own (other GLEW calls, like getting the GLEW version, also work).
Now when I add the OSMesaMakeCurrent call (as shown below), glewInit() crashes with a segfault. However, running glGetString(GL_EXTENSIONS) now gives me a list of extensions (which means context creation is successful!).
I've spent hours trying to figure this out and tried tinkering, but nothing works. I would greatly appreciate any help on this. Maybe some of you have experienced something similar before? Thanks again!
int Height = 1; int Width = 1;
OSMesaContext ctx; void *buffer;
ctx = OSMesaCreateContext( OSMESA_RGBA, NULL );
buffer = malloc( Width * Height * 4 * sizeof(GLfloat) );
if (!OSMesaMakeCurrent( ctx, buffer, GL_UNSIGNED_BYTE, Width, Height )) {
    printf("OSMesaMakeCurrent failed!\n");
    return 0;
}
// glewInit() crashes after this.
Just to add: OSMesa and GLEW actually did not compile together initially, because glew.h undefines GLAPI in its last line, and since osmesa.h will not include gl.h again, GLAPI remains undefined and causes an error in osmesa.h (line 119). I got around this by adding an extern to GLAPI; not sure if this is relevant though.
Looking at the source of glewInit in glew.c: if glewContextInit succeeds it returns GLEW_OK, and since GLEW_OK is defined as 0, on Linux systems glewInit then always goes on to call glxewContextInit, which calls glX functions that, in the case of OSMesa, will likely not be ready for use. This will cause a segfault (as you are seeing), and it seems that the glewInit function unfortunately has no way to handle this case without patching the C source and recompiling the library.
If others have already solved this I would be interested; I have seen some patched versions of the glew.c source that work around this. It isn't clear if there is any energy in the GLEW community to merge changes that address this use case.
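For reference, the glewInit logic described above looks roughly like this in older glew.c releases (a paraphrase, not the exact source; the preprocessor guards vary between versions):

GLenum glewInit (void)
{
    GLenum r;
    /* Initialize the core GL entry points; returns GLEW_OK (0) on success. */
    if ((r = glewContextInit()) != GLEW_OK)
        return r;
#if defined(_WIN32)
    return wglewContextInit();
#else
    /* Reached on Linux even under OSMesa; touches glX and can segfault there. */
    return glxewContextInit();
#endif
}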

Using the IBM 3514 Borland Graphics Interface driver in High resolution mode in Turbo C++ on Windows 7 64 bit OS using DosBox

I'm running a graphical program in Turbo C++ using DosBox on Windows 7 64 bit. Now, I want to use the IBM3514 graphics driver in the High resolution mode (IBM3514HI). So, I wrote the following bare bones program to test it:
#include <graphics.h>
#include <iostream.h>
void main() {
    int gd = IBM3514, gm = IBM3514HI, e;
    initgraph(&gd, &gm, "C:\\TC\\BGI");
    if (e = graphresult()) {
        cout << grapherrormsg(e);
    }
    cleardevice();
    rectangle(100, 100, 300, 300);
    cin.get();
    closegraph();
    restorecrtmode();
}
Now, the program compiles and runs without any errors. However, the initgraph function call doesn't initialize graphics mode. The return value of graphresult is 0. Hence, no error has occurred. Yet, the program still runs in text mode. The blinking underscore is visible and the rectangle is not drawn.
I checked my C:\TC\BGI folder and the IMB3514.BGI file exists. Thus I assume that it does load the graphics driver. Yet, I can't figure out why the program doesn't execute in graphics mode, or even throw an error. However it works perfectly fine if I use the default settings: int gd = DETECT, gm;
Any explanation as to why my program doesn't work will be greatly appreciated. Please try to provide a fix to this problem. I would really like to draw on a 1024x768 screen with 256 colors.
Under Windows your graphics adapter is virtualized. You can't access it directly and use its specific features (unless you use DirectX/OpenGL/other such methods). DOSBox emulates some "historical" graphics adapters for the programs it runs (to be precise: Tandy/Hercules/CGA/EGA/VGA/VESA). You must use the VESA 2.0 driver of TC (or in general a VESA driver).
The correct name of the driver is ibm8514.bgi - not "3514" and not "imb". But as the previous answer said, you are better off using another driver. The best choice is the egavga.bgi driver that ships with the Turbo/Borland C++ or Turbo Pascal package. With that it should compile and run successfully.
The exception is if you need a special feature of this particular driver - then you have to decide whether it is worth the effort. I think egavga.bgi, a VESA driver, or a direct switch to graphics mode with some special routines should work in DOSBox, EmuDOS, or in any 32-bit version of Windows such as Windows XP.
Try this code instead:
int gd = 6, gm = 0, e;
(Both variables are integers, not strings; 6 and 0 are the numeric values of the IBM8514 driver constant and its low-resolution mode.)
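For completeness, here is a minimal sketch of the egavga.bgi route recommended above, assuming EGAVGA.BGI is present in C:\TC\BGI (note that VGAHI gives 640x480 in 16 colors, not the 1024x768 with 256 colors you were after):

#include <graphics.h>
#include <conio.h>

int main(void)
{
    int gd = VGA, gm = VGAHI, e;        /* 640x480, 16 colors - handled by DOSBox's emulated VGA */
    initgraph(&gd, &gm, "C:\\TC\\BGI"); /* loads EGAVGA.BGI from this path */
    if ((e = graphresult()) != grOk) {
        return e;                       /* driver or mode not available */
    }
    rectangle(100, 100, 300, 300);
    getch();                            /* wait for a key before leaving graphics mode */
    closegraph();
    return 0;
}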