I want to simulate a raytracing on non-RTX graphics card but I can't. I got this error "Raytracing not supported on device" that I indicate in a code at bottom. I set m_useWarpDevice to true but why I still got the error? According to my understand WARP makes an application run any feature (including raytracing) even the hardware is not supported, but why it doesn't work?
Question: How to perform raytracing on a non-RTX graphics card? The reason I insist is I tried to ask the question in Microsoft Forum but no answer.
What is Windows Advanced Rasterization Platform (WARP) Guide?
From https://learn.microsoft.com/en-us/windows/win32/direct3darticles/directx-warp
WARP does not require graphics hardware to execute. It can execute even in situations where hardware is not available or cannot be initialized.
From https://alternativesp.com/software/alternative/windows-advanced-rasterization-platform-warp/
In Windows 10, WARP has been updated to support Direct3D 12 at level 12_1; under Direct3D 12, WARP also replaces the reference rasterizer.
Compiler: Visual Studio 2019
Graphic card: NVIDIA GeForce 920M (non-RTX)
DXSample.cpp
From https://github.com/ScrappyCocco/DirectX-DXR-Tutorials/blob/master/01-Dx12DXRTriangle/Project/DXSample.cpp
At line 19
DXSample::DXSample(const UINT width, const UINT height, const std::wstring name) :
m_width(width),
m_height(height),
m_useWarpDevice(true), // <-- It was false but I set it to true.
m_title(name)
{
m_aspectRatio = static_cast<float>(width) / static_cast<float>(height);
}
D3D12HelloTriangle.cpp
From https://github.com/ScrappyCocco/DirectX-DXR-Tutorials/blob/master/01-Dx12DXRTriangle/Project/D3D12HelloTriangle.cpp
At line 91
if (m_useWarpDevice) { // m_useWarpDevice = true
ComPtr<IDXGIAdapter> warpAdapter;
ThrowIfFailed(factory->EnumWarpAdapter(IID_PPV_ARGS(&warpAdapter))); // <-- Success
ThrowIfFailed(D3D12CreateDevice(warpAdapter.Get(), D3D_FEATURE_LEVEL_12_1, IID_PPV_ARGS(&m_device))); // <-- Success
}
else {
ComPtr<IDXGIAdapter1> hardwareAdapter;
GetHardwareAdapter(factory.Get(), &hardwareAdapter);
ThrowIfFailed(D3D12CreateDevice(hardwareAdapter.Get(), D3D_FEATURE_LEVEL_12_1, IID_PPV_ARGS(&m_device)));
}
At line 494
void D3D12HelloTriangle::CheckRaytracingSupport() const {
D3D12_FEATURE_DATA_D3D12_OPTIONS5 options5 = {};
ThrowIfFailed(m_device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS5, &options5, sizeof(options5)));
if (options5.RaytracingTier < D3D12_RAYTRACING_TIER_1_0) // <-- options5.RaytracingTier = 0 on my computer which means that raytracing is not suppored.
throw std::runtime_error("Raytracing not supported on device"); // <-- I got this error.
}
Off-topic (just help in the future in case I forget):
https://alternativesp.com/software/alternative/windows-advanced-rasterization-platform-warp/
To force an application to use WARP without disabling the display driver, install the Direct X SDK. http://www.microsoft.com/en-us/download/details.aspx?id=6812 , go to C: / windows / system32, run dxcpl.exe, under “Scope” click “Edit list”, add the path to the application.
I tried to use dxcpl.exe to force WARP but options5.RaytracingTier is always 0.
Instead of using warp device you can use the dx12 RTX fallback layer.
https://github.com/microsoft/DirectX-Graphics-Samples/tree/e5ea2ac7430ce39e6f6d619fd85ae32581931589/Libraries/D3D12RaytracingFallback
Please note that is has a few limitations (resource binding is slightly different, also it's unlikely that they will continue to support it).
Also of course since it emulates the on chip RTX with compute shaders, performances are not as good as native.
Related
I am trying to run the "graphMemoryNodes" sample provided with Cuda Toolkit 11.6 on a A100 card. It compiles fine but stops after some compatibility checks:
cudaDeviceGetAttribute(&deviceSupportsMemoryPools,
cudaDevAttrMemoryPoolsSupported, device);
if (!deviceSupportsMemoryPools) {
printf("Waiving execution as device does not support Memory Pools\n");
exit(EXIT_WAIVED);
} else {
printf("Setting up sample.\n");
}
Here, deviceSupportsMemoryPools is equal to 0, meaning that the card is not supported to use Memory Pools. Is it correct? It sounds surprising to me but I couldn't find the answer.
I am using VS2022 on Windows Server 2019. The card driver is 513.46.
Thank you for your help
I have a perfectly working program which connects to a video camera (an IDS uEye camera) and continuously grabs frames from it and displays them.
However, when loading a specific dll before connecting to the camera, the program runs with 100% CPU load. If I load the dll after connecting to the camera, the program runs fine.
int main()
{
INT nRet = IS_NO_SUCCESS;
// init camera (open next available camera)
m_hCam = (HIDS)0;
// (A) Uncomment this for 100% CPU load:
// HMODULE handle = LoadLibrary(L"myInnocentDll.dll");
// This is the call to the 3rdparty camera vendor's library:
nRet = is_InitCamera(&m_hCam, 0);
// (B) Uncomment this instead of (A) and the CPU load won't change
// HMODULE handle = LoadLibrary(L"myInnocentDll.dll");
if (nRet == IS_SUCCESS)
{
/*
* Please note: I have removed all lines which are not necessary for the exploit.
* Therefore this is NOT a full example of how to properly initialize an IDS camera!
*/
is_GetSensorInfo(m_hCam, &m_sInfo);
GetMaxImageSize(m_hCam, &m_s32ImageWidth, &m_s32ImageHeight);
m_nColorMode = IS_CM_BGR8_PACKED;// IS_CM_BGRA8_PACKED;
m_nBitsPerPixel = 24; // 32;
nRet |= is_SetColorMode(m_hCam, m_nColorMode);
// allocate image memory.
if (is_AllocImageMem(m_hCam, m_s32ImageWidth, m_s32ImageHeight, m_nBitsPerPixel, &m_pcImageMemory, &m_lMemoryId) != IS_SUCCESS)
{
return 1;
}
else
{
is_SetImageMem(m_hCam, m_pcImageMemory, m_lMemoryId);
}
}
else
{
return 1;
}
std::thread([&]() {
while (true) {
is_FreezeVideo(m_hCam, IS_WAIT);
/*
* Usually, the image memory would now be grabbed via is_GetImageMem().
* but as it is not needed for the exploit, I removed it as well
*/
}
}).detach();
cv::waitKey(0);
}
Independently of the actually used camera driver, in what way could loading a dll change the performance of it, occupying 100% of all available CPU cores? When using the Visual Studio Diagnostic Tools, the excess CPU time is attributed to "[External Call] SwitchToThread" and not to the myInnocentDll.
Loading just the dll without the camera initialization does not result in 100% CPU load.
I was first thinking of some static initializers in the myInnocentDll.dll configuring some threading behavior, but I did not find anything pointing in this direction. For which aspects should I look for in the code of myInnocentDll.dll?
After a lot of digging I found the answer and it is both frustratingly simple and frustrating by itself:
It is Microsoft's poor support of OpenMP. When I disabled OpenMP in my project, the camera driver runs just fine.
The reason seems to be that the Microsoft compiler uses OpenMP with busy waiting and there is also the possibility to manually configure OMP_WAIT_POLICY, but as I was not depending on OpenMP anyways, disabling was the easiest solution for me.
https://developercommunity.visualstudio.com/content/problem/589564/how-to-control-omp-wait-policy-for-openmp.html
https://support.microsoft.com/en-us/help/2689322/redistributable-package-fix-high-cpu-usage-when-you-run-a-visual-c-201
I still don't understand why the CPU only went up high when using the camera and not when running the rest of my solution, even though the camera library is pre-built and my disabling/enabling of OpenMP compilation cannot have any effect on it. And I also don't understand why they bothered to make a hotfix for VS2010 but have no real fix as of VS2019, which I am using. But the problem is averted.
You can disable CPU idle state in the IDS camera manager and then the minimum CPU load in the windows energy plans is set to 100%
I think this is worth mentioning here, even you solved your problem already.
I develop an application which shows something like a video in its window. I use technologies which are described here Introducing Direct2D 1.1. In my case the only difference is that eventually I create a bitmap using
ID2D1DeviceContext::CreateBitmap
then I use
ID2D1Bitmap::CopyFromMemory
to copy raw RGB data to it and then I call
ID2D1DeviceContext::DrawBitmap
to draw the bitmap. I use the high quality cubic interpolation mode D2D1_INTERPOLATION_MODE_HIGH_QUALITY_CUBIC for scaling to have the best picture but in some cases (RDP, Citrix, virtual machines, etc) it is very slow and has very high CPU consumption. It happens because in those cases a non-hardware video adapter is used. So for non-hardware adapters I am trying to turn off the interpolation and use faster methods. The problem is that I cannot exactly check if the system has a true hardware adapter.
When I call D3D11CreateDevice, I use it with D3D_DRIVER_TYPE_HARDWARE but on virtual machines it typically returns "Microsoft Basic Render Driver" which is a software driver and does not use GPU (it consumes CPU). So currently I check the vendor ID. If the vendor is AMD (ATI), NVIDIA or Intel, then I use the cubic interpolation. In the other case I use the fastest method which does not consume CPU a lot.
Microsoft::WRL::ComPtr<IDXGIDevice> dxgiDevice;
if (SUCCEEDED(m_pD3dDevice->QueryInterface(...)))
{
Microsoft::WRL::ComPtr<IDXGIAdapter> adapter;
if (SUCCEEDED(dxgiDevice->GetAdapter(&adapter)))
{
DXGI_ADAPTER_DESC desc;
if (SUCCEEDED(adapter->GetDesc(&desc)))
{
// NVIDIA
if (desc.VendorId == 0x10DE ||
// AMD
desc.VendorId == 0x1002 || // 0x1022 ?
// Intel
desc.VendorId == 0x8086) // 0x163C, 0x8087 ?
{
bSupported = true;
}
}
}
}
It works for physical (console) Windows session even in virtual machines. But for RDP sessions IDXGIAdapter still returns the vendors in case of real machines but it does not use GPU (I can see it via the Process Hacker 2 and AMD System Monitor (in case of ATI Radeon)) so I still have high CPU consumption with the cubic interpolation. In case of an RDP session to Windows 7 with ATI Radeon it is 10% bigger than via the physical console.
Or am I mistaken and somehow RDP uses GPU resources and that is the reason why it returns a real hardware adapter via IDXGIAdapter::GetDesc?
DirectDraw
Also I looked at DirectX Diagnostic Tool. It looks like the "DirectDraw Acceleration" info field returns exactly what I need. In case of physical (console) sessions it says "Enabled". In case of RDP and virtual machine (without hardware video acceleration) sessions it says "Not Available". I looked at sources and theoretically I can use the verification algorithm. But it is actually for DirectDraw which I do not use in my application. I would like to use something which is directly linked to ID3D11Device, IDXGIDevice, IDXGIAdapter and so on.
IDXGIAdapter1::GetDesc1 and DXGI_ADAPTER_FLAG
I also tried to use IDXGIAdapter1::GetDesc1 and check the flags.
Microsoft::WRL::ComPtr<IDXGIDevice> dxgiDevice;
if (SUCCEEDED(m_pD3dDevice->QueryInterface(...)))
{
Microsoft::WRL::ComPtr<IDXGIAdapter> adapter;
if (SUCCEEDED(dxgiDevice->GetAdapter(&adapter)))
{
Microsoft::WRL::ComPtr<IDXGIAdapter1> adapter1;
if (SUCCEEDED(adapter->QueryInterface(__uuidof(IDXGIAdapter1), reinterpret_cast<void**>(adapter1.GetAddressOf()))))
{
DXGI_ADAPTER_DESC1 desc;
if (SUCCEEDED(adapter1->GetDesc1(&desc)))
{
// desc.Flags
// DXGI_ADAPTER_FLAG_NONE = 0,
// DXGI_ADAPTER_FLAG_REMOTE = 1,
// DXGI_ADAPTER_FLAG_SOFTWARE = 2,
// DXGI_ADAPTER_FLAG_FORCE_DWORD = 0xffffffff
}
}
}
}
Information about the DXGI_ADAPTER_FLAG_SOFTWARE flag
Virtual Machine RDP Win Serv 2012 (Microsoft Basic Render Driver) -> (0x02) DXGI_ADAPTER_FLAG_SOFTWARE
Physical Win 10 (Intel Video) -> (0x00) DXGI_ADAPTER_FLAG_NONE
Physical Win 7 (ATI Radeon) - > (0x00) DXGI_ADAPTER_FLAG_NONE
RDP Win 10 (Intel Video) -> (0x00) DXGI_ADAPTER_FLAG_NONE
RDP Win 7 (ATI Radeon) -> (0x00) DXGI_ADAPTER_FLAG_NONE
In case of RDP session on a real machine with a hardware adapter, Flags == 0 but as I can see via Process Hacker 2 the GPU is not used. At least on Windows 7 with ATI Radeon I can see bigger CPU usage in case of an RDP session. So it looks like DXGI_ADAPTER_FLAG_SOFTWARE is only for Microsoft Basic Render Driver. So the issue is not solved.
The question
Is there a correct way to check if a real hardware video card (GPU) is used for the current Windows session? Or maybe it is possible to check if a specific interpolation mode of ID2D1DeviceContext::DrawBitmap has hardware implementation and uses GPU for the current session?
UPD
The topic is not about detecting RDP or Citrix sessions. It is not about detecting if the application is inside a virtual machine or not. I already have the all verifications and use the linear interpolation for those cases. The topic is about detecting if a real GPU is used for the current Windows session to display the desktop. I am looking for a more sophisticated solution to make decision using features of DirectX and DXGI.
If you want to detect the Microsoft Basic Renderer, the best option is to use it's VID/PID combo:
ComPtr<IDXGIDevice> dxgiDevice;
if (SUCCEEDED(device.As(&dxgiDevice)))
{
ComPtr<IDXGIAdapter> adapter;
if (SUCCEEDED(dxgiDevice->GetAdapter(&adapter)))
{
DXGI_ADAPTER_DESC desc;
if (SUCCEEDED(adapter->GetDesc(&desc)))
{
if ( (desc.VendorId == 0x1414) && (desc.DeviceId == 0x8c) )
{
// WARNING: Microsoft Basic Render Driver is active.
// Performance of this application may be unsatisfactory.
// Please ensure that your video card is Direct3D10/11 capable
// and has the appropriate driver installed.
}
}
}
}
See Microsoft Docs and Anatomy of Direct3D 11 Create Device
You will probably find for testing/debugging that you don't want to explicitly block these scenarios, but you do want to provide some kind of warning or notice feedback to the user that they are using software rather than hardware rendering.
Remote Desktop detection from Win32 classic desktop applications is better done directly via GetSystemMetrics( SM_REMOTESESSION ).
See Microsoft Docs
Answering a 3 years old question as I struggled myself to do so.
I had to go through the registry. First thing is to find the adapter LUID in the registry, to get the adapter GUID
private string GetAdapterGuid(long luid)
{
var directXRegistryKey = Registry.LocalMachine.OpenSubKey(#"SOFTWARE\Microsoft\DirectX");
if (directXRegistryKey == null)
return "";
var subKeyNames = directXRegistryKey.GetSubKeyNames();
foreach (var subKeyName in subKeyNames)
{
var subKey = directXRegistryKey.OpenSubKey(subKeyName);
if (subKey.GetValueKind("AdapterLuid") != RegistryValueKind.QWord)
continue;
var luidValue = (long)subKey.GetValue("AdapterLuid");
if (luidValue == luid)
return subKeyName;
}
return "";
}
Once you have that Guid, you can search for the details of the graphic card in HKLM like this. If it is virtual, the service name will be "INDIRECTKMD" :
private bool IsVirtualAdapter(string adapterGuid)
{
var videoRegistryKey = Registry.LocalMachine.OpenSubKey($#"SYSTEM\CurrentControlSet\Control\Video\{adapterGuid}\Video");
if (videoRegistryKey == null)
return false;
if (videoRegistryKey.GetValueKind("Service") != RegistryValueKind.String)
return false;
var serviceName = (string)videoRegistryKey.GetValue("Service");
return serviceName.ToUpper() == "INDIRECTKMD";
}
Checking the service name felt easier than parsing the DeviceDesc value.
My use case involved having the Guid ready so I split up the function, you could merge it into one.
It also only detect RDP/MSTSC through this, additional service names might be needed for other virtual adapters. Or you could try to detect only Nvidia/AMD/Intel driver names... up to you.
I am currently using a nvidia 675M and in directx 11 I was fine running with feature level 11_0
I am following guides for directx 12 and they say I can still create a device with the feature level 11_0 but when I run it says it is not supported
I know 100% I'm using the correct adapter as it says 675m
So just wondered if there is any way around this or another method or if simply I need a new graphics card :(
The NVidia 675M is a "Fermi" GPU which should be supported for DirectX 12 by NVIDIA per this post. The initial focus for NVidia's DX12 driver support is their Maxwell and Kepler parts, so check with NVidia for a driver that supports Fermi.
Another issue to keep in mind is that in systems with more than one graphics card, you need to be sure you are in fact picking the right adapter. The DirectX 12 VS templates use the following code to achieve this:
void DX::DeviceResources::GetAdapter(IDXGIAdapter1** ppAdapter)
{
*ppAdapter = nullptr;
ComPtr<IDXGIAdapter1> adapter;
for (UINT adapterIndex = 0; DXGI_ERROR_NOT_FOUND != m_dxgiFactory->EnumAdapters1(adapterIndex, adapter.ReleaseAndGetAddressOf()); ++adapterIndex)
{
DXGI_ADAPTER_DESC1 desc;
DX::ThrowIfFailed(adapter->GetDesc1(&desc));
if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE)
{
// Don't select the Basic Render Driver adapter.
continue;
}
// Check to see if the adapter supports Direct3D 12, but don't create the actual device yet.
if (SUCCEEDED(D3D12CreateDevice(adapter.Get(), m_d3dMinFeatureLevel, _uuidof(ID3D12Device), nullptr)))
{
#ifdef _DEBUG
WCHAR buff[256] = {};
swprintf_s(buff, L"Direct3D Adapter (%u): VID:%04X, PID:%04X - %ls\n", adapterIndex, desc.VendorId, desc.DeviceId, desc.Description);
OutputDebugStringW(buff);
#endif
break;
}
}
#if !defined(NDEBUG)
if (!adapter)
{
// Try WARP12 instead
if (FAILED(m_dxgiFactory->EnumWarpAdapter(IID_PPV_ARGS(adapter.ReleaseAndGetAddressOf()))))
{
throw std::exception("WARP12 not available. Enable the 'Graphics Tools' feature-on-demand");
}
OutputDebugStringA("Direct3D Adapter - WARP12\n");
}
#endif
if (!adapter)
{
throw std::exception("No Direct3D 12 device found");
}
*ppAdapter = adapter.Detach();
}
NVIDIA have not yet released a driver that supports DX12 on Fermi, so this won't work.
Initial support for DirectX 12 in Fermi was introduced in the current R384.76, as observed by users on Guru3D here and here, although the driver release notes do not state this.
You may want to run 3DMark Time Spy or a similar DirectX 12 workload to confirm this.
I'm having trouble using multiple GPUs with OpenCL/OpenGL interop. I'm trying to write an application which renders the result of an intensive computation. In the end it will run an optimization problem, and then, based on the result, render something to the screen. As a test case, I'm starting with the particle simulation example code from this course: http://web.engr.oregonstate.edu/~mjb/sig13/
The example code creates and OpenGL context, then creates a OpenCL context that shares the state, using the cl_khr_gl_sharing extension. Everything works fine when I use a single GPU. Creating a context looks like this:
3. create an opencl context based on the opengl context:
cl_context_properties props[ ] =
{
CL_GL_CONTEXT_KHR, (cl_context_properties) glXGetCurrentContext( ),
CL_GLX_DISPLAY_KHR, (cl_context_properties) glXGetCurrentDisplay( ),
CL_CONTEXT_PLATFORM, (cl_context_properties) Platform,
0
};
cl_context Context = clCreateContext( props, 1, Device, NULL, NULL, &status );
if( status != CL_SUCCESS)
{
PrintCLError( status, "clCreateContext: " );
exit(1);
}
Later on, the example creates shared CL/GL buffers with clCreateFromGLBuffer.
Now, I would like to create a context from two GPU devices:
cl_context Context = clCreateContext( props, 2, Device, NULL, NULL, &status );
I've successfully opened the devices, and can query that they both support cl_khr_gl_sharing, and both work individually. However, when attempting to create the context as above, I get
CL_INVALID_OPERATION
Which is an error code added by the cl_khr_gl_sharing extension. In the extension description (linked above) it says
CL_INVALID_OPERATION if a context or share group object was
specified for one of CGL, EGL, GLX, or WGL and any of the
following conditions hold:
The OpenGL implementation does not support the window-system
binding API for which a context or share group objects was
specified.
More than one of the attributes CL_CGL_SHAREGROUP_KHR,
CL_EGL_DISPLAY_KHR, CL_GLX_DISPLAY_KHR, and CL_WGL_HDC_KHR is
set to a non-default value.
Both of the attributes CL_CGL_SHAREGROUP_KHR and
CL_GL_CONTEXT_KHR are set to non-default values.
Any of the devices specified in the argument cannot
support OpenCL objects which share the data store of an OpenGL
object, as described in section 9.12."
That description doesn't seem to fit any of my cases exactly. Is it not possible to do OpenCL/OpenGL interop with multiple GPUs? Or is it that I have heterogeneous hardware? I printed out a few parameters from my enumerated devices. I've just taken two random GPUs that I could get my hands on.
PlatformID: 18483216
Num Devices: 2
-------- Device 00 ---------
CL_DEVICE_NAME: GeForce GTX 285
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DEVICE_VERSION: OpenCL 1.0 CUDA
CL_DRIVER_VERSION: 304.88
CL_DEVICE_MAX_COMPUTE_UNITS: 30
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1476
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
-------- Device 01 ---------
CL_DEVICE_NAME: Quadro FX 580
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DEVICE_VERSION: OpenCL 1.0 CUDA
CL_DRIVER_VERSION: 304.88
CL_DEVICE_MAX_COMPUTE_UNITS: 4
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1125
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
cl_khr_gl_sharing is supported on dev 0.
cl_khr_gl_sharing is supported on dev 1.
Note that if I create the context without the interop portion (such that the props array looks like below) then it successfully creates the context, but obviously cannot share buffers with the OpenGL side of the application.
cl_context_properties props[ ] =
{
CL_CONTEXT_PLATFORM, (cl_context_properties) Platform,
0
};
Several related Questions and Examples
Here's a related example of a pure OpenGL approach to shared
processing between multiple gpus
Another pure OpenGL mulitiple gpu question
A producer/consumer example using multiple gpus see the producer source file for calls to make current (looks windows specific but the flow will be similar elsewhere). See glContext for details
bool stageProducer::preExecution()
{
if(!glContext::getInstance().makeCurrent(_rc))
{
window::getInstance().messageBoxWithLastError("wglMakeCurrent");
return false;
}
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, _fboID);
return true;
}
OpenCL specific, but relevant to this question:
"If you enqueue a write to the buffer on queueA(deviceA) then OpenCL will use that device to do the write. However, if you then use the buffer on queueB(deviceB) in the same context, OpenCL will recognize that deviceA has the most recent data and will move it over to deviceB before using it. In short, as long as you use events to ensure that no two devices are trying to access the same memory object at the same time, OpenCL will make sure that each use of the memory object has the most recent data, regardless of which device last used it."
I assume when you take OpenGL out of the equation sharing memory between gpus works as expected?
When you call these two lines:
CL_GL_CONTEXT_KHR, (cl_context_properties) glXGetCurrentContext( ),
CL_GLX_DISPLAY_KHR, (cl_context_properties) glXGetCurrentDisplay( ),
the calls need to come from inside a new thread with a new OpenGL context. You can usually only associate one OpenCL context with one OpenGL context for one device at a time per thread.