I'm having trouble using multiple GPUs with OpenCL/OpenGL interop. I'm trying to write an application which renders the result of an intensive computation. In the end it will run an optimization problem, and then, based on the result, render something to the screen. As a test case, I'm starting with the particle simulation example code from this course: http://web.engr.oregonstate.edu/~mjb/sig13/
The example code creates an OpenGL context, then creates an OpenCL context that shares state with it, using the cl_khr_gl_sharing extension. Everything works fine when I use a single GPU. Creating a context looks like this:
// 3. create an OpenCL context based on the OpenGL context:
cl_context_properties props[ ] =
{
CL_GL_CONTEXT_KHR, (cl_context_properties) glXGetCurrentContext( ),
CL_GLX_DISPLAY_KHR, (cl_context_properties) glXGetCurrentDisplay( ),
CL_CONTEXT_PLATFORM, (cl_context_properties) Platform,
0
};
cl_context Context = clCreateContext( props, 1, Device, NULL, NULL, &status );
if( status != CL_SUCCESS)
{
PrintCLError( status, "clCreateContext: " );
exit(1);
}
Later on, the example creates shared CL/GL buffers with clCreateFromGLBuffer.
Now, I would like to create a context from two GPU devices:
cl_context Context = clCreateContext( props, 2, Device, NULL, NULL, &status );
I've successfully opened the devices, and can query that they both support cl_khr_gl_sharing, and both work individually. However, when attempting to create the context as above, I get
CL_INVALID_OPERATION
which is an error code added by the cl_khr_gl_sharing extension. In the extension description it says:
CL_INVALID_OPERATION if a context or share group object was specified for one of CGL, EGL, GLX, or WGL and any of the following conditions hold:
- The OpenGL implementation does not support the window-system binding API for which a context or share group object was specified.
- More than one of the attributes CL_CGL_SHAREGROUP_KHR, CL_EGL_DISPLAY_KHR, CL_GLX_DISPLAY_KHR, and CL_WGL_HDC_KHR is set to a non-default value.
- Both of the attributes CL_CGL_SHAREGROUP_KHR and CL_GL_CONTEXT_KHR are set to non-default values.
- Any of the devices specified in the devices argument cannot support OpenCL objects which share the data store of an OpenGL object, as described in section 9.12."
That description doesn't seem to fit any of my cases exactly. Is it not possible to do OpenCL/OpenGL interop with multiple GPUs? Or is the problem that my hardware is heterogeneous? I printed out a few parameters from my enumerated devices; these are just two GPUs I could get my hands on.
PlatformID: 18483216
Num Devices: 2
-------- Device 00 ---------
CL_DEVICE_NAME: GeForce GTX 285
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DEVICE_VERSION: OpenCL 1.0 CUDA
CL_DRIVER_VERSION: 304.88
CL_DEVICE_MAX_COMPUTE_UNITS: 30
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1476
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
-------- Device 01 ---------
CL_DEVICE_NAME: Quadro FX 580
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DEVICE_VERSION: OpenCL 1.0 CUDA
CL_DRIVER_VERSION: 304.88
CL_DEVICE_MAX_COMPUTE_UNITS: 4
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1125
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
cl_khr_gl_sharing is supported on dev 0.
cl_khr_gl_sharing is supported on dev 1.
Note that if I create the context without the interop portion (so that the props array looks like the one below), the context is created successfully, but obviously it cannot share buffers with the OpenGL side of the application.
cl_context_properties props[ ] =
{
CL_CONTEXT_PLATFORM, (cl_context_properties) Platform,
0
};
Several related questions and examples:
Here's a related example of a pure OpenGL approach to shared processing between multiple GPUs.
Another pure OpenGL multiple-GPU question.
A producer/consumer example using multiple GPUs; see the producer source file for the makeCurrent calls (it looks Windows-specific, but the flow will be similar elsewhere). See glContext for details:
bool stageProducer::preExecution()
{
if(!glContext::getInstance().makeCurrent(_rc))
{
window::getInstance().messageBoxWithLastError("wglMakeCurrent");
return false;
}
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, _fboID);
return true;
}
OpenCL specific, but relevant to this question:
"If you enqueue a write to the buffer on queueA(deviceA) then OpenCL will use that device to do the write. However, if you then use the buffer on queueB(deviceB) in the same context, OpenCL will recognize that deviceA has the most recent data and will move it over to deviceB before using it. In short, as long as you use events to ensure that no two devices are trying to access the same memory object at the same time, OpenCL will make sure that each use of the memory object has the most recent data, regardless of which device last used it."
I assume that when you take OpenGL out of the equation, sharing memory between GPUs works as expected?
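To make the quoted pattern concrete, here is a minimal sketch, assuming a context that already contains both devices and no GL sharing; Buffer, Kernel, HostPtr, DataSize and GlobalSize are placeholders rather than names from the example code:
cl_int status;
cl_command_queue QueueA = clCreateCommandQueue( Context, Device[0], 0, &status );
cl_command_queue QueueB = clCreateCommandQueue( Context, Device[1], 0, &status );

cl_event WriteDone;
// Device 0 receives the host data first...
clEnqueueWriteBuffer( QueueA, Buffer, CL_FALSE, 0, DataSize, HostPtr, 0, NULL, &WriteDone );
// ...and the event dependency lets the runtime migrate the buffer to device 1
// before the kernel runs there.
clEnqueueNDRangeKernel( QueueB, Kernel, 1, NULL, &GlobalSize, NULL, 1, &WriteDone, NULL );
clFinish( QueueB );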
When you call these two lines:
CL_GL_CONTEXT_KHR, (cl_context_properties) glXGetCurrentContext( ),
CL_GLX_DISPLAY_KHR, (cl_context_properties) glXGetCurrentDisplay( ),
the calls need to come from inside a new thread with a new OpenGL context. You can usually only associate one OpenCL context with one OpenGL context for one device at a time per thread.
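A hedged sketch of one way to work within that constraint: build the interop context only around the device that currently drives the GL context, and create a separate plain compute context for the other GPU. clGetGLContextInfoKHR comes from cl_khr_gl_sharing and normally has to be loaded with clGetExtensionFunctionAddressForPlatform; otherDevice is a placeholder for the second GPU's cl_device_id, while props, Platform and status are the variables from the question.
// Which device owns the current GL context?
cl_device_id glDevice;
size_t retSize = 0;
clGetGLContextInfoKHR( props, CL_CURRENT_DEVICE_FOR_GL_CONTEXT_KHR,
                       sizeof(glDevice), &glDevice, &retSize );

// Interop context: only the device associated with the GL context.
cl_int status;
cl_context InteropContext = clCreateContext( props, 1, &glDevice, NULL, NULL, &status );

// Plain compute context for the second GPU, without GL properties.
cl_context_properties computeProps[ ] =
{
    CL_CONTEXT_PLATFORM, (cl_context_properties) Platform,
    0
};
cl_context ComputeContext = clCreateContext( computeProps, 1, &otherDevice, NULL, NULL, &status );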
After enabling the required GLFW extensions for Vulkan and creating a surface with glfwCreateWindowSurface(), is it really necessary to check whether a physical device supports presentation (we have already enabled the extensions needed for working with window surfaces) before choosing which physical device to use?
I have come across code that checks whether a physical device has a queue family supporting presentation by using vkGetPhysicalDeviceSurfaceSupportKHR().
// Go through each queue family and check if it has at least 1 of the required types of queue
int i = 0;
for (const auto& queueFamily : queueFamilyList)
{
    // First check if queue family has at least 1 queue in that family (could have no queues)
    // Check if Queue Family supports presentation
    VkBool32 presentationSupport = false;
    vkGetPhysicalDeviceSurfaceSupportKHR(device, i, surface, &presentationSupport);
    // Check if queue is presentation type (can be both graphics and presentation)
    if (queueFamily.queueCount > 0 && presentationSupport)
    {
        indices.presentationFamily = i;
    }
    ++i;   // queue family index
}
Instance extensions for surfaces allow the instance system and compatible drivers to talk to the owner of those surfaces (i.e. the operating system). But this does not mean that every physical device actually has a connection to a display. A particular GPU may literally not be plugged into a monitor, which could make direct interactions with displayable images difficult or impossible.
So you have to check to see what a physical device can do with a surface.
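A sketch of that check pulled out into a helper, mirroring the loop in the question (device, surface and queueFamilyList correspond to the variables used there):
#include <vector>
#include <vulkan/vulkan.h>

// Returns the index of a queue family that can present to the surface, or -1 if none can.
int FindPresentationQueueFamily(VkPhysicalDevice device, VkSurfaceKHR surface,
                                const std::vector<VkQueueFamilyProperties>& queueFamilyList)
{
    for (uint32_t i = 0; i < queueFamilyList.size(); ++i)
    {
        VkBool32 presentationSupport = VK_FALSE;
        vkGetPhysicalDeviceSurfaceSupportKHR(device, i, surface, &presentationSupport);
        if (queueFamilyList[i].queueCount > 0 && presentationSupport)
            return static_cast<int>(i);
    }
    return -1;   // this device has no queue family that can present to this surface
}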
I want to simulate raytracing on a non-RTX graphics card, but I can't. I get the error "Raytracing not supported on device", which I point out in the code at the bottom. I set m_useWarpDevice to true, so why do I still get the error? My understanding is that WARP lets an application run any feature (including raytracing) even when the hardware does not support it, so why doesn't it work?
Question: How can I perform raytracing on a non-RTX graphics card? The reason I insist is that I already asked this in the Microsoft forum and got no answer.
What is the Windows Advanced Rasterization Platform (WARP)?
From https://learn.microsoft.com/en-us/windows/win32/direct3darticles/directx-warp
WARP does not require graphics hardware to execute. It can execute even in situations where hardware is not available or cannot be initialized.
From https://alternativesp.com/software/alternative/windows-advanced-rasterization-platform-warp/
In Windows 10, WARP has been updated to support Direct3D 12 at level 12_1; under Direct3D 12, WARP also replaces the reference rasterizer.
Compiler: Visual Studio 2019
Graphic card: NVIDIA GeForce 920M (non-RTX)
DXSample.cpp
From https://github.com/ScrappyCocco/DirectX-DXR-Tutorials/blob/master/01-Dx12DXRTriangle/Project/DXSample.cpp
At line 19
DXSample::DXSample(const UINT width, const UINT height, const std::wstring name) :
m_width(width),
m_height(height),
m_useWarpDevice(true), // <-- It was false but I set it to true.
m_title(name)
{
m_aspectRatio = static_cast<float>(width) / static_cast<float>(height);
}
D3D12HelloTriangle.cpp
From https://github.com/ScrappyCocco/DirectX-DXR-Tutorials/blob/master/01-Dx12DXRTriangle/Project/D3D12HelloTriangle.cpp
At line 91
if (m_useWarpDevice) { // m_useWarpDevice = true
ComPtr<IDXGIAdapter> warpAdapter;
ThrowIfFailed(factory->EnumWarpAdapter(IID_PPV_ARGS(&warpAdapter))); // <-- Success
ThrowIfFailed(D3D12CreateDevice(warpAdapter.Get(), D3D_FEATURE_LEVEL_12_1, IID_PPV_ARGS(&m_device))); // <-- Success
}
else {
ComPtr<IDXGIAdapter1> hardwareAdapter;
GetHardwareAdapter(factory.Get(), &hardwareAdapter);
ThrowIfFailed(D3D12CreateDevice(hardwareAdapter.Get(), D3D_FEATURE_LEVEL_12_1, IID_PPV_ARGS(&m_device)));
}
At line 494
void D3D12HelloTriangle::CheckRaytracingSupport() const {
D3D12_FEATURE_DATA_D3D12_OPTIONS5 options5 = {};
ThrowIfFailed(m_device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS5, &options5, sizeof(options5)));
if (options5.RaytracingTier < D3D12_RAYTRACING_TIER_1_0) // <-- options5.RaytracingTier = 0 on my computer, which means that raytracing is not supported.
throw std::runtime_error("Raytracing not supported on device"); // <-- I got this error.
}
Off-topic (a note for my future self in case I forget):
https://alternativesp.com/software/alternative/windows-advanced-rasterization-platform-warp/
To force an application to use WARP without disabling the display driver, install the DirectX SDK (http://www.microsoft.com/en-us/download/details.aspx?id=6812), go to C:\Windows\System32, run dxcpl.exe, and under "Scope" click "Edit list" and add the path to the application.
I tried to use dxcpl.exe to force WARP, but options5.RaytracingTier is always 0.
Instead of using the WARP device, you can use the DX12 raytracing fallback layer.
https://github.com/microsoft/DirectX-Graphics-Samples/tree/e5ea2ac7430ce39e6f6d619fd85ae32581931589/Libraries/D3D12RaytracingFallback
Please note that it has a few limitations (resource binding is slightly different, and it's unlikely that they will continue to support it).
Also, of course, since it emulates the on-chip RT hardware with compute shaders, performance is not as good as native.
I develop an application which shows something like a video in its window. I use the technologies described here: Introducing Direct2D 1.1. In my case the only difference is that eventually I create a bitmap using
ID2D1DeviceContext::CreateBitmap
then I use
ID2D1Bitmap::CopyFromMemory
to copy raw RGB data to it and then I call
ID2D1DeviceContext::DrawBitmap
to draw the bitmap. I use the high-quality cubic interpolation mode D2D1_INTERPOLATION_MODE_HIGH_QUALITY_CUBIC for scaling to get the best picture, but in some cases (RDP, Citrix, virtual machines, etc.) it is very slow and causes very high CPU consumption. This happens because a non-hardware video adapter is used in those cases. So for non-hardware adapters I am trying to turn off that interpolation and use faster methods. The problem is that I cannot reliably check whether the system has a true hardware adapter.
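For reference, the decision described above boils down to a draw call like this (m_pD2dContext, m_pBitmap, destRect and bHardware are placeholder names, not my actual members):
D2D1_INTERPOLATION_MODE mode = bHardware
    ? D2D1_INTERPOLATION_MODE_HIGH_QUALITY_CUBIC   // best quality, expensive without a real GPU
    : D2D1_INTERPOLATION_MODE_LINEAR;              // cheap fallback for software adapters
m_pD2dContext->DrawBitmap(m_pBitmap, &destRect, 1.0f, mode, nullptr);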
When I call D3D11CreateDevice, I use it with D3D_DRIVER_TYPE_HARDWARE, but on virtual machines it typically returns "Microsoft Basic Render Driver", which is a software driver that does not use the GPU (it consumes CPU instead). So currently I check the vendor ID: if the vendor is AMD (ATI), NVIDIA or Intel, I use the cubic interpolation; otherwise I use the fastest method, which does not consume much CPU.
Microsoft::WRL::ComPtr<IDXGIDevice> dxgiDevice;
if (SUCCEEDED(m_pD3dDevice->QueryInterface(...)))
{
    Microsoft::WRL::ComPtr<IDXGIAdapter> adapter;
    if (SUCCEEDED(dxgiDevice->GetAdapter(&adapter)))
    {
        DXGI_ADAPTER_DESC desc;
        if (SUCCEEDED(adapter->GetDesc(&desc)))
        {
            if (desc.VendorId == 0x10DE ||   // NVIDIA
                desc.VendorId == 0x1002 ||   // AMD (0x1022 ?)
                desc.VendorId == 0x8086)     // Intel (0x163C, 0x8087 ?)
            {
                bSupported = true;
            }
        }
    }
}
It works for a physical (console) Windows session, even in virtual machines. But for RDP sessions on real machines, IDXGIAdapter still reports those vendors even though the GPU is not actually used (I can see this via Process Hacker 2 and AMD System Monitor in the case of an ATI Radeon), so I still get high CPU consumption with the cubic interpolation. For an RDP session to Windows 7 with an ATI Radeon it is 10% higher than via the physical console.
Or am I mistaken, and RDP does somehow use GPU resources, which is why it returns a real hardware adapter via IDXGIAdapter::GetDesc?
DirectDraw
Also I looked at the DirectX Diagnostic Tool. It looks like the "DirectDraw Acceleration" info field reports exactly what I need: for physical (console) sessions it says "Enabled", while for RDP and virtual-machine sessions (without hardware video acceleration) it says "Not Available". I looked at the sources, and in theory I could use the same verification algorithm. But it is for DirectDraw, which I do not use in my application. I would like to use something directly tied to ID3D11Device, IDXGIDevice, IDXGIAdapter and so on.
IDXGIAdapter1::GetDesc1 and DXGI_ADAPTER_FLAG
I also tried to use IDXGIAdapter1::GetDesc1 and check the flags.
Microsoft::WRL::ComPtr<IDXGIDevice> dxgiDevice;
if (SUCCEEDED(m_pD3dDevice->QueryInterface(...)))
{
    Microsoft::WRL::ComPtr<IDXGIAdapter> adapter;
    if (SUCCEEDED(dxgiDevice->GetAdapter(&adapter)))
    {
        Microsoft::WRL::ComPtr<IDXGIAdapter1> adapter1;
        if (SUCCEEDED(adapter->QueryInterface(__uuidof(IDXGIAdapter1), reinterpret_cast<void**>(adapter1.GetAddressOf()))))
        {
            DXGI_ADAPTER_DESC1 desc;
            if (SUCCEEDED(adapter1->GetDesc1(&desc)))
            {
                // desc.Flags:
                // DXGI_ADAPTER_FLAG_NONE = 0,
                // DXGI_ADAPTER_FLAG_REMOTE = 1,
                // DXGI_ADAPTER_FLAG_SOFTWARE = 2,
                // DXGI_ADAPTER_FLAG_FORCE_DWORD = 0xffffffff
            }
        }
    }
}
Information about the DXGI_ADAPTER_FLAG_SOFTWARE flag
Virtual Machine RDP Win Serv 2012 (Microsoft Basic Render Driver) -> (0x02) DXGI_ADAPTER_FLAG_SOFTWARE
Physical Win 10 (Intel Video) -> (0x00) DXGI_ADAPTER_FLAG_NONE
Physical Win 7 (ATI Radeon) -> (0x00) DXGI_ADAPTER_FLAG_NONE
RDP Win 10 (Intel Video) -> (0x00) DXGI_ADAPTER_FLAG_NONE
RDP Win 7 (ATI Radeon) -> (0x00) DXGI_ADAPTER_FLAG_NONE
In the case of an RDP session on a real machine with a hardware adapter, Flags == 0, but as I can see via Process Hacker 2 the GPU is not used. At least on Windows 7 with an ATI Radeon I see higher CPU usage during an RDP session. So it looks like DXGI_ADAPTER_FLAG_SOFTWARE is set only for the Microsoft Basic Render Driver, and the issue is not solved.
The question
Is there a correct way to check if a real hardware video card (GPU) is used for the current Windows session? Or maybe it is possible to check if a specific interpolation mode of ID2D1DeviceContext::DrawBitmap has hardware implementation and uses GPU for the current session?
UPD
The topic is not about detecting RDP or Citrix sessions, nor about detecting whether the application is running inside a virtual machine. I already have all those checks and use linear interpolation in those cases. The topic is about detecting whether a real GPU is used to display the desktop for the current Windows session. I am looking for a more sophisticated solution that makes the decision using features of DirectX and DXGI.
If you want to detect the Microsoft Basic Renderer, the best option is to use its VID/PID combo:
ComPtr<IDXGIDevice> dxgiDevice;
if (SUCCEEDED(device.As(&dxgiDevice)))
{
    ComPtr<IDXGIAdapter> adapter;
    if (SUCCEEDED(dxgiDevice->GetAdapter(&adapter)))
    {
        DXGI_ADAPTER_DESC desc;
        if (SUCCEEDED(adapter->GetDesc(&desc)))
        {
            if ((desc.VendorId == 0x1414) && (desc.DeviceId == 0x8c))
            {
                // WARNING: Microsoft Basic Render Driver is active.
                // Performance of this application may be unsatisfactory.
                // Please ensure that your video card is Direct3D10/11 capable
                // and has the appropriate driver installed.
            }
        }
    }
}
See Microsoft Docs and Anatomy of Direct3D 11 Create Device
For testing/debugging you will probably find that you don't want to explicitly block these scenarios, but you do want to give the user some kind of warning or notice that they are using software rather than hardware rendering.
Remote Desktop detection from Win32 classic desktop applications is better done directly via GetSystemMetrics( SM_REMOTESESSION ).
See Microsoft Docs
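A minimal sketch of that check:
#include <windows.h>

// Returns true when the current process is running in a Remote Desktop session.
bool IsRemoteSession()
{
    return GetSystemMetrics(SM_REMOTESESSION) != 0;
}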
Answering a 3-year-old question, as I struggled with this myself. I had to go through the registry. The first step is to find the adapter LUID in the registry in order to get the adapter GUID:
private string GetAdapterGuid(long luid)
{
    var directXRegistryKey = Registry.LocalMachine.OpenSubKey(@"SOFTWARE\Microsoft\DirectX");
    if (directXRegistryKey == null)
        return "";

    var subKeyNames = directXRegistryKey.GetSubKeyNames();
    foreach (var subKeyName in subKeyNames)
    {
        var subKey = directXRegistryKey.OpenSubKey(subKeyName);
        if (subKey.GetValueKind("AdapterLuid") != RegistryValueKind.QWord)
            continue;

        var luidValue = (long)subKey.GetValue("AdapterLuid");
        if (luidValue == luid)
            return subKeyName;
    }
    return "";
}
Once you have that GUID, you can look up the details of the graphics card in HKLM like this. If the adapter is virtual, the service name will be "INDIRECTKMD":
private bool IsVirtualAdapter(string adapterGuid)
{
    var videoRegistryKey = Registry.LocalMachine.OpenSubKey($@"SYSTEM\CurrentControlSet\Control\Video\{adapterGuid}\Video");
    if (videoRegistryKey == null)
        return false;

    if (videoRegistryKey.GetValueKind("Service") != RegistryValueKind.String)
        return false;

    var serviceName = (string)videoRegistryKey.GetValue("Service");
    return serviceName.ToUpper() == "INDIRECTKMD";
}
Checking the service name felt easier than parsing the DeviceDesc value.
My use case involved having the GUID ready, so I split the logic into two functions; you could merge them into one.
This also only detects RDP/MSTSC; additional service names might be needed for other virtual adapters. Or you could try to detect only the NVIDIA/AMD/Intel driver names instead; up to you.
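For completeness, the luid value fed into GetAdapterGuid comes from the DXGI adapter description; a C++ sketch of that step (adapter is an IDXGIAdapter*, as in the question's code):
DXGI_ADAPTER_DESC desc;
if (SUCCEEDED(adapter->GetDesc(&desc)))
{
    // The registry stores the adapter LUID as a single QWORD.
    const LONGLONG luid = (static_cast<LONGLONG>(desc.AdapterLuid.HighPart) << 32)
                        | desc.AdapterLuid.LowPart;
    // pass luid to GetAdapterGuid / IsVirtualAdapter above
}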
I am starting to write a little "engine" for using OpenCL. Now I have encountered a problem that is quite strange.
When I call clGetDeviceInfo() to query information about a specific device, some of the options for the param_name parameter return the error code -30 (= CL_INVALID_VALUE). A notable one is CL_DEVICE_EXTENSIONS, which should return a string of extensions no matter what SDK or platform I am using. I have double-checked every edge case and all the parameters.
Another thing I do not understand: when I run my source on my Windows machine at work, clGetPlatformInfo() also returns CL_INVALID_VALUE when querying the CL_PLATFORM_EXTENSIONS string. At home I am using a Linux machine running Ubuntu, and it shows the extensions string without any problem.
Here are the data of my platforms:
Work:
Intel Core i5 2500 CPU
NVIDIA Geforce 210 GPU
AMD APP SDK 3.0 Beta
Home:
Intel Core i7 5820K CPU
AMD Radeon HD7700 GPU
AMD APP SDK 3.0 Beta
And here is the source:
The source is written in C++ and the OpenCL functions are wrapped in some helper classes (e.g. OCLDevice).
OCLDevice::OCLDevice(cl_device_id device)
{
cl_int errNum;
cl_uint uintBuffer;
cl_long longBuffer;
cl_bool boolBuffer;
char str[128];
size_t strSize = (sizeof(char) * 128);
size_t retSize;
//Device name string.
errNum =
clGetDeviceInfo(device,CL_DEVICE_NAME,strSize,(void*)str,&retSize);
throwException();
this->name = string(str,retSize);
//The platform associated with this device.
errNum =
clGetDeviceInfo(device, CL_DEVICE_PLATFORM,
sizeof(cl_platform_id),
(void*)&(this->platform), &retSize);
throwException();
//The OpenCL device type.
errNum =
clGetDeviceInfo(device, CL_DEVICE_TYPE,
sizeof(cl_device_type),
(void*)&(this->devType),&retSize);
throwException();
//Vendor name string.
errNum =
clGetDeviceInfo(device,CL_DEVICE_VENDOR,
strSize,(void*)str,&retSize);
throwException();
this->vendor = string(str,retSize);
//A unique device vendor identifier.
//An example of a unique device identifier could be the PCIe ID.
errNum =
clGetDeviceInfo(device, CL_DEVICE_VENDOR_ID,
sizeof(unsigned int),
(void*)&(this->vendorID),&retSize);
throwException();
//Returns a space separated list of extension names
//supported by the device.
clearString(str,retSize); //fills the char string with 0-characters
errNum =
clGetDeviceInfo(device,CL_DEVICE_EXTENSIONS,strSize,str,&retSize);
throwException();
//some more queries (some with some without the same error)...
}
As you can see in the code, param_value_size > param_value_size_ret, so there should be no reason to return the error for that either. The param_name values are copied from the header, to be sure there are no typos.
It would be great if somebody knew an answer to this problem.
The OpenCL specification states that clGetDeviceInfo can return CL_INVALID_VALUE if (among other things):
... or if size in bytes specified by param_value_size is < size of return type as specified in table 4.3 ...
For the CL_DEVICE_EXTENSIONS query, you have allocated storage for 128 characters, and are passing 128 as the param_value_size parameter. If the device supports a lot of extensions, it is entirely possible that it needs more than 128 characters.
You can query the amount of space needed to store the query result by passing 0 and NULL to the param_value_size and param_value arguments, and then use this to allocate sufficient storage:
size_t retSize = 0;
clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, 0, NULL, &retSize);   // query required size first
std::vector<char> extensions(retSize);                              // allocate sufficient storage
clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, retSize, extensions.data(), &retSize);
I was wondering how I can get the graphics card model/brand from code, particularly from DirectX 9.0c (from within C++ code).
The easiest way in DirectX is through IDirect3D9::GetAdapterIdentifier.
Just create a D3DADAPTER_IDENTIFIER9 object, pass a pointer to it to GetAdapterIdentifier. DirectX fills out the graphics card description as a string in the Description field. It also includes information on which display device the card is, and what driver version you have.
You get something like this:
Description: "NVIDIA GeForce GTX 570"
Device: "\.\DISPLAY1"
Driver:
"nvd3dum.dll"
At runtime, you can query the device model and vendor:
In OpenGL, use the command glGetString(GL_VENDOR) or GL_RENDERER or GL_VERSION to get the information you're after.
In DirectX 9, it appears the info is in the Microsoft config system, and is queried from the device database. It's section 3 of this document, which also has example code: http://msdn.microsoft.com/en-us/library/bb204848(VS.85).aspx
Using the same system you can get such information as the amount of ram the video card has, the driver number, etc.
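A minimal sketch of the OpenGL query mentioned above (assumes the GL headers are included and a context is current):
const char* vendor   = reinterpret_cast<const char*>(glGetString(GL_VENDOR));
const char* renderer = reinterpret_cast<const char*>(glGetString(GL_RENDERER));
const char* version  = reinterpret_cast<const char*>(glGetString(GL_VERSION));
printf("%s / %s / %s\n", vendor, renderer, version);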
Take a look at Chapter 2. Direct3D from my book The Direct3D Graphics Pipeline. See section 2.12, Identifying a Particular Device.
You can use "DirecX Diagnostic Tool" API, like in sample DxDiagOutput from DX SDK
http://msdn.microsoft.com/en-us/library/ee416986%28v=VS.85%29.aspx
IDirect3D9* d3dobject = Direct3DCreate9(D3D_SDK_VERSION);
D3DPRESENT_PARAMETERS d3dpresent;
memset(&d3dpresent, 0, sizeof(D3DPRESENT_PARAMETERS));
d3dpresent.Windowed = TRUE;
d3dpresent.SwapEffect = D3DSWAPEFFECT_DISCARD;

UINT adaptercount = d3dobject->GetAdapterCount();
D3DADAPTER_IDENTIFIER9* adapters = (D3DADAPTER_IDENTIFIER9*)malloc(sizeof(D3DADAPTER_IDENTIFIER9) * adaptercount);
for (UINT i = 0; i < adaptercount; i++)
{
    d3dobject->GetAdapterIdentifier(i, 0, &(adapters[i]));
}
Then read the description of each adapter from adapters[i].Description.