Ampere A100 not supporting Memory Pools? - c++

I am trying to run the "graphMemoryNodes" sample provided with Cuda Toolkit 11.6 on a A100 card. It compiles fine but stops after some compatibility checks:
cudaDeviceGetAttribute(&deviceSupportsMemoryPools,
cudaDevAttrMemoryPoolsSupported, device);
if (!deviceSupportsMemoryPools) {
printf("Waiving execution as device does not support Memory Pools\n");
exit(EXIT_WAIVED);
} else {
printf("Setting up sample.\n");
}
Here, deviceSupportsMemoryPools is equal to 0, meaning that the card is not supported to use Memory Pools. Is it correct? It sounds surprising to me but I couldn't find the answer.
I am using VS2022 on Windows Server 2019. The card driver is 513.46.
Thank you for your help

Related

How to perform raytracing on a non-RTX graphics card?

I want to simulate a raytracing on non-RTX graphics card but I can't. I got this error "Raytracing not supported on device" that I indicate in a code at bottom. I set m_useWarpDevice to true but why I still got the error? According to my understand WARP makes an application run any feature (including raytracing) even the hardware is not supported, but why it doesn't work?
Question: How to perform raytracing on a non-RTX graphics card? The reason I insist is I tried to ask the question in Microsoft Forum but no answer.
What is Windows Advanced Rasterization Platform (WARP) Guide?
From https://learn.microsoft.com/en-us/windows/win32/direct3darticles/directx-warp
WARP does not require graphics hardware to execute. It can execute even in situations where hardware is not available or cannot be initialized.
From https://alternativesp.com/software/alternative/windows-advanced-rasterization-platform-warp/
In Windows 10, WARP has been updated to support Direct3D 12 at level 12_1; under Direct3D 12, WARP also replaces the reference rasterizer.
Compiler: Visual Studio 2019
Graphic card: NVIDIA GeForce 920M (non-RTX)
DXSample.cpp
From https://github.com/ScrappyCocco/DirectX-DXR-Tutorials/blob/master/01-Dx12DXRTriangle/Project/DXSample.cpp
At line 19
DXSample::DXSample(const UINT width, const UINT height, const std::wstring name) :
m_width(width),
m_height(height),
m_useWarpDevice(true), // <-- It was false but I set it to true.
m_title(name)
{
m_aspectRatio = static_cast<float>(width) / static_cast<float>(height);
}
D3D12HelloTriangle.cpp
From https://github.com/ScrappyCocco/DirectX-DXR-Tutorials/blob/master/01-Dx12DXRTriangle/Project/D3D12HelloTriangle.cpp
At line 91
if (m_useWarpDevice) { // m_useWarpDevice = true
ComPtr<IDXGIAdapter> warpAdapter;
ThrowIfFailed(factory->EnumWarpAdapter(IID_PPV_ARGS(&warpAdapter))); // <-- Success
ThrowIfFailed(D3D12CreateDevice(warpAdapter.Get(), D3D_FEATURE_LEVEL_12_1, IID_PPV_ARGS(&m_device))); // <-- Success
}
else {
ComPtr<IDXGIAdapter1> hardwareAdapter;
GetHardwareAdapter(factory.Get(), &hardwareAdapter);
ThrowIfFailed(D3D12CreateDevice(hardwareAdapter.Get(), D3D_FEATURE_LEVEL_12_1, IID_PPV_ARGS(&m_device)));
}
At line 494
void D3D12HelloTriangle::CheckRaytracingSupport() const {
D3D12_FEATURE_DATA_D3D12_OPTIONS5 options5 = {};
ThrowIfFailed(m_device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS5, &options5, sizeof(options5)));
if (options5.RaytracingTier < D3D12_RAYTRACING_TIER_1_0) // <-- options5.RaytracingTier = 0 on my computer which means that raytracing is not suppored.
throw std::runtime_error("Raytracing not supported on device"); // <-- I got this error.
}
Off-topic (just help in the future in case I forget):
https://alternativesp.com/software/alternative/windows-advanced-rasterization-platform-warp/
To force an application to use WARP without disabling the display driver, install the Direct X SDK. http://www.microsoft.com/en-us/download/details.aspx?id=6812 , go to C: / windows / system32, run dxcpl.exe, under “Scope” click “Edit list”, add the path to the application.
I tried to use dxcpl.exe to force WARP but options5.RaytracingTier is always 0.
Instead of using warp device you can use the dx12 RTX fallback layer.
https://github.com/microsoft/DirectX-Graphics-Samples/tree/e5ea2ac7430ce39e6f6d619fd85ae32581931589/Libraries/D3D12RaytracingFallback
Please note that is has a few limitations (resource binding is slightly different, also it's unlikely that they will continue to support it).
Also of course since it emulates the on chip RTX with compute shaders, performances are not as good as native.

CUDA GPU __global__ function does not complete

__global__ void functionA()
{
printf("functionA");
}
int main()
{
printf("main1");
functionA<<<1,1>>>();
printf("main2");
}
I'm trying to run a simple test with the above. But the program only outputs "main1". The program should output "functionA" and "main2" too.
This seems to have two reasons:
First of all you need to add
cudaDeviceSynchronize();
after the CUDA routine in order to block the main until the device has completed all tasks.
Furthermore this might happen if you set the wrong GPU architecture/compute capability XX when compiling the code
$ nvcc -gencode=arch=compute_XX,code=sm_XX -o my_app my_app.cu
In this case only the host code is run while the parts on the accelerator will be omitted it seems. You can find an overview of the corresponding number XX for the different hardware generations over here. The K20m you are running is 35. So it should be
$ nvcc -gencode=arch=compute_35,code=sm_35 -o my_app my_app.cu
in your case.
This might also occur if you have multiple graphic accelerators in your system and the code is executed on the wrong one. Each graphics card/accelerator is assigned a particular device id. The device with number 0 should be assigned automatically to the most powerful device and will be used by default. Therefore the first time I compiled the code on my system containing a powerful Tesla K80 (architecture 37) and a low power Quadro P620 (architecture 60) I selected 37 and had the same error as you have while when selecting 60 the code would run. I then used then the Querying Device Properties example to give me a list of the CUDA-capable devices and their corresponding device id, just to find out that on my system the Tesla K80 is set as 1 and 2 while the simple Quadro P620 graphics card is set as 0. I assume this is the case as the K80 is deprecated in CUDA 11!
You can select the device inside your code with cudaSetDevice or change it when launching the program with
$ CUDA_VISIBLE_DEVICES="1" ./my_app
where 1 has to be replaced by the device id you wish to use. Doing so should make your code run without any problems.
You can also test if this really is the issue this by cloning the Github repository of "Learn CUDA Programming", then browsing Chapter01/01_cuda_introduction/01_hello_world/, compile the make file with $ make and finally run it with $ ./hello_world. It automatically compiles for multiple architectures/compute capabilities and should therefore run without any issue!

How to check if a true hardware video adapter is used

I develop an application which shows something like a video in its window. I use technologies which are described here Introducing Direct2D 1.1. In my case the only difference is that eventually I create a bitmap using
ID2D1DeviceContext::CreateBitmap
then I use
ID2D1Bitmap::CopyFromMemory
to copy raw RGB data to it and then I call
ID2D1DeviceContext::DrawBitmap
to draw the bitmap. I use the high quality cubic interpolation mode D2D1_INTERPOLATION_MODE_HIGH_QUALITY_CUBIC for scaling to have the best picture but in some cases (RDP, Citrix, virtual machines, etc) it is very slow and has very high CPU consumption. It happens because in those cases a non-hardware video adapter is used. So for non-hardware adapters I am trying to turn off the interpolation and use faster methods. The problem is that I cannot exactly check if the system has a true hardware adapter.
When I call D3D11CreateDevice, I use it with D3D_DRIVER_TYPE_HARDWARE but on virtual machines it typically returns "Microsoft Basic Render Driver" which is a software driver and does not use GPU (it consumes CPU). So currently I check the vendor ID. If the vendor is AMD (ATI), NVIDIA or Intel, then I use the cubic interpolation. In the other case I use the fastest method which does not consume CPU a lot.
Microsoft::WRL::ComPtr<IDXGIDevice> dxgiDevice;
if (SUCCEEDED(m_pD3dDevice->QueryInterface(...)))
{
Microsoft::WRL::ComPtr<IDXGIAdapter> adapter;
if (SUCCEEDED(dxgiDevice->GetAdapter(&adapter)))
{
DXGI_ADAPTER_DESC desc;
if (SUCCEEDED(adapter->GetDesc(&desc)))
{
// NVIDIA
if (desc.VendorId == 0x10DE ||
// AMD
desc.VendorId == 0x1002 || // 0x1022 ?
// Intel
desc.VendorId == 0x8086) // 0x163C, 0x8087 ?
{
bSupported = true;
}
}
}
}
It works for physical (console) Windows session even in virtual machines. But for RDP sessions IDXGIAdapter still returns the vendors in case of real machines but it does not use GPU (I can see it via the Process Hacker 2 and AMD System Monitor (in case of ATI Radeon)) so I still have high CPU consumption with the cubic interpolation. In case of an RDP session to Windows 7 with ATI Radeon it is 10% bigger than via the physical console.
Or am I mistaken and somehow RDP uses GPU resources and that is the reason why it returns a real hardware adapter via IDXGIAdapter::GetDesc?
DirectDraw
Also I looked at DirectX Diagnostic Tool. It looks like the "DirectDraw Acceleration" info field returns exactly what I need. In case of physical (console) sessions it says "Enabled". In case of RDP and virtual machine (without hardware video acceleration) sessions it says "Not Available". I looked at sources and theoretically I can use the verification algorithm. But it is actually for DirectDraw which I do not use in my application. I would like to use something which is directly linked to ID3D11Device, IDXGIDevice, IDXGIAdapter and so on.
IDXGIAdapter1::GetDesc1 and DXGI_ADAPTER_FLAG
I also tried to use IDXGIAdapter1::GetDesc1 and check the flags.
Microsoft::WRL::ComPtr<IDXGIDevice> dxgiDevice;
if (SUCCEEDED(m_pD3dDevice->QueryInterface(...)))
{
Microsoft::WRL::ComPtr<IDXGIAdapter> adapter;
if (SUCCEEDED(dxgiDevice->GetAdapter(&adapter)))
{
Microsoft::WRL::ComPtr<IDXGIAdapter1> adapter1;
if (SUCCEEDED(adapter->QueryInterface(__uuidof(IDXGIAdapter1), reinterpret_cast<void**>(adapter1.GetAddressOf()))))
{
DXGI_ADAPTER_DESC1 desc;
if (SUCCEEDED(adapter1->GetDesc1(&desc)))
{
// desc.Flags
// DXGI_ADAPTER_FLAG_NONE = 0,
// DXGI_ADAPTER_FLAG_REMOTE = 1,
// DXGI_ADAPTER_FLAG_SOFTWARE = 2,
// DXGI_ADAPTER_FLAG_FORCE_DWORD = 0xffffffff
}
}
}
}
Information about the DXGI_ADAPTER_FLAG_SOFTWARE flag
Virtual Machine RDP Win Serv 2012 (Microsoft Basic Render Driver) -> (0x02) DXGI_ADAPTER_FLAG_SOFTWARE
Physical Win 10 (Intel Video) -> (0x00) DXGI_ADAPTER_FLAG_NONE
Physical Win 7 (ATI Radeon) - > (0x00) DXGI_ADAPTER_FLAG_NONE
RDP Win 10 (Intel Video) -> (0x00) DXGI_ADAPTER_FLAG_NONE
RDP Win 7 (ATI Radeon) -> (0x00) DXGI_ADAPTER_FLAG_NONE
In case of RDP session on a real machine with a hardware adapter, Flags == 0 but as I can see via Process Hacker 2 the GPU is not used. At least on Windows 7 with ATI Radeon I can see bigger CPU usage in case of an RDP session. So it looks like DXGI_ADAPTER_FLAG_SOFTWARE is only for Microsoft Basic Render Driver. So the issue is not solved.
The question
Is there a correct way to check if a real hardware video card (GPU) is used for the current Windows session? Or maybe it is possible to check if a specific interpolation mode of ID2D1DeviceContext::DrawBitmap has hardware implementation and uses GPU for the current session?
UPD
The topic is not about detecting RDP or Citrix sessions. It is not about detecting if the application is inside a virtual machine or not. I already have the all verifications and use the linear interpolation for those cases. The topic is about detecting if a real GPU is used for the current Windows session to display the desktop. I am looking for a more sophisticated solution to make decision using features of DirectX and DXGI.
If you want to detect the Microsoft Basic Renderer, the best option is to use it's VID/PID combo:
ComPtr<IDXGIDevice> dxgiDevice;
if (SUCCEEDED(device.As(&dxgiDevice)))
{
ComPtr<IDXGIAdapter> adapter;
if (SUCCEEDED(dxgiDevice->GetAdapter(&adapter)))
{
DXGI_ADAPTER_DESC desc;
if (SUCCEEDED(adapter->GetDesc(&desc)))
{
if ( (desc.VendorId == 0x1414) && (desc.DeviceId == 0x8c) )
{
// WARNING: Microsoft Basic Render Driver is active.
// Performance of this application may be unsatisfactory.
// Please ensure that your video card is Direct3D10/11 capable
// and has the appropriate driver installed.
}
}
}
}
See Microsoft Docs and Anatomy of Direct3D 11 Create Device
You will probably find for testing/debugging that you don't want to explicitly block these scenarios, but you do want to provide some kind of warning or notice feedback to the user that they are using software rather than hardware rendering.
Remote Desktop detection from Win32 classic desktop applications is better done directly via GetSystemMetrics( SM_REMOTESESSION ).
See Microsoft Docs
Answering a 3 years old question as I struggled myself to do so.
I had to go through the registry. First thing is to find the adapter LUID in the registry, to get the adapter GUID
private string GetAdapterGuid(long luid)
{
var directXRegistryKey = Registry.LocalMachine.OpenSubKey(#"SOFTWARE\Microsoft\DirectX");
if (directXRegistryKey == null)
return "";
var subKeyNames = directXRegistryKey.GetSubKeyNames();
foreach (var subKeyName in subKeyNames)
{
var subKey = directXRegistryKey.OpenSubKey(subKeyName);
if (subKey.GetValueKind("AdapterLuid") != RegistryValueKind.QWord)
continue;
var luidValue = (long)subKey.GetValue("AdapterLuid");
if (luidValue == luid)
return subKeyName;
}
return "";
}
Once you have that Guid, you can search for the details of the graphic card in HKLM like this. If it is virtual, the service name will be "INDIRECTKMD" :
private bool IsVirtualAdapter(string adapterGuid)
{
var videoRegistryKey = Registry.LocalMachine.OpenSubKey($#"SYSTEM\CurrentControlSet\Control\Video\{adapterGuid}\Video");
if (videoRegistryKey == null)
return false;
if (videoRegistryKey.GetValueKind("Service") != RegistryValueKind.String)
return false;
var serviceName = (string)videoRegistryKey.GetValue("Service");
return serviceName.ToUpper() == "INDIRECTKMD";
}
Checking the service name felt easier than parsing the DeviceDesc value.
My use case involved having the Guid ready so I split up the function, you could merge it into one.
It also only detect RDP/MSTSC through this, additional service names might be needed for other virtual adapters. Or you could try to detect only Nvidia/AMD/Intel driver names... up to you.

OpenCL finds platform, but cannot open them

I am currently using a Lenovo Yoga 510 which makes use of an AMD Radon R5 Graphics card. OpenCL works on it, but however, when I run my code to query and get platform details, the total number of available platforms is returned, but if gives an error that this platforms cannot be opened. Please see error message below.
Error: Failed to open platforms key SOFTWARE\Intel\OpenCL\Boards to load board library at runtime.
Either link to the board library while compiling your host code or refer to your board vendor's documentation on how to install the board library so that it can be loaded at runtime.
Failed to close platforms key (null), ignoring
Warning: Cannot find any Intel(R) FPGA Board libraries.
No Intel(R) FPGA devices will be loaded.
Please contact your board vender or see section "Linking Your Host Application to the Khronos ICD Loader Library" of the Programming Guide to set up FCD manually.
2 PLATFORM(s) FOUND
SEE MY CODE BELOW
[INCLUDE STATEMENTS]
int main() {
cl_int returned;
cl_int zero = (cl_int)0;
//SET-UP DEVICE EXECUTION ENVIRONMENT
cl_uint no_of_platforms;
//cl_uint no_of_entries;
cl_platform_id* platforms;
size_t device_info_val_size;
char* detail;
//1. Query and select the vendor specific platform
returned = clGetPlatformIDs(zero, NULL, &no_of_platforms);
if (returned == CL_SUCCESS) {
printf("%d PLATFORM(s) FOUND \n", no_of_platforms);
}
else {
printf("No Platform Found\n");
return EXIT_FAILURE; //Terminante programme
}
platforms = (cl_platform_id*)malloc(sizeof(cl_platform_id) * no_of_platforms); //create enough space to put platofrm IDs into
clGetPlatformIDs(no_of_platforms, platforms, NULL); //Fill in platform with their ID
free(platforms);
return 0;
}
Any Idea What I may be doing Wrong or have set-up wrong? I am wondering why it is looking for Intel FPGAs on my Radon graphics card
Based on what you've provided, it sounds like the OpenCL ICD (Installable Client Driver) is configured incorrectly. This can be caused by a number of factors (independently):
Old/outdated graphics drivers
Corruption caused by a system update/Registry edit
The most reliable advice is to update (or, as a last resort, reinstall) your graphics drivers. Unless your GPU/iGPU are too old to have working OpenCL drivers, this should set everything up correctly.
Since you're using MSVC, I'll also recommend you to download the OpenCL SDK provided by Intel (or AMD if this were an AMD CPU), as not only does this ensure that you have the most up-to-date headers and utilities associated with OpenCL, it also installs a CPU ICD for OpenCL, giving you an extra platform to test with.

Directx 12 - Adapter not supported

I am currently using a nvidia 675M and in directx 11 I was fine running with feature level 11_0
I am following guides for directx 12 and they say I can still create a device with the feature level 11_0 but when I run it says it is not supported
I know 100% I'm using the correct adapter as it says 675m
So just wondered if there is any way around this or another method or if simply I need a new graphics card :(
The NVidia 675M is a "Fermi" GPU which should be supported for DirectX 12 by NVIDIA per this post. The initial focus for NVidia's DX12 driver support is their Maxwell and Kepler parts, so check with NVidia for a driver that supports Fermi.
Another issue to keep in mind is that in systems with more than one graphics card, you need to be sure you are in fact picking the right adapter. The DirectX 12 VS templates use the following code to achieve this:
void DX::DeviceResources::GetAdapter(IDXGIAdapter1** ppAdapter)
{
*ppAdapter = nullptr;
ComPtr<IDXGIAdapter1> adapter;
for (UINT adapterIndex = 0; DXGI_ERROR_NOT_FOUND != m_dxgiFactory->EnumAdapters1(adapterIndex, adapter.ReleaseAndGetAddressOf()); ++adapterIndex)
{
DXGI_ADAPTER_DESC1 desc;
DX::ThrowIfFailed(adapter->GetDesc1(&desc));
if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE)
{
// Don't select the Basic Render Driver adapter.
continue;
}
// Check to see if the adapter supports Direct3D 12, but don't create the actual device yet.
if (SUCCEEDED(D3D12CreateDevice(adapter.Get(), m_d3dMinFeatureLevel, _uuidof(ID3D12Device), nullptr)))
{
#ifdef _DEBUG
WCHAR buff[256] = {};
swprintf_s(buff, L"Direct3D Adapter (%u): VID:%04X, PID:%04X - %ls\n", adapterIndex, desc.VendorId, desc.DeviceId, desc.Description);
OutputDebugStringW(buff);
#endif
break;
}
}
#if !defined(NDEBUG)
if (!adapter)
{
// Try WARP12 instead
if (FAILED(m_dxgiFactory->EnumWarpAdapter(IID_PPV_ARGS(adapter.ReleaseAndGetAddressOf()))))
{
throw std::exception("WARP12 not available. Enable the 'Graphics Tools' feature-on-demand");
}
OutputDebugStringA("Direct3D Adapter - WARP12\n");
}
#endif
if (!adapter)
{
throw std::exception("No Direct3D 12 device found");
}
*ppAdapter = adapter.Detach();
}
NVIDIA have not yet released a driver that supports DX12 on Fermi, so this won't work.
Initial support for DirectX 12 in Fermi was introduced in the current R384.76, as observed by users on Guru3D here and here, although the driver release notes do not state this.
You may want to run 3DMark Time Spy or a similar DirectX 12 workload to confirm this.