After upgrading to 1.2.162.1: vkQueueWaitIdle == VK_ERROR_DEVICE_LOST - c++

I recently upgraded my ray tracing renderer from Vulkan SDK version 1.2.148.0 to 1.2.162.1.
This was necessary because the ray tracing extension went out of beta and thus now works with non-beta
graphics drivers (am on version 461.40 for my RTX 2070 SUPER). It required me to make quite a few changes to the ray tracing side of my renderer which
I managed thanks to the nvidia tutorial.
Unfortunately, code that used to work started to cause errors now.
In many situations, submitting a single time command causes vkQueueWaitIdle to fail with VK_ERROR_DEVICE_LOST which results in a validation error, saying I'm trying to free the command buffer while it's still in use. This happens for a variety of uses: transitioning an image layout(undef to general it seems), building acceleration structures, copying buffers but not every time (e.g. from a staging to a device buffer, after which freeing the staging buffer also throws an error, since it's still in use, the copy not having finished)... But for other uses, it works fine. I can't currently identify a common denominator...
Finally, the program crashes because presenting the first frame fails, because its layout is undefined - I assume this is caused by one or more of the previously mentioned errors.
Did something change about this since last I used it? This is the offending code (endSingleTimeCommands):
vkEndCommandBuffer(commandBuffer);
VkSubmitInfo submitInfo{};
submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &commandBuffer;
vkQueueSubmit(graphicsQueue, 1, &submitInfo, VK_NULL_HANDLE);
switch (vkQueueWaitIdle(graphicsQueue)) {
//debug output removed for brevity
};
vkFreeCommandBuffers(device, commandPool, 1, &commandBuffer);
One of the places where it fails is this:
//[fill the structs with info...]
//function pointer grabbed via vkGetDeviceProcAddr
vk::vkCmdBuildAccelerationStructuresKHR(cmd, 1, &buildInfo, &buildOffset);
//[call to the above code here]
But also code unrelated to extensions fails (sometimes!) such as this one:
VkCommandBuffer commandBuffer = beginSingleTimeCommands();
VkBufferCopy copyRegion{};
copyRegion.srcOffset = 0; // Optional
copyRegion.dstOffset = 0; // Optional
copyRegion.size = size;
vkCmdCopyBuffer(commandBuffer, srcBuffer, dstBuffer, 1, &copyRegion);
endSingleTimeCommands(commandBuffer);
Perhaps beginSingleTimeCommands is also relevant:
VkCommandBufferAllocateInfo allocInfo{};
allocInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
allocInfo.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
allocInfo.commandPool = commandPool;
allocInfo.commandBufferCount = 1;
VkCommandBuffer commandBuffer;
if (vkAllocateCommandBuffers(device, &allocInfo, &commandBuffer) != VK_SUCCESS) {
std::cout << "beginSingleTimeCommands: could not allocate command buffer!\n";
}
VkCommandBufferBeginInfo beginInfo{};
beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
beginInfo.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT;
if (vkBeginCommandBuffer(commandBuffer, &beginInfo) != VK_SUCCESS) {
std::cout << "beginSingleTimeCommands: could not begin command buffer!\n";
}
return commandBuffer;
Some additional info I think I gathered:
I used the nvidia pipeline checkpoint system to add a checkpoint before and after the call to vkCmdBuildAccelerationStructuresKHR and both checkpoints are at TOP_OF_PIPE. After the first call to this function, no more checkpoint output is generated, leading me to believe that the first call to the build somehow ruins everything. I will triplecheck my AS building I guess, I'll get back to you if I find anything.

Turns out, the actual error can occur before the command buffer whose vkQueueWaitIdle returns the DEVICE_LOST error. I've had and continue to have a variety of errors in my acceleration structure building code. I can't easily debug it, because apparently the validation layers don't show if there's subtle mistakes in the structs fed to vkCmdBuildAccelerationStructures, instead it's a lot of trial and error.
One notable example which I'm certain would've been caught by the validation layers pre-upgrade is forgetting to set the VkAccelerationStructureBuildGeometryInfoKHR::scratchData field, the last mistake I had to fix to finally get everything to run.
The answer to my question is thus: Don't look at the commands that trigger the DEVICE_LOST, look at what you do with the queue before that command, there's a chance the error is there, instead. In fact, once the first DEVICE_LOST error occurred, (almost?) all further vkQueueWaitIdle failed with the same error (same with the vkQueueSubmit). In cases such as my copy buffer code being the first to fail, the error was always found in the queue usage before that one.
I can't post the exact solution to my problem as - like I've said - there's more than one cause and I've only fixed some of them so far, there's still some left. I think the details are not relevant to future people who come across my question but if there's anything I can add to help other people, please let me know.

This is so true! I was stuck with this issue for couple of days only to figure out that VkAccelerationStructureBuildGeometryInfoKHR flags was mismatching when I query the size using vkGetAccelerationStructureBuildSizesKHR() vs when I use it to actually build the BLAS! In my case, I was using VK_BUILD_ACCELERATION_STRUCTURE_PREFER_FAST_TRACE_BIT_KHR | VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_UPDATE_BIT_KHR while querying the size and only FAST_TRACE while actually creating the AS, this was causing the same issue!

Related

DX12 Initialization Fail VS2019

I have done the init steps in DX10/11/12 many times before, all of a sudden in VS2019 DX12 will not create anything besides the following objects: ID3D12Debug, ID3D12InfoQueue, ID3D12Device2.
Even a straight creation of command queue fails:
bool DX12ObjectFactory::CreateCommandQueue(ID3D12Device* pDevice, __out
ID3D12CommandQueue** ppCmdQueue, const D3D12_COMMAND_QUEUE_DESC& queueCreateDesc)
{
OnFailedThrow(pDevice->CreateCommandQueue(&queueCreateDesc,
IID_PPV_ARGS(&*ppCmdQueue)));
return true;
}
HRESULT message is:
hr = 0x00000108 : An open/create operation completed while an oplock break is underway.
Error code lookup points to: ERROR_TOO_MANY_POSTS 298 (0x12A)
Weird thing is that things were working a few days ago, maybe a Windows update broke it...
Thanks
D3D12_COMMAND_QUEUE_DESC was initialized properly, issues seemed to be use of IID_PPV_ARGS, as it was fine with the old way of using IID_ID3D12CommandQueue, (void**)&(*ppCmdQueue).
Also my swapchain issue I forgot to initialize buffer count with a value >= 2.

Linking Cuda (cudart.lib) makes DXGI DuplicateOutput1() fail

For an obscure reason my call to IDXGIOutput5::DuplicateOutput1() fail with error 0x887a0004 (DXGI_ERROR_UNSUPPORTED) after I added cudart.lib in my project.
I work on Visual Studio 2019, my code for monitor duplication is the classic :
hr = output5->DuplicateOutput1(this->dxgiDevice, 0, sizeof(supportedFormats) / sizeof(DXGI_FORMAT), supportedFormats, &this->dxgiOutputDuplication);
And the only thing I tried to do with cuda at the moment is simply to list the Cuda devices :
int nDevices = 0;
cudaError_t error = cudaGetDeviceCount(&nDevices);
for (int i = 0; i < nDevices; i++) {
cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, i);
LOG_FUNC_DEBUG("Graphic adapter : Descripion: %s, Memory Clock Rate : %d kHz, Memory Bus Width : %u bits",
prop.name,
prop.memoryClockRate,
prop.memoryBusWidth
);
}
Moreover this piece of code is called far later after I try to start monitor duplication with DXGI.
Every thing seems correct in my application : I do a call to SetProcessDpiAwarenessContext(DPI_AWARENESS_CONTEXT_PER_MONITOR_AWARE_V2), and I'm not running on e discrete GPU (see [https://support.microsoft.com/en-us/help/3019314/error-generated-when-desktop-duplication-api-capable-application-is-ru][1])
And by the way it used to work, and it works again if I just remove the "so simple" Cuda call and the cudart.lib from the linker input !
I really don't understand what can cause this strange behavior, any idea ?
...after I added cudart.lib in my project
When you link CUDA library you force your application to run on discrete GPU. You already know this should be avoided, but you still force it through this link.
...and I'm not running on e discrete GPU...
You are, static link to CUDA is a specific case which hints to use dGPU.
There are systems where Desktop Duplication is not working against dGPU and yours seems to be one of those. Even though unobvious, you are seeing behavior by [NVIDIA] design.
(There are however also other systems where Desktop Duplication is working against dGPU and is not working against iGPU).
Your potential solution is along this line:
Application is not directly linked against cuda.lib or cudart.lib or LoadLibrary to dynamically load the nvcuda.dll or cudart*.dll and uses GetProcAddress to retrieve function addresses from nvcuda.dll or cudart*.dll.

Trouble Reliably Enabling/Disabling MMCSS

I'm trying to implement MMCSS in my project to improve timing performance, and I am seeing odd behavior. I'm a low-level embedded software developer struggling to do software, and this is my first post here, so your patience and understanding is appreciated if my description needs work.
I can't share all my code, but here's some pseudo-code showing how I'm applying MMCSS (taken mostly from the exclusive-mode streams example from MSDN here: https://msdn.microsoft.com/en-us/library/bb614507.aspx):
main()
{
\\ setup/initialize some things
DWORD taskIndex = 0;
HANDLE hTask = NULL;
hTask = AvSetMmThreadCharacteristics(TEXT("Pro Audio"), &taskIndex);
if (hTask == NULL)
{
return 1;
}
while(1)
{
\\ print time since last loop
\\ do things - some conditions will break out of loop
\\ sleep
}
if (hTask != NULL)
{
AvRevertMmThreadCharacteristics(hTask);
}
}
The behavior that I'm seeing is hard to fully characterize. When I run my project, if MMCSS is successfully applied (which it may not be the first few times I run my code), it appears to work reliably the whole time I'm running the exe. If it ran right the previous time, then it will run right every time after that. However, if I comment out the MMCSS code and rebuild and run, it will continue to behave as if MMCSS is applied. I have found that if I restart my computer then run again, I will get timing information indicating MMCSS is no longer being applied.
It seems like I am missing something that is causing the transition from MMCSS enabled to disabled (and vice versa) to be messy. Am I calling the SetMmThreadCharacteristics and RevertMmThreadCharacteristics in the wrong place? Is there some other/additional method(s) I should be calling?

QCamera::start gives mysterious "failed to start" log message

It's just plain luck my program is so simple, so I eventually found out what causes the mysterious log message. My program log looks like this:
Debugging starts
failed to start
Debugging has finished
Which happens after:
camera = new QCamera(QCameraInfo::defaultCamera());
// see http://omg-it.works/how-to-grab-video-frames-directly-from-qcamera/
camera->setViewfinder(frameGrabber = new CameraFrameGrabber());
camera->start();
The start() method causes this message in console. Now the meaning of the message is obvious, what it's not very helpful. What steps should I take to troubleshoot it?
Reasons for this might differ, but in my case it was simply because I provided invalid QCameraInfo. The culprit is that QCameraInfo::defaultCamera() might return invalid value if Qt fails to detect any cameras on your system, which unfortunately happens even if cameras are present.

cvHaarDetectObjects(): "Stack aound the variable 'seq_thread' was corrupted."

I have been looking in to writing my own implementation of Haar Cascaded face detection for some time now, and have begun with diving in to the OpenCV 2.0 implementation.
Right out of the box, running in debug mode, Visual Studio breaks on cvhaar.cpp:1518, informing me:
Run-Time Check Failure #2 - Stack aound the variable seq_thread was corrupted.
It seems odd to me that OpenCV ships with a simple array out-of-bounds problem. Running the release works without any problems, but I suspect that it is merely not performing the check and the array is exceeding the bounds.
Why am I receiving this error message? Is it a bug in OpenCV?
A little debugging revealed the culprit, I believe. I "fixed" it, but this all still seems odd to me.
An array of size CV_MAX_THREADS is created on cvhaar.cpp:868:
CvSeq* seq_thread[CV_MAX_THREADS] = {0};
On line 918 it proceeds to specify max_threads:
max_threads = cvGetNumThreads();
In various places, seq_thread is looped using the following for statement:
for( i = 0; i < max_threads; i++ ) {
CvSeq* s = seq_thread[i];
// ...
}
However, cxmisc.h:108 declares CV_MAX_THREADS:
#define CV_MAX_THREADS 1
Hence, the declaration of seq_thread must never be allowed to exceed size 1, yet cvGetNumThreads() returns 2 (I assume this reflects the number of cores in my machine).
I resolved the problem by adding the following simple little statement:
if (max_threads > CV_MAX_THREADS) max_threads = CV_MAX_THREADS;
Does any of this make sense?