Vulkan timeline semaphore extension cannot be enabled - c++

I've been at this for a better part of today and I'm at the end of my wits.
I'm running Vulkan SDK 1.2.131.2
I have a RTX 2080 Ti.
I have Windows 10 Education, version 1909, build 18363.657.
I'm using Vulkan.hpp instead of Vulkan.h directly.
Here is where I specify the API version I use:
appInfo.apiVersion = VK_API_VERSION_1_2;
This is the relevant part of the code that creates the device:
// bla bla
const std::vector<const char*> deviceExtensions = {
VK_KHR_SWAPCHAIN_EXTENSION_NAME,
VK_KHR_TIMELINE_SEMAPHORE_EXTENSION_NAME,
VK_NV_RAY_TRACING_EXTENSION_NAME
};
deviceCreateInfo.enabledExtensionCount = static_cast<uint32_t>(deviceExtensions.size());
deviceCreateInfo.ppEnabledExtensionNames = deviceExtensions.data();
m_logicalDevice = m_physicalDevice.createDeviceUnique(deviceCreateInfo);
I use the following validation layers:
"VK_LAYER_LUNARG_api_dump"
"VK_LAYER_KHRONOS_validation"
This is how I later try to create a timeline semaphore:
vk::UniqueSemaphore VulkanContext::createTimelineSemaphore(const uint32_t initialValue) const {
vk::SemaphoreTypeCreateInfo timelineCreateInfo;
timelineCreateInfo.semaphoreType = vk::SemaphoreType::eTimeline;
timelineCreateInfo.initialValue = initialValue;
vk::SemaphoreCreateInfo createInfo;
createInfo.pNext = &timelineCreateInfo;
return m_logicalDevice->createSemaphoreUnique(createInfo);
}
I get the following error:
vkCreateSemaphore(device, pCreateInfo, pAllocator, pSemaphore) returns VkResultVkCreateSemaphore: timelineSemaphore feature is not enabled, can not create timeline semaphores The Vulkan spec states: If the timelineSemaphore feature is not enabled, semaphoreType must not equal VK_SEMAPHORE_TYPE_TIMELINE (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-VkSemaphoreTypeCreateInfo-timelineSemaphore-03252)
This is even more infuriating because timeline semaphores are supposed to be part of the core Vulkan 1.2, but I get the same error even if I ommit it from the extension list. Swapchain extension does work and I haven't had the time to verify that ray tracing extension is enabled.
It gets even more stupid because the next message tells me this:
VK_SUCCESS (0):
device: VkDevice = 0000023AA29BD8B0
pCreateInfo: const VkSemaphoreCreateInfo* = 0000008D145ED538:
sType: VkStructureType = VK_STRUCTURE_TYPE_SEMAPHORE_CREATE_INFO (9)
pNext: VkSemaphoreTypeCreateInfo = 0000008D145ED4F8:
sType: VkStructureType = VK_STRUCTURE_TYPE_SEMAPHORE_TYPE_CREATE_INFO (1000207002)
pNext: const void* = NULL
semaphoreType: VkSemaphoreType = VK_SEMAPHORE_TYPE_TIMELINE (1)
initialValue: uint64_t = 0
flags: VkSemaphoreCreateFlags = 0
pAllocator: const VkAllocationCallbacks* = NULL
pSemaphore: VkSemaphore* = AA989B000000001E
I have no idea if this creates the timeline semaphore or just creates a normal binary one.
When I later use it to submit to a transfer queue:
vk::CommandBufferBeginInfo beginInfo;
transferCmdBuffer->begin(beginInfo);
object->recordUploadToGPU(*transferCmdBuffer);
transferCmdBuffer->end();
vk::TimelineSemaphoreSubmitInfo timelineSubmitInfo;
timelineSubmitInfo.signalSemaphoreValueCount = 1;
timelineSubmitInfo.pSignalSemaphoreValues = &signalValue;
vk::SubmitInfo submitInfo;
submitInfo.pNext = &timelineSubmitInfo;
submitInfo.signalSemaphoreCount = 1;
submitInfo.pSignalSemaphores = &signalSemaphore;
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &*transferCmdBuffer;
vkCtx.m_transferQueue.submit(submitInfo, nullptr);
I get this error here:
vkQueueSubmit(queue, submitCount, pSubmits, fence) returns VkResultVkQueueSubmit: VkQueue 0x23aa2539500[] contains timeline sempahore VkSemaphore 0xaa989b000000001e[] that sets its wait value with a margin greater than maxTimelineSemaphoreValueDifference The Vulkan spec states: For each element of pSignalSemaphores created with a VkSemaphoreType of VK_SEMAPHORE_TYPE_TIMELINE the corresponding element of VkTimelineSemaphoreSubmitInfo::pSignalSemaphoreValues must have a value which does not differ from the current value of the semaphore or the value of any outstanding semaphore wait or signal operation on that semaphore by more than maxTimelineSemaphoreValueDifference. (https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html#VUID-VkSubmitInfo-pSignalSemaphores-03244)
And just to be futher mocked, this is the next line:
VK_SUCCESS (0):
queue: VkQueue = 0000023AA2539500
submitCount: uint32_t = 1
pSubmits: const VkSubmitInfo* = 0000008D145ED370
pSubmits[0]: const VkSubmitInfo = 0000008D145ED370:
sType: VkStructureType = VK_STRUCTURE_TYPE_SUBMIT_INFO (4)
pNext: VkTimelineSemaphoreSubmitInfo = 0000008D145ED318:
sType: VkStructureType = VK_STRUCTURE_TYPE_TIMELINE_SEMAPHORE_SUBMIT_INFO (1000207003)
pNext: const void* = NULL
waitSemaphoreValueCount: uint32_t = 0
pWaitSemaphoreValues: const uint64_t* = NULL
signalSemaphoreValueCount: uint32_t = 1
pSignalSemaphoreValues: const uint64_t* = 0000008D145ED740
pSignalSemaphoreValues[0]: const uint64_t = 1
waitSemaphoreCount: uint32_t = 0
pWaitSemaphores: const VkSemaphore* = NULL
pWaitDstStageMask: const VkPipelineStageFlags* = NULL
commandBufferCount: uint32_t = 1
pCommandBuffers: const VkCommandBuffer* = 0000008D145EF408
pCommandBuffers[0]: const VkCommandBuffer = 0000023AA9CEC8E0
signalSemaphoreCount: uint32_t = 1
pSignalSemaphores: const VkSemaphore* = 0000008D145EF430
pSignalSemaphores[0]: const VkSemaphore = AA989B000000001E
fence: VkFence = 0000000000000000
I've also tried with VK_API_VERSION_1_1 and VK_API_VERSION_1_0, both with and without enabling the extension explicitly, none of them work.
The dumps are from the VK_LAYER_LUNARG_api_dump validation layer, while VK_LAYER_KHRONOS_validation validation layer is the one spewing out the errors. They seem to disagree.
So, what gives?
In what way am I stupid today?
EDIT:
Here is an example that you should be able to run yourself. I think I made it as minimal as I can:
#include <vulkan/vulkan.hpp>
#include <iostream>
VKAPI_ATTR VkBool32 VKAPI_CALL debugCallback(
VkDebugUtilsMessageSeverityFlagBitsEXT messageSeverity,
VkDebugUtilsMessageTypeFlagsEXT messageType,
const VkDebugUtilsMessengerCallbackDataEXT* pCallbackData,
void* pUserData) {
std::cerr << pCallbackData->pMessage << std::endl;
return VK_FALSE;
};
int main() {
vk::ApplicationInfo appInfo;
appInfo.apiVersion = VK_API_VERSION_1_2;
vk::InstanceCreateInfo instanceCreateInfo;
instanceCreateInfo.pApplicationInfo = &appInfo;
std::vector<const char*> extensions;
extensions.push_back(VK_EXT_DEBUG_UTILS_EXTENSION_NAME);
instanceCreateInfo.enabledExtensionCount = static_cast<uint32_t>(extensions.size());
instanceCreateInfo.ppEnabledExtensionNames = extensions.data();
const std::vector<const char*> validationLayers = {
"VK_LAYER_LUNARG_api_dump",
"VK_LAYER_KHRONOS_validation"
};
instanceCreateInfo.enabledLayerCount = static_cast<uint32_t>(validationLayers.size());
instanceCreateInfo.ppEnabledLayerNames = validationLayers.data();
vk::DebugUtilsMessengerCreateInfoEXT debugCreateInfo;
debugCreateInfo.messageSeverity =
vk::DebugUtilsMessageSeverityFlagBitsEXT::eInfo |
vk::DebugUtilsMessageSeverityFlagBitsEXT::eVerbose |
vk::DebugUtilsMessageSeverityFlagBitsEXT::eWarning |
vk::DebugUtilsMessageSeverityFlagBitsEXT::eError;
debugCreateInfo.messageType =
vk::DebugUtilsMessageTypeFlagBitsEXT::eGeneral |
vk::DebugUtilsMessageTypeFlagBitsEXT::eValidation |
vk::DebugUtilsMessageTypeFlagBitsEXT::ePerformance;
debugCreateInfo.pfnUserCallback = debugCallback;
instanceCreateInfo.pNext = &debugCreateInfo;
vk::Instance m_instance = vk::createInstance(instanceCreateInfo);
vk::DispatchLoaderDynamic m_loader = vk::DispatchLoaderDynamic(m_instance, vkGetInstanceProcAddr);
vk::DebugUtilsMessengerEXT m_debugMessenger = m_instance.createDebugUtilsMessengerEXT(debugCreateInfo, nullptr, m_loader);
vk::PhysicalDevice m_physicalDevice = m_instance.enumeratePhysicalDevices()[0];
std::vector<vk::DeviceQueueCreateInfo> queueCreateInfos;
vk::DeviceQueueCreateInfo queueInfo;
queueInfo.queueFamilyIndex = 0;
queueInfo.queueCount = 1;
queueCreateInfos.push_back(queueInfo);
vk::PhysicalDeviceFeatures deviceFeatures;
vk::DeviceCreateInfo deviceCreateInfo;
deviceCreateInfo.pQueueCreateInfos = queueCreateInfos.data();
deviceCreateInfo.queueCreateInfoCount = static_cast<uint32_t>(queueCreateInfos.size());
deviceCreateInfo.pEnabledFeatures = &deviceFeatures;
// This part can be omitted from here...
const std::vector<const char*> deviceExtensions = {
VK_KHR_TIMELINE_SEMAPHORE_EXTENSION_NAME
};
deviceCreateInfo.enabledExtensionCount = static_cast<uint32_t>(deviceExtensions.size());
deviceCreateInfo.ppEnabledExtensionNames = deviceExtensions.data();
// ...to here. It doesn't work either way.
vk::Device m_logicalDevice = m_physicalDevice.createDevice(deviceCreateInfo);
vk::SemaphoreTypeCreateInfo timelineCreateInfo;
timelineCreateInfo.semaphoreType = vk::SemaphoreType::eTimeline;
timelineCreateInfo.initialValue = 0;
vk::SemaphoreCreateInfo semaphoreCreateInfo;
semaphoreCreateInfo.pNext = &timelineCreateInfo;
m_logicalDevice.createSemaphore(semaphoreCreateInfo);
}

The feature needs to be enabled explicitly as well:
vk::PhysicalDeviceVulkan12Features features;
features.timelineSemaphore = true;
vk::DeviceCreateInfo deviceCreateInfo;
deviceCreateInfo.pNext = &features;

Related

open62541 client fails when calling method with custom datatype input argument

I'm using open62541 to connect to an OPC/UA server and I'm trying to call methods that a certain object on that server provides. Those methods have custom types as input arguments; for example, the following method takes a structure of three booleans:
<opc:Method SymbolicName="SetStatusMethodType" ModellingRule="Mandatory">
<opc:InputArguments>
<opc:Argument Name="Status" DataType="VisionStatusDataType" ValueRank="Scalar"/>
</opc:InputArguments>
<opc:OutputArguments />
</opc:Method>
Here, VisionStatusDataType is the following structure:
<opc:DataType SymbolicName="VisionStatusDataType" BaseType="ua:Structure">
<opc:ClassName>VisionStatus</opc:ClassName>
<opc:Fields>
<opc:Field Name="Camera" DataType="ua:Boolean" ValueRank="Scalar"/>
<opc:Field Name="StrobeController" DataType="ua:Boolean" ValueRank="Scalar"/>
<opc:Field Name="Server" DataType="ua:Boolean" ValueRank="Scalar"/>
</opc:Fields>
</opc:DataType>
Now, when calling the method, I'm encoding the data into an UA_ExtensionObject, and wrap that one as an UA_Variant to provide it to UA_Client_call. The encoding looks like this:
void encode(const QVariantList& vecqVar, size_t& nIdx, const DataType& dt, std::back_insert_iterator<std::vector<UAptr<UA_ByteString>>> itOut)
{
if (dt.isSimple())
{
auto&& qVar = vecqVar.at(nIdx++);
auto&& uaVar = convertToUaVar(qVar, dt.uaType());
auto pOutBuf = create<UA_ByteString>();
auto nStatus = UA_encodeBinary(uaVar.data, dt.uaType(), pOutBuf.get());
statusCheck(nStatus);
itOut = std::move(pOutBuf);
}
else
{
for (auto&& dtMember : dt.members())
encode(vecqVar, nIdx, dtMember, itOut);
}
}
UA_Variant ToUAVariant(const QVariant& qVar, const DataType& dt)
{
if (dt.isSimple())
return convertToUaVar(qVar, dt.uaType());
else
{
std::vector<UAptr<UA_ByteString>> vecByteStr;
auto&& qVarList = qVar.toList();
size_t nIdx = 0UL;
encode(qVarList, nIdx, dt, std::back_inserter(vecByteStr));
auto pExtObj = UA_ExtensionObject_new();
pExtObj->encoding = UA_EXTENSIONOBJECT_ENCODED_BYTESTRING;
auto nSizeAll = std::accumulate(vecByteStr.cbegin(), vecByteStr.cend(), 0ULL, [](size_t nSize, const UAptr<UA_ByteString>& pByteStr) {
return nSize + pByteStr->length;
});
auto&& uaEncoded = pExtObj->content.encoded;
uaEncoded.typeId = dt.uaType()->typeId;
uaEncoded.body.length = nSizeAll;
auto pData = uaEncoded.body.data = new UA_Byte[nSizeAll];
nIdx = 0UL;
for (auto&& pByteStr : vecByteStr)
{
memcpy_s(pData + nIdx, nSizeAll - nIdx, pByteStr->data, pByteStr->length);
nIdx += pByteStr->length;
}
UA_Variant uaVar;
UA_Variant_init(&uaVar);
UA_Variant_setScalar(&uaVar, pExtObj, &UA_TYPES[UA_TYPES_EXTENSIONOBJECT]);
return uaVar;
}
}
The DataType class is a wrapper for the UA_DataType structure; the original open62541 type can be accessed via DataType::uaType().
Now, once a have the variant (containing the extension object), the method call looks like this:
auto uavarInput = ToUAVariant(qvarArg, dtInput);
UA_Variant* pvarOut;
size_t nOutSize = 0UL;
auto nStatus = UA_Client_call(m_pClient, objNode.nodeId(), m_uaNodeId, 1UL, &uavarInput, &nOutSize, &pvarOut);
The status is 2158690304, i.e. BadInvalidArgument according to UA_StatusCode_name.
Is there really something wrong with the method argument? Are we supposed to send ExtensionObjects, or what data type should the variant contain?
Is it possible that the server itself (created using the .NET OPC/UA stack) is not configured correctly?
N.B., the types here are custom types; that is, the encoding is done manually (see above) by storing the byte representation of all members next to each other in an UA_ByteString - just the opposite of what I'm doing when reading variables or output arguments, which works just fine.
The problem is the typeId of the encoded object. For the server in order to understand the received data, it needs to know the NodeId of the encoding, not the actual NodeId of the type itself. That encoding can be found by following the HasEncoding reference (named "Default Binary") of the type:
auto pRequest = create<UA_BrowseRequest>();
auto pDescr = pRequest->nodesToBrowse = UA_BrowseDescription_new();
pRequest->nodesToBrowseSize = 1UL;
pDescr->nodeId = m_uaNodeId;
pDescr->resultMask = UA_BROWSERESULTMASK_ALL;
pDescr->browseDirection = UA_BROWSEDIRECTION_BOTH;
pDescr->referenceTypeId = UA_NODEID_NUMERIC(0, UA_NS0ID_HASENCODING);
auto response = UA_Client_Service_browse(m_pClient, *pRequest);
for (auto k = 0UL; k < response.resultsSize; ++k)
{
auto browseRes = response.results[k];
for (auto n = 0UL; n < browseRes.referencesSize; ++n)
{
auto browseRef = browseRes.references[n];
if (ToQString(browseRef.browseName.name).contains("Binary"))
{
m_nodeBinaryEnc = browseRef.nodeId.nodeId;
break;
}
}
}
Once you have that NodeId, you pass it to UA_ExtensionObject::content::encoded::typeId:
auto pExtObj = UA_ExtensionObject_new();
pExtObj->encoding = UA_EXTENSIONOBJECT_ENCODED_BYTESTRING;
auto nSizeAll = std::accumulate(vecByteStr.cbegin(), vecByteStr.cend(), 0ULL, [](size_t nSize, const UAptr<UA_ByteString>& pByteStr) {
return nSize + pByteStr->length;
});
auto&& uaEncoded = pExtObj->content.encoded;
uaEncoded.typeId = dt.encoding();
uaEncoded.body.length = nSizeAll;
auto pData = uaEncoded.body.data = new UA_Byte[nSizeAll];
nIdx = 0UL;
for (auto&& pByteStr : vecByteStr)
{
memcpy_s(pData + nIdx, nSizeAll - nIdx, pByteStr->data, pByteStr->length);
nIdx += pByteStr->length;
}

Compiler "Optimizes" Out Object Initialization Function

I have an object that I need to populate before using called pipelineInfo. To populate the object I use a function called createPipelineInfo. This works perfectly well when I use visual studios to compile a debug build but when I try to compile a release build the compiler "optimizes" out the entire createPipelineInfo function.
Here is the call to initialize the object and its use:
VkGraphicsPipelineCreateInfo pipelineInfo = createPipelineInfo(shaderStages, vertexInputInfo, inputAssembly, viewportState, rasterizer,
multisampling, colorBlending, pipelineLayout, renderPass);
if (vkCreateGraphicsPipelines(logicalDevice, VK_NULL_HANDLE, 1, &pipelineInfo, nullptr, &graphicsPipeline) != VK_SUCCESS) {
throw std::runtime_error("failed to create graphics pipeline!");
}
The following is the createPipelineInfo function:
inline static VkGraphicsPipelineCreateInfo createPipelineInfo(
const std::array<VkPipelineShaderStageCreateInfo, 2> shaderStages,
const VkPipelineVertexInputStateCreateInfo& vertexInputInfo,
const VkPipelineInputAssemblyStateCreateInfo& inputAssembly,
const VkPipelineViewportStateCreateInfo& viewportState,
const VkPipelineRasterizationStateCreateInfo& rasterizer,
const VkPipelineMultisampleStateCreateInfo& multisampling,
const VkPipelineColorBlendStateCreateInfo& colorBlending,
const VkPipelineLayout& pipelineLayout,
const VkRenderPass& renderPass) {
VkGraphicsPipelineCreateInfo pipelineInfo{};
//Shader Stage
pipelineInfo.sType = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO;
pipelineInfo.stageCount = 2;
pipelineInfo.pStages = shaderStages.data();
//Fixed Pipeline Stage
pipelineInfo.pVertexInputState = &vertexInputInfo;
pipelineInfo.pInputAssemblyState = &inputAssembly;
pipelineInfo.pViewportState = &viewportState;
pipelineInfo.pRasterizationState = &rasterizer;
pipelineInfo.pMultisampleState = &multisampling;
//pipelineInfo.pDepthStencilState = &depthStencil;
pipelineInfo.pColorBlendState = &colorBlending;
pipelineInfo.pDynamicState = nullptr; // Optional
//Pipeline Layout
pipelineInfo.layout = pipelineLayout;
pipelineInfo.renderPass = renderPass;
pipelineInfo.subpass = 0;
pipelineInfo.basePipelineHandle = VK_NULL_HANDLE;
return pipelineInfo;
}
On the other hand if I copy the body of the function and dump it in place of the function call everything works perfectly fine.
VkGraphicsPipelineCreateInfo pipelineInfo{};
//Shader Stage
pipelineInfo.sType = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO;
pipelineInfo.stageCount = 2;
pipelineInfo.pStages = shaderStages.data();
//Fixed Pipeline Stage
pipelineInfo.pVertexInputState = &vertexInputInfo;
pipelineInfo.pInputAssemblyState = &inputAssembly;
pipelineInfo.pViewportState = &viewportState;
pipelineInfo.pRasterizationState = &rasterizer;
pipelineInfo.pMultisampleState = &multisampling;
//pipelineInfo.pDepthStencilState = &depthStencil;
pipelineInfo.pColorBlendState = &colorBlending;
pipelineInfo.pDynamicState = nullptr; // Optional
//Pipeline Layout
pipelineInfo.layout = pipelineLayout;
pipelineInfo.renderPass = renderPass;
pipelineInfo.subpass = 0;
pipelineInfo.basePipelineHandle = VK_NULL_HANDLE;
if (vkCreateGraphicsPipelines(logicalDevice, VK_NULL_HANDLE, 1, &pipelineInfo, nullptr, &graphicsPipeline) != VK_SUCCESS) {
throw std::runtime_error("failed to create graphics pipeline!");
}
I'm trying to figure out why the compiler is optimizing out the function call and failing that trying to develop a work around that doesn't involve dumping the body of the function in place of every call to the function.
wild guess: this parameter is passed by copy
const std::array<VkPipelineShaderStageCreateInfo, 2> shaderStages
so when taking the address of its contents here with data method call:
pipelineInfo.pStages = shaderStages.data();
you invoke undefined behaviour. The compiler isn't smart enough to 1) warn you about taking a reference to temporary because of the complexity of the calls, and 2) it doesn't automatically perform copy elision on parameter passing.
Fix: pass it by reference (note that all other parameters use a by reference mode for a reason)
const std::array<VkPipelineShaderStageCreateInfo, 2> &shaderStages

vkQueueSubmit blocks when using timeline semaphores

I need to run a function on CPU between two GPU batches. For this I use timeline semaphores. As far as I know, vkQueueSubmit does not block. However, it blocks when I submit these GPU batches:
uint64_t host_wait = timeline;
uint64_t host_signal = ++timeline;
uint64_t wait0 = timeline;
uint64_t signal0 = ++timeline;
uint64_t wait1 = timeline;
uint64_t signal1 = ++timeline;
VkPipelineStageFlags wait_mask = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
VkTimelineSemaphoreSubmitInfo sp_submit0 = {
.sType = VK_STRUCTURE_TYPE_TIMELINE_SEMAPHORE_SUBMIT_INFO,
.waitSemaphoreValueCount = 1,
.pWaitSemaphoreValues = &wait0,
.signalSemaphoreValueCount = 1,
.pSignalSemaphoreValues = &signal0,
};
VkSubmitInfo submit0 = {
.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
.pNext = &sp_submit0,
.waitSemaphoreCount = 1,
.pWaitSemaphores = &timeline_semaphore,
.pWaitDstStageMask = &wait_mask,
.signalSemaphoreCount = 1,
.pSignalSemaphores = &timeline_semaphore,
};
VkTimelineSemaphoreSubmitInfo sp_submit1 = {
.sType = VK_STRUCTURE_TYPE_TIMELINE_SEMAPHORE_SUBMIT_INFO,
.waitSemaphoreValueCount = 1,
.pWaitSemaphoreValues = &wait1,
.signalSemaphoreValueCount = 1,
.pSignalSemaphoreValues = &signal1,
};
VkSubmitInfo submit1 = {
.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
.pNext = &sp_submit1,
.waitSemaphoreCount = 1,
.pWaitSemaphores = &timeline_semaphore,
.pWaitDstStageMask = &wait_mask,
.commandBufferCount = 1,
.pCommandBuffers = &command_buffer,
.signalSemaphoreCount = 1,
.pSignalSemaphores = &timeline_semaphore,
};
VkSubmitInfo infos[2] = { submit0, submit1 };
vkQueueSubmit(queue, 2, infos, fence);
// here vkQueueSubmit blocks the thread
WaitSemaphore(timeline_semaphore, host_wait);
some_function();
SignalSemaphore(timeline_semaphore, host_signal);
It is blocking for seconds without return, I think this is something like a deadlock. In the debugger, I saw SleepEx function call from vkQueueSubmit: vk_icdGetInstanceProcAddrSG -> ... -> SleepEx.
But vkQueueSubmit does not block in this sample (combined batch):
uint64_t host_wait = timeline;
uint64_t host_signal = ++timeline;
uint64_t wait1 = timeline;
uint64_t signal1 = ++timeline;
VkPipelineStageFlags wait_mask = VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT;
VkTimelineSemaphoreSubmitInfo sp_submit1 = {
.sType = VK_STRUCTURE_TYPE_TIMELINE_SEMAPHORE_SUBMIT_INFO,
.waitSemaphoreValueCount = 1,
.pWaitSemaphoreValues = &wait1,
.signalSemaphoreValueCount = 1,
.pSignalSemaphoreValues = &signal1,
};
VkSubmitInfo submit1 = {
.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
.pNext = &sp_submit1,
.waitSemaphoreCount = 1,
.pWaitSemaphores = &timeline_semaphore,
.pWaitDstStageMask = &wait_mask,
.commandBufferCount = 1,
.pCommandBuffers = &command_buffer,
.signalSemaphoreCount = 1,
.pSignalSemaphores = &timeline_semaphore,
};
VkSubmitInfo infos[1] = { submit1 };
vkQueueSubmit(queue, 1, infos, fence);
WaitSemaphore(timeline_semaphore, host_wait);
some_function();
SignalSemaphore(timeline_semaphore, host_signal);
Why vkQueueSubmit blocks in the first code sample? What are the possible causes of this problem?
I use Vulkan 1.2 (SDK 1.2.135) on Windows 10 and Radeon RX 570 (driver 20.4.2).
EDIT: When I add a command buffer to submit0, vkQueueSubmit will not block. Is it a bug in the driver?
Doing something odd like submitting a batch with no work is far more likely to be the cause.
The spec doesn’t have performance requirements. The fact that something is legal does not make it a good idea. Broadly speaking, if there are two ways to do a thing, do it the obvious way. And sending an empty batch isn’t exactly obvious
https://community.khronos.org/t/vkqueuesubmit-blocks-when-using-timeline-semaphores/105704/2

Why would vkCreateSwapchainKHR result in an access violation at 0?

I'm trying to learn Vulkan by following the great tutorials from vulkan-tutorial.com but I'm having some trouble at the point where I must create the swap chain. As stated in the title, the vkCreateSwapchainKHR creates the following error: Access violation executing location 0x0000000000000000.
The tutorial suggest this might be a conflict with the steam overlay. This is not the case for me as copying the whole code from the tutorial works.
I'm trying to figure out what went wrong with my code and to learn how to debug such issues as I will not have a reference code in the future. The incriminated line looks this:
if (vkCreateSwapchainKHR(device, &swapChainCreateInfo, nullptr, &swapChain) != VK_SUCCESS) {
throw std::runtime_error("Could not create swap chain");
}
I setup a breakpoint at this line to compare the values of the arguments in my code with the values from the reference code. As far as I can tell, there is no difference. (The adresses of course are different)
Where should I look for a problem in my code? The variable swapChain is a NULL as expected. A wrongly formed swapChainCreateInfo should not make vkCreateSwapchainKHR crash. It would merely make it return something that is not VK_SUCCESS. And device was created without problem:
if (vkCreateDevice(physicalDevice, &createInfo, nullptr, &device) != VK_SUCCESS) {
throw std::runtime_error("Failed to create logical device");
}
EDIT - I am using the validation layer VK_LAYER_LUNARG_standard_validation and my createInfo setup is the following.
// Useful functions and structures
VkPhysicalDevice physicalDevice;
VkSurfaceKHR surface;
VkSwapchainKHR swapChain;
struct QueueFamilyIndices {
std::optional<uint32_t> graphicsFamily;
std::optional<uint32_t> presentationFamily;
bool isComplete() {
return graphicsFamily.has_value() && presentationFamily.has_value();
}
};
struct SwapChainSupportDetails {
VkSurfaceCapabilitiesKHR surfaceCapabilities;
std::vector<VkSurfaceFormatKHR> formats;
std::vector<VkPresentModeKHR> presentModes;
};
SwapChainSupportDetails querySwapChainSupport(VkPhysicalDevice physicalDevice) {
SwapChainSupportDetails swapChainSupportDetails;
vkGetPhysicalDeviceSurfaceCapabilitiesKHR(physicalDevice, surface, &swapChainSupportDetails.surfaceCapabilities);
uint32_t formatCount = 0;
vkGetPhysicalDeviceSurfaceFormatsKHR(physicalDevice, surface, &formatCount, nullptr);
if (formatCount != 0) {
swapChainSupportDetails.formats.resize(formatCount);
vkGetPhysicalDeviceSurfaceFormatsKHR(physicalDevice, surface, &formatCount, swapChainSupportDetails.formats.data());
}
uint32_t presentModeCount = 0;
vkGetPhysicalDeviceSurfacePresentModesKHR(physicalDevice, surface, &presentModeCount, nullptr);
if (presentModeCount != 0) {
swapChainSupportDetails.presentModes.resize(presentModeCount);
vkGetPhysicalDeviceSurfacePresentModesKHR(physicalDevice, surface, &presentModeCount, swapChainSupportDetails.presentModes.data());
}
return swapChainSupportDetails;
}
VkSurfaceFormatKHR chooseSwapChainSurfaceFormat(const std::vector<VkSurfaceFormatKHR> & availableFormats) {
if (availableFormats.size() == 1 && availableFormats[0].format == VK_FORMAT_UNDEFINED) {
return { VK_FORMAT_B8G8R8A8_UNORM, VK_COLOR_SPACE_SRGB_NONLINEAR_KHR };
}
for (const auto & availableFormat : availableFormats) {
if (availableFormat.format == VK_FORMAT_B8G8R8A8_UNORM && availableFormat.colorSpace == VK_COLOR_SPACE_SRGB_NONLINEAR_KHR) {
return availableFormat;
}
}
return availableFormats[0];
}
VkPresentModeKHR chooseSwapChainPresentMode(const std::vector<VkPresentModeKHR> & availablePresentModes) {
VkPresentModeKHR bestMode = VK_PRESENT_MODE_FIFO_KHR;
for (const auto & availablePresentMode : availablePresentModes) {
if (availablePresentMode == VK_PRESENT_MODE_MAILBOX_KHR) {
return availablePresentMode;
}
else if (availablePresentMode == VK_PRESENT_MODE_IMMEDIATE_KHR) {
bestMode = availablePresentMode;
}
}
return bestMode;
}
VkExtent2D chooseSwapChainExtent2D(const VkSurfaceCapabilitiesKHR & surfaceCapabilities) {
if (surfaceCapabilities.currentExtent.width != std::numeric_limits<uint32_t>::max()) {
return surfaceCapabilities.currentExtent;
}
else {
VkExtent2D actualExtent = { WIDTH, HEIGHT };
actualExtent.width = std::max(std::min(surfaceCapabilities.maxImageExtent.width, actualExtent.width), surfaceCapabilities.minImageExtent.width);
actualExtent.height = std::max(std::min(surfaceCapabilities.maxImageExtent.height, actualExtent.height), surfaceCapabilities.minImageExtent.height);
return actualExtent;
}
}
// Swap Chain creation code
SwapChainSupportDetails swapChainSupportDetails = querySwapChainSupport(physicalDevice);
VkSurfaceFormatKHR surfaceFormat = chooseSwapChainSurfaceFormat(swapChainSupportDetails.formats);
VkPresentModeKHR presentMode = chooseSwapChainPresentMode(swapChainSupportDetails.presentModes);
VkExtent2D extent = chooseSwapChainExtent2D(swapChainSupportDetails.surfaceCapabilities);
uint32_t imageCount = swapChainSupportDetails.surfaceCapabilities.minImageCount + 1;
if (swapChainSupportDetails.surfaceCapabilities.maxImageCount > 0 && imageCount > swapChainSupportDetails.surfaceCapabilities.maxImageCount) {
imageCount = swapChainSupportDetails.surfaceCapabilities.minImageCount;
}
VkSwapchainCreateInfoKHR swapChainCreateInfo = {};
swapChainCreateInfo.sType = VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR;
swapChainCreateInfo.surface = surface;
swapChainCreateInfo.minImageCount = imageCount;
swapChainCreateInfo.imageFormat = surfaceFormat.format;
swapChainCreateInfo.imageColorSpace = surfaceFormat.colorSpace;
swapChainCreateInfo.imageExtent = extent;
swapChainCreateInfo.imageArrayLayers = 1;
swapChainCreateInfo.imageUsage = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT;
QueueFamilyIndices familyIndices = findQueueFamilies(physicalDevice);
uint32_t queueFamilyIndices[] = { familyIndices.graphicsFamily.value(), familyIndices.presentationFamily.value() };
if (familyIndices.graphicsFamily != familyIndices.presentationFamily) {
swapChainCreateInfo.imageSharingMode = VK_SHARING_MODE_CONCURRENT;
swapChainCreateInfo.queueFamilyIndexCount = 2;
swapChainCreateInfo.pQueueFamilyIndices = queueFamilyIndices;
}
else {
swapChainCreateInfo.imageSharingMode = VK_SHARING_MODE_EXCLUSIVE;
swapChainCreateInfo.queueFamilyIndexCount = 0;
swapChainCreateInfo.pQueueFamilyIndices = nullptr;
}
swapChainCreateInfo.preTransform = swapChainSupportDetails.surfaceCapabilities.currentTransform;
swapChainCreateInfo.compositeAlpha = VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR;
swapChainCreateInfo.presentMode = presentMode;
swapChainCreateInfo.clipped = VK_TRUE;
swapChainCreateInfo.oldSwapchain = VK_NULL_HANDLE;
if (vkCreateSwapchainKHR(device, &swapChainCreateInfo, nullptr, &swapChain) != VK_SUCCESS) {
throw std::runtime_error("Could not create swap chain");
}
I get the resulting structure:
Well, when creating the logical device one needs to set enabledExtensionCount to the actual number of required extensions and not 0 if one expects extensions to work. In my case, it was a simple edit failure. Here is the gem in my code:
createInfo.enabledExtensionCount = static_cast<uint32_t>(deviceExtensions.size());
createInfo.ppEnabledExtensionNames = deviceExtensions.data();
createInfo.enabledExtensionCount = 0;
I figured it out by replacing every function from my code by the ones from the reference code until it worked. I'm a bit disappointed that the validation layers didn't catch this. Did I set them wrong? Is this something they should be catching?
EDIT: As pointed out by LIANG LIU, here is the initialization for deviceExtensions:
const std::vector<const char*> deviceExtensions = {
VK_KHR_SWAPCHAIN_EXTENSION_NAME
};
Enable VK_KHR_SWAPCHAIN_EXTENSION_NAME when creating VkDevice
void VKRenderer::createVkLogicalDevice()
{
// device extensions
vector<const char*>::type deviceExtensionNames = { VK_KHR_SWAPCHAIN_EXTENSION_NAME };
// priorities
float queuePrioritys[2] = { 1.f, 1.f};
// graphics queue
VkDeviceQueueCreateInfo queueCreateInfos;
queueCreateInfos.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
queueCreateInfos.pNext = nullptr;
queueCreateInfos.queueFamilyIndex = getGraphicsQueueFamilyIndex();
queueCreateInfos.queueCount = 1;
queueCreateInfos.pQueuePriorities = &queuePrioritys[0];
// device features
VkPhysicalDeviceFeatures deviceFeatures = {};
VkDeviceCreateInfo createInfo = {};
createInfo.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
createInfo.pNext = nullptr;
createInfo.pQueueCreateInfos = &queueCreateInfos;
createInfo.queueCreateInfoCount = 1;
createInfo.pEnabledFeatures = &deviceFeatures;
createInfo.enabledExtensionCount = deviceExtensionNames.size();
createInfo.ppEnabledExtensionNames = deviceExtensionNames.data();
// create logical device and retrieve graphics queue
if (VK_SUCCESS == vkCreateDevice(m_vkPhysicalDevice, &createInfo, nullptr, &m_vkDevice))
{
vkGetDeviceQueue(m_vkDevice, getGraphicsQueueFamilyIndex(), 0, &m_vkGraphicsQueue);
vkGetDeviceQueue(m_vkDevice, getPresentQueueFamilyIndex(), 0, &m_vkPresentQueue);
}
else
{
EchoLogError("Failed to create vulkan logical device!");
}
}
It looks like you are calling vkCreateDevice at the end of your code segment for creating the swapchain and passing in the VkSwapchainCreateInfo into it. Perhaps you want to call vkCreateSwapchainKHR instead, like:
if (vkCreateSwapchainKHR(device, &swapChainCreateInfo, nullptr, &swapChain) !=
VK_SUCCESS) {
throw std::runtime_error("failed to create swap chain");
}
If you are actually calling vkCreateSwapchainKHR, could you edit your question to indicate this?

MFXVideoDECODE_Init fail with MFX_ERR_MEMORY_ALLOC

I'm trying to use intel-media-sdk decoder for h.264 videos. Here is my code for initializing decoder :
mfxStatus decoder::initDecoder(HWND window, mfxBitstream *Header) {
mfxStatus sts = MFX_ERR_NONE;
mfxVersion ver = { { 0, 1 } };
mfxVideoParam mfxVideoParams;
mfxFrameAllocator mfxAllocator;
mfxFrameAllocResponse mfxResponse;
sts = m_mfxSession.Init(MFX_IMPL_AUTO_ANY, &ver); //sts = MFX_ERR_NONE
if (sts == MFX_ERR_NONE) {
sts = m_mfxSession.SetHandle(MFX_HANDLE_DIRECT3D_DEVICE_MANAGER9,
m_renderer.initD3d(GetIntelDeviceAdapterNum(), window)); //sts = MFX_ERR_NONE
if (sts == MFX_ERR_NONE) {
mfxAllocator.pthis = m_mfxSession;
sts = m_mfxSession.SetFrameAllocator(&mfxAllocator); //sts = MFX_ERR_NONE
if (sts == MFX_ERR_NONE) {
MFXVideoDECODE mfxDEC(m_mfxSession);
m_mfxVideoDecode = mfxDEC;
memset(&mfxVideoParams, 0, sizeof(mfxVideoParams));
mfxVideoParams.mfx.CodecId = MFX_CODEC_AVC;
mfxVideoParams.IOPattern = MFX_IOPATTERN_OUT_VIDEO_MEMORY;
sts = m_mfxVideoDecode.DecodeHeader(Header, &mfxVideoParams); //sts = MFX_ERR_NONE
if (sts == MFX_ERR_NONE) {
memset(&m_mfxRequest, 0, sizeof(m_mfxRequest));
sts = m_mfxVideoDecode.QueryIOSurf(&mfxVideoParams, &m_mfxRequest); //sts = MFX_ERR_NONE
if (sts == MFX_ERR_NONE) {
sts = m_renderer.allocSurfaces(mfxAllocator.pthis, &m_mfxRequest, &mfxResponse);
if (sts == MFX_ERR_NONE) {
m_pmfxSurfaces = new mfxFrameSurface1 *[m_mfxRequest.NumFrameSuggested];
for (int i = 0; i < m_mfxRequest.NumFrameSuggested; i++) {
m_pmfxSurfaces[i] = new mfxFrameSurface1;
memset(m_pmfxSurfaces[i], 0, sizeof(mfxFrameSurface1));
memcpy(&(m_pmfxSurfaces[i]->Info), &(mfxVideoParams.mfx.FrameInfo), sizeof(mfxFrameInfo));
// MID (memory id) represents one video NV12 surface
m_pmfxSurfaces[i]->Data.MemId = mfxResponse.mids[i];
};
sts = m_mfxVideoDecode.Init(&mfxVideoParams); //sts = MFX_ERR_MEMORY_ALLOC
}
}
}
}
}
}
return sts;
}
So as you can see MFXVideoDECODE::Init(mfxVideoParam*) (which internally calls MFXVideoDECODE_Init) returns MFX_ERR_MEMORY_ALLOC and the strange thing here is in this document it says this function does not have this return value.
Here is some debug information about mfxVideoParams :
AllocId = 0, AsyncDepth = 0, IOPattern = 16, mfx.CodecId = 541283905,
mfx.CodecProfile = 77, mfx.CodecLevel = 30, vpp.In.FourCC = 842094158,
vpp.In.Width = 864, vpp.In.Height = 480, vpp.In.CropW = 854,
vpp.In.CropH = 480, vpp.In.BufferSize = 31458144, vpp.In.AspectRatioW
= 1, vpp.In.AspectRatioH = 1, vpp.In.PicStruct = 1, vpp.In.ChromaFormat = 1
Here is Some member data definition in header which used here :
MFXVideoSession m_mfxSession;
MFXVideoDECODE m_mfxVideoDecode;
mfxFrameAllocRequest m_mfxRequest;
mfxFrameSurface1** m_pmfxSurfaces;
And Here is some information about my current working device that might relate to this problem:
Operating System : Windows 8.1
Processor : Intel(R) Core(TM) i5-3470 CPU # 3.20GHZ
System type : 64-bit operating system, x64-based processor
Installed memory (RAM) : 8.00 GB
And finally to reproduce the exact same situation I downloaded video named big_buck_bunny_1080p_h264.mov from this site and then extracted it with ffmpeg to h264 and used it in my program.
You need to initialize mfxFrameAlocator callback functions (Alloc, Free, GetHDL, ...) with proper functions.
for example :
//static member
mfxStatus decoder::gethdl(mfxHDL pthis, mfxMemId mid, mfxHDL* handle)
{
pthis; // To avoid warning for this unused parameter
if (handle == 0) return MFX_ERR_INVALID_HANDLE;
*handle = mid;
return MFX_ERR_NONE;
}
mfxStatus decoder::initDecoder(HWND window, mfxBitstream *Header) {
//blah blah
mfxAllocator.pthis = m_mfxSession;
mfxAllocator.GetHDL = gethdl;
//define for these too
//mfxAllocator.Alloc = alloc;
//mfxAllocator.Free = free;
//mfxAllocator.Lock = lock;
//mfxAllocator.Unlock = unlock;
//rest of your code
}