vkCreateSwapchainKHR Error - c++

Even though validation layers and debug callback extensions are enabled and working (they respond to wrong structs etc.), I'm still getting a "VK_ERROR_VALIDATION_FAILED_EXT" result from vkCreateSwapchainKHR(), and there's no validation layer error to pinpoint the mistake...
Swap chain creation (using a GTX 970):
VkBool32 isSupported = false;
vkGetPhysicalDeviceSurfaceSupportKHR( physicalDevices[0], 0, surface, &isSupported);
if (!isSupported) {
    std::cout << "*ERROR* This device doesn't support surfaces" << std::endl;
}
VkSurfaceCapabilitiesKHR surfCaps;
vkGetPhysicalDeviceSurfaceCapabilitiesKHR(physicalDevices[0], surface, &surfCaps);
std::vector<VkSurfaceFormatKHR> deviceFormats;
uint32_t formatCount;
vkGetPhysicalDeviceSurfaceFormatsKHR(physicalDevices[0], surface, &formatCount, nullptr);
deviceFormats.resize(formatCount);
vkGetPhysicalDeviceSurfaceFormatsKHR(physicalDevices[0], surface, &formatCount, deviceFormats.data());
swapChainInfo.sType = VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR;
swapChainInfo.pNext = nullptr;
swapChainInfo.flags = 0;
swapChainInfo.surface = surface;
swapChainInfo.minImageCount = surfCaps.minImageCount;
swapChainInfo.imageFormat = VK_FORMAT_B8G8R8A8_UNORM;
swapChainInfo.imageColorSpace = VK_COLOR_SPACE_SRGB_NONLINEAR_KHR;
swapChainInfo.imageExtent = surfCaps.currentExtent;
swapChainInfo.imageArrayLayers = 1;
swapChainInfo.imageUsage = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT;
swapChainInfo.imageSharingMode = VK_SHARING_MODE_EXCLUSIVE;
swapChainInfo.queueFamilyIndexCount = 0;
swapChainInfo.pQueueFamilyIndices = VK_NULL_HANDLE;
swapChainInfo.preTransform = VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR;
swapChainInfo.compositeAlpha = VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR;
swapChainInfo.presentMode = VK_PRESENT_MODE_FIFO_KHR;
swapChainInfo.clipped = VK_TRUE; // TODO : TEST clipping against another window
swapChainInfo.oldSwapchain = VK_NULL_HANDLE;
result = vkCreateSwapchainKHR( device, &swapChainInfo, nullptr, &swapChain );
if (result) {
    std::cout << "*ERROR* Swapchain Creation Failed :" << result << std::endl;
}
Surface creation using GLFW (which doesn't return any error):
if (result = glfwCreateWindowSurface(instance, window, nullptr, &surface))
{
    std::cout << "*ERROR* Surface Creation Failed : " << result << std::endl;
}

There are a huge number of reasons validation of vkCreateSwapchainKHR will fail (check out PreCallValidateCreateSwapchainKHR in core_validation.cpp).
There may not be enough code here to tell why it is failing; for example, the failure might be caused by an invalid surface. But to pinpoint the problem, the validation layers should emit a failure message in the debug log telling you exactly why. You should enable that output by calling CreateDebugReportCallbackEXT before trying to create the swapchain. This also requires enabling the VK_EXT_debug_report extension. See here for details.
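For reference, a minimal sketch of that setup (assuming instance is your VkInstance, VK_EXT_debug_report was enabled at instance creation, and error checks are omitted):

// Callback invoked by the validation layers for errors/warnings.
VKAPI_ATTR VkBool32 VKAPI_CALL DebugReportCallback(
    VkDebugReportFlagsEXT flags, VkDebugReportObjectTypeEXT objectType,
    uint64_t object, size_t location, int32_t messageCode,
    const char* pLayerPrefix, const char* pMessage, void* pUserData)
{
    std::cout << "[" << pLayerPrefix << "] " << pMessage << std::endl;
    return VK_FALSE; // don't abort the call that triggered the message
}

// The extension's entry points must be fetched at runtime:
auto pfnCreateDebugReportCallbackEXT =
    (PFN_vkCreateDebugReportCallbackEXT)vkGetInstanceProcAddr(
        instance, "vkCreateDebugReportCallbackEXT");

VkDebugReportCallbackCreateInfoEXT callbackInfo = {};
callbackInfo.sType = VK_STRUCTURE_TYPE_DEBUG_REPORT_CALLBACK_CREATE_INFO_EXT;
callbackInfo.flags = VK_DEBUG_REPORT_ERROR_BIT_EXT | VK_DEBUG_REPORT_WARNING_BIT_EXT;
callbackInfo.pfnCallback = DebugReportCallback;

VkDebugReportCallbackEXT debugCallback = VK_NULL_HANDLE;
pfnCreateDebugReportCallbackEXT(instance, &callbackInfo, nullptr, &debugCallback);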

Just a few points about your code (each a potential cause of problems; a sketch addressing them follows this list):
You are not checking the VkResult of any of the vkGet* commands.
You are not checking swapChainInfo.imageFormat against the supported formats in your deviceFormats.
You are not accounting for the situation where surfCaps.currentExtent is 0xFFFFFFFF (meaning the surface size will be determined by the swapchain).
swapChainInfo.pQueueFamilyIndices is a pointer, not a handle; use nullptr instead of VK_NULL_HANDLE.
You are not checking swapChainInfo.preTransform against surfCaps.supportedTransforms.
You are not checking swapChainInfo.compositeAlpha against surfCaps.supportedCompositeAlpha.
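A sketch of what those checks could look like, reusing the question's surfCaps and deviceFormats (windowWidth and windowHeight stand in for your framebuffer size and are not from the question; std::clamp needs <algorithm> and C++17):

// Pick an image format/color space pair the surface actually reports.
VkSurfaceFormatKHR chosenFormat = deviceFormats[0];
for (const VkSurfaceFormatKHR& f : deviceFormats) {
    if (f.format == VK_FORMAT_B8G8R8A8_UNORM &&
        f.colorSpace == VK_COLOR_SPACE_SRGB_NONLINEAR_KHR) {
        chosenFormat = f;
        break;
    }
}
// currentExtent == 0xFFFFFFFF means the surface size will be determined by
// the swapchain; fall back to the window size, clamped to the reported limits.
VkExtent2D extent = surfCaps.currentExtent;
if (extent.width == 0xFFFFFFFF) {
    extent.width = std::clamp(windowWidth, surfCaps.minImageExtent.width, surfCaps.maxImageExtent.width);
    extent.height = std::clamp(windowHeight, surfCaps.minImageExtent.height, surfCaps.maxImageExtent.height);
}
// Only request transform/alpha modes the surface advertises.
VkSurfaceTransformFlagBitsKHR preTransform =
    (surfCaps.supportedTransforms & VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR)
        ? VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR : surfCaps.currentTransform;
VkCompositeAlphaFlagBitsKHR compositeAlpha =
    (surfCaps.supportedCompositeAlpha & VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR)
        ? VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR : VK_COMPOSITE_ALPHA_INHERIT_BIT_KHR; // fallback is illustrative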

Intel OneAPI Video decoding memory leak when using C++ CLI

I am trying to use Intel OneAPI/OneVPL to decode a stream I receive from an RTSP camera in C#. But when I run the code I get an enormous memory leak: around 1-200MB per run, and a run happens roughly once every second.
When I've collected a GoP from the camera (where I know the first data is a keyframe), I pass it as a byte array to my C++/CLI and C++ code.
Here I expect it to decode all the frames and return the decoded images. It receives 30 frames and returns 16 decoded images, but has a memory leak.
I've tried using the Visual Studio memory profiler, and all I can tell from it is that it's unmanaged memory that's my problem. I've tried overriding the "new" and "delete" operators inside videoHandler.cpp to track and compare all allocations and deallocations, and as far as I can tell everything is handled correctly in there. I cannot see any classes that get instantiated but not cleaned up. I think my issue is in the C++/CLI class videoHandlerWrapper.cpp. Am I missing something obvious?
videoHandlerWrapper.cpp
array<imgFrameWrapper^>^ videoHandlerWrapper::decode(array<System::Byte>^ byteArray)
{
    array<imgFrameWrapper^>^ returnFrames = gcnew array<imgFrameWrapper^>(30);
    {
        std::vector<imgFrame> frames(30); // Output from the decoding process. imgFrame implements a destructor that frees the data when exiting scope
        std::vector<unsigned char> bytes(byteArray->Length); // Input for the decoding process
        Marshal::Copy(byteArray, 0, IntPtr((unsigned char*)(&((bytes)[0]))), byteArray->Length); // Copy from managed (C#) to unmanaged (C++)
        int status = _pVideoHandler->decode(bytes, frames); // Decode
        for (size_t i = 0; i < frames.size(); i++)
        {
            if (frames[i].size > 0)
                returnFrames[i] = gcnew imgFrameWrapper(frames[i].size, frames[i].bytes);
        }
    }
    //PrintMemoryUsage();
    return returnFrames;
}
videoHandler.cpp
#define BITSTREAM_BUFFER_SIZE 2000000 //TODO Maybe higher or lower bitstream buffer. Thorough testing has been done at 2000000
int videoHandler::decode(std::vector<unsigned char> bytes, std::vector<imgFrame> &frameData)
{
    int result = -1;
    bool isStillGoing = true;
    mfxBitstream bitstream = { 0 };
    mfxSession session = NULL;
    mfxStatus sts = MFX_ERR_NONE;
    mfxSurfaceArray* outSurfaces = nullptr;
    mfxU32 framenum = 0;
    mfxU32 numVPPCh = 0;
    mfxVideoChannelParam* mfxVPPChParams = nullptr;
    void* accelHandle = NULL;
    mfxVideoParam mfxDecParams = {};
    mfxVersion version = { 0, 1 };
    // variables used only in the 2.x version
    mfxConfig cfg = NULL;
    mfxLoader loader = NULL;
    mfxVariant inCodec = {};
    std::vector<mfxU8> input_buffer;

    // Initialize a VPL session for any implementation of HEVC/H265 decode
    loader = MFXLoad();
    VERIFY(NULL != loader, "MFXLoad failed -- is implementation in path?");

    cfg = MFXCreateConfig(loader);
    VERIFY(NULL != cfg, "MFXCreateConfig failed");

    inCodec.Type = MFX_VARIANT_TYPE_U32;
    inCodec.Data.U32 = MFX_CODEC_AVC;
    sts = MFXSetConfigFilterProperty(
        cfg,
        (mfxU8*)"mfxImplDescription.mfxDecoderDescription.decoder.CodecID",
        inCodec);
    VERIFY(MFX_ERR_NONE == sts, "MFXSetConfigFilterProperty failed for decoder CodecID");

    sts = MFXCreateSession(loader, 0, &session);
    VERIFY(MFX_ERR_NONE == sts, "Not able to create VPL session");

    // Print info about the implementation loaded
    version = ShowImplInfo(session);
    //VERIFY(version.Major > 1, "Sample requires 2.x API implementation, exiting");
    if (version.Major == 1) {
        mfxVariant ImplValueSW;
        ImplValueSW.Type = MFX_VARIANT_TYPE_U32;
        ImplValueSW.Data.U32 = MFX_IMPL_TYPE_SOFTWARE;
        MFXSetConfigFilterProperty(cfg, (mfxU8*)"mfxImplDescription.Impl", ImplValueSW);
        sts = MFXCreateSession(loader, 0, &session);
        VERIFY(MFX_ERR_NONE == sts, "Not able to create VPL session");
    }

    // Convenience function to initialize available accelerator(s)
    accelHandle = InitAcceleratorHandle(session);

    bitstream.MaxLength = BITSTREAM_BUFFER_SIZE;
    bitstream.Data = (mfxU8*)calloc(bytes.size(), sizeof(mfxU8));
    VERIFY(bitstream.Data, "Not able to allocate input buffer");
    bitstream.CodecId = MFX_CODEC_AVC;
    std::copy(bytes.begin(), bytes.end(), bitstream.Data);
    bitstream.DataLength = static_cast<mfxU32>(bytes.size());

    memset(&mfxDecParams, 0, sizeof(mfxDecParams));
    mfxDecParams.mfx.CodecId = MFX_CODEC_AVC;
    mfxDecParams.IOPattern = MFX_IOPATTERN_OUT_SYSTEM_MEMORY;
    sts = MFXVideoDECODE_DecodeHeader(session, &bitstream, &mfxDecParams);
    VERIFY(MFX_ERR_NONE == sts, "Error decoding header\n");

    numVPPCh = 1;
    mfxVPPChParams = new mfxVideoChannelParam[numVPPCh];
    for (mfxU32 i = 0; i < numVPPCh; i++) {
        mfxVPPChParams[i] = {};
    }

    //mfxVPPChParams[0].VPP.FourCC = mfxDecParams.mfx.FrameInfo.FourCC;
    mfxVPPChParams[0].VPP.FourCC = MFX_FOURCC_BGRA;
    mfxVPPChParams[0].VPP.ChromaFormat = MFX_CHROMAFORMAT_YUV420;
    mfxVPPChParams[0].VPP.PicStruct = MFX_PICSTRUCT_PROGRESSIVE;
    mfxVPPChParams[0].VPP.FrameRateExtN = 30;
    mfxVPPChParams[0].VPP.FrameRateExtD = 1;
    mfxVPPChParams[0].VPP.CropW = 1920;
    mfxVPPChParams[0].VPP.CropH = 1080;
    // Set the value directly if input and output are the same size.
    mfxVPPChParams[0].VPP.Width = 1920;
    mfxVPPChParams[0].VPP.Height = 1080;
    //// USED TO RESIZE. IF INPUT IS THE SAME AS OUTPUT THIS WILL MAKE IT SHIFT A BIT. 1920x1080 becomes 1920x1088.
    //mfxVPPChParams[0].VPP.Width = ALIGN16(mfxVPPChParams[0].VPP.CropW);
    //mfxVPPChParams[0].VPP.Height = ALIGN16(mfxVPPChParams[0].VPP.CropH);
    mfxVPPChParams[0].VPP.ChannelId = 1;
    mfxVPPChParams[0].Protected = 0;
    mfxVPPChParams[0].IOPattern = MFX_IOPATTERN_IN_SYSTEM_MEMORY | MFX_IOPATTERN_OUT_SYSTEM_MEMORY;
    mfxVPPChParams[0].ExtParam = NULL;
    mfxVPPChParams[0].NumExtParam = 0;

    sts = MFXVideoDECODE_VPP_Init(session, &mfxDecParams, &mfxVPPChParams, numVPPCh); // This causes a MINOR memory leak!

    outSurfaces = new mfxSurfaceArray;

    while (isStillGoing == true) {
        sts = MFXVideoDECODE_VPP_DecodeFrameAsync(session,
                                                  &bitstream,
                                                  NULL,
                                                  0,
                                                  &outSurfaces); // Big memory leak. 100MB per run in the while loop.
        switch (sts) {
        case MFX_ERR_NONE:
            // decode output
            if (framenum >= 30)
            {
                isStillGoing = false;
                break;
            }
            sts = WriteRawFrameToByte(outSurfaces->Surfaces[1], &frameData[framenum]);
            VERIFY(MFX_ERR_NONE == sts, "Could not write 1st vpp output");
            framenum++;
            break;
        case MFX_ERR_MORE_DATA:
            // The function requires more bitstream at input before decoding can proceed
            isStillGoing = false;
            break;
        case MFX_ERR_MORE_SURFACE:
            // The function requires more frame surfaces at output before decoding can proceed.
            // This applies to external memory allocations and should not be expected for
            // a simple internal allocation case like this
            break;
        case MFX_ERR_DEVICE_LOST:
            // For non-CPU implementations:
            // clean up if the device is lost
            break;
        case MFX_WRN_DEVICE_BUSY:
            // For non-CPU implementations:
            // wait a few milliseconds then try again
            break;
        case MFX_WRN_VIDEO_PARAM_CHANGED:
            // The decoder detected a new sequence header in the bitstream.
            // Video parameters may have changed.
            // In the external memory allocation case, the output surface might need to be reallocated
            break;
        case MFX_ERR_INCOMPATIBLE_VIDEO_PARAM:
            // The function detected that the video parameters provided by the application
            // are incompatible with the initialization parameters.
            // The application should close the component and then reinitialize it
            break;
        case MFX_ERR_REALLOC_SURFACE:
            // A bigger surface_work is required. May be returned only if
            // mfxInfoMFX::EnableReallocRequest was set to ON during initialization.
            // This applies to external memory allocations and should not be expected for
            // a simple internal allocation case like this
            break;
        default:
            printf("unknown status %d\n", sts);
            isStillGoing = false;
            break;
        }
    }

    sts = MFXVideoDECODE_VPP_Close(session); // Helps massively! Halves the memory leak speed. Closes internal structures and tables.
    VERIFY(MFX_ERR_NONE == sts, "Error closing VPP session\n");
    result = 0;

end:
    printf("Decode and VPP processed %d frames\n", framenum);
    // Clean up resources - it is recommended to close components first, before
    // releasing allocated surfaces, since some surfaces may still be locked by
    // internal resources.
    if (mfxVPPChParams)
        delete[] mfxVPPChParams;
    if (outSurfaces)
        delete outSurfaces;
    if (bitstream.Data)
        free(bitstream.Data);
    if (accelHandle)
        FreeAcceleratorHandle(accelHandle);
    if (loader)
        MFXUnload(loader);
    return result;
}
imgFrameWrapper.h
public ref class imgFrameWrapper
{
private:
    size_t size;
    array<System::Byte>^ bytes;
public:
    imgFrameWrapper(size_t u_size, unsigned char* u_bytes);
    ~imgFrameWrapper();
    !imgFrameWrapper();
    size_t get_size();
    array<System::Byte>^ get_bytes();
};
imgFrameWrapper.cpp
imgFrameWrapper::imgFrameWrapper(size_t u_size, unsigned char* u_bytes)
{
    size = u_size;
    bytes = gcnew array<System::Byte>(size);
    Marshal::Copy((IntPtr)u_bytes, bytes, 0, size);
}
imgFrameWrapper::~imgFrameWrapper()
{
}
imgFrameWrapper::!imgFrameWrapper()
{
}
size_t imgFrameWrapper::get_size()
{
    return size;
}
array<System::Byte>^ imgFrameWrapper::get_bytes()
{
    return bytes;
}
imgFrame.h
struct imgFrame
{
    int size;
    unsigned char* bytes;
    ~imgFrame()
    {
        if (bytes)
            delete[] bytes;
    }
};
MFXVideoDECODE_VPP_DecodeFrameAsync() creates internal memory surfaces for the processing, and you should release those surfaces once you are done with them.
This link mentions it:
https://spec.oneapi.com/onevpl/latest/API_ref/VPL_structs_decode_vpp.html#_CPPv415mfxSurfaceArray
mfxStatus (*Release)(struct mfxSurfaceArray *surface_array)
Decrements the internal reference counter of the surface. (*Release) should be called after using the (*AddRef) function to add a surface or when allocation logic requires it.
And please check this sample:
https://github.com/oneapi-src/oneVPL/blob/master/examples/hello-decvpp/src/hello-decvpp.cpp
Especially the WriteRawFrame_InternalMem() function in https://github.com/oneapi-src/oneVPL/blob/17968d8d2299352f5a9e09388d24e81064c81c87/examples/util/util/util.h
It shows how to release the surfaces.
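Applied to the question's loop, that would mean dropping the references handed back in outSurfaces after each frame is consumed. An abridged, hedged sketch of the MFX_ERR_NONE branch (member names taken from the spec excerpt above and the linked sample):

case MFX_ERR_NONE:
    sts = WriteRawFrameToByte(outSurfaces->Surfaces[1], &frameData[framenum]);
    VERIFY(MFX_ERR_NONE == sts, "Could not write 1st vpp output");
    framenum++;
    // Release every surface the runtime returned, then the array itself,
    // so the internally allocated surfaces can be reused instead of piling up.
    for (mfxU32 i = 0; i < outSurfaces->NumSurfaces; i++) {
        mfxFrameSurface1* surf = outSurfaces->Surfaces[i];
        surf->FrameInterface->Release(surf);
    }
    outSurfaces->Release(outSurfaces);
    break;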

OSX MetalKit CVMetalTextureCacheCreateTextureFromImage failed, status: -6660

I'm trying to create, first, a CVPixelBuffer from raw memory, and then an MTLTexture from the CVPixelBuffer, but after running the following code I always get the error:
CVMetalTextureCacheCreateTextureFromImage failed, status: -6660 0x0
Where does this error come from?
id<MTLTexture> makeTextureWithBytes(id<MTLDevice> mtl_device,
                                    int width,
                                    int height,
                                    void *baseAddress, int bytesPerRow)
{
    CVMetalTextureCacheRef textureCache = NULL;
    CVReturn status = CVMetalTextureCacheCreate(kCFAllocatorDefault, nullptr, mtl_device, nullptr, &textureCache);
    if(status != kCVReturnSuccess || textureCache == NULL)
    {
        return nullptr;
    }
    NSDictionary* cvBufferProperties = @{
        (__bridge NSString*)kCVPixelBufferOpenGLCompatibilityKey : @YES,
        (__bridge NSString*)kCVPixelBufferMetalCompatibilityKey : @YES,
    };
    CVPixelBufferRef pxbuffer = NULL;
    status = CVPixelBufferCreateWithBytes(kCFAllocatorDefault,
                                          width,
                                          height,
                                          kCVPixelFormatType_32BGRA,
                                          baseAddress,
                                          bytesPerRow,
                                          releaseCallback,
                                          NULL/*releaseRefCon*/,
                                          (__bridge CFDictionaryRef)cvBufferProperties,
                                          &pxbuffer);
    if(status != kCVReturnSuccess || pxbuffer == NULL)
    {
        return nullptr;
    }
    CVMetalTextureRef cvTexture = NULL;
    status = CVMetalTextureCacheCreateTextureFromImage(kCFAllocatorDefault,
                                                       textureCache,
                                                       pxbuffer,
                                                       nullptr,
                                                       MTLPixelFormatBGRA8Unorm,
                                                       1920,
                                                       1080,
                                                       0,
                                                       &cvTexture);
    if(status != kCVReturnSuccess || cvTexture == NULL)
    {
        std::cout << "CVMetalTextureCacheCreateTextureFromImage failed, status: " << status << " " << cvTexture << std::endl;
        return nullptr;
    }
    id<MTLTexture> metalTexture = CVMetalTextureGetTexture(cvTexture);
    CFRelease(cvTexture);
    return metalTexture;
}
The error occurs when the CVPixelBuffer isn't backed by an IOSurface. However, you can't make an IOSurface-backed CVPixelBuffer from bytes: despite the kCVPixelBufferMetalCompatibilityKey being set, CVPixelBufferCreateWithBytes (and its planar counterpart) will not back the buffer with an IOSurface.
There are two ways around this (and a possible third):
Create an empty CVPixelBuffer and use memcpy. You'll need to create a new pixel buffer each loop, so using a CVPixelBufferPool is advised:
CVPixelBufferPoolRef pixelPool; // initialised prior
void *srcBaseAddress;           // initialised prior
CVPixelBufferRef currentFrame;
CVPixelBufferPoolCreatePixelBuffer(nil, pixelPool, &currentFrame);
CVPixelBufferLockBaseAddress(currentFrame, 0);
void *cvBaseAddress = CVPixelBufferGetBaseAddress(currentFrame);
size_t size = CVPixelBufferGetDataSize(currentFrame);
memcpy(cvBaseAddress, srcBaseAddress, size);
CVPixelBufferUnlockBaseAddress(currentFrame, 0); // unlock once the copy is done
Skip the CVPixelBuffer entirely and write directly to the MTLTexture memory, as long as Metal supports your pixel format. Since you're probably rendering to an RGB display, you can write your own Metal kernel to convert to RGB. Remember to create the texture first using an MTLTextureDescriptor:
id<MTLDevice> metalDevice;  // You know how to get this
unsigned int width;         // from your source image data
unsigned int height;        // from your source image data
unsigned int rowBytes;      // from your source image data
MTLTextureDescriptor *mtd = [MTLTextureDescriptor
    texture2DDescriptorWithPixelFormat:MTLPixelFormatBGRG422
                                 width:width
                                height:height
                             mipmapped:NO];
id<MTLTexture> mtlTexture = [metalDevice newTextureWithDescriptor:mtd];
[mtlTexture replaceRegion:MTLRegionMake2D(0, 0, width, height)
              mipmapLevel:0
                withBytes:srcBaseAddress
              bytesPerRow:rowBytes];
A third way might be to convert the CVPixelBuffer into a CIImage and use a Metal-backed CIContext. Something like this:
id<MTLDevice> metalDevice;
CIContext* ciContext = [CIContext contextWithMTLDevice:metalDevice
                                               options:[NSDictionary dictionaryWithObjectsAndKeys:@(NO), kCIContextUseSoftwareRenderer, nil]];
CIImage *inputImage = [[CIImage alloc] initWithCVPixelBuffer:currentFrame];
[ciContext render:inputImage
     toMTLTexture:metalTexture
    commandBuffer:metalCommandBuffer
           bounds:viewRect
       colorSpace:colorSpace];
I successfully used this to render directly to the CAMetalLayer's drawable texture, but didn't have much luck rendering to an intermediate texture (not that I tried very hard to get it working). Hope one of these works for you.

vkCreateSwapchainKHR: internal drawable creation failed

I had my Vulkan application working, but for some reason it stopped working (I don't believe I touched anything that could have broken it, besides making my engine project a .lib instead of a .dll) and started giving the "vkCreateSwapchainKHR: internal drawable creation failed" error in the validation layers. vkCreateSwapchainKHR returns VK_ERROR_VALIDATION_FAILED_EXT.
I already checked this answer:
What does the "vkCreateSwapchainKHR: internal drawable creation failed." error mean, but it was not my problem (as I said, it was working until it wasn't). Here's all the code I believe is relevant; if you need something else, just comment:
Window Creation:
/* Initialize the library */
if (!glfwInit())
/* Create a windowed mode window and its OpenGL context */
glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);
glfwWindowHint(GLFW_DECORATED, _WCI.IsDecorated);
GLFWWindow = glfwCreateWindow(Extent.Width, Extent.Height, _WCI.Name.c_str(), nullptr, nullptr);
if (!GLFWWindow) glfwTerminate();
//glfwMakeContextCurrent(GLFWWindow);
uint32 Count = 0;
auto ff = glfwGetRequiredInstanceExtensions(&Count);
GS_BASIC_LOG_MESSAGE("GLFW required extensions:")
for (uint8 i = 0; i < Count; ++i)
{
    GS_BASIC_LOG_MESSAGE("%d: %s", i, ff[i]);
}
WindowObject = glfwGetWin32Window(GLFWWindow);
WindowInstance = GetModuleHandle(nullptr);
I'm using the correct instance extensions:
const char* Extensions[] = { VK_KHR_SURFACE_EXTENSION_NAME, VK_KHR_WIN32_SURFACE_EXTENSION_NAME, VK_EXT_DEBUG_UTILS_EXTENSION_NAME };
VKSwapchainCreator VulkanRenderContext::CreateSwapchain(VKDevice* _Device, VkSwapchainKHR _OldSwapchain) const
{
    VkSwapchainCreateInfoKHR SwapchainCreateInfo = { VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR };
    SwapchainCreateInfo.surface = Surface.GetHandle();
    SwapchainCreateInfo.minImageCount = 3;
    SwapchainCreateInfo.imageFormat = Format.format;
    SwapchainCreateInfo.imageColorSpace = Format.colorSpace;
    SwapchainCreateInfo.imageExtent = Extent2DToVkExtent2D(RenderExtent);
    // imageArrayLayers specifies the number of layers each image consists of. This is always 1 unless you are developing a stereoscopic 3D application.
    SwapchainCreateInfo.imageArrayLayers = 1;
    SwapchainCreateInfo.imageUsage = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT;
    SwapchainCreateInfo.imageSharingMode = VK_SHARING_MODE_EXCLUSIVE;
    SwapchainCreateInfo.queueFamilyIndexCount = 0;
    SwapchainCreateInfo.pQueueFamilyIndices = nullptr;
    SwapchainCreateInfo.preTransform = VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR;
    SwapchainCreateInfo.compositeAlpha = VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR;
    SwapchainCreateInfo.presentMode = PresentMode;
    SwapchainCreateInfo.clipped = VK_TRUE;
    SwapchainCreateInfo.oldSwapchain = _OldSwapchain;
    return VKSwapchainCreator(_Device, &SwapchainCreateInfo);
}
Both VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR and VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR are supported by my GPU.
VkBool32 Supports = 0;
vkGetPhysicalDeviceSurfaceSupportKHR(_PD, PresentationQueue.GetQueueIndex(), _Surface, &Supports);
VkSurfaceCapabilitiesKHR SurfaceCapabilities = {};
vkGetPhysicalDeviceSurfaceCapabilitiesKHR(_PD, _Surface, &SurfaceCapabilities);
VkBool32 Supported = 0;
vkGetPhysicalDeviceSurfaceSupportKHR(_PD, PresentationQueue.GetQueueIndex(), _Surface, &Supported);
auto bb = vkGetPhysicalDeviceWin32PresentationSupportKHR(_PD, PresentationQueue.GetQueueIndex());
Everything here returns true, although it seemed suspicious to me that VkSurfaceCapabilitiesKHR returned the same extent for currentExtent, minImageExtent and maxImageExtent.
/* Initialize the library */
if (!glfwInit())
/* Create a windowed mode window and its OpenGL context */
glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);
glfwWindowHint(GLFW_DECORATED, _WCI.IsDecorated);
If this is your actual code, it means that glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API); only runs when glfwInit() fails; when glfwInit() succeeds, the hint is skipped. And as we know, not setting that hint causes the error.
The if should perhaps not have the negation inside. And (this not being Python) it should have a {} scope to cover all the init code.
The solution was to remove the glfwInit call from inside the if and place it outside:
if (!glfwInit()) -> glfwInit()
No idea why; if someone knows why this could have caused the problem, I would love to know.
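For clarity, the answer's explanation is the "why": without braces, the if swallowed the first hint call as its body. A minimal corrected sketch using the question's own identifiers (keeping a real failure branch, whose body is illustrative):

/* Initialize the library */
if (!glfwInit())
{
    GS_BASIC_LOG_MESSAGE("*ERROR* glfwInit failed"); // illustrative handling
    return;
}
/* The hints now always run once initialization has succeeded */
glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);
glfwWindowHint(GLFW_DECORATED, _WCI.IsDecorated);
GLFWWindow = glfwCreateWindow(Extent.Width, Extent.Height, _WCI.Name.c_str(), nullptr, nullptr);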

Is it possible to wait for a transfer from the staging buffer to complete without calling vkQueueWaitIdle

The following piece of code shows how I transfer vertex buffer data from the staging buffer to a device-local buffer:
bool Vulkan::UpdateVertexBuffer(std::vector<VERTEX>& data, VULKAN_BUFFER& vertex_buffer)
{
    std::memcpy(this->staging_buffer.pointer, &data[0], vertex_buffer.size);

    size_t flush_size = static_cast<size_t>(vertex_buffer.size);
    unsigned int multiple = static_cast<unsigned int>(flush_size / this->physical_device.properties.limits.nonCoherentAtomSize);
    flush_size = this->physical_device.properties.limits.nonCoherentAtomSize * ((uint64_t)multiple + 1);

    VkMappedMemoryRange flush_range = {};
    flush_range.sType = VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE;
    flush_range.pNext = nullptr;
    flush_range.memory = this->staging_buffer.memory;
    flush_range.offset = 0;
    flush_range.size = flush_size;
    vkFlushMappedMemoryRanges(this->device, 1, &flush_range);

    VkResult result = vkWaitForFences(this->device, 1, &this->transfer.fence, VK_FALSE, 1000000000);
    if(result != VK_SUCCESS) {
        #if defined(_DEBUG)
        std::cout << "UpdateVertexBuffer => vkWaitForFences : Timeout" << std::endl;
        #endif
        return false;
    }
    vkResetFences(this->device, 1, &this->transfer.fence);

    VkCommandBufferBeginInfo command_buffer_begin_info = {};
    command_buffer_begin_info.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
    command_buffer_begin_info.pNext = nullptr;
    command_buffer_begin_info.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT;
    command_buffer_begin_info.pInheritanceInfo = nullptr;
    vkBeginCommandBuffer(this->transfer.command_buffer, &command_buffer_begin_info);

    VkBufferCopy buffer_copy_info = {};
    buffer_copy_info.srcOffset = 0;
    buffer_copy_info.dstOffset = 0;
    buffer_copy_info.size = vertex_buffer.size;
    vkCmdCopyBuffer(this->transfer.command_buffer, this->staging_buffer.handle, vertex_buffer.handle, 1, &buffer_copy_info);

    VkBufferMemoryBarrier buffer_memory_barrier = {};
    buffer_memory_barrier.sType = VK_STRUCTURE_TYPE_BUFFER_MEMORY_BARRIER;
    buffer_memory_barrier.pNext = nullptr;
    buffer_memory_barrier.srcAccessMask = VK_ACCESS_MEMORY_WRITE_BIT;
    buffer_memory_barrier.dstAccessMask = VK_ACCESS_VERTEX_ATTRIBUTE_READ_BIT;
    buffer_memory_barrier.srcQueueFamilyIndex = this->queue_stack[this->transfer_stack_index].index;
    buffer_memory_barrier.dstQueueFamilyIndex = this->queue_stack[this->graphics_stack_index].index;
    buffer_memory_barrier.buffer = vertex_buffer.handle;
    buffer_memory_barrier.offset = 0;
    buffer_memory_barrier.size = VK_WHOLE_SIZE;
    vkCmdPipelineBarrier(this->transfer.command_buffer, VK_PIPELINE_STAGE_TRANSFER_BIT, VK_PIPELINE_STAGE_VERTEX_INPUT_BIT, 0, 0, nullptr, 1, &buffer_memory_barrier, 0, nullptr);
    vkEndCommandBuffer(this->transfer.command_buffer);

    VkSubmitInfo submit_info = {};
    submit_info.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    submit_info.pNext = nullptr;
    submit_info.waitSemaphoreCount = 0;
    submit_info.pWaitSemaphores = nullptr;
    submit_info.pWaitDstStageMask = nullptr;
    submit_info.commandBufferCount = 1;
    submit_info.pCommandBuffers = &this->transfer.command_buffer;
    submit_info.signalSemaphoreCount = 0;
    submit_info.pSignalSemaphores = nullptr;
    result = vkQueueSubmit(this->queue_stack[this->transfer_stack_index].handle, 1, &submit_info, this->transfer.fence);
    if(result != VK_SUCCESS) {
        #if defined(_DEBUG)
        std::cout << "UpdateVertexBuffer => vkQueueSubmit : Failed" << std::endl;
        #endif
        return false;
    }

    #if defined(_DEBUG)
    std::cout << "UpdateVertexBuffer : Success" << std::endl;
    #endif
    return true;
}
It works perfectly, without any validation layer warning. But when I call it twice, both buffers contain the same data, from the second call. For example:
UpdateVertexBuffer(cube_data, cube_buffer);
UpdateVertexBuffer(prism_data, prism_buffer);
This results in having a prism inside both cube_buffer and prism_buffer. To fix this, I can simply wait a few milliseconds between the two calls:
UpdateVertexBuffer(cube_data, cube_buffer);
std::this_thread::sleep_for(std::chrono::milliseconds(100));
UpdateVertexBuffer(prism_data, prism_buffer);
or, preferably, I can replace the fence with a call to
vkQueueWaitIdle(this->queue_stack[this->transfer_stack_index].handle);
In my opinion this results in a performance loss, and the fence is supposed to be the optimal way to wait for a transfer operation to complete. So why is my first buffer filled by the second call when I'm using a fence, and is there a way to do this properly without using vkQueueWaitIdle?
Thanks for your help.
You wait for the fence for the previous upload after you have already written the data to the staging buffer. That's too late; the fence is there to prevent you from writing data to memory that's being read.
But really, your problem is that your design is wrong. Your design is such that sequential updates all use the same memory. They shouldn't. Instead, sequential updates should use different regions of the same memory, so that they cannot overlap. That way, you can perform the transfers and not have to wait on fences at all (or at least, not until next frame).
Basically, you should treat your staging buffer like a ring buffer. Every operation that wants to do some staged transfer work should "allocate" X bytes of memory from the staging ring buffer. The staging buffer system allocates memory sequentially, wrapping around if there is insufficient space. But it also remembers where the last memory region is that it synchronized with. If you try to stage too much work, then it has to synchronize.
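A minimal sketch of such a ring allocator (the names and the single in-flight region tracked here are illustrative, not from the question's code; real code would track one region per pending submission):

struct StagingRing {
    uint8_t*     mapped   = nullptr; // persistently mapped staging memory
    VkDeviceSize capacity = 0;       // total staging size
    VkDeviceSize head     = 0;       // next free byte
};

// Hands out `size` bytes from the ring, wrapping at the end. Only waits on
// `fence` when the requested range would overwrite a still-pending transfer
// that starts at `inFlightStart`.
uint8_t* RingAllocate(StagingRing& ring, VkDevice device, VkFence fence,
                      VkDeviceSize inFlightStart, VkDeviceSize size,
                      VkDeviceSize* outOffset)
{
    if (ring.head + size > ring.capacity)
        ring.head = 0; // wrap around to the start
    if (ring.head <= inFlightStart && inFlightStart < ring.head + size) {
        // The new region overlaps the oldest unfinished transfer: only now
        // do we actually have to block on the fence.
        vkWaitForFences(device, 1, &fence, VK_TRUE, UINT64_MAX);
    }
    *outOffset = ring.head;   // use this as srcOffset in vkCmdCopyBuffer
    ring.head += size;
    return ring.mapped + *outOffset;
}

With this, each UpdateVertexBuffer call writes into its own region of the staging buffer, so the cube and prism uploads can no longer alias each other.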
Also, one of the purposes behind mapping memory is that you can write directly to that memory, rather than writing to some other CPU memory and copying it in. So instead of passing in a VULKAN_BUFFER (whatever that is), the process that generated that data should have fetched a pointer to a region of the active staging buffer and written its data into that.
Oh, and one more thing: never, ever create a command buffer and immediately submit it. Just don't do it. There's a reason why vkQueueSubmit can take multiple command buffers, and multiple batches of command buffers. For any one queue, you should never be submitting more than once (or maybe twice) per frame.
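To make that last point concrete, here is a hedged sketch of batching both uploads into one submission (cubeUploadCmd, prismUploadCmd, transfer_queue, and transfer_fence are hypothetical names; the command buffers are assumed to be recorded with copies like the question's):

VkCommandBuffer uploads[] = { cubeUploadCmd, prismUploadCmd };

VkSubmitInfo submit_info = {};
submit_info.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
submit_info.commandBufferCount = 2;
submit_info.pCommandBuffers = uploads;
// One submission (and one fence) covers the whole batch of uploads.
vkQueueSubmit(transfer_queue, 1, &submit_info, transfer_fence);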

E_INVALIDARG when calling CreateGraphicsPipelineState

I've been getting a weird error when calling CreateGraphicsPipelineState().
The function returns E_INVALIDARG even though the description is all set up.
The description worked before; then I tried to add index buffers to my pipeline. I didn't even touch any of the code for the PSO or shaders, and now the PSO creation is all messed up.
The issue is that I don't get any DX error messages from the driver when enabling the debug layer. I only get
"Microsoft C++ exception: _com_error at memory location ..."
when I step through the function.
It feels like some pointer error or similar, but I can't figure out what it is. Perhaps one of you can see an obvious mistake I made?
Here's my code:
CGraphicsPSO* pso = new CGraphicsPSO();
D3D12_GRAPHICS_PIPELINE_STATE_DESC psoDesc = {};

// Input Layout
std::vector<D3D12_INPUT_ELEMENT_DESC> elements;
if (aPSODesc.inputLayout != nullptr)
{
    auto& ilData = aPSODesc.inputLayout->desc;
    for (auto& element : ilData)
    {
        // All data here is correct when breaking
        D3D12_INPUT_ELEMENT_DESC elementDesc;
        elementDesc.SemanticName = element.mySemanticName;
        elementDesc.SemanticIndex = element.mySemanticIndex;
        elementDesc.InputSlot = element.myInputSlot;
        elementDesc.AlignedByteOffset = element.myAlignedByteOffset;
        elementDesc.InputSlotClass = _ConvertInputClassificationDX12(element.myInputSlotClass);
        elementDesc.Format = _ConvertFormatDX12(element.myFormat);
        elementDesc.InstanceDataStepRate = element.myInstanceDataStepRate;
        elements.push_back(elementDesc);
    }
    D3D12_INPUT_LAYOUT_DESC inputLayout = {};
    inputLayout.NumElements = (UINT)elements.size();
    inputLayout.pInputElementDescs = elements.data();
    psoDesc.InputLayout = inputLayout;
}

// Topology
switch (aPSODesc.topology)
{
default:
case EPrimitiveTopology::TriangleList:
    psoDesc.PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE; // <--- Always this option
    break;
case EPrimitiveTopology::PointList:
    psoDesc.PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_POINT;
    break;
case EPrimitiveTopology::LineList:
    psoDesc.PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_LINE;
    break;
//case EPrimitiveTopology::Patch:
//    psoDesc.PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_PATCH;
//    break;
}

// Shaders
if (aPSODesc.vs != nullptr)
{
    D3D12_SHADER_BYTECODE vertexShaderBytecode = {};
    vertexShaderBytecode.BytecodeLength = aPSODesc.vs->myByteCodeSize;
    vertexShaderBytecode.pShaderBytecode = aPSODesc.vs->myByteCode;
    psoDesc.VS = vertexShaderBytecode;
}
if (aPSODesc.ps != nullptr)
{
    D3D12_SHADER_BYTECODE pixelShaderBytecode = {};
    pixelShaderBytecode.BytecodeLength = aPSODesc.ps->myByteCodeSize;
    pixelShaderBytecode.pShaderBytecode = aPSODesc.ps->myByteCode;
    psoDesc.PS = pixelShaderBytecode;
}

psoDesc.RTVFormats[0] = DXGI_FORMAT_R8G8B8A8_UNORM; // format of the render target
DXGI_SAMPLE_DESC sampleDesc = {};
sampleDesc.Count = 1;
sampleDesc.Quality = 0;
psoDesc.DepthStencilState.DepthEnable = FALSE;
psoDesc.DepthStencilState.StencilEnable = FALSE;
psoDesc.SampleDesc = sampleDesc; // must be the same sample description as the swap chain and depth/stencil buffer
psoDesc.SampleMask = UINT_MAX; // the sample mask has to do with multi-sampling; 0xffffffff means point sampling is done
psoDesc.RasterizerState = CD3DX12_RASTERIZER_DESC(D3D12_DEFAULT); // a default rasterizer state
psoDesc.BlendState = CD3DX12_BLEND_DESC(D3D12_DEFAULT); // a default blend state
psoDesc.NumRenderTargets = 1; // we are only binding one render target
psoDesc.pRootSignature = myGraphicsRootSignature;
psoDesc.Flags = D3D12_PIPELINE_STATE_FLAG_NONE;

ID3D12PipelineState* pipelineState;
HRESULT hr = myDevice->CreateGraphicsPipelineState(&psoDesc, IID_PPV_ARGS(&pipelineState));
pso->myPipelineState = pipelineState;
if (FAILED(hr))
{
    delete pso;
    return nullptr;
}
return pso;
So I just found the error.
It seems the way I parsed the semantics for my input layout gave me an invalid pointer.
Thus the memory at that address was invalid, giving the DX12 device incorrect descriptions.
What I did was store the semantic names locally within my CreatePSO function until the PSO was created, and now it all works.
Looks to me like the pointers to storage you promised are going out of scope.
..
D3D12_INPUT_LAYOUT_DESC inputLayout = {};
..
psoDesc.InputLayout = inputLayout;
}
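A sketch of the lifetime fix the asker describes: pin the strings backing SemanticName (and keep the element array alive) until CreateGraphicsPipelineState has returned. semanticNames is an illustrative name, not from the original code:

std::vector<std::string> semanticNames;           // owns the characters SemanticName points at
std::vector<D3D12_INPUT_ELEMENT_DESC> elements;
semanticNames.reserve(ilData.size());             // no reallocation, so c_str() stays valid
for (auto& element : ilData)
{
    semanticNames.push_back(element.mySemanticName);
    D3D12_INPUT_ELEMENT_DESC elementDesc = {};
    elementDesc.SemanticName = semanticNames.back().c_str();
    // ... fill the remaining fields as in the question ...
    elements.push_back(elementDesc);
}
psoDesc.InputLayout = { elements.data(), (UINT)elements.size() };
// Both vectors must still be in scope here:
HRESULT hr = myDevice->CreateGraphicsPipelineState(&psoDesc, IID_PPV_ARGS(&pipelineState));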