nvEncRegisterResource() fails with -23 - opengl

I've hit a complete brick wall in my attempt to use NVEnc to stream OpenGL frames as H264. I've been at this particular issue for close to 8 hours without any progress.
The problem is the call to nvEncRegisterResource(), which invariably fails with code -23 (enum value NV_ENC_ERR_RESOURCE_REGISTER_FAILED, documented as "failed to register the resource" - thanks NVidia).
I'm trying to follow a procedure outlined in this document from the University of Oslo (page 54, "OpenGL interop"), so I know for a fact that this is supposed to work, though unfortunately said document does not provide the code itself.
The idea is fairly straightforward:
map the texture produced by the OpenGL frame buffer object into CUDA;
copy the texture into a (previously allocated) CUDA buffer;
map that buffer as an NVEnc input resource
use that input resource as the source for the encoding
As I said, the problem is step (3). Here are the relevant code snippets (I'm omitting error handling for brevity.)
// Round up width and height
priv->encWidth = (_resolution.w + 31) & ~31, priv->encHeight = (_resolution.h + 31) & ~31;
// Allocate CUDA "pitched" memory to match the input texture (YUV, one byte per component)
cuErr = cudaMallocPitch(&priv->cudaMemPtr, &priv->cudaMemPitch, 3 * priv->encWidth, priv->encHeight);
This should allocate on-device CUDA memory (the "pitched" variety, though I've tried non-pitched too, without any change in the outcome.)
// Register the CUDA buffer as an input resource
NV_ENC_REGISTER_RESOURCE regResParams = { 0 };
regResParams.version = NV_ENC_REGISTER_RESOURCE_VER;
regResParams.resourceType = NV_ENC_INPUT_RESOURCE_TYPE_CUDADEVICEPTR;
regResParams.width = priv->encWidth;
regResParams.height = priv->encHeight;
regResParams.bufferFormat = NV_ENC_BUFFER_FORMAT_YUV444_PL;
regResParams.resourceToRegister = priv->cudaMemPtr;
regResParams.pitch = priv->cudaMemPitch;
encStat = nvEncApi.nvEncRegisterResource(priv->nvEncoder, &regResParams);
// ^^^ FAILS
priv->nvEncInpRes = regResParams.registeredResource;
This is the brick wall. No matter what I try, nvEncRegisterResource() fails.
I should note that I rather think (though I may be wrong) that I've done all the required initializations. Here is the code that creates and activates the CUDA context:
// Pop the current context
cuRes = cuCtxPopCurrent(&priv->cuOldCtx);
// Create a context for the device
priv->cuCtx = nullptr;
cuRes = cuCtxCreate(&priv->cuCtx, CU_CTX_SCHED_BLOCKING_SYNC, priv->cudaDevice);
// Push our context
cuRes = cuCtxPushCurrent(priv->cuCtx);
.. followed by the creation of the encoding session:
// Create an NV Encoder session
NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS nvEncSessParams = { 0 };
nvEncSessParams.apiVersion = NVENCAPI_VERSION;
nvEncSessParams.version = NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS_VER;
nvEncSessParams.deviceType = NV_ENC_DEVICE_TYPE_CUDA;
nvEncSessParams.device = priv->cuCtx; // nullptr
auto encStat = nvEncApi.nvEncOpenEncodeSessionEx(&nvEncSessParams, &priv->nvEncoder);
And finally, the code initializing the encoder:
// Configure the encoder via preset
NV_ENC_PRESET_CONFIG presetConfig = { 0 };
GUID codecGUID = NV_ENC_CODEC_H264_GUID;
GUID presetGUID = NV_ENC_PRESET_LOW_LATENCY_DEFAULT_GUID;
presetConfig.version = NV_ENC_PRESET_CONFIG_VER;
presetConfig.presetCfg.version = NV_ENC_CONFIG_VER;
encStat = nvEncApi.nvEncGetEncodePresetConfig(priv->nvEncoder, codecGUID, presetGUID, &presetConfig);
NV_ENC_INITIALIZE_PARAMS initParams = { 0 };
initParams.version = NV_ENC_INITIALIZE_PARAMS_VER;
initParams.encodeGUID = codecGUID;
initParams.encodeWidth = priv->encWidth;
initParams.encodeHeight = priv->encHeight;
initParams.darWidth = 1;
initParams.darHeight = 1;
initParams.frameRateNum = 25; // TODO: make this configurable
initParams.frameRateDen = 1; // ditto
// .max_surface_count = (num_mbs >= 8160) ? 32 : 48;
// .buffer_delay ? necessary
initParams.enableEncodeAsync = 0;
initParams.enablePTD = 1;
initParams.presetGUID = presetGUID;
memcpy(&priv->nvEncConfig, &presetConfig.presetCfg, sizeof(priv->nvEncConfig));
initParams.encodeConfig = &priv->nvEncConfig;
encStat = nvEncApi.nvEncInitializeEncoder(priv->nvEncoder, &initParams);
All the above initializations report success.
I'd be extremely grateful to anyone who can get me past this hurdle.
EDIT: here is the complete code to reproduce the problem. The only observable difference to the original code is that cuPopContext() returns an error (which can be ignored) here - probably my original program creates such a context as a side effect of using OpenGL. Otherwise, the code behaves exactly as the original does.
I've built the code with Visual Studio 2013. You must link the following library file (adapt path if not on C:): C:\Program Files (x86)\NVIDIA GPU Computing Toolkit\CUDA\v7.5\lib\Win32\cuda.lib
You must also make sure that C:\Program Files (x86)\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\ (or similar) is in the include path.
NEW EDIT: modified the code to only use the CUDA driver interface, instead of mixing with the runtime API. Still the same error code.
#ifdef _WIN32
#include <Windows.h>
#endif
#include <cassert>
#include <GL/gl.h>
#include <iostream>
#include <string>
#include <stdexcept>
#include <string>
#include <cuda.h>
//#include <cuda_runtime.h>
#include <cuda_gl_interop.h>
#include <nvEncodeAPI.h>
// NV Encoder API ---------------------------------------------------
#if defined(_WIN32)
#define LOAD_FUNC(l, s) GetProcAddress(l, s)
#define DL_CLOSE_FUNC(l) FreeLibrary(l)
#else
#define LOAD_FUNC(l, s) dlsym(l, s)
#define DL_CLOSE_FUNC(l) dlclose(l)
#endif
typedef NVENCSTATUS(NVENCAPI* PNVENCODEAPICREATEINSTANCE)(NV_ENCODE_API_FUNCTION_LIST *functionList);
struct NVEncAPI : public NV_ENCODE_API_FUNCTION_LIST {
public:
// ~NVEncAPI() { cleanup(); }
void init() {
#if defined(_WIN32)
if (sizeof(void*) == 8) {
nvEncLib = LoadLibrary(TEXT("nvEncodeAPI64.dll"));
}
else {
nvEncLib = LoadLibrary(TEXT("nvEncodeAPI.dll"));
}
if (nvEncLib == NULL) throw std::runtime_error("Failed to load NVidia Encoder library: " + std::to_string(GetLastError()));
#else
nvEncLib = dlopen("libnvidia-encode.so.1", RTLD_LAZY);
if (nvEncLib == nullptr)
throw std::runtime_error("Failed to load NVidia Encoder library: " + std::string(dlerror()));
#endif
auto nvEncodeAPICreateInstance = (PNVENCODEAPICREATEINSTANCE) LOAD_FUNC(nvEncLib, "NvEncodeAPICreateInstance");
version = NV_ENCODE_API_FUNCTION_LIST_VER;
NVENCSTATUS encStat = nvEncodeAPICreateInstance(static_cast<NV_ENCODE_API_FUNCTION_LIST *>(this));
}
void cleanup() {
#if defined(_WIN32)
if (nvEncLib != NULL) {
FreeLibrary(nvEncLib);
nvEncLib = NULL;
}
#else
if (nvEncLib != nullptr) {
dlclose(nvEncLib);
nvEncLib = nullptr;
}
#endif
}
private:
#if defined(_WIN32)
HMODULE nvEncLib;
#else
void* nvEncLib;
#endif
bool init_done;
};
static NVEncAPI nvEncApi;
// Encoder class ----------------------------------------------------
class Encoder {
public:
typedef unsigned int uint_t;
struct Size { uint_t w, h; };
Encoder() {
CUresult cuRes = cuInit(0);
nvEncApi.init();
}
void init(const Size & resolution, uint_t texture) {
NVENCSTATUS encStat;
CUresult cuRes;
texSize = resolution;
yuvTex = texture;
// Purely for information
int devCount = 0;
cuRes = cuDeviceGetCount(&devCount);
// Initialize NVEnc
initEncodeSession(); // start an encoding session
initEncoder();
// Register the YUV texture as a CUDA graphics resource
// CODE COMMENTED OUT AS THE INPUT TEXTURE IS NOT NEEDED YET (TO MY UNDERSTANDING) AT SETUP TIME
//cudaGraphicsGLRegisterImage(&priv->cudaInpTexRes, priv->yuvTex, GL_TEXTURE_2D, cudaGraphicsRegisterFlagsReadOnly);
// Allocate CUDA "pitched" memory to match the input texture (YUV, one byte per component)
encWidth = (texSize.w + 31) & ~31, encHeight = (texSize.h + 31) & ~31;
cuRes = cuMemAllocPitch(&cuDevPtr, &cuMemPitch, 4 * encWidth, encHeight, 16);
// Register the CUDA buffer as an input resource
NV_ENC_REGISTER_RESOURCE regResParams = { 0 };
regResParams.version = NV_ENC_REGISTER_RESOURCE_VER;
regResParams.resourceType = NV_ENC_INPUT_RESOURCE_TYPE_CUDADEVICEPTR;
regResParams.width = encWidth;
regResParams.height = encHeight;
regResParams.bufferFormat = NV_ENC_BUFFER_FORMAT_YUV444_PL;
regResParams.resourceToRegister = (void*) cuDevPtr;
regResParams.pitch = cuMemPitch;
encStat = nvEncApi.nvEncRegisterResource(nvEncoder, &regResParams);
assert(encStat == NV_ENC_SUCCESS); // THIS IS THE POINT OF FAILURE
nvEncInpRes = regResParams.registeredResource;
}
void cleanup() { /* OMITTED */ }
void encode() {
// THE FOLLOWING CODE WAS NEVER REACHED YET BECAUSE OF THE ISSUE.
// INCLUDED HERE FOR REFERENCE.
CUresult cuRes;
NVENCSTATUS encStat;
cuRes = cuGraphicsResourceSetMapFlags(cuInpTexRes, CU_GRAPHICS_MAP_RESOURCE_FLAGS_READ_ONLY);
cuRes = cuGraphicsMapResources(1, &cuInpTexRes, 0);
CUarray mappedArray;
cuRes = cuGraphicsSubResourceGetMappedArray(&mappedArray, cuInpTexRes, 0, 0);
cuRes = cuMemcpyDtoA(mappedArray, 0, cuDevPtr, 4 * encWidth * encHeight);
NV_ENC_MAP_INPUT_RESOURCE mapInputResParams = { 0 };
mapInputResParams.version = NV_ENC_MAP_INPUT_RESOURCE_VER;
mapInputResParams.registeredResource = nvEncInpRes;
encStat = nvEncApi.nvEncMapInputResource(nvEncoder, &mapInputResParams);
// TODO: encode...
cuRes = cuGraphicsUnmapResources(1, &cuInpTexRes, 0);
}
private:
struct PrivateData;
void initEncodeSession() {
CUresult cuRes;
NVENCSTATUS encStat;
// Pop the current context
cuRes = cuCtxPopCurrent(&cuOldCtx); // THIS IS ALLOWED TO FAIL (it doesn't
// Create a context for the device
cuCtx = nullptr;
cuRes = cuCtxCreate(&cuCtx, CU_CTX_SCHED_BLOCKING_SYNC, 0);
// Push our context
cuRes = cuCtxPushCurrent(cuCtx);
// Create an NV Encoder session
NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS nvEncSessParams = { 0 };
nvEncSessParams.apiVersion = NVENCAPI_VERSION;
nvEncSessParams.version = NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS_VER;
nvEncSessParams.deviceType = NV_ENC_DEVICE_TYPE_CUDA;
nvEncSessParams.device = cuCtx;
encStat = nvEncApi.nvEncOpenEncodeSessionEx(&nvEncSessParams, &nvEncoder);
}
void Encoder::initEncoder()
{
NVENCSTATUS encStat;
// Configure the encoder via preset
NV_ENC_PRESET_CONFIG presetConfig = { 0 };
GUID codecGUID = NV_ENC_CODEC_H264_GUID;
GUID presetGUID = NV_ENC_PRESET_LOW_LATENCY_DEFAULT_GUID;
presetConfig.version = NV_ENC_PRESET_CONFIG_VER;
presetConfig.presetCfg.version = NV_ENC_CONFIG_VER;
encStat = nvEncApi.nvEncGetEncodePresetConfig(nvEncoder, codecGUID, presetGUID, &presetConfig);
NV_ENC_INITIALIZE_PARAMS initParams = { 0 };
initParams.version = NV_ENC_INITIALIZE_PARAMS_VER;
initParams.encodeGUID = codecGUID;
initParams.encodeWidth = texSize.w;
initParams.encodeHeight = texSize.h;
initParams.darWidth = texSize.w;
initParams.darHeight = texSize.h;
initParams.frameRateNum = 25;
initParams.frameRateDen = 1;
initParams.enableEncodeAsync = 0;
initParams.enablePTD = 1;
initParams.presetGUID = presetGUID;
memcpy(&nvEncConfig, &presetConfig.presetCfg, sizeof(nvEncConfig));
initParams.encodeConfig = &nvEncConfig;
encStat = nvEncApi.nvEncInitializeEncoder(nvEncoder, &initParams);
}
//void cleanupEncodeSession();
//void cleanupEncoder;
Size texSize;
GLuint yuvTex;
uint_t encWidth, encHeight;
CUdeviceptr cuDevPtr;
size_t cuMemPitch;
NV_ENC_CONFIG nvEncConfig;
NV_ENC_INPUT_PTR nvEncInpBuf;
NV_ENC_REGISTERED_PTR nvEncInpRes;
CUdevice cuDevice;
CUcontext cuCtx, cuOldCtx;
void *nvEncoder;
CUgraphicsResource cuInpTexRes;
};
int main(int argc, char *argv[])
{
Encoder encoder;
encoder.init({1920, 1080}, 0); // OMITTED THE TEXTURE AS IT IS NOT NEEDED TO REPRODUCE THE ISSUE
return 0;
}

After comparing the NVidia sample NvEncoderCudaInterop with my minimal code, I finally found the item that makes the difference between success and failure: its the pitch parameter of the NV_ENC_REGISTER_RESOURCE structure passed to nvEncRegisterResource().
I haven't seen it documented anywhere, but there's a hard limit on that value, which I've determined experimentally to be at 2560. Anything above that will result in NV_ENC_ERR_RESOURCE_REGISTER_FAILED.
It does not appear to matter that the pitch I was passing was calculated by another API call, cuMemAllocPitch().
(Another thing that was missing from my code was "locking" and unlocking the CUDA context to the current thread via cuCtxPushCurrent() and cuCtxPopCurrent(). Done in the sample via a RAII class.)
EDIT:
I have worked around the problem by doing something for which I had another reason: using NV12 as input format for the encoder instead of YUV444.
With NV12, the pitch parameter drops below the 2560 limit because the byte size per row is equal to the width, so in my case 1920 bytes.
This was necessary (at the time) because my graphics card was a GTX 760 with a "Kepler" GPU, which (as I was initially unaware) only supports NV12 as input format for NVEnc. I have since upgraded to a GTX 970, but as I just found out, the 2560 limit is still there.
This makes me wonder just how exactly one is expected to use NVEnc with YUV444. The only possibility that comes to my mind is to use non-pitched memory, which seems bizarre. I'd appreciate comments from people who've actually used NVEnc with YUV444.
EDIT #2 - PENDING FURTHER UPDATE:
New information has surfaced in the form of another SO question: NVencs Output Bitstream is not readable
It is quite possible that my answer so far was wrong. It seems now that the pitch should not only be set when registering the CUDA resource, but also when actually sending it to the encoder via nvEncEncodePicture(). I cannot check this right now, but I will next time I work on that project.

Related

Vulkan RT BLAS build sometimes fail. No Validation Layer message

I have a problem when creating BLAS in Vulkan with Ray Tracing. Basically, not always, but often when I send the command "vkCmdBuildAccelerationStructuresKHR" via a commandBuffer in the Compute queue the VkDevice becomes VK_ERROR_DEVICE_LOST. The vkQueueSubmit returns VK_SUCCESS, but when I try to wait for the sent command to finish vkDeviceWaitIdle returns VK_ERROR_DEVICE_LOST. All the buffers used are allocated without errors and it is possible to obtain the address on the device. I also use the VMA (Vulkan Memory Management) library to manage the allocations. The buffers were created with the property VK_SHARING_MODE_EXCLUSIVE but are only used in the commandBuffer of the Compute queue. The real problem is that the validation layer does not give any error messages.
The code for creating vertex buffer:
VkBufferCreateInfo vkVertexBufferCreateInfo{};
vkVertexBufferCreateInfo.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
vkVertexBufferCreateInfo.size = vertexSize;
vkVertexBufferCreateInfo.usage = VK_BUFFER_USAGE_ACCELERATION_STRUCTURE_BUILD_INPUT_READ_ONLY_BIT_KHR
| VK_BUFFER_USAGE_TRANSFER_DST_BIT | VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT
| VK_BUFFER_USAGE_VERTEX_BUFFER_BIT | VK_BUFFER_USAGE_STORAGE_BUFFER_BIT;
vkVertexBufferCreateInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;
vkVertexBufferCreateInfo.pQueueFamilyIndices = VK_NULL_HANDLE;
vkVertexBufferCreateInfo.queueFamilyIndexCount = 0;
VmaAllocationCreateInfo vmaVertexBufferAllocationCreateInfo{};
vmaVertexBufferAllocationCreateInfo.flags = VMA_ALLOCATION_CREATE_STRATEGY_MIN_MEMORY_BIT;
vmaVertexBufferAllocationCreateInfo.requiredFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT;
The code for creating index buffer:
VkBufferCreateInfo vkIndexBufferCreateInfo{};
vkIndexBufferCreateInfo.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
vkIndexBufferCreateInfo.size = faceSize;
vkIndexBufferCreateInfo.usage = VK_BUFFER_USAGE_ACCELERATION_STRUCTURE_BUILD_INPUT_READ_ONLY_BIT_KHR
| VK_BUFFER_USAGE_TRANSFER_DST_BIT | VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT
| VK_BUFFER_USAGE_INDEX_BUFFER_BIT | VK_BUFFER_USAGE_STORAGE_BUFFER_BIT;
vkIndexBufferCreateInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;
vkIndexBufferCreateInfo.pQueueFamilyIndices = VK_NULL_HANDLE;
vkIndexBufferCreateInfo.queueFamilyIndexCount = 0;
VmaAllocationCreateInfo vmaIndexBufferAllocationCreateInfo = {};
vmaIndexBufferAllocationCreateInfo.flags = VMA_ALLOCATION_CREATE_STRATEGY_MIN_MEMORY_BIT;
vmaIndexBufferAllocationCreateInfo.requiredFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT;
The code to create struct for geometry info:
// Query the 64-bit vertex/index buffer device address value through which buffer memory
// can be accessed in a shader
std::optional<VkDeviceAddress> vertexBufferAddress = geometry.getVertexBuffer().getBufferDeviceAddress();
if (vertexBufferAddress.has_value() == false)
{
OV_LOG_ERROR("Fail to retrive the device address of the vertex buffer. Probably geometry not uploaded.");
return false;
}
std::optional<VkDeviceAddress> faceBufferAddress = geometry.getFaceBuffer().getBufferDeviceAddress();
if (faceBufferAddress.has_value() == false)
{
OV_LOG_ERROR("Fail to retrive the device address of the face buffer. Probably geometry not uploaded.");
return false;
}
VkDeviceOrHostAddressConstKHR vertexDeviceOrHostAddressConst = {};
vertexDeviceOrHostAddressConst.deviceAddress = vertexBufferAddress.value();
VkDeviceOrHostAddressConstKHR faceDeviceOrHostAddressConst = {};
faceDeviceOrHostAddressConst.deviceAddress = faceBufferAddress.value();
// Structure specifying a triangle geometry in a bottom-level acceleration structure
VkAccelerationStructureGeometryTrianglesDataKHR accelerationStructureGeometryTrianglesData = {};
accelerationStructureGeometryTrianglesData.sType =
VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_GEOMETRY_TRIANGLES_DATA_KHR;
accelerationStructureGeometryTrianglesData.pNext = NULL;
// Vertex glm::vec3
accelerationStructureGeometryTrianglesData.vertexFormat = VK_FORMAT_R32G32B32_SFLOAT;
accelerationStructureGeometryTrianglesData.vertexData = vertexDeviceOrHostAddressConst;
// sizeof(float) * 3 => vertex
// sizeof(uint32_t) * 3 => normal / uv / tangent
accelerationStructureGeometryTrianglesData.vertexStride = sizeof(Ov::Geometry::VertexData);
// # vertices = vertex buffer size bytes / vertex stride
accelerationStructureGeometryTrianglesData.maxVertex = geometry.getNrOfVertices();
accelerationStructureGeometryTrianglesData.indexType = VK_INDEX_TYPE_UINT32;
accelerationStructureGeometryTrianglesData.indexData = faceDeviceOrHostAddressConst;
// transformData is a device or host address to memory containing an optional reference to
// a VkTransformMatrixKHR structure
accelerationStructureGeometryTrianglesData.transformData = transformData;
// Union specifying acceleration structure geometry data
VkAccelerationStructureGeometryDataKHR accelerationStructureGeometryData = {};
accelerationStructureGeometryData.triangles = accelerationStructureGeometryTrianglesData;
// Structure specifying geometries to be built into an acceleration structure
VkAccelerationStructureGeometryKHR& accelerationStructureGeometry = reserved->geometriesAS.emplace_back();
accelerationStructureGeometry.sType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_GEOMETRY_KHR;
accelerationStructureGeometry.pNext = NULL;
accelerationStructureGeometry.geometryType = VK_GEOMETRY_TYPE_TRIANGLES_KHR;
accelerationStructureGeometry.geometry = accelerationStructureGeometryData;
accelerationStructureGeometry.flags = geometryFlags;
// Structure specifying build offsets and counts for acceleration structure builds
VkAccelerationStructureBuildRangeInfoKHR& accelerationStructureBuildRangeInfoKHR = reserved->geometriesBuildRangeAS.emplace_back();
// primitiveCount defines the number of primitives for a corresponding acceleration structure geometry.
accelerationStructureBuildRangeInfoKHR.primitiveCount = geometry.getNrOfFaces();
accelerationStructureBuildRangeInfoKHR.primitiveOffset = 0;
accelerationStructureBuildRangeInfoKHR.firstVertex = 0;
accelerationStructureBuildRangeInfoKHR.transformOffset = 0;
Here is the code for building BLAS:
// Structure specifying the geometry data used to build an acceleration structure.
reserved->accelerationStructureBuildGeometryInfo.sType =
VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_BUILD_GEOMETRY_INFO_KHR;
reserved->accelerationStructureBuildGeometryInfo.pNext = NULL;
reserved->accelerationStructureBuildGeometryInfo.type = type;
reserved->accelerationStructureBuildGeometryInfo.flags = flags;
// VK_BUILD_ACCELERATION_STRUCTURE_MODE_BUILD_KHR => specifies that the destination acceleration
// structure will be built using the specified geometries.
reserved->accelerationStructureBuildGeometryInfo.mode = VK_BUILD_ACCELERATION_STRUCTURE_MODE_BUILD_KHR;
reserved->accelerationStructureBuildGeometryInfo.srcAccelerationStructure = VK_NULL_HANDLE;
reserved->accelerationStructureBuildGeometryInfo.dstAccelerationStructure = VK_NULL_HANDLE;
reserved->accelerationStructureBuildGeometryInfo.geometryCount = nrOfgeometriesStructuresAS;
// The index of each element of the pGeometries or ppGeometries members of VkAccelerationStructureBuildGeometryInfoKHR
// is used as the geometry index during ray traversal.The geometry index is available in ray shaders via the
// RayGeometryIndexKHR built - in, and is used to determine hitand intersection shaders executed
// during traversal.The geometry index is available to ray queries via the OpRayQueryGetIntersectionGeometryIndexKHR instruction.
reserved->accelerationStructureBuildGeometryInfo.pGeometries = geometriesStructuresAS.data();
reserved->accelerationStructureBuildGeometryInfo.ppGeometries = NULL;
reserved->accelerationStructureBuildGeometryInfo.scratchData = {};
// Structure specifying build sizes for an acceleration structure
VkAccelerationStructureBuildSizesInfoKHR accelerationStructureBuildSizesInfo = {};
accelerationStructureBuildSizesInfo.sType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_BUILD_SIZES_INFO_KHR;
accelerationStructureBuildSizesInfo.pNext = NULL;
accelerationStructureBuildSizesInfo.accelerationStructureSize = 0;
accelerationStructureBuildSizesInfo.updateScratchSize = 0;
accelerationStructureBuildSizesInfo.buildScratchSize = 0;
// Retrieve the required size for an acceleration structure
// VK_ACCELERATION_STRUCTURE_BUILD_TYPE_DEVICE_KHR => requests the memory requirement for operations
// performed by the device.
PFN_vkGetAccelerationStructureBuildSizesKHR pvkGetAccelerationStructureBuildSizesKHR =
(PFN_vkGetAccelerationStructureBuildSizesKHR)vkGetDeviceProcAddr(logicalDevice.get().getVkDevice(), "vkGetAccelerationStructureBuildSizesKHR");
pvkGetAccelerationStructureBuildSizesKHR(logicalDevice.get().getVkDevice(),
VK_ACCELERATION_STRUCTURE_BUILD_TYPE_HOST_KHR,
&reserved->accelerationStructureBuildGeometryInfo,
&reserved->accelerationStructureBuildGeometryInfo.geometryCount,
&accelerationStructureBuildSizesInfo);
////////////////////
// Scratch buffer //
////////////////////
#pragma region ScratchBuffer
///////////////////////////
// Create scratch buffer //
///////////////////////////
// Create info buffer
VkBufferCreateInfo vkScratchBufferCreateInfo{};
vkScratchBufferCreateInfo.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
// The second field of the struct is size, which specifies the size of the buffer in bytes
vkScratchBufferCreateInfo.size = accelerationStructureBuildSizesInfo.buildScratchSize;
// The third field is usage, which indicates for which purposes the data in the buffer
// is going to be used. It is possible to specify multiple purposes using a bitwise or.
// VK_BUFFER_USAGE_ACCELERATION_STRUCTURE_BUILD_INPUT_READ_ONLY_BIT_KHR => specifies that the buffer is suitable for
// use as a read-only input to an acceleration structure build.
// VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT => specifies that the buffer can be used to retrieve a buffer device address
// via vkGetBufferDeviceAddress and use that address to access the buffer’s memory from a shader.
vkScratchBufferCreateInfo.usage = VK_BUFFER_USAGE_ACCELERATION_STRUCTURE_BUILD_INPUT_READ_ONLY_BIT_KHR
| VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT | VK_BUFFER_USAGE_STORAGE_BUFFER_BIT;
// buffers can also be owned by a specific queue family or be shared between multiple
// at the same time.
// VK_SHARING_MODE_CONCURRENT specifies that concurrent access to any range or image subresource of the object
// from multiple queue families is supported.
vkScratchBufferCreateInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;
// From which queue family the buffer will be accessed.
vkScratchBufferCreateInfo.pQueueFamilyIndices = NULL;
vkScratchBufferCreateInfo.queueFamilyIndexCount = 0;
// Create allocation info
VmaAllocationCreateInfo vmaScratchBufferAllocationCreateInfo = {};
// VMA_ALLOCATION_CREATE_MAPPED_BIT => Set this flag to use a memory that will be persistently
// mappedand retrieve pointer to it. It is valid to use this flag for allocation made from memory
// type that is not HOST_VISIBLE. This flag is then ignored and memory is not mapped. This is useful
// if you need an allocation that is efficient to use on GPU (DEVICE_LOCAL) and still want to map it
// directly if possible on platforms that support it (e.g. Intel GPU).
vmaScratchBufferAllocationCreateInfo.flags = VMA_ALLOCATION_CREATE_STRATEGY_MIN_MEMORY_BIT;
// Flags that must be set in a Memory Type chosen for an allocation.
vmaScratchBufferAllocationCreateInfo.requiredFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT;
if (!reserved->scratchBuffer.create(vkScratchBufferCreateInfo, vmaScratchBufferAllocationCreateInfo))
{
OV_LOG_ERROR("Fail creation scrath buffer for BLAS.");
this->free();
return false;
}
////////////////////////
// Set scratch buffer //
////////////////////////
std::optional<VkDeviceAddress> deviceAddress = reserved->scratchBuffer.getBufferDeviceAddress();
if (deviceAddress.has_value() == false)
{
OV_LOG_ERROR("Fail to retrieve the scratch buffer device address.");
this->free();
return false;
}
VkDeviceOrHostAddressKHR scratchDeviceOrHostAddress = {};
scratchDeviceOrHostAddress.deviceAddress = deviceAddress.value();
// ScratchData is the device or host address to memory that will be used as scratch memory for the build.
reserved->accelerationStructureBuildGeometryInfo.scratchData = scratchDeviceOrHostAddress;
#pragma endregion
/////////////////
// BLAS buffer //
/////////////////
#pragma region BLASBuffer
// Create BLASBuffer
// Create info buffer
VkBufferCreateInfo vkBLASBufferCreateInfo{};
vkBLASBufferCreateInfo.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
// The second field of the struct is size, which specifies the size of the buffer in bytes
vkBLASBufferCreateInfo.size = accelerationStructureBuildSizesInfo.accelerationStructureSize;
// The third field is usage, which indicates for which purposes the data in the buffer
// is going to be used. It is possible to specify multiple purposes using a bitwise or.
// VK_BUFFER_USAGE_TRANSFER_SRC_BIT specifies that the buffer can be used as the source of a transfer command.
vkBLASBufferCreateInfo.usage = VK_BUFFER_USAGE_ACCELERATION_STRUCTURE_STORAGE_BIT_KHR |
VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT;
// buffers can also be owned by a specific queue family or be shared between multiple
// at the same time.
// VK_SHARING_MODE_CONCURRENT specifies that concurrent access to any range or image subresource of the object
// from multiple queue families is supported.
vkBLASBufferCreateInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;
// From which queue family the buffer will be accessed.
vkBLASBufferCreateInfo.pQueueFamilyIndices = NULL;
vkBLASBufferCreateInfo.queueFamilyIndexCount = 0;
// Create allocation info
VmaAllocationCreateInfo vmaBLASBufferAllocationCreateInfo = {};
// VMA_ALLOCATION_CREATE_STRATEGY_MIN_MEMORY_BIT => Allocation strategy that chooses smallest possible free range for the allocation
// to minimize memory usage and fragmentation, possibly at the expense of allocation time.
vmaBLASBufferAllocationCreateInfo.flags = VMA_ALLOCATION_CREATE_STRATEGY_MIN_MEMORY_BIT;
// Flags that must be set in a Memory Type chosen for an allocation.
vmaBLASBufferAllocationCreateInfo.requiredFlags = VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT;
if (!reserved->accelerationStructureBuffer.create(vkBLASBufferCreateInfo, vmaBLASBufferAllocationCreateInfo))
{
OV_LOG_ERROR("Fail creation scrath buffer for BLAS.");
this->free();
return false;
}
#pragma endregion
//////////////
// Build AS //
//////////////
#pragma region BuildAS
// Structure specifying the parameters of a newly created acceleration structure object
VkAccelerationStructureCreateInfoKHR accelerationStructureCreateInfo = {};
accelerationStructureCreateInfo.sType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_CREATE_INFO_KHR;
accelerationStructureCreateInfo.pNext = NULL;
accelerationStructureCreateInfo.createFlags = 0;
accelerationStructureCreateInfo.buffer = reserved->accelerationStructureBuffer.getVkBuffer();
accelerationStructureCreateInfo.offset = 0;
accelerationStructureCreateInfo.size = accelerationStructureBuildSizesInfo.accelerationStructureSize;
accelerationStructureCreateInfo.type = type;
accelerationStructureCreateInfo.deviceAddress = 0;
// Create a new acceleration structure object
PFN_vkCreateAccelerationStructureKHR pvkCreateAccelerationStructureKHR =
(PFN_vkCreateAccelerationStructureKHR)vkGetDeviceProcAddr(logicalDevice.get().getVkDevice(), "vkCreateAccelerationStructureKHR");
if (pvkCreateAccelerationStructureKHR(logicalDevice.get().getVkDevice(), &accelerationStructureCreateInfo, NULL,
&reserved->accelerationStructure) != VK_SUCCESS)
{
OV_LOG_ERROR("Fail to create AS, id: %d.", this->Ov::Object::getId());
this->free();
return false;
}
// dstAccelerationStructure is a pointer to the target acceleration structure for the build.
reserved->accelerationStructureBuildGeometryInfo.dstAccelerationStructure = reserved->accelerationStructure;
After a good night's sleep I found the answer to the problem. Basically, the error is due to an incorrect parameter passed to the function pvkGetAccelerationStructureBuildSizesKHR(). As parameter pMaxPrimitiveCounts I was passing the number of geometries but it is necessary to pass the number of primitives for each geometry.
Vulkan spec:
If pBuildInfo->geometryCount is not 0, pMaxPrimitiveCounts must be a valid pointer to an array of pBuildInfo->geometryCount uint32_t values

Intel OneAPI Video decoding memory leak when using C++ CLI

I am trying to use Intel OneAPI/OneVPL to decode a stream I receive from an RTSP Camera in C#. But when I run the code I get an enormous memory leak. Around 1-200MB per run, which is around once every second.
When I've collected a GoP from the camera where I know the first data is a keyframe I pass it as a byte array to my CLI and C++ code.
Here I expect it to decode all the frames and return decoded images. It receives 30 frames and returns 16 decoded images but has a memory leak.
I've tried to use Visual Studio memory profiler and all I can tell from it is that its unmanaged memory that's my problem. I've tried to override the "new" and "delete" method inside videoHandler.cpp to track and compare all allocations and deallocations and as far as I can tell everything is handled correctly in there. I cannot see any classes that get instantiated that do not get cleaned up. I think my issue is in the CLI class videoHandlerWrapper.cpp. Am I missing something obvious?
videoHandlerWrapper.cpp
array<imgFrameWrapper^>^ videoHandlerWrapper::decode(array<System::Byte>^ byteArray)
{
array<imgFrameWrapper^>^ returnFrames = gcnew array<imgFrameWrapper^>(30);
{
std::vector<imgFrame> frames(30); //Output from decoding process. imgFrame implements a deconstructor that will rid the data when exiting scope
std::vector<unsigned char> bytes(byteArray->Length); //Input for decoding process
Marshal::Copy(byteArray, 0, IntPtr((unsigned char*)(&((bytes)[0]))), byteArray->Length); //Copy from managed (C#) to unmanaged (C++)
int status = _pVideoHandler->decode(bytes, frames); //Decode
for (size_t i = 0; i < frames.size(); i++)
{
if (frames[i].size > 0)
returnFrames[i] = gcnew imgFrameWrapper(frames[i].size, frames[i].bytes);
}
}
//PrintMemoryUsage();
return returnFrames;
}
videoHandler.cpp
#define BITSTREAM_BUFFER_SIZE 2000000 //TODO Maybe higher or lower bitstream buffer. Thorough testing has been done at 2000000
int videoHandler::decode(std::vector<unsigned char> bytes, std::vector<imgFrame> &frameData)
{
int result = -1;
bool isStillGoing = true;
mfxBitstream bitstream = { 0 };
mfxSession session = NULL;
mfxStatus sts = MFX_ERR_NONE;
mfxSurfaceArray* outSurfaces = nullptr;
mfxU32 framenum = 0;
mfxU32 numVPPCh = 0;
mfxVideoChannelParam* mfxVPPChParams = nullptr;
void* accelHandle = NULL;
mfxVideoParam mfxDecParams = {};
mfxVersion version = { 0, 1 };
//variables used only in 2.x version
mfxConfig cfg = NULL;
mfxLoader loader = NULL;
mfxVariant inCodec = {};
std::vector<mfxU8> input_buffer;
// Initialize VPL session for any implementation of HEVC/H265 decode
loader = MFXLoad();
VERIFY(NULL != loader, "MFXLoad failed -- is implementation in path?");
cfg = MFXCreateConfig(loader);
VERIFY(NULL != cfg, "MFXCreateConfig failed")
inCodec.Type = MFX_VARIANT_TYPE_U32;
inCodec.Data.U32 = MFX_CODEC_AVC;
sts = MFXSetConfigFilterProperty(
cfg,
(mfxU8*)"mfxImplDescription.mfxDecoderDescription.decoder.CodecID",
inCodec);
VERIFY(MFX_ERR_NONE == sts, "MFXSetConfigFilterProperty failed for decoder CodecID");
sts = MFXCreateSession(loader, 0, &session);
VERIFY(MFX_ERR_NONE == sts, "Not able to create VPL session");
// Print info about implementation loaded
version = ShowImplInfo(session);
//VERIFY(version.Major > 1, "Sample requires 2.x API implementation, exiting");
if (version.Major == 1) {
mfxVariant ImplValueSW;
ImplValueSW.Type = MFX_VARIANT_TYPE_U32;
ImplValueSW.Data.U32 = MFX_IMPL_TYPE_SOFTWARE;
MFXSetConfigFilterProperty(cfg, (mfxU8*)"mfxImplDescription.Impl", ImplValueSW);
sts = MFXCreateSession(loader, 0, &session);
VERIFY(MFX_ERR_NONE == sts, "Not able to create VPL session");
}
// Convenience function to initialize available accelerator(s)
accelHandle = InitAcceleratorHandle(session);
bitstream.MaxLength = BITSTREAM_BUFFER_SIZE;
bitstream.Data = (mfxU8*)calloc(bytes.size(), sizeof(mfxU8));
VERIFY(bitstream.Data, "Not able to allocate input buffer");
bitstream.CodecId = MFX_CODEC_AVC;
std::copy(bytes.begin(), bytes.end(), bitstream.Data);
bitstream.DataLength = static_cast<mfxU32>(bytes.size());
memset(&mfxDecParams, 0, sizeof(mfxDecParams));
mfxDecParams.mfx.CodecId = MFX_CODEC_AVC;
mfxDecParams.IOPattern = MFX_IOPATTERN_OUT_SYSTEM_MEMORY;
sts = MFXVideoDECODE_DecodeHeader(session, &bitstream, &mfxDecParams);
VERIFY(MFX_ERR_NONE == sts, "Error decoding header\n");
numVPPCh = 1;
mfxVPPChParams = new mfxVideoChannelParam[numVPPCh];
for (mfxU32 i = 0; i < numVPPCh; i++) {
mfxVPPChParams[i] = {};
}
//mfxVPPChParams[0].VPP.FourCC = mfxDecParams.mfx.FrameInfo.FourCC;
mfxVPPChParams[0].VPP.FourCC = MFX_FOURCC_BGRA;
mfxVPPChParams[0].VPP.ChromaFormat = MFX_CHROMAFORMAT_YUV420;
mfxVPPChParams[0].VPP.PicStruct = MFX_PICSTRUCT_PROGRESSIVE;
mfxVPPChParams[0].VPP.FrameRateExtN = 30;
mfxVPPChParams[0].VPP.FrameRateExtD = 1;
mfxVPPChParams[0].VPP.CropW = 1920;
mfxVPPChParams[0].VPP.CropH = 1080;
//Set value directly if input and output is the same.
mfxVPPChParams[0].VPP.Width = 1920;
mfxVPPChParams[0].VPP.Height = 1080;
//// USED TO RESIZE. IF INPUT IS THE SAME AS OUTPUT THIS WILL MAKE IT SHIFT A BIT. 1920x1080 becomes 1920x1088.
//mfxVPPChParams[0].VPP.Width = ALIGN16(mfxVPPChParams[0].VPP.CropW);
//mfxVPPChParams[0].VPP.Height = ALIGN16(mfxVPPChParams[0].VPP.CropH);
mfxVPPChParams[0].VPP.ChannelId = 1;
mfxVPPChParams[0].Protected = 0;
mfxVPPChParams[0].IOPattern = MFX_IOPATTERN_IN_SYSTEM_MEMORY | MFX_IOPATTERN_OUT_SYSTEM_MEMORY;
mfxVPPChParams[0].ExtParam = NULL;
mfxVPPChParams[0].NumExtParam = 0;
sts = MFXVideoDECODE_VPP_Init(session, &mfxDecParams, &mfxVPPChParams, numVPPCh); //This causes a MINOR memory leak!
outSurfaces = new mfxSurfaceArray;
while (isStillGoing == true) {
sts = MFXVideoDECODE_VPP_DecodeFrameAsync(session,
&bitstream,
NULL,
0,
&outSurfaces); //Big memory leak. 100MB pr run in the while loop.
switch (sts) {
case MFX_ERR_NONE:
// decode output
if (framenum >= 30)
{
isStillGoing = false;
break;
}
sts = WriteRawFrameToByte(outSurfaces->Surfaces[1], &frameData[framenum]);
VERIFY(MFX_ERR_NONE == sts, "Could not write 1st vpp output");
framenum++;
break;
case MFX_ERR_MORE_DATA:
// The function requires more bitstream at input before decoding can proceed
isStillGoing = false;
break;
case MFX_ERR_MORE_SURFACE:
// The function requires more frame surface at output before decoding can proceed.
// This applies to external memory allocations and should not be expected for
// a simple internal allocation case like this
break;
case MFX_ERR_DEVICE_LOST:
// For non-CPU implementations,
// Cleanup if device is lost
break;
case MFX_WRN_DEVICE_BUSY:
// For non-CPU implementations,
// Wait a few milliseconds then try again
break;
case MFX_WRN_VIDEO_PARAM_CHANGED:
// The decoder detected a new sequence header in the bitstream.
// Video parameters may have changed.
// In external memory allocation case, might need to reallocate the output surface
break;
case MFX_ERR_INCOMPATIBLE_VIDEO_PARAM:
// The function detected that video parameters provided by the application
// are incompatible with initialization parameters.
// The application should close the component and then reinitialize it
break;
case MFX_ERR_REALLOC_SURFACE:
// Bigger surface_work required. May be returned only if
// mfxInfoMFX::EnableReallocRequest was set to ON during initialization.
// This applies to external memory allocations and should not be expected for
// a simple internal allocation case like this
break;
default:
printf("unknown status %d\n", sts);
isStillGoing = false;
break;
}
}
sts = MFXVideoDECODE_VPP_Close(session); // Helps massively! Halves the memory leak speed. Closes internal structures and tables.
VERIFY(MFX_ERR_NONE == sts, "Error closing VPP session\n");
result = 0;
end:
printf("Decode and VPP processed %d frames\n", framenum);
// Clean up resources - It is recommended to close components first, before
// releasing allocated surfaces, since some surfaces may still be locked by
// internal resources.
if (mfxVPPChParams)
delete[] mfxVPPChParams;
if (outSurfaces)
delete outSurfaces;
if (bitstream.Data)
free(bitstream.Data);
if (accelHandle)
FreeAcceleratorHandle(accelHandle);
if (loader)
MFXUnload(loader);
return result;
}
imgFrameWrapper.h
public ref class imgFrameWrapper
{
private:
size_t size;
array<System::Byte>^ bytes;
public:
imgFrameWrapper(size_t u_size, unsigned char* u_bytes);
~imgFrameWrapper();
!imgFrameWrapper();
size_t get_size();
array<System::Byte>^ get_bytes();
};
imgFrameWrapper.cpp
imgFrameWrapper::imgFrameWrapper(size_t u_size, unsigned char* u_bytes)
{
size = u_size;
bytes = gcnew array<System::Byte>(size);
Marshal::Copy((IntPtr)u_bytes, bytes, 0, size);
}
imgFrameWrapper::~imgFrameWrapper()
{
}
imgFrameWrapper::!imgFrameWrapper()
{
}
size_t imgFrameWrapper::get_size()
{
return size;
}
array<System::Byte>^ imgFrameWrapper::get_bytes()
{
return bytes;
}
imgFrame.h
struct imgFrame
{
int size;
unsigned char* bytes;
~imgFrame()
{
if (bytes)
delete[] bytes;
}
};
MFXVideoDECODE_VPP_DecodeFrameAsync() function creates internal memory surfaces for the processing.
You should release surfaces.
Please check this link it's mentioning about it.
https://spec.oneapi.com/onevpl/latest/API_ref/VPL_structs_decode_vpp.html#_CPPv415mfxSurfaceArray
mfxStatus (*Release)(struct mfxSurfaceArray *surface_array)¶
Decrements the internal reference counter of the surface. (*Release) should be
called after using the (*AddRef) function to add a surface or when allocation
logic requires it.
And please check this sample.
https://github.com/oneapi-src/oneVPL/blob/master/examples/hello-decvpp/src/hello-decvpp.cpp
Especially, WriteRawFrame_InternalMem() function in https://github.com/oneapi-src/oneVPL/blob/17968d8d2299352f5a9e09388d24e81064c81c87/examples/util/util/util.h
It shows how to release surfaces.

vkCreateSwapchainKHR: internal drawable creation failed

I had my Vulkan application working but for some reason it stopped working(I don't believe I touched anything that could have broken it, besides making my engine project a .lib instead of a .dll) and started giving the "vkCreateSwapchainKHR: internal drawable creation failed" error in the validation layers. vkCreateSwapchainKHR returns VK_ERROR_VALIDATION_FAILED_EXT.
I already checked this answer:
Wat does the "vkCreateSwapchainKHR:internal drawable creation failed." means, but it was not my problem, (as I said, it was working until it wasn't). Here's all the code I believe is relevant, if you need something else just comment:
Window Creation:
/* Initialize the library */
if (!glfwInit())
/* Create a windowed mode window and its OpenGL context */
glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);
glfwWindowHint(GLFW_DECORATED, _WCI.IsDecorated);
GLFWWindow = glfwCreateWindow(Extent.Width, Extent.Height, _WCI.Name.c_str(), nullptr, nullptr);
if (!GLFWWindow) glfwTerminate();
//glfwMakeContextCurrent(GLFWWindow);
uint32 Count = 0;
auto ff = glfwGetRequiredInstanceExtensions(&Count);
GS_BASIC_LOG_MESSAGE("GLFW required extensions:")
for (uint8 i = 0; i < Count; ++i)
{
GS_BASIC_LOG_MESSAGE("%d: %s", i, ff[i]);
}
WindowObject = glfwGetWin32Window(GLFWWindow);
WindowInstance = GetModuleHandle(nullptr);
I'm using the correct instance extensions:
const char* Extensions[] = { VK_KHR_SURFACE_EXTENSION_NAME, VK_KHR_WIN32_SURFACE_EXTENSION_NAME, VK_EXT_DEBUG_UTILS_EXTENSION_NAME };
VKSwapchainCreator VulkanRenderContext::CreateSwapchain(VKDevice* _Device, VkSwapchainKHR _OldSwapchain) const
{
VkSwapchainCreateInfoKHR SwapchainCreateInfo = { VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR };
SwapchainCreateInfo.surface = Surface.GetHandle();
SwapchainCreateInfo.minImageCount = 3;
SwapchainCreateInfo.imageFormat = Format.format;
SwapchainCreateInfo.imageColorSpace = Format.colorSpace;
SwapchainCreateInfo.imageExtent = Extent2DToVkExtent2D(RenderExtent);
//The imageArrayLayers specifies the amount of layers each image consists of. This is always 1 unless you are developing a stereoscopic 3D application.
SwapchainCreateInfo.imageArrayLayers = 1;
SwapchainCreateInfo.imageUsage = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT;
SwapchainCreateInfo.imageSharingMode = VK_SHARING_MODE_EXCLUSIVE;
SwapchainCreateInfo.queueFamilyIndexCount = 0;
SwapchainCreateInfo.pQueueFamilyIndices = nullptr;
SwapchainCreateInfo.preTransform = VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR;
SwapchainCreateInfo.compositeAlpha = VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR;
SwapchainCreateInfo.presentMode = PresentMode;
SwapchainCreateInfo.clipped = VK_TRUE;
SwapchainCreateInfo.oldSwapchain = _OldSwapchain;
return VKSwapchainCreator(_Device, &SwapchainCreateInfo);
}
Both VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR and VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR are supported by my GPU.
VkBool32 Supports = 0;
vkGetPhysicalDeviceSurfaceSupportKHR(_PD, PresentationQueue.GetQueueIndex(), _Surface, &Supports);
VkSurfaceCapabilitiesKHR SurfaceCapabilities = {};
vkGetPhysicalDeviceSurfaceCapabilitiesKHR(_PD, _Surface, &SurfaceCapabilities);
VkBool32 Supported = 0;
vkGetPhysicalDeviceSurfaceSupportKHR(_PD, PresentationQueue.GetQueueIndex(), _Surface, &Supported);
auto bb = vkGetPhysicalDeviceWin32PresentationSupportKHR(_PD, PresentationQueue.GetQueueIndex());
Everything here returns true, although it seemed suspicious to me that VkSurfaceCapabilitiesKHR returned the same extent for currentExtent, minImageExtent and maxImageExtent.
/* Initialize the library */
if (!glfwInit())
/* Create a windowed mode window and its OpenGL context */
glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);
glfwWindowHint(GLFW_DECORATED, _WCI.IsDecorated);
If this is your actual code, it means "if glfwInit() succeeds skip glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);". And as we know not calling the hint causes the error.
The if should perhaps not have negation inside. And (this not being Python) it should have scope {} to cover all the init code.
The solution was to remove the glfwInit call from inside the if and place it outside.
if (!glfwInit()) -> glfwInit()
No idea why, if someone knows why this could've have caused the problem I would love to know.

Audio distorted with VST plugin

I had to plug into a pre-existing software, managing ASIO audio streams, a simple VST host. Despite of lack of some documentation, I managed to do so however once I load the plugin I get a badly distorted audio signal back.
The VST I'm using works properly (with other VST Hosts) so it's probably some kind of bug in the code I made, however when I disable the "PROCESS" from the plugin (my stream goes through the plugin, it simply does not get processed) it gets back as I sent without any noise or distortion on it.
One thing I'm slightly concerned about is the type of the data used as the ASIO driver fills an __int32 buffer while the plugins wants some float buffer.
That's really depressing as I reviewed zillions of times my code and it seems to be fine.
Here is the code of the class I'm using; please note that some numbers are temporarily hard-coded to help debugging.
VSTPlugIn::VSTPlugIn(const char* fullDirectoryName, const char* ID)
: plugin(NULL)
, blocksize(128) // TODO
, sampleRate(44100.0F) // TODO
, hostID(ID)
{
this->LoadPlugin(fullDirectoryName);
this->ConfigurePluginCallbacks();
this->StartPlugin();
out = new float*[2];
for (int i = 0; i < 2; ++i)
{
out[i] = new float[128];
memset(out[i], 0, 128);
}
}
void VSTPlugIn::LoadPlugin(const char* path)
{
HMODULE modulePtr = LoadLibrary(path);
if(modulePtr == NULL)
{
printf("Failed trying to load VST from '%s', error %d\n", path, GetLastError());
plugin = NULL;
}
// vst 2.4 export name
vstPluginFuncPtr mainEntryPoint = (vstPluginFuncPtr)GetProcAddress(modulePtr, "VSTPluginMain");
// if "VSTPluginMain" was not found, search for "main" (backwards compatibility mode)
if(!mainEntryPoint)
{
mainEntryPoint = (vstPluginFuncPtr)GetProcAddress(modulePtr, "main");
}
// Instantiate the plugin
plugin = mainEntryPoint(hostCallback);
}
void VSTPlugIn::ConfigurePluginCallbacks()
{
// Check plugin's magic number
// If incorrect, then the file either was not loaded properly, is not a
// real VST plugin, or is otherwise corrupt.
if(plugin->magic != kEffectMagic)
{
printf("Plugin's magic number is bad. Plugin will be discarded\n");
plugin = NULL;
}
// Create dispatcher handle
this->dispatcher = (dispatcherFuncPtr)(plugin->dispatcher);
// Set up plugin callback functions
plugin->getParameter = (getParameterFuncPtr)plugin->getParameter;
plugin->processReplacing = (processFuncPtr)plugin->processReplacing;
plugin->setParameter = (setParameterFuncPtr)plugin->setParameter;
}
void VSTPlugIn::StartPlugin()
{
// Set some default properties
dispatcher(plugin, effOpen, 0, 0, NULL, 0);
dispatcher(plugin, effSetSampleRate, 0, 0, NULL, sampleRate);
dispatcher(plugin, effSetBlockSize, 0, blocksize, NULL, 0.0f);
this->ResumePlugin();
}
void VSTPlugIn::ResumePlugin()
{
dispatcher(plugin, effMainsChanged, 0, 1, NULL, 0.0f);
}
void VSTPlugIn::SuspendPlugin()
{
dispatcher(plugin, effMainsChanged, 0, 0, NULL, 0.0f);
}
void VSTPlugIn::ProcessAudio(float** inputs, float** outputs, long numFrames)
{
plugin->processReplacing(plugin, inputs, out, 128);
memcpy(outputs, out, sizeof(float) * 128);
}
EDIT: Here's the code I use to interface my sw with the VST Host
// Copying the outer buffer in the inner container
for(unsigned i = 0; i < bufferLenght; i++)
{
float f;
f = ((float) buff[i]) / (float) std::numeric_limits<int>::max()
if( f > 1 ) f = 1;
if( f < -1 ) f = -1;
samples[0][i] = f;
}
// DO JOB
for(auto it = inserts.begin(); it != inserts.end(); ++it)
{
(*it)->ProcessAudio(samples, samples, bufferLenght);
}
// Copying the result back into the buffer
for(unsigned i = 0; i < bufferLenght; i++)
{
float f = samples[0][i];
int intval;
f = f * std::numeric_limits<int>::max();
if( f > std::numeric_limits<int>::max() ) f = std::numeric_limits<int>::max();
if( f < std::numeric_limits<int>::min() ) f = std::numeric_limits<int>::min();
intval = (int) f;
buff[i] = intval;
}
where "buff" is defined as "__int32* buff"
I'm guessing that when you call f = std::numeric_limits<int>::max() (and the related min() case on the line below), this might cause overflow. Have you tried f = std::numeric_limits<int>::max() - 1?
Same goes for the code snippit above with f = ((float) buff[i]) / (float) std::numeric_limits<int>::max()... I'd also subtract one there to avoid a potential overflow later on.

WinRT C++ (Win10) Accessing bytes from SoftwareBitmap / BitmapBuffer

To process my previewFrames of my camera in OpenCV, I need access to the raw Pixel data / bytes. So, there is the new SoftwareBitmap, which should exactly provide this.
There is an example for c#, but in visual c++ I can't get the IMemoryBufferByteAccess (see remarks) Interface working.
Code with Exceptions:
// Capture the preview frame
return create_task(_mediaCapture->GetPreviewFrameAsync(videoFrame))
.then([this](VideoFrame^ currentFrame)
{
// Collect the resulting frame
auto previewFrame = currentFrame->SoftwareBitmap;
auto buffer = previewFrame->LockBuffer(Windows::Graphics::Imaging::BitmapBufferAccessMode::ReadWrite);
auto reference = buffer->CreateReference();
// Get a pointer to the pixel buffer
byte* pData = nullptr;
UINT capacity = 0;
// Obtain ByteAccess
ComPtr<IUnknown> inspectable = reinterpret_cast<IUnknown*>(buffer);
// Query the IBufferByteAccess interface.
Microsoft::WRL::ComPtr<IMemoryBufferByteAccess> bufferByteAccess;
ThrowIfFailed(inspectable.As(&bufferByteAccess)); // ERROR ---> Throws HRESULT = E_NOINTERFACE
// Retrieve the buffer data.
ThrowIfFailed(bufferByteAccess->GetBuffer(_Out_ &pData, _Out_ &capacity)); // ERROR ---> Throws HRESULT = E_NOINTERFACE, because bufferByteAccess is null
I tried this too:
HRESULT hr = ((IMemoryBufferByteAccess*)reference)->GetBuffer(&pData, &capacity);
HRESULT is ok, but I can't access pData -> Access Violation reading Memory.
Thanks for your help.
You should use reference instead of buffer in reinterpret_cast.
#include "pch.h"
#include <wrl\wrappers\corewrappers.h>
#include <wrl\client.h>
MIDL_INTERFACE("5b0d3235-4dba-4d44-865e-8f1d0e4fd04d")
IMemoryBufferByteAccess : IUnknown
{
virtual HRESULT STDMETHODCALLTYPE GetBuffer(
BYTE **value,
UINT32 *capacity
);
};
auto previewFrame = currentFrame->SoftwareBitmap;
auto buffer = previewFrame->LockBuffer(BitmapBufferAccessMode::ReadWrite);
auto reference = buffer->CreateReference();
ComPtr<IMemoryBufferByteAccess> bufferByteAccess;
HRESULT result = reinterpret_cast<IInspectable*>(reference)->QueryInterface(IID_PPV_ARGS(&bufferByteAccess));
if (result == S_OK)
{
WriteLine("Get interface successfully");
BYTE* data = nullptr;
UINT32 capacity = 0;
result = bufferByteAccess->GetBuffer(&data, &capacity);
if (result == S_OK)
{
WriteLine("get data access successfully, capacity: " + capacity);
}
}
Based on answer from #jeffrey-chen and example from #kennykerr, I've assembled a tiny bit cleaner solution:
#include <wrl/client.h>
// other includes, as required by your project
MIDL_INTERFACE("5b0d3235-4dba-4d44-865e-8f1d0e4fd04d")
IMemoryBufferByteAccess : ::IUnknown
{
virtual HRESULT __stdcall GetBuffer(BYTE **value, UINT32 *capacity) = 0;
};
// your code:
auto previewFrame = currentFrame->SoftwareBitmap;
auto buffer = previewFrame->LockBuffer(BitmapBufferAccessMode::ReadWrite);
auto bufferByteAccess= buffer->CreateReference().as<IMemoryBufferByteAccess>();
WriteLine("Get interface successfully"); // otherwise - exception is thrown
BYTE* data = nullptr;
UINT32 capacity = 0;
winrt::check_hresult(bufferByteAccess->GetBuffer(&data, &capacity));
WriteLine("get data access successfully, capacity: " + capacity);
I'm currently accessing the raw unsigned char* data from each frame I obtain on a MediaFrameReader::FrameArrived event without using WRL and COM...
Here it is how:
void MainPage::OnFrameArrived(MediaFrameReader ^reader, MediaFrameArrivedEventArgs ^args)
{
MediaFrameReference ^mfr = reader->TryAcquireLatestFrame();
VideoMediaFrame ^vmf = mfr->VideoMediaFrame;
VideoFrame ^vf = vmf->GetVideoFrame();
SoftwareBitmap ^sb = vf->SoftwareBitmap;
Buffer ^buff = ref new Buffer(sb->PixelHeight * sb->PixelWidth * 2);
sb->CopyToBuffer(buff);
DataReader ^dataReader = DataReader::FromBuffer(buffer);
Platform::Array<unsigned char, 1> ^arr = ref new Platform::Array<unsigned char, 1>(buffer->Length);
dataReader->ReadBytes(arr);
// here arr->Data is a pointer to the raw pixel data
}
NOTE: The MediaCapture object needs to be configured with MediaCaptureMemoryPreference::Cpu in order to have a valid SoftwareBitmap
Hope the above helps someone