vkCreateSwapchainKHR: internal drawable creation failed - c++

I had my Vulkan application working, but at some point it stopped (I don't believe I touched anything that could have broken it, apart from turning my engine project into a .lib instead of a .dll) and it started giving the "vkCreateSwapchainKHR: internal drawable creation failed" error in the validation layers. vkCreateSwapchainKHR returns VK_ERROR_VALIDATION_FAILED_EXT.
I already checked this answer:
What does the "vkCreateSwapchainKHR: internal drawable creation failed." error mean?, but it was not my problem (as I said, it was working until it wasn't). Here's all the code I believe is relevant; if you need something else, just comment:
Window Creation:
/* Initialize the library */
if (!glfwInit())
/* Create a windowed mode window and its OpenGL context */
glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);
glfwWindowHint(GLFW_DECORATED, _WCI.IsDecorated);
GLFWWindow = glfwCreateWindow(Extent.Width, Extent.Height, _WCI.Name.c_str(), nullptr, nullptr);
if (!GLFWWindow) glfwTerminate();
//glfwMakeContextCurrent(GLFWWindow);
uint32 Count = 0;
auto ff = glfwGetRequiredInstanceExtensions(&Count);
GS_BASIC_LOG_MESSAGE("GLFW required extensions:")
for (uint8 i = 0; i < Count; ++i)
{
GS_BASIC_LOG_MESSAGE("%d: %s", i, ff[i]);
}
WindowObject = glfwGetWin32Window(GLFWWindow);
WindowInstance = GetModuleHandle(nullptr);
I'm using the correct instance extensions:
const char* Extensions[] = { VK_KHR_SURFACE_EXTENSION_NAME, VK_KHR_WIN32_SURFACE_EXTENSION_NAME, VK_EXT_DEBUG_UTILS_EXTENSION_NAME };
VKSwapchainCreator VulkanRenderContext::CreateSwapchain(VKDevice* _Device, VkSwapchainKHR _OldSwapchain) const
{
VkSwapchainCreateInfoKHR SwapchainCreateInfo = { VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR };
SwapchainCreateInfo.surface = Surface.GetHandle();
SwapchainCreateInfo.minImageCount = 3;
SwapchainCreateInfo.imageFormat = Format.format;
SwapchainCreateInfo.imageColorSpace = Format.colorSpace;
SwapchainCreateInfo.imageExtent = Extent2DToVkExtent2D(RenderExtent);
//The imageArrayLayers specifies the amount of layers each image consists of. This is always 1 unless you are developing a stereoscopic 3D application.
SwapchainCreateInfo.imageArrayLayers = 1;
SwapchainCreateInfo.imageUsage = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT;
SwapchainCreateInfo.imageSharingMode = VK_SHARING_MODE_EXCLUSIVE;
SwapchainCreateInfo.queueFamilyIndexCount = 0;
SwapchainCreateInfo.pQueueFamilyIndices = nullptr;
SwapchainCreateInfo.preTransform = VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR;
SwapchainCreateInfo.compositeAlpha = VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR;
SwapchainCreateInfo.presentMode = PresentMode;
SwapchainCreateInfo.clipped = VK_TRUE;
SwapchainCreateInfo.oldSwapchain = _OldSwapchain;
return VKSwapchainCreator(_Device, &SwapchainCreateInfo);
}
Both VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR and VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR are supported by my GPU.
VkBool32 Supports = 0;
vkGetPhysicalDeviceSurfaceSupportKHR(_PD, PresentationQueue.GetQueueIndex(), _Surface, &Supports);
VkSurfaceCapabilitiesKHR SurfaceCapabilities = {};
vkGetPhysicalDeviceSurfaceCapabilitiesKHR(_PD, _Surface, &SurfaceCapabilities);
VkBool32 Supported = 0;
vkGetPhysicalDeviceSurfaceSupportKHR(_PD, PresentationQueue.GetQueueIndex(), _Surface, &Supported);
auto bb = vkGetPhysicalDeviceWin32PresentationSupportKHR(_PD, PresentationQueue.GetQueueIndex());
Everything here returns true, although it seemed suspicious to me that vkGetPhysicalDeviceSurfaceCapabilitiesKHR reported the same extent for currentExtent, minImageExtent and maxImageExtent.

/* Initialize the library */
if (!glfwInit())
/* Create a windowed mode window and its OpenGL context */
glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);
glfwWindowHint(GLFW_DECORATED, _WCI.IsDecorated);
If this is your actual code, then it means "if glfwInit() succeeds, skip glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);", because without braces only the single next statement belongs to the if. And as we know, not setting that hint causes the error.
The if should probably not have the negation inside, and (this not being Python) it should have a {} scope covering all the init code.

The solution was to remove the glfwInit call from inside the if and place it outside.
if (!glfwInit()) -> glfwInit()
No idea why; if someone knows why this could have caused the problem, I would love to know.
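For reference, a minimal sketch of an initialization that both checks the result of glfwInit() and keeps the hint calls on the success path (reusing the names from the question; the early returns assume the surrounding function can simply bail out):
if (!glfwInit())
{
    GS_BASIC_LOG_MESSAGE("Failed to initialize GLFW!");
    return;
}
// The hints must run when initialization succeeded, before glfwCreateWindow
glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);
glfwWindowHint(GLFW_DECORATED, _WCI.IsDecorated);
GLFWWindow = glfwCreateWindow(Extent.Width, Extent.Height, _WCI.Name.c_str(), nullptr, nullptr);
if (!GLFWWindow)
{
    glfwTerminate();
    return;
}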

Related

Dear IMGUI and DirectX 12 Overlay (DXGI_ERROR_INVALID_CALL)

I'm trying to make a simple frame counter for DirectX 12 games using Dear ImGui. I simply want to overlay a small transparent window that displays the sequential frame number during gameplay. To do so, I hook Present() so I can get the swap chain and count the number of times the method is called (frame counting). THIS IS NOT FOR A CHEAT. I am not writing cheats for games; I simply want to record frame numbers for analytical purposes.
I have successfully done this for DirectX 11 using the ShowExampleAppSimpleOverlay() example provided in here: https://github.com/ocornut/imgui/blob/master/imgui_demo.cpp
Here is an image sample showing the frame counter in a DX 11 game.
I'm now trying to do the same with DirectX 12. Hooking the Present() is not an issue.
Using example code provided here: https://github.com/ocornut/imgui/blob/master/examples/example_win32_directx12/main.cpp
I attempt to use the ShowExampleAppSimpleOverlay() method again; however, in my code the call to d3d12CommandQueue->ExecuteCommandLists(1, (ID3D12CommandList* const*)&d3d12CommandList); (which renders the overlay) fails with the error 0x887A0001 (DXGI_ERROR_INVALID_CALL). This is the last line of the code sample provided below:
I'm not sure how to proceed. Any thoughts?
Edit: I forgot to mention that I'm also hooking and acquiring the game's command queue, so d3d12CommandQueue comes directly from the game. It isn't NULL, so I'm assuming it is the correct object. I could be wrong though...
For each call to Present() do the following:
//iterate frame
Frame_Number = Frame_Number + 1;
//Get Device, using IDXGISwapChain3
ID3D12Device* device;
HRESULT gd = pSwapChain->GetDevice(__uuidof(ID3D12Device), (void**)&device);
assert(gd == S_OK);
//Get window handle from swapchain for IMGUI
DXGI_SWAP_CHAIN_DESC sd;
pSwapChain->GetDesc(&sd);
window = sd.OutputWindow;
//Get backbuffers
buffersCounts = sd.BufferCount;
frameContext = new FrameContext[buffersCounts];
D3D12_DESCRIPTOR_HEAP_DESC descriptorImGuiRender = {};
descriptorImGuiRender.Type = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
descriptorImGuiRender.NumDescriptors = buffersCounts;
descriptorImGuiRender.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;
// Create Descriptor Heap IMGUI render
if (device->CreateDescriptorHeap(&descriptorImGuiRender, IID_PPV_ARGS(&d3d12DescriptorHeapImGuiRender)) != S_OK)
return false;
//Create Command Allocator
ID3D12CommandAllocator* allocator;
if (device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT, IID_PPV_ARGS(&allocator)) != S_OK)
return false;
for (size_t i = 0; i < buffersCounts; i++) {
frameContext[i].commandAllocator = allocator;
}
if (device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT, allocator, NULL, IID_PPV_ARGS(&d3d12CommandList)) != S_OK ||
d3d12CommandList->Close() != S_OK)
return false;
//create descriptor heap, describe and create a render target view (RTV) descriptor heap.
D3D12_DESCRIPTOR_HEAP_DESC descriptorBackBuffers;
descriptorBackBuffers.Type = D3D12_DESCRIPTOR_HEAP_TYPE_RTV;
descriptorBackBuffers.NumDescriptors = buffersCounts;
descriptorBackBuffers.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_NONE;
descriptorBackBuffers.NodeMask = 1;
if (device->CreateDescriptorHeap(&descriptorBackBuffers, IID_PPV_ARGS(&d3d12DescriptorHeapBackBuffers)) != S_OK)
return false;
const auto rtvDescriptorSize = device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_RTV);
// Create frame resources.
D3D12_CPU_DESCRIPTOR_HANDLE rtvHandle = d3d12DescriptorHeapBackBuffers->GetCPUDescriptorHandleForHeapStart();
// Create a RTV for each frame.
for (size_t i = 0; i < buffersCounts; i++) {
ID3D12Resource* pBackBuffer = nullptr;
frameContext[i].main_render_target_descriptor = rtvHandle;
pSwapChain->GetBuffer(i, IID_PPV_ARGS(&pBackBuffer));
device->CreateRenderTargetView(pBackBuffer, nullptr, rtvHandle);
frameContext[i].main_render_target_resource = pBackBuffer;
rtvHandle.ptr += rtvDescriptorSize;
}
// Setup Platform/Renderer bindings for ImGui
ImGui_ImplWin32_Init(window);
ImGui_ImplDX12_Init(device, buffersCounts,
DXGI_FORMAT_R8G8B8A8_UNORM, d3d12DescriptorHeapImGuiRender,
d3d12DescriptorHeapImGuiRender->GetCPUDescriptorHandleForHeapStart(),
d3d12DescriptorHeapImGuiRender->GetGPUDescriptorHandleForHeapStart());
ImGui::GetIO().ImeWindowHandle = window;
// Start the Dear ImGui frame
ImGui_ImplDX12_NewFrame();
ImGui_ImplWin32_NewFrame();
ImGui::NewFrame();
// Call ImGui menus here
bool bShow = true;
ShowExampleAppSimpleOverlay(&bShow);
// Rendering (imgui)
FrameContext& currentFrameContext = frameContext[pSwapChain->GetCurrentBackBufferIndex()];
currentFrameContext.commandAllocator->Reset();
D3D12_RESOURCE_BARRIER barrier;
barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
barrier.Flags = D3D12_RESOURCE_BARRIER_FLAG_NONE;
barrier.Transition.pResource = currentFrameContext.main_render_target_resource;
barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_PRESENT;
barrier.Transition.StateAfter = D3D12_RESOURCE_STATE_RENDER_TARGET;
d3d12CommandList->Reset(currentFrameContext.commandAllocator, nullptr);
d3d12CommandList->ResourceBarrier(1, &barrier);
d3d12CommandList->OMSetRenderTargets(1, &currentFrameContext.main_render_target_descriptor, FALSE, nullptr);
d3d12CommandList->SetDescriptorHeaps(1, &d3d12DescriptorHeapImGuiRender);
ImGui::Render();
ImGui_ImplDX12_RenderDrawData(ImGui::GetDrawData(), d3d12CommandList);
barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
barrier.Transition.StateAfter = D3D12_RESOURCE_STATE_PRESENT;
d3d12CommandList->ResourceBarrier(1, &barrier);
d3d12CommandList->Close();
d3d12CommandQueue->ExecuteCommandLists(1, (ID3D12CommandList* const*)&d3d12CommandList);
DXGI_ERROR_INVALID_CALL tells you that one command in the list is invalid, but not which one.
You need to use the D3D12 debug layer to get runtime checks at command list creation.
The debug layer will also tell you the reason why the call is invalid.
See MSDN for more info.
You can activate it with the following code, but it needs to be called before device creation:
ID3D12Debug* debugInterface;
if (SUCCEEDED(D3D12GetDebugInterface(IID_PPV_ARGS(&debugInterface)))) {
debugInterface->EnableDebugLayer();
}
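Once the debug layer is active, the device exposes ID3D12InfoQueue (declared in d3d12sdklayers.h), which can be told to break into the debugger exactly at the offending call. A sketch, assuming you have a device pointer with the debug layer enabled underneath it (which a hook may not control, since the game creates the device):
ID3D12InfoQueue* infoQueue = nullptr;
if (SUCCEEDED(device->QueryInterface(IID_PPV_ARGS(&infoQueue)))) {
    // Break on corruption and error messages so the debugger stops at the invalid call
    infoQueue->SetBreakOnSeverity(D3D12_MESSAGE_SEVERITY_CORRUPTION, TRUE);
    infoQueue->SetBreakOnSeverity(D3D12_MESSAGE_SEVERITY_ERROR, TRUE);
    infoQueue->Release();
}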

C++/libscreen fails to update visibility

I'm writing a GUI in C++ (qcc) for Neutrino 6.5.0 with QNX CAR 2.0 using the screen windowing system, following the documentation tutorial. No visible output is generated, and /dev/screen/{pid}/win-{n}/win-{n} has status = WIN_STATUS_INVISIBLE.
(Using {n} and {pid} to indicate substitutions of runtime/dynamic variables' values)
I've tried following the tutorial to the letter, but had the same results. In particular, the value of window_group_name (mainwindowgroup or veryuniquewgn) makes no difference; the former being the value suggested by the tutorial, and the latter my own.
I've constructed an MCVE:
#include <cstdlib> // for EXIT_ constants
#include <stdio.h> // for printf(), getchar()
#include <process.h> // for getpid()
#include <screen.h>
#define screen_err(desc) \
if(screen_res != 0) { \
printf("ERR screen: failed to %s\n", desc); \
return EXIT_FAILURE; \
}
int main_withwindow(screen_context_t scr_ctx, screen_window_t scr_win) {
static const char *window_group_name = "veryuniquewgn";
// Dummy to pass as ptr to enum values
int prop;
int screen_res;
screen_res = screen_create_window_group(scr_win, window_group_name);
screen_err("create window group");
prop = SCREEN_FORMAT_RGBA8888;
screen_res = screen_set_window_property_iv(scr_win, SCREEN_PROPERTY_FORMAT,
&prop);
screen_err("set window property: format -> RGBA");
prop = SCREEN_USAGE_NATIVE;
screen_res = screen_set_window_property_iv(scr_win, SCREEN_PROPERTY_USAGE,
&prop);
screen_err("set window property: usage -> native");
screen_res = screen_create_window_buffers(scr_win, 1);
screen_err("create window buffers");
int win_buf_rect[4] = { 0, 0 };
screen_res = screen_get_window_property_iv(scr_win,
SCREEN_PROPERTY_BUFFER_SIZE, win_buf_rect + 2);
screen_err("get window property: buffer_size");
// Array type to easily support multi-buffering in future
screen_buffer_t scr_buf[1];
screen_res = screen_get_window_property_pv(scr_win,
SCREEN_PROPERTY_RENDER_BUFFERS, (void **)scr_buf);
screen_err("get window property: render_buffers");
int bg[] = { SCREEN_BLIT_COLOR, 0xffffff00, SCREEN_BLIT_END };
screen_res = screen_fill(scr_ctx, scr_buf[0], bg);
screen_err("fill buffer with yellow");
screen_res = screen_post_window(scr_win, scr_buf[0], 1, win_buf_rect, 0);
screen_err("post window");
prop = 255;
screen_res = screen_set_window_property_iv(scr_win,
SCREEN_PROPERTY_ZORDER, &prop);
screen_err("set window property: zorder -> 255");
prop = 1;
screen_res = screen_set_window_property_iv(scr_win,
SCREEN_PROPERTY_VISIBLE, &prop);
screen_err("set window property: visible -> true");
screen_res = screen_flush_context(scr_ctx, SCREEN_WAIT_IDLE);
screen_err("flush context to idle");
getchar();
return EXIT_SUCCESS;
}
int main_withcontext(screen_context_t scr_ctx) {
screen_window_t scr_win;
int ret;
int screen_res;
screen_res = screen_create_window(&scr_win, scr_ctx);
screen_err("create window");
ret = main_withwindow(scr_ctx, scr_win);
screen_res = screen_destroy_window(scr_win);
screen_err("destroy window");
return ret;
}
int main(int argc, char *argv[]) {
printf("%d\n", getpid());
screen_context_t scr_ctx;
int ret;
int screen_res;
screen_res = screen_create_context(&scr_ctx, SCREEN_APPLICATION_CONTEXT);
screen_err("create context");
ret = main_withcontext(scr_ctx);
screen_res = screen_destroy_context(scr_ctx);
screen_err("destroy context");
return ret;
}
Given the screen_err() macro, no output (except the first printf for the pid) indicates no errors from the screen_*() calls. That's exactly what I see when I run this.
Looking in /dev/screen/{pid}, I see a ctx-{n}/ctx-{n} and a win-{n}/win-{n} file, as expected. The former is not human-readable, but the latter, and its differences from its counterpart in a working nto 6.5.0 + CAR 2.0 + libscreen application, yield some insight.
The working application is an HMI for which I do not have the source, and is launched from /etc/ro as root, and mine is launched from ksh, logged in as root.
They're both 98-line files, so I've produced a diff - the win-{n} from the working application on the left, and mine on the right - excluding metrics:
autonomous = 0                                        | autonomous = 1
status = WIN_STATUS_FULLSCREEN                        | status = WIN_STATUS_INVISIBLE
id string = DPY_HMI                                   | id string =
insert id = 1                                         | insert id = 3
reclip = 0                                            | reclip = 1
flags = WIN_FLAG_VISIBLE WIN_FLAG_FLOATING            | flags = WIN_FLAG_FLOATING
usage = SCREEN_USAGE_OPENGL_ES2                       | usage = SCREEN_USAGE_NATIVE
order = 240                                           | order = 0
regions = (0,0;800,480)                               | regions = (none)
clipped source viewport = (0,0;800,480 800x480)       | clipped source viewport = (0,0;0,0 0x0)
clipped destination rectangle = (0,0;800,480 800x480) | clipped destination rectangle = (0,0;0,0 0x0)
transform = [[1 0 0],[0 1 0],[0 0 1]]                 | transform = [[0 0 0],[0 0 0],[0 0 0]]
Of these differences, usage is the only one with the value I expect, considering that it reflects the parameter of the relevant call to screen_set_window_property_iv(...). For all the rest, especially regions and flags, I do not understand why their values differ from those of the working application.
The target's display's native resolution is 800x480.
As it turns out, the code was completely valid and correct. The reason for the failure was a window manager daemon I wasn't aware of, which was suppressing the window. Turning it off solved the problem.

vkCreateSwapchainKHR Error

Even though validation layers and debug callback extensions are enabled and working (they respond to wrong structs etc.), I'm still getting a "VK_ERROR_VALIDATION_FAILED_EXT" result from vkCreateSwapchainKHR(), and there's no validation layer error to pinpoint the mistake...
Swap chain creation (using a GTX 970):
VkBool32 isSupported = false;
vkGetPhysicalDeviceSurfaceSupportKHR( physicalDevices[0], 0, surface, &isSupported);
if (!isSupported) {
std::cout << "*ERROR* This device doesn't support surfaces" << std::endl;
}
VkSurfaceCapabilitiesKHR surfCaps;
vkGetPhysicalDeviceSurfaceCapabilitiesKHR(physicalDevices[0], surface, &surfCaps);
std::vector<VkSurfaceFormatKHR> deviceFormats;
uint32_t formatCount;
vkGetPhysicalDeviceSurfaceFormatsKHR(physicalDevices[0], surface, &formatCount, nullptr);
deviceFormats.resize(formatCount);
vkGetPhysicalDeviceSurfaceFormatsKHR(physicalDevices[0], surface, &formatCount, deviceFormats.data());
swapChainInfo.sType = VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR;
swapChainInfo.pNext = nullptr;
swapChainInfo.flags = 0;
swapChainInfo.surface = surface;
swapChainInfo.minImageCount = surfCaps.minImageCount;
swapChainInfo.imageFormat = VK_FORMAT_B8G8R8A8_UNORM;
swapChainInfo.imageColorSpace = VK_COLOR_SPACE_SRGB_NONLINEAR_KHR;
swapChainInfo.imageExtent = surfCaps.currentExtent;
swapChainInfo.imageArrayLayers = 1;
swapChainInfo.imageUsage = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT;
swapChainInfo.imageSharingMode = VK_SHARING_MODE_EXCLUSIVE;
swapChainInfo.queueFamilyIndexCount = 0;
swapChainInfo.pQueueFamilyIndices = VK_NULL_HANDLE;
swapChainInfo.preTransform = VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR;
swapChainInfo.compositeAlpha = VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR;
swapChainInfo.presentMode = VK_PRESENT_MODE_FIFO_KHR;
swapChainInfo.clipped = VK_TRUE; // TODO : TEST clipping against another window
swapChainInfo.oldSwapchain = VK_NULL_HANDLE;
result = vkCreateSwapchainKHR( device, &swapChainInfo, nullptr, &swapChain );
if (result) {
std::cout << "*ERROR* Swapchain Creation Failed :" << result << std::endl;
}
Surface creation using GLFW (which doesn't return any error):
if (result = glfwCreateWindowSurface(instance, window, nullptr, &surface))
{
std::cout << "*ERROR* Surface Creation Failed : " << result << std::endl;
}
There are a huge number of reasons validation of vkCreateSwapchainKHR will fail (check out PreCallValidateCreateSwapchainKHR in core_validation.cpp).
There may not be enough code here to tell why it is failing; for example, the failure might be caused by an invalid surface. But to pinpoint the problem, the validation layers should give a failure message in the debug log, which will tell you exactly why. You can enable that by calling vkCreateDebugReportCallbackEXT before trying to create the swapchain. This will also require you to enable the VK_EXT_debug_report extension. See here for details.
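A minimal sketch of setting up such a callback (reusing the instance from the question; the create function has to be fetched through vkGetInstanceProcAddr because it is an extension entry point):
VKAPI_ATTR VkBool32 VKAPI_CALL DebugReportCallback(
    VkDebugReportFlagsEXT flags, VkDebugReportObjectTypeEXT objectType,
    uint64_t object, size_t location, int32_t messageCode,
    const char* pLayerPrefix, const char* pMessage, void* pUserData)
{
    std::cout << "[" << pLayerPrefix << "] " << pMessage << std::endl;
    return VK_FALSE; // don't abort the call that triggered the message
}
// ...
auto pfnCreateDebugReportCallback = (PFN_vkCreateDebugReportCallbackEXT)
    vkGetInstanceProcAddr(instance, "vkCreateDebugReportCallbackEXT");
VkDebugReportCallbackCreateInfoEXT callbackInfo = {};
callbackInfo.sType = VK_STRUCTURE_TYPE_DEBUG_REPORT_CALLBACK_CREATE_INFO_EXT;
callbackInfo.flags = VK_DEBUG_REPORT_ERROR_BIT_EXT | VK_DEBUG_REPORT_WARNING_BIT_EXT;
callbackInfo.pfnCallback = DebugReportCallback;
VkDebugReportCallbackEXT callback;
pfnCreateDebugReportCallback(instance, &callbackInfo, nullptr, &callback);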
Just a few points about your code (each potentially a cause of problems; see the sketch after this list):
You are not checking VkResult of all the vkGet* commands
You are not checking swapChainInfo.imageFormat against supported formats in your deviceFormats
You are not accounting for the situation that surfCaps.currentExtent can be 0xFFFFFFFF
swapChainInfo.pQueueFamilyIndices is a pointer not a handle; use nullptr
You are not checking swapChainInfo.preTransform against surfCaps.supportedTransforms
You are not checking swapChainInfo.compositeAlpha against surfCaps.supportedCompositeAlpha
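A sketch of those checks, reusing the variables from the snippet above (desiredWidth/desiredHeight stand for the window's pixel size as uint32_t, which this answer assumes you know; std::clamp needs C++17 and <algorithm>):
// Pick a format the surface actually supports instead of hard-coding it
VkSurfaceFormatKHR chosenFormat = deviceFormats[0];
for (const auto& f : deviceFormats)
    if (f.format == VK_FORMAT_B8G8R8A8_UNORM && f.colorSpace == VK_COLOR_SPACE_SRGB_NONLINEAR_KHR)
        chosenFormat = f;
// 0xFFFFFFFF means the surface lets the swapchain decide the extent
VkExtent2D chosenExtent = surfCaps.currentExtent;
if (chosenExtent.width == 0xFFFFFFFF)
{
    chosenExtent.width  = std::clamp(desiredWidth,  surfCaps.minImageExtent.width,  surfCaps.maxImageExtent.width);
    chosenExtent.height = std::clamp(desiredHeight, surfCaps.minImageExtent.height, surfCaps.maxImageExtent.height);
}
// Verify the transform and composite alpha you request are supported
if (!(surfCaps.supportedTransforms & VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR))
    std::cout << "*ERROR* Identity transform not supported" << std::endl;
if (!(surfCaps.supportedCompositeAlpha & VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR))
    std::cout << "*ERROR* Opaque composite alpha not supported" << std::endl;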

Audio distorted with VST plugin

I had to add a simple VST host to a pre-existing piece of software that manages ASIO audio streams. Despite the lack of documentation I managed to do so; however, once I load the plugin I get a badly distorted audio signal back.
The VST I'm using works properly with other VST hosts, so it's probably a bug in my code. However, when I disable "PROCESS" in the plugin (my stream still goes through the plugin, it simply does not get processed), the audio comes back exactly as I sent it, without any noise or distortion.
One thing I'm slightly concerned about is the data type: the ASIO driver fills an __int32 buffer, while the plugin wants a float buffer.
That's really depressing, as I've reviewed my code countless times and it seems to be fine.
Here is the code of the class I'm using; please note that some numbers are temporarily hard-coded to help debugging.
VSTPlugIn::VSTPlugIn(const char* fullDirectoryName, const char* ID)
: plugin(NULL)
, blocksize(128) // TODO
, sampleRate(44100.0F) // TODO
, hostID(ID)
{
this->LoadPlugin(fullDirectoryName);
this->ConfigurePluginCallbacks();
this->StartPlugin();
out = new float*[2];
for (int i = 0; i < 2; ++i)
{
out[i] = new float[128];
memset(out[i], 0, 128);
}
}
void VSTPlugIn::LoadPlugin(const char* path)
{
HMODULE modulePtr = LoadLibrary(path);
if(modulePtr == NULL)
{
printf("Failed trying to load VST from '%s', error %d\n", path, GetLastError());
plugin = NULL;
}
// vst 2.4 export name
vstPluginFuncPtr mainEntryPoint = (vstPluginFuncPtr)GetProcAddress(modulePtr, "VSTPluginMain");
// if "VSTPluginMain" was not found, search for "main" (backwards compatibility mode)
if(!mainEntryPoint)
{
mainEntryPoint = (vstPluginFuncPtr)GetProcAddress(modulePtr, "main");
}
// Instantiate the plugin
plugin = mainEntryPoint(hostCallback);
}
void VSTPlugIn::ConfigurePluginCallbacks()
{
// Check plugin's magic number
// If incorrect, then the file either was not loaded properly, is not a
// real VST plugin, or is otherwise corrupt.
if(plugin->magic != kEffectMagic)
{
printf("Plugin's magic number is bad. Plugin will be discarded\n");
plugin = NULL;
}
// Create dispatcher handle
this->dispatcher = (dispatcherFuncPtr)(plugin->dispatcher);
// Set up plugin callback functions
plugin->getParameter = (getParameterFuncPtr)plugin->getParameter;
plugin->processReplacing = (processFuncPtr)plugin->processReplacing;
plugin->setParameter = (setParameterFuncPtr)plugin->setParameter;
}
void VSTPlugIn::StartPlugin()
{
// Set some default properties
dispatcher(plugin, effOpen, 0, 0, NULL, 0);
dispatcher(plugin, effSetSampleRate, 0, 0, NULL, sampleRate);
dispatcher(plugin, effSetBlockSize, 0, blocksize, NULL, 0.0f);
this->ResumePlugin();
}
void VSTPlugIn::ResumePlugin()
{
dispatcher(plugin, effMainsChanged, 0, 1, NULL, 0.0f);
}
void VSTPlugIn::SuspendPlugin()
{
dispatcher(plugin, effMainsChanged, 0, 0, NULL, 0.0f);
}
void VSTPlugIn::ProcessAudio(float** inputs, float** outputs, long numFrames)
{
plugin->processReplacing(plugin, inputs, out, 128);
memcpy(outputs, out, sizeof(float) * 128);
}
EDIT: Here's the code I use to interface my software with the VST host:
// Copying the outer buffer in the inner container
for(unsigned i = 0; i < bufferLenght; i++)
{
float f;
f = ((float) buff[i]) / (float) std::numeric_limits<int>::max();
if( f > 1 ) f = 1;
if( f < -1 ) f = -1;
samples[0][i] = f;
}
// DO JOB
for(auto it = inserts.begin(); it != inserts.end(); ++it)
{
(*it)->ProcessAudio(samples, samples, bufferLenght);
}
// Copying the result back into the buffer
for(unsigned i = 0; i < bufferLenght; i++)
{
float f = samples[0][i];
int intval;
f = f * std::numeric_limits<int>::max();
if( f > std::numeric_limits<int>::max() ) f = std::numeric_limits<int>::max();
if( f < std::numeric_limits<int>::min() ) f = std::numeric_limits<int>::min();
intval = (int) f;
buff[i] = intval;
}
where "buff" is defined as "__int32* buff"
I'm guessing that when you call f = std::numeric_limits<int>::max() (and the related min() case on the line below), this might cause overflow. Have you tried f = std::numeric_limits<int>::max() - 1?
Same goes for the code snippet above with f = ((float) buff[i]) / (float) std::numeric_limits<int>::max()... I'd also subtract one there to avoid a potential overflow later on.
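For what it's worth, here is a sketch of a conversion that sidesteps the overflow entirely (not the asker's code; it scales with a double constant of 2^31 and clamps before casting back, reusing buff, samples and bufferLenght from the question):
// int32 -> float: divide by 2^31 so results stay within [-1.0, 1.0)
const double kScale = 2147483648.0; // 2^31
for (unsigned i = 0; i < bufferLenght; i++)
    samples[0][i] = (float)((double) buff[i] / kScale);
// ... run the plugin ...
// float -> int32: clamp first, then scale by 2^31 - 1 so the cast cannot overflow
for (unsigned i = 0; i < bufferLenght; i++)
{
    double f = samples[0][i];
    if (f > 1.0)  f = 1.0;
    if (f < -1.0) f = -1.0;
    buff[i] = (__int32)(f * (kScale - 1.0));
}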

nvEncRegisterResource() fails with -23

I've hit a complete brick wall in my attempt to use NVEnc to stream OpenGL frames as H264. I've been at this particular issue for close to 8 hours without any progress.
The problem is the call to nvEncRegisterResource(), which invariably fails with code -23 (enum value NV_ENC_ERR_RESOURCE_REGISTER_FAILED, documented as "failed to register the resource" - thanks NVidia).
I'm trying to follow a procedure outlined in this document from the University of Oslo (page 54, "OpenGL interop"), so I know for a fact that this is supposed to work, though unfortunately said document does not provide the code itself.
The idea is fairly straightforward:
map the texture produced by the OpenGL frame buffer object into CUDA;
copy the texture into a (previously allocated) CUDA buffer;
map that buffer as an NVEnc input resource
use that input resource as the source for the encoding
As I said, the problem is step (3). Here are the relevant code snippets (I'm omitting error handling for brevity.)
// Round up width and height
priv->encWidth = (_resolution.w + 31) & ~31, priv->encHeight = (_resolution.h + 31) & ~31;
// Allocate CUDA "pitched" memory to match the input texture (YUV, one byte per component)
cuErr = cudaMallocPitch(&priv->cudaMemPtr, &priv->cudaMemPitch, 3 * priv->encWidth, priv->encHeight);
This should allocate on-device CUDA memory (the "pitched" variety, though I've tried non-pitched too, without any change in the outcome.)
// Register the CUDA buffer as an input resource
NV_ENC_REGISTER_RESOURCE regResParams = { 0 };
regResParams.version = NV_ENC_REGISTER_RESOURCE_VER;
regResParams.resourceType = NV_ENC_INPUT_RESOURCE_TYPE_CUDADEVICEPTR;
regResParams.width = priv->encWidth;
regResParams.height = priv->encHeight;
regResParams.bufferFormat = NV_ENC_BUFFER_FORMAT_YUV444_PL;
regResParams.resourceToRegister = priv->cudaMemPtr;
regResParams.pitch = priv->cudaMemPitch;
encStat = nvEncApi.nvEncRegisterResource(priv->nvEncoder, &regResParams);
// ^^^ FAILS
priv->nvEncInpRes = regResParams.registeredResource;
This is the brick wall. No matter what I try, nvEncRegisterResource() fails.
I should note that I rather think (though I may be wrong) that I've done all the required initializations. Here is the code that creates and activates the CUDA context:
// Pop the current context
cuRes = cuCtxPopCurrent(&priv->cuOldCtx);
// Create a context for the device
priv->cuCtx = nullptr;
cuRes = cuCtxCreate(&priv->cuCtx, CU_CTX_SCHED_BLOCKING_SYNC, priv->cudaDevice);
// Push our context
cuRes = cuCtxPushCurrent(priv->cuCtx);
... followed by the creation of the encoding session:
// Create an NV Encoder session
NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS nvEncSessParams = { 0 };
nvEncSessParams.apiVersion = NVENCAPI_VERSION;
nvEncSessParams.version = NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS_VER;
nvEncSessParams.deviceType = NV_ENC_DEVICE_TYPE_CUDA;
nvEncSessParams.device = priv->cuCtx; // nullptr
auto encStat = nvEncApi.nvEncOpenEncodeSessionEx(&nvEncSessParams, &priv->nvEncoder);
And finally, the code initializing the encoder:
// Configure the encoder via preset
NV_ENC_PRESET_CONFIG presetConfig = { 0 };
GUID codecGUID = NV_ENC_CODEC_H264_GUID;
GUID presetGUID = NV_ENC_PRESET_LOW_LATENCY_DEFAULT_GUID;
presetConfig.version = NV_ENC_PRESET_CONFIG_VER;
presetConfig.presetCfg.version = NV_ENC_CONFIG_VER;
encStat = nvEncApi.nvEncGetEncodePresetConfig(priv->nvEncoder, codecGUID, presetGUID, &presetConfig);
NV_ENC_INITIALIZE_PARAMS initParams = { 0 };
initParams.version = NV_ENC_INITIALIZE_PARAMS_VER;
initParams.encodeGUID = codecGUID;
initParams.encodeWidth = priv->encWidth;
initParams.encodeHeight = priv->encHeight;
initParams.darWidth = 1;
initParams.darHeight = 1;
initParams.frameRateNum = 25; // TODO: make this configurable
initParams.frameRateDen = 1; // ditto
// .max_surface_count = (num_mbs >= 8160) ? 32 : 48;
// .buffer_delay ? necessary
initParams.enableEncodeAsync = 0;
initParams.enablePTD = 1;
initParams.presetGUID = presetGUID;
memcpy(&priv->nvEncConfig, &presetConfig.presetCfg, sizeof(priv->nvEncConfig));
initParams.encodeConfig = &priv->nvEncConfig;
encStat = nvEncApi.nvEncInitializeEncoder(priv->nvEncoder, &initParams);
All the above initializations report success.
I'd be extremely grateful to anyone who can get me past this hurdle.
EDIT: here is the complete code to reproduce the problem. The only observable difference from the original code is that cuCtxPopCurrent() returns an error here (which can be ignored); my original program probably creates such a context as a side effect of using OpenGL. Otherwise, the code behaves exactly as the original does.
I've built the code with Visual Studio 2013. You must link the following library file (adapt path if not on C:): C:\Program Files (x86)\NVIDIA GPU Computing Toolkit\CUDA\v7.5\lib\Win32\cuda.lib
You must also make sure that C:\Program Files (x86)\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\ (or similar) is in the include path.
NEW EDIT: modified the code to only use the CUDA driver interface, instead of mixing with the runtime API. Still the same error code.
#ifdef _WIN32
#include <Windows.h>
#endif
#include <cassert>
#include <GL/gl.h>
#include <iostream>
#include <string>
#include <stdexcept>
#include <string>
#include <cuda.h>
//#include <cuda_runtime.h>
#include <cuda_gl_interop.h>
#include <nvEncodeAPI.h>
// NV Encoder API ---------------------------------------------------
#if defined(_WIN32)
#define LOAD_FUNC(l, s) GetProcAddress(l, s)
#define DL_CLOSE_FUNC(l) FreeLibrary(l)
#else
#define LOAD_FUNC(l, s) dlsym(l, s)
#define DL_CLOSE_FUNC(l) dlclose(l)
#endif
typedef NVENCSTATUS(NVENCAPI* PNVENCODEAPICREATEINSTANCE)(NV_ENCODE_API_FUNCTION_LIST *functionList);
struct NVEncAPI : public NV_ENCODE_API_FUNCTION_LIST {
public:
// ~NVEncAPI() { cleanup(); }
void init() {
#if defined(_WIN32)
if (sizeof(void*) == 8) {
nvEncLib = LoadLibrary(TEXT("nvEncodeAPI64.dll"));
}
else {
nvEncLib = LoadLibrary(TEXT("nvEncodeAPI.dll"));
}
if (nvEncLib == NULL) throw std::runtime_error("Failed to load NVidia Encoder library: " + std::to_string(GetLastError()));
#else
nvEncLib = dlopen("libnvidia-encode.so.1", RTLD_LAZY);
if (nvEncLib == nullptr)
throw std::runtime_error("Failed to load NVidia Encoder library: " + std::string(dlerror()));
#endif
auto nvEncodeAPICreateInstance = (PNVENCODEAPICREATEINSTANCE) LOAD_FUNC(nvEncLib, "NvEncodeAPICreateInstance");
version = NV_ENCODE_API_FUNCTION_LIST_VER;
NVENCSTATUS encStat = nvEncodeAPICreateInstance(static_cast<NV_ENCODE_API_FUNCTION_LIST *>(this));
}
void cleanup() {
#if defined(_WIN32)
if (nvEncLib != NULL) {
FreeLibrary(nvEncLib);
nvEncLib = NULL;
}
#else
if (nvEncLib != nullptr) {
dlclose(nvEncLib);
nvEncLib = nullptr;
}
#endif
}
private:
#if defined(_WIN32)
HMODULE nvEncLib;
#else
void* nvEncLib;
#endif
bool init_done;
};
static NVEncAPI nvEncApi;
// Encoder class ----------------------------------------------------
class Encoder {
public:
typedef unsigned int uint_t;
struct Size { uint_t w, h; };
Encoder() {
CUresult cuRes = cuInit(0);
nvEncApi.init();
}
void init(const Size & resolution, uint_t texture) {
NVENCSTATUS encStat;
CUresult cuRes;
texSize = resolution;
yuvTex = texture;
// Purely for information
int devCount = 0;
cuRes = cuDeviceGetCount(&devCount);
// Initialize NVEnc
initEncodeSession(); // start an encoding session
initEncoder();
// Register the YUV texture as a CUDA graphics resource
// CODE COMMENTED OUT AS THE INPUT TEXTURE IS NOT NEEDED YET (TO MY UNDERSTANDING) AT SETUP TIME
//cudaGraphicsGLRegisterImage(&priv->cudaInpTexRes, priv->yuvTex, GL_TEXTURE_2D, cudaGraphicsRegisterFlagsReadOnly);
// Allocate CUDA "pitched" memory to match the input texture (YUV, one byte per component)
encWidth = (texSize.w + 31) & ~31, encHeight = (texSize.h + 31) & ~31;
cuRes = cuMemAllocPitch(&cuDevPtr, &cuMemPitch, 4 * encWidth, encHeight, 16);
// Register the CUDA buffer as an input resource
NV_ENC_REGISTER_RESOURCE regResParams = { 0 };
regResParams.version = NV_ENC_REGISTER_RESOURCE_VER;
regResParams.resourceType = NV_ENC_INPUT_RESOURCE_TYPE_CUDADEVICEPTR;
regResParams.width = encWidth;
regResParams.height = encHeight;
regResParams.bufferFormat = NV_ENC_BUFFER_FORMAT_YUV444_PL;
regResParams.resourceToRegister = (void*) cuDevPtr;
regResParams.pitch = cuMemPitch;
encStat = nvEncApi.nvEncRegisterResource(nvEncoder, &regResParams);
assert(encStat == NV_ENC_SUCCESS); // THIS IS THE POINT OF FAILURE
nvEncInpRes = regResParams.registeredResource;
}
void cleanup() { /* OMITTED */ }
void encode() {
// THE FOLLOWING CODE WAS NEVER REACHED YET BECAUSE OF THE ISSUE.
// INCLUDED HERE FOR REFERENCE.
CUresult cuRes;
NVENCSTATUS encStat;
cuRes = cuGraphicsResourceSetMapFlags(cuInpTexRes, CU_GRAPHICS_MAP_RESOURCE_FLAGS_READ_ONLY);
cuRes = cuGraphicsMapResources(1, &cuInpTexRes, 0);
CUarray mappedArray;
cuRes = cuGraphicsSubResourceGetMappedArray(&mappedArray, cuInpTexRes, 0, 0);
cuRes = cuMemcpyDtoA(mappedArray, 0, cuDevPtr, 4 * encWidth * encHeight);
NV_ENC_MAP_INPUT_RESOURCE mapInputResParams = { 0 };
mapInputResParams.version = NV_ENC_MAP_INPUT_RESOURCE_VER;
mapInputResParams.registeredResource = nvEncInpRes;
encStat = nvEncApi.nvEncMapInputResource(nvEncoder, &mapInputResParams);
// TODO: encode...
cuRes = cuGraphicsUnmapResources(1, &cuInpTexRes, 0);
}
private:
struct PrivateData;
void initEncodeSession() {
CUresult cuRes;
NVENCSTATUS encStat;
// Pop the current context
cuRes = cuCtxPopCurrent(&cuOldCtx); // THIS IS ALLOWED TO FAIL (the error can be ignored)
// Create a context for the device
cuCtx = nullptr;
cuRes = cuCtxCreate(&cuCtx, CU_CTX_SCHED_BLOCKING_SYNC, 0);
// Push our context
cuRes = cuCtxPushCurrent(cuCtx);
// Create an NV Encoder session
NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS nvEncSessParams = { 0 };
nvEncSessParams.apiVersion = NVENCAPI_VERSION;
nvEncSessParams.version = NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS_VER;
nvEncSessParams.deviceType = NV_ENC_DEVICE_TYPE_CUDA;
nvEncSessParams.device = cuCtx;
encStat = nvEncApi.nvEncOpenEncodeSessionEx(&nvEncSessParams, &nvEncoder);
}
void initEncoder()
{
NVENCSTATUS encStat;
// Configure the encoder via preset
NV_ENC_PRESET_CONFIG presetConfig = { 0 };
GUID codecGUID = NV_ENC_CODEC_H264_GUID;
GUID presetGUID = NV_ENC_PRESET_LOW_LATENCY_DEFAULT_GUID;
presetConfig.version = NV_ENC_PRESET_CONFIG_VER;
presetConfig.presetCfg.version = NV_ENC_CONFIG_VER;
encStat = nvEncApi.nvEncGetEncodePresetConfig(nvEncoder, codecGUID, presetGUID, &presetConfig);
NV_ENC_INITIALIZE_PARAMS initParams = { 0 };
initParams.version = NV_ENC_INITIALIZE_PARAMS_VER;
initParams.encodeGUID = codecGUID;
initParams.encodeWidth = texSize.w;
initParams.encodeHeight = texSize.h;
initParams.darWidth = texSize.w;
initParams.darHeight = texSize.h;
initParams.frameRateNum = 25;
initParams.frameRateDen = 1;
initParams.enableEncodeAsync = 0;
initParams.enablePTD = 1;
initParams.presetGUID = presetGUID;
memcpy(&nvEncConfig, &presetConfig.presetCfg, sizeof(nvEncConfig));
initParams.encodeConfig = &nvEncConfig;
encStat = nvEncApi.nvEncInitializeEncoder(nvEncoder, &initParams);
}
//void cleanupEncodeSession();
//void cleanupEncoder;
Size texSize;
GLuint yuvTex;
uint_t encWidth, encHeight;
CUdeviceptr cuDevPtr;
size_t cuMemPitch;
NV_ENC_CONFIG nvEncConfig;
NV_ENC_INPUT_PTR nvEncInpBuf;
NV_ENC_REGISTERED_PTR nvEncInpRes;
CUdevice cuDevice;
CUcontext cuCtx, cuOldCtx;
void *nvEncoder;
CUgraphicsResource cuInpTexRes;
};
int main(int argc, char *argv[])
{
Encoder encoder;
encoder.init({1920, 1080}, 0); // OMITTED THE TEXTURE AS IT IS NOT NEEDED TO REPRODUCE THE ISSUE
return 0;
}
After comparing the NVidia sample NvEncoderCudaInterop with my minimal code, I finally found the item that makes the difference between success and failure: it's the pitch parameter of the NV_ENC_REGISTER_RESOURCE structure passed to nvEncRegisterResource().
I haven't seen it documented anywhere, but there's a hard limit on that value, which I've determined experimentally to be 2560. Anything above that results in NV_ENC_ERR_RESOURCE_REGISTER_FAILED.
It does not appear to matter that the pitch I was passing was calculated by another API call, cuMemAllocPitch().
(Another thing that was missing from my code was "locking" the CUDA context to the current thread, and unlocking it afterwards, via cuCtxPushCurrent() and cuCtxPopCurrent(). The sample does this via a RAII class.)
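A minimal sketch of such a RAII helper (my guess at what the sample's class boils down to, not a copy of it):
// Pushes the given CUDA context on construction, pops it again on destruction.
class ScopedCudaContext {
public:
    explicit ScopedCudaContext(CUcontext ctx) { cuCtxPushCurrent(ctx); }
    ~ScopedCudaContext() { CUcontext dummy; cuCtxPopCurrent(&dummy); }
    ScopedCudaContext(const ScopedCudaContext&) = delete;
    ScopedCudaContext& operator=(const ScopedCudaContext&) = delete;
};
// Usage: keep the context current for the duration of an NVEnc call
// ScopedCudaContext lock(cuCtx);
// encStat = nvEncApi.nvEncRegisterResource(nvEncoder, &regResParams);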
EDIT:
I have worked around the problem by doing something for which I had another reason: using NV12 as input format for the encoder instead of YUV444.
With NV12, the pitch parameter drops below the 2560 limit because the byte size per row is equal to the width, so in my case 1920 bytes.
This was necessary (at the time) because my graphics card was a GTX 760 with a "Kepler" GPU, which (as I was initially unaware) only supports NV12 as input format for NVEnc. I have since upgraded to a GTX 970, but as I just found out, the 2560 limit is still there.
This makes me wonder just how exactly one is expected to use NVEnc with YUV444. The only possibility that comes to my mind is to use non-pitched memory, which seems bizarre. I'd appreciate comments from people who've actually used NVEnc with YUV444.
EDIT #2 - PENDING FURTHER UPDATE:
New information has surfaced in the form of another SO question: NVencs Output Bitstream is not readable
It is quite possible that my answer so far was wrong. It seems now that the pitch should not only be set when registering the CUDA resource, but also when actually sending it to the encoder via nvEncEncodePicture(). I cannot check this right now, but I will next time I work on that project.
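If that turns out to be true, the change would presumably be confined to the picture parameters passed at encode time. A sketch only, untested, with the field names taken from NV_ENC_PIC_PARAMS in the NVENC headers and outputBitstreamBuf standing in for a bitstream buffer created elsewhere:
NV_ENC_PIC_PARAMS picParams = { 0 };
picParams.version = NV_ENC_PIC_PARAMS_VER;
picParams.inputWidth = encWidth;
picParams.inputHeight = encHeight;
picParams.inputPitch = (uint32_t)cuMemPitch;              // pass the CUDA pitch here as well
picParams.inputBuffer = mapInputResParams.mappedResource; // from nvEncMapInputResource
picParams.bufferFmt = mapInputResParams.mappedBufferFmt;
picParams.outputBitstream = outputBitstreamBuf;           // assumed: created with nvEncCreateBitstreamBuffer
picParams.pictureStruct = NV_ENC_PIC_STRUCT_FRAME;
encStat = nvEncApi.nvEncEncodePicture(nvEncoder, &picParams);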