C++/libscreen fails to update visibility

I'm writing a GUI in C++ (qcc) for Neutrino 6.5.0 with QNX CAR 2.0 using the screen windowing system, following the documentation tutorial. No visible output is generated, and /dev/screen/{pid}/win-{n}/win-{n} has status = WIN_STATUS_INVISIBLE.
(Using {n} and {pid} to indicate substitutions of runtime/dynamic variables' values)
I've tried following the tutorial to the letter, but got the same result. In particular, the value of window_group_name (mainwindowgroup or veryuniquewgn) makes no difference; the former is the value suggested by the tutorial, the latter my own.
I've constructed an MCVE:
#include <cstdlib>   // for EXIT_ constants
#include <stdio.h>   // for printf(), getchar()
#include <process.h> // for getpid()
#include <screen.h>

#define screen_err(desc) \
    if(screen_res != 0) { \
        printf("ERR screen: failed to %s\n", desc); \
        return EXIT_FAILURE; \
    }

int main_withwindow(screen_context_t scr_ctx, screen_window_t scr_win) {
    static const char *window_group_name = "veryuniquewgn";
    // Dummy to pass as ptr to enum values
    int prop;
    int screen_res;

    screen_res = screen_create_window_group(scr_win, window_group_name);
    screen_err("create window group");

    prop = SCREEN_FORMAT_RGBA8888;
    screen_res = screen_set_window_property_iv(scr_win, SCREEN_PROPERTY_FORMAT,
                                               &prop);
    screen_err("set window property: format -> RGBA");

    prop = SCREEN_USAGE_NATIVE;
    screen_res = screen_set_window_property_iv(scr_win, SCREEN_PROPERTY_USAGE,
                                               &prop);
    screen_err("set window property: usage -> native");

    screen_res = screen_create_window_buffers(scr_win, 1);
    screen_err("create window buffers");

    int win_buf_rect[4] = { 0, 0 };
    screen_res = screen_get_window_property_iv(scr_win,
        SCREEN_PROPERTY_BUFFER_SIZE, win_buf_rect + 2);
    screen_err("get window property: buffer_size");

    // Array type to easily support multi-buffering in future
    screen_buffer_t scr_buf[1];
    screen_res = screen_get_window_property_pv(scr_win,
        SCREEN_PROPERTY_RENDER_BUFFERS, (void **)scr_buf);
    screen_err("get window property: render_buffers");

    int bg[] = { SCREEN_BLIT_COLOR, 0xffffff00, SCREEN_BLIT_END };
    screen_res = screen_fill(scr_ctx, scr_buf[0], bg);
    screen_err("fill buffer with yellow");

    screen_res = screen_post_window(scr_win, scr_buf[0], 1, win_buf_rect, 0);
    screen_err("post window");

    prop = 255;
    screen_res = screen_set_window_property_iv(scr_win,
        SCREEN_PROPERTY_ZORDER, &prop);
    screen_err("set window property: zorder -> 255");

    prop = 1;
    screen_res = screen_set_window_property_iv(scr_win,
        SCREEN_PROPERTY_VISIBLE, &prop);
    screen_err("set window property: visible -> true");

    screen_res = screen_flush_context(scr_ctx, SCREEN_WAIT_IDLE);
    screen_err("flush context to idle");

    getchar();
    return EXIT_SUCCESS;
}

int main_withcontext(screen_context_t scr_ctx) {
    screen_window_t scr_win;
    int ret;
    int screen_res;

    screen_res = screen_create_window(&scr_win, scr_ctx);
    screen_err("create window");

    ret = main_withwindow(scr_ctx, scr_win);

    screen_res = screen_destroy_window(scr_win);
    screen_err("destroy window");

    return ret;
}

int main(int argc, char *argv[]) {
    printf("%d\n", getpid());

    screen_context_t scr_ctx;
    int ret;
    int screen_res;

    screen_res = screen_create_context(&scr_ctx, SCREEN_APPLICATION_CONTEXT);
    screen_err("create context");

    ret = main_withcontext(scr_ctx);

    screen_res = screen_destroy_context(scr_ctx);
    screen_err("destroy context");

    return ret;
}
Given the screen_err() macro, the absence of output - other than the initial printf of the pid - means that none of the screen_...() calls reported an error. That's exactly what I see when I run this.
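As an aside, failures would be easier to diagnose if the macro also printed the reason. A minimal variant, assuming (as the libscreen docs indicate) that failing screen calls set errno; screen_err_v is just an illustrative name:

#define screen_err_v(desc) \
    if(screen_res != 0) { \
        /* adjacent string literals concatenate; perror() appends strerror(errno) */ \
        perror("ERR screen: failed to " desc); \
        return EXIT_FAILURE; \
    }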
Looking in /dev/screen/{pid}, I see a ctx-{n}/ctx-{n} and a win-{n}/win-{n} file, as expected. The former is not human-readable, but the latter is, and its differences from its counterpart in a working nto 6.5.0 + CAR 2.0 + libscreen application yield some insight.
The working application is an HMI for which I do not have the source; it is launched from /etc/ro as root, while mine is launched from ksh, logged in as root.
They're both 98-line files, so I've produced a diff, excluding metrics - for each property below, the value from the working application's win-{n} comes first, then mine:

autonomous:                    0  vs.  1
status:                        WIN_STATUS_FULLSCREEN  vs.  WIN_STATUS_INVISIBLE
id string:                     DPY_HMI  vs.  (empty)
insert id:                     1  vs.  3
reclip:                        0  vs.  1
flags:                         WIN_FLAG_VISIBLE WIN_FLAG_FLOATING  vs.  WIN_FLAG_FLOATING
usage:                         SCREEN_USAGE_OPENGL_ES2  vs.  SCREEN_USAGE_NATIVE
order:                         240  vs.  0
regions:                       (0,0;800,480)  vs.  (none)
clipped source viewport:       (0,0;800,480 800x480)  vs.  (0,0;0,0 0x0)
clipped destination rectangle: (0,0;800,480 800x480)  vs.  (0,0;0,0 0x0)
transform:                     [[1 0 0],[0 1 0],[0 0 1]]  vs.  [[0 0 0],[0 0 0],[0 0 0]]
Of these differences, usage is the only one with the value I expect, considering that it reflects the parameter of the relevant call to screen_set_window_property_iv(...). For all the rest, especially regions and flags, I do not understand why their values differ from those of the working application.
The target's display's native resolution is 800x480.

As it turns out, that code was completely valid and correct. It failed because a window manager daemon I wasn't aware of was suppressing the window. Turning that daemon off solved the problem.

Related

vkCreateSwapchainKHR: internal drawable creation failed

I had my Vulkan application working, but for some reason it stopped working (I don't believe I touched anything that could have broken it, besides making my engine project a .lib instead of a .dll) and started giving the "vkCreateSwapchainKHR: internal drawable creation failed" error in the validation layers. vkCreateSwapchainKHR returns VK_ERROR_VALIDATION_FAILED_EXT.
I already checked this answer:
Wat does the "vkCreateSwapchainKHR:internal drawable creation failed." means, but it was not my problem (as I said, it was working until it wasn't). Here's all the code I believe is relevant; if you need something else, just comment:
Window Creation:
/* Initialize the library */
if (!glfwInit())
/* Create a windowed mode window and its OpenGL context */
glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);
glfwWindowHint(GLFW_DECORATED, _WCI.IsDecorated);
GLFWWindow = glfwCreateWindow(Extent.Width, Extent.Height, _WCI.Name.c_str(), nullptr, nullptr);
if (!GLFWWindow) glfwTerminate();
//glfwMakeContextCurrent(GLFWWindow);
uint32 Count = 0;
auto ff = glfwGetRequiredInstanceExtensions(&Count);
GS_BASIC_LOG_MESSAGE("GLFW required extensions:")
for (uint8 i = 0; i < Count; ++i)
{
GS_BASIC_LOG_MESSAGE("%d: %s", i, ff[i]);
}
WindowObject = glfwGetWin32Window(GLFWWindow);
WindowInstance = GetModuleHandle(nullptr);
I'm using the correct instance extensions:
const char* Extensions[] = { VK_KHR_SURFACE_EXTENSION_NAME, VK_KHR_WIN32_SURFACE_EXTENSION_NAME, VK_EXT_DEBUG_UTILS_EXTENSION_NAME };
VKSwapchainCreator VulkanRenderContext::CreateSwapchain(VKDevice* _Device, VkSwapchainKHR _OldSwapchain) const
{
    VkSwapchainCreateInfoKHR SwapchainCreateInfo = { VK_STRUCTURE_TYPE_SWAPCHAIN_CREATE_INFO_KHR };
    SwapchainCreateInfo.surface = Surface.GetHandle();
    SwapchainCreateInfo.minImageCount = 3;
    SwapchainCreateInfo.imageFormat = Format.format;
    SwapchainCreateInfo.imageColorSpace = Format.colorSpace;
    SwapchainCreateInfo.imageExtent = Extent2DToVkExtent2D(RenderExtent);
    // imageArrayLayers specifies the number of layers each image consists of.
    // This is always 1 unless you are developing a stereoscopic 3D application.
    SwapchainCreateInfo.imageArrayLayers = 1;
    SwapchainCreateInfo.imageUsage = VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT;
    SwapchainCreateInfo.imageSharingMode = VK_SHARING_MODE_EXCLUSIVE;
    SwapchainCreateInfo.queueFamilyIndexCount = 0;
    SwapchainCreateInfo.pQueueFamilyIndices = nullptr;
    SwapchainCreateInfo.preTransform = VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR;
    SwapchainCreateInfo.compositeAlpha = VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR;
    SwapchainCreateInfo.presentMode = PresentMode;
    SwapchainCreateInfo.clipped = VK_TRUE;
    SwapchainCreateInfo.oldSwapchain = _OldSwapchain;

    return VKSwapchainCreator(_Device, &SwapchainCreateInfo);
}
Both VK_SURFACE_TRANSFORM_IDENTITY_BIT_KHR and VK_COMPOSITE_ALPHA_OPAQUE_BIT_KHR are supported by my GPU.
VkBool32 Supports = 0;
vkGetPhysicalDeviceSurfaceSupportKHR(_PD, PresentationQueue.GetQueueIndex(), _Surface, &Supports);
VkSurfaceCapabilitiesKHR SurfaceCapabilities = {};
vkGetPhysicalDeviceSurfaceCapabilitiesKHR(_PD, _Surface, &SurfaceCapabilities);
VkBool32 Supported = 0;
vkGetPhysicalDeviceSurfaceSupportKHR(_PD, PresentationQueue.GetQueueIndex(), _Surface, &Supported);
auto bb = vkGetPhysicalDeviceWin32PresentationSupportKHR(_PD, PresentationQueue.GetQueueIndex());
Everything here returns true, although it seemed suspicious to me that VkSurfaceCapabilitiesKHR returned the same extent for currentExtent, minImageExtent and maxImageExtent.
/* Initialize the library */
if (!glfwInit())
/* Create a windowed mode window and its OpenGL context */
glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API);
glfwWindowHint(GLFW_DECORATED, _WCI.IsDecorated);
If this is your actual code, it means that glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API); is the body of the if - it only runs when glfwInit() fails. In other words: "if glfwInit() succeeds, skip the hint". And as we know, not setting that hint causes the error.
The if should probably not have the negation inside, and (this not being Python) it needs braces {} so that its scope covers all of the init code it is meant to guard.
The solution was to remove the glfwInit call from inside the if and place it outside.
if (!glfwInit()) -> glfwInit()
No idea why; if someone knows why this could have caused the problem, I would love to know.
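For reference, a minimal sketch of what both points amount to: glfwInit() checked with a real error path of its own, and the hints running unconditionally afterwards. (initWindowSystem and the fprintf error path are illustrative, not from the original code.)

#include <cstdio>
#include <GLFW/glfw3.h>

// Sketch: initialize GLFW, then set the window hints unconditionally.
bool initWindowSystem()
{
    if (!glfwInit())
    {
        std::fprintf(stderr, "glfwInit() failed\n"); // placeholder error path
        return false;
    }
    glfwWindowHint(GLFW_CLIENT_API, GLFW_NO_API); // required for Vulkan: no GL context
    glfwWindowHint(GLFW_DECORATED, GLFW_TRUE);    // stand-in for _WCI.IsDecorated
    return true;
}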

Specific filepath to store Screen Record using CGDisplayStream in OSX

I have been working on a C++ command-line tool to record the screen. After some searching I came up with the following code, and the screen does seem to be recorded when I compile and run it. I am looking for functions that let me provide the specific file path where the screen recording is to be stored. I would also like to append a timestamp to the filename. If anybody has a better approach or method for this problem, please suggest it here. Any leads are appreciated. Thanks.
#include <ApplicationServices/ApplicationServices.h>
#include <iostream> // for std::cout
#include <unistd.h> // for sleep()

int main(int argc, const char * argv[]) {
    // insert code here...
    CGRect mainMonitor = CGDisplayBounds(CGMainDisplayID());
    CGFloat monitorHeight = CGRectGetHeight(mainMonitor);
    CGFloat monitorWidth = CGRectGetWidth(mainMonitor);

    const void *keys[1] = { kCGDisplayStreamSourceRect };
    const void *values[1] = { CGRectCreateDictionaryRepresentation(CGRectMake(0, 0, 100, 100)) };
    CFDictionaryRef properties = CFDictionaryCreate(NULL, keys, values, 1, NULL, NULL);

    CGDisplayStreamRef stream = CGDisplayStreamCreate(CGMainDisplayID(), monitorWidth, monitorHeight, '420f', properties,
        ^(CGDisplayStreamFrameStatus status, uint64_t displayTime, IOSurfaceRef frameSurface, CGDisplayStreamUpdateRef updateRef){});

    CGDirectDisplayID displayID = CGMainDisplayID();
    CGImageRef image_create = CGDisplayCreateImage(displayID);
    CFRunLoopSourceRef runLoop = CGDisplayStreamGetRunLoopSource(stream);
    // CFRunLoopAddSource(<#CFRunLoopRef rl#>, runLoop, <#CFRunLoopMode mode#>);

    CGError err = CGDisplayStreamStart(stream);
    if (err == CGDisplayNoErr) {
        std::cout << "WORKING" << std::endl;
        sleep(5);
    } else {
        std::cout << "Error: " << err << std::endl;
    }
    //std::cout << "Hello, World!\n";
    return 0;
}
You should do that in the callback which you provide in CGDisplayStreamCreate. You can access the pixels via IOSurfaceGetBaseAddress (see other IOSurface functions). If you don't want to do the pixel twiddling yourself, you could create a CVPixelBuffer with CVPixelBufferCreateWithBytes from the IOSurface and then create a CIImage with [CIImage imageWithCVImageBuffer] and save that to file as seen here.
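To make that concrete, here is a hedged sketch of such a callback. Assumptions not in the original: a 'BGRA' (single-plane) stream rather than the question's '420f', a hard-coded /tmp/capture-... path pattern as a stand-in for your own, and a raw pixel dump rather than a proper movie file (for which you'd go through the CVPixelBuffer/CIImage route described above):

#include <ApplicationServices/ApplicationServices.h>
#include <IOSurface/IOSurface.h>
#include <cstdio>
#include <ctime>

// Sketch: a display-stream handler that dumps each complete frame to a
// timestamped raw file. Not production code - every frame writes a new file.
CGDisplayStreamFrameAvailableHandler handler =
    ^(CGDisplayStreamFrameStatus status, uint64_t displayTime,
      IOSurfaceRef frameSurface, CGDisplayStreamUpdateRef updateRef) {
        if (status != kCGDisplayStreamFrameStatusFrameComplete || frameSurface == NULL)
            return;
        // Build a timestamped path, e.g. /tmp/capture-20250101-120000.raw
        char path[128];
        time_t now = time(NULL);
        strftime(path, sizeof(path), "/tmp/capture-%Y%m%d-%H%M%S.raw", localtime(&now));
        // Lock the surface while reading its pixels
        IOSurfaceLock(frameSurface, kIOSurfaceLockReadOnly, NULL);
        const void *base = IOSurfaceGetBaseAddress(frameSurface);
        size_t nbytes = IOSurfaceGetBytesPerRow(frameSurface) * IOSurfaceGetHeight(frameSurface);
        if (FILE *f = fopen(path, "wb")) {
            fwrite(base, 1, nbytes, f);
            fclose(f);
        }
        IOSurfaceUnlock(frameSurface, kIOSurfaceLockReadOnly, NULL);
    };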

Get window title with XCB

I am trying to get information about the window in focus. It seems that I get a correct window id from xcb_get_input_focus_reply_t->focus: it stays the same for my Eclipse IDE (56623164) and is another for any other window in focus. However, the value length is always 0 for XCB_ATOM_WM_NAME.
shortened code
cookie = xcb_get_property(c, 0, fr->focus, XCB_ATOM_WM_NAME,
                          XCB_ATOM_STRING, 0, 0);
if ((reply = xcb_get_property_reply(c, cookie, NULL))) {
    int len = xcb_get_property_value_length(reply);
    if (len == 0) {
        printf("Zero Length\n");
        free(reply);
        return;
    }
    printf("WM_NAME is %.*s\n", len, (char*) xcb_get_property_value(reply));
}
Eclipse Debugger
reply xcb_get_property_reply_t * 0x60bd40
response_type uint8_t 1 '\001'
format uint8_t 0 '\0'
sequence uint16_t 2
length uint32_t 0
type xcb_atom_t 0
bytes_after uint32_t 0
value_len uint32_t 0
pad0 unsigned char [12] 0x60bd54
There is no error (I passed and inspected a xcb_generic_error_t). Do you have any idea what could go wrong? Maybe I should use Xlib instead...
This code works for me. It is in js-ctypes, but you can ignore that part and look at it for the API use:
var win = aXcbWindowT;
// console.log('win:', win);
var req_title = ostypes.API('xcb_get_property')(ostypes.HELPER.cachedXCBConn(), 0, win, ostypes.CONST.XCB_ATOM_WM_NAME, ostypes.CONST.XCB_ATOM_STRING, 0, 100); // `100` means it will get 100*4 so 400 bytes, so that 400 char, so `rez_title.bytes_after` should be `0` but i can loop till it comes out to be 0
var rez_title = ostypes.API('xcb_get_property_reply')(ostypes.HELPER.cachedXCBConn(), req_title, null);
// console.log('rez_title:', rez_title);
var title_len = ostypes.API('xcb_get_property_value_length')(rez_title); // length is not null terminated so "Console - chrome://nativeshot/content/resources/scripts/MainWorker.js?0.01966718940939427" will be length of `88`, this matches `rez_title.length` but the docs recommend to use this call to get the value, I don't know why
console.log('title_len:', title_len, 'rez_title.contents.length:', rez_title.contents.length); // I think `rez_title.contents.length` is the actual length DIVIDED by 4, and rez_title_len is not divided by 4
var title_buf = ostypes.API('xcb_get_property_value')(rez_title); // "title_len: 89 rez_title.contents.length: 23" for test case of "Console - chrome://nativeshot/content/resources/scripts/MainWorker.js?0.01966718940939427"
// console.log('title_buf:', title_buf);
var title = ctypes.cast(title_buf, ctypes.char.array(title_len).ptr).contents.readString();
console.log('title:', title);
ostypes.API('free')(rez_title);
return title;
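Stripped of the js-ctypes machinery, the plain C equivalent of those calls would look roughly like this (a sketch using the question's connection c; error handling omitted). Note the final argument to xcb_get_property: it is long_length, in 32-bit units, and the shortened code in the question passes 0 there, which asks the server for zero bytes of the property - hence the zero value length:

// Hedged C sketch of the same sequence of calls
xcb_get_property_cookie_t ck = xcb_get_property(c, 0, win, XCB_ATOM_WM_NAME,
                                                XCB_ATOM_STRING, 0, 100); // up to 100*4 = 400 bytes
xcb_get_property_reply_t *r = xcb_get_property_reply(c, ck, NULL);
if (r) {
    int len = xcb_get_property_value_length(r);
    printf("WM_NAME is %.*s\n", len, (char *)xcb_get_property_value(r)); // not NUL-terminated
    free(r);
}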
Sometimes, though, what is returned by xcb_get_input_focus_reply_t->focus is not the window to act on. I have found that sometimes it doesn't have a title, but if you use xcb_query_tree you can find its parent window, which probably has a title:
var req_query = ostypes.API('xcb_query_tree')(ostypes.HELPER.cachedXCBConn(), win);
var rez_query = ostypes.API('xcb_query_tree_reply')(ostypes.HELPER.cachedXCBConn(), req_query, null);
console.log('rez_query.contents:', rez_query.contents);
if (root === -1) {
root = rez_query.contents.root;
}
win = rez_query.contents.parent; // this win should have the title
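Again, the rough C equivalent (a sketch; error handling omitted):

xcb_query_tree_cookie_t qc = xcb_query_tree(c, win);
xcb_query_tree_reply_t *qr = xcb_query_tree_reply(c, qc, NULL);
if (qr) {
    win = qr->parent; // this window should have the title
    free(qr);
}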

Audio distorted with VST plugin

I had to plug a simple VST host into a pre-existing piece of software that manages ASIO audio streams. Despite the lack of documentation I managed to do so, but once I load the plugin I get a badly distorted audio signal back.
The VST I'm using works properly with other VST hosts, so it's probably some kind of bug in my code. However, when I disable "PROCESS" on the plugin (my stream goes through the plugin, it simply doesn't get processed), the signal comes back exactly as I sent it, without any noise or distortion.
One thing I'm slightly concerned about is the type of the data used, as the ASIO driver fills an __int32 buffer while the plugin wants float buffers.
It's really frustrating, as I've reviewed my code countless times and it seems fine.
Here is the code of the class I'm using; please note that some numbers are temporarily hard-coded to help debugging.
VSTPlugIn::VSTPlugIn(const char* fullDirectoryName, const char* ID)
    : plugin(NULL)
    , blocksize(128)       // TODO
    , sampleRate(44100.0F) // TODO
    , hostID(ID)
{
    this->LoadPlugin(fullDirectoryName);
    this->ConfigurePluginCallbacks();
    this->StartPlugin();

    out = new float*[2];
    for (int i = 0; i < 2; ++i)
    {
        out[i] = new float[128];
        memset(out[i], 0, 128);
    }
}

void VSTPlugIn::LoadPlugin(const char* path)
{
    HMODULE modulePtr = LoadLibrary(path);
    if(modulePtr == NULL)
    {
        printf("Failed trying to load VST from '%s', error %d\n", path, GetLastError());
        plugin = NULL;
    }

    // vst 2.4 export name
    vstPluginFuncPtr mainEntryPoint = (vstPluginFuncPtr)GetProcAddress(modulePtr, "VSTPluginMain");

    // if "VSTPluginMain" was not found, search for "main" (backwards compatibility mode)
    if(!mainEntryPoint)
    {
        mainEntryPoint = (vstPluginFuncPtr)GetProcAddress(modulePtr, "main");
    }

    // Instantiate the plugin
    plugin = mainEntryPoint(hostCallback);
}

void VSTPlugIn::ConfigurePluginCallbacks()
{
    // Check plugin's magic number
    // If incorrect, then the file either was not loaded properly, is not a
    // real VST plugin, or is otherwise corrupt.
    if(plugin->magic != kEffectMagic)
    {
        printf("Plugin's magic number is bad. Plugin will be discarded\n");
        plugin = NULL;
    }

    // Create dispatcher handle
    this->dispatcher = (dispatcherFuncPtr)(plugin->dispatcher);

    // Set up plugin callback functions
    plugin->getParameter = (getParameterFuncPtr)plugin->getParameter;
    plugin->processReplacing = (processFuncPtr)plugin->processReplacing;
    plugin->setParameter = (setParameterFuncPtr)plugin->setParameter;
}

void VSTPlugIn::StartPlugin()
{
    // Set some default properties
    dispatcher(plugin, effOpen, 0, 0, NULL, 0);
    dispatcher(plugin, effSetSampleRate, 0, 0, NULL, sampleRate);
    dispatcher(plugin, effSetBlockSize, 0, blocksize, NULL, 0.0f);

    this->ResumePlugin();
}

void VSTPlugIn::ResumePlugin()
{
    dispatcher(plugin, effMainsChanged, 0, 1, NULL, 0.0f);
}

void VSTPlugIn::SuspendPlugin()
{
    dispatcher(plugin, effMainsChanged, 0, 0, NULL, 0.0f);
}

void VSTPlugIn::ProcessAudio(float** inputs, float** outputs, long numFrames)
{
    plugin->processReplacing(plugin, inputs, out, 128);
    memcpy(outputs, out, sizeof(float) * 128);
}
EDIT: Here's the code I use to interface my software with the VST host:
// Copying the outer buffer in the inner container
for(unsigned i = 0; i < bufferLenght; i++)
{
    float f;
    f = ((float) buff[i]) / (float) std::numeric_limits<int>::max();
    if( f > 1 ) f = 1;
    if( f < -1 ) f = -1;
    samples[0][i] = f;
}

// DO JOB
for(auto it = inserts.begin(); it != inserts.end(); ++it)
{
    (*it)->ProcessAudio(samples, samples, bufferLenght);
}

// Copying the result back into the buffer
for(unsigned i = 0; i < bufferLenght; i++)
{
    float f = samples[0][i];
    int intval;

    f = f * std::numeric_limits<int>::max();
    if( f > std::numeric_limits<int>::max() ) f = std::numeric_limits<int>::max();
    if( f < std::numeric_limits<int>::min() ) f = std::numeric_limits<int>::min();

    intval = (int) f;
    buff[i] = intval;
}
where "buff" is defined as "__int32* buff"
I'm guessing that when you call f = std::numeric_limits<int>::max() (and the related min() case on the line below), this might cause overflow. Have you tried f = std::numeric_limits<int>::max() - 1?
The same goes for the code snippet above with f = ((float) buff[i]) / (float) std::numeric_limits<int>::max(); I'd also subtract one there to avoid a potential overflow later on.
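More fundamentally, (float)std::numeric_limits<int>::max() rounds up to 2147483648.0f, since a float cannot represent INT_MAX exactly, so scaling by it and casting back can overflow. A hedged sketch of a conversion pair that sidesteps that round trip (the function names are illustrative, not from the question's code):

#include <cstdint>

// Sketch: symmetric int32 <-> float sample conversion without the INT_MAX round trip.
inline float Int32ToFloat(int32_t s)
{
    return (float)s / 2147483648.0f; // 2^31 is exactly representable as a float
}

inline int32_t FloatToInt32(float f)
{
    if (f >= 1.0f)  return  2147483647;     // clamp in float space, before the cast
    if (f <= -1.0f) return -2147483647 - 1; // INT32_MIN
    return (int32_t)(f * 2147483648.0f);    // largest float below 1.0 still fits
}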

nvEncRegisterResource() fails with -23

I've hit a complete brick wall in my attempt to use NVEnc to stream OpenGL frames as H264. I've been at this particular issue for close to 8 hours without any progress.
The problem is the call to nvEncRegisterResource(), which invariably fails with code -23 (enum value NV_ENC_ERR_RESOURCE_REGISTER_FAILED, documented as "failed to register the resource" - thanks NVidia).
I'm trying to follow a procedure outlined in this document from the University of Oslo (page 54, "OpenGL interop"), so I know for a fact that this is supposed to work, though unfortunately said document does not provide the code itself.
The idea is fairly straightforward:
1. map the texture produced by the OpenGL frame buffer object into CUDA;
2. copy the texture into a (previously allocated) CUDA buffer;
3. map that buffer as an NVEnc input resource;
4. use that input resource as the source for the encoding.
As I said, the problem is step (3). Here are the relevant code snippets (I'm omitting error handling for brevity.)
// Round up width and height
priv->encWidth = (_resolution.w + 31) & ~31, priv->encHeight = (_resolution.h + 31) & ~31;
// Allocate CUDA "pitched" memory to match the input texture (YUV, one byte per component)
cuErr = cudaMallocPitch(&priv->cudaMemPtr, &priv->cudaMemPitch, 3 * priv->encWidth, priv->encHeight);
This should allocate on-device CUDA memory (the "pitched" variety, though I've tried non-pitched too, without any change in the outcome.)
// Register the CUDA buffer as an input resource
NV_ENC_REGISTER_RESOURCE regResParams = { 0 };
regResParams.version = NV_ENC_REGISTER_RESOURCE_VER;
regResParams.resourceType = NV_ENC_INPUT_RESOURCE_TYPE_CUDADEVICEPTR;
regResParams.width = priv->encWidth;
regResParams.height = priv->encHeight;
regResParams.bufferFormat = NV_ENC_BUFFER_FORMAT_YUV444_PL;
regResParams.resourceToRegister = priv->cudaMemPtr;
regResParams.pitch = priv->cudaMemPitch;
encStat = nvEncApi.nvEncRegisterResource(priv->nvEncoder, &regResParams);
// ^^^ FAILS
priv->nvEncInpRes = regResParams.registeredResource;
This is the brick wall. No matter what I try, nvEncRegisterResource() fails.
I should note that I rather think (though I may be wrong) that I've done all the required initializations. Here is the code that creates and activates the CUDA context:
// Pop the current context
cuRes = cuCtxPopCurrent(&priv->cuOldCtx);
// Create a context for the device
priv->cuCtx = nullptr;
cuRes = cuCtxCreate(&priv->cuCtx, CU_CTX_SCHED_BLOCKING_SYNC, priv->cudaDevice);
// Push our context
cuRes = cuCtxPushCurrent(priv->cuCtx);
... followed by the creation of the encoding session:
// Create an NV Encoder session
NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS nvEncSessParams = { 0 };
nvEncSessParams.apiVersion = NVENCAPI_VERSION;
nvEncSessParams.version = NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS_VER;
nvEncSessParams.deviceType = NV_ENC_DEVICE_TYPE_CUDA;
nvEncSessParams.device = priv->cuCtx; // nullptr
auto encStat = nvEncApi.nvEncOpenEncodeSessionEx(&nvEncSessParams, &priv->nvEncoder);
And finally, the code initializing the encoder:
// Configure the encoder via preset
NV_ENC_PRESET_CONFIG presetConfig = { 0 };
GUID codecGUID = NV_ENC_CODEC_H264_GUID;
GUID presetGUID = NV_ENC_PRESET_LOW_LATENCY_DEFAULT_GUID;
presetConfig.version = NV_ENC_PRESET_CONFIG_VER;
presetConfig.presetCfg.version = NV_ENC_CONFIG_VER;
encStat = nvEncApi.nvEncGetEncodePresetConfig(priv->nvEncoder, codecGUID, presetGUID, &presetConfig);
NV_ENC_INITIALIZE_PARAMS initParams = { 0 };
initParams.version = NV_ENC_INITIALIZE_PARAMS_VER;
initParams.encodeGUID = codecGUID;
initParams.encodeWidth = priv->encWidth;
initParams.encodeHeight = priv->encHeight;
initParams.darWidth = 1;
initParams.darHeight = 1;
initParams.frameRateNum = 25; // TODO: make this configurable
initParams.frameRateDen = 1; // ditto
// .max_surface_count = (num_mbs >= 8160) ? 32 : 48;
// .buffer_delay ? necessary
initParams.enableEncodeAsync = 0;
initParams.enablePTD = 1;
initParams.presetGUID = presetGUID;
memcpy(&priv->nvEncConfig, &presetConfig.presetCfg, sizeof(priv->nvEncConfig));
initParams.encodeConfig = &priv->nvEncConfig;
encStat = nvEncApi.nvEncInitializeEncoder(priv->nvEncoder, &initParams);
All the above initializations report success.
I'd be extremely grateful to anyone who can get me past this hurdle.
EDIT: here is the complete code to reproduce the problem. The only observable difference from the original code is that cuCtxPopCurrent() returns an error here (which can be ignored) - probably my original program creates such a context as a side effect of using OpenGL. Otherwise, the code behaves exactly as the original does.
I've built the code with Visual Studio 2013. You must link the following library file (adapt path if not on C:): C:\Program Files (x86)\NVIDIA GPU Computing Toolkit\CUDA\v7.5\lib\Win32\cuda.lib
You must also make sure that C:\Program Files (x86)\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\ (or similar) is in the include path.
NEW EDIT: modified the code to only use the CUDA driver interface, instead of mixing with the runtime API. Still the same error code.
#ifdef _WIN32
#include <Windows.h>
#endif
#include <cassert>
#include <GL/gl.h>
#include <iostream>
#include <string>
#include <stdexcept>
#include <string>
#include <cuda.h>
//#include <cuda_runtime.h>
#include <cuda_gl_interop.h>
#include <nvEncodeAPI.h>
// NV Encoder API ---------------------------------------------------
#if defined(_WIN32)
#define LOAD_FUNC(l, s) GetProcAddress(l, s)
#define DL_CLOSE_FUNC(l) FreeLibrary(l)
#else
#define LOAD_FUNC(l, s) dlsym(l, s)
#define DL_CLOSE_FUNC(l) dlclose(l)
#endif
typedef NVENCSTATUS(NVENCAPI* PNVENCODEAPICREATEINSTANCE)(NV_ENCODE_API_FUNCTION_LIST *functionList);
struct NVEncAPI : public NV_ENCODE_API_FUNCTION_LIST {
public:
    // ~NVEncAPI() { cleanup(); }

    void init() {
#if defined(_WIN32)
        if (sizeof(void*) == 8) {
            nvEncLib = LoadLibrary(TEXT("nvEncodeAPI64.dll"));
        }
        else {
            nvEncLib = LoadLibrary(TEXT("nvEncodeAPI.dll"));
        }
        if (nvEncLib == NULL) throw std::runtime_error("Failed to load NVidia Encoder library: " + std::to_string(GetLastError()));
#else
        nvEncLib = dlopen("libnvidia-encode.so.1", RTLD_LAZY);
        if (nvEncLib == nullptr)
            throw std::runtime_error("Failed to load NVidia Encoder library: " + std::string(dlerror()));
#endif
        auto nvEncodeAPICreateInstance = (PNVENCODEAPICREATEINSTANCE) LOAD_FUNC(nvEncLib, "NvEncodeAPICreateInstance");

        version = NV_ENCODE_API_FUNCTION_LIST_VER;
        NVENCSTATUS encStat = nvEncodeAPICreateInstance(static_cast<NV_ENCODE_API_FUNCTION_LIST *>(this));
    }

    void cleanup() {
#if defined(_WIN32)
        if (nvEncLib != NULL) {
            FreeLibrary(nvEncLib);
            nvEncLib = NULL;
        }
#else
        if (nvEncLib != nullptr) {
            dlclose(nvEncLib);
            nvEncLib = nullptr;
        }
#endif
    }

private:
#if defined(_WIN32)
    HMODULE nvEncLib;
#else
    void* nvEncLib;
#endif
    bool init_done;
};
static NVEncAPI nvEncApi;
// Encoder class ----------------------------------------------------
class Encoder {
public:
    typedef unsigned int uint_t;
    struct Size { uint_t w, h; };

    Encoder() {
        CUresult cuRes = cuInit(0);
        nvEncApi.init();
    }

    void init(const Size & resolution, uint_t texture) {
        NVENCSTATUS encStat;
        CUresult cuRes;

        texSize = resolution;
        yuvTex = texture;

        // Purely for information
        int devCount = 0;
        cuRes = cuDeviceGetCount(&devCount);

        // Initialize NVEnc
        initEncodeSession(); // start an encoding session
        initEncoder();

        // Register the YUV texture as a CUDA graphics resource
        // CODE COMMENTED OUT AS THE INPUT TEXTURE IS NOT NEEDED YET (TO MY UNDERSTANDING) AT SETUP TIME
        //cudaGraphicsGLRegisterImage(&priv->cudaInpTexRes, priv->yuvTex, GL_TEXTURE_2D, cudaGraphicsRegisterFlagsReadOnly);

        // Allocate CUDA "pitched" memory to match the input texture (YUV, one byte per component)
        encWidth = (texSize.w + 31) & ~31, encHeight = (texSize.h + 31) & ~31;
        cuRes = cuMemAllocPitch(&cuDevPtr, &cuMemPitch, 4 * encWidth, encHeight, 16);

        // Register the CUDA buffer as an input resource
        NV_ENC_REGISTER_RESOURCE regResParams = { 0 };
        regResParams.version = NV_ENC_REGISTER_RESOURCE_VER;
        regResParams.resourceType = NV_ENC_INPUT_RESOURCE_TYPE_CUDADEVICEPTR;
        regResParams.width = encWidth;
        regResParams.height = encHeight;
        regResParams.bufferFormat = NV_ENC_BUFFER_FORMAT_YUV444_PL;
        regResParams.resourceToRegister = (void*) cuDevPtr;
        regResParams.pitch = cuMemPitch;
        encStat = nvEncApi.nvEncRegisterResource(nvEncoder, &regResParams);
        assert(encStat == NV_ENC_SUCCESS); // THIS IS THE POINT OF FAILURE

        nvEncInpRes = regResParams.registeredResource;
    }

    void cleanup() { /* OMITTED */ }

    void encode() {
        // THE FOLLOWING CODE WAS NEVER REACHED YET BECAUSE OF THE ISSUE.
        // INCLUDED HERE FOR REFERENCE.
        CUresult cuRes;
        NVENCSTATUS encStat;

        cuRes = cuGraphicsResourceSetMapFlags(cuInpTexRes, CU_GRAPHICS_MAP_RESOURCE_FLAGS_READ_ONLY);
        cuRes = cuGraphicsMapResources(1, &cuInpTexRes, 0);

        CUarray mappedArray;
        cuRes = cuGraphicsSubResourceGetMappedArray(&mappedArray, cuInpTexRes, 0, 0);
        cuRes = cuMemcpyDtoA(mappedArray, 0, cuDevPtr, 4 * encWidth * encHeight);

        NV_ENC_MAP_INPUT_RESOURCE mapInputResParams = { 0 };
        mapInputResParams.version = NV_ENC_MAP_INPUT_RESOURCE_VER;
        mapInputResParams.registeredResource = nvEncInpRes;
        encStat = nvEncApi.nvEncMapInputResource(nvEncoder, &mapInputResParams);

        // TODO: encode...

        cuRes = cuGraphicsUnmapResources(1, &cuInpTexRes, 0);
    }

private:
    struct PrivateData;

    void initEncodeSession() {
        CUresult cuRes;
        NVENCSTATUS encStat;

        // Pop the current context
        cuRes = cuCtxPopCurrent(&cuOldCtx); // THIS IS ALLOWED TO FAIL (it doesn't)

        // Create a context for the device
        cuCtx = nullptr;
        cuRes = cuCtxCreate(&cuCtx, CU_CTX_SCHED_BLOCKING_SYNC, 0);

        // Push our context
        cuRes = cuCtxPushCurrent(cuCtx);

        // Create an NV Encoder session
        NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS nvEncSessParams = { 0 };
        nvEncSessParams.apiVersion = NVENCAPI_VERSION;
        nvEncSessParams.version = NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS_VER;
        nvEncSessParams.deviceType = NV_ENC_DEVICE_TYPE_CUDA;
        nvEncSessParams.device = cuCtx;
        encStat = nvEncApi.nvEncOpenEncodeSessionEx(&nvEncSessParams, &nvEncoder);
    }

    void initEncoder()
    {
        NVENCSTATUS encStat;

        // Configure the encoder via preset
        NV_ENC_PRESET_CONFIG presetConfig = { 0 };
        GUID codecGUID = NV_ENC_CODEC_H264_GUID;
        GUID presetGUID = NV_ENC_PRESET_LOW_LATENCY_DEFAULT_GUID;
        presetConfig.version = NV_ENC_PRESET_CONFIG_VER;
        presetConfig.presetCfg.version = NV_ENC_CONFIG_VER;
        encStat = nvEncApi.nvEncGetEncodePresetConfig(nvEncoder, codecGUID, presetGUID, &presetConfig);

        NV_ENC_INITIALIZE_PARAMS initParams = { 0 };
        initParams.version = NV_ENC_INITIALIZE_PARAMS_VER;
        initParams.encodeGUID = codecGUID;
        initParams.encodeWidth = texSize.w;
        initParams.encodeHeight = texSize.h;
        initParams.darWidth = texSize.w;
        initParams.darHeight = texSize.h;
        initParams.frameRateNum = 25;
        initParams.frameRateDen = 1;
        initParams.enableEncodeAsync = 0;
        initParams.enablePTD = 1;
        initParams.presetGUID = presetGUID;
        memcpy(&nvEncConfig, &presetConfig.presetCfg, sizeof(nvEncConfig));
        initParams.encodeConfig = &nvEncConfig;
        encStat = nvEncApi.nvEncInitializeEncoder(nvEncoder, &initParams);
    }

    //void cleanupEncodeSession();
    //void cleanupEncoder;

    Size texSize;
    GLuint yuvTex;
    uint_t encWidth, encHeight;
    CUdeviceptr cuDevPtr;
    size_t cuMemPitch;
    NV_ENC_CONFIG nvEncConfig;
    NV_ENC_INPUT_PTR nvEncInpBuf;
    NV_ENC_REGISTERED_PTR nvEncInpRes;
    CUdevice cuDevice;
    CUcontext cuCtx, cuOldCtx;
    void *nvEncoder;
    CUgraphicsResource cuInpTexRes;
};
int main(int argc, char *argv[])
{
    Encoder encoder;
    encoder.init({1920, 1080}, 0); // OMITTED THE TEXTURE AS IT IS NOT NEEDED TO REPRODUCE THE ISSUE
    return 0;
}
After comparing the NVidia sample NvEncoderCudaInterop with my minimal code, I finally found the item that makes the difference between success and failure: it's the pitch parameter of the NV_ENC_REGISTER_RESOURCE structure passed to nvEncRegisterResource().
I haven't seen it documented anywhere, but there's a hard limit on that value, which I've determined experimentally to be at 2560. Anything above that will result in NV_ENC_ERR_RESOURCE_REGISTER_FAILED.
It does not appear to matter that the pitch I was passing was calculated by another API call, cuMemAllocPitch().
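To make that constraint visible in code, one could guard the registration like this (a sketch; the 2560 figure is only my experimentally determined value, not anything documented):

// Sketch: fail loudly if the allocated pitch exceeds the observed limit.
const size_t kObservedMaxPitch = 2560; // experimental, undocumented
if (cuMemPitch > kObservedMaxPitch) {
    fprintf(stderr, "pitch %zu exceeds the observed nvEncRegisterResource limit (%zu)\n",
            cuMemPitch, kObservedMaxPitch);
}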
(Another thing that was missing from my code was "locking" and unlocking the CUDA context to the current thread via cuCtxPushCurrent() and cuCtxPopCurrent(). Done in the sample via a RAII class.)
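For completeness, such a RAII guard could look like this (a sketch modeled on the push/pop calls named above, not the sample's exact class):

// Sketch: scope-bound CUDA context push/pop.
class ScopedCuCtx {
public:
    explicit ScopedCuCtx(CUcontext ctx) { cuCtxPushCurrent(ctx); }
    ~ScopedCuCtx() { CUcontext popped; cuCtxPopCurrent(&popped); }
    ScopedCuCtx(const ScopedCuCtx &) = delete;
    ScopedCuCtx &operator=(const ScopedCuCtx &) = delete;
};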
EDIT:
I have worked around the problem by doing something for which I had another reason: using NV12 as input format for the encoder instead of YUV444.
With NV12, the pitch parameter drops below the 2560 limit because the byte size per row is equal to the width, so in my case 1920 bytes.
This was necessary (at the time) because my graphics card was a GTX 760 with a "Kepler" GPU, which (as I was initially unaware) only supports NV12 as input format for NVEnc. I have since upgraded to a GTX 970, but as I just found out, the 2560 limit is still there.
This makes me wonder just how exactly one is expected to use NVEnc with YUV444. The only possibility that comes to my mind is to use non-pitched memory, which seems bizarre. I'd appreciate comments from people who've actually used NVEnc with YUV444.
EDIT #2 - PENDING FURTHER UPDATE:
New information has surfaced in the form of another SO question: NVencs Output Bitstream is not readable
It is quite possible that my answer so far was wrong. It seems now that the pitch should not only be set when registering the CUDA resource, but also when actually sending it to the encoder via nvEncEncodePicture(). I cannot check this right now, but I will next time I work on that project.
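If that turns out to be right, the relevant addition would be in the submission path. A hedged sketch (untested, per the caveat above) of passing the pitch through NV_ENC_PIC_PARAMS, reusing the names from the reproduction code:

// Sketch: supply the pitch again when submitting the mapped frame.
NV_ENC_PIC_PARAMS picParams = { 0 };
picParams.version = NV_ENC_PIC_PARAMS_VER;
picParams.inputBuffer = mapInputResParams.mappedResource;
picParams.bufferFmt = mapInputResParams.mappedBufferFmt;
picParams.inputPitch = (uint32_t)cuMemPitch; // the second place the pitch may be needed
picParams.inputWidth = encWidth;
picParams.inputHeight = encHeight;
picParams.pictureStruct = NV_ENC_PIC_STRUCT_FRAME;
// (outputBitstream and the remaining fields omitted in this sketch)
encStat = nvEncApi.nvEncEncodePicture(nvEncoder, &picParams);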