WinRT C++ (Win10) Accessing bytes from SoftwareBitmap / BitmapBuffer - c++

To process my previewFrames of my camera in OpenCV, I need access to the raw Pixel data / bytes. So, there is the new SoftwareBitmap, which should exactly provide this.
There is an example for c#, but in visual c++ I can't get the IMemoryBufferByteAccess (see remarks) Interface working.
Code with Exceptions:
// Capture the preview frame
return create_task(_mediaCapture->GetPreviewFrameAsync(videoFrame))
.then([this](VideoFrame^ currentFrame)
{
// Collect the resulting frame
auto previewFrame = currentFrame->SoftwareBitmap;
auto buffer = previewFrame->LockBuffer(Windows::Graphics::Imaging::BitmapBufferAccessMode::ReadWrite);
auto reference = buffer->CreateReference();
// Get a pointer to the pixel buffer
byte* pData = nullptr;
UINT capacity = 0;
// Obtain ByteAccess
ComPtr<IUnknown> inspectable = reinterpret_cast<IUnknown*>(buffer);
// Query the IBufferByteAccess interface.
Microsoft::WRL::ComPtr<IMemoryBufferByteAccess> bufferByteAccess;
ThrowIfFailed(inspectable.As(&bufferByteAccess)); // ERROR ---> Throws HRESULT = E_NOINTERFACE
// Retrieve the buffer data.
ThrowIfFailed(bufferByteAccess->GetBuffer(_Out_ &pData, _Out_ &capacity)); // ERROR ---> Throws HRESULT = E_NOINTERFACE, because bufferByteAccess is null
I tried this too:
HRESULT hr = ((IMemoryBufferByteAccess*)reference)->GetBuffer(&pData, &capacity);
HRESULT is ok, but I can't access pData -> Access Violation reading Memory.
Thanks for your help.

You should use reference instead of buffer in reinterpret_cast.
#include "pch.h"
#include <wrl\wrappers\corewrappers.h>
#include <wrl\client.h>
MIDL_INTERFACE("5b0d3235-4dba-4d44-865e-8f1d0e4fd04d")
IMemoryBufferByteAccess : IUnknown
{
virtual HRESULT STDMETHODCALLTYPE GetBuffer(
BYTE **value,
UINT32 *capacity
);
};
auto previewFrame = currentFrame->SoftwareBitmap;
auto buffer = previewFrame->LockBuffer(BitmapBufferAccessMode::ReadWrite);
auto reference = buffer->CreateReference();
ComPtr<IMemoryBufferByteAccess> bufferByteAccess;
HRESULT result = reinterpret_cast<IInspectable*>(reference)->QueryInterface(IID_PPV_ARGS(&bufferByteAccess));
if (result == S_OK)
{
WriteLine("Get interface successfully");
BYTE* data = nullptr;
UINT32 capacity = 0;
result = bufferByteAccess->GetBuffer(&data, &capacity);
if (result == S_OK)
{
WriteLine("get data access successfully, capacity: " + capacity);
}
}

Based on answer from #jeffrey-chen and example from #kennykerr, I've assembled a tiny bit cleaner solution:
#include <wrl/client.h>
// other includes, as required by your project
MIDL_INTERFACE("5b0d3235-4dba-4d44-865e-8f1d0e4fd04d")
IMemoryBufferByteAccess : ::IUnknown
{
virtual HRESULT __stdcall GetBuffer(BYTE **value, UINT32 *capacity) = 0;
};
// your code:
auto previewFrame = currentFrame->SoftwareBitmap;
auto buffer = previewFrame->LockBuffer(BitmapBufferAccessMode::ReadWrite);
auto bufferByteAccess= buffer->CreateReference().as<IMemoryBufferByteAccess>();
WriteLine("Get interface successfully"); // otherwise - exception is thrown
BYTE* data = nullptr;
UINT32 capacity = 0;
winrt::check_hresult(bufferByteAccess->GetBuffer(&data, &capacity));
WriteLine("get data access successfully, capacity: " + capacity);

I'm currently accessing the raw unsigned char* data from each frame I obtain on a MediaFrameReader::FrameArrived event without using WRL and COM...
Here it is how:
void MainPage::OnFrameArrived(MediaFrameReader ^reader, MediaFrameArrivedEventArgs ^args)
{
MediaFrameReference ^mfr = reader->TryAcquireLatestFrame();
VideoMediaFrame ^vmf = mfr->VideoMediaFrame;
VideoFrame ^vf = vmf->GetVideoFrame();
SoftwareBitmap ^sb = vf->SoftwareBitmap;
Buffer ^buff = ref new Buffer(sb->PixelHeight * sb->PixelWidth * 2);
sb->CopyToBuffer(buff);
DataReader ^dataReader = DataReader::FromBuffer(buffer);
Platform::Array<unsigned char, 1> ^arr = ref new Platform::Array<unsigned char, 1>(buffer->Length);
dataReader->ReadBytes(arr);
// here arr->Data is a pointer to the raw pixel data
}
NOTE: The MediaCapture object needs to be configured with MediaCaptureMemoryPreference::Cpu in order to have a valid SoftwareBitmap
Hope the above helps someone

Related

Intel OneAPI Video decoding memory leak when using C++ CLI

I am trying to use Intel OneAPI/OneVPL to decode a stream I receive from an RTSP Camera in C#. But when I run the code I get an enormous memory leak. Around 1-200MB per run, which is around once every second.
When I've collected a GoP from the camera where I know the first data is a keyframe I pass it as a byte array to my CLI and C++ code.
Here I expect it to decode all the frames and return decoded images. It receives 30 frames and returns 16 decoded images but has a memory leak.
I've tried to use Visual Studio memory profiler and all I can tell from it is that its unmanaged memory that's my problem. I've tried to override the "new" and "delete" method inside videoHandler.cpp to track and compare all allocations and deallocations and as far as I can tell everything is handled correctly in there. I cannot see any classes that get instantiated that do not get cleaned up. I think my issue is in the CLI class videoHandlerWrapper.cpp. Am I missing something obvious?
videoHandlerWrapper.cpp
array<imgFrameWrapper^>^ videoHandlerWrapper::decode(array<System::Byte>^ byteArray)
{
array<imgFrameWrapper^>^ returnFrames = gcnew array<imgFrameWrapper^>(30);
{
std::vector<imgFrame> frames(30); //Output from decoding process. imgFrame implements a deconstructor that will rid the data when exiting scope
std::vector<unsigned char> bytes(byteArray->Length); //Input for decoding process
Marshal::Copy(byteArray, 0, IntPtr((unsigned char*)(&((bytes)[0]))), byteArray->Length); //Copy from managed (C#) to unmanaged (C++)
int status = _pVideoHandler->decode(bytes, frames); //Decode
for (size_t i = 0; i < frames.size(); i++)
{
if (frames[i].size > 0)
returnFrames[i] = gcnew imgFrameWrapper(frames[i].size, frames[i].bytes);
}
}
//PrintMemoryUsage();
return returnFrames;
}
videoHandler.cpp
#define BITSTREAM_BUFFER_SIZE 2000000 //TODO Maybe higher or lower bitstream buffer. Thorough testing has been done at 2000000
int videoHandler::decode(std::vector<unsigned char> bytes, std::vector<imgFrame> &frameData)
{
int result = -1;
bool isStillGoing = true;
mfxBitstream bitstream = { 0 };
mfxSession session = NULL;
mfxStatus sts = MFX_ERR_NONE;
mfxSurfaceArray* outSurfaces = nullptr;
mfxU32 framenum = 0;
mfxU32 numVPPCh = 0;
mfxVideoChannelParam* mfxVPPChParams = nullptr;
void* accelHandle = NULL;
mfxVideoParam mfxDecParams = {};
mfxVersion version = { 0, 1 };
//variables used only in 2.x version
mfxConfig cfg = NULL;
mfxLoader loader = NULL;
mfxVariant inCodec = {};
std::vector<mfxU8> input_buffer;
// Initialize VPL session for any implementation of HEVC/H265 decode
loader = MFXLoad();
VERIFY(NULL != loader, "MFXLoad failed -- is implementation in path?");
cfg = MFXCreateConfig(loader);
VERIFY(NULL != cfg, "MFXCreateConfig failed")
inCodec.Type = MFX_VARIANT_TYPE_U32;
inCodec.Data.U32 = MFX_CODEC_AVC;
sts = MFXSetConfigFilterProperty(
cfg,
(mfxU8*)"mfxImplDescription.mfxDecoderDescription.decoder.CodecID",
inCodec);
VERIFY(MFX_ERR_NONE == sts, "MFXSetConfigFilterProperty failed for decoder CodecID");
sts = MFXCreateSession(loader, 0, &session);
VERIFY(MFX_ERR_NONE == sts, "Not able to create VPL session");
// Print info about implementation loaded
version = ShowImplInfo(session);
//VERIFY(version.Major > 1, "Sample requires 2.x API implementation, exiting");
if (version.Major == 1) {
mfxVariant ImplValueSW;
ImplValueSW.Type = MFX_VARIANT_TYPE_U32;
ImplValueSW.Data.U32 = MFX_IMPL_TYPE_SOFTWARE;
MFXSetConfigFilterProperty(cfg, (mfxU8*)"mfxImplDescription.Impl", ImplValueSW);
sts = MFXCreateSession(loader, 0, &session);
VERIFY(MFX_ERR_NONE == sts, "Not able to create VPL session");
}
// Convenience function to initialize available accelerator(s)
accelHandle = InitAcceleratorHandle(session);
bitstream.MaxLength = BITSTREAM_BUFFER_SIZE;
bitstream.Data = (mfxU8*)calloc(bytes.size(), sizeof(mfxU8));
VERIFY(bitstream.Data, "Not able to allocate input buffer");
bitstream.CodecId = MFX_CODEC_AVC;
std::copy(bytes.begin(), bytes.end(), bitstream.Data);
bitstream.DataLength = static_cast<mfxU32>(bytes.size());
memset(&mfxDecParams, 0, sizeof(mfxDecParams));
mfxDecParams.mfx.CodecId = MFX_CODEC_AVC;
mfxDecParams.IOPattern = MFX_IOPATTERN_OUT_SYSTEM_MEMORY;
sts = MFXVideoDECODE_DecodeHeader(session, &bitstream, &mfxDecParams);
VERIFY(MFX_ERR_NONE == sts, "Error decoding header\n");
numVPPCh = 1;
mfxVPPChParams = new mfxVideoChannelParam[numVPPCh];
for (mfxU32 i = 0; i < numVPPCh; i++) {
mfxVPPChParams[i] = {};
}
//mfxVPPChParams[0].VPP.FourCC = mfxDecParams.mfx.FrameInfo.FourCC;
mfxVPPChParams[0].VPP.FourCC = MFX_FOURCC_BGRA;
mfxVPPChParams[0].VPP.ChromaFormat = MFX_CHROMAFORMAT_YUV420;
mfxVPPChParams[0].VPP.PicStruct = MFX_PICSTRUCT_PROGRESSIVE;
mfxVPPChParams[0].VPP.FrameRateExtN = 30;
mfxVPPChParams[0].VPP.FrameRateExtD = 1;
mfxVPPChParams[0].VPP.CropW = 1920;
mfxVPPChParams[0].VPP.CropH = 1080;
//Set value directly if input and output is the same.
mfxVPPChParams[0].VPP.Width = 1920;
mfxVPPChParams[0].VPP.Height = 1080;
//// USED TO RESIZE. IF INPUT IS THE SAME AS OUTPUT THIS WILL MAKE IT SHIFT A BIT. 1920x1080 becomes 1920x1088.
//mfxVPPChParams[0].VPP.Width = ALIGN16(mfxVPPChParams[0].VPP.CropW);
//mfxVPPChParams[0].VPP.Height = ALIGN16(mfxVPPChParams[0].VPP.CropH);
mfxVPPChParams[0].VPP.ChannelId = 1;
mfxVPPChParams[0].Protected = 0;
mfxVPPChParams[0].IOPattern = MFX_IOPATTERN_IN_SYSTEM_MEMORY | MFX_IOPATTERN_OUT_SYSTEM_MEMORY;
mfxVPPChParams[0].ExtParam = NULL;
mfxVPPChParams[0].NumExtParam = 0;
sts = MFXVideoDECODE_VPP_Init(session, &mfxDecParams, &mfxVPPChParams, numVPPCh); //This causes a MINOR memory leak!
outSurfaces = new mfxSurfaceArray;
while (isStillGoing == true) {
sts = MFXVideoDECODE_VPP_DecodeFrameAsync(session,
&bitstream,
NULL,
0,
&outSurfaces); //Big memory leak. 100MB pr run in the while loop.
switch (sts) {
case MFX_ERR_NONE:
// decode output
if (framenum >= 30)
{
isStillGoing = false;
break;
}
sts = WriteRawFrameToByte(outSurfaces->Surfaces[1], &frameData[framenum]);
VERIFY(MFX_ERR_NONE == sts, "Could not write 1st vpp output");
framenum++;
break;
case MFX_ERR_MORE_DATA:
// The function requires more bitstream at input before decoding can proceed
isStillGoing = false;
break;
case MFX_ERR_MORE_SURFACE:
// The function requires more frame surface at output before decoding can proceed.
// This applies to external memory allocations and should not be expected for
// a simple internal allocation case like this
break;
case MFX_ERR_DEVICE_LOST:
// For non-CPU implementations,
// Cleanup if device is lost
break;
case MFX_WRN_DEVICE_BUSY:
// For non-CPU implementations,
// Wait a few milliseconds then try again
break;
case MFX_WRN_VIDEO_PARAM_CHANGED:
// The decoder detected a new sequence header in the bitstream.
// Video parameters may have changed.
// In external memory allocation case, might need to reallocate the output surface
break;
case MFX_ERR_INCOMPATIBLE_VIDEO_PARAM:
// The function detected that video parameters provided by the application
// are incompatible with initialization parameters.
// The application should close the component and then reinitialize it
break;
case MFX_ERR_REALLOC_SURFACE:
// Bigger surface_work required. May be returned only if
// mfxInfoMFX::EnableReallocRequest was set to ON during initialization.
// This applies to external memory allocations and should not be expected for
// a simple internal allocation case like this
break;
default:
printf("unknown status %d\n", sts);
isStillGoing = false;
break;
}
}
sts = MFXVideoDECODE_VPP_Close(session); // Helps massively! Halves the memory leak speed. Closes internal structures and tables.
VERIFY(MFX_ERR_NONE == sts, "Error closing VPP session\n");
result = 0;
end:
printf("Decode and VPP processed %d frames\n", framenum);
// Clean up resources - It is recommended to close components first, before
// releasing allocated surfaces, since some surfaces may still be locked by
// internal resources.
if (mfxVPPChParams)
delete[] mfxVPPChParams;
if (outSurfaces)
delete outSurfaces;
if (bitstream.Data)
free(bitstream.Data);
if (accelHandle)
FreeAcceleratorHandle(accelHandle);
if (loader)
MFXUnload(loader);
return result;
}
imgFrameWrapper.h
public ref class imgFrameWrapper
{
private:
size_t size;
array<System::Byte>^ bytes;
public:
imgFrameWrapper(size_t u_size, unsigned char* u_bytes);
~imgFrameWrapper();
!imgFrameWrapper();
size_t get_size();
array<System::Byte>^ get_bytes();
};
imgFrameWrapper.cpp
imgFrameWrapper::imgFrameWrapper(size_t u_size, unsigned char* u_bytes)
{
size = u_size;
bytes = gcnew array<System::Byte>(size);
Marshal::Copy((IntPtr)u_bytes, bytes, 0, size);
}
imgFrameWrapper::~imgFrameWrapper()
{
}
imgFrameWrapper::!imgFrameWrapper()
{
}
size_t imgFrameWrapper::get_size()
{
return size;
}
array<System::Byte>^ imgFrameWrapper::get_bytes()
{
return bytes;
}
imgFrame.h
struct imgFrame
{
int size;
unsigned char* bytes;
~imgFrame()
{
if (bytes)
delete[] bytes;
}
};
MFXVideoDECODE_VPP_DecodeFrameAsync() function creates internal memory surfaces for the processing.
You should release surfaces.
Please check this link it's mentioning about it.
https://spec.oneapi.com/onevpl/latest/API_ref/VPL_structs_decode_vpp.html#_CPPv415mfxSurfaceArray
mfxStatus (*Release)(struct mfxSurfaceArray *surface_array)ΒΆ
Decrements the internal reference counter of the surface. (*Release) should be
called after using the (*AddRef) function to add a surface or when allocation
logic requires it.
And please check this sample.
https://github.com/oneapi-src/oneVPL/blob/master/examples/hello-decvpp/src/hello-decvpp.cpp
Especially, WriteRawFrame_InternalMem() function in https://github.com/oneapi-src/oneVPL/blob/17968d8d2299352f5a9e09388d24e81064c81c87/examples/util/util/util.h
It shows how to release surfaces.

DirectX 11 Shader Reflection: Need help setting Constant Buffers variables by name

I'm creating a system that allows the manipulation of constant buffer variables by name using their byte offsets and byte sizes via Shader Reflection. I can bind the buffers to the Device Context just fine, but none of the cubes in my 3D scene are showing up. I believe this is because something is wrong with how I'm mapping data to the Constant Buffers. I cast the struct, be it a float4 or a matrix to a void pointer, and then copy that void pointer data to a another structure that has the variable's name. Once the shader needs to have its buffers updated before a draw call, I map the data of every structure in the list during the Map/Unmap call with a pointer iterator. Also, there seems to be a crash whenever the program calls the destructor on one of the shader constant structures. Below is the code i've written so far:
I'm mapping the buffer data through this algorithm here:
void DynamicConstantBuffer::UpdateChanges(ID3D11DeviceContext* pDeviceContext)
{
D3D11_MAPPED_SUBRESOURCE mappedResource;
HRESULT hr = pDeviceContext->Map(m_Buffer.Get(), 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResource);
if (FAILED(hr)) return;
// Set mapped data
for (const auto& constant : m_BufferConstants)
{
char* startPosition = static_cast<char*>(mappedResource.pData) + constant.desc.StartOffset;
memcpy(startPosition, constant.pData, sizeof(constant.pData));
}
// Copy memory and unmap
pDeviceContext->Unmap(m_Buffer.Get(), 0);
}
I'm initializing the Constant Buffer here:
BOOL DynamicConstantBuffer::Initialize(UINT nBufferSlot, ID3D11Device* pDevice, ID3D11ShaderReflection* pShaderReflection)
{
ID3D11ShaderReflectionConstantBuffer* pReflectionBuffer = NULL;
D3D11_SHADER_BUFFER_DESC cbShaderDesc = {};
// Fetch constant buffer description
if (!(pReflectionBuffer = pShaderReflection->GetConstantBufferByIndex(nBufferSlot)))
return FALSE;
// Get description
pReflectionBuffer->GetDesc(&cbShaderDesc);
m_BufferSize = cbShaderDesc.Size;
// Create constant buffer on gpu end
D3D11_BUFFER_DESC cbDescription = {};
cbDescription.Usage = D3D11_USAGE_DYNAMIC;
cbDescription.BindFlags = D3D11_BIND_CONSTANT_BUFFER;
cbDescription.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
cbDescription.ByteWidth = cbShaderDesc.Size;
if (FAILED(pDevice->CreateBuffer(&cbDescription, NULL, m_Buffer.GetAddressOf())))
return FALSE;
// Poll shader variables
for (UINT i = 0; i < cbShaderDesc.Variables; i++)
{
ID3D11ShaderReflectionVariable* pVariable = NULL;
pVariable = pReflectionBuffer->GetVariableByIndex(i);
// Get variable description
D3D11_SHADER_VARIABLE_DESC variableDesc = {};
pVariable->GetDesc(&variableDesc);
// Push variable back into list of variables
m_BufferConstants.push_back(ShaderConstant(variableDesc));
}
return TRUE;}
Here's the method that sets a variable within the constant buffer:
BOOL DynamicConstantBuffer::SetConstantVariable(const std::string& varName, const void* varData)
{
for (auto& v : m_BufferConstants)
{
if (v.desc.Name == varName)
{
memcpy(v.pData, varData, sizeof(varData));
bBufferDirty = TRUE;
return TRUE;
}
}
// No variable to assign :(
return FALSE;
}
Here's the class layout for ShaderConstant:
class ShaderConstant
{
public:
ShaderConstant(D3D11_SHADER_VARIABLE_DESC& desc)
{
this->desc = desc;
pData = new char[desc.Size];
_size = desc.Size;
};
~ShaderConstant()
{
if (!pData)
return;
delete[] pData;
pData = NULL;
}
D3D11_SHADER_VARIABLE_DESC desc;
void* pData;
size_t _size;
};
Any help at all would be appreciated. Thank you.

Pass byte array within struct to a com object

I wrote a C++ COM server (out-of-proc) and client so:
idl (interface is IDispatch):
typedef[uuid(0952A366-20CC-4342-B590-2D8920D61613)]
struct MyStruct{
LONG id;
BYTE* data;
} MyStruct;
[helpstring("")] HRESULT foo([out] MyStruct* pStreamInfo);
server:
STDMETHODIMP foo(MyStruct* myStruct)
{
myStruct.id = 7;
myStruct.data = pData; // pData is a pointer to some data of variable length
return S_OK;
}
client:
MyStruct ms;
hr = comObj->foo(&ms);
The code will work fine except when adding the myStruct.data = pData; line which crashes the server. Assigning memory in the client e.g. ms.data = new BYTE[1000] does not help as the pointer still arrives to foo as NULL.
What would be a solution, 1. preferably most simple one for client side since interface will be used by various users 2. Would there by a different solution if interface is used by C# client 3. If data needs to be out of the struct (I hope not) is there a reference to a complete example.
As others have mentioned in the comments, you cannot pass a raw array this way. Minimally, you have to copy the byte array into a SAFEARRAY of bytes (SAFEARRAY(BYTE) in IDL). I just tested the code below with a custom proxy/stub (compiled from the P/S code generated by midl.exe), and I was able to get my data across the wire.
If you want to use a standard P/S such as PSDispatch ({00020420-0000-0000-C000-000000000046}) or PSOAInterface ({00020424-0000-0000-C000-000000000046}), or if you want to use VBA as a client, then you may have to convert this to a SAFEARRAY(VARIANT) and/or put the resulting safearray into a VARIANT. Try the simplest approach of just using SAFEARRAY(BYTE) first, because that's the one with the least overhead. (A SAFEARRAY(VARIANT) uses 16x more memory than a SAFEARRAY(BYTE) because a VARIANT is 16 bytes long. And you would have to manually convert each byte to/from a VARIANT, as opposed to the simple memcpy calls show below.)
Packing the byte array into a SAFEARRAY: (Note that this copies the byte array into the SAFEARRAY. You could muck around with the internals of the SAFEARRAY struct to prevent the copy, but you'd be doing things in a non-standard way.)
/// <summary>Packs an array of bytes into a SAFEARRAY.</summary>
/// <param name="count">The number of bytes.</param>
/// <param name="pData">A reference to the byte array. Not read if count is 0.</param>
/// <param name="pResult">Receives the packed LPSAFEARRAY on success.</param>
HRESULT PackBytes(ULONG count, const BYTE* pData, /*[ref]*/ LPSAFEARRAY* pResult)
{
// initialize output parameters
*pResult = NULL;
// describe the boundaries of the safearray (1 dimension of the specified length, starting at standard index 1)
SAFEARRAYBOUND bound{ count, 1 };
// create the safearray
LPSAFEARRAY safearray = SafeArrayCreate(VT_UI1, 1, &bound);
if (!safearray)
return E_OUTOFMEMORY;
// when there is actually data...
if (count > 0)
{
// begin accessing the safearray data
BYTE* safearrayData;
HRESULT hr = SafeArrayAccessData(safearray, reinterpret_cast<LPVOID*>(&safearrayData));
if (FAILED(hr))
{
SafeArrayDestroy(safearray);
return hr;
}
// copy the data into the safearray
memcpy(safearrayData, pData, count);
// finish accessing the safearray data
hr = SafeArrayUnaccessData(safearray);
if (FAILED(hr))
{
SafeArrayDestroy(safearray);
return hr;
}
}
// set output parameters
*pResult = safearray;
// success
return S_OK;
}
Unpacking the byte array from the SAFEARRAY: (Note that this copies the byte array from the SAFEARRAY. You could muck around with the internals of the SAFEARRAY struct to prevent the copy, but you'd be doing things in a non-standard way. Also, you could choose to use the data directly from the SAFEARRAY by putting the consuming code between SafeArrayAccessData and SafeArrayUnaccessData.)
/// <summary>Unpacks an array of bytes from a SAFEARRAY.</summary>
/// <param name="safearray">The source SAFEARRAY.</param>
/// <param name="pCount">A pointer to a ULONG that will receive the number of bytes.</param>
/// <param name="ppData">A pointer to a BYTE* that will receive a reference to the new byte array.
/// This array must be deallocated (delete []) when the caller is done with it.</param>
HRESULT UnpackBytes(LPSAFEARRAY safearray, /*[out]*/ ULONG* pCount, /*[out]*/ BYTE** ppData)
{
// initialize output parameters
*pCount = 0;
*ppData = NULL;
// validate the safearray element type (must be VT_UI1)
VARTYPE vartype;
HRESULT hr = SafeArrayGetVartype(safearray, &vartype);
if (FAILED(hr))
return hr;
if (vartype != VT_UI1)
return E_INVALIDARG;
// validate the number of dimensions (must be 1)
UINT dim = SafeArrayGetDim(safearray);
if (dim != 1)
return E_INVALIDARG;
// get the lower bound of dimension 1
LONG lBound;
hr = SafeArrayGetLBound(safearray, 1, &lBound);
if (FAILED(hr))
return hr;
// get the upper bound of dimension 1
LONG uBound;
hr = SafeArrayGetUBound(safearray, 1, &uBound);
if (FAILED(hr))
return hr;
// if the upper bound is less than the lower bound, it's an empty array
if (uBound < lBound)
return S_OK;
// calculate the count of the bytes
ULONG count = uBound - lBound + 1;
// create buffer
BYTE* pData = new BYTE[count];
if (!pData)
return E_OUTOFMEMORY;
// begin accessing the safearray data
BYTE* safearrayData;
hr = SafeArrayAccessData(safearray, reinterpret_cast<LPVOID*>(&safearrayData));
if (FAILED(hr))
{
delete[] pData;
return hr;
}
// copy the data
memcpy(pData, safearrayData, count);
// finish accessing the safearray data
hr = SafeArrayUnaccessData(safearray);
if (FAILED(hr))
{
delete[] pData;
return hr;
}
// set output parameters
*pCount = count;
*ppData = pData;
// success
return S_OK;
}

nvEncRegisterResource() fails with -23

I've hit a complete brick wall in my attempt to use NVEnc to stream OpenGL frames as H264. I've been at this particular issue for close to 8 hours without any progress.
The problem is the call to nvEncRegisterResource(), which invariably fails with code -23 (enum value NV_ENC_ERR_RESOURCE_REGISTER_FAILED, documented as "failed to register the resource" - thanks NVidia).
I'm trying to follow a procedure outlined in this document from the University of Oslo (page 54, "OpenGL interop"), so I know for a fact that this is supposed to work, though unfortunately said document does not provide the code itself.
The idea is fairly straightforward:
map the texture produced by the OpenGL frame buffer object into CUDA;
copy the texture into a (previously allocated) CUDA buffer;
map that buffer as an NVEnc input resource
use that input resource as the source for the encoding
As I said, the problem is step (3). Here are the relevant code snippets (I'm omitting error handling for brevity.)
// Round up width and height
priv->encWidth = (_resolution.w + 31) & ~31, priv->encHeight = (_resolution.h + 31) & ~31;
// Allocate CUDA "pitched" memory to match the input texture (YUV, one byte per component)
cuErr = cudaMallocPitch(&priv->cudaMemPtr, &priv->cudaMemPitch, 3 * priv->encWidth, priv->encHeight);
This should allocate on-device CUDA memory (the "pitched" variety, though I've tried non-pitched too, without any change in the outcome.)
// Register the CUDA buffer as an input resource
NV_ENC_REGISTER_RESOURCE regResParams = { 0 };
regResParams.version = NV_ENC_REGISTER_RESOURCE_VER;
regResParams.resourceType = NV_ENC_INPUT_RESOURCE_TYPE_CUDADEVICEPTR;
regResParams.width = priv->encWidth;
regResParams.height = priv->encHeight;
regResParams.bufferFormat = NV_ENC_BUFFER_FORMAT_YUV444_PL;
regResParams.resourceToRegister = priv->cudaMemPtr;
regResParams.pitch = priv->cudaMemPitch;
encStat = nvEncApi.nvEncRegisterResource(priv->nvEncoder, &regResParams);
// ^^^ FAILS
priv->nvEncInpRes = regResParams.registeredResource;
This is the brick wall. No matter what I try, nvEncRegisterResource() fails.
I should note that I rather think (though I may be wrong) that I've done all the required initializations. Here is the code that creates and activates the CUDA context:
// Pop the current context
cuRes = cuCtxPopCurrent(&priv->cuOldCtx);
// Create a context for the device
priv->cuCtx = nullptr;
cuRes = cuCtxCreate(&priv->cuCtx, CU_CTX_SCHED_BLOCKING_SYNC, priv->cudaDevice);
// Push our context
cuRes = cuCtxPushCurrent(priv->cuCtx);
.. followed by the creation of the encoding session:
// Create an NV Encoder session
NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS nvEncSessParams = { 0 };
nvEncSessParams.apiVersion = NVENCAPI_VERSION;
nvEncSessParams.version = NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS_VER;
nvEncSessParams.deviceType = NV_ENC_DEVICE_TYPE_CUDA;
nvEncSessParams.device = priv->cuCtx; // nullptr
auto encStat = nvEncApi.nvEncOpenEncodeSessionEx(&nvEncSessParams, &priv->nvEncoder);
And finally, the code initializing the encoder:
// Configure the encoder via preset
NV_ENC_PRESET_CONFIG presetConfig = { 0 };
GUID codecGUID = NV_ENC_CODEC_H264_GUID;
GUID presetGUID = NV_ENC_PRESET_LOW_LATENCY_DEFAULT_GUID;
presetConfig.version = NV_ENC_PRESET_CONFIG_VER;
presetConfig.presetCfg.version = NV_ENC_CONFIG_VER;
encStat = nvEncApi.nvEncGetEncodePresetConfig(priv->nvEncoder, codecGUID, presetGUID, &presetConfig);
NV_ENC_INITIALIZE_PARAMS initParams = { 0 };
initParams.version = NV_ENC_INITIALIZE_PARAMS_VER;
initParams.encodeGUID = codecGUID;
initParams.encodeWidth = priv->encWidth;
initParams.encodeHeight = priv->encHeight;
initParams.darWidth = 1;
initParams.darHeight = 1;
initParams.frameRateNum = 25; // TODO: make this configurable
initParams.frameRateDen = 1; // ditto
// .max_surface_count = (num_mbs >= 8160) ? 32 : 48;
// .buffer_delay ? necessary
initParams.enableEncodeAsync = 0;
initParams.enablePTD = 1;
initParams.presetGUID = presetGUID;
memcpy(&priv->nvEncConfig, &presetConfig.presetCfg, sizeof(priv->nvEncConfig));
initParams.encodeConfig = &priv->nvEncConfig;
encStat = nvEncApi.nvEncInitializeEncoder(priv->nvEncoder, &initParams);
All the above initializations report success.
I'd be extremely grateful to anyone who can get me past this hurdle.
EDIT: here is the complete code to reproduce the problem. The only observable difference to the original code is that cuPopContext() returns an error (which can be ignored) here - probably my original program creates such a context as a side effect of using OpenGL. Otherwise, the code behaves exactly as the original does.
I've built the code with Visual Studio 2013. You must link the following library file (adapt path if not on C:): C:\Program Files (x86)\NVIDIA GPU Computing Toolkit\CUDA\v7.5\lib\Win32\cuda.lib
You must also make sure that C:\Program Files (x86)\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\ (or similar) is in the include path.
NEW EDIT: modified the code to only use the CUDA driver interface, instead of mixing with the runtime API. Still the same error code.
#ifdef _WIN32
#include <Windows.h>
#endif
#include <cassert>
#include <GL/gl.h>
#include <iostream>
#include <string>
#include <stdexcept>
#include <string>
#include <cuda.h>
//#include <cuda_runtime.h>
#include <cuda_gl_interop.h>
#include <nvEncodeAPI.h>
// NV Encoder API ---------------------------------------------------
#if defined(_WIN32)
#define LOAD_FUNC(l, s) GetProcAddress(l, s)
#define DL_CLOSE_FUNC(l) FreeLibrary(l)
#else
#define LOAD_FUNC(l, s) dlsym(l, s)
#define DL_CLOSE_FUNC(l) dlclose(l)
#endif
typedef NVENCSTATUS(NVENCAPI* PNVENCODEAPICREATEINSTANCE)(NV_ENCODE_API_FUNCTION_LIST *functionList);
struct NVEncAPI : public NV_ENCODE_API_FUNCTION_LIST {
public:
// ~NVEncAPI() { cleanup(); }
void init() {
#if defined(_WIN32)
if (sizeof(void*) == 8) {
nvEncLib = LoadLibrary(TEXT("nvEncodeAPI64.dll"));
}
else {
nvEncLib = LoadLibrary(TEXT("nvEncodeAPI.dll"));
}
if (nvEncLib == NULL) throw std::runtime_error("Failed to load NVidia Encoder library: " + std::to_string(GetLastError()));
#else
nvEncLib = dlopen("libnvidia-encode.so.1", RTLD_LAZY);
if (nvEncLib == nullptr)
throw std::runtime_error("Failed to load NVidia Encoder library: " + std::string(dlerror()));
#endif
auto nvEncodeAPICreateInstance = (PNVENCODEAPICREATEINSTANCE) LOAD_FUNC(nvEncLib, "NvEncodeAPICreateInstance");
version = NV_ENCODE_API_FUNCTION_LIST_VER;
NVENCSTATUS encStat = nvEncodeAPICreateInstance(static_cast<NV_ENCODE_API_FUNCTION_LIST *>(this));
}
void cleanup() {
#if defined(_WIN32)
if (nvEncLib != NULL) {
FreeLibrary(nvEncLib);
nvEncLib = NULL;
}
#else
if (nvEncLib != nullptr) {
dlclose(nvEncLib);
nvEncLib = nullptr;
}
#endif
}
private:
#if defined(_WIN32)
HMODULE nvEncLib;
#else
void* nvEncLib;
#endif
bool init_done;
};
static NVEncAPI nvEncApi;
// Encoder class ----------------------------------------------------
class Encoder {
public:
typedef unsigned int uint_t;
struct Size { uint_t w, h; };
Encoder() {
CUresult cuRes = cuInit(0);
nvEncApi.init();
}
void init(const Size & resolution, uint_t texture) {
NVENCSTATUS encStat;
CUresult cuRes;
texSize = resolution;
yuvTex = texture;
// Purely for information
int devCount = 0;
cuRes = cuDeviceGetCount(&devCount);
// Initialize NVEnc
initEncodeSession(); // start an encoding session
initEncoder();
// Register the YUV texture as a CUDA graphics resource
// CODE COMMENTED OUT AS THE INPUT TEXTURE IS NOT NEEDED YET (TO MY UNDERSTANDING) AT SETUP TIME
//cudaGraphicsGLRegisterImage(&priv->cudaInpTexRes, priv->yuvTex, GL_TEXTURE_2D, cudaGraphicsRegisterFlagsReadOnly);
// Allocate CUDA "pitched" memory to match the input texture (YUV, one byte per component)
encWidth = (texSize.w + 31) & ~31, encHeight = (texSize.h + 31) & ~31;
cuRes = cuMemAllocPitch(&cuDevPtr, &cuMemPitch, 4 * encWidth, encHeight, 16);
// Register the CUDA buffer as an input resource
NV_ENC_REGISTER_RESOURCE regResParams = { 0 };
regResParams.version = NV_ENC_REGISTER_RESOURCE_VER;
regResParams.resourceType = NV_ENC_INPUT_RESOURCE_TYPE_CUDADEVICEPTR;
regResParams.width = encWidth;
regResParams.height = encHeight;
regResParams.bufferFormat = NV_ENC_BUFFER_FORMAT_YUV444_PL;
regResParams.resourceToRegister = (void*) cuDevPtr;
regResParams.pitch = cuMemPitch;
encStat = nvEncApi.nvEncRegisterResource(nvEncoder, &regResParams);
assert(encStat == NV_ENC_SUCCESS); // THIS IS THE POINT OF FAILURE
nvEncInpRes = regResParams.registeredResource;
}
void cleanup() { /* OMITTED */ }
void encode() {
// THE FOLLOWING CODE WAS NEVER REACHED YET BECAUSE OF THE ISSUE.
// INCLUDED HERE FOR REFERENCE.
CUresult cuRes;
NVENCSTATUS encStat;
cuRes = cuGraphicsResourceSetMapFlags(cuInpTexRes, CU_GRAPHICS_MAP_RESOURCE_FLAGS_READ_ONLY);
cuRes = cuGraphicsMapResources(1, &cuInpTexRes, 0);
CUarray mappedArray;
cuRes = cuGraphicsSubResourceGetMappedArray(&mappedArray, cuInpTexRes, 0, 0);
cuRes = cuMemcpyDtoA(mappedArray, 0, cuDevPtr, 4 * encWidth * encHeight);
NV_ENC_MAP_INPUT_RESOURCE mapInputResParams = { 0 };
mapInputResParams.version = NV_ENC_MAP_INPUT_RESOURCE_VER;
mapInputResParams.registeredResource = nvEncInpRes;
encStat = nvEncApi.nvEncMapInputResource(nvEncoder, &mapInputResParams);
// TODO: encode...
cuRes = cuGraphicsUnmapResources(1, &cuInpTexRes, 0);
}
private:
struct PrivateData;
void initEncodeSession() {
CUresult cuRes;
NVENCSTATUS encStat;
// Pop the current context
cuRes = cuCtxPopCurrent(&cuOldCtx); // THIS IS ALLOWED TO FAIL (it doesn't
// Create a context for the device
cuCtx = nullptr;
cuRes = cuCtxCreate(&cuCtx, CU_CTX_SCHED_BLOCKING_SYNC, 0);
// Push our context
cuRes = cuCtxPushCurrent(cuCtx);
// Create an NV Encoder session
NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS nvEncSessParams = { 0 };
nvEncSessParams.apiVersion = NVENCAPI_VERSION;
nvEncSessParams.version = NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS_VER;
nvEncSessParams.deviceType = NV_ENC_DEVICE_TYPE_CUDA;
nvEncSessParams.device = cuCtx;
encStat = nvEncApi.nvEncOpenEncodeSessionEx(&nvEncSessParams, &nvEncoder);
}
void Encoder::initEncoder()
{
NVENCSTATUS encStat;
// Configure the encoder via preset
NV_ENC_PRESET_CONFIG presetConfig = { 0 };
GUID codecGUID = NV_ENC_CODEC_H264_GUID;
GUID presetGUID = NV_ENC_PRESET_LOW_LATENCY_DEFAULT_GUID;
presetConfig.version = NV_ENC_PRESET_CONFIG_VER;
presetConfig.presetCfg.version = NV_ENC_CONFIG_VER;
encStat = nvEncApi.nvEncGetEncodePresetConfig(nvEncoder, codecGUID, presetGUID, &presetConfig);
NV_ENC_INITIALIZE_PARAMS initParams = { 0 };
initParams.version = NV_ENC_INITIALIZE_PARAMS_VER;
initParams.encodeGUID = codecGUID;
initParams.encodeWidth = texSize.w;
initParams.encodeHeight = texSize.h;
initParams.darWidth = texSize.w;
initParams.darHeight = texSize.h;
initParams.frameRateNum = 25;
initParams.frameRateDen = 1;
initParams.enableEncodeAsync = 0;
initParams.enablePTD = 1;
initParams.presetGUID = presetGUID;
memcpy(&nvEncConfig, &presetConfig.presetCfg, sizeof(nvEncConfig));
initParams.encodeConfig = &nvEncConfig;
encStat = nvEncApi.nvEncInitializeEncoder(nvEncoder, &initParams);
}
//void cleanupEncodeSession();
//void cleanupEncoder;
Size texSize;
GLuint yuvTex;
uint_t encWidth, encHeight;
CUdeviceptr cuDevPtr;
size_t cuMemPitch;
NV_ENC_CONFIG nvEncConfig;
NV_ENC_INPUT_PTR nvEncInpBuf;
NV_ENC_REGISTERED_PTR nvEncInpRes;
CUdevice cuDevice;
CUcontext cuCtx, cuOldCtx;
void *nvEncoder;
CUgraphicsResource cuInpTexRes;
};
int main(int argc, char *argv[])
{
Encoder encoder;
encoder.init({1920, 1080}, 0); // OMITTED THE TEXTURE AS IT IS NOT NEEDED TO REPRODUCE THE ISSUE
return 0;
}
After comparing the NVidia sample NvEncoderCudaInterop with my minimal code, I finally found the item that makes the difference between success and failure: its the pitch parameter of the NV_ENC_REGISTER_RESOURCE structure passed to nvEncRegisterResource().
I haven't seen it documented anywhere, but there's a hard limit on that value, which I've determined experimentally to be at 2560. Anything above that will result in NV_ENC_ERR_RESOURCE_REGISTER_FAILED.
It does not appear to matter that the pitch I was passing was calculated by another API call, cuMemAllocPitch().
(Another thing that was missing from my code was "locking" and unlocking the CUDA context to the current thread via cuCtxPushCurrent() and cuCtxPopCurrent(). Done in the sample via a RAII class.)
EDIT:
I have worked around the problem by doing something for which I had another reason: using NV12 as input format for the encoder instead of YUV444.
With NV12, the pitch parameter drops below the 2560 limit because the byte size per row is equal to the width, so in my case 1920 bytes.
This was necessary (at the time) because my graphics card was a GTX 760 with a "Kepler" GPU, which (as I was initially unaware) only supports NV12 as input format for NVEnc. I have since upgraded to a GTX 970, but as I just found out, the 2560 limit is still there.
This makes me wonder just how exactly one is expected to use NVEnc with YUV444. The only possibility that comes to my mind is to use non-pitched memory, which seems bizarre. I'd appreciate comments from people who've actually used NVEnc with YUV444.
EDIT #2 - PENDING FURTHER UPDATE:
New information has surfaced in the form of another SO question: NVencs Output Bitstream is not readable
It is quite possible that my answer so far was wrong. It seems now that the pitch should not only be set when registering the CUDA resource, but also when actually sending it to the encoder via nvEncEncodePicture(). I cannot check this right now, but I will next time I work on that project.

Why I got crash when allocating buffer?

I am developing a driver with Filter. So when I write the SendNetBufferListsComplete function in filter.cpp I got a crash (bluescreen). WinDbug pointed to some buffer allocation. The code is here:
Edited:
sendNetBufferListsComplete(
IN PNET_BUFFER_LIST NetBufferLists,
IN ULONG SendCompleteFlags) {
PNET_BUFFER_LIST pNetBufferList = NetBufferLists;
PNET_BUFFER_LIST pNextNetBufferList = NULL;
while (pNetBufferList)
{
pNextNetBufferList = NET_BUFFER_LIST_NEXT_NBL(pNetBufferList);
NET_BUFFER_LIST_NEXT_NBL(pNetBufferList) = NULL;
PNET_BUFFER_LIST pParentNetBufferList = pNetBufferList->ParentNetBufferList;
if (pParentNetBufferList != NULL)
{
NDIS_STATUS status = NET_BUFFER_LIST_STATUS(pNetBufferList);
NdisFreeNetBufferList(pNetBufferList);
if (NdisInterlockedDecrement(&pParentNetBufferList->ChildRefCount) == 0) {
NET_BUFFER_LIST_STATUS(pParentNetBufferList) = status;
NdisFSendNetBufferListsComplete(m_hFilter, pParentNetBufferList, SendCompleteFlags);
}
}
else
{
if(pNetBufferList != NULL)
{
**---windbug pointed here---****
PVOID pBuffer = *(PVOID*) NET_BUFFER_LIST_CONTEXT_DATA_START(pNetBufferList);
PMDL pMdl = NET_BUFFER_FIRST_MDL(NET_BUFFER_LIST_FIRST_NB(pNetBufferList));
if(pMdl)
NdisFreeMdl(pMdl);
if(pBuffer)
delete[] (UCHAR*) pBuffer;
NdisFreeNetBufferList(pNetBufferList);
}
}
NdisInterlockedDecrement(&m_nSendNetBufferListCount);
pNetBufferList = pNextNetBufferList;
}
What is the actual problem? Is it overflow? Or NULL check problems?
In ndish.h
#define NET_BUFFER_LIST_CONTEXT_DATA_START(_NBL) ((PUCHAR)(((_NBL)->Context)+1)+(_NBL)->Context->Offset)
like this . and in Wdm.h
//
// I/O system definitions.
//
// Define a Memory Descriptor List (MDL)
//
// An MDL describes pages in a virtual buffer in terms of physical pages. The
// pages associated with the buffer are described in an array that is allocated
// just after the MDL header structure itself.
//
typedef
_Struct_size_bytes_(_Inexpressible_(sizeof(struct _MDL) + // 747934
(ByteOffset + ByteCount + PAGE_SIZE-1) / PAGE_SIZE * sizeof(PFN_NUMBER)))
struct _MDL {
struct _MDL *Next;
CSHORT Size;
CSHORT MdlFlags;
struct _EPROCESS *Process;
PVOID MappedSystemVa; /* see creators for field size annotations. */
PVOID StartVa; /* see creators for validity; could be address 0. */
ULONG ByteCount;
ULONG ByteOffset;
} MDL, *PMDL;