Libjpeg write image to memory data - c++

I would like to save an image into memory (a std::vector) using the libjpeg library.
I found these functions:
init_destination
empty_output_buffer
term_destination
My question is: how do I do this safely and correctly in a parallel program? My function may be executed from different threads.
I want to do it in C++ with Visual Studio 2010.
Other libraries with callback functionality usually take an extra parameter for user data, but here I don't see any way to pass additional data, e.g. a pointer to my local vector instance.
Edit:
A nice solution to my question is described here: http://www.christian-etter.de/?cat=48

A nice solution is described at http://www.christian-etter.de/?cat=48. First, wrap the standard destination manager together with the output vector:
typedef struct _jpeg_destination_mem_mgr
{
jpeg_destination_mgr mgr;
std::vector<unsigned char> data;
} jpeg_destination_mem_mgr;
Initialization:
static void mem_init_destination( j_compress_ptr cinfo )
{
jpeg_destination_mem_mgr* dst = (jpeg_destination_mem_mgr*)cinfo->dest;
dst->data.resize( JPEG_MEM_DST_MGR_BUFFER_SIZE );
cinfo->dest->next_output_byte = dst->data.data();
cinfo->dest->free_in_buffer = dst->data.size();
}
When compression has finished, we shrink the buffer to the actual output size:
static void mem_term_destination( j_compress_ptr cinfo )
{
jpeg_destination_mem_mgr* dst = (jpeg_destination_mem_mgr*)cinfo->dest;
dst->data.resize( dst->data.size() - cinfo->dest->free_in_buffer );
}
When the buffer is full, we need to grow it:
static boolean mem_empty_output_buffer( j_compress_ptr cinfo )
{
jpeg_destination_mem_mgr* dst = (jpeg_destination_mem_mgr*)cinfo->dest;
size_t oldsize = dst->data.size();
dst->data.resize( oldsize + JPEG_MEM_DST_MGR_BUFFER_SIZE );
cinfo->dest->next_output_byte = dst->data.data() + oldsize;
cinfo->dest->free_in_buffer = JPEG_MEM_DST_MGR_BUFFER_SIZE;
return TRUE; /* libjpeg's boolean, not C++ bool */
}
Callbacks configuration:
static void jpeg_mem_dest( j_compress_ptr cinfo, jpeg_destination_mem_mgr * dst )
{
cinfo->dest = (jpeg_destination_mgr*)dst;
cinfo->dest->init_destination = mem_init_destination;
cinfo->dest->term_destination = mem_term_destination;
cinfo->dest->empty_output_buffer = mem_empty_output_buffer;
}
And sample usage:
jpeg_destination_mem_mgr dst_mem;
jpeg_compress_struct_wrapper cinfo;
j_compress_ptr pcinfo = cinfo;
jpeg_mem_dest( cinfo, &dst_mem);
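For completeness, here is a rough sketch (my own, untested) of how a full compression run might look with this destination manager; width, height and the rgb pixel buffer are assumed to be supplied by the caller, and error handling is omitted:

    jpeg_compress_struct cinfo;
    jpeg_error_mgr jerr;
    cinfo.err = jpeg_std_error(&jerr);
    jpeg_create_compress(&cinfo);

    jpeg_destination_mem_mgr dst_mem;
    jpeg_mem_dest(&cinfo, &dst_mem);            // install the callbacks shown above

    cinfo.image_width = width;
    cinfo.image_height = height;
    cinfo.input_components = 3;                 // packed RGB input
    cinfo.in_color_space = JCS_RGB;
    jpeg_set_defaults(&cinfo);
    jpeg_set_quality(&cinfo, 90, TRUE);

    jpeg_start_compress(&cinfo, TRUE);
    while (cinfo.next_scanline < cinfo.image_height) {
        JSAMPROW row = (JSAMPROW)&rgb[cinfo.next_scanline * width * 3];
        jpeg_write_scanlines(&cinfo, &row, 1);
    }
    jpeg_finish_compress(&cinfo);
    jpeg_destroy_compress(&cinfo);

    // dst_mem.data now holds the encoded JPEG.

Because all state lives in the per-call jpeg_destination_mem_mgr and jpeg_compress_struct, nothing is shared between calls, so the same function can run from several threads at once.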


How to compress/decompress buffer using Fast-LZMA2

I want to compress/decompress an unsigned char buffer using Fast-LZMA2 by 7Zip: https://github.com/conor42/fast-lzma2
The sample provides two functions:
static int compress_file(FL2_CStream *fcs)
{
unsigned char in_buffer[8 * 1024];
unsigned char out_buffer[4 * 1024];
FL2_inBuffer in_buf = { in_buffer, sizeof(in_buffer), sizeof(in_buffer) };
FL2_outBuffer out_buf = { out_buffer, sizeof(out_buffer), 0 };
size_t res = 0;
size_t in_size = 0;
size_t out_size = 0;
do {
if (in_buf.pos == in_buf.size) {
in_buf.size = fread(in_buffer, 1, sizeof(in_buffer), fin);
in_size += in_buf.size;
in_buf.pos = 0;
}
res = FL2_compressStream(fcs, &out_buf, &in_buf);
if (FL2_isError(res))
goto error_out;
fwrite(out_buf.dst, 1, out_buf.pos, fout);
out_size += out_buf.pos;
out_buf.pos = 0;
} while (in_buf.size == sizeof(in_buffer));
do {
res = FL2_endStream(fcs, &out_buf);
if (FL2_isError(res))
goto error_out;
fwrite(out_buf.dst, 1, out_buf.pos, fout);
out_size += out_buf.pos;
out_buf.pos = 0;
} while (res);
fprintf(stdout, "\t%ld -> %ld\n", in_size, out_size);
return 0;
error_out:
fprintf(stderr, "Error: %s\n", FL2_getErrorName(res));
return 1;
}
static int decompress_file(FL2_DStream *fds)
{
unsigned char in_buffer[4 * 1024];
unsigned char out_buffer[8 * 1024];
FL2_inBuffer in_buf = { in_buffer, sizeof(in_buffer), sizeof(in_buffer) };
FL2_outBuffer out_buf = { out_buffer, sizeof(out_buffer), 0 };
size_t res;
size_t in_size = 0;
size_t out_size = 0;
do {
if (in_buf.pos == in_buf.size) {
in_buf.size = fread(in_buffer, 1, sizeof(in_buffer), fout);
in_size += in_buf.size;
in_buf.pos = 0;
}
res = FL2_decompressStream(fds, &out_buf, &in_buf);
if (FL2_isError(res))
goto error_out;
/* Discard the output. XXhash will verify the integrity. */
out_size += out_buf.pos;
out_buf.pos = 0;
} while (res && in_buf.size);
fprintf(stdout, "\t%ld -> %ld\n", in_size, out_size);
return 0;
error_out:
fprintf(stderr, "Error: %s\n", FL2_getErrorName(res));
return 1;
}
But I have no idea how to make it work on an in-memory buffer, without a fixed chunk size like 8*1024, the way zlib's deflate does.
I want something like
LZMA2_Compress(void* buffer,size_t bufferSize);
and LZMA2_Decompress(void* buffer,size_t bufferSize);
I want to use this algorithm on some large files, and Fast-LZMA2 is the fastest high-ratio compression I have found, so please don't suggest other methods.
Here's my test code; it works, but I just need the correct usage:
https://gist.github.com/Bit00009/3241bb66301f8aaba16074537d094e61
Check the header file for all of the available functions. The one below looks like what you need; you will need to cast your buffers to (void *).
High-level functions
fast-lzma2.h
...
/*! FL2_compress() :
* Compresses `src` content as a single LZMA2 compressed stream into already allocated `dst`.
* Call FL2_compressMt() to use > 1 thread. Specify nbThreads = 0 to use all cores.
* @return : compressed size written into `dst` (<= `dstCapacity`),
* or an error code if it fails (which can be tested using FL2_isError()). */
FL2LIB_API size_t FL2LIB_CALL FL2_compress(void* dst, size_t dstCapacity,
const void* src, size_t srcSize,
int compressionLevel);
...
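As a rough sketch of the buffer-to-buffer wrappers you asked for (my own untested code; it assumes the zstd-style helpers FL2_compressBound() and FL2_findDecompressedSize() are available in fast-lzma2.h and that FL2_decompress() is the one-shot counterpart of FL2_compress()):

    #include <vector>
    #include <stdexcept>
    #include "fast-lzma2.h"

    // Hypothetical wrapper names; LZMA2_Compress/LZMA2_Decompress are mine, not the library's.
    std::vector<unsigned char> LZMA2_Compress(const void* buffer, size_t bufferSize)
    {
        std::vector<unsigned char> out(FL2_compressBound(bufferSize));   // worst-case output size
        size_t n = FL2_compress(out.data(), out.size(), buffer, bufferSize, 6 /* level */);
        if (FL2_isError(n))
            throw std::runtime_error(FL2_getErrorName(n));
        out.resize(n);
        return out;
    }

    std::vector<unsigned char> LZMA2_Decompress(const void* buffer, size_t bufferSize)
    {
        // Read the uncompressed size stored in the stream header.
        size_t decSize = FL2_findDecompressedSize(buffer, bufferSize);
        std::vector<unsigned char> out(decSize);
        size_t n = FL2_decompress(out.data(), out.size(), buffer, bufferSize);
        if (FL2_isError(n))
            throw std::runtime_error(FL2_getErrorName(n));
        out.resize(n);
        return out;
    }

No 8*1024 chunking is involved; the whole input buffer goes in and the whole output comes back, much like compress()/uncompress() in zlib.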
Management of memory and options
To do explicit memory management (set dictionary size, buffer size, etc.) you need to create a context:
fast-lzma2.h
/*= Compression context
* When compressing many times, it is recommended to allocate a context just once,
* and re-use it for each successive compression operation. This will make workload
* friendlier for system's memory. The context may not use the number of threads requested
* if the library is compiled for single-threaded compression or nbThreads > FL2_MAXTHREADS.
* Call FL2_getCCtxThreadCount to obtain the actual number allocated. */
typedef struct FL2_CCtx_s FL2_CCtx;
FL2LIB_API FL2_CCtx* FL2LIB_CALL FL2_createCCtx(void);
Then you can use FL2_CCtx_setParameter() to set parameters on the context. The possible values for the parameters are listed in FL2_cParameter, and FL2_p_dictionarySize lets you set the dictionary size.
/*! FL2_CCtx_setParameter() :
* Set one compression parameter, selected by enum FL2_cParameter.
* @result : informational value (typically, the one being set, possibly corrected),
* or an error code (which can be tested with FL2_isError()). */
FL2LIB_API size_t FL2LIB_CALL FL2_CCtx_setParameter(FL2_CCtx* cctx, FL2_cParameter param, size_t value);
Finally, you can compress the buffer by calling FL2_compressCCtx():
/*! FL2_compressCCtx() :
* Same as FL2_compress(), but requires an allocated FL2_CCtx (see FL2_createCCtx()). */
FL2LIB_API size_t FL2LIB_CALL FL2_compressCCtx(FL2_CCtx* cctx,
void* dst, size_t dstCapacity,
const void* src, size_t srcSize,
int compressionLevel);
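A short sketch of the context-based route (again untested; FL2_freeCCtx() is assumed to be the matching cleanup call, and dst is assumed to be pre-sized, e.g. via FL2_compressBound()):

    FL2_CCtx* cctx = FL2_createCCtx();
    FL2_CCtx_setParameter(cctx, FL2_p_dictionarySize, 1 << 26);   // 64 MiB dictionary
    size_t n = FL2_compressCCtx(cctx,
                                dst.data(), dst.size(),
                                src.data(), src.size(),
                                9 /* compression level */);
    if (FL2_isError(n))
        fprintf(stderr, "Error: %s\n", FL2_getErrorName(n));
    FL2_freeCCtx(cctx);

Reusing one context across many calls avoids re-allocating the match-finder tables each time, which is what the header comment above recommends.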

Byte offset greater than Byte Length in BufferView

I'm trying to read data from scene.bin files using the Microsoft::glTF SDK (TinyGLTF is not an option). When I try to read the MeshPrimitive attribute TEXCOORD_0, I get a situation where the BufferView byteOffset is greater than its byteLength, so I don't know how to read the data properly and my program crashes.
I tried reading the data using IStreamReader, which is part of the SDK and required for reading bin files with it. I calculate the data offset as accessor.byteOffset + bufferView.byteOffset, which is greater than byteLength.
struct BuffersAccessors {
Microsoft::glTF::Accessor accessor;
Microsoft::glTF::BufferView view;
Microsoft::glTF::Buffer buffer;
void operator=(BuffersAccessors accessors);
};
template<typename T> struct BufferInfo {
BuffersAccessors buffersAccessors;
std::vector<T> bufferData;
BufferInfo<T>();
BufferInfo<T>(BuffersAccessors buffersAccessors, std::vector<T> bufferData);
const void operator=(const BufferInfo<T> &info) {
buffersAccessors = info.buffersAccessors;
bufferData = info.bufferData;
};
};
template<typename T>
std::vector<T> readBufferData(Microsoft::glTF::Document document, BufferInfo<T> bufferInfo, std::filesystem::path path) {
std::vector<T> stream;
if (bufferInfo.buffersAccessors.buffer.uri.length() > 0 || bufferInfo.buffersAccessors.buffer.byteLength > 0) {
Microsoft::glTF::Buffer buffer = bufferInfo.buffersAccessors.buffer;
path += bufferInfo.buffersAccessors.buffer.uri;
path = std::filesystem::absolute(path);
buffer.uri = path.string();
std::shared_ptr<StreamReader> streamReader = std::make_shared<StreamReader>(path);
Microsoft::glTF::GLTFResourceReader reader(streamReader);
stream = reader.ReadBinaryData<T>(buffer, bufferInfo.buffersAccessors.view);
}
return stream;
}
template<typename T>
BufferInfo<T> getFullBufferData(Microsoft::glTF::Document document, std::string accessorKey, std::filesystem::path path) {
BufferInfo<T> bufferInfo{};
BuffersAccessors mainPart = getBufferAccessorFromDocument(document, accessorKey);
bufferInfo.buffersAccessors = mainPart;
std::vector<T> bufferData = vkglTF::readBufferData<T>(document, bufferInfo, path);
const size_t bufferDataOffset = mainPart.accessor.byteOffset + mainPart.view.byteOffset; //How to properly calculate offset?
bufferData.erase(bufferData.begin(), bufferData.begin() + bufferDataOffset);
bufferInfo.bufferData = bufferData;
return bufferInfo;
}
I expect data in formats like uint8 and uint16 but my program crashes when trying to do bufferData.erase(..).
Edit: This happens while reading WEIGHTS_0 too.
I think the most likely error with your code is the mixing of byte offsets and vector element indices. Have you tried dividing bufferDataOffset by sizeof(T)?
Second, if you only want to read an accessor's data then try using the ReadBinaryData overload that accepts an Accessor parameter instead. That way the glTF SDK will handle all of the offset calculations for you.
There is no documentation but the deserialize sample demonstrates the basic code structure recommended when using the glTF SDK.
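As an illustration only (untested; TryGetAttributeAccessorId and ACCESSOR_TEXCOORD_0 are the names used in the deserialize sample, and StreamReader is your own IStreamReader implementation), reading TEXCOORD_0 through the accessor-based overload might look like this:

    using namespace Microsoft::glTF;

    std::vector<float> readTexcoord0(const Document& document,
                                     const MeshPrimitive& primitive,
                                     std::shared_ptr<StreamReader> streamReader)
    {
        GLTFResourceReader reader(streamReader);
        std::string accessorId;
        if (primitive.TryGetAttributeAccessorId(ACCESSOR_TEXCOORD_0, accessorId))
        {
            const Accessor& accessor = document.accessors.Get(accessorId);
            // This overload applies accessor.byteOffset and bufferView.byteOffset internally
            // and returns typed elements, so no manual erase() is needed.
            return reader.ReadBinaryData<float>(document, accessor);
        }
        return {};
    }

The element type you instantiate ReadBinaryData with should match the accessor's componentType (e.g. uint8_t or uint16_t for WEIGHTS_0 stored as normalized integers).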

nvEncRegisterResource() fails with -23

I've hit a complete brick wall in my attempt to use NVEnc to stream OpenGL frames as H264. I've been at this particular issue for close to 8 hours without any progress.
The problem is the call to nvEncRegisterResource(), which invariably fails with code -23 (enum value NV_ENC_ERR_RESOURCE_REGISTER_FAILED, documented as "failed to register the resource" - thanks NVidia).
I'm trying to follow a procedure outlined in this document from the University of Oslo (page 54, "OpenGL interop"), so I know for a fact that this is supposed to work, though unfortunately said document does not provide the code itself.
The idea is fairly straightforward:
map the texture produced by the OpenGL frame buffer object into CUDA;
copy the texture into a (previously allocated) CUDA buffer;
map that buffer as an NVEnc input resource
use that input resource as the source for the encoding
As I said, the problem is step (3). Here are the relevant code snippets (I'm omitting error handling for brevity.)
// Round up width and height
priv->encWidth = (_resolution.w + 31) & ~31, priv->encHeight = (_resolution.h + 31) & ~31;
// Allocate CUDA "pitched" memory to match the input texture (YUV, one byte per component)
cuErr = cudaMallocPitch(&priv->cudaMemPtr, &priv->cudaMemPitch, 3 * priv->encWidth, priv->encHeight);
This should allocate on-device CUDA memory (the "pitched" variety, though I've tried non-pitched too, without any change in the outcome.)
// Register the CUDA buffer as an input resource
NV_ENC_REGISTER_RESOURCE regResParams = { 0 };
regResParams.version = NV_ENC_REGISTER_RESOURCE_VER;
regResParams.resourceType = NV_ENC_INPUT_RESOURCE_TYPE_CUDADEVICEPTR;
regResParams.width = priv->encWidth;
regResParams.height = priv->encHeight;
regResParams.bufferFormat = NV_ENC_BUFFER_FORMAT_YUV444_PL;
regResParams.resourceToRegister = priv->cudaMemPtr;
regResParams.pitch = priv->cudaMemPitch;
encStat = nvEncApi.nvEncRegisterResource(priv->nvEncoder, &regResParams);
// ^^^ FAILS
priv->nvEncInpRes = regResParams.registeredResource;
This is the brick wall. No matter what I try, nvEncRegisterResource() fails.
I should note that I rather think (though I may be wrong) that I've done all the required initializations. Here is the code that creates and activates the CUDA context:
// Pop the current context
cuRes = cuCtxPopCurrent(&priv->cuOldCtx);
// Create a context for the device
priv->cuCtx = nullptr;
cuRes = cuCtxCreate(&priv->cuCtx, CU_CTX_SCHED_BLOCKING_SYNC, priv->cudaDevice);
// Push our context
cuRes = cuCtxPushCurrent(priv->cuCtx);
.. followed by the creation of the encoding session:
// Create an NV Encoder session
NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS nvEncSessParams = { 0 };
nvEncSessParams.apiVersion = NVENCAPI_VERSION;
nvEncSessParams.version = NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS_VER;
nvEncSessParams.deviceType = NV_ENC_DEVICE_TYPE_CUDA;
nvEncSessParams.device = priv->cuCtx; // nullptr
auto encStat = nvEncApi.nvEncOpenEncodeSessionEx(&nvEncSessParams, &priv->nvEncoder);
And finally, the code initializing the encoder:
// Configure the encoder via preset
NV_ENC_PRESET_CONFIG presetConfig = { 0 };
GUID codecGUID = NV_ENC_CODEC_H264_GUID;
GUID presetGUID = NV_ENC_PRESET_LOW_LATENCY_DEFAULT_GUID;
presetConfig.version = NV_ENC_PRESET_CONFIG_VER;
presetConfig.presetCfg.version = NV_ENC_CONFIG_VER;
encStat = nvEncApi.nvEncGetEncodePresetConfig(priv->nvEncoder, codecGUID, presetGUID, &presetConfig);
NV_ENC_INITIALIZE_PARAMS initParams = { 0 };
initParams.version = NV_ENC_INITIALIZE_PARAMS_VER;
initParams.encodeGUID = codecGUID;
initParams.encodeWidth = priv->encWidth;
initParams.encodeHeight = priv->encHeight;
initParams.darWidth = 1;
initParams.darHeight = 1;
initParams.frameRateNum = 25; // TODO: make this configurable
initParams.frameRateDen = 1; // ditto
// .max_surface_count = (num_mbs >= 8160) ? 32 : 48;
// .buffer_delay ? necessary
initParams.enableEncodeAsync = 0;
initParams.enablePTD = 1;
initParams.presetGUID = presetGUID;
memcpy(&priv->nvEncConfig, &presetConfig.presetCfg, sizeof(priv->nvEncConfig));
initParams.encodeConfig = &priv->nvEncConfig;
encStat = nvEncApi.nvEncInitializeEncoder(priv->nvEncoder, &initParams);
All the above initializations report success.
I'd be extremely grateful to anyone who can get me past this hurdle.
EDIT: here is the complete code to reproduce the problem. The only observable difference from the original code is that cuCtxPopCurrent() returns an error here (which can be ignored); my original program probably creates such a context as a side effect of using OpenGL. Otherwise, the code behaves exactly as the original does.
I've built the code with Visual Studio 2013. You must link the following library file (adapt path if not on C:): C:\Program Files (x86)\NVIDIA GPU Computing Toolkit\CUDA\v7.5\lib\Win32\cuda.lib
You must also make sure that C:\Program Files (x86)\NVIDIA GPU Computing Toolkit\CUDA\v7.5\include\ (or similar) is in the include path.
NEW EDIT: modified the code to only use the CUDA driver interface, instead of mixing with the runtime API. Still the same error code.
#ifdef _WIN32
#include <Windows.h>
#endif
#include <cassert>
#include <GL/gl.h>
#include <iostream>
#include <string>
#include <stdexcept>
#include <string>
#include <cuda.h>
//#include <cuda_runtime.h>
#include <cuda_gl_interop.h>
#include <nvEncodeAPI.h>
// NV Encoder API ---------------------------------------------------
#if defined(_WIN32)
#define LOAD_FUNC(l, s) GetProcAddress(l, s)
#define DL_CLOSE_FUNC(l) FreeLibrary(l)
#else
#define LOAD_FUNC(l, s) dlsym(l, s)
#define DL_CLOSE_FUNC(l) dlclose(l)
#endif
typedef NVENCSTATUS(NVENCAPI* PNVENCODEAPICREATEINSTANCE)(NV_ENCODE_API_FUNCTION_LIST *functionList);
struct NVEncAPI : public NV_ENCODE_API_FUNCTION_LIST {
public:
// ~NVEncAPI() { cleanup(); }
void init() {
#if defined(_WIN32)
if (sizeof(void*) == 8) {
nvEncLib = LoadLibrary(TEXT("nvEncodeAPI64.dll"));
}
else {
nvEncLib = LoadLibrary(TEXT("nvEncodeAPI.dll"));
}
if (nvEncLib == NULL) throw std::runtime_error("Failed to load NVidia Encoder library: " + std::to_string(GetLastError()));
#else
nvEncLib = dlopen("libnvidia-encode.so.1", RTLD_LAZY);
if (nvEncLib == nullptr)
throw std::runtime_error("Failed to load NVidia Encoder library: " + std::string(dlerror()));
#endif
auto nvEncodeAPICreateInstance = (PNVENCODEAPICREATEINSTANCE) LOAD_FUNC(nvEncLib, "NvEncodeAPICreateInstance");
version = NV_ENCODE_API_FUNCTION_LIST_VER;
NVENCSTATUS encStat = nvEncodeAPICreateInstance(static_cast<NV_ENCODE_API_FUNCTION_LIST *>(this));
}
void cleanup() {
#if defined(_WIN32)
if (nvEncLib != NULL) {
FreeLibrary(nvEncLib);
nvEncLib = NULL;
}
#else
if (nvEncLib != nullptr) {
dlclose(nvEncLib);
nvEncLib = nullptr;
}
#endif
}
private:
#if defined(_WIN32)
HMODULE nvEncLib;
#else
void* nvEncLib;
#endif
bool init_done;
};
static NVEncAPI nvEncApi;
// Encoder class ----------------------------------------------------
class Encoder {
public:
typedef unsigned int uint_t;
struct Size { uint_t w, h; };
Encoder() {
CUresult cuRes = cuInit(0);
nvEncApi.init();
}
void init(const Size & resolution, uint_t texture) {
NVENCSTATUS encStat;
CUresult cuRes;
texSize = resolution;
yuvTex = texture;
// Purely for information
int devCount = 0;
cuRes = cuDeviceGetCount(&devCount);
// Initialize NVEnc
initEncodeSession(); // start an encoding session
initEncoder();
// Register the YUV texture as a CUDA graphics resource
// CODE COMMENTED OUT AS THE INPUT TEXTURE IS NOT NEEDED YET (TO MY UNDERSTANDING) AT SETUP TIME
//cudaGraphicsGLRegisterImage(&priv->cudaInpTexRes, priv->yuvTex, GL_TEXTURE_2D, cudaGraphicsRegisterFlagsReadOnly);
// Allocate CUDA "pitched" memory to match the input texture (YUV, one byte per component)
encWidth = (texSize.w + 31) & ~31, encHeight = (texSize.h + 31) & ~31;
cuRes = cuMemAllocPitch(&cuDevPtr, &cuMemPitch, 4 * encWidth, encHeight, 16);
// Register the CUDA buffer as an input resource
NV_ENC_REGISTER_RESOURCE regResParams = { 0 };
regResParams.version = NV_ENC_REGISTER_RESOURCE_VER;
regResParams.resourceType = NV_ENC_INPUT_RESOURCE_TYPE_CUDADEVICEPTR;
regResParams.width = encWidth;
regResParams.height = encHeight;
regResParams.bufferFormat = NV_ENC_BUFFER_FORMAT_YUV444_PL;
regResParams.resourceToRegister = (void*) cuDevPtr;
regResParams.pitch = cuMemPitch;
encStat = nvEncApi.nvEncRegisterResource(nvEncoder, &regResParams);
assert(encStat == NV_ENC_SUCCESS); // THIS IS THE POINT OF FAILURE
nvEncInpRes = regResParams.registeredResource;
}
void cleanup() { /* OMITTED */ }
void encode() {
// THE FOLLOWING CODE WAS NEVER REACHED YET BECAUSE OF THE ISSUE.
// INCLUDED HERE FOR REFERENCE.
CUresult cuRes;
NVENCSTATUS encStat;
cuRes = cuGraphicsResourceSetMapFlags(cuInpTexRes, CU_GRAPHICS_MAP_RESOURCE_FLAGS_READ_ONLY);
cuRes = cuGraphicsMapResources(1, &cuInpTexRes, 0);
CUarray mappedArray;
cuRes = cuGraphicsSubResourceGetMappedArray(&mappedArray, cuInpTexRes, 0, 0);
cuRes = cuMemcpyDtoA(mappedArray, 0, cuDevPtr, 4 * encWidth * encHeight);
NV_ENC_MAP_INPUT_RESOURCE mapInputResParams = { 0 };
mapInputResParams.version = NV_ENC_MAP_INPUT_RESOURCE_VER;
mapInputResParams.registeredResource = nvEncInpRes;
encStat = nvEncApi.nvEncMapInputResource(nvEncoder, &mapInputResParams);
// TODO: encode...
cuRes = cuGraphicsUnmapResources(1, &cuInpTexRes, 0);
}
private:
struct PrivateData;
void initEncodeSession() {
CUresult cuRes;
NVENCSTATUS encStat;
// Pop the current context
cuRes = cuCtxPopCurrent(&cuOldCtx); // THIS IS ALLOWED TO FAIL (the error can be ignored)
// Create a context for the device
cuCtx = nullptr;
cuRes = cuCtxCreate(&cuCtx, CU_CTX_SCHED_BLOCKING_SYNC, 0);
// Push our context
cuRes = cuCtxPushCurrent(cuCtx);
// Create an NV Encoder session
NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS nvEncSessParams = { 0 };
nvEncSessParams.apiVersion = NVENCAPI_VERSION;
nvEncSessParams.version = NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS_VER;
nvEncSessParams.deviceType = NV_ENC_DEVICE_TYPE_CUDA;
nvEncSessParams.device = cuCtx;
encStat = nvEncApi.nvEncOpenEncodeSessionEx(&nvEncSessParams, &nvEncoder);
}
void initEncoder()
{
NVENCSTATUS encStat;
// Configure the encoder via preset
NV_ENC_PRESET_CONFIG presetConfig = { 0 };
GUID codecGUID = NV_ENC_CODEC_H264_GUID;
GUID presetGUID = NV_ENC_PRESET_LOW_LATENCY_DEFAULT_GUID;
presetConfig.version = NV_ENC_PRESET_CONFIG_VER;
presetConfig.presetCfg.version = NV_ENC_CONFIG_VER;
encStat = nvEncApi.nvEncGetEncodePresetConfig(nvEncoder, codecGUID, presetGUID, &presetConfig);
NV_ENC_INITIALIZE_PARAMS initParams = { 0 };
initParams.version = NV_ENC_INITIALIZE_PARAMS_VER;
initParams.encodeGUID = codecGUID;
initParams.encodeWidth = texSize.w;
initParams.encodeHeight = texSize.h;
initParams.darWidth = texSize.w;
initParams.darHeight = texSize.h;
initParams.frameRateNum = 25;
initParams.frameRateDen = 1;
initParams.enableEncodeAsync = 0;
initParams.enablePTD = 1;
initParams.presetGUID = presetGUID;
memcpy(&nvEncConfig, &presetConfig.presetCfg, sizeof(nvEncConfig));
initParams.encodeConfig = &nvEncConfig;
encStat = nvEncApi.nvEncInitializeEncoder(nvEncoder, &initParams);
}
//void cleanupEncodeSession();
//void cleanupEncoder;
Size texSize;
GLuint yuvTex;
uint_t encWidth, encHeight;
CUdeviceptr cuDevPtr;
size_t cuMemPitch;
NV_ENC_CONFIG nvEncConfig;
NV_ENC_INPUT_PTR nvEncInpBuf;
NV_ENC_REGISTERED_PTR nvEncInpRes;
CUdevice cuDevice;
CUcontext cuCtx, cuOldCtx;
void *nvEncoder;
CUgraphicsResource cuInpTexRes;
};
int main(int argc, char *argv[])
{
Encoder encoder;
encoder.init({1920, 1080}, 0); // OMITTED THE TEXTURE AS IT IS NOT NEEDED TO REPRODUCE THE ISSUE
return 0;
}
After comparing the NVidia sample NvEncoderCudaInterop with my minimal code, I finally found the item that makes the difference between success and failure: it's the pitch parameter of the NV_ENC_REGISTER_RESOURCE structure passed to nvEncRegisterResource().
I haven't seen it documented anywhere, but there's a hard limit on that value, which I've determined experimentally to be at 2560. Anything above that will result in NV_ENC_ERR_RESOURCE_REGISTER_FAILED.
It does not appear to matter that the pitch I was passing was calculated by another API call, cuMemAllocPitch().
(Another thing that was missing from my code was "locking" and unlocking the CUDA context to the current thread via cuCtxPushCurrent() and cuCtxPopCurrent(). Done in the sample via a RAII class.)
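Such a guard can be a minimal sketch like the following (my code, not the NVidia sample's):

    // RAII helper that binds a CUDA context to the current thread for the
    // duration of a scope (sketch).
    class CudaCtxLock {
    public:
        explicit CudaCtxLock(CUcontext ctx) { cuCtxPushCurrent(ctx); }
        ~CudaCtxLock() { CUcontext dummy; cuCtxPopCurrent(&dummy); }
        CudaCtxLock(const CudaCtxLock&) = delete;
        CudaCtxLock& operator=(const CudaCtxLock&) = delete;
    };

    // Usage: keep the context current around every CUDA/NVEnc call sequence, e.g.
    // CudaCtxLock lock(cuCtx);
    // encStat = nvEncApi.nvEncRegisterResource(nvEncoder, &regResParams);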
EDIT:
I have worked around the problem by doing something for which I had another reason: using NV12 as input format for the encoder instead of YUV444.
With NV12, the pitch parameter drops below the 2560 limit because the byte size per row is equal to the width, so in my case 1920 bytes.
This was necessary (at the time) because my graphics card was a GTX 760 with a "Kepler" GPU, which (as I was initially unaware) only supports NV12 as input format for NVEnc. I have since upgraded to a GTX 970, but as I just found out, the 2560 limit is still there.
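For reference, a rough sketch of what the NV12 allocation and registration then look like (untested as written; NV_ENC_BUFFER_FORMAT_NV12_PL is assumed to be the NV12 enum name in this API version):

    // NV12 = full-resolution Y plane followed by a half-height interleaved UV plane,
    // i.e. one byte per pixel over encHeight * 3 / 2 rows, row width == encWidth.
    cuRes = cuMemAllocPitch(&cuDevPtr, &cuMemPitch,
                            encWidth,              // bytes per row
                            encHeight * 3 / 2,     // Y rows + UV rows
                            16);

    NV_ENC_REGISTER_RESOURCE regResParams = { 0 };
    regResParams.version            = NV_ENC_REGISTER_RESOURCE_VER;
    regResParams.resourceType       = NV_ENC_INPUT_RESOURCE_TYPE_CUDADEVICEPTR;
    regResParams.width              = encWidth;
    regResParams.height             = encHeight;
    regResParams.pitch              = (uint32_t)cuMemPitch;   // stays below the 2560 limit for a 1920-wide frame
    regResParams.bufferFormat       = NV_ENC_BUFFER_FORMAT_NV12_PL;
    regResParams.resourceToRegister = (void*)cuDevPtr;
    encStat = nvEncApi.nvEncRegisterResource(nvEncoder, &regResParams);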
This makes me wonder just how exactly one is expected to use NVEnc with YUV444. The only possibility that comes to my mind is to use non-pitched memory, which seems bizarre. I'd appreciate comments from people who've actually used NVEnc with YUV444.
EDIT #2 - PENDING FURTHER UPDATE:
New information has surfaced in the form of another SO question: NVencs Output Bitstream is not readable
It is quite possible that my answer so far was wrong. It seems now that the pitch should not only be set when registering the CUDA resource, but also when actually sending it to the encoder via nvEncEncodePicture(). I cannot check this right now, but I will next time I work on that project.

WinRT C++ (Win10) Accessing bytes from SoftwareBitmap / BitmapBuffer

To process the preview frames of my camera in OpenCV, I need access to the raw pixel data/bytes. The new SoftwareBitmap should provide exactly that.
There is an example for C#, but in Visual C++ I can't get the IMemoryBufferByteAccess interface (see the remarks) working.
Code with Exceptions:
// Capture the preview frame
return create_task(_mediaCapture->GetPreviewFrameAsync(videoFrame))
.then([this](VideoFrame^ currentFrame)
{
// Collect the resulting frame
auto previewFrame = currentFrame->SoftwareBitmap;
auto buffer = previewFrame->LockBuffer(Windows::Graphics::Imaging::BitmapBufferAccessMode::ReadWrite);
auto reference = buffer->CreateReference();
// Get a pointer to the pixel buffer
byte* pData = nullptr;
UINT capacity = 0;
// Obtain ByteAccess
ComPtr<IUnknown> inspectable = reinterpret_cast<IUnknown*>(buffer);
// Query the IBufferByteAccess interface.
Microsoft::WRL::ComPtr<IMemoryBufferByteAccess> bufferByteAccess;
ThrowIfFailed(inspectable.As(&bufferByteAccess)); // ERROR ---> Throws HRESULT = E_NOINTERFACE
// Retrieve the buffer data.
ThrowIfFailed(bufferByteAccess->GetBuffer(_Out_ &pData, _Out_ &capacity)); // ERROR ---> Throws HRESULT = E_NOINTERFACE, because bufferByteAccess is null
I tried this too:
HRESULT hr = ((IMemoryBufferByteAccess*)reference)->GetBuffer(&pData, &capacity);
The HRESULT is OK, but I can't access pData: access violation reading memory.
Thanks for your help.
You should use reference instead of buffer in reinterpret_cast.
#include "pch.h"
#include <wrl\wrappers\corewrappers.h>
#include <wrl\client.h>
MIDL_INTERFACE("5b0d3235-4dba-4d44-865e-8f1d0e4fd04d")
IMemoryBufferByteAccess : IUnknown
{
virtual HRESULT STDMETHODCALLTYPE GetBuffer(
BYTE **value,
UINT32 *capacity
);
};
auto previewFrame = currentFrame->SoftwareBitmap;
auto buffer = previewFrame->LockBuffer(BitmapBufferAccessMode::ReadWrite);
auto reference = buffer->CreateReference();
ComPtr<IMemoryBufferByteAccess> bufferByteAccess;
HRESULT result = reinterpret_cast<IInspectable*>(reference)->QueryInterface(IID_PPV_ARGS(&bufferByteAccess));
if (result == S_OK)
{
WriteLine("Get interface successfully");
BYTE* data = nullptr;
UINT32 capacity = 0;
result = bufferByteAccess->GetBuffer(&data, &capacity);
if (result == S_OK)
{
WriteLine("get data access successfully, capacity: " + capacity);
}
}
Based on the answer from @jeffrey-chen and the example from @kennykerr, I've assembled a slightly cleaner solution:
#include <wrl/client.h>
// other includes, as required by your project
MIDL_INTERFACE("5b0d3235-4dba-4d44-865e-8f1d0e4fd04d")
IMemoryBufferByteAccess : ::IUnknown
{
virtual HRESULT __stdcall GetBuffer(BYTE **value, UINT32 *capacity) = 0;
};
// your code:
auto previewFrame = currentFrame->SoftwareBitmap;
auto buffer = previewFrame->LockBuffer(BitmapBufferAccessMode::ReadWrite);
auto bufferByteAccess= buffer->CreateReference().as<IMemoryBufferByteAccess>();
WriteLine("Get interface successfully"); // otherwise - exception is thrown
BYTE* data = nullptr;
UINT32 capacity = 0;
winrt::check_hresult(bufferByteAccess->GetBuffer(&data, &capacity));
WriteLine("get data access successfully, capacity: " + capacity);
I'm currently accessing the raw unsigned char* data from each frame I obtain on a MediaFrameReader::FrameArrived event without using WRL and COM...
Here it is how:
void MainPage::OnFrameArrived(MediaFrameReader ^reader, MediaFrameArrivedEventArgs ^args)
{
MediaFrameReference ^mfr = reader->TryAcquireLatestFrame();
VideoMediaFrame ^vmf = mfr->VideoMediaFrame;
VideoFrame ^vf = vmf->GetVideoFrame();
SoftwareBitmap ^sb = vf->SoftwareBitmap;
Buffer ^buff = ref new Buffer(sb->PixelHeight * sb->PixelWidth * 2);
sb->CopyToBuffer(buff);
DataReader ^dataReader = DataReader::FromBuffer(buff);
Platform::Array<unsigned char, 1> ^arr = ref new Platform::Array<unsigned char, 1>(buff->Length);
dataReader->ReadBytes(arr);
// here arr->Data is a pointer to the raw pixel data
}
NOTE: The MediaCapture object needs to be configured with MediaCaptureMemoryPreference::Cpu in order to have a valid SoftwareBitmap
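A minimal sketch of that configuration in C++/CX (untested) might be:

    // Ask for CPU-backed frames so a valid SoftwareBitmap is available on FrameArrived (sketch).
    auto settings = ref new MediaCaptureInitializationSettings();
    settings->MemoryPreference = MediaCaptureMemoryPreference::Cpu;
    settings->StreamingCaptureMode = StreamingCaptureMode::Video;

    auto mediaCapture = ref new MediaCapture();
    create_task(mediaCapture->InitializeAsync(settings)).then([mediaCapture]()
    {
        // create the MediaFrameReader here and subscribe OnFrameArrived as shown above
    });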
Hope the above helps someone

Array copy in parallel_for_each context

I'm very new to C++ AMP. Everything works fine if I use memcpy inside the parallel_for_each function, but I know that is not best practice. I tried to use copy_to, but it raises an exception. Below is a simplified version of the code, focused on the issue I'm having. Thanks in advance.
typedef std::vector<DWORD> CArrDwData;
class CdataMatrix
{
public:
CdataMatrix(int nChCount) : m_ChCount(nChCount)
{
}
void SetSize(UINT uSize)
{
// MUST be multiple of m_ChCount*DWORD
ASSERT(uSize%sizeof(DWORD) == 0);
m_PackedLength = uSize/sizeof(DWORD);
m_arrChannels.resize(m_ChCount*m_PackedLength);
}
UINT GetChannelPackedLen() const
{
return m_PackedLength;
}
const LPBYTE GetChannelBuffer(UINT uChannel) const
{
CArrDwData::const_pointer cPtr = m_arrChannels.data() + m_PackedLength*uChannel;
return (const LPBYTE)cPtr;
}
public:
CArrDwData m_arrChannels;
protected:
UINT m_ChCount;
UINT m_PackedLength;
};
void CtypDiskHeader::ParalelProcess()
{
const int nJobs = 6;
const int nChannelCount = 3;
UINT uAmount = 250000;
int vch;
CArrDwData arrCompData;
// Check buffers sizes
ASSERT((~uAmount & 0x00000003) == 3); // DWORD aligned
const UINT uInDWSize = uAmount/sizeof(DWORD); // in size give in DWORDs
CdataMatrix arrChData(nJobs);
arrCompData.resize(nJobs*uInDWSize);
vector<int> a(nJobs);
for(vch = 0; vch < nJobs; vch++)
a[vch] = vch;
arrChData.SetSize(uAmount+16); // note: 16 bytes or 4 DWORDs larger than uInDWSize
accelerator_view acc_view = accelerator().default_view;
Concurrency::extent<2> eIn(nJobs, uInDWSize);
Concurrency::extent<2> eOut(nJobs, arrChData.GetChannelPackedLen());
array_view<DWORD, 2> viewOut(eOut, arrChData.m_arrChannels);
array_view<DWORD, 2> viewIn(eIn, arrCompData);
concurrency::parallel_for_each(begin(a), end(a), [&](int vch)
{
vector<DWORD>::pointer ptr = (LPDWORD)viewIn(vch).data();
LPDWORD bufCompIn = (LPDWORD)ptr;
ptr = viewOut(vch).data();
LPDWORD bufExpandedIn = (LPDWORD)ptr;
if(ConditionNotOk())
{
// Copy raw data bufCompIn to bufExpandedIn
// Works fine, but not the best way, I suppose:
memcpy(bufExpandedIn, bufCompIn, uAmount);
// Raises exception:
//viewIn(vch).copy_to(viewOut(vch));
}
else
{
// Some data processing here
}
});
}
It was my fault: in the original code, the extent of viewOut(vch) is slightly larger than the extent of viewIn(vch). Used that way, it raises a runtime_exception; catching it gives the message xcp.what() = "Failed to copy because extents do not match".
I fixed the code by replacing the original line with: viewIn(vch).copy_to(viewOut(vch).section(viewIn(vch).extent));
It copies only the source extent, which is what I need, but it only compiles outside of restrict(amp) code.
This has nothing to do with the parallel_for_each; it looks like a known bug in array_view::copy_to. See the following post:
Curiosity about concurrency::copy and array_view projection interactions
You can fix this using an explicit view_as() instead. I believe in your case the code should look something like this:
viewIn(vch).copy_to(viewOut(vch));
// Becomes...
viewIn[vch].view_as<1>(concurrency::extent<1>(uInDWSize)).copy_to(viewOut(vch));
I can't compile your example, so I was unable to verify this, but I was able to get an exception from similar code and fix it using view_as().
If you want to copy data within a C++ AMP kernel then you need to do it as assignment operations on a series of threads. The following code copies the first 500 elements of source into the smaller dest array.
array<int, 1> source(1000);
array<int, 1> dest(500);
parallel_for_each(source.extent, [=, &source, &dest](index<1> idx)
{
if (dest.extent.contains(idx))
dest[idx] = source[idx];
});