OpenGL to FFmpeg encode

I have an OpenGL buffer that I need to forward directly to FFmpeg for NVENC-based H264 encoding.
My current way of doing this is glReadPixels to get the pixels out of the framebuffer, then passing that pointer into FFmpeg so it can encode the frame into H264 packets for RTSP. However, this is wasteful: I have to copy bytes out of GPU RAM into CPU RAM, only to copy them back into the GPU for encoding.

If you look at the date of posting versus the date of this answer, you'll notice I spent a lot of time working on this (it was my full-time job for the past 4 weeks).
Since I had such a difficult time getting this to work, I will write up a short guide to hopefully help whoever finds this.
Outline
The basic flow I have is OGL framebuffer object color attachment (texture) → nvenc (NVIDIA encoder)
Things to note
1) The NVIDIA encoder can accept YUV or RGB type images.
2) FFmpeg 4.0 and earlier cannot pass RGB images to nvenc.
3) FFmpeg was updated to accept RGB as input as a result of my bug reports.
There are a couple of different things to know about:
1) AVHWDeviceContext - think of this as FFmpeg's device abstraction layer.
2) AVHWFramesContext - think of this as FFmpeg's hardware frame abstraction layer.
3) cuMemcpy2D - the required method to copy a CUDA-mapped OGL texture into a CUDA buffer created by FFmpeg.
Comprehensiveness
This guide is in addition to standard software-encoding guidelines. This is NOT complete code, and should only be used alongside the standard flow.
Code details
Setup
You will first need to get your GPU's name. To do this I found some code (I cannot remember where I got it from) that makes some CUDA calls to fetch the device name:
int getDeviceName(std::string& gpuName)
{
    //Setup the cuda context for hardware encoding with ffmpeg
    NV_ENC_BUFFER_FORMAT eFormat = NV_ENC_BUFFER_FORMAT_IYUV;
    int iGpu = 0;
    CUresult res;
    ck(cuInit(0));
    int nGpu = 0;
    ck(cuDeviceGetCount(&nGpu));
    if (iGpu < 0 || iGpu >= nGpu)
    {
        std::cout << "GPU ordinal out of range. Should be within [" << 0 << ", "
                  << nGpu - 1 << "]" << std::endl;
        return 1;
    }
    CUdevice cuDevice = 0;
    ck(cuDeviceGet(&cuDevice, iGpu));
    char szDeviceName[80];
    ck(cuDeviceGetName(szDeviceName, sizeof(szDeviceName), cuDevice));
    gpuName = szDeviceName;
    epLog::msg(epMSG_STATUS, "epVideoEncode:H264Encoder", "...using device \"%s\"", szDeviceName);
    return 0;
}
Next you will need to set up your hwdevice and hwframe contexts:
getDeviceName(gpuName);
ret = av_hwdevice_ctx_create(&m_avBufferRefDevice, AV_HWDEVICE_TYPE_CUDA, gpuName.c_str(), NULL, NULL);
if (ret < 0)
{
    return -1;
}
//Example of casts needed to get down to the cuda context
AVHWDeviceContext* hwDevContext = (AVHWDeviceContext*)(m_avBufferRefDevice->data);
AVCUDADeviceContext* cudaDevCtx = (AVCUDADeviceContext*)(hwDevContext->hwctx);
m_cuContext = &(cudaDevCtx->cuda_ctx);
//Create the hwframe_context
// This is an abstraction of a cuda buffer for us. This enables us to, with one call, setup the cuda buffer and ready it for input
m_avBufferRefFrame = av_hwframe_ctx_alloc(m_avBufferRefDevice);
//Setup some values before initialization
AVHWFramesContext* frameCtxPtr = (AVHWFramesContext*)(m_avBufferRefFrame->data);
frameCtxPtr->width = width;
frameCtxPtr->height = height;
frameCtxPtr->sw_format = AV_PIX_FMT_0BGR32; // There are only certain supported types here, we need to conform to these types
frameCtxPtr->format = AV_PIX_FMT_CUDA;
frameCtxPtr->device_ref = m_avBufferRefDevice;
frameCtxPtr->device_ctx = (AVHWDeviceContext*)m_avBufferRefDevice->data;
//Initialization - This must be done to actually allocate the cuda buffer.
// NOTE: This call will only work for our input format if the FFmpeg library is version > 4.0
ret = av_hwframe_ctx_init(m_avBufferRefFrame);
if (ret < 0) {
    return -1;
}
//Cast the OGL texture/buffer to cuda ptr
CUresult res;
CUcontext oldCtx;
m_inputTexture = texture;
res = cuCtxPopCurrent(&oldCtx); // THIS IS ALLOWED TO FAIL
res = cuCtxPushCurrent(*m_cuContext);
res = cuGraphicsGLRegisterImage(&cuInpTexRes, m_inputTexture, GL_TEXTURE_2D, CU_GRAPHICS_REGISTER_FLAGS_READ_ONLY);
res = cuCtxPopCurrent(&oldCtx); // THIS IS ALLOWED TO FAIL
//Assign some hardware accel specific data to AvCodecContext
c->hw_device_ctx = m_avBufferRefDevice;//This must be done BEFORE avcodec_open2()
c->pix_fmt = AV_PIX_FMT_CUDA; //Since this is a cuda buffer, although it's really OpenGL with a cuda ptr
c->hw_frames_ctx = m_avBufferRefFrame;
c->codec_type = AVMEDIA_TYPE_VIDEO;
c->sw_pix_fmt = AV_PIX_FMT_0BGR32;
// Setup some cuda stuff for memcpy-ing later
m_memCpyStruct.srcXInBytes = 0;
m_memCpyStruct.srcY = 0;
m_memCpyStruct.srcMemoryType = CUmemorytype::CU_MEMORYTYPE_ARRAY;
m_memCpyStruct.dstXInBytes = 0;
m_memCpyStruct.dstY = 0;
m_memCpyStruct.dstMemoryType = CUmemorytype::CU_MEMORYTYPE_DEVICE;
Keep in mind that although a lot is done above, the code shown is IN ADDITION to the standard software-encoding code. Make sure to include all of those calls/object initializations as well.
Unlike the software version, all that is needed for the input AVFrame object is to get the buffer AFTER your alloc call:
// allocate RGB video frame buffer
ret = av_hwframe_get_buffer(m_avBufferRefFrame, rgb_frame, 0); // 0 is for flags, not used at the moment
Notice it takes the hwframe_context as an argument; this is how it knows what device, size, format, etc. to allocate for on the GPU.
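Spelled out a little more (this is my sketch, not verbatim from my project), the whole allocation for the hardware path is just frame alloc plus the hwframe call; the frames context supplies format, size and the CUDA memory itself:

```cpp
extern "C" {
#include <libavutil/frame.h>
#include <libavutil/hwcontext.h>
}

// Sketch: allocate an AVFrame whose data[0] ends up being a CUdeviceptr
// owned by the hwframe context (hwFramesRef is the AVBufferRef created
// by av_hwframe_ctx_alloc/av_hwframe_ctx_init above).
static AVFrame* allocHwFrame(AVBufferRef* hwFramesRef)
{
    AVFrame* frame = av_frame_alloc();
    if (!frame)
        return nullptr;
    // No av_frame_get_buffer() here: the frames context fills in
    // data[0] (device pointer) and linesize[0] (GPU pitch) itself.
    if (av_hwframe_get_buffer(hwFramesRef, frame, 0) < 0) {
        av_frame_free(&frame);
        return nullptr;
    }
    return frame;
}
```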
Call to encode each frame
Now we are set up and ready to encode. Before each encode we need to copy the frame from the texture into a CUDA buffer. We do this by mapping a CUDA array onto the texture and then copying that array to a CUdeviceptr (which was allocated by the av_hwframe_get_buffer call above):
//Perform cuda mem copy for input buffer
CUresult cuRes;
CUarray mappedArray;
CUcontext oldCtx;
//Get context
cuRes = cuCtxPopCurrent(&oldCtx); // THIS IS ALLOWED TO FAIL
cuRes = cuCtxPushCurrent(*m_cuContext);
//Get Texture
cuRes = cuGraphicsResourceSetMapFlags(cuInpTexRes, CU_GRAPHICS_MAP_RESOURCE_FLAGS_READ_ONLY);
cuRes = cuGraphicsMapResources(1, &cuInpTexRes, 0);
//Map texture to cuda array
cuRes = cuGraphicsSubResourceGetMappedArray(&mappedArray, cuInpTexRes, 0, 0); // Nvidia says its good practice to remap each iteration as OGL can move things around
//Release texture
cuRes = cuGraphicsUnmapResources(1, &cuInpTexRes, 0);
//Setup for memcopy
m_memCpyStruct.srcArray = mappedArray;
m_memCpyStruct.dstDevice = (CUdeviceptr)rgb_frame->data[0]; // Make sure to copy devptr as it could change, upon resize
m_memCpyStruct.dstPitch = rgb_frame->linesize[0]; // Linesize is generated by hwframe_context
m_memCpyStruct.WidthInBytes = rgb_frame->width * 4; //* 4 needed for each pixel
m_memCpyStruct.Height = rgb_frame->height; //Vanilla height for frame
//Do memcpy
cuRes = cuMemcpy2D(&m_memCpyStruct);
//release context
cuRes = cuCtxPopCurrent(&oldCtx); // THIS IS ALLOWED TO FAIL
Now we can simply call send_frame and it all works!
ret = avcodec_send_frame(c, rgb_frame);
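The receive side is the same as in any new-style FFmpeg encode loop, so I left it out above; for completeness, a minimal drain sketch (standard API usage, nothing CUDA-specific):

```cpp
extern "C" {
#include <libavcodec/avcodec.h>
}

// Sketch: after avcodec_send_frame(c, rgb_frame), pull the finished H264
// packets out of the encoder (e.g. to hand them to the RTSP muxer).
static int drainPackets(AVCodecContext* c, AVPacket* pkt)
{
    int ret;
    while ((ret = avcodec_receive_packet(c, pkt)) == 0) {
        // ... send pkt->data / pkt->size out over RTSP here ...
        av_packet_unref(pkt);
    }
    // EAGAIN means "feed me more frames"; EOF means the encoder is flushed.
    return (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) ? 0 : ret;
}
```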
Note: I left most of my code out, since it is not for the public. I may have some details incorrect; this is how I was able to make sense of all the data I gathered over the past month... feel free to correct anything that is wrong. Also, fun fact: during all this my computer crashed and I lost all my initial investigation (everything I didn't check into source control), which includes all the various example code I found around the internet. So if you see something and it's yours, please call it out. This can help others come to the conclusion that I came to.
Shoutout
Big shout out to BtbN at https://webchat.freenode.net/ #ffmpeg; I wouldn't have gotten any of this without their help.

The first thing to check: it may be "bad", but is it running fast enough anyway? It's always nice to be more efficient, but if it works, don't break it.
If there really is a performance problem...
1) Use FFmpeg software encoding only, without hardware assistance. Then you'll only be copying from GPU to CPU once. (If the video encoder is on the GPU and you're sending packets out via RTSP, there's a second GPU-to-CPU copy after encoding.)
2) Look for an NVIDIA (I assume that's the GPU, given you mention nvenc) GL extension to texture formats and/or commands that will perform on-GPU H264 encoding directly from OpenGL buffers.

Related

NVenc's Output Bitstream is not readable

I have one question related to Nvidia's NVenc API. I want to use the API to encode some OpenGL graphics. My problem is that the API reports no error throughout the whole program; everything seems to be fine. But the generated output is not readable by, e.g., VLC. If I try to play the generated file, VLC flashes a black screen for about 0.5 s, then ends the playback.
The video has a length of 0, and the file size seems rather small, too.
Resolution is 1280x720 and the size of a 5-second recording is only 700 KB. Is this realistic?
The flow of the application is as follows:
Render to secondary Framebuffer
Download Framebuffer to one of two PBOs (glReadPixels())
Map the PBO of the previous frame to get a pointer understandable by CUDA.
Call a simple CUDA kernel converting OpenGL's RGBA to ARGB, which should be understandable by NVenc according to this (p. 18). The kernel reads the content of the PBO and writes the converted content into a CUDA array (created with cudaMalloc) which is registered as an InputResource with NVenc.
The content of the converted array gets encoded. A completion event plus the corresponding output bitstream buffer are queued.
A secondary thread listens for the queued output events; if one event is signaled, the output bitstream gets mapped and written to the HDD.
The initialization of the NVenc encoder:
InitParams* ip = new InitParams();
m_initParams = ip;
memset(ip, 0, sizeof(InitParams));
ip->version = NV_ENC_INITIALIZE_PARAMS_VER;
ip->encodeGUID = m_encoderGuid; //Used Codec
ip->encodeWidth = width; // Frame Width
ip->encodeHeight = height; // Frame Height
ip->maxEncodeWidth = 0; // Zero means no dynamic res changes
ip->maxEncodeHeight = 0;
ip->darWidth = width; // Aspect Ratio
ip->darHeight = height;
ip->frameRateNum = 60; // 60 fps
ip->frameRateDen = 1;
ip->reportSliceOffsets = 0; // According to programming guide
ip->enableSubFrameWrite = 0;
ip->presetGUID = m_presetGuid; // Used Preset for Encoder Config
NV_ENC_PRESET_CONFIG presetCfg; // Load the Preset Config
memset(&presetCfg, 0, sizeof(NV_ENC_PRESET_CONFIG));
presetCfg.version = NV_ENC_PRESET_CONFIG_VER;
presetCfg.presetCfg.version = NV_ENC_CONFIG_VER;
CheckApiError(m_apiFunctions.nvEncGetEncodePresetConfig(m_Encoder,
m_encoderGuid, m_presetGuid, &presetCfg));
memcpy(&m_encodingConfig, &presetCfg.presetCfg, sizeof(NV_ENC_CONFIG));
// And add information about Bitrate etc
m_encodingConfig.rcParams.averageBitRate = 500000;
m_encodingConfig.rcParams.maxBitRate = 600000;
m_encodingConfig.rcParams.rateControlMode = NV_ENC_PARAMS_RC_MODE::NV_ENC_PARAMS_RC_CBR;
ip->encodeConfig = &m_encodingConfig;
ip->enableEncodeAsync = 1; // Async Encoding
ip->enablePTD = 1; // Encoder handles picture ordering
Registration of CudaResource
m_cuContext->SetCurrent(); // Make the clients cuCtx current
NV_ENC_REGISTER_RESOURCE res;
memset(&res, 0, sizeof(NV_ENC_REGISTER_RESOURCE));
NV_ENC_REGISTERED_PTR resPtr; // handle to the cuda resource for future use
res.bufferFormat = m_inputFormat; // Format is ARGB
res.height = m_height;
res.width = m_width;
// NOTE: I've set the pitch to the width of the frame, because the resource is a non-pitched
//cudaArray. Is this correct? Pitch = 0 would produce no output.
res.pitch = pitch;
res.resourceToRegister = (void*) (uintptr_t) resourceToRegister; //CUdevptr to resource
res.resourceType =
NV_ENC_INPUT_RESOURCE_TYPE::NV_ENC_INPUT_RESOURCE_TYPE_CUDADEVICEPTR;
res.version = NV_ENC_REGISTER_RESOURCE_VER;
CheckApiError(m_apiFunctions.nvEncRegisterResource(m_Encoder, &res));
m_registeredInputResources.push_back(res.registeredResource);
Encoding
m_cuContext->SetCurrent(); // Make Clients context current
MapInputResource(id); //Map the CudaInputResource
NV_ENC_PIC_PARAMS temp;
memset(&temp, 0, sizeof(NV_ENC_PIC_PARAMS));
temp.version = NV_ENC_PIC_PARAMS_VER;
unsigned int currentBufferAndEvent = m_counter % m_registeredEvents.size(); //Counter is inc'ed in every Frame
temp.bufferFmt = m_currentlyMappedInputBuffer.mappedBufferFmt;
temp.inputBuffer = m_currentlyMappedInputBuffer.mappedResource; //got set by MapInputResource
temp.completionEvent = m_registeredEvents[currentBufferAndEvent];
temp.outputBitstream = m_registeredOutputBuffers[currentBufferAndEvent];
temp.inputWidth = m_width;
temp.inputHeight = m_height;
temp.inputPitch = m_width;
temp.inputTimeStamp = m_counter;
temp.pictureStruct = NV_ENC_PIC_STRUCT_FRAME; // According to samples
temp.qpDeltaMap = NULL;
temp.qpDeltaMapSize = 0;
EventWithId latestEvent(currentBufferAndEvent,
m_registeredEvents[currentBufferAndEvent]);
PushBackEncodeEvent(latestEvent); // Store the Event with its ID in a Queue
CheckApiError(m_apiFunctions.nvEncEncodePicture(m_Encoder, &temp));
m_counter++;
UnmapInputResource(id); // Unmap
Every little hint about where to look is very much appreciated. I'm running out of ideas about what might be wrong.
Thanks a lot!
With the help of hall822 from the NVIDIA forums I managed to solve the issue.
The primary error was that I registered my CUDA resource with a pitch equal to the size of the frame. I'm using a framebuffer renderbuffer to draw my content into. Its data is a plain, unpitched array. My first thought, giving a pitch of zero, failed: the encoder did nothing. The next idea was to set it to the width of the frame; a quarter of the image was encoded.
// NOTE: I've set the pitch to the width of the frame, because the resource is a non-pitched
//cudaArray. Is this correct? Pitch = 0 would produce no output.
res.pitch = pitch;
To answer this question: yes, it is correct. But the pitch is measured in bytes. So because I'm encoding RGBA frames, the correct pitch has to be FRAME_WIDTH * 4.
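As a sanity check, the pitch of a tightly packed (unpitched) buffer is just the width times the bytes per pixel; a tiny helper (name mine, for illustration) makes the unit explicit:

```cpp
#include <cstdint>

// Pitch of a tightly packed (unpitched) frame, in bytes.
// bytesPerPixel is 4 for RGBA/ARGB/BGRA, 3 for RGB24.
constexpr uint32_t packedPitchBytes(uint32_t widthPixels, uint32_t bytesPerPixel)
{
    return widthPixels * bytesPerPixel;
}
// e.g. a 1280-pixel-wide RGBA frame has a pitch of 1280 * 4 = 5120 bytes
```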
The second error was that my color channels were not right (see point 4 in my opening post). The NVIDIA enum says that the encoder expects the channels in ARGB format, but what is actually meant is BGRA, so the alpha channel, which is always 255, polluted the blue channel.
Edit: This may be due to the fact that NVIDIA uses little endian internally. I'm writing my pixel data to a byte array; choosing another type like int32 may allow one to pass actual ARGB data.

How to encode a video from several images generated in a C++ program without writing the separate frame images to disk?

I am writing C++ code where a sequence of N different frames is generated after performing some operations implemented therein. After each frame is completed, I write it to disk as IMG_%d.png, and finally I encode them into a video through ffmpeg using the x264 codec.
The summarized pseudocode of the main part of the program is the following:
std::vector<int> B(width*height*3);
for (i=0; i<N; i++)
{
    // void generateframe(std::vector<int> &, int)
    generateframe(B, i); // Returns different images for different i values.
    sprintf(s, "IMG_%d.png", i+1);
    WriteToDisk(B, s); // void WriteToDisk(std::vector<int>, char[])
}
The problem with this implementation is that the number of desired frames, N, is usually high (N ~ 100000), as is the resolution of the pictures (1920x1080), resulting in an overload of the disk and write cycles of dozens of GB after each execution.
To avoid this, I have been trying to find documentation about passing each image stored in the vector B directly to an encoder such as x264 (without having to write the intermediate image files to disk). Although some interesting topics were found, none of them solved exactly what I want: many of them concern running the encoder on existing image files on disk, while others provide solutions for other programming languages such as Python (here you can find a fully satisfactory solution for that platform).
The pseudocode of what I would like to obtain is something similar to this:
std::vector<int> B(width*height*3);
video_file = open_video("Generated_Video.mp4", ...[encoder options]...);
for (i=0; i<N; i++)
{
    generateframe(B, i+1);
    add_frame(video_file, B);
}
video_file.close();
According to what I have read on related topics, the x264 C++ API might be able to do this, but, as stated above, I did not find a satisfactory answer to my specific question. I tried learning and using the ffmpeg source code directly, but both its low ease of use and compilation issues forced me to discard that possibility, as I am a mere non-professional programmer (I take it just as a hobby, and unluckily I cannot spend that much time learning something so demanding).
Another possible solution that came to my mind is to find a way to call the ffmpeg binary from the C++ code and somehow transfer the image data of each iteration (stored in B) to the encoder, letting it add each frame (that is, not "closing" the video file to write) until the last frame, so that more frames can be added until reaching the N-th one, where the video file will be "closed". In other words, call ffmpeg.exe from the C++ program to write the first frame to a video, but make the encoder "wait" for more frames. Then call ffmpeg again to add the second frame and make the encoder "wait" again, and so on until reaching the last frame, where the video will be finished. However, I do not know how to proceed or whether it is actually possible.
Edit 1:
As suggested in the replies, I have been reading about named pipes and trying to use them in my code. First of all, it should be noted that I am working with Cygwin, so my named pipes are created as they would be under Linux. The modified pseudocode I used (including the corresponding system libraries) is the following:
FILE *fd;
mkfifo("myfifo", 0666);
for (i=0; i<N; i++)
{
    fd = fopen("myfifo", "wb");
    generateframe(B, i+1);
    WriteToPipe(B, fd); // void WriteToPipe(std::vector<int>, FILE *&fd)
    fflush(fd);
    fclose(fd);
}
unlink("myfifo");
WriteToPipe is a slight modification of the previous WriteToDisk function, where I made sure that the write buffer used to send the image data is small enough to fit the pipe's buffering limitations.
Then I compile and write the following command in the Cygwin terminal:
./myprogram | ffmpeg -i pipe:myfifo -c:v libx264 -preset slow -crf 20 Video.mp4
However, it gets stuck at the "fopen" line in the loop when i=0 (that is, the first fopen call). If I had not called ffmpeg, that would be natural, since the server (my program) would be waiting for a client program to connect to the "other side" of the pipe, but that is not the case. It looks like they cannot be connected through the pipe somehow, but I have not been able to find further documentation to overcome this issue. Any suggestion?
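(Aside, not from the original post: the hang is exactly the blocking-open behavior described, because stdout-piping `./myprogram | ffmpeg` never opens the fifo's read end. One possible shape, assuming raw RGB24 frames, is to open the fifo once and have ffmpeg read the fifo itself:)

```cpp
#include <cstdio>
#include <sys/stat.h>   // mkfifo
#include <unistd.h>     // unlink

// Sketch: write raw frames into a named pipe opened exactly once.
// Reader side (run first or concurrently), hypothetical parameters:
//   ffmpeg -f rawvideo -pix_fmt rgb24 -s 1920x1080 -r 25 -i myfifo \
//          -c:v libx264 -preset slow -crf 20 Video.mp4
int streamFrames(int N, int frameBytes, const unsigned char* (*generate)(int))
{
    mkfifo("myfifo", 0666);
    FILE* fd = fopen("myfifo", "wb"); // blocks until ffmpeg opens the read end
    if (!fd)
        return -1;
    for (int i = 0; i < N; i++) {
        const unsigned char* frame = generate(i);
        fwrite(frame, 1, frameBytes, fd);
    }
    fclose(fd);       // closing signals EOF; ffmpeg then finalizes the file
    unlink("myfifo");
    return 0;
}
```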
After some intense struggle, I finally managed to make it work after learning a bit about how to use the FFmpeg and libx264 C APIs for my specific purpose, thanks to the useful information some users provided on this site and some others, as well as some of FFmpeg's documentation examples. For the sake of illustration, the details are presented next.
First of all, the libx264 C library was compiled and, after that, FFmpeg with the configure options --enable-gpl --enable-libx264. Now let us go to the coding. The relevant part of the code that achieves the requested purpose is the following:
Includes:
#include <stdint.h>
extern "C"{
#include <x264.h>
#include <libswscale/swscale.h>
#include <libavcodec/avcodec.h>
#include <libavutil/mathematics.h>
#include <libavformat/avformat.h>
#include <libavutil/opt.h>
}
LDFLAGS on Makefile:
-lx264 -lswscale -lavutil -lavformat -lavcodec
Inner code (for the sake of simplicity, error checking is omitted and variables are declared when needed instead of at the beginning, for better understanding):
av_register_all(); // Loads the whole database of available codecs and formats.
struct SwsContext* convertCtx = sws_getContext(width, height, AV_PIX_FMT_RGB24, width, height, AV_PIX_FMT_YUV420P, SWS_FAST_BILINEAR, NULL, NULL, NULL); // Preparing to convert my generated RGB images to YUV frames.
// Preparing the data concerning the format and codec in order to write properly the header, frame data and end of file.
char fmtext[] = "mp4";
char filename[64]; // note: must be a real buffer, not an uninitialized pointer
sprintf(filename, "GeneratedVideo.%s", fmtext);
AVOutputFormat * fmt = av_guess_format(fmtext, NULL, NULL);
AVFormatContext *oc = NULL;
avformat_alloc_output_context2(&oc, NULL, NULL, filename);
AVStream * stream = avformat_new_stream(oc, 0);
AVCodec *codec=NULL;
AVCodecContext *c= NULL;
int ret;
AVDictionary *opt = NULL;
codec = avcodec_find_encoder_by_name("libx264");
// Setting up the codec:
av_dict_set(&opt, "preset", "slow", 0);
av_dict_set(&opt, "crf", "20", 0);
avcodec_get_context_defaults3(stream->codec, codec);
c=avcodec_alloc_context3(codec);
c->width = width;
c->height = height;
c->pix_fmt = AV_PIX_FMT_YUV420P;
// Setting up the format, its stream(s), linking with the codec(s) and write the header:
if (oc->oformat->flags & AVFMT_GLOBALHEADER) // Some formats require a global header.
    c->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
avcodec_open2( c, codec, &opt );
av_dict_free(&opt);
stream->time_base=(AVRational){1, 25};
stream->codec=c; // Once the codec is set up, we need to let the container know which codec are the streams using, in this case the only (video) stream.
av_dump_format(oc, 0, filename, 1);
avio_open(&oc->pb, filename, AVIO_FLAG_WRITE);
ret=avformat_write_header(oc, &opt);
av_dict_free(&opt);
// Preparing the containers of the frame data:
AVFrame *rgbpic, *yuvpic;
// Allocating memory for each RGB frame, which will be lately converted to YUV:
rgbpic=av_frame_alloc();
rgbpic->format=AV_PIX_FMT_RGB24;
rgbpic->width=width;
rgbpic->height=height;
ret=av_frame_get_buffer(rgbpic, 1);
// Allocating memory for each conversion output YUV frame:
yuvpic=av_frame_alloc();
yuvpic->format=AV_PIX_FMT_YUV420P;
yuvpic->width=width;
yuvpic->height=height;
ret=av_frame_get_buffer(yuvpic, 1);
// After the format, code and general frame data is set, we write the video in the frame generation loop:
// std::vector<uint8_t> B(width*height*3);
The above commented-out vector has the same structure as the one I showed in my question; however, the RGB data is stored on the AVFrames in a specific way. Therefore, for the sake of exposition, let us assume we instead have a pointer to a structure of the form uint8_t[3] Matrix(int, int), whose way of accessing the color values of the pixel at a given coordinate (x, y) is Matrix(x, y)->Red, Matrix(x, y)->Green and Matrix(x, y)->Blue, in order to get, respectively, the red, green and blue values of the coordinate (x, y). The first argument stands for the horizontal position, from left to right as x increases, and the second for the vertical position, from top to bottom as y increases.
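The Matrix accessor described above is hypothetical; a minimal concrete stand-in (illustration only, names mine) could look like this:

```cpp
#include <cstdint>
#include <vector>

// Minimal stand-in for the Matrix(x, y)->Red/Green/Blue accessor
// described above.
struct RGB { uint8_t Red, Green, Blue; };

class Matrix {
public:
    Matrix(int width, int height) : w(width), pixels(width * height) {}
    // x: left-to-right, y: top-to-bottom, matching the convention in the text
    RGB* operator()(int x, int y) { return &pixels[y * w + x]; }
private:
    int w;
    std::vector<RGB> pixels;
};
```

Usage then matches the loop below: `Matrix B(width, height); B(x, y)->Red = 255;`.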
That being said, the for loop to transfer the data, encode and write each frame would be the following:
Matrix B(width, height);
int got_output;
AVPacket pkt;
for (i=0; i<N; i++)
{
    generateframe(B, i); // This one is the function that generates a different frame for each i.
    // The AVFrame data will be stored as RGBRGBRGB... row-wise, from left to right and from top to bottom, hence we have to proceed as follows:
    for (y=0; y<height; y++)
    {
        for (x=0; x<width; x++)
        {
            // rgbpic->linesize[0] is equal to width*3 here (RGB24 with align=1).
            rgbpic->data[0][y*rgbpic->linesize[0]+3*x] = B(x, y)->Red;
            rgbpic->data[0][y*rgbpic->linesize[0]+3*x+1] = B(x, y)->Green;
            rgbpic->data[0][y*rgbpic->linesize[0]+3*x+2] = B(x, y)->Blue;
        }
    }
    sws_scale(convertCtx, rgbpic->data, rgbpic->linesize, 0, height, yuvpic->data, yuvpic->linesize); // Not actually scaling anything, just converting the RGB data to YUV and storing it in yuvpic.
    av_init_packet(&pkt);
    pkt.data = NULL;
    pkt.size = 0;
    yuvpic->pts = i; // The frame PTS values are just in a reference unit, unrelated to the format we are using. We set them, for instance, to the corresponding frame number.
    ret = avcodec_encode_video2(c, &pkt, yuvpic, &got_output);
    if (got_output)
    {
        fflush(stdout);
        av_packet_rescale_ts(&pkt, (AVRational){1, 25}, stream->time_base); // We set the packet PTS and DTS taking into account our FPS (second argument) and the time base our selected format uses (third argument).
        pkt.stream_index = stream->index;
        printf("Write frame %6d (size=%6d)\n", i, pkt.size);
        av_interleaved_write_frame(oc, &pkt); // Write the encoded frame to the mp4 file.
        av_packet_unref(&pkt);
    }
}
// Writing the delayed frames:
for (got_output = 1; got_output; i++) {
    ret = avcodec_encode_video2(c, &pkt, NULL, &got_output);
    if (got_output) {
        fflush(stdout);
        av_packet_rescale_ts(&pkt, (AVRational){1, 25}, stream->time_base);
        pkt.stream_index = stream->index;
        printf("Write frame %6d (size=%6d)\n", i, pkt.size);
        av_interleaved_write_frame(oc, &pkt);
        av_packet_unref(&pkt);
    }
}
av_write_trailer(oc); // Writing the end of the file.
if (!(fmt->flags & AVFMT_NOFILE))
    avio_closep(&oc->pb); // Closing the file (avio_closep takes a pointer to the AVIOContext pointer).
avcodec_close(stream->codec);
// Freeing all the allocated memory:
sws_freeContext(convertCtx);
av_frame_free(&rgbpic);
av_frame_free(&yuvpic);
avformat_free_context(oc);
Side notes:
For future reference, as the available information on the net concerning the timestamps (PTS/DTS) looks so confusing, I will also explain how I managed to solve the issues by setting the proper values. Setting these values incorrectly caused the output size to be much bigger than the one obtained through the ffmpeg command-line tool, because the frame data was being redundantly written at smaller time intervals than the ones actually set by the FPS.
First of all, it should be noted that when encoding there are two kinds of timestamps: one associated with the frame (PTS) (pre-encoding stage) and two associated with the packet (PTS and DTS) (post-encoding stage). In the first case, the frame PTS values can be assigned using a custom unit of reference (with the only restriction that they must be equally spaced if one wants constant FPS), so one can take, for instance, the frame number as we did in the above code. In the second case, we have to take into account the following parameters:
The time base of the output format container, in our case mp4 (=12800 Hz), whose information is held in stream->time_base.
The desired FPS of the video.
Whether the encoder generates B-frames or not (if it does not, the PTS and DTS values for the packet must be set the same, but it is more complicated if it does, as in this example). See this answer to another related question for more references.
The key here is that, luckily, it is not necessary to struggle with the computation of these quantities, as libav provides a function to compute the correct timestamps for the packet given the aforementioned data:
av_packet_rescale_ts(AVPacket *pkt, AVRational FPS, AVRational time_base)
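Numerically, the rescaling is simple; written out by hand (my own illustration, truncating division, whereas libav rounds to nearest by default):

```cpp
#include <cstdint>

// What av_packet_rescale_ts does per timestamp: convert ts from the
// src_num/src_den time base to the dst_num/dst_den time base.
int64_t rescaleTs(int64_t ts, int srcNum, int srcDen, int dstNum, int dstDen)
{
    return ts * srcNum * dstDen / (int64_t(srcDen) * dstNum);
}
// Frame 25 at 25 fps (time base 1/25) lands at t = 1 s, i.e. tick 12800
// in mp4's 1/12800 time base.
```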
Thanks to these considerations, I was finally able to generate a sane output container and essentially the same compression rate as the one obtained using the command-line tool. These were the two remaining issues before investigating more deeply how the format header and trailer, and the timestamps, are properly set.
Thanks for your excellent work, @ksb496!
One minor improvement:
c=avcodec_alloc_context3(codec);
should be better written as:
c = stream->codec;
to avoid a memory leak.
If you don't mind, I've uploaded the complete ready-to-deploy library onto GitHub: https://github.com/apc-llc/moviemaker-cpp.git
Thanks to ksb496 I managed to do this task, but in my case I needed to change some code to make it work as expected. I thought it might help others, so I decided to share it (with a two-year delay :D).
I had an RGB buffer filled by a DirectShow sample grabber that I needed to turn into a video. The RGB-to-YUV conversion from the given answer didn't do the job for me. I did it like this:
int stride = m_width * 3;
int index = 0;
for (int y = 0; y < m_height; y++) {
    for (int x = 0; x < stride; x++) {
        int j = (size - ((y + 1)*stride)) + x;
        m_rgbpic->data[0][j] = data[index];
        ++index;
    }
}
The data variable here is my RGB buffer (a simple BYTE*) and size is the data buffer size in bytes. It starts filling the RGB AVFrame from bottom-left to top-right.
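Extracted into a standalone function (names mine, same row-flip logic as above), the copy amounts to:

```cpp
#include <cstdint>

// Copy a bottom-up RGB24 buffer (as DirectShow delivers it) into a
// top-down destination, row by row.
void flipRowsBottomUp(const uint8_t* src, uint8_t* dst, int width, int height)
{
    const int stride = width * 3;          // bytes per row
    const int size = stride * height;      // total bytes
    int index = 0;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < stride; x++) {
            // Destination row for source row y is counted from the bottom.
            int j = (size - (y + 1) * stride) + x;
            dst[j] = src[index++];
        }
    }
}
```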
The other thing is that my version of FFmpeg didn't have the av_packet_rescale_ts function. It's the latest version, but the FFmpeg docs don't say this function is deprecated anywhere, so I guess this might be the case for Windows only. Anyway, I used av_rescale_q instead, which does the same job, like this:
AVPacket pkt;
pkt.pts = av_rescale_q(pkt.pts, { 1, 25 }, m_stream->time_base);
And the last thing: using this format conversion, I needed to change my swsContext to BGR24 instead of RGB24, like this:
m_convert_ctx = sws_getContext(width, height, AV_PIX_FMT_BGR24, width, height,
AV_PIX_FMT_YUV420P, SWS_FAST_BILINEAR, nullptr, nullptr, nullptr);
avcodec_encode_video2 and avcodec_encode_audio2 seem to be deprecated. The current version of FFmpeg (4.2) has a new API: avcodec_send_frame and avcodec_receive_packet.
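A sketch of what the deprecated avcodec_encode_video2 pattern from the answer above becomes with the send/receive API (my untested adaptation, not from the original answer):

```cpp
extern "C" {
#include <libavcodec/avcodec.h>
}

// Replaces the avcodec_encode_video2(c, &pkt, yuvpic, &got_output) pattern:
// push one frame (or nullptr to flush), then pull every finished packet.
static int encodeAndWrite(AVCodecContext* c, AVFrame* frame,
                          AVFormatContext* oc, AVStream* stream)
{
    int ret = avcodec_send_frame(c, frame);
    if (ret < 0)
        return ret;
    AVPacket* pkt = av_packet_alloc();
    while ((ret = avcodec_receive_packet(c, pkt)) == 0) {
        av_packet_rescale_ts(pkt, c->time_base, stream->time_base);
        pkt->stream_index = stream->index;
        av_interleaved_write_frame(oc, pkt);
    }
    av_packet_free(&pkt);
    return (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) ? 0 : ret;
}
```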

Problems understanding DXGI DirectX 11 Desktop Duplication to get a buffer or array

I want to understand DXGI Desktop Duplication. I have read a lot, and this is code I copied from parts of the DesktopDuplication sample on the Microsoft website. My plan is to get the buffer or array from the DesktopImage, because I want to make a new texture for another program. I hope somebody can explain to me what I have to do to get it.
void DesktopDublication::GetFrame(_Out_ FRAME_DATA* Data, _Out_ bool* Timeout)
{
    IDXGIResource* DesktopResource = nullptr;
    DXGI_OUTDUPL_FRAME_INFO FrameInfo;

    // Get new frame
    HRESULT hr = m_DeskDupl->AcquireNextFrame(500, &FrameInfo, &DesktopResource);
    if (hr == DXGI_ERROR_WAIT_TIMEOUT)
    {
        *Timeout = true;
        return; // no new frame this time; don't fall through with a null resource
    }
    *Timeout = false;
    if (FAILED(hr))
    {
        // (error handling omitted)
    }

    // If still holding old frame, destroy it
    if (m_AcquiredDesktopImage)
    {
        m_AcquiredDesktopImage->Release();
        m_AcquiredDesktopImage = nullptr;
    }

    // QI for ID3D11Texture2D
    hr = DesktopResource->QueryInterface(__uuidof(ID3D11Texture2D), reinterpret_cast<void **>(&m_AcquiredDesktopImage));
    DesktopResource->Release();
    DesktopResource = nullptr;
    if (FAILED(hr))
    {
        // (error handling omitted)
    }

    // Get metadata
    if (FrameInfo.TotalMetadataBufferSize)
    {
        // Old buffer too small
        if (FrameInfo.TotalMetadataBufferSize > m_MetaDataSize)
        {
            if (m_MetaDataBuffer)
            {
                delete[] m_MetaDataBuffer;
                m_MetaDataBuffer = nullptr;
            }
            m_MetaDataBuffer = new (std::nothrow) BYTE[FrameInfo.TotalMetadataBufferSize];
            if (!m_MetaDataBuffer)
            {
                m_MetaDataSize = 0;
                Data->MoveCount = 0;
                Data->DirtyCount = 0;
            }
            m_MetaDataSize = FrameInfo.TotalMetadataBufferSize;
        }

        UINT BufSize = FrameInfo.TotalMetadataBufferSize;

        // Get move rectangles
        hr = m_DeskDupl->GetFrameMoveRects(BufSize, reinterpret_cast<DXGI_OUTDUPL_MOVE_RECT*>(m_MetaDataBuffer), &BufSize);
        if (FAILED(hr))
        {
            Data->MoveCount = 0;
            Data->DirtyCount = 0;
        }
        Data->MoveCount = BufSize / sizeof(DXGI_OUTDUPL_MOVE_RECT);

        BYTE* DirtyRects = m_MetaDataBuffer + BufSize;
        BufSize = FrameInfo.TotalMetadataBufferSize - BufSize;

        // Get dirty rectangles
        hr = m_DeskDupl->GetFrameDirtyRects(BufSize, reinterpret_cast<RECT*>(DirtyRects), &BufSize);
        if (FAILED(hr))
        {
            Data->MoveCount = 0;
            Data->DirtyCount = 0;
        }
        Data->DirtyCount = BufSize / sizeof(RECT);
        Data->MetaData = m_MetaDataBuffer;
    }

    Data->Frame = m_AcquiredDesktopImage;
    Data->FrameInfo = FrameInfo;
}
If I'm understanding you correctly, you want to get the current desktop image, duplicate it into a private texture, and then render that private texture onto your window. I would start by reading up on Direct3D 11 and learning how to render a scene, as you will need D3D to do anything with the texture object you get from DXGI. This, this, and this can get you started on D3D11. I would also spend some time reading through the source of the sample you copied your code from, as it completely explains how to do this. Here is the link to the full source code for that sample.
To actually get the texture data and render it out, you need to do the following:
1) Create a D3D11 Device object and a Device Context.
2) Write and compile a Vertex and Pixel shader for the graphics card, then load them into your application.
3) Create an Input Layout object and set it to the device.
4) Initialize the required Blend, Depth-Stencil, and Rasterizer states for the device.
5) Create a Texture object and a Shader Resource View object.
6) Acquire the Desktop Duplication texture using the above code.
7) Use CopyResource to copy the data into your texture.
8) Render that texture to the screen.
This will capture all data displayed on one of the desktops to your texture. It does not do processing on the dirty rects of the desktop. It does not do processing on moved regions. This is bare bones 'capture the desktop and display it elsewhere' code.
If you want to get more in depth, read the linked resources and study the sample code, as the sample basically does what you're asking for.
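For completeness: the metadata buffer filled in by the code above is packed as Data->MoveCount move rects followed by Data->DirtyCount dirty rects. A minimal sketch of walking that layout, e.g. to total the dirty area (the structs below are stand-ins for the Windows definitions from windef.h/dxgi1_2.h so the snippet is self-contained, and TotalDirtyArea is my own helper, not part of the sample):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-ins for the Windows definitions (windef.h / dxgi1_2.h); same shape as
// the real structs, present only so this sketch compiles on its own.
struct RECT { int32_t left, top, right, bottom; };
struct DXGI_OUTDUPL_MOVE_RECT { struct { int32_t x, y; } SourcePoint; RECT DestinationRect; };

// The metadata buffer is packed: moveCount move rects first, then dirtyCount RECTs.
size_t TotalDirtyArea(const uint8_t* metaData, size_t moveCount, size_t dirtyCount)
{
    const RECT* dirty = reinterpret_cast<const RECT*>(
        metaData + moveCount * sizeof(DXGI_OUTDUPL_MOVE_RECT));
    size_t area = 0;
    for (size_t i = 0; i < dirtyCount; ++i)
        area += size_t(dirty[i].right - dirty[i].left) * size_t(dirty[i].bottom - dirty[i].top);
    return area;
}
```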
Since tacking this onto my last answer didn't feel quite right, I decided to create a second.
If you want to read the desktop data to a file, you need a D3D11 Device object, a texture object with the D3D11_USAGE_STAGING flag set, and a method of converting the RGBA pixel data of the desktop texture to whatever it is you want. The basic procedure is a simplified version of the one in my original answer:
1) Create a D3D11 Device object and a Device Context.
2) Create a Staging Texture with the same format as the Desktop Texture.
3) Use CopyResource to copy the Desktop Texture into your Staging Texture.
4) Use ID3D11DeviceContext::Map() to get a pointer to the data contained in the Staging Texture.
Make sure you know how Map works and make sure you can write out image files from a single binary stream. There may also be padding in the image buffer, so be aware you may also need to filter that out. Additionally, make sure you Unmap the buffer instead of calling free, as the buffer given to you almost certainly does not belong to the CRT.
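The padding mentioned above comes from the row pitch that Map() reports: each row in the mapped buffer may be wider than width * bytesPerPixel. A small self-contained sketch of stripping it (my own helper, not part of the sample; on Windows you would pass mapped.pData and mapped.RowPitch):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copy a pitched image (as returned by Map) into a tightly packed buffer.
// rowPitch can be larger than width * bytesPerPixel because of driver padding.
std::vector<uint8_t> RemoveRowPadding(const uint8_t* mapped, size_t rowPitch,
                                      size_t width, size_t height, size_t bytesPerPixel)
{
    const size_t tightRow = width * bytesPerPixel;
    std::vector<uint8_t> packed(tightRow * height);
    for (size_t y = 0; y < height; ++y)
        std::memcpy(packed.data() + y * tightRow, mapped + y * rowPitch, tightRow);
    return packed;
}
```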

How do I grab frames from a video stream on Windows 8 modern apps?

I am trying to extract images out of an mp4 video stream. After looking stuff up, it seems like the proper way of doing that is to use Media Foundation in C++ and read the frames out of the stream.
There's very little by way of documentation and samples, but after some digging, it seems like some people have had success in doing this by reading frames into a texture and copying the content of that texture to a memory-readable texture (I am not even sure if I am using the correct terms here). Trying what I found though gives me errors and I am probably doing a bunch of stuff wrong.
Here's a short piece of code from where I try to do that (project itself attached at the bottom).
ComPtr<ID3D11Texture2D> spTextureDst;
MEDIA::ThrowIfFailed(
    m_spDX11SwapChain->GetBuffer(0, IID_PPV_ARGS(&spTextureDst))
);

auto rcNormalized = MFVideoNormalizedRect();
rcNormalized.left = 0;
rcNormalized.right = 1;
rcNormalized.top = 0;
rcNormalized.bottom = 1;
MEDIA::ThrowIfFailed(
    m_spMediaEngine->TransferVideoFrame(m_spRenderTexture.Get(), &rcNormalized, &m_rcTarget, &m_bkgColor)
);

//copy the render target texture to the readable texture
m_spDX11DeviceContext->CopySubresourceRegion(m_spCopyTexture.Get(), 0, 0, 0, 0, m_spRenderTexture.Get(), 0, NULL);
m_spDX11DeviceContext->Flush();

//map the readable texture
D3D11_MAPPED_SUBRESOURCE mapped = {0};
m_spDX11DeviceContext->Map(m_spCopyTexture.Get(), 0, D3D11_MAP_READ, 0, &mapped);
void* buffer = ::CoTaskMemAlloc(600 * 400 * 3);
memcpy(buffer, mapped.pData, 600 * 400 * 3);
//unmap so we can copy during next update
m_spDX11DeviceContext->Unmap(m_spCopyTexture.Get(), 0);

// and then present it to the screen
MEDIA::ThrowIfFailed(
    m_spDX11SwapChain->Present(1, 0)
);
}
The error I get is:
First-chance exception at 0x76814B32 in App1.exe: Microsoft C++ exception: Platform::InvalidArgumentException ^ at memory location 0x07AFF60C. HRESULT:0x80070057
I am not really sure how to pursue it further it since, like I said, there's very little docs about it.
Here's the modified sample I am working off of. This question is specific to WinRT (Windows 8 apps).
UPDATE success!! see edit at bottom
Some partial success, but maybe enough to answer your question. Please read on.
On my system, debugging the exception showed that the OnTimer() function failed when attempting to call TransferVideoFrame(). The error it gave was InvalidArgumentException.
So, a bit of Googling led to my first discovery - there is apparently a bug in NVIDIA drivers - which means the video playback seems to fail with 11 and 10 feature levels.
So my first change was in function CreateDX11Device() as follows:
static const D3D_FEATURE_LEVEL levels[] = {
    /*
    D3D_FEATURE_LEVEL_11_1,
    D3D_FEATURE_LEVEL_11_0,
    D3D_FEATURE_LEVEL_10_1,
    D3D_FEATURE_LEVEL_10_0,
    */
    D3D_FEATURE_LEVEL_9_3,
    D3D_FEATURE_LEVEL_9_2,
    D3D_FEATURE_LEVEL_9_1
};
Now TransferVideoFrame() still fails, but gives E_FAIL (as an HRESULT) instead of an invalid argument.
More Googling led to my second discovery -
Which was an example showing use of TransferVideoFrame() without using CreateTexture2D() to pre-create the texture. I see you already had some code in OnTimer() similar to this but which was not used, so I guess you'd found the same link.
Anyway, I now used this code to get the video frame:
ComPtr <ID3D11Texture2D> spTextureDst;
m_spDX11SwapChain->GetBuffer (0, IID_PPV_ARGS (&spTextureDst));
m_spMediaEngine->TransferVideoFrame (spTextureDst.Get (), nullptr, &m_rcTarget, &m_bkgColor);
After doing this, I see that TransferVideoFrame() succeeds (good!) but calling Map() on your copied texture - m_spCopyTexture - fails because that texture wasn't created with CPU read access.
So, I just used your read/write m_spRenderTexture as the target of the copy instead because that has the correct flags and, due to the previous change, I was no longer using it.
//copy the render target texture to the readable texture
m_spDX11DeviceContext->CopySubresourceRegion(m_spRenderTexture.Get(), 0, 0, 0, 0, spTextureDst.Get(), 0, NULL);
m_spDX11DeviceContext->Flush();

//map the readable texture
D3D11_MAPPED_SUBRESOURCE mapped = {0};
HRESULT hr = m_spDX11DeviceContext->Map(m_spRenderTexture.Get(), 0, D3D11_MAP_READ, 0, &mapped);
void* buffer = ::CoTaskMemAlloc(176 * 144 * 3);
memcpy(buffer, mapped.pData, 176 * 144 * 3);
//unmap so we can copy during next update
m_spDX11DeviceContext->Unmap(m_spRenderTexture.Get(), 0);
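One caveat with the memcpy above: DXGI_FORMAT_B8G8R8A8_UNORM is 4 bytes per pixel and the mapped rows are mapped.RowPitch bytes apart, so copying width * height * 3 bytes straight out can read the wrong data. A hedged sketch of a pitch-aware BGRA-to-RGB24 repack (my own helper, not from the sample):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Repack a pitched BGRA image (4 bytes/pixel, as mapped from a
// DXGI_FORMAT_B8G8R8A8_UNORM staging texture) into a tight RGB24 buffer.
std::vector<uint8_t> BgraToRgb24(const uint8_t* src, size_t rowPitch,
                                 size_t width, size_t height)
{
    std::vector<uint8_t> rgb(width * height * 3);
    for (size_t y = 0; y < height; ++y) {
        const uint8_t* row = src + y * rowPitch;
        for (size_t x = 0; x < width; ++x) {
            rgb[(y * width + x) * 3 + 0] = row[x * 4 + 2]; // R
            rgb[(y * width + x) * 3 + 1] = row[x * 4 + 1]; // G
            rgb[(y * width + x) * 3 + 2] = row[x * 4 + 0]; // B
        }
    }
    return rgb;
}
```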
Now, on my system, the OnTimer() function does not fail. Video frames are rendered to the texture and the pixel data is copied out successfully to the memory buffer.
Before looking to see if there are further problems, maybe this is a good time to see if you can make the same progress as I have so far. If you comment on this answer with more info, I will edit the answer to add any more help if possible.
EDIT
Changes made to texture description in FramePlayer::CreateBackBuffers()
//make first texture cpu readable
D3D11_TEXTURE2D_DESC texDesc = {0};
texDesc.Width = 176;
texDesc.Height = 144;
texDesc.MipLevels = 1;
texDesc.ArraySize = 1;
texDesc.Format = DXGI_FORMAT_B8G8R8A8_UNORM;
texDesc.SampleDesc.Count = 1;
texDesc.SampleDesc.Quality = 0;
texDesc.Usage = D3D11_USAGE_STAGING;
texDesc.BindFlags = 0;
texDesc.CPUAccessFlags = D3D11_CPU_ACCESS_READ | D3D11_CPU_ACCESS_WRITE;
texDesc.MiscFlags = 0;
MEDIA::ThrowIfFailed(m_spDX11Device->CreateTexture2D(&texDesc,NULL,&m_spRenderTexture));
Note also that there's a memory leak that needs to be cleared up sometime (I'm sure you're aware) - the memory allocated in the following line is never freed:
void* buffer = ::CoTaskMemAlloc(176 * 144 * 3); // sizes changed for my test
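One way to close that leak is to give the buffer an owner. A hedged sketch using std::unique_ptr with a custom deleter (shown portably with malloc/free; on Windows you would allocate with ::CoTaskMemAlloc and have the deleter call ::CoTaskMemFree instead):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

// Custom deleter so the frame buffer is freed automatically when it goes out
// of scope. Portable version with malloc/free; pair ::CoTaskMemAlloc with a
// deleter calling ::CoTaskMemFree for the COM allocator.
struct FrameFree {
    void operator()(unsigned char* p) const { std::free(p); }
};

using FrameBuffer = std::unique_ptr<unsigned char[], FrameFree>;

FrameBuffer AllocFrame(size_t width, size_t height, size_t bytesPerPixel)
{
    return FrameBuffer(static_cast<unsigned char*>(std::malloc(width * height * bytesPerPixel)));
}
```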
SUCCESS
I have now succeeded in saving an individual frame, this time without the use of the copy texture.
First, I downloaded the latest version of the DirectXTex Library, which provides DX11 texture helper functions, for example to extract an image from a texture and to save to file. The instructions for adding the DirectXTex library to your solution as an existing project need to be followed carefully, taking note of the changes needed for Windows 8 Store Apps.
Once the above library is included, referenced, and built, add the following #include's to FramePlayer.cpp
#include "..\DirectXTex\DirectXTex.h" // nb - use the relative path you copied to
#include <wincodec.h>
Finally, the central section of code in FramePlayer::OnTimer() needs to be similar to the following. You will see I just save to the same filename each time, so this will need amending to add e.g. a frame number to the name.
// new frame available at the media engine so get it
ComPtr<ID3D11Texture2D> spTextureDst;
MEDIA::ThrowIfFailed(m_spDX11SwapChain->GetBuffer(0, IID_PPV_ARGS(&spTextureDst)));
auto rcNormalized = MFVideoNormalizedRect();
rcNormalized.left = 0;
rcNormalized.right = 1;
rcNormalized.top = 0;
rcNormalized.bottom = 1;
MEDIA::ThrowIfFailed(m_spMediaEngine->TransferVideoFrame(spTextureDst.Get(), &rcNormalized, &m_rcTarget, &m_bkgColor));
// capture an image from the DX11 texture
DirectX::ScratchImage pImage;
HRESULT hr = DirectX::CaptureTexture(m_spDX11Device.Get(), m_spDX11DeviceContext.Get(), spTextureDst.Get(), pImage);
if (SUCCEEDED(hr))
{
    // get the image object from the wrapper
    const DirectX::Image *pRealImage = pImage.GetImage(0, 0, 0);
    // set some place to save the image frame
    StorageFolder ^dataFolder = ApplicationData::Current->LocalFolder;
    Platform::String ^szPath = dataFolder->Path + "\\frame.png";
    // save the image to file
    hr = DirectX::SaveToWICFile(*pRealImage, DirectX::WIC_FLAGS_NONE, GUID_ContainerFormatPng, szPath->Data());
}
// and then present it to the screen
MEDIA::ThrowIfFailed(m_spDX11SwapChain->Present(1, 0));
I don't have time right now to take this any further but I'm very pleased with what I have achieved so far :-))
Can you take a fresh look and update your results in comments?
Look at the Video Thumbnail Sample and the Source Reader documentation.
You can find sample code under SDK Root\Samples\multimedia\mediafoundation\VideoThumbnail
I think OpenCV may help you.
OpenCV offers api to capture frames from camera or video files.
You can download it here http://opencv.org/downloads.html.
The following is a demo I wrote with "OpenCV 2.3.1".
#include "opencv2/opencv.hpp"
using namespace cv;

int main()
{
    VideoCapture cap("demo.avi"); // open a video to capture
    if (!cap.isOpened())          // check if it succeeded
        return -1;
    Mat frame;
    namedWindow("Demo", CV_WINDOW_NORMAL);
    // loop to capture frames and show them in the window
    while (1) {
        cap >> frame;
        if (frame.empty())
            break;
        imshow("Demo", frame);
        if (waitKey(33) >= 0) // pause 33ms every frame
            break;
    }
    return 0;
}

NVIDIA CUDA Video Encoder (NVCUVENC) input from device texture array

I am modifying CUDA Video Encoder (NVCUVENC) encoding sample found in SDK samples pack so that the data comes not from external yuv files (as is done in the sample ) but from cudaArray which is filled from texture.
So the key API method that encodes the frame is:
int NVENCAPI NVEncodeFrame(NVEncoder hNVEncoder, NVVE_EncodeFrameParams *pFrmIn, unsigned long flag, void *pData);
If I get it right, the param:
CUdeviceptr dptr_VideoFrame
is supposed to pass the data to encode. But I really haven't understood how to connect it with some texture data on the GPU. The sample source code is very vague about it, as it works with CPU yuv file input.
For example in main.cpp , lines 555 -560 there is following block:
// If dptrVideoFrame is NULL, then we assume that frames come from system memory, otherwise it comes from GPU memory
// VideoEncoder.cpp, EncodeFrame() will automatically copy it to GPU Device memory, if GPU device input is specified
if (pCudaEncoder->EncodeFrame(efparams, dptrVideoFrame, cuCtxLock) == false)
{
printf("\nEncodeFrame() failed to encode frame\n");
}
So, from the comment, it seems like dptrVideoFrame should be filled with yuv data coming from the device to encode the frame. But there is no place where it is explained how to do so.
UPDATE:
I would like to share some findings. First, I managed to encode data from the frame buffer texture. The problem now is that the output video is a mess.
That is the desired result: (screenshot omitted)
Here is what I do :
On the OpenGL side I have 2 custom FBOs: the first gets the scene rendered normally into it. Then the texture from the first FBO is used to render a screen quad into the second FBO, doing RGB -> YUV conversion in the fragment shader.
The texture attached to second FBO is mapped then to CUDA resource.
Then I encode the current texture like this:
void CUDAEncoder::Encode(){
    NVVE_EncodeFrameParams efparams;
    efparams.Height = sEncoderParams.iOutputSize[1];
    efparams.Width = sEncoderParams.iOutputSize[0];
    efparams.Pitch = (sEncoderParams.nDeviceMemPitch ? sEncoderParams.nDeviceMemPitch : sEncoderParams.iOutputSize[0]);
    efparams.PictureStruc = (NVVE_PicStruct)sEncoderParams.iPictureType;
    efparams.SurfFmt = (NVVE_SurfaceFormat)sEncoderParams.iSurfaceFormat;
    efparams.progressiveFrame = (sEncoderParams.iSurfaceFormat == 3) ? 1 : 0;
    efparams.repeatFirstField = 0;
    efparams.topfieldfirst = (sEncoderParams.iSurfaceFormat == 1) ? 1 : 0;

    if (_curFrame > _framesTotal) {
        efparams.bLast = 1;
    } else {
        efparams.bLast = 0;
    }

    //----------- get cuda array from the texture resource -------------//
    checkCudaErrorsDrv(cuGraphicsMapResources(1, &_cutexResource, NULL));
    checkCudaErrorsDrv(cuGraphicsSubResourceGetMappedArray(&_cutexArray, _cutexResource, 0, 0));

    /////////// copy data into dptrVideoFrame //////////
    // LUMA, based on the CUDA SDK sample
    CUDA_MEMCPY2D pcopy;
    memset((void *)&pcopy, 0, sizeof(pcopy));
    pcopy.srcXInBytes = 0;
    pcopy.srcY = 0;
    pcopy.srcHost = NULL;
    pcopy.srcDevice = 0;
    pcopy.srcPitch = efparams.Width;
    pcopy.srcArray = _cutexArray; // SOME DEVICE ARRAY!!! <--- to figure out how to fill this
    // destination
    pcopy.dstXInBytes = 0;
    pcopy.dstY = 0;
    pcopy.dstHost = 0;
    pcopy.dstArray = 0;
    pcopy.dstDevice = dptrVideoFrame;
    pcopy.dstPitch = sEncoderParams.nDeviceMemPitch;
    pcopy.WidthInBytes = sEncoderParams.iInputSize[0];
    pcopy.Height = sEncoderParams.iInputSize[1];
    pcopy.srcMemoryType = CU_MEMORYTYPE_ARRAY;
    pcopy.dstMemoryType = CU_MEMORYTYPE_DEVICE;

    // CHROMA, based on the CUDA SDK sample
    CUDA_MEMCPY2D pcChroma;
    memset((void *)&pcChroma, 0, sizeof(pcChroma));
    pcChroma.srcXInBytes = 0;
    pcChroma.srcY = 0; // if I set this to the SDK sample's value (sEncoderParams.iInputSize[1] << 1, the U/V chroma offset) I get an error from cuda for an incorrect value, though it works in the original CUDA SDK sample
    pcChroma.srcHost = NULL;
    pcChroma.srcDevice = 0;
    pcChroma.srcArray = _cutexArray;
    pcChroma.srcPitch = efparams.Width >> 1; // chroma is subsampled by 2 (but U/V are next to each other)
    pcChroma.dstXInBytes = 0;
    pcChroma.dstY = sEncoderParams.iInputSize[1] << 1; // chroma offset (srcY*srcPitch now points to the chroma planes)
    pcChroma.dstHost = 0;
    pcChroma.dstDevice = dptrVideoFrame;
    pcChroma.dstArray = 0;
    pcChroma.dstPitch = sEncoderParams.nDeviceMemPitch >> 1;
    pcChroma.WidthInBytes = sEncoderParams.iInputSize[0] >> 1;
    pcChroma.Height = sEncoderParams.iInputSize[1]; // U/V are sent together
    pcChroma.srcMemoryType = CU_MEMORYTYPE_ARRAY;
    pcChroma.dstMemoryType = CU_MEMORYTYPE_DEVICE;

    checkCudaErrorsDrv(cuvidCtxLock(cuCtxLock, 0));
    checkCudaErrorsDrv(cuMemcpy2D(&pcopy));
    checkCudaErrorsDrv(cuMemcpy2D(&pcChroma));
    checkCudaErrorsDrv(cuvidCtxUnlock(cuCtxLock, 0));

    //=============================================
    // If dptrVideoFrame is NULL, then we assume that frames come from system memory, otherwise it comes from GPU memory
    // VideoEncoder.cpp, EncodeFrame() will automatically copy it to GPU Device memory, if GPU device input is specified
    if (_encoder->EncodeFrame(efparams, dptrVideoFrame, cuCtxLock) == false)
    {
        printf("\nEncodeFrame() failed to encode frame\n");
    }
    checkCudaErrorsDrv(cuGraphicsUnmapResources(1, &_cutexResource, NULL));
    // computeFPS();

    if (_curFrame > _framesTotal) {
        _encoder->Stop();
        exit(0);
    }
    _curFrame++;
}
I set the encoder params from the .cfg files included with the CUDA SDK Encoder sample, so here I use the 704x480-h264.cfg setup. I tried all of them and always get a similarly ugly result.
I suspect the problem is somewhere in the CUDA_MEMCPY2D param setup for the luma and chroma objects. Maybe wrong pitch, width, or height dimensions. I set the viewport to the same size as the video (704x480) and compared the params to those used in the CUDA SDK sample, but got no clue where the problem is.
Anyone ?
First: I messed around with the Cuda Video Encoder too, and had lots of trouble. It looks to me as if you convert to YUV values, but as a one-to-one pixel conversion (like AYUV 4:4:4). AFAIK you need the correct kind of YUV with padding and subsampling (color values shared by more than one pixel, like 4:2:0). A good overview of YUV alignments can be seen here:
http://msdn.microsoft.com/en-us/library/windows/desktop/dd206750(v=vs.85).aspx
As far as I remember, you have to use NV12 alignment for the Cuda Encoder.
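To make the NV12 point concrete: NV12 stores a full-resolution Y plane first, followed by one half-height plane of interleaved U/V bytes, and the allocation pitch may exceed the visible width. A small sketch of the size arithmetic (my own helper, not part of the SDK; it assumes an even height and pitch >= width):

```cpp
#include <cassert>
#include <cstddef>

// NV12 layout: a full-resolution Y plane, then one half-height plane of
// interleaved U/V bytes. Rows are `pitch` bytes apart.
struct NV12Layout {
    size_t lumaBytes;    // pitch * height
    size_t chromaOffset; // byte offset of the interleaved UV plane
    size_t chromaBytes;  // pitch * (height / 2)
    size_t totalBytes;   // pitch * height * 3 / 2
};

NV12Layout ComputeNV12Layout(size_t height, size_t pitch)
{
    NV12Layout l;
    l.lumaBytes    = pitch * height;
    l.chromaOffset = l.lumaBytes;
    l.chromaBytes  = pitch * (height / 2);
    l.totalBytes   = l.lumaBytes + l.chromaBytes;
    return l;
}
```

For the 704x480 case from the question, with a pitch equal to the width, this gives a 337920-byte luma plane followed by a 168960-byte chroma plane.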
The nvEncoder application is used for codec conversion; it uses CUDA for processing on the GPU and the nvEncoder API to communicate with the hardware.
The application's logic is: read yuv data into the input buffer, store that content in memory, and then start encoding the frames, while in parallel writing the encoded frames to the output file.
Handling of the input buffer is done in the nvRead function, which is available in nvFileIO.h.
If any other help is required, leave a message here...