FFmpeg: how to efficiently decode video frames? - C++

Here is the code I use to decode an RTSP stream in a worker thread:
while (1)
{
    // Read a frame
    if (av_read_frame(pFormatCtx, &packet) < 0)
        break; // Frame read failed (e.g. end of stream)

    if (packet.stream_index == videoStream)
    {
        // This is a packet from the video stream -> decode the video frame
        int frameFinished;
        avcodec_decode_video2(pCodecCtx, pFrame, &frameFinished, &packet);

        // Did we get a video frame?
        if (frameFinished)
        {
            if (LastFrameOk == false)
            {
                LastFrameOk = true;
            }

            // Convert the image format (init the context the first time)
            int w = pCodecCtx->width;
            int h = pCodecCtx->height;
            img_convert_ctx = ffmpeg::sws_getCachedContext(img_convert_ctx, w, h, pCodecCtx->pix_fmt, w, h, ffmpeg::PIX_FMT_RGB24, SWS_BICUBIC, NULL, NULL, NULL);
            if (img_convert_ctx == NULL)
            {
                printf("Cannot initialize the conversion context!\n");
                return false;
            }
            ffmpeg::sws_scale(img_convert_ctx, pFrame->data, pFrame->linesize, 0, pCodecCtx->height, pFrameRGB->data, pFrameRGB->linesize);

            // Convert the frame to QImage
            LastFrame = QImage(w, h, QImage::Format_RGB888);
            for (int y = 0; y < h; y++)
                memcpy(LastFrame.scanLine(y), pFrameRGB->data[0] + y * pFrameRGB->linesize[0], w * 3);
            LastFrameOk = true;
        } // frameFinished
    } // stream_index == videoStream

    av_free_packet(&packet); // Free the packet that was allocated by av_read_frame
}
I followed the FFmpeg tutorial and used a while loop to read packets and decode the video.
But is there a more efficient way to do this, such as an event-triggered callback that runs when a packet is received?

I haven't seen any event-driven approach for reading frames, but I can give some recommendations for improving performance; what matters is the purpose of reading the RTSP stream. First of all, you may add a very short sleep in your loop (e.g. Sleep(1);) so it doesn't spin needlessly. If your purpose is to:
Display images to the user: Don't convert to RGB; after decoding, the resulting frame is in YUV420P format, which can be displayed directly by the GPU with virtually no CPU usage. Almost all graphics cards support the YUV420P (or YV12) format. Converting to RGB is a highly CPU-consuming operation, especially for large images.
Record (save) to disk: If you want to record the stream to play it back later, there is no need to decode the frames. You may use OpenRTSP to record directly to disk without any CPU usage.
Process realtime images: You may find alternative algorithms that work on the YUV420P format instead of RGB; the Y plane in YUV420P is effectively a grayscale version of the colored RGB image (see the sketch below).
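For example, if your realtime processing only needs intensity information, you can read the Y plane straight out of the decoded frame and skip sws_scale entirely. This is just a minimal sketch using the same pFrame/pCodecCtx variables as above; the QImage (Format_Grayscale8 requires Qt 5.5+) is only one possible consumer.
// Wrap the decoded luma (Y) plane as a grayscale image; no sws_scale needed.
// pFrame holds a decoded YUV420P frame; data[0]/linesize[0] are the Y plane.
int w = pCodecCtx->width;
int h = pCodecCtx->height;
QImage gray(w, h, QImage::Format_Grayscale8);
for (int y = 0; y < h; y++)
    memcpy(gray.scanLine(y), pFrame->data[0] + y * pFrame->linesize[0], w);
// 'gray' can now be fed to any algorithm that expects an 8-bit intensity image.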

Related

Windows Media Foundation MFT buffering and video quality issues (Loss of colors, not so smooth curves, especially text)

I'm trying to encode RGBA buffers captured from an image source (Desktop/Camera) into raw H264 using Windows Media Foundation, transfer them, and decode the raw H264 frames received at the other end in real time. I'm trying to achieve at least 30 fps. The encoder works pretty well, but not the decoder.
I understand Microsoft WMF MFTs buffer up to 30 frames before emitting the encoded/decoded data.
The image source emits frames only when a change occurs, not a continuous stream of RGBA buffers, so my intention is to obtain a buffer of encoded/decoded data for each and every input buffer fed to the respective MFT, so that I can stream the data in real time and also render it.
Both the encoder and decoder are able to emit at least 10 to 15 fps when I make the image source send continuous changes (by simulating the changes). The encoder is able to utilize hardware acceleration support. I'm able to achieve up to 30 fps on the encoder end, and I'm yet to implement hardware-assisted decoding using DirectX surfaces. The problem here is not the frame rate but the buffering of data by the MFTs.
So, I tried to drain the decoder MFT by sending the MFT_MESSAGE_COMMAND_DRAIN command and repeatedly calling ProcessOutput until the decoder returns MF_E_TRANSFORM_NEED_MORE_INPUT. What happens now is that the decoder emits only one frame per 30 input H264 buffers; I tested it even with a continuous stream of data, and the behavior is the same. It looks like the decoder drops all the intermediate frames in a GOP.
It's okay for me if it buffers only the first few frames, but my decoder implementation outputs only when its buffer is full, all the time, even after the SPS and PPS parsing phase.
I came across Google's Chromium source code (https://github.com/adobe/chromium/blob/master/content/common/gpu/media/dxva_video_decode_accelerator.cc); they follow the same approach.
mpDecoder->ProcessMessage(MFT_MESSAGE_COMMAND_DRAIN, NULL);
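For context, the full drain sequence I tried (modeled on that Chromium code, using the ProcessOutput helper shown further down in this post) looks roughly like this sketch:
// Sketch: ask the decoder MFT to drain, then pull decoded samples
// until it reports that it needs more input.
HRESULT hr = mpDecoder->ProcessMessage(MFT_MESSAGE_COMMAND_DRAIN, NULL);
if (SUCCEEDED(hr))
{
    do
    {
        hr = ProcessOutput(time, duration, oDtn); // wraps IMFTransform::ProcessOutput
    } while (hr != MF_E_TRANSFORM_NEED_MORE_INPUT);
}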
My implementation is based on
https://github.com/GameTechDev/ChatHeads/blob/master/VideoStreaming/EncodeTransform.cpp
and
https://github.com/GameTechDev/ChatHeads/blob/master/VideoStreaming/DecodeTransform.cpp
My questions are: am I missing something? Is Windows Media Foundation suitable for real-time streaming? Would draining the encoder and decoder work for real-time use cases?
There are only two options for me: make WMF work for the real-time use case, or go with something like Intel's QuickSync. I chose WMF for my POC because Windows Media Foundation implicitly supports hardware/GPU/software fallbacks in case any of the MFTs is unavailable, and it internally chooses the best available MFT without much coding.
I'm also facing video quality issues, even though the bitrate property is set to 3 Mbps, but that is lower priority compared to the buffering problems. I have been banging my head against the keyboard for weeks; this is so hard to fix. Any help would be appreciated.
Code:
Encoder setup:
IMFAttributes* attributes = 0;
HRESULT hr = MFCreateAttributes(&attributes, 0);
if (attributes)
{
//attributes->SetUINT32(MF_SINK_WRITER_DISABLE_THROTTLING, TRUE);
attributes->SetGUID(MF_TRANSCODE_CONTAINERTYPE, MFTranscodeContainerType_MPEG4);
}//end if (attributes)
// Create the output media type.
hr = MFCreateMediaType(&pMediaTypeOut);
if (SUCCEEDED(hr))
{
hr = pMediaTypeOut->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
}
if (SUCCEEDED(hr))
{
hr = pMediaTypeOut->SetGUID(MF_MT_SUBTYPE, cVideoEncodingFormat); // MFVideoFormat_H264
}
if (SUCCEEDED(hr))
{
hr = pMediaTypeOut->SetUINT32(MF_MT_AVG_BITRATE, VIDEO_BIT_RATE); //18000000
}
if (SUCCEEDED(hr))
{
hr = MFSetAttributeRatio(pMediaTypeOut, MF_MT_FRAME_RATE, VIDEO_FPS, 1); // 30
}
if (SUCCEEDED(hr))
{
hr = MFSetAttributeSize(pMediaTypeOut, MF_MT_FRAME_SIZE, mStreamWidth, mStreamHeight);
}
if (SUCCEEDED(hr))
{
hr = pMediaTypeOut->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
}
if (SUCCEEDED(hr))
{
hr = pMediaTypeOut->SetUINT32(MF_MT_MPEG2_PROFILE, eAVEncH264VProfile_High);
}
if (SUCCEEDED(hr))
{
hr = MFSetAttributeRatio(pMediaTypeOut, MF_MT_PIXEL_ASPECT_RATIO, 1, 1);
}
if (SUCCEEDED(hr))
{
hr = pMediaTypeOut->SetUINT32(MF_MT_MAX_KEYFRAME_SPACING, 16);
}
if (SUCCEEDED(hr))
{
hr = pMediaTypeOut->SetUINT32(CODECAPI_AVEncCommonRateControlMode, eAVEncCommonRateControlMode_UnconstrainedVBR);//eAVEncCommonRateControlMode_Quality, eAVEncCommonRateControlMode_UnconstrainedCBR);
}
if (SUCCEEDED(hr))
{
hr = pMediaTypeOut->SetUINT32(CODECAPI_AVEncCommonQuality, 100);
}
if (SUCCEEDED(hr))
{
hr = pMediaTypeOut->SetUINT32(MF_MT_FIXED_SIZE_SAMPLES, FALSE);
}
if (SUCCEEDED(hr))
{
BOOL allSamplesIndependent = TRUE;
hr = pMediaTypeOut->SetUINT32(MF_MT_ALL_SAMPLES_INDEPENDENT, allSamplesIndependent);
}
if (SUCCEEDED(hr))
{
hr = pMediaTypeOut->SetUINT32(MF_MT_COMPRESSED, TRUE);
}
if (SUCCEEDED(hr))
{
hr = mpEncoder->SetOutputType(0, pMediaTypeOut, 0);
}
// Process the incoming sample. Ignore the timestamp & duration parameters, we just render the data in real-time.
HRESULT ProcessSample(IMFSample **ppSample, LONGLONG& time, LONGLONG& duration, TransformOutput& oDtn)
{
    IMFMediaBuffer *buffer = nullptr;
    DWORD bufferSize;
    HRESULT hr = S_FALSE;

    if (ppSample)
    {
        hr = (*ppSample)->ConvertToContiguousBuffer(&buffer);
        if (SUCCEEDED(hr))
        {
            buffer->GetCurrentLength(&bufferSize);
            hr = ProcessInput(ppSample);
            if (SUCCEEDED(hr))
            {
                //hr = mpDecoder->ProcessMessage(MFT_MESSAGE_COMMAND_DRAIN, NULL);
                //if (SUCCEEDED(hr))
                {
                    while (hr != MF_E_TRANSFORM_NEED_MORE_INPUT)
                    {
                        hr = ProcessOutput(time, duration, oDtn);
                    }
                }
            }
            else
            {
                if (hr == MF_E_NOTACCEPTING)
                {
                    while (hr != MF_E_TRANSFORM_NEED_MORE_INPUT)
                    {
                        hr = ProcessOutput(time, duration, oDtn);
                    }
                }
            }
        }
    }

    return (hr == MF_E_TRANSFORM_NEED_MORE_INPUT ? (oDtn.numBytes > 0 ? oDtn.returnCode : hr) : hr);
}
// Finds and returns the h264 MFT (given in subtype parameter) if available...otherwise fails.
HRESULT FindDecoder(const GUID& subtype)
{
HRESULT hr = S_OK;
UINT32 count = 0;
IMFActivate **ppActivate = NULL;
MFT_REGISTER_TYPE_INFO info = { 0 };
UINT32 unFlags = MFT_ENUM_FLAG_HARDWARE | MFT_ENUM_FLAG_ASYNCMFT;
info.guidMajorType = MFMediaType_Video;
info.guidSubtype = subtype;
hr = MFTEnumEx(
MFT_CATEGORY_VIDEO_DECODER,
unFlags,
&info,
NULL,
&ppActivate,
&count
);
if (SUCCEEDED(hr) && count == 0)
{
hr = MF_E_TOPO_CODEC_NOT_FOUND;
}
if (SUCCEEDED(hr))
{
hr = ppActivate[0]->ActivateObject(IID_PPV_ARGS(&mpDecoder));
}
CoTaskMemFree(ppActivate);
return hr;
}
// reconstructs the sample from encoded data
HRESULT ProcessData(char *ph264Buffer, DWORD bufferLength, LONGLONG& time, LONGLONG& duration, TransformOutput &dtn)
{
dtn.numBytes = 0;
dtn.pData = NULL;
dtn.returnCode = S_FALSE;
IMFSample *pSample = NULL;
IMFMediaBuffer *pMBuffer = NULL;
// Create a new memory buffer.
HRESULT hr = MFCreateMemoryBuffer(bufferLength, &pMBuffer);
// Lock the buffer and copy the video frame to the buffer.
BYTE *pData = NULL;
if (SUCCEEDED(hr))
hr = pMBuffer->Lock(&pData, NULL, NULL);
if (SUCCEEDED(hr))
memcpy(pData, ph264Buffer, bufferLength);
pMBuffer->SetCurrentLength(bufferLength);
pMBuffer->Unlock();
// Create a media sample and add the buffer to the sample.
if (SUCCEEDED(hr))
hr = MFCreateSample(&pSample);
if (SUCCEEDED(hr))
hr = pSample->AddBuffer(pMBuffer);
LONGLONG sampleTime = time - mStartTime;
// Set the time stamp and the duration.
if (SUCCEEDED(hr))
hr = pSample->SetSampleTime(sampleTime);
if (SUCCEEDED(hr))
hr = pSample->SetSampleDuration(duration);
hr = ProcessSample(&pSample, sampleTime, duration, dtn);
::Release(&pSample);
::Release(&pMBuffer);
return hr;
}
// Process the output sample for the decoder
HRESULT ProcessOutput(LONGLONG& time, LONGLONG& duration, TransformOutput& oDtn/*output*/)
{
IMFMediaBuffer *pBuffer = NULL;
DWORD mftOutFlags;
MFT_OUTPUT_DATA_BUFFER outputDataBuffer;
IMFSample *pMftOutSample = NULL;
MFT_OUTPUT_STREAM_INFO streamInfo;
memset(&outputDataBuffer, 0, sizeof outputDataBuffer);
HRESULT hr = mpDecoder->GetOutputStatus(&mftOutFlags);
if (SUCCEEDED(hr))
{
hr = mpDecoder->GetOutputStreamInfo(0, &streamInfo);
}
if (SUCCEEDED(hr))
{
hr = MFCreateSample(&pMftOutSample);
}
if (SUCCEEDED(hr))
{
hr = MFCreateMemoryBuffer(streamInfo.cbSize, &pBuffer);
}
if (SUCCEEDED(hr))
{
hr = pMftOutSample->AddBuffer(pBuffer);
}
if (SUCCEEDED(hr))
{
DWORD dwStatus = 0;
outputDataBuffer.dwStreamID = 0;
outputDataBuffer.dwStatus = 0;
outputDataBuffer.pEvents = NULL;
outputDataBuffer.pSample = pMftOutSample;
hr = mpDecoder->ProcessOutput(0, 1, &outputDataBuffer, &dwStatus);
}
if (SUCCEEDED(hr))
{
hr = GetDecodedBuffer(outputDataBuffer.pSample, outputDataBuffer, time, duration, oDtn);
}
if (pBuffer)
{
::Release(&pBuffer);
}
if (pMftOutSample)
{
::Release(&pMftOutSample);
}
return hr;
}
// Write the decoded sample out
HRESULT GetDecodedBuffer(IMFSample *pMftOutSample, MFT_OUTPUT_DATA_BUFFER& outputDataBuffer, LONGLONG& time, LONGLONG& duration, TransformOutput& oDtn/*output*/)
{
// ToDo: These two lines are not right. Need to work out where to get timestamp and duration from the H264 decoder MFT.
HRESULT hr = outputDataBuffer.pSample->SetSampleTime(time);
if (SUCCEEDED(hr))
{
hr = outputDataBuffer.pSample->SetSampleDuration(duration);
}
if (SUCCEEDED(hr))
{
hr = pMftOutSample->ConvertToContiguousBuffer(&pDecodedBuffer);
}
if (SUCCEEDED(hr))
{
DWORD bufLength;
hr = pDecodedBuffer->GetCurrentLength(&bufLength);
}
if (SUCCEEDED(hr))
{
byte *pEncodedYUVBuffer;
DWORD buffCurrLen = 0;
DWORD buffMaxLen = 0;
pDecodedBuffer->GetCurrentLength(&buffCurrLen);
pDecodedBuffer->Lock(&pEncodedYUVBuffer, &buffMaxLen, &buffCurrLen);
ColorConversion::YUY2toRGBBuffer(pEncodedYUVBuffer,
buffCurrLen,
mpRGBABuffer,
mStreamWidth,
mStreamHeight,
mbEncodeBackgroundPixels,
mChannelThreshold);
pDecodedBuffer->Unlock();
::Release(&pDecodedBuffer);
oDtn.pData = mpRGBABuffer;
oDtn.numBytes = mStreamWidth * mStreamHeight * 4;
oDtn.returnCode = hr; // will be S_OK..
}
return hr;
}
Update:
The decoder's output is satisfactory now after enabling CODECAPI_AVLowLatencyMode, but with a 2-second delay in the stream compared to the sender. I'm able to achieve 15 to 20 fps, which is a lot better than before. The quality deteriorates when more changes are pushed from the source to the encoder. I'm yet to implement hardware-accelerated decoding.
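For reference, this is roughly how I enable it on the decoder; only a sketch via ICodecAPI (the property takes a VARIANT_BOOL), with error handling trimmed:
#include <codecapi.h>

// Sketch: enable low-latency mode on the decoder MFT so it stops holding back a whole GOP.
ICodecAPI *pCodecApi = nullptr;
if (SUCCEEDED(mpDecoder->QueryInterface(IID_PPV_ARGS(&pCodecApi))))
{
    VARIANT var = {};
    var.vt = VT_BOOL;
    var.boolVal = VARIANT_TRUE;
    pCodecApi->SetValue(&CODECAPI_AVLowLatencyMode, &var);
    pCodecApi->Release();
}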
Update2:
I figured out that the timestamp and duration settings are what affect the quality of the video when set improperly. The thing is, my image source does not emit frames at a constant rate, but the encoder and decoder seem to expect a constant frame rate. When I set the duration to a constant and increment the sample time in constant steps, the video quality seems better, though not great. I don't think what I'm doing is the correct approach. Is there any way to tell the encoder and decoder about the variable frame rate?
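One workaround I'm experimenting with (not confirmed to be the right approach) is to stamp each sample with the actual elapsed time in 100-ns units instead of a fixed step. The helper below is only a sketch; mStartTick and mLastSampleTime are hypothetical members of the capture class.
#include <chrono>

// Sketch: derive sample time and duration (in 100-ns units, as IMFSample expects)
// from a monotonic clock instead of assuming a constant frame rate.
void StampSample(IMFSample *pSample)
{
    using namespace std::chrono;
    const LONGLONG now100ns =
        duration_cast<nanoseconds>(steady_clock::now() - mStartTick).count() / 100;

    pSample->SetSampleTime(now100ns);                       // presentation time
    pSample->SetSampleDuration(now100ns - mLastSampleTime); // time since the previous frame
    mLastSampleTime = now100ns;
}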
Update3:
I'm able to get acceptable performance from both the encoder and decoder after setting the CODECAPI_AVEncMPVDefaultBPictureCount (to 0) and CODECAPI_AVEncCommonLowLatency properties (a sketch of how I set them follows below). I'm yet to explore hardware-accelerated decoding; I hope to achieve the best performance once that is implemented.
The quality of the video is still poor; edges and curves are not sharp, and text looks blurred, which is not acceptable. The quality is okay for videos and images, but not for text and shapes.
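The encoder-side properties mentioned above are set through the encoder MFT's ICodecAPI interface. This is only a sketch (error handling omitted), not the exact production code:
#include <codecapi.h>

// Sketch: disable B-frames and enable low-latency mode on the encoder MFT via ICodecAPI.
ICodecAPI *pCodecApi = nullptr;
HRESULT hr = mpEncoder->QueryInterface(IID_PPV_ARGS(&pCodecApi));
if (SUCCEEDED(hr))
{
    VARIANT var = {};

    var.vt = VT_UI4;
    var.ulVal = 0;                     // no B-pictures -> no reordering delay
    pCodecApi->SetValue(&CODECAPI_AVEncMPVDefaultBPictureCount, &var);

    var.vt = VT_BOOL;
    var.boolVal = VARIANT_TRUE;        // trade some quality for latency
    pCodecApi->SetValue(&CODECAPI_AVEncCommonLowLatency, &var);

    pCodecApi->Release();
}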
Update4
It seems some of the color information is getting lost in the YUV subsampling phase. I tried converting the RGBA buffer to YUY2 and back; the color loss is visible but not bad. The loss due to the YUY2 conversion is not as bad as the quality of the image that is rendered after the full RGBA -> YUY2 -> H264 -> YUY2 -> RGBA round trip. It's evident that the YUY2 conversion is not the sole reason for the loss of quality; the H264 encoder further introduces aliasing. I would still have obtained better video quality if the H264 encoder didn't introduce aliasing effects. I'm going to explore WMV codecs. The only thing that still bothers me is this: the code works pretty well and is able to capture the screen and save the stream to an mp4 file. The only difference here is that I'm using a Media Foundation transform with MFVideoFormat_YUY2 as the input format, compared to the sink writer approach with MFVideoFormat_RGB32 as the input type in the mentioned code. I still have some hope that it is possible to achieve better quality through Media Foundation itself. The thing is, MFTEnum fails if I specify MFVideoFormat_ARGB32 as the input format in MFT_REGISTER_TYPE_INFO, and SetInputType fails with the same format.
Original: (screenshot omitted)
Decoded image, after RGBA -> YUY2 -> H264 -> YUY2 -> RGBA conversion (view it at full size to see the aliasing effect): (screenshot omitted)
Most consumer H.264 encoders subsample the color information to 4:2:0 when converting RGB to YUV.
This means that before the encoding process even starts, your RGB bitmap loses 75% of the color information.
H.264 was more designed for natural content rather than screen capture.
But there are codecs that are specifically designed to achieve good compression for screen content. For example: https://learn.microsoft.com/en-us/windows/desktop/medfound/usingthewindowsmediavideo9screencodec
Even if you increase the bitrate of your H.264 encode - you are working only with 25% of the original color information to start with.
So your format changes look like this:
You start with 1920x1080 red, green and blue pixels. You transform to YUV. Now you have 1920x1080 luma, Cb and Cr, where Cb and Cr are color difference components. This is just a different way of representing colors. Then you scale the Cb and Cr planes to 1/4 of their original size, so your resulting Cb and Cr channels are around 960x540 while your luma plane is still 1920x1080. By scaling your color information from 1920x1080 to 960x540, you are down to 25% of the original size.
Then the full-size luma plane and the 25% color difference channels are passed into the encoder. This level of reducing the color information is called subsampling to 4:2:0. The subsampled input is required by the encoder, and the conversion is done automatically by the media framework. There is not much you can do to escape it, aside from choosing a different format.
R = red
G = green
B = blue
Y = luma (luminance)
U = blue difference (Cb)
V = red difference (Cr)
YUV is used to separate out a luma signal (Y) that can be stored with high resolution or transmitted at high bandwidth,
and two chroma components (U and V) that can be bandwidth-reduced, subsampled,
compressed, or otherwise treated separately for improved system efficiency.
(Wikipedia)
Original format
RGB (4:4:4) 3 bytes per pixel
R R R R R R R R R R R R R R R R
G G G G G G G G G G G G G G G G
B B B B B B B B B B B B B B B B
Encoder input format - before H.264 compression
YUV (4:2:0) 1.5 bytes per pixel (6 bytes per 4 pixel)
Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y
UV UV UV UV
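To put numbers on the description above, here is a small arithmetic sketch (plain C++, not tied to any API) of the per-frame buffer sizes for a 1920x1080 image before and after 4:2:0 subsampling:
// Per-frame sizes for 1920x1080, 8 bits per component.
const int w = 1920, h = 1080;

const int rgbBytes    = w * h * 3;                // 4:4:4 RGB  -> 6,220,800 bytes
const int lumaBytes   = w * h;                    // Y plane    -> 2,073,600 bytes
const int chromaBytes = (w / 2) * (h / 2) * 2;    // Cb + Cr at quarter resolution -> 1,036,800 bytes
const int yuv420Bytes = lumaBytes + chromaBytes;  // 4:2:0 total -> 3,110,400 bytes (1.5 bytes per pixel)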
I'm trying to understand your problem.
My program ScreenCaptureEncode uses the default Microsoft encoder settings:
Profile : baseline
Level : 40
CODECAPI_AVEncCommonQuality : 70
Bitrate : 2000000
From my results, I think the quality is good/acceptable.
You can change profile/level/bitrate with MF_MT_MPEG2_PROFILE/MF_MT_MPEG2_LEVEL/MF_MT_AVG_BITRATE.
For CODECAPI_AVEncCommonQuality, it seems like you are trying to use a locally registered encoder (because you're on Win7) to set that value to 100, I guess, but I do not think that will change things significantly.
So.
Here are three screenshots taken with the keyboard's Print Screen:
the screen
the encoded screen, played by a video player in fullscreen mode
the encoded screen, played by a video player in non-fullscreen mode
The last two pictures are from the same encoded video file.
The video player introduces aliasing when not playing in fullscreen mode.
The same encoded file played in fullscreen mode is not so bad, compared to the original screen, and with default encoder settings.
You should try this. I think we have to look at this more closely.
I think the aliasing comes from your video player, because it is not playing in fullscreen mode.
PS : I use the video player MPC-HC.
PS2: my program needs to be improved:
(not sure) use IDirect3D9Ex to improve the buffering mechanism. On Windows 7, IDirect3D9Ex is better for rendering (no swap buffer); perhaps it's the same for screen capture (on my todo list).
I should use two threads, one for screen capture and one for encoding.
EDIT
Did you read this :
CODECAPI_AVLowLatencyMode
Low-latency mode is useful for real-time communications or live capture, when latency should be minimized. However, low-latency mode might also reduce the decoding or encoding quality.
About why my program uses MFVideoFormat_RGB32 and yours uses MFVideoFormat_YUY2: by default, the SinkWriter has converters enabled. The SinkWriter converts MFVideoFormat_RGB32 into a format the H264 encoder accepts. For the Microsoft encoder, read this: H.264 Video Encoder
Input formats:
MFVideoFormat_I420
MFVideoFormat_IYUV
MFVideoFormat_NV12
MFVideoFormat_YUY2
MFVideoFormat_YV12
So there is no MFVideoFormat_RGB32. The SinkWriter does the conversion using the Color Converter DSP, I think.
So definitely, the problem does not come from converting RGB to YUV before encoding.
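If you want to perform that conversion explicitly in an MFT pipeline instead of relying on the SinkWriter, the Color Converter DSP can be instantiated directly. This is only a sketch of the setup; the frame size and the RGB32/NV12 choice are illustrative, and error handling plus extra attributes (e.g. interlace mode) are omitted:
#include <mfapi.h>
#include <mftransform.h>
#include <wmcodecdsp.h>   // CLSID_CColorConvertDMO

// Sketch: create the Color Converter DSP as an IMFTransform and declare an
// RGB32 -> NV12 conversion; the converted samples can then feed the H264 encoder MFT.
IMFTransform *pConverter = nullptr;
HRESULT hr = CoCreateInstance(CLSID_CColorConvertDMO, nullptr, CLSCTX_INPROC_SERVER,
                              IID_PPV_ARGS(&pConverter));

IMFMediaType *pIn = nullptr, *pOut = nullptr;
if (SUCCEEDED(hr)) hr = MFCreateMediaType(&pIn);
if (SUCCEEDED(hr)) hr = MFCreateMediaType(&pOut);
if (SUCCEEDED(hr))
{
    pIn->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    pIn->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB32);
    MFSetAttributeSize(pIn, MF_MT_FRAME_SIZE, 1920, 1080);

    pOut->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    pOut->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_NV12);
    MFSetAttributeSize(pOut, MF_MT_FRAME_SIZE, 1920, 1080);

    hr = pConverter->SetInputType(0, pIn, 0);
    if (SUCCEEDED(hr)) hr = pConverter->SetOutputType(0, pOut, 0);
}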
PS (last)
As Markus Schumann said:
H.264 was more designed for natural content rather than screen capture.
He should have mentioned that the problem is particularly related to text capture.
You have just found an encoder limitation. I think no encoder is optimized for text encoding with acceptable stretching, as I mentioned with the video player rendering.
You see aliasing in the final video capture because it is fixed information inside the movie. Playing this movie in fullscreen (the same size as the capture) is OK.
On Windows, text is rendered according to the screen resolution, so the display is always good.
This is my last conclusion.
After so much research and effort, the problems are fixed. The color quality problem was due to the software-based color conversion, which leads to aliasing (RGB to YUV in the encoder and back at the decoder). Using a hardware-accelerated color converter solved the aliasing and image quality problems.
Setting optimal values for the CODECAPI_AVEncMPVGOPSize, CODECAPI_AVEncMPVDefaultBPictureCount and CODECAPI_AVEncCommonLowLatency properties solved the buffering problems.
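For anyone looking for the hardware color-conversion route: one possible way (a sketch, not necessarily the exact code used here) is to enumerate a hardware MFT from the video processor category rather than using the software Color Converter DSP:
// Sketch: pick a hardware video processor MFT for RGB32 -> NV12 conversion.
MFT_REGISTER_TYPE_INFO inInfo  = { MFMediaType_Video, MFVideoFormat_RGB32 };
MFT_REGISTER_TYPE_INFO outInfo = { MFMediaType_Video, MFVideoFormat_NV12 };

IMFActivate **ppActivate = nullptr;
UINT32 count = 0;
HRESULT hr = MFTEnumEx(MFT_CATEGORY_VIDEO_PROCESSOR,
                       MFT_ENUM_FLAG_HARDWARE | MFT_ENUM_FLAG_SORTANDFILTER,
                       &inInfo, &outInfo, &ppActivate, &count);

IMFTransform *pHwConverter = nullptr;
if (SUCCEEDED(hr) && count > 0)
    hr = ppActivate[0]->ActivateObject(IID_PPV_ARGS(&pHwConverter));

for (UINT32 i = 0; i < count; i++)
    ppActivate[i]->Release();
CoTaskMemFree(ppActivate);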

Extracting KLV data from mp2 stream using C++ and ffmpeg

I have an mp2 stream that contains KLV metadata. I stored the KLV in a file using the ffmpeg command line:
ffmpeg -i input.mpg -map data-re -codec copy -f data output.klv
I now want to do this in C++. So, I have
FFMPEG setup …..
Then the main loop
// Read frames
// Read frames
while (av_read_frame(pFormatCtx, &packet) >= 0)
{
    // Is this a packet from the video stream?
    if (packet.stream_index == videoStream)
    {
        // Decode video frame
        avcodec_decode_video2(pCodecCtx, pFrame, &frameFinished, &packet);

        // Did we get a video frame?
        if (frameFinished)
        {
            // Convert the image from its native format to RGB
            sws_scale(sws_ctx, (uint8_t const * const *)pFrame->data,
                      pFrame->linesize, 0, pCodecCtx->height,
                      pFrameRGB->data, pFrameRGB->linesize);

            QImage myImage(pFrameRGB->data[0], pCodecCtx->width, pCodecCtx->height, QImage::Format_RGB888);
            QPixmap img(QPixmap::fromImage(myImage.scaled(ui->label->width(), ui->label->height(), Qt::KeepAspectRatio)));
            ui->label->setPixmap(img);
            QCoreApplication::processEvents();
        }
    }
    else // klv stream
    {
        // Decode klv data
        qDebug() << packet.buf->size;
        for (int i = 0; i < packet.buf->size; i++)
        {
            qDebug() << packet.buf->data[i];
        }
    }
} // end of read loop
The resulting KLV output is different, so I must be doing something wrong when processing the packet. The frames are good and I'm viewing them in a Qt label, so my FFmpeg setup works for the images but not for the KLV data.
My bad, this code is working. I was comparing the int output to the ffmpeg output viewed in Notepad; once I used Notepad++ I could make sense of the ffmpeg output, and it does correlate :)
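For anyone comparing against the CLI output: a simple way to reproduce what `-codec copy -f data output.klv` produces is to append each data packet's payload to a binary file. This is a small sketch (the DumpKlvPacket helper name and the file are illustrative); calling it in the `else // klv stream` branch gives a file you can diff against output.klv.
#include <cstdio>
extern "C" {
#include <libavcodec/avcodec.h>
}

// Sketch: append one KLV packet's payload to a binary file so it can be
// compared byte-for-byte with ffmpeg's "-f data output.klv" output.
static void DumpKlvPacket(const AVPacket &packet, FILE *klvFile)
{
    fwrite(packet.data, 1, packet.size, klvFile);
    fflush(klvFile);
}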

ffmpeg sws_scale YUV420p to RGBA not giving correct scaling result (c++)

I am trying to convert a decoded YUV420P frame (1018x700) to RGBA via sws_scale. I am saving the data to a raw video file and then playing the raw video using ffplay to see the result.
Here is my code:
sws_ctx = sws_getContext(video_dec_ctx->width, video_dec_ctx->height, AV_PIX_FMT_YUV420P,
                         video_dec_ctx->width, video_dec_ctx->height, AV_PIX_FMT_BGR32,
                         SWS_LANCZOS | SWS_ACCURATE_RND, 0, 0, 0);
ret = avcodec_decode_video2(video_dec_ctx, yuvframe, got_frame, &pkt);
if (ret < 0) {
    std::cout << "Error in decoding" << std::endl;
    return ret;
} else {
    // The source and destination heights and widths are the same
    int sourceX = video_dec_ctx->width;
    int sourceY = video_dec_ctx->height;
    int destX = video_dec_ctx->width;
    int destY = video_dec_ctx->height;

    // Declare the destination frame
    AVFrame avFrameRGB;
    avFrameRGB.linesize[0] = destX * 4;
    avFrameRGB.data[0] = (uint8_t*)malloc(avFrameRGB.linesize[0] * destY);

    // Scale the frame into avFrameRGB
    sws_scale(sws_ctx, yuvframe->data, yuvframe->linesize, 0, yuvframe->height,
              avFrameRGB.data, avFrameRGB.linesize);

    // Write to file
    fwrite(avFrameRGB.data[0], 1, video_dst_bufsize, video_dst_file);
}
Here is the result without scaling (i.e. in YUV420p Format)
Here is the result after conversion, played back using ffplay (i.e. in RGBA format)
I run the ffplay using the following command ('video' is the raw video file)
ffplay -f rawvideo -pix_fmt bgr32 -video_size 1018x700 video
What should I fix to make the conversion to RGB32 come out correctly?
I found the solution: the problem was that I was not using the correct buffer size when writing to the file.
fwrite(avFrameRGB.data[0], 1, video_dst_bufsize, video_dst_file);
The value of video_dst_bufsize was taken from the return value of
video_dst_bufsize = av_image_alloc(yuvframe.data, yuvframe.linesize, destX, destY, AV_PIX_FMT_YUV420P, 1);
The solution is to get the return value from av_image_alloc for the RGB frame and use that in the fwrite statement:
video_dst_bufsize_RGB = av_image_alloc(avFrameRGB.data, avFrameRGB.linesize, destX, destY, AV_PIX_FMT_BGR32, 1);
fwrite(avFrameRGB.data[0], 1, video_dst_bufsize_RGB, video_dst_file);
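As an alternative to keeping the return value of av_image_alloc around, the byte size of the converted frame can also be computed directly; a sketch, assuming the same destX/destY and the alignment of 1 used above:
extern "C" {
#include <libavutil/imgutils.h>
}

// Sketch: size of one packed BGR32 frame, with alignment 1 to match av_image_alloc above.
int rgbBufferSize = av_image_get_buffer_size(AV_PIX_FMT_BGR32, destX, destY, 1);
fwrite(avFrameRGB.data[0], 1, rgbBufferSize, video_dst_file);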

FFMPEG decoding artifacts between keyframes

Marked this question as outdated, as it uses the deprecated avcodec_decode_video2.
I'm currently experiencing artifacts when decoding video using FFmpeg's API. On what I would assume to be intermediate frames, artifacts build up slowly and only from active movement in the frame. These artifacts build for 50-100 frames until (I assume) a keyframe resets them. Frames are then decoded correctly and the artifacts proceed to build again.
One thing that is bothering me: I have a few video samples that are 30 fps (h264) that work correctly, but all of my 60 fps videos (h264) exhibit the problem.
I don't currently have enough reputation to post an image, so hopefully this link will work.
http://i.imgur.com/PPXXkJc.jpg
int numBytes;
int frameFinished;
AVFrame* decodedRawFrame;
AVFrame* rgbFrame;

// Enum class for decoding results, used to break decode loop when a frame is gathered
DecodeResult retResult = DecodeResult::Fail;

decodedRawFrame = av_frame_alloc();
rgbFrame = av_frame_alloc();
if (!decodedRawFrame) {
    fprintf(stderr, "Could not allocate video frame\n");
    return DecodeResult::Fail;
}

numBytes = avpicture_get_size(PIX_FMT_RGBA, mCodecCtx->width, mCodecCtx->height);
uint8_t* buffer = (uint8_t *)av_malloc(numBytes * sizeof(uint8_t));
avpicture_fill((AVPicture *) rgbFrame, buffer, PIX_FMT_RGBA, mCodecCtx->width, mCodecCtx->height);

AVPacket packet;

while (av_read_frame(mFormatCtx, &packet) >= 0 && retResult != DecodeResult::Success)
{
    // Is this a packet from the video stream?
    if (packet.stream_index == mVideoStreamIndex)
    {
        // Decode video frame
        int decodeValue = avcodec_decode_video2(mCodecCtx, decodedRawFrame, &frameFinished, &packet);

        // Did we get a video frame?
        if (frameFinished)// && rgbFrame->pict_type != AV_PICTURE_TYPE_NONE )
        {
            // Convert the image from its native format to RGB
            int SwsFlags = SWS_BILINEAR;
            // Accurate round clears up a problem where the start
            // of videos have green bars on them
            SwsFlags |= SWS_ACCURATE_RND;
            struct SwsContext *ctx = sws_getCachedContext(NULL, mCodecCtx->width, mCodecCtx->height, mCodecCtx->pix_fmt,
                                                          mCodecCtx->width, mCodecCtx->height, PIX_FMT_RGBA, SwsFlags, NULL, NULL, NULL);
            sws_scale(ctx, decodedRawFrame->data, decodedRawFrame->linesize, 0, mCodecCtx->height, rgbFrame->data, rgbFrame->linesize);

            //if(count%5 == 0 && count < 105)
            //    DebugSavePPMImage(rgbFrame, mCodecCtx->width, mCodecCtx->height, count);

            ++count;

            // Viewable frame is a struct to hold buffer and frame together in a queue
            ViewableFrame frame;
            frame.buffer = buffer;
            frame.frame = rgbFrame;
            mFrameQueue.push(frame);

            retResult = DecodeResult::Success;

            sws_freeContext(ctx);
        }
    }

    // Free the packet that was allocated by av_read_frame
    av_free_packet(&packet);
}

// Check for end of file leftover frames
if (retResult != DecodeResult::Success)
{
    int result = av_read_frame(mFormatCtx, &packet);
    if (result < 0)
        isEoF = true;
    av_free_packet(&packet);
}

// Free the YUV frame
av_frame_free(&decodedRawFrame);
I'm attempting to build a queue of the decoded frames that I then use and free as needed. Is my separation of the frames causing the intermediate frames to be decoded incorrectly? I also break the decoding loop once I've successfully gathered a frame (DecodeResult::Success); most examples I've seen tend to loop through the whole video.
All codec contexts, video stream information, and format contexts are set up exactly as shown in the main function of https://github.com/chelyaev/ffmpeg-tutorial/blob/master/tutorial01.c
Any suggestions would be greatly appreciated.
For reference, in case someone finds themselves in a similar position: apparently with some older versions of FFmpeg there's an issue when using sws_scale to convert an image without changing the actual dimensions of the final frame. If you instead create the flags for the SwsContext using:
int SwsFlags = SWS_BILINEAR; //Whatever you want
SwsFlags |= SWS_ACCURATE_RND; // Under the hood forces ffmpeg to use the same logic as if scaled
SWS_ACCURATE_RND has a performance penalty but for regular video it's probably not that noticeable. This will remove the splash of green, or green bars along the edges of textures if present.
I wanted to thank Multimedia Mike and George Y; they were also right that the way I was decoding the frames wasn't preserving the packet data correctly, and that was what caused the video artifacts building up from previous frames.
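On that last point, the fix on my side was to stop sharing a single buffer/AVFrame between every queued ViewableFrame. A sketch of the change (same deprecated API as the code above; each queued frame now owns its own destination buffer):
// Sketch: give each queued frame its own AVFrame + pixel buffer so that
// subsequent sws_scale calls cannot overwrite frames still sitting in the queue.
AVFrame *rgbFrame = av_frame_alloc();
int numBytes = avpicture_get_size(PIX_FMT_RGBA, mCodecCtx->width, mCodecCtx->height);
uint8_t *buffer = (uint8_t *)av_malloc(numBytes * sizeof(uint8_t));
avpicture_fill((AVPicture *)rgbFrame, buffer, PIX_FMT_RGBA, mCodecCtx->width, mCodecCtx->height);

sws_scale(ctx, decodedRawFrame->data, decodedRawFrame->linesize, 0, mCodecCtx->height,
          rgbFrame->data, rgbFrame->linesize);

ViewableFrame frame;
frame.buffer = buffer;   // freed with av_free() when the frame is popped and consumed
frame.frame = rgbFrame;  // freed with av_frame_free() at the same point
mFrameQueue.push(frame);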

Converting QImage to YUV420P pixel format

Has anybody solved this problem before? I need a simple and fast method to convert a QImage::bits() buffer from RGB32 to the YUV420P pixel format. Can you help me?
libswscale, part of the FFmpeg project, has optimized routines to perform colorspace conversions, scaling, and filtering. If you really want speed, I would suggest using it unless you cannot add the extra dependency. I haven't actually tested this code, but here is the general idea:
QImage img = ...; // your image in RGB32

// Allocate the output buffer. Use av_malloc to align memory. YUV420P
// needs 1.5 times the number of pixels (Cb and Cr only use 0.25
// bytes per pixel on average).
char* out_buffer = (char*)av_malloc((int)ceil(img.height() * img.width() * 1.5));

// Allocate ffmpeg frame structures
AVFrame* inpic = avcodec_alloc_frame();
AVFrame* outpic = avcodec_alloc_frame();

// avpicture_fill sets all of the data pointers in the AVFrame structures
// to the right places in the data buffers. It does not copy the data, so
// the QImage and out_buffer still need to live after calling these.
avpicture_fill((AVPicture*)inpic,
               img.bits(),
               AV_PIX_FMT_ARGB,
               img.width(),
               img.height());
avpicture_fill((AVPicture*)outpic,
               (uint8_t*)out_buffer,
               AV_PIX_FMT_YUV420P,
               img.width(),
               img.height());

// Create the conversion context. You only need to do this once if
// you are going to do the same conversion multiple times.
SwsContext* ctx = sws_getContext(img.width(),
                                 img.height(),
                                 AV_PIX_FMT_ARGB,
                                 img.width(),
                                 img.height(),
                                 AV_PIX_FMT_YUV420P,
                                 SWS_BICUBIC,
                                 NULL, NULL, NULL);

// Perform the conversion
sws_scale(ctx,
          inpic->data,
          inpic->linesize,
          0,
          img.height(),
          outpic->data,
          outpic->linesize);

// Free memory
av_free(inpic);
av_free(outpic);

//...
// Free the output buffer when done with it
av_free(out_buffer);
Like I said, I haven't tested this code so it may require some tweaks to get it working.
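Note that avcodec_alloc_frame and avpicture_fill have since been deprecated. An equivalent sketch with the newer frame and image-utility API would look like this (also untested; AV_PIX_FMT_BGRA is used because, on a little-endian machine, QImage::Format_RGB32 stores its bytes in BGRA order, so adjust the source format if your data differs):
extern "C" {
#include <libavutil/frame.h>
#include <libavutil/imgutils.h>
#include <libswscale/swscale.h>
}

// Sketch: same RGB32 -> YUV420P conversion using av_frame_alloc/av_image_* helpers.
AVFrame *inpic  = av_frame_alloc();
AVFrame *outpic = av_frame_alloc();

av_image_fill_arrays(inpic->data, inpic->linesize, img.bits(),
                     AV_PIX_FMT_BGRA, img.width(), img.height(), 1);
av_image_fill_arrays(outpic->data, outpic->linesize, (uint8_t*)out_buffer,
                     AV_PIX_FMT_YUV420P, img.width(), img.height(), 1);

SwsContext *ctx = sws_getContext(img.width(), img.height(), AV_PIX_FMT_BGRA,
                                 img.width(), img.height(), AV_PIX_FMT_YUV420P,
                                 SWS_BICUBIC, NULL, NULL, NULL);
sws_scale(ctx, inpic->data, inpic->linesize, 0, img.height(),
          outpic->data, outpic->linesize);

sws_freeContext(ctx);
av_frame_free(&inpic);
av_frame_free(&outpic);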