Windows Media Foundation Degrades in Windows 8 - C++

I have a workflow as follows:
Get raw YUV frame.
Pass it in to Windows Media Foundation to encode in to an H.264 frame.
Convert the output to an FFmpeg AVPacket.
Write the packet into an output file with av_interleaved_write_frame, muxed together with other streams.
On Windows 7, this worked great. On Windows 8, av_interleaved_write_frame broke. The reason is that the Windows 8 encoder introduced B-frames into the output, which av_interleaved_write_frame simply didn't accept, no matter how I set the pts/dts.
I modified the encoder to use 0 B-frames, which then gave me the output I wanted. But...
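(For anyone reproducing this: one way to request zero B-frames is through ICodecAPI on the encoder MFT. A hedged sketch, assuming 'encoder' is the H.264 transform; the property is the documented B-picture-count control, but the exact setup may differ from what the author did.)
ICodecAPI *codecApi = NULL;
encoder->QueryInterface(IID_PPV_ARGS(&codecApi));

VARIANT v;
VariantInit(&v);
v.vt = VT_UI4;
v.ulVal = 0;   // no B-pictures between I and P frames
codecApi->SetValue(&CODECAPI_AVEncMPVDefaultBPictureCount, &v);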
After about 10-15 seconds of encoded frames, the video degrades from nearly perfect to extremely blocky and a very low frame rate. I've tried changing most of the settings available to Windows 8 to modify the encoder, but nothing seems to help.
The only thing that did have an effect was changing the encoder bitrate: the more I increase it, the longer the video runs before it starts to degrade.
Any ideas on what changed between Windows 7 and Windows 8 that may have caused this to happen?
Encoder Setup Minus Success Checks (They All Succeed)
IMFMediaType *mOutputType = NULL;   // note: originally declared as mInputType, but this is the output type
MFCreateMediaType(&mOutputType);
mOutputType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
mOutputType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264);
mOutputType->SetUINT32(MF_MT_AVG_BITRATE, 10000000);                 // 10 Mbit/s
MFSetAttributeSize(mOutputType, MF_MT_FRAME_SIZE, frameWidth, frameHeight);
MFSetAttributeRatio(mOutputType, MF_MT_FRAME_RATE, 30, 1);           // frame rates are ratios
MFSetAttributeRatio(mOutputType, MF_MT_FRAME_RATE_RANGE_MAX, 30, 1);
MFSetAttributeRatio(mOutputType, MF_MT_FRAME_RATE_RANGE_MIN, 15, 1);
mOutputType->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
mOutputType->SetUINT32(MF_MT_ALL_SAMPLES_INDEPENDENT, 1);
mOutputType->SetUINT32(MF_MT_FIXED_SIZE_SAMPLES, 1);
mOutputType->SetUINT32(MF_MT_SAMPLE_SIZE, frameWidth * frameHeight * 2);
mOutputType->SetUINT32(MF_MT_MPEG2_PROFILE, eAVEncH264VProfile_Main);
// Note: CODECAPI_* properties are normally applied through ICodecAPI on the
// encoder MFT rather than as media type attributes.
mOutputType->SetUINT32(CODECAPI_AVEncCommonRateControlMode, eAVEncCommonRateControlMode_Quality);
mOutputType->SetUINT32(CODECAPI_AVEncCommonQuality, 80);
Encoding of a frame is basically the following (a sketch follows the list):
MFCreateMemoryBuffer to store the incoming YUV data.
MFCreateSample.
Attach the buffer to the sample.
Set the sample time and duration.
ProcessInput
ProcessOutput with the proper output size
On success, build an AVPacket with the sample's info
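A minimal sketch of those steps, with illustrative names ('encoder' is the H.264 MFT; 'yuvData', 'yuvSize' and 'frameIndex' are assumptions; error checks omitted; timestamps in 100 ns units at 30 fps):
IMFMediaBuffer *inBuf = NULL;
MFCreateMemoryBuffer(yuvSize, &inBuf);
BYTE *dst = NULL;
inBuf->Lock(&dst, NULL, NULL);
memcpy(dst, yuvData, yuvSize);
inBuf->Unlock();
inBuf->SetCurrentLength(yuvSize);

IMFSample *inSample = NULL;
MFCreateSample(&inSample);
inSample->AddBuffer(inBuf);
inSample->SetSampleTime(frameIndex * 10000000LL / 30);
inSample->SetSampleDuration(10000000LL / 30);
encoder->ProcessInput(0, inSample, 0);

// Drain one output sample, sized from GetOutputStreamInfo.
MFT_OUTPUT_STREAM_INFO info = {};
encoder->GetOutputStreamInfo(0, &info);
IMFMediaBuffer *outBuf = NULL;
MFCreateMemoryBuffer(info.cbSize, &outBuf);
IMFSample *outSample = NULL;
MFCreateSample(&outSample);
outSample->AddBuffer(outBuf);

MFT_OUTPUT_DATA_BUFFER outData = {};
outData.pSample = outSample;
DWORD status = 0;
if (encoder->ProcessOutput(0, 1, &outData, &status) == S_OK) {
    BYTE *h264 = NULL; DWORD len = 0;
    outBuf->Lock(&h264, NULL, &len);
    AVPacket pkt;
    av_new_packet(&pkt, len);          // wrap the encoded bytes in an AVPacket
    memcpy(pkt.data, h264, len);
    outBuf->Unlock();
    // set pkt.pts/pkt.dts from the sample time, then hand to the muxer
}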

Related

Media Foundation: H264 Encoder Dropping Frames

I am trying to encode frame data captured from a monitor into an MP4 file using MFVideoFormat_H264 and a Media Foundation sink writer created with MFCreateSinkWriterFromURL. I configured my input IMFMediaType to contain
inputMediaType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
inputMediaType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB32);
inputMediaType->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
MFSetAttributeRatio(inputMediaType, MF_MT_FRAME_RATE, 60, 1);
MFSetAttributeRatio(inputMediaType, MF_MT_FRAME_RATE_RANGE_MAX, 120, 1);
MFSetAttributeRatio(inputMediaType, MF_MT_FRAME_RATE_RANGE_MIN, 1, 1);
MFSetAttributeRatio(inputMediaType, MF_MT_PIXEL_ASPECT_RATIO, 1, 1);
MFSetAttributeSize(inputMediaType, MF_MT_FRAME_SIZE, 1920, 1080);
all on a 1080p monitor at a 60 Hz refresh rate. My outputMediaType is similar, other than
outputMediaType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264);
outputMediaType->SetUINT32(MF_MT_AVG_BITRATE, 10000000);
The sink writer itself is also configured with MF_SINK_WRITER_DISABLE_THROTTLING=TRUE and
MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS=TRUE to get the best possible performance out of it, using hardware acceleration when available. Everything works and videos get created successfully. However, each video seems to stutter across its entire duration. I've attempted lowering the bitrate and raising the average FPS to compensate, but that's more of a band-aid than a fix. My assumption is that dropped frames cause this stutter, as a result of a bucket overflowing?
Is anyone aware of a fix for this issue of frame drops/stuttering in the final video file while retaining the h264 encoding format?
EDIT: I've also tinkered with the different attributes on the input and output types setting
hr = MFSetAttributeRatio(pMediaTypeIN/OUT, MF_MT_FRAME_RATE_RANGE_MIN, 60, 1); but to no avail.
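For reference, the two flags mentioned above are passed to MFCreateSinkWriterFromURL through an attribute store. A minimal sketch (file name and error handling are assumptions):
IMFAttributes *attrs = NULL;
MFCreateAttributes(&attrs, 2);
attrs->SetUINT32(MF_SINK_WRITER_DISABLE_THROTTLING, TRUE);
attrs->SetUINT32(MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS, TRUE);

IMFSinkWriter *writer = NULL;
MFCreateSinkWriterFromURL(L"capture.mp4", NULL, attrs, &writer);
// then AddStream(outputMediaType) and SetInputMediaType(inputMediaType)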

Setting bitrate of video in FFmpeg

I use FFmpeg to record videos from an RTSP stream (the codec is H.264). It works, but I have a problem with the bitrate value. First, I set the bitrate like below, but it doesn't work:
AVCodecContext *m_c;   // allocated elsewhere
m_c->bit_rate = bitrate_value;
Following this question I can set bitrate manually with this command:
av_opt_set(m_c->priv_data, "crf", "39", AV_OPT_SEARCH_CHILDREN);
But I have to test several times to choose the value '39', which creates acceptable video quality. It's hard to do this again for a different camera setting (image width, height, etc.). Is there a way to set the bitrate more easily, and adaptively?
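One likely reason the first attempt fails, shown as a hedged sketch: in libavcodec, rate-control fields on AVCodecContext only take effect if they are set before avcodec_open2, and for libx264 a CRF setting overrides the bitrate target ('codec' and the values here are illustrative):
AVCodecContext *m_c = avcodec_alloc_context3(codec);
m_c->bit_rate = 2000000;   // target average bitrate in bits/s
// CRF and bitrate are competing rate-control modes for libx264;
// set one or the other, not both:
// av_opt_set(m_c->priv_data, "crf", "23", AV_OPT_SEARCH_CHILDREN);
avcodec_open2(m_c, codec, NULL);   // fields changed after this call are ignored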

FFmpeg: How to estimate number of samples in audio stream?

I'm currently writing a small application that's making use of the FFmpeg library in order to decode audio files (especially avformat and swresample) in C++.
Now I need the total number of samples in an audio stream. I know that the exact number can only be found out by actually decoding all the frames; I just need an estimate. What is the preferred method here? How can I find out the duration of a file?
There's some good info in this question about how to get info out of ffmpeg: FFMPEG Can't Display The Duration Of a Video.
To work out the number of samples in an audio stream, you need three basic bits of info:
The duration (in seconds)
The sample rate (in samples per second)
The number of channels in the stream (e.g. 2 for stereo)
Once you have that info, the total number of samples in your stream is simply [duration] * [rate] * [channels].
Note that this is not equivalent to bytes, as the samples are likely to be at least 16 bit, and possibly 24.
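A sketch of that calculation with the libavformat API, using the same (now deprecated) AVStream::codec fields as the answer further down; 'audioIndex' and the file name are assumptions:
AVFormatContext *fmt = NULL;
avformat_open_input(&fmt, "input.wav", NULL, NULL);
avformat_find_stream_info(fmt, NULL);

AVStream *st = fmt->streams[audioIndex];
double duration = st->duration * av_q2d(st->time_base);     // seconds
// (fmt->duration / (double)AV_TIME_BASE works if the stream duration is unset)
int64_t totalSamples = (int64_t)(duration
                                 * st->codec->sample_rate   // samples per second
                                 * st->codec->channels);    // e.g. 2 for stereo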
I believe what you need is the formula AUDIORATE / FRAMERATE. For instance, if ar = 48000 and the video frame rate is, say, 50 fps, then 48000 / 50 = 960 samples per frame.
The buffer calculation comes later, as samples_per_frame * nChannels * (audioBit / 8).
The audio bit depth is usually 16 bits (24 or 32 bits are also possible). So for 8-channel audio at 16 bit / 48 kHz, you'll need 960 * 8 * 2 = 15360 bytes per audio frame.
The official way to do this last calculation is to use the
av_samples_get_buffer_size(NULL, nChannels, SamplesPerFrame, audio_st->codec->sample_fmt, 0)
function.
av_samples_get_buffer_size(NULL, 8, 960, audio_st->codec->sample_fmt, 0)
will also return 15360 (for experts: yes, I'm assuming the format is pcm_s16le).
So this answers the first part of your question. Hope that helps.

Writing 4:2:0 YUV-Rawdata into an AVI-File via DirectShow in C++

I'm trying to write some 4:2:0 raw data received from a capture card into an AVI file. For every pixel the char buffer contains 2 bytes (16 bits). The data is laid out as FOURCC UYVY: YUV 4:2:2 (a Y sample at every pixel, U and V sampled at every second pixel horizontally on each line). A macropixel packs 2 pixels into 1 u_int32.
First I tried the OpenCV VideoWriter, but it is simply too slow for this huge amount of video data (I'm capturing 2 video streams, each in 1080p25 format), so I switched to the Windows "Video for Windows" library. But even that doesn't keep the file writing real-time. My last chance is DirectShow. I want to use the AVI Mux and File Writer filters to store my raw data as an AVI file, but I'm not sure how to "give" the AVI Mux my raw data (a char array) which contains only video data in UYVY order and no audio. Maybe you can give me some advice. This is what I've got so far:
CoInitialize(NULL);

IGraphBuilder *pGraph = NULL;
CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC_SERVER, IID_IGraphBuilder, (void **)&pGraph);

IMediaControl *pMediaControl = NULL;
pGraph->QueryInterface(IID_IMediaControl, (void **)&pMediaControl);

ICaptureGraphBuilder2 *pCapture = NULL;
CoCreateInstance(CLSID_CaptureGraphBuilder2, NULL, CLSCTX_INPROC, IID_ICaptureGraphBuilder2, (void **)&pCapture);
pCapture->SetFiltergraph(pGraph);   // the builder needs the graph before use

// AVI Mux + File Writer, wired up by SetOutputFileName.
IBaseFilter *pMux = NULL;
pCapture->SetOutputFileName(&MEDIASUBTYPE_Avi, L"Test.avi", &pMux, NULL);

// pCap is never created here -- it must be a source filter that delivers
// the UYVY buffers, which is exactly the missing piece of the question.
IBaseFilter *pCap = NULL;
pCapture->RenderStream(&PIN_CATEGORY_CAPTURE, &MEDIATYPE_Video, pCap, NULL, pMux);
Thanks a lot and regards,
Valentin
(As you mentioned 10 fps in a previous question, which I assume to be the effective frame rate.) Are you writing dual 1920x1080, 12 bits per pixel, 10 fps into a file? That is about 60 megabytes per second; you might simply be hitting your HDD's write-capacity limit.
Choosing a different API is not going to help if your HDD is not fast enough. You need to either compress the data, lower the resolution or FPS, or use faster drives.
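For reference, the arithmetic behind that figure, assuming 4:2:0 (12 bits per pixel):
1920 * 1080 * 12 / 8 ≈ 3.1 MB per frame
3.1 MB * 10 fps * 2 streams ≈ 62 MB/s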

DirectShow video playing too fast when audio pin rendering data

I'm working on a custom Windows DirectShow source filter based on CSource and CSourceStream for each pin. There are two pins - video output and audio output. Both pins work fine when individually rendered in graphedit and similar tools such as Graph Studio with correct time stamps, frame rates and sound. I'm rendering the video to the Video Mixing Renderer (VMR7 or VMR9).
However, when I render both pins, the video plays back too fast while the audio still sounds correct. The video plays back approximately 50% too fast, but I think this is capped by the decoding speed.
The timestamps on the samples are the same in both cases. If I render the audio stream to a null renderer (the one in qedit.dll) then the video stream plays back at the correct frame rate. The filter is a 32 bit filter running on a Win7 x64 system.
When I added IMediaSeeking support, I found that the seek bar for the audio stream behaved quite bizarrely. However, the problem also happens without IMediaSeeking support.
Any suggestions for what could be causing this or suggestions for further investigation?
The output types from the audio and video pin are pasted below:
Video pin output types:

Major type: Video
Subtype: RGB24
Format: VideoInfo
Video size: 1024 x 576 pixels, 24 bit
Image size: 1769472 bytes
Compression: RGB
Source: width 0, height 0
Target: width 0, height 0
Bitrate: 0 bits/sec
Error rate: 0 bits/sec
Avg. display time: 41708 µsec

Major type: Video
Subtype: RGB32
Format: VideoInfo
Video size: 1024 x 576 pixels, 32 bit
Image size: 2359296 bytes
Compression: RGB
Source: width 0, height 0
Target: width 0, height 0
Bitrate: 0 bits/sec
Error rate: 0 bits/sec
Avg. display time: 41708 µsec

Audio pin output type:

Major type: Audio
Subtype: PCM audio
Sample size: 3
Format: WaveFormatEx
Wave format: Unknown
Channels: 1
Samples/sec: 48000
Avg. bytes/sec: 144000
Block align: 3
Bits/sample: 24
I realised the problem straight after posting the question: a case of debugging by framing the question correctly.
The audio stream had completely bogus timestamps. The audio and video streams played back fine individually but did not sync with each other at all when played together.
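For anyone hitting the same symptom: in a CSourceStream-based pin the timestamps are stamped in FillBuffer, and both pins must advance along the same 100 ns reference timeline. A sketch with hypothetical names (CMyAudioPin, m_samplesDelivered, SAMPLES_PER_BUFFER), assuming the 48 kHz / 24-bit mono format advertised above:
HRESULT CMyAudioPin::FillBuffer(IMediaSample *pSample)
{
    // Derive start/stop from a running count of PCM frames delivered,
    // so the audio timeline lines up with the video pin's.
    REFERENCE_TIME tStart = (m_samplesDelivered * 10000000LL) / 48000;
    m_samplesDelivered += SAMPLES_PER_BUFFER;
    REFERENCE_TIME tStop  = (m_samplesDelivered * 10000000LL) / 48000;
    pSample->SetTime(&tStart, &tStop);   // media times in 100 ns units
    // ... copy SAMPLES_PER_BUFFER frames of PCM into the sample here ...
    pSample->SetActualDataLength(SAMPLES_PER_BUFFER * 3);   // 3 bytes per 24-bit mono sample
    return S_OK;
}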