Windows MFT (Media Foundation Transform) decoder not returning proper sample time or duration - c++

To decode an H264 stream with a Windows Media Foundation Transform, the workflow is currently something like this:
IMFSample* sample; // created earlier, e.g. via MFCreateSample()
sample->SetTime(time_in_hns);         // IMFSample times are in 100 ns units
sample->SetDuration(duration_in_hns);
sample->AddBuffer(buffer);
// Feed the IMFSample to the decoder.
mDecoder->ProcessInput(0, sample, 0);
// Get output from the decoder.
/* create outputsample that will receive content */ { ... }
MFT_OUTPUT_DATA_BUFFER output = {0};
output.pSample = outputsample;
DWORD status = 0;
HRESULT hr = mDecoder->ProcessOutput(0, 1, &output, &status);
if (output.pEvents) {
    // We must release this, as per the IMFTransform::ProcessOutput()
    // MSDN documentation.
    output.pEvents->Release();
    output.pEvents = nullptr;
}
if (hr == MF_E_TRANSFORM_STREAM_CHANGE) {
    // Type change, probably a geometric aperture change.
    // Reconfigure the decoder output type, so that GetOutputMediaType()
    // reflects the new format.
} else if (hr == MF_E_TRANSFORM_NEED_MORE_INPUT) {
    // Not enough input to produce output.
} else if (!output.pSample) {
    return S_OK;
} else {
    // Process output.
}
When we have fed all data to the MFT decoder, we must drain it:
mDecoder->ProcessMessage(MFT_MESSAGE_COMMAND_DRAIN, 0);
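After the drain command, you keep calling ProcessOutput() until the decoder reports it has nothing left. A minimal sketch, reusing mDecoder and outputsample from the snippet above (error handling beyond the drain condition is elided):
// Drain loop sketch: pull frames until the decoder is empty.
mDecoder->ProcessMessage(MFT_MESSAGE_COMMAND_DRAIN, 0);
for (;;) {
    MFT_OUTPUT_DATA_BUFFER output = {0};
    output.pSample = outputsample; // allocated as in the snippet above
    DWORD status = 0;
    HRESULT hr = mDecoder->ProcessOutput(0, 1, &output, &status);
    if (output.pEvents) {
        output.pEvents->Release();
        output.pEvents = nullptr;
    }
    if (hr == MF_E_TRANSFORM_NEED_MORE_INPUT) {
        break; // fully drained
    }
    // Process output.pSample here.
}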
Now, one quirk of the WMF H264 decoder is that it will typically not output anything until it has been fed over 30 compressed h264 frames, regardless of the size of the h264 sliding window. Latency is very high...
I'm encountering an issue that is very troublesome.
Take a video made only of keyframes, with only 15 frames, each 2s long, where the first frame has a non-zero presentation time (the stream is from live content, so the first timestamp is typically an epoch time).
Without draining the decoder, nothing will come out of it, as it hasn't received enough frames.
However, once the decoder is drained, the decoded frames do come out. HOWEVER, the MFT decoder has set all durations to only 33.6ms, and the presentation time of the first sample coming out is always 0.
The original duration and presentation time have been lost.
If you provide over 30 frames to the h264 decoder, then both duration and pts are valid...
I haven't yet found a way to get the WMF decoder to output samples with the proper value.
It appears that if you have to drain the decoder before it has output any samples by itself, then it's totally broken...
Has anyone experienced such problems? How did you get around it?
Thank you in advance
Edit: a sample of the video is available on http://people.mozilla.org/~jyavenard/mediatest/fragmented/1301869.mp4
Playing this video with Firefox causes it to play extremely quickly, due to the problems described above.

I'm not sure that your work flow is correct. I think you should do something like this:
do
{
    ...
    hr = mDecoder->ProcessInput(0, sample, 0);
    if (FAILED(hr))
        break;
    ...
    hr = mDecoder->ProcessOutput(0, 1, &output, &status);
    if (FAILED(hr) && hr != MF_E_TRANSFORM_NEED_MORE_INPUT)
        break;
} while (hr == MF_E_TRANSFORM_NEED_MORE_INPUT);

if (SUCCEEDED(hr))
{
    // You have a valid decoded frame here
}
The idea is to keep calling ProcessInput/ProcessOutput while ProcessOutput returns MF_E_TRANSFORM_NEED_MORE_INPUT. MF_E_TRANSFORM_NEED_MORE_INPUT means that the decoder needs more input. I think that with this loop you won't need to drain the decoder.

Related

IMFTransform::ProcessInput() and MF_E_TRANSFORM_NEED_MORE_INPUT

I have code that decodes AAC-encoded audio using IMFTransform. It works well for various test inputs. But I observed that in some cases IMFTransform::ProcessOutput() returns MF_E_TRANSFORM_NEED_MORE_INPUT when according to my reading of MS documentation it should return a valid data sample.
Basically the code has the following structure:
IMFTransform* transformer;
MFT_OUTPUT_DATA_BUFFER output_data_buffer;
...
bool try_to_get_output = false;
for (;;) {
    if (try_to_get_output) {
        // Try to get the output sample.
        try_to_get_output = false;
        output_data_buffer.dwStatus = 0;
        ...
        hr = transformer->ProcessOutput(... &output_data_buffer);
        if (SUCCEEDED(hr)) {
            // process sample
            if (output_data_buffer.dwStatus & MFT_OUTPUT_DATA_BUFFER_INCOMPLETE) {
                // We have more data
                try_to_get_output = true;
            }
        } else if (hr == MF_E_TRANSFORM_NEED_MORE_INPUT) {
            Log("Unnecessary ProcessOutput()");
        } else {
            // Process other errors
        }
        continue;
    }
    // Send more encoded AAC data to the MFT.
    hr = transformer->ProcessInput(...);
}
What happens is that ProcessOutput() sets MFT_OUTPUT_DATA_BUFFER_INCOMPLETE in MFT_OUTPUT_DATA_BUFFER.dwStatus but then the following ProcessOutput() always returns MF_E_TRANSFORM_NEED_MORE_INPUT contradicting the documentation.
Again, so far it seems harmless and things work. But then what exactly does the AAC decoder want to tell the caller by setting MFT_OUTPUT_DATA_BUFFER_INCOMPLETE?
This might be a small glitch in the decoder implementation. It's quite possible that if you happen to drain the MFT it would spit out some data, so the incomplete flag might, a bit confusingly, indicate data that exists even though it is not immediately accessible.
Overall, however, the idea is to keep calling ProcessOutput, pulling output data for as long as possible, until you get MF_E_TRANSFORM_NEED_MORE_INPUT, and then proceed with feeding new input (or draining). That is, I would say MF_E_TRANSFORM_NEED_MORE_INPUT is much more important than MFT_OUTPUT_DATA_BUFFER_INCOMPLETE. After all, this is what Microsoft's own code on top of MFTs does.
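In code, the pattern would look something like this (a sketch only, reusing transformer and output_data_buffer from the question; FeedNextInput() is a hypothetical helper wrapping ProcessInput()):
// Sketch: pull output until the MFT asks for more input, then feed it.
// FeedNextInput() is a hypothetical helper that wraps ProcessInput().
for (;;) {
    output_data_buffer.dwStatus = 0;
    hr = transformer->ProcessOutput(0, 1, &output_data_buffer, &status);
    if (SUCCEEDED(hr)) {
        // Consume output_data_buffer.pSample, then loop for more.
        continue;
    }
    if (hr != MF_E_TRANSFORM_NEED_MORE_INPUT)
        break; // a real error
    if (!FeedNextInput(transformer))
        break; // no more input: drain here if needed
}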
Also keep in mind that the AAC decoder is an "old", "first generation" MFT, so over the years its updates may have diverged a bit from the current docs.

WASAPI captured packets do not align

I'm trying to visualize a soundwave captured by WASAPI loopback but find that the packets I record do not form a smooth wave when put together.
My understanding of how the WASAPI capture client works is that when I call pCaptureClient->GetBuffer(&pData, &numFramesAvailable, &flags, NULL, NULL) the buffer pData is filled from the front with numFramesAvailable datapoints. Each datapoint is a float and they alternate by channel. Thus to get all available datapoints I should cast pData to a float pointer, and take the first channels * numFramesAvailable values. Once I release the buffer and call GetBuffer again it provides the next packet. I would assume that these packets would follow on from each other but it doesn't seem to be the case.
My guess is that either I'm making an incorrect assumption about the format of the audio data in pData, or the capture client is missing or overlapping frames. But I have no idea how to check either.
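For what it's worth, the format assumption can be checked directly: the shared-mode mix format is usually 32-bit float, but GetMixFormat() tells you for sure. A sketch, assuming pwfx is the WAVEFORMATEX obtained from GetMixFormat() below (requires <mmreg.h> and <ksmedia.h> for the GUIDs):
// Sketch: verify the mix format is 32-bit float before casting,
// then de-interleave channel 0.
bool isFloat =
    pwfx->wFormatTag == WAVE_FORMAT_IEEE_FLOAT ||
    (pwfx->wFormatTag == WAVE_FORMAT_EXTENSIBLE &&
     reinterpret_cast<WAVEFORMATEXTENSIBLE*>(pwfx)->SubFormat ==
         KSDATAFORMAT_SUBTYPE_IEEE_FLOAT);
if (isFloat) {
    const float* samples = reinterpret_cast<const float*>(pData);
    for (UINT32 i = 0; i < numFramesAvailable; ++i) {
        float ch0 = samples[i * pwfx->nChannels]; // first channel only
        // ... append ch0 to the waveform being visualized ...
    }
}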
To make the code below as brief as possible I've removed things like error status checking and cleanup.
Initialization of capture client:
const CLSID CLSID_MMDeviceEnumerator = __uuidof(MMDeviceEnumerator);
const IID IID_IMMDeviceEnumerator = __uuidof(IMMDeviceEnumerator);
const IID IID_IAudioClient = __uuidof(IAudioClient);
const IID IID_IAudioCaptureClient = __uuidof(IAudioCaptureClient);
IMMDeviceEnumerator *pDeviceEnumerator = NULL;
IMMDevice *pDeviceEndpoint = NULL;
IAudioClient *pAudioClient = NULL;
IAudioCaptureClient *pCaptureClient = NULL;
int channels;
// Initialize audio device endpoint
CoInitialize(nullptr);
CoCreateInstance(CLSID_MMDeviceEnumerator, NULL, CLSCTX_ALL, IID_IMMDeviceEnumerator, (void**)&pDeviceEnumerator);
pDeviceEnumerator->GetDefaultAudioEndpoint(eRender, eConsole, &pDeviceEndpoint);
// init audio client
WAVEFORMATEX *pwfx = NULL;
REFERENCE_TIME hnsRequestedDuration = 10000000;
REFERENCE_TIME hnsActualDuration;
pDeviceEndpoint->Activate(IID_IAudioClient, CLSCTX_ALL, NULL, (void**)&pAudioClient);
pAudioClient->GetMixFormat(&pwfx);
pAudioClient->Initialize(AUDCLNT_SHAREMODE_SHARED, AUDCLNT_STREAMFLAGS_LOOPBACK, hnsRequestedDuration, 0, pwfx, NULL);
channels = pwfx->nChannels;
pAudioClient->GetService(IID_IAudioCaptureClient, (void**)&pCaptureClient);
pAudioClient->Start(); // Start recording.
Capture of packets (note that std::mutex packet_buffer_mutex and vector<vector<float>> packet_buffer are already defined and used by another thread to safely display the data):
UINT32 packetLength = 0;
BYTE *pData = NULL;
UINT32 numFramesAvailable;
DWORD flags;
int max_packets = 8;
std::unique_lock<std::mutex>write_guard(packet_buffer_mutex, std::defer_lock);
while (true) {
    pCaptureClient->GetNextPacketSize(&packetLength);
    while (packetLength != 0)
    {
        // Get the available data in the shared buffer.
        pData = NULL;
        pCaptureClient->GetBuffer(&pData, &numFramesAvailable, &flags, NULL, NULL);
        if (flags & AUDCLNT_BUFFERFLAGS_SILENT)
        {
            pData = NULL; // Tell CopyData to write silence.
        }
        write_guard.lock();
        if (packet_buffer.size() == max_packets) {
            packet_buffer.pop_back();
        }
        if (pData) {
            float *pfData = (float*)pData;
            packet_buffer.emplace(packet_buffer.begin(), pfData, pfData + channels * numFramesAvailable);
        } else {
            packet_buffer.emplace(packet_buffer.begin());
        }
        write_guard.unlock();
        pCaptureClient->ReleaseBuffer(numFramesAvailable);
        pCaptureClient->GetNextPacketSize(&packetLength);
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
I store the packets in a vector<vector<float>> (where each vector<float> is a packet) removing the last one and inserting the newest at the start so I can iterate over them in order.
Below is the result of a captured sinewave, plotting alternating values so it only represents a single channel. It is clear where the packets are being stitched together.
Something is playing a sine wave to Windows; you're recording the sine wave back in the audio loopback; and the sine wave you're getting back isn't really a sine wave.
You're almost certainly running into glitches. The most likely causes of glitching are:
Whatever is playing the sine wave to Windows isn't getting data to Windows in time, so the buffer is running dry.
Whatever is reading the loopback data out of Windows isn't reading the data in time, so the buffer is filling up.
Something is going wrong in between playing the sine wave to Windows and reading it back.
It is possible that more than one of these are happening.
The IAudioCaptureClient::GetBuffer call will tell you if you read the data too late. In particular it will set *pdwFlags so that the AUDCLNT_BUFFERFLAGS_DATA_DISCONTINUITY bit is set.
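For example, a minimal sketch of that check, using the same variables as the capture loop above:
// Sketch: detect dropped data on every packet.
pCaptureClient->GetBuffer(&pData, &numFramesAvailable, &flags, NULL, NULL);
if (flags & AUDCLNT_BUFFERFLAGS_DATA_DISCONTINUITY) {
    // A glitch occurred before this packet; the captured wave
    // will not line up with the previous packet here.
}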
Looking at your code, I see you're doing the following things between the GetBuffer and the ReleaseBuffer:
Waiting on a lock
Sometimes doing something called "pop_back"
Doing something called "emplace"
I quote from the above-linked documentation:
Clients should avoid excessive delays between the GetBuffer call that acquires a packet and the ReleaseBuffer call that releases the packet. The implementation of the audio engine assumes that the GetBuffer call and the corresponding ReleaseBuffer call occur within the same buffer-processing period. Clients that delay releasing a packet for more than one period risk losing sample data.
In particular you should NEVER DO ANY OF THE FOLLOWING between GetBuffer and ReleaseBuffer because eventually they will cause a glitch:
Wait on a lock
Wait on any other operation
Read from or write to a file
Allocate memory
Instead, pre-allocate a bunch of memory before calling IAudioClient::Start. As each buffer arrives, write to this memory. On the side, have a regularly scheduled work item that takes written memory and writes it to disk or whatever you're doing with it.
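A minimal sketch of that approach (the ring type, its capacity, and the consumer cadence are all assumptions, and overflow handling is deliberately omitted):
// Sketch: pre-allocated ring buffer; the capture thread only copies,
// never locks or allocates.
#include <atomic>
#include <vector>

struct FloatRing {
    std::vector<float> data;          // allocated once, before Start()
    std::atomic<size_t> writePos{0};  // the consumer reads behind this
    explicit FloatRing(size_t capacity) : data(capacity) {}
    void Push(const float* src, size_t count) {
        size_t w = writePos.load(std::memory_order_relaxed);
        for (size_t i = 0; i < count; ++i)
            data[(w + i) % data.size()] = src[i];
        writePos.store(w + count, std::memory_order_release);
    }
};
// Capture thread: ring.Push((const float*)pData, channels * numFramesAvailable);
// A separate, regularly scheduled consumer drains behind writePos.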

How to read YUV8 data from avi file?

I have an avi file that contains uncompressed gray video data. I need to extract frames from it. The file is 22 GB.
How do I do that?
I have already tried ffmpeg, but it gives me a "could not find codec parameters for video stream" message - because there is no codec at work, just frames.
Since OpenCV just uses ffmpeg to read video, that rules out OpenCV as well.
The only path that seems to be left is to try and dig into the raw data, but I do not know how.
Edit: this is the code I use to read from the file with opencv. The failure occurs inside the second if. Running the ffmpeg binary on the file also fails with the message above (could not find codec parameters, etc.).
/* register all formats and codecs */
av_register_all();
/* open input file, and allocate format context */
if (avformat_open_input(&fmt_ctx, src_filename, NULL, NULL) < 0) {
fprintf(stderr, "Could not open source file %s\n", src_filename);
ret = 1;
goto end;
}
fmt_ctx->seek2any = true;
/* retrieve stream information */
int res = avformat_find_stream_info(fmt_ctx, NULL);
if (res < 0) {
fprintf(stderr, "Could not find stream information\n");
ret = 1;
goto end;
}
Edit:
Here is the sample code I have tried for the extraction: pastebin. The result I get is an unchanging buffer after every call to AVIStreamRead.
If you do not need cross-platform functionality, the Video for Windows (VFW) API is a good alternative (http://msdn.microsoft.com/en-us/library/windows/desktop/dd756808(v=vs.85).aspx). I won't paste an entire code block since there's quite a lot to do, but you should be able to figure it out from the reference link. Basically, you do an AVIFileOpen, then get the video stream via AVIFileGetStream with streamtypeVIDEO, or alternatively do both at once with AVIStreamOpenFromFile, and then read samples from the stream with AVIStreamRead (see the sketch below). If you get to a point where you fail I can try to help, but it should be pretty straightforward.
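A minimal sketch of that sequence (error checks elided; the path is a placeholder, and buffer/bufferSize are placeholders you would size from the stream format):
// Sketch of the VFW call sequence described above.
#include <windows.h>
#include <vfw.h>
#pragma comment(lib, "vfw32.lib")

AVIFileInit();
PAVISTREAM ppavi = NULL;
// Opens the file and returns its first video stream in one call.
AVIStreamOpenFromFile(&ppavi, TEXT("input.avi") /* placeholder */, streamtypeVIDEO, 0, OF_READ, NULL);
AVISTREAMINFO si = {0};
AVIStreamInfo(ppavi, &si, sizeof(si));
LONG bRead = 0, smpN = 0;
for (LONG smpS = si.dwStart; smpS < (LONG)(si.dwStart + si.dwLength); ++smpS) {
    AVIStreamRead(ppavi, smpS, 1, buffer, bufferSize, &bRead, &smpN);
    // bRead == 0 means an empty slot; see the EDIT below.
}
AVIStreamRelease(ppavi);
AVIFileExit();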
Also, I'm not sure why ffmpeg is failing - I have been doing raw AVI reading with ffmpeg without any codecs involved. Can you post which ffmpeg call actually fails?
EDIT:
For the issue that you are seeing when the read data size is 0: the AVI file has N slots for frames in each second, where N is the fps of the video. In real life the samples won't come at exactly that speed (e.g. IP surveillance cameras), so the actual data sample indexes can be non-continuous, like 1,5,11,..., and VFW inserts empty samples between them (that is where you read a sample with a zero size). What you have to do is call AVIStreamRead with NULL as the buffer and 0 as the size until bRead is not 0, or you run past the last sample. When you get an actual size, you can call AVIStreamRead again on that sample index with the buffer pointer and size. I usually do compressed video so I don't use the suggested size, but going by your code snippet I would do something like this:
...
bRead = 0;
do
{
aviOpRes = AVIStreamRead(ppavi,smpS,1,NULL,0,&bRead,&smpN);
} while (bRead == 0 && ++smpS < si.dwLength + si.dwStart);
if(smpS >= si.dwLength + si.dwStart)
break;
PUCHAR tempBuffer = new UCHAR[bRead];
aviOpRes = AVIStreamRead(ppavi,smpS,1,tempBuffer,bRead,&bRead,&smpN);
/* do whatever you need */
delete[] tempBuffer;
...
EDIT 2:
Since this may come in handy to someone (or yourself) when choosing between VFW and FFMPEG, I also updated your FFMPEG example so that it parses the same file (sorry for the code quality - it lacks error checking - but I guess you can see the logical flow):
/* register all formats and codecs */
av_register_all();
AVFormatContext* fmt_ctx = NULL;
/* open input file, and allocate format context */
const char *src_filename = "E:\\Output.avi";
if (avformat_open_input(&fmt_ctx, src_filename, NULL, NULL) < 0) {
fprintf(stderr, "Could not open source file %s\n", src_filename);
abort();
}
/* retrieve stream information */
int res = avformat_find_stream_info(fmt_ctx, NULL);
if (res < 0) {
fprintf(stderr, "Could not find stream information\n");
abort();
}
int video_stream_index = 0; /* the video stream is usually 0, but better to look it up in case it's not first */
for(; video_stream_index < fmt_ctx->nb_streams; ++video_stream_index)
{
if(fmt_ctx->streams[video_stream_index]->codec->codec_type == AVMEDIA_TYPE_VIDEO)
break;
}
if(video_stream_index == fmt_ctx->nb_streams)
abort();
AVPacket *packet = new AVPacket;
av_init_packet(packet);
while (av_read_frame(fmt_ctx, packet) == 0)
{
    if (packet->stream_index == video_stream_index)
        printf("Sample nr %lld\n", (long long)packet->pts);
    av_free_packet(packet);
}
Basically you open the context and read packets from it. You will get both audio and video packets, so you should check whether the packet belongs to the stream of interest. FFMPEG will spare you the trouble of empty frames and give you only the samples that have data in them.

AVI created with AVIStreamWrite has incorrect length and playback speed

I'm trying to write to an AVI file using AVIStreamWrite but the resulting avi file is a bit messed up. The images in the avi contain the proper image and colors, but the duration and speed of the video are off. I recorded a video that should have been around 7 seconds; looking at the file properties in Windows Explorer, it showed a duration of about 2 seconds. When I played it in Media Player it was too short and seemed to be playing very rapidly (motion in the video was like fast-forward). I also can't seem to seek within the video using Media Player.
Here is what I'm doing...
//initialization
HRESULT AVIWriter::Init()
{
HRESULT hr = S_OK;
_hAVIFile = NULL;
_videoStream = NULL;
_frameCount = 0;
AVIFileInit();
::DeleteFileW(_filename);
hr = AVIFileOpen(&_hAVIFile,_filename,OF_WRITE|OF_CREATE, NULL);
if (hr != AVIERR_OK)
{
::cout << "AVI ERROR";
return 0;
}
/**************************************/
// Create a raw video stream in the file
::ZeroMemory(&_streamInfo, sizeof(_streamInfo));
_streamInfo.fccType = streamtypeVIDEO; // stream type
_streamInfo.fccHandler = 0; // No compressor
_streamInfo.dwScale = 1;
_streamInfo.dwRate = _aviFps; //this is 30
_streamInfo.dwSuggestedBufferSize = 0;
_streamInfo.dwSampleSize = 0;
SetRect( &_streamInfo.rcFrame, 0, 0,_bmi.biWidth , _bmi.biHeight );
hr = AVIFileCreateStream( _hAVIFile, // file pointer
&_videoStream,// returned stream pointer
&_streamInfo); // stream header
hr = AVIStreamSetFormat(_videoStream, 0,
&_bmi,
sizeof(_bmi));
return hr;
}
//call this when I receive a frame from my camera
HRESULT AVIWriter::AddFrameToAVI(BYTE* buffer)
{
HRESULT hr;
long size = _bmi.biHeight * _bmi.biWidth * 3;
hr = AVIStreamWrite(_videoStream, // stream pointer
_frameCount++, // time of this frame
1, // number to write
buffer, // pointer to data
size,// size of this frame
AVIIF_KEYFRAME, // flags....
NULL,
NULL);
return hr;
}
//call this when I am done
void AVIWriter::CloseAVI()
{
AVIStreamClose(_videoStream);
AVIFileClose(_hAVIFile);
AVIFileExit();
}
Now as a test I used DirectShow's GraphEdit to create a graph consisting of a VideoCapture filter for this same camera and an AVI mux, and created an avi file. The resulting AVI file was fine. The frame rate was 30 fps, the same that I am using. I queried both avi files (my 'bad' one and the 'good' one created with GraphEdit) with a call to AVIStreamInfo, and the stream info was pretty much the same for both files. I would have expected either the samples per second or the number of frames to be way off for my 'bad' avi, but it wasn't. Am I doing something wrong that would cause my AVI to have the incorrect length and seem to play back at an increased speed? I'm new to using VFW so any help is appreciated. Thanks
Frame time in the file will eventually be _frameCount / _aviFps, so either you are dropping frames and they never reach AVIStreamWrite, or, if you deliberately skip a few frames in the file, you need to increment _frameCount accordingly to jump over the skipped frames (see the sketch below).
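If the camera really does deliver fewer than 30 frames per second, one option is to derive the sample index from a capture timestamp instead of a plain counter. A sketch only: timestampMs and _startTimeMs are hypothetical additions to the class above.
// Sketch: map a capture timestamp to an AVI sample index so that dropped
// frames leave gaps instead of compressing the timeline.
HRESULT AVIWriter::AddFrameToAVI(BYTE* buffer, DWORD timestampMs)
{
    if (_frameCount++ == 0)
        _startTimeMs = timestampMs;   // hypothetical member: first frame time
    long frameIndex = (long)(((timestampMs - _startTimeMs) * (DWORD)_aviFps) / 1000);
    long size = _bmi.biHeight * _bmi.biWidth * 3;
    return AVIStreamWrite(_videoStream, frameIndex, 1, buffer, size,
                          AVIIF_KEYFRAME, NULL, NULL);
}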

Filling CMediaType and IMediaSample from AVPacket for h264 video

I have searched and have found almost nothing, so I would really appreciate some help with my question.
I am writing a DirectShow source filter which uses libav to read and send downstream h264 packets from YouTube's FLV file. But I can't find the appropriate libav structure fields to correctly implement the filter's GetMediaType() and FillBuffer(). Some libav fields are null. As a consequence, the h264 decoder crashes when it attempts to process the received data.
Where am I wrong? In working with libav or with the DirectShow interfaces? Maybe h264 requires additional processing when working with libav, or maybe I fill the reference time incorrectly? Does someone have any links useful for writing a DirectShow h264 source filter with libav?
Part of GetMediaType():
VIDEOINFOHEADER *pvi = (VIDEOINFOHEADER*) toMediaType->AllocFormatBuffer(sizeof(VIDEOINFOHEADER));
pvi->AvgTimePerFrame = UNITS_PER_SECOND / m_pFormatContext->streams[m_streamNo]->codec->sample_rate; //sample_rate is 0
pvi->dwBitRate = m_pFormatContext->bit_rate;
pvi->rcSource = videoRect;
pvi->rcTarget = videoRect;
//Bitmap
pvi->bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
pvi->bmiHeader.biWidth = videoRect.right;
pvi->bmiHeader.biHeight = videoRect.bottom;
pvi->bmiHeader.biPlanes = 1;
pvi->bmiHeader.biBitCount = m_pFormatContext->streams[m_streamNo]->codec->bits_per_raw_sample;//or should here be bits_per_coded_sample
pvi->bmiHeader.biCompression = FOURCC_H264;
pvi->bmiHeader.biSizeImage = GetBitmapSize(&pvi->bmiHeader);
Part of FillBuffer():
//Get buffer pointer
BYTE* pBuffer = NULL;
if (pSamp->GetPointer(&pBuffer) < 0)
return S_FALSE;
//Get next packet
AVPacket* pPacket = m_mediaFile.getNextPacket();
if (pPacket->data == NULL)
return S_FALSE;
//Check packet and buffer size
if (pSamp->GetSize() < pPacket->size)
return S_FALSE;
//Copy from packet to sample buffer
memcpy(pBuffer, pPacket->data, pPacket->size);
//Set media sample time
REFERENCE_TIME start = m_mediaFile.timeStampToReferenceTime(pPacket->pts);
REFERENCE_TIME duration = m_mediaFile.timeStampToReferenceTime(pPacket->duration);
REFERENCE_TIME end = start + duration;
pSamp->SetTime(&start, &end);
pSamp->SetMediaTime(&start, &end);
P.S. I've debugged my filter with hax264 decoder and it crashes on call to libav deprecated function img_convert().
Here is the MSDN link you need to build a correct H.264 media type: H.264 Video Types
You have to fill the right fields with the right values.
The AM_MEDIA_TYPE should contain the right MEDIASUBTYPE for h264 (a sketch follows below).
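For illustration, a minimal sketch of such a media type for an H.264 byte stream (Annex B, start codes). The width/height values and the FORMAT_VideoInfo choice are assumptions; check the linked docs for the variant your stream actually needs.
// Sketch per the "H.264 Video Types" docs; values are illustrative.
AM_MEDIA_TYPE mt = {0};
mt.majortype = MEDIATYPE_Video;
mt.subtype = MEDIASUBTYPE_H264;       // from wmcodecdsp.h; matches biCompression
mt.formattype = FORMAT_VideoInfo;
mt.bFixedSizeSamples = FALSE;
mt.bTemporalCompression = TRUE;

VIDEOINFOHEADER* pvi = (VIDEOINFOHEADER*)CoTaskMemAlloc(sizeof(VIDEOINFOHEADER));
ZeroMemory(pvi, sizeof(VIDEOINFOHEADER));
pvi->bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
pvi->bmiHeader.biWidth = 1280;        // coded width, not rcSource/rcTarget
pvi->bmiHeader.biHeight = 720;        // coded height
pvi->bmiHeader.biCompression = MAKEFOURCC('H','2','6','4');
// biBitCount/biSizeImage can stay 0 for compressed video.
mt.pbFormat = (BYTE*)pvi;
mt.cbFormat = sizeof(VIDEOINFOHEADER);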
And these are plain wrong:
pvi->bmiHeader.biWidth = videoRect.right;
pvi->bmiHeader.biHeight = videoRect.bottom;
You should use a width/height that is independent of rcSource/rcTarget, since those are only indicators and may even be completely zero if you take them from some other filter.
pvi->bmiHeader.biBitCount = m_pFormatContext->streams[m_streamNo]->codec->bits_per_raw_sample;//or should here be bits_per_coded_sample
This only makes sense if biWidth*biHeight*biBitCount/8 is the true size of the sample. I do not think it is...
pvi->bmiHeader.biCompression = FOURCC_H264;
This must also be passed in the AM_MEDIA_TYPE in the subtype parameter.
pvi->bmiHeader.biSizeImage = GetBitmapSize(&pvi->bmiHeader);
This fails because the FOURCC is unknown to the function, and the bit count is plain wrong for this sample, since it isn't a full frame.
You have to take a look at how the data stream is handled by the downstream h264 filter. This seems to be flawed.