AVI created with AVIStreamWrite has incorrect length and playback speed - c++

I'm trying to write to an AVI file using AVIStreamWrite but the resulting avi file is a bit messed up. The images in the avi contain the proper image and colors, but the duration and speed of the video are off. I recorded a video that should have been around 7 seconds long, but the file properties in Windows Explorer show a duration of about 2 seconds. When I played it in Media Player it was too short and seemed to play very rapidly (motion in the video was like fast forward). I also can't seem to seek within the video using Media Player.
Here is what I'm doing...
//initialization
HRESULT AVIWriter::Init()
{
    HRESULT hr = S_OK;
    _hAVIFile = NULL;
    _videoStream = NULL;
    _frameCount = 0;
    AVIFileInit();
    ::DeleteFileW(_filename);
    hr = AVIFileOpen(&_hAVIFile, _filename, OF_WRITE | OF_CREATE, NULL);
    if (hr != AVIERR_OK)
    {
        std::cout << "AVI ERROR";
        return hr;
    }
    /**************************************/
    // Create a raw video stream in the file
    ::ZeroMemory(&_streamInfo, sizeof(_streamInfo));
    _streamInfo.fccType = streamtypeVIDEO;  // stream type
    _streamInfo.fccHandler = 0;             // No compressor
    _streamInfo.dwScale = 1;
    _streamInfo.dwRate = _aviFps;           // this is 30
    _streamInfo.dwSuggestedBufferSize = 0;
    _streamInfo.dwSampleSize = 0;
    SetRect(&_streamInfo.rcFrame, 0, 0, _bmi.biWidth, _bmi.biHeight);
    hr = AVIFileCreateStream(_hAVIFile,     // file pointer
                             &_videoStream, // returned stream pointer
                             &_streamInfo); // stream header
    hr = AVIStreamSetFormat(_videoStream, 0,
                            &_bmi,
                            sizeof(_bmi));
    return hr;
}
//call this when I receive a frame from my camera
HRESULT AVIWriter::AddFrameToAVI(BYTE* buffer)
{
    HRESULT hr;
    long size = _bmi.biHeight * _bmi.biWidth * 3;
    hr = AVIStreamWrite(_videoStream,   // stream pointer
                        _frameCount++,  // time of this frame
                        1,              // number to write
                        buffer,         // pointer to data
                        size,           // size of this frame
                        AVIIF_KEYFRAME, // flags....
                        NULL,
                        NULL);
    return hr;
}
//call this when I am done
void AVIWriter::CloseAVI()
{
    AVIStreamClose(_videoStream);
    AVIFileClose(_hAVIFile);
    AVIFileExit();
}
Now as a test I used DirectShow's GraphEdit to create a graph consisting of a VideoCapture filter for this same camera and an AVI mux, and created an avi file. The resulting AVI file was fine. The frame rate was 30 fps, the same that I am using. I queried both avi files (my 'bad' one and the 'good' one created with GraphEdit) using a call to AVIStreamInfo and the stream info was pretty much the same for both files. I would have expected either the samples per second or the number of frames to be way off for my 'bad' avi, but they weren't. Am I doing something wrong that would cause my AVI to have the incorrect length and seem to play back at an increased speed? I'm new to using VFW so any help is appreciated. Thanks

Frame time in the file will eventually be _frameCount / _aviFps, so either you are dropping frames and they never reach AVIStreamWrite, or, if you intend to skip a few frames in the file, you need to increment _frameCount accordingly so that the written samples jump over the frames you skipped.
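If you do want dropped or skipped frames to occupy time in the file, one option (a sketch only; the frameTimeSec parameter and the _startTime member are hypothetical, not from the question) is to derive the sample index from the capture timestamp instead of a plain counter:
// Sketch: derive the AVI frame index from the capture time so that
// dropped camera frames leave gaps instead of compressing the timeline.
// _startTime (seconds) and _aviFps are assumed members; frameTimeSec is
// the capture timestamp of this frame.
HRESULT AVIWriter::AddFrameToAVI(BYTE* buffer, double frameTimeSec)
{
    // Index the sample by elapsed time, not by how many frames arrived.
    long frameIndex = static_cast<long>((frameTimeSec - _startTime) * _aviFps + 0.5);
    long size = _bmi.biHeight * _bmi.biWidth * 3;
    return AVIStreamWrite(_videoStream,
                          frameIndex,     // position in the stream implies time
                          1,
                          buffer,
                          size,
                          AVIIF_KEYFRAME,
                          NULL,
                          NULL);
}
Because position implies presentation time in a fixed-rate AVI stream, any gap in the indices plays back as held frames rather than as a shortened, sped-up clip.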

Related

WASAPI captured packets do not align

I'm trying to visualize a soundwave captured by WASAPI loopback but find that the packets I record do not form a smooth wave when put together.
My understanding of how the WASAPI capture client works is that when I call pCaptureClient->GetBuffer(&pData, &numFramesAvailable, &flags, NULL, NULL) the buffer pData is filled from the front with numFramesAvailable datapoints. Each datapoint is a float and they alternate by channel. Thus to get all available datapoints I should cast pData to a float pointer, and take the first channels * numFramesAvailable values. Once I release the buffer and call GetBuffer again it provides the next packet. I would assume that these packets would follow on from each other but it doesn't seem to be the case.
My guess is that either I'm making an incorrect assumption about the format of the audio data in pData, or the capture client is missing or overlapping frames. But I have no idea how to check either.
To make the code below as brief as possible I've removed things like error status checking and cleanup.
Initialization of capture client:
const CLSID CLSID_MMDeviceEnumerator = __uuidof(MMDeviceEnumerator);
const IID IID_IMMDeviceEnumerator = __uuidof(IMMDeviceEnumerator);
const IID IID_IAudioClient = __uuidof(IAudioClient);
const IID IID_IAudioCaptureClient = __uuidof(IAudioCaptureClient);
IMMDeviceEnumerator *pDeviceEnumerator = NULL;
IMMDevice *pDeviceEndpoint = NULL;
IAudioClient *pAudioClient = NULL;
IAudioCaptureClient *pCaptureClient = NULL;
int channels;
// Initialize audio device endpoint
CoInitialize(nullptr);
CoCreateInstance(CLSID_MMDeviceEnumerator, NULL, CLSCTX_ALL, IID_IMMDeviceEnumerator, (void**)&pDeviceEnumerator);
pDeviceEnumerator->GetDefaultAudioEndpoint(eRender, eConsole, &pDeviceEndpoint);
// init audio client
WAVEFORMATEX *pwfx = NULL;
REFERENCE_TIME hnsRequestedDuration = 10000000;
pDeviceEndpoint->Activate(IID_IAudioClient, CLSCTX_ALL, NULL, (void**)&pAudioClient);
pAudioClient->GetMixFormat(&pwfx);
pAudioClient->Initialize(AUDCLNT_SHAREMODE_SHARED, AUDCLNT_STREAMFLAGS_LOOPBACK, hnsRequestedDuration, 0, pwfx, NULL);
channels = pwfx->nChannels;
pAudioClient->GetService(IID_IAudioCaptureClient, (void**)&pCaptureClient);
pAudioClient->Start(); // Start recording.
Capture of packets (note that std::mutex packet_buffer_mutex and vector<vector<float>> packet_buffer are already defined and used by another thread to safely display the data):
UINT32 packetLength = 0;
BYTE *pData = NULL;
UINT32 numFramesAvailable;
DWORD flags;
int max_packets = 8;
std::unique_lock<std::mutex> write_guard(packet_buffer_mutex, std::defer_lock);
while (true) {
    pCaptureClient->GetNextPacketSize(&packetLength);
    while (packetLength != 0)
    {
        // Get the available data in the shared buffer.
        pData = NULL;
        pCaptureClient->GetBuffer(&pData, &numFramesAvailable, &flags, NULL, NULL);
        if (flags & AUDCLNT_BUFFERFLAGS_SILENT)
        {
            pData = NULL; // Tell CopyData to write silence.
        }
        write_guard.lock();
        if (packet_buffer.size() == max_packets) {
            packet_buffer.pop_back();
        }
        if (pData) {
            float *pfData = (float*)pData;
            packet_buffer.emplace(packet_buffer.begin(), pfData, pfData + channels * numFramesAvailable);
        } else {
            packet_buffer.emplace(packet_buffer.begin());
        }
        write_guard.unlock();
        pCaptureClient->ReleaseBuffer(numFramesAvailable);
        pCaptureClient->GetNextPacketSize(&packetLength);
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
I store the packets in a vector<vector<float>> (where each vector<float> is a packet) removing the last one and inserting the newest at the start so I can iterate over them in order.
Below is the result of a captured sine wave, plotting alternating values so it only represents a single channel. It is clear where the packets are being stitched together.
Something is playing a sine wave to Windows; you're recording the sine wave back in the audio loopback; and the sine wave you're getting back isn't really a sine wave.
You're almost certainly running into glitches. The most likely causes of glitching are:
Whatever is playing the sine wave to Windows isn't getting data to Windows in time, so the buffer is running dry.
Whatever is reading the loopback data out of Windows isn't reading the data in time, so the buffer is filling up.
Something is going wrong in between playing the sine wave to Windows and reading it back.
It is possible that more than one of these are happening.
The IAudioCaptureClient::GetBuffer call will tell you if you read the data too late. In particular it will set *pdwFlags so that the AUDCLNT_BUFFERFLAGS_DATA_DISCONTINUITY bit is set.
Looking at your code, I see you're doing the following things between GetBuffer and ReleaseBuffer:
Waiting on a lock
Sometimes doing something called "pop_back"
Doing something called "emplace"
I quote from the above-linked documentation:
Clients should avoid excessive delays between the GetBuffer call that acquires a packet and the ReleaseBuffer call that releases the packet. The implementation of the audio engine assumes that the GetBuffer call and the corresponding ReleaseBuffer call occur within the same buffer-processing period. Clients that delay releasing a packet for more than one period risk losing sample data.
In particular you should NEVER DO ANY OF THE FOLLOWING between GetBuffer and ReleaseBuffer because eventually they will cause a glitch:
Wait on a lock
Wait on any other operation
Read from or write to a file
Allocate memory
Instead, pre-allocate a bunch of memory before calling IAudioClient::Start. As each buffer arrives, write to this memory. On the side, have a regularly scheduled work item that takes written memory and writes it to disk or whatever you're doing with it.
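As a sketch of that pattern (the buffer size and helper below are illustrative assumptions, not part of the answer): pre-allocate the storage up front, have the capture thread do nothing but copy, and let another thread drain it on its own schedule.
#include <atomic>
#include <vector>

// Pre-allocated before IAudioClient::Start(); the capacity is a guess,
// size it for several seconds of audio at your mix format.
std::vector<float> ring(48000 * 2 * 10);
std::atomic<size_t> writePos{0};   // single producer (capture thread)
std::atomic<size_t> readPos{0};    // single consumer (display thread)

// Capture thread, between GetBuffer and ReleaseBuffer: copy only --
// no locks, no allocation, no file I/O.
void OnPacket(const float* pfData, UINT32 numFramesAvailable, int channels)
{
    const size_t count = size_t(numFramesAvailable) * channels;
    const size_t w = writePos.load(std::memory_order_relaxed);
    for (size_t i = 0; i < count; ++i)
        ring[(w + i) % ring.size()] = pfData ? pfData[i] : 0.0f; // NULL => silence
    writePos.store(w + count, std::memory_order_release);
}
The display thread then consumes from readPos up to writePos whenever it runs, so the capture loop never blocks on the mutex that currently sits between GetBuffer and ReleaseBuffer.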

DirectShow CSourceStream::FillBuffer unpredictable number of calls after Pause and Seek to the first frame

I have a DirectShow File Source filter which has audio and frame output pins. It is written in C++ based on this tutorial on MSDN. My filter opens the video using the Medialooks MFormats SDK and provides raw data to the output pins. The two pins connect directly to renderer filters when they are rendered.
The problem occurs when I run the graph, pause the video and seek to frame number 0. After a call to the ChangeStart method in the output frame pin, FillBuffer is sometimes called three times and frame 1 is shown on the screen instead of frame 0. When it is called twice, it shows the correct frame, which is frame 0.
The output pins inherit from the CSourceStream and CSourceSeeking classes. Here are my FillBuffer and ChangeStart methods of the output frame pin:
FillBuffer Method
HRESULT frame_pin::FillBuffer(IMediaSample *sample)
{
    CheckPointer(sample, E_POINTER);
    BYTE *frame_buffer;
    sample->GetPointer(&frame_buffer);
    // Check if the downstream filter is changing the format.
    CMediaType *mt;
    HRESULT hr = sample->GetMediaType(reinterpret_cast<AM_MEDIA_TYPE**>(&mt));
    if (hr == S_OK)
    {
        auto new_width = reinterpret_cast<VIDEOINFOHEADER2*>(mt->pbFormat)->bmiHeader.biWidth;
        auto old_width = reinterpret_cast<VIDEOINFOHEADER2*>(m_mt.pbFormat)->bmiHeader.biWidth;
        if (new_width != old_width)
            format_changed_ = true;
        SetMediaType(mt);
        DeleteMediaType(mt);
    }
    ASSERT(m_mt.formattype == FORMAT_VideoInfo2);
    VIDEOINFOHEADER2 *vih = reinterpret_cast<VIDEOINFOHEADER2*>(m_mt.pbFormat);
    CComPtr<IMFFrame> mf_frame;
    {
        CAutoLock lock(&shared_state_);
        if (source_time_ >= m_rtStop)
            return S_FALSE;
        // mf_reader_ is a member external SDK instance which gets the frame data with this function call
        hr = mf_reader_->SourceFrameConvertedGetByNumber(&av_props_, frame_number_, -1, &mf_frame, CComBSTR(L""));
        if (FAILED(hr))
            return hr;
        REFERENCE_TIME start, stop = 0;
        start = stream_time_;
        stop = static_cast<REFERENCE_TIME>(tc_.get_stop_time() / m_dRateSeeking);
        sample->SetTime(&start, &stop);
        stream_time_ = stop;
        source_time_ += (stop - start);
        frame_number_++;
    }
    if (format_changed_)
    {
        CComPtr<IMFFrame> mf_frame_resized;
        mf_frame->MFResize(eMFCC_YUY2, std::abs(vih->bmiHeader.biWidth), std::abs(vih->bmiHeader.biHeight), 0, &mf_frame_resized, CComBSTR(L""), CComBSTR(L""));
        mf_frame = mf_frame_resized;
    }
    MF_FRAME_INFO mf_frame_info;
    mf_frame->MFAllGet(&mf_frame_info);
    memcpy(frame_buffer, reinterpret_cast<BYTE*>(mf_frame_info.lpVideo), mf_frame_info.cbVideo);
    sample->SetActualDataLength(static_cast<long>(mf_frame_info.cbVideo));
    sample->SetSyncPoint(TRUE);
    sample->SetPreroll(FALSE);
    if (discontinuity_)
    {
        sample->SetDiscontinuity(TRUE);
        discontinuity_ = FALSE;
    }
    return S_OK;
}
ChangeStart Method
HRESULT frame_pin::ChangeStart()
{
    {
        CAutoLock lock(CSourceSeeking::m_pLock);
        tc_.reset();
        stream_time_ = 0;
        source_time_ = m_rtStart;
        frame_number_ = static_cast<int>(m_rtStart / frame_length_);
    }
    update_from_seek();
    return S_OK;
}
From the Microsoft DirectShow documentation:
The CSourceSeeking class is an abstract class for implementing
seeking in source filters with one output pin.
CSourceSeeking is not recommended for a filter with more than one
output pin. The main issue is that only one pin should respond to
seeking requests. Typically this requires communication among the pins
and the filter.
And you have two output pins in your source filter.
The CSourceSeeking class can be extended to manage more than one output pin with custom coding. Seek commands will come in through both output pins, so you'll need to decide which pin controls seeking and ignore seek commands arriving at the other pin.
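For illustration, a sketch of one way to gate the seek (is_seeking_pin_ and NotifySeekToAllPins are hypothetical names, not from the question or the DirectShow base classes):
// Sketch: only the designated "seeking pin" acts on IMediaSeeking calls.
// is_seeking_pin_ is a hypothetical flag set on the video pin at creation.
STDMETHODIMP frame_pin::SetPositions(LONGLONG *pCurrent, DWORD CurrentFlags,
                                     LONGLONG *pStop, DWORD StopFlags)
{
    if (!is_seeking_pin_)
        return S_OK; // ignore: the other pin drives seeking for the whole filter
    // The controlling pin lets CSourceSeeking do its usual work, then updates
    // shared filter state so both pins restart from the same position.
    HRESULT hr = CSourceSeeking::SetPositions(pCurrent, CurrentFlags, pStop, StopFlags);
    if (SUCCEEDED(hr))
        filter_->NotifySeekToAllPins(m_rtStart); // hypothetical helper
    return hr;
}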

Media Foundation Audio/Video capturing to MPEG4FileSink produces incorrect duration

I am working on a media streaming application using the Media Foundation framework. I've used some samples from the internet and from Anton Polinger's book. Unfortunately, after saving streams into an mp4 file the file's metadata is corrupted: it has an incorrect duration (matching the time my PC has been running, 30 hours for instance) and a wrong bitrate. After a long struggle I fixed it for a single stream (video or audio), but when I try to record both audio and video the problem returns. Something is wrong with my topology but I can't understand what, and perhaps there are some experts here?
I get the audio and video sources, wrap them into an IMFCollection, and create an aggregate source with MFCreateAggregateSource.
I create source nodes for each source in the aggregate source:
Com::IMFTopologyNodePtr TopologyBuilder::CreateSourceNode(Com::IMFStreamDescriptorPtr streamDescriptor)
{
    HRESULT hr = S_OK;
    Com::IMFTopologyNodePtr pNode;
    // Create the topology node, indicating that it must be a source node.
    hr = MFCreateTopologyNode(MF_TOPOLOGY_SOURCESTREAM_NODE, &pNode);
    THROW_ON_FAIL(hr, "Unable to create topology node for source");
    // Associate the node with the source by passing in a pointer to the media source,
    // and indicating that it is the source
    hr = pNode->SetUnknown(MF_TOPONODE_SOURCE, _sourceDefinition->GetMediaSource());
    THROW_ON_FAIL(hr, "Unable to set source as object for topology node");
    // Set the node presentation descriptor attribute of the node by passing
    // in a pointer to the presentation descriptor
    hr = pNode->SetUnknown(MF_TOPONODE_PRESENTATION_DESCRIPTOR, _sourceDefinition->GetPresentationDescriptor());
    THROW_ON_FAIL(hr, "Unable to set MF_TOPONODE_PRESENTATION_DESCRIPTOR to node");
    // Set the node stream descriptor attribute by passing in a pointer to the stream
    // descriptor
    hr = pNode->SetUnknown(MF_TOPONODE_STREAM_DESCRIPTOR, streamDescriptor);
    THROW_ON_FAIL(hr, "Unable to set MF_TOPONODE_STREAM_DESCRIPTOR to node");
    return pNode;
}
After that I connect each source to a transform (H264 encoder and AAC encoder) and to the MPEG4FileSink:
void TopologyBuilder::CreateFileSinkOutputNode(PCWSTR filePath)
{
    HRESULT hr = S_OK;
    DWORD sink_count;
    Com::IMFByteStreamPtr byte_stream;
    Com::IMFTransformPtr transform;
    LPCWSTR lpcwstrFilePath = filePath;
    hr = MFCreateFile(
        MF_ACCESSMODE_WRITE, MF_OPENMODE_FAIL_IF_NOT_EXIST, MF_FILEFLAGS_NONE,
        lpcwstrFilePath, &byte_stream);
    THROW_ON_FAIL(hr, L"Unable to create and open file");
    // Video stream
    Com::IMFMediaTypePtr in_mf_video_media_type = _sourceDefinition->GetCurrentVideoMediaType();
    Com::IMFMediaTypePtr out_mf_media_type = CreateMediaType(MFMediaType_Video, MFVideoFormat_H264);
    hr = CopyType(in_mf_video_media_type, out_mf_media_type);
    THROW_ON_FAIL(hr, L"Unable to copy type parameters");
    if (GetSubtype(in_mf_video_media_type) != MEDIASUBTYPE_H264)
    {
        transform.Attach(CreateAndInitCoderMft(MFT_CATEGORY_VIDEO_ENCODER, out_mf_media_type));
        THROW_ON_NULL(transform);
    }
    if (transform)
    {
        Com::IMFMediaTypePtr transformMediaType;
        hr = transform->GetOutputCurrentType(0, &transformMediaType);
        THROW_ON_FAIL(hr, L"Unable to get current output type");
        UINT32 pcbBlobSize = 0;
        hr = transformMediaType->GetBlobSize(MF_MT_MPEG_SEQUENCE_HEADER, &pcbBlobSize);
        THROW_ON_FAIL(hr, L"Unable to get blob size of MF_MT_MPEG_SEQUENCE_HEADER");
        std::vector<UINT8> blob(pcbBlobSize);
        hr = transformMediaType->GetBlob(MF_MT_MPEG_SEQUENCE_HEADER, &blob.front(), blob.size(), NULL);
        THROW_ON_FAIL(hr, L"Unable to get blob MF_MT_MPEG_SEQUENCE_HEADER");
        hr = out_mf_media_type->SetBlob(MF_MT_MPEG_SEQUENCE_HEADER, &blob.front(), blob.size());
        THROW_ON_FAIL(hr, L"Unable to set blob of MF_MT_MPEG_SEQUENCE_HEADER");
    }
    // Audio stream
    Com::IMFMediaTypePtr out_mf_audio_media_type;
    Com::IMFTransformPtr transformAudio;
    Com::IMFMediaTypePtr mediaTypeTmp = _sourceDefinition->GetCurrentAudioMediaType();
    Com::IMFMediaTypePtr in_mf_audio_media_type;
    if (mediaTypeTmp != NULL)
    {
        std::unique_ptr<MediaTypesFactory> factory(new MediaTypesFactory());
        if (!IsMediaTypeSupportedByAacEncoder(mediaTypeTmp))
        {
            UINT32 channels;
            hr = mediaTypeTmp->GetUINT32(MF_MT_AUDIO_NUM_CHANNELS, &channels);
            THROW_ON_FAIL(hr, L"Unable to get MF_MT_AUDIO_NUM_CHANNELS from source media type");
            in_mf_audio_media_type = factory->CreatePCM(factory->DEFAULT_SAMPLE_RATE, channels);
        }
        else
        {
            in_mf_audio_media_type.Attach(mediaTypeTmp.Detach());
        }
        out_mf_audio_media_type = factory->CreateAAC(in_mf_audio_media_type, factory->HIGH_ENCODED_BITRATE);
        GUID subType = GetSubtype(in_mf_audio_media_type);
        if (GetSubtype(in_mf_audio_media_type) != MFAudioFormat_AAC)
        {
            // add encoder to AAC
            transformAudio.Attach(CreateAndInitCoderMft(MFT_CATEGORY_AUDIO_ENCODER, out_mf_audio_media_type));
        }
    }
    Com::IMFMediaSinkPtr pFileSink;
    hr = MFCreateMPEG4MediaSink(byte_stream, out_mf_media_type, out_mf_audio_media_type, &pFileSink);
    THROW_ON_FAIL(hr, L"Unable to create mpeg4 media sink");
    Com::IMFTopologyNodePtr pOutputNodeVideo;
    hr = MFCreateTopologyNode(MF_TOPOLOGY_OUTPUT_NODE, &pOutputNodeVideo);
    THROW_ON_FAIL(hr, L"Unable to create output node");
    hr = pFileSink->GetStreamSinkCount(&sink_count);
    THROW_ON_FAIL(hr, L"Unable to get stream sink count from mediasink");
    if (sink_count == 0)
    {
        THROW_ON_FAIL(E_UNEXPECTED, L"Sink count should be greater than 0");
    }
    Com::IMFStreamSinkPtr stream_sink_video;
    hr = pFileSink->GetStreamSinkByIndex(0, &stream_sink_video);
    THROW_ON_FAIL(hr, L"Unable to get stream sink by index");
    hr = pOutputNodeVideo->SetObject(stream_sink_video);
    THROW_ON_FAIL(hr, L"Unable to set stream sink as output node object");
    hr = _pTopology->AddNode(pOutputNodeVideo);
    THROW_ON_FAIL(hr, L"Unable to add file sink output node");
    pOutputNodeVideo = AddEncoderIfNeed(_pTopology, transform, in_mf_video_media_type, pOutputNodeVideo);
    _outVideoNodes.push_back(pOutputNodeVideo);
    Com::IMFTopologyNodePtr pOutputNodeAudio;
    if (in_mf_audio_media_type != NULL)
    {
        hr = MFCreateTopologyNode(MF_TOPOLOGY_OUTPUT_NODE, &pOutputNodeAudio);
        THROW_ON_FAIL(hr, L"Unable to create output node");
        Com::IMFStreamSinkPtr stream_sink_audio;
        hr = pFileSink->GetStreamSinkByIndex(1, &stream_sink_audio);
        THROW_ON_FAIL(hr, L"Unable to get stream sink by index");
        hr = pOutputNodeAudio->SetObject(stream_sink_audio);
        THROW_ON_FAIL(hr, L"Unable to set stream sink as output node object");
        hr = _pTopology->AddNode(pOutputNodeAudio);
        THROW_ON_FAIL(hr, L"Unable to add file sink output node");
        if (transformAudio)
        {
            Com::IMFTopologyNodePtr outputTransformNodeAudio;
            AddTransformNode(_pTopology, transformAudio, pOutputNodeAudio, &outputTransformNodeAudio);
            _outAudioNode = outputTransformNodeAudio;
        }
        else
        {
            _outAudioNode = pOutputNodeAudio;
        }
    }
}
When the output type is applied to the audio transform, it has 15 attributes instead of 8, including MF_MT_AVG_BITRATE, which as I understand should apply to video. In my case it is 192000, and it differs from the MF_MT_AVG_BITRATE on the video stream.
My AAC media type is created by this method:
HRESULT MediaTypesFactory::CopyAudioTypeBasicAttributes(IMFMediaType *in_media_type, IMFMediaType *out_mf_media_type)
{
    HRESULT hr = S_OK;
    static const GUID AUDIO_MAJORTYPE = MFMediaType_Audio;
    static const GUID AUDIO_SUBTYPE = MFAudioFormat_PCM;
    out_mf_media_type->SetUINT32(MF_MT_AUDIO_BITS_PER_SAMPLE, AUDIO_BITS_PER_SAMPLE);
    WAVEFORMATEX *in_wfx;
    UINT32 wfx_size;
    MFCreateWaveFormatExFromMFMediaType(in_media_type, &in_wfx, &wfx_size);
    hr = out_mf_media_type->SetUINT32(MF_MT_AUDIO_SAMPLES_PER_SECOND, in_wfx->nSamplesPerSec);
    DEBUG_ON_FAIL(hr);
    hr = out_mf_media_type->SetUINT32(MF_MT_AUDIO_NUM_CHANNELS, in_wfx->nChannels);
    DEBUG_ON_FAIL(hr);
    hr = out_mf_media_type->SetUINT32(MF_MT_AUDIO_AVG_BYTES_PER_SECOND, in_wfx->nAvgBytesPerSec);
    DEBUG_ON_FAIL(hr);
    hr = out_mf_media_type->SetUINT32(MF_MT_AUDIO_BLOCK_ALIGNMENT, in_wfx->nBlockAlign);
    DEBUG_ON_FAIL(hr);
    return hr;
}
It would be awesome if somebody could help me or explain where I am wrong.
Thanks.
In my project CaptureManager I faced a similar problem while writing code for recording live video from many web cams into one file. After a long time researching Media Foundation I found two important facts:
1. Live sources - web cams and microphones - do not start from 0. According to the specification, samples from them should start from a zero time stamp (Live Sources - "The first sample should have a time stamp of zero.") - but live sources set the current system time instead.
2. I see from your code that you use the Media Session - an object with the IMFMediaSession interface, which I think you create with the MFCreateMediaSession function. This function creates the default version of the session, which is optimized for playing media from a file, whose samples start from 0 by default.
In my view, the main problem is that the default Media Session does not check the time stamps of the media samples from the source, because samples from a media file start from zero or from the StartPosition. Live sources, however, do not start from 0 - they should, or must, but do not.
So, my advice: write a class with IMFTransform which will be a "proxy" transform between the source and the encoder. This proxy transform must fix the time stamps of the media samples from the live source: when it receives the first media sample from the live source, it saves the actual time stamp of that first sample as a reference time and sets the first sample's time stamp to zero; all subsequent samples from this live source then have this reference time subtracted from their time stamps.
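A minimal sketch of the rebasing logic such a proxy transform would apply (only the time stamp handling is shown, the full IMFTransform plumbing is omitted; m_haveReference and m_hnsReference are hypothetical members):
// Sketch: rebase live-source time stamps so the stream starts at zero.
// Called from the proxy transform's ProcessInput for every sample.
HRESULT RebaseSampleTime(IMFSample *pSample)
{
    LONGLONG hnsTime = 0;
    HRESULT hr = pSample->GetSampleTime(&hnsTime); // 100-ns units
    if (FAILED(hr))
        return hr;
    if (!m_haveReference)           // first sample: remember the offset
    {
        m_hnsReference = hnsTime;
        m_haveReference = true;
    }
    // Shift every sample so the first one lands at time stamp zero.
    return pSample->SetSampleTime(hnsTime - m_hnsReference);
}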
Also, check the code for calling IMFFinalizableMediaSink.
Regards.
MP4 metadata might under some conditions be initialized incorrectly (e.g. like this), however in the scenario you described the problem is likely to be the payload data and not the way you set up the pipeline in the first place.
Decoders and converters typically pass sample time stamps through, copying them from input to output, so they do not indicate a failure if something is wrong - you still have output that makes sense written into the file. The sink might have issues processing your data if you have sample time problems, very long recordings, or an overflow bug, especially in the case of rates expressed with large numerators/denominators. What matters is the sample times the sources produce.
You might want to try shorter recordings, and also video-only and audio-only recordings, which might help identify the stream that supplies the data leading to the problem.
Additionally, you might want to inspect the resulting MP4 file atoms/boxes to identify whether the header boxes have incorrect data or whether the data itself is stamped incorrectly - on which track, and how exactly (especially whether it starts okay and then develops weird gaps in the middle).
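If you want a quick look at the movie header without a full parser, a rough diagnostic sketch like the following can locate the mvhd box and print its timescale and duration (it assumes a version-0 box and does a flat scan; real boxes nest inside moov, so treat this as a quick check, not a parser):
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// Quick-and-dirty check of the MP4 movie header: find 'mvhd' and print
// timescale/duration (version 0 layout) to see if the duration is sane.
int main(int argc, char **argv)
{
    if (argc < 2) return 1;
    FILE *f = fopen(argv[1], "rb");
    if (!f) return 1;
    std::vector<unsigned char> data;
    unsigned char chunk[4096];
    size_t n;
    while ((n = fread(chunk, 1, sizeof(chunk), f)) > 0)
        data.insert(data.end(), chunk, chunk + n);
    fclose(f);
    auto be32 = [&](size_t off) {
        return (uint32_t(data[off]) << 24) | (uint32_t(data[off + 1]) << 16) |
               (uint32_t(data[off + 2]) << 8) | uint32_t(data[off + 3]);
    };
    for (size_t i = 0; i + 24 < data.size(); ++i) {
        if (memcmp(&data[i], "mvhd", 4) == 0 && data[i + 4] == 0) { // version 0
            uint32_t timescale = be32(i + 16); // after version/flags + 2 timestamps
            uint32_t duration  = be32(i + 20);
            printf("timescale=%u duration=%u (%.2f s)\n",
                   timescale, duration, double(duration) / timescale);
        }
    }
    return 0;
}
A duration field that is wildly larger than the recording is a sign the sink was fed bad sample times; comparing mvhd against each track's tkhd/mdhd durations tells you which track is off.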

Media Foundation set video interlacing and decode

I have an MOV file and I want to decode it and have all frames as separate images.
So I try to configure an uncompressed media type in the following way:
// configure the source reader
IMFSourceReader* m_pReader;
MFCreateSourceReaderFromURL(filePath, NULL, &m_pReader);
// get the compressed media type
IMFMediaType* pFileVideoMediaType;
m_pReader->GetCurrentMediaType(MF_SOURCE_READER_FIRST_VIDEO_STREAM, &pFileVideoMediaType);
// create new media type for uncompressed type
IMFMediaType* pTypeUncomp;
MFCreateMediaType(&pTypeUncomp);
// copy all settings from compressed to uncompressed type
pFileVideoMediaType->CopyAllItems(pTypeUncomp);
// set the uncompressed video attributes
pTypeUncomp->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB8);
pTypeUncomp->SetUINT32(MF_MT_ALL_SAMPLES_INDEPENDENT, TRUE);
pTypeUncomp->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
// set the new uncompressed type to source reader
m_pReader->SetCurrentMediaType(MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0, pTypeUncomp);
// get the full uncompressed media type
m_pReader->GetCurrentMediaType(MF_SOURCE_READER_FIRST_VIDEO_STREAM, &pTypeUncomp);
I noticed that even though I explicitly set MF_MT_INTERLACE_MODE to MFVideoInterlace_Progressive, the final configuration still has the old mode MFVideoInterlace_MixedInterlaceOrProgressive.
Afterwards, I loop through all samples and look at their size:
IMFSample* videoSample = nullptr;
IMFMediaBuffer* mbuffer = nullptr;
LONGLONG llTimeStamp;
DWORD streamIndex, flags;
m_pReader->ReadSample(
MF_SOURCE_READER_FIRST_VIDEO_STREAM,
0, // Flags.
&streamIndex, // Receives the actual stream index.
&flags, // Receives status flags.
&llTimeStamp, // Receives the time stamp.
&videoSample); // Receives the sample or NULL.
videoSample->ConvertToContiguousBuffer(&mbuffer);
BYTE* videoData = nullptr;
DWORD sampleBufferLength = 0;
mbuffer->Lock(&videoData, nullptr, &sampleBufferLength);
cout << sampleBufferLength << endl;
And I get quite different sizes for the samples: from 31 bytes to 18000 bytes.
Even changing the format to MFVideoFormat_RGB32 does not affect the sample sizes.
This question seems to have the same issue but the solution is not fixing it.
Any help on why I can't change the interlacing and how to properly decode video frames and get image data out of samples?
Many thanks in advance.
In order to make SourceReader convert the samples to RGB you need to create it like this:
IMFAttributes* pAttr = NULL;
MFCreateAttributes(&pAttr, 1);
pAttr->SetUINT32(MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS, TRUE);
pAttr->SetUINT32(MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING, TRUE);
IMFSourceReader* m_pReader;
throwIfFailed(MFCreateSourceReaderFromURL(filePath, pAttr, &m_pReader), "Can't create source reader from url");
pAttr->Release();
Later, you shouldn't break out of the read loop when MF_SOURCE_READERF_CURRENTMEDIATYPECHANGED occurs. Now you'll have all samples with the same size.
Otherwise you can use the MFVideoFormat_NV12 subtype, and then you won't need to specify the MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING attribute when creating the reader.
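For illustration, a sketch of a read loop that tolerates the type change (error handling trimmed); the flag check is the part the answer is pointing at:
// Sketch: keep reading when the reader signals a media type change
// instead of treating it as an error; samples after the change have
// the final, consistent size.
while (true) {
    DWORD streamIndex = 0, flags = 0;
    LONGLONG llTimeStamp = 0;
    IMFSample *videoSample = nullptr;
    m_pReader->ReadSample(MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0,
                          &streamIndex, &flags, &llTimeStamp, &videoSample);
    if (flags & MF_SOURCE_READERF_ENDOFSTREAM)
        break; // done
    if (flags & MF_SOURCE_READERF_CURRENTMEDIATYPECHANGED) {
        // The reader renegotiated the output type; re-query it if you
        // cache format info, but do NOT break out of the loop.
    }
    if (videoSample) {
        // ... ConvertToContiguousBuffer / Lock as in the question ...
        videoSample->Release();
    }
}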

Windows MFT (Media Foundation Transform) decoder not returning proper sample time or duration

To decode an H264 stream with a Windows Media Foundation Transform, the workflow is currently something like this:
IMFSample* sample;
sample->SetSampleTime(time_in_ns);        // sample times are in 100-ns units
sample->SetSampleDuration(duration_in_ns);
sample->AddBuffer(buffer);
// Feed IMFSample to decoder
mDecoder->ProcessInput(0, sample, 0);
// Get output from decoder.
/* create outputsample that will receive content */ { ... }
MFT_OUTPUT_DATA_BUFFER output = {0};
output.pSample = outputsample;
DWORD status = 0;
HRESULT hr = mDecoder->ProcessOutput(0, 1, &output, &status);
if (output.pEvents) {
    // We must release this, as per the IMFTransform::ProcessOutput()
    // MSDN documentation.
    output.pEvents->Release();
    output.pEvents = nullptr;
}
if (hr == MF_E_TRANSFORM_STREAM_CHANGE) {
    // Type change, probably geometric aperture change.
    // Reconfigure the decoder output type and query GetOutputMediaType() again.
} else if (hr == MF_E_TRANSFORM_NEED_MORE_INPUT) {
    // Not enough input to produce output.
} else if (!output.pSample) {
    return S_OK;
} else {
    // Process output
}
When we have fed all data to the MFT decoder, we must drain it:
mDecoder->ProcessMessage(MFT_MESSAGE_COMMAND_DRAIN, 0);
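After the drain message, the remaining decoded frames are pulled with ProcessOutput until the decoder reports it has nothing left; roughly (a sketch reusing the names from the snippet above):
// Sketch: after MFT_MESSAGE_COMMAND_DRAIN, pull until the decoder is empty.
for (;;) {
    MFT_OUTPUT_DATA_BUFFER output = {0};
    output.pSample = outputsample;        // reuse a pre-allocated sample
    DWORD status = 0;
    HRESULT hr = mDecoder->ProcessOutput(0, 1, &output, &status);
    if (output.pEvents) { output.pEvents->Release(); output.pEvents = nullptr; }
    if (hr == MF_E_TRANSFORM_NEED_MORE_INPUT)
        break;                            // drain complete
    if (FAILED(hr))
        break;                            // stream change or real error: handle as above
    // ... consume output.pSample ...
}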
Now, one thing about the WMF H264 decoder is that it will typically not output anything before having been fed over 30 compressed h264 frames, regardless of the size of the h264 sliding window. Latency is very high...
I'm encountering an issue that is very troublesome.
With a video made only of keyframes, which has only 15 frames, each 2 s long, and whose first frame has a non-zero presentation time (the stream is from live content, so the first frame's time stamp is typically an epoch-based time), nothing will come out of the decoder without draining it, as it hasn't received enough frames.
However, once the decoder is drained, the decoded frame will come out. HOWEVER, the MFT decoder has set all durations to 33.6ms only and the presentation time of the first sample coming out is always 0.
The original duration and presentation time have been lost.
If you provide over 30 frames to the h264 decoder, then both duration and pts are valid...
I haven't yet found a way to get the WMF decoder to output samples with the proper value.
It appears that if you have to drain the decoder before it has output any samples by itself, then it's totally broken...
Has anyone experienced such problems? How did you get around it?
Thank you in advance
Edit: a sample of the video is available on http://people.mozilla.org/~jyavenard/mediatest/fragmented/1301869.mp4
Playing this video with Firefox will cause it to play extremely quickly due to the problems described above.
I'm not sure that your work flow is correct. I think you should do something like this:
do
{
    ...
    hr = mDecoder->ProcessInput(0, sample, 0);
    if (FAILED(hr))
        break;
    ...
    hr = mDecoder->ProcessOutput(0, 1, &output, &status);
    if (FAILED(hr) && hr != MF_E_TRANSFORM_NEED_MORE_INPUT)
        break;
} while (hr == MF_E_TRANSFORM_NEED_MORE_INPUT);
if (SUCCEEDED(hr))
{
    // You have a valid decoded frame here
}
The idea is to keep calling ProcessInput/ProcessOutput while ProcessOutput returns MF_E_TRANSFORM_NEED_MORE_INPUT. MF_E_TRANSFORM_NEED_MORE_INPUT means that the decoder needs more input. I think that with this loop you won't need to drain the decoder.