Media Foundation: Getting a MediaSink from a SinkWriter - c++

I'm trying to add an MP4 file sink to a Topology. When my MediaSource is already MP4, I use MFCreateMPEG4MediaSink and MF_MPEG4SINK_SPSPPS_PASSTHROUGH. When my MediaSource isn't MP4 (so raw YUV from a webcam), I want to use MFCreateSinkWriterFromURL so that I don't have to figure out MP4 headers and other complex stuff.
According to the MSDN Docs I should be able to use GetServiceForStream to get at the MediaSink, since the input type is different from the output type. However it always returns MF_E_UNSUPPORTED_SERVICE.
How can I get the underlying MediaSink out of a SinkWriter?
Alternatively, how can I easily create an MP4 media sink for an arbitrary topology?
HRESULT CreateVideoFileSink(
    IMFStreamDescriptor *pSourceSD, // Pointer to the stream descriptor.
    LPCWSTR pFilename,              // Name of file to save to.
    IMFStreamSink **ppStream)       // Receives a pointer to the stream sink.
{
    CComPtr<IMFAttributes> pAttr;
    CComPtr<IMFMediaTypeHandler> pHandler;
    CComPtr<IMFMediaType> pType;
    CComPtr<IMFMediaSink> pSink;
    CComPtr<IMFStreamSink> pStream;
    CComPtr<IMFSinkWriter> pSinkWriter;
    CComPtr<IMFByteStream> pByteStream;

    *ppStream = nullptr;

    // Get the media type handler for the stream.
    IFR(pSourceSD->GetMediaTypeHandler(&pHandler));

    // Get the major media type.
    GUID guidMajorType;
    IFR(pHandler->GetMajorType(&guidMajorType));
    IFR(pHandler->GetCurrentMediaType(&pType));

    IFR(MFCreateAttributes(&pAttr, 1));

    // Create an output file
    if (MFMediaType_Video == guidMajorType)
    {
        GUID guidSubType;
        IFR(pType->GetGUID(MF_MT_SUBTYPE, &guidSubType));
        if (MFVideoFormat_H264 == guidSubType)
        {
            // ... use MFCreateMPEG4MediaSink
        }
        else
        {
            IFR(MFCreateSinkWriterFromURL(pFilename, nullptr, pAttr, &pSinkWriter));
            DWORD streamIdx;
            IFR(pSinkWriter->AddStream(pType, &streamIdx));
            // This is the call that fails with MF_E_UNSUPPORTED_SERVICE.
            IFR(pSinkWriter->GetServiceForStream(MF_SINK_WRITER_MEDIASINK, GUID_NULL, IID_PPV_ARGS(&pSink)));
            IFR(pSink->GetStreamSinkByIndex(streamIdx, &pStream));
        }
    }
    else
    {
        // Don't use this stream
    }

    // Return IMFStreamSink pointer to caller.
    *ppStream = pStream.Detach();
    return S_OK;
}

Figured it out right after writing the question - of course. The SinkWriter doesn't have a MediaSink until you call BeginWriting.
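IFR(MFCreateSinkWriterFromURL(pFilename, nullptr, pAttr, &pSinkWriter));
DWORD streamIdx;
IFR(pSinkWriter->AddStream(pType, &streamIdx));
IFR(pSinkWriter->BeginWriting()); // <<----
// The media sink is only available after BeginWriting has been called.
IFR(pSinkWriter->GetServiceForStream(MF_SINK_WRITER_MEDIASINK, GUID_NULL, IID_PPV_ARGS(&pSink)));
IFR(pSink->GetStreamSinkByIndex(streamIdx, &pStream));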
(Make sure you don't let the SinkWriter get Released while you're using the StreamSink)


Create IMFByteStream from byte array

I am trying to adapt a method that originally took a URL from Microsoft's MediaFoundation audio playback sample to instead take a source from a const char* array. Problem is, CreateObjectFromByteStream requires an IMFByteStream, not a const char*. How can I get what I need?
// Create a media source from a byte stream
HRESULT CreateMediaSource(const byte *data, IMFMediaSource **ppSource)
{
    MF_OBJECT_TYPE ObjectType = MF_OBJECT_INVALID;
    IMFSourceResolver* pSourceResolver = NULL;
    IUnknown* pSource = NULL;

    // Create the source resolver.
    HRESULT hr = MFCreateSourceResolver(&pSourceResolver);
    if (FAILED(hr))
        goto done;

    // Use the source resolver to create the media source.
    // Note: For simplicity this sample uses the synchronous method to create
    // the media source. However, creating a media source can take a noticeable
    // amount of time, especially for a network source. For a more responsive
    // UI, use the asynchronous BeginCreateObjectFromURL method.
    hr = pSourceResolver->CreateObjectFromByteStream(data, // Problem: needs an IMFByteStream*, not a byte array.
        NULL,                      // URL of the source.
        MF_RESOLUTION_MEDIASOURCE, // Create a source object.
        NULL,                      // Optional property store.
        &ObjectType,               // Receives the created object type.
        &pSource);                 // Receives a pointer to the media source.
    if (FAILED(hr))
        goto done;

    // Get the IMFMediaSource interface from the media source.
    hr = pSource->QueryInterface(IID_PPV_ARGS(ppSource));

done:
    if (pSourceResolver) pSourceResolver->Release();
    if (pSource) pSource->Release();
    return hr;
}
I found that the easiest way to do this is to just create a temp file and write *data there. It is an ugly hack, but it worked well enough for me, and if needed it can easily be replaced by a custom in-memory IMFByteStream implementation.
So the code would be something like this:
BYTE data[] = {'a','b','c','d','e','f'};
HRESULT hr = MFStartup(MF_VERSION);
IMFByteStream *stream = NULL;
hr = MFCreateTempFile(
    MF_ACCESSMODE_READWRITE,
    MF_OPENMODE_DELETE_IF_EXIST,
    MF_FILEFLAGS_NONE,
    &stream);
ULONG wroteBytes = 0;
stream->Write(data, sizeof(data), &wroteBytes);
// make sure that wroteBytes is equal to the data length
You can use MFCreateMFByteStreamOnStream() to create an IMFByteStream from an IStream, and you can create an IStream from a byte array using SHCreateMemStream().
Here's a quick example:
// Generate a byte array
int sample_size = 0x100;
BYTE* sample_bytes = (BYTE*)malloc(sample_size);

// Create the IStream from the byte array
IStream* pstm = SHCreateMemStream(sample_bytes, sample_size);

// Create the IMFByteStream from the IStream
IMFByteStream* pibs = NULL;
MFCreateMFByteStreamOnStream(pstm, &pibs);

// Clean up time
if (pibs)
    pibs->Release();
if (pstm)
    pstm->Release();
if (sample_bytes)
    free(sample_bytes);
Having an IStream interface but not a byte array interface seems to be a frequent occurrence in the Microsoft API. Thankfully creating an IStream is easy.

Media Foundation Audio/Video capturing to MPEG4FileSink produces incorrect duration

I am working on a media streaming application using the Media Foundation framework. I've used some samples from the internet and from Anton Polinger's book. Unfortunately, after saving the streams into an MP4 file, the file's metadata is corrupted: it has an incorrect duration (it matches the uptime of my PC, 30 hours for instance) and the wrong bitrate. After a long struggle I fixed it for a single stream (video or audio), but when I try to record both audio and video the problem returns. Something is wrong with my topology, but I can't work out what. Perhaps there are some experts here?
I get the audio and video sources, wrap them in an IMFCollection, and create an aggregate source with MFCreateAggregateSource.
I create a source node for each source in the aggregate source:
Com::IMFTopologyNodePtr pNode;
// Create the topology node, indicating that it must be a source node.
hr = MFCreateTopologyNode(MF_TOPOLOGY_SOURCESTREAM_NODE, &pNode);
THROW_ON_FAIL(hr, "Unable to create topology node for source");
// Associate the node with the source by passing in a pointer to the media source,
// and indicating that it is the source
hr = pNode->SetUnknown(MF_TOPONODE_SOURCE, _sourceDefinition->GetMediaSource());
THROW_ON_FAIL(hr, "Unable to set source as object for topology node");
// Set the node presentation descriptor attribute of the node by passing
// in a pointer to the presentation descriptor
hr = pNode->SetUnknown(MF_TOPONODE_PRESENTATION_DESCRIPTOR, _sourceDefinition->GetPresentationDescriptor());
// Set the node stream descriptor attribute by passing in a pointer to the stream
// descriptor
hr = pNode->SetUnknown(MF_TOPONODE_STREAM_DESCRIPTOR, streamDescriptor);
return pNode;
After that I connect each source to a transform (H264 encoder and AAC encoder) and to the MPEG4 file sink:
void TopologyBuilder::CreateFileSinkOutputNode(PCWSTR filePath)
{
    HRESULT hr = S_OK;
    DWORD sink_count;
    Com::IMFByteStreamPtr byte_stream;
    Com::IMFTransformPtr transform;
    LPCWSTR lpcwstrFilePath = filePath;

    hr = MFCreateFile(
        MF_ACCESSMODE_READWRITE, MF_OPENMODE_DELETE_IF_EXIST, MF_FILEFLAGS_NONE,
        lpcwstrFilePath, &byte_stream);
    THROW_ON_FAIL(hr, L"Unable to create and open file");

    // Video stream
    Com::IMFMediaTypePtr in_mf_video_media_type = _sourceDefinition->GetCurrentVideoMediaType();
    Com::IMFMediaTypePtr out_mf_media_type = CreateMediaType(MFMediaType_Video, MFVideoFormat_H264);
    hr = CopyType(in_mf_video_media_type, out_mf_media_type);
    THROW_ON_FAIL(hr, L"Unable to copy type parameters");

    if (GetSubtype(in_mf_video_media_type) != MEDIASUBTYPE_H264)
    {
        transform.Attach(CreateAndInitCoderMft(MFT_CATEGORY_VIDEO_ENCODER, out_mf_media_type));
    }

    if (transform)
    {
        Com::IMFMediaTypePtr transformMediaType;
        hr = transform->GetOutputCurrentType(0, &transformMediaType);
        THROW_ON_FAIL(hr, L"Unable to get current output type");

        UINT32 pcbBlobSize = 0;
        hr = transformMediaType->GetBlobSize(MF_MT_MPEG_SEQUENCE_HEADER, &pcbBlobSize);
        THROW_ON_FAIL(hr, L"Unable to get blob size of MF_MT_MPEG_SEQUENCE_HEADER");

        std::vector<UINT8> blob(pcbBlobSize);
        hr = transformMediaType->GetBlob(MF_MT_MPEG_SEQUENCE_HEADER, &blob.front(), blob.size(), NULL);
        hr = out_mf_media_type->SetBlob(MF_MT_MPEG_SEQUENCE_HEADER, &blob.front(), blob.size());
        THROW_ON_FAIL(hr, L"Unable to set blob of MF_MT_MPEG_SEQUENCE_HEADER");
    }

    // Audio stream
    Com::IMFMediaTypePtr out_mf_audio_media_type;
    Com::IMFTransformPtr transformAudio;
    Com::IMFMediaTypePtr mediaTypeTmp = _sourceDefinition->GetCurrentAudioMediaType();
    Com::IMFMediaTypePtr in_mf_audio_media_type;
    if (mediaTypeTmp != NULL)
    {
        std::unique_ptr<MediaTypesFactory> factory(new MediaTypesFactory());
        if (!IsMediaTypeSupportedByAacEncoder(mediaTypeTmp))
        {
            UINT32 channels;
            hr = mediaTypeTmp->GetUINT32(MF_MT_AUDIO_NUM_CHANNELS, &channels);
            THROW_ON_FAIL(hr, L"Unable to get MF_MT_AUDIO_NUM_CHANNELS from source media type");
            in_mf_audio_media_type = factory->CreatePCM(factory->DEFAULT_SAMPLE_RATE, channels);
        }
        else
        {
            in_mf_audio_media_type = mediaTypeTmp;
        }

        out_mf_audio_media_type = factory->CreateAAC(in_mf_audio_media_type, factory->HIGH_ENCODED_BITRATE);
        GUID subType = GetSubtype(in_mf_audio_media_type);
        if (GetSubtype(in_mf_audio_media_type) != MFAudioFormat_AAC)
        {
            // add encoder to Aac
            transformAudio.Attach(CreateAndInitCoderMft(MFT_CATEGORY_AUDIO_ENCODER, out_mf_audio_media_type));
        }
    }

    Com::IMFMediaSinkPtr pFileSink;
    hr = MFCreateMPEG4MediaSink(byte_stream, out_mf_media_type, out_mf_audio_media_type, &pFileSink);
    THROW_ON_FAIL(hr, L"Unable to create mpeg4 media sink");

    Com::IMFTopologyNodePtr pOutputNodeVideo;
    hr = MFCreateTopologyNode(MF_TOPOLOGY_OUTPUT_NODE, &pOutputNodeVideo);
    THROW_ON_FAIL(hr, L"Unable to create output node");

    hr = pFileSink->GetStreamSinkCount(&sink_count);
    THROW_ON_FAIL(hr, L"Unable to get stream sink count from mediasink");
    if (sink_count == 0)
    {
        THROW_ON_FAIL(E_UNEXPECTED, L"Sink count should be greater than 0");
    }

    Com::IMFStreamSinkPtr stream_sink_video;
    hr = pFileSink->GetStreamSinkByIndex(0, &stream_sink_video);
    THROW_ON_FAIL(hr, L"Unable to get stream sink by index");
    hr = pOutputNodeVideo->SetObject(stream_sink_video);
    THROW_ON_FAIL(hr, L"Unable to set stream sink as output node object");
    hr = _pTopology->AddNode(pOutputNodeVideo);
    THROW_ON_FAIL(hr, L"Unable to add file sink output node");
    pOutputNodeVideo = AddEncoderIfNeed(_pTopology, transform, in_mf_video_media_type, pOutputNodeVideo);

    Com::IMFTopologyNodePtr pOutputNodeAudio;
    if (in_mf_audio_media_type != NULL)
    {
        hr = MFCreateTopologyNode(MF_TOPOLOGY_OUTPUT_NODE, &pOutputNodeAudio);
        THROW_ON_FAIL(hr, L"Unable to create output node");

        Com::IMFStreamSinkPtr stream_sink_audio;
        hr = pFileSink->GetStreamSinkByIndex(1, &stream_sink_audio);
        THROW_ON_FAIL(hr, L"Unable to get stream sink by index");
        hr = pOutputNodeAudio->SetObject(stream_sink_audio);
        THROW_ON_FAIL(hr, L"Unable to set stream sink as output node object");
        hr = _pTopology->AddNode(pOutputNodeAudio);
        THROW_ON_FAIL(hr, L"Unable to add file sink output node");

        if (transformAudio)
        {
            Com::IMFTopologyNodePtr outputTransformNodeAudio;
            AddTransformNode(_pTopology, transformAudio, pOutputNodeAudio, &outputTransformNodeAudio);
            _outAudioNode = outputTransformNodeAudio;
        }
        else
        {
            _outAudioNode = pOutputNodeAudio;
        }
    }
}
When the output type is applied to the audio transform, it has 15 attributes instead of 8, including MF_MT_AVG_BITRATE, which as I understand it should apply to video. In my case it is 192000, which differs from MF_MT_AVG_BITRATE on the video stream.
My AAC media type is created by this method:
HRESULT MediaTypesFactory::CopyAudioTypeBasicAttributes(IMFMediaType * in_media_type, IMFMediaType * out_mf_media_type) {
    static const GUID AUDIO_MAJORTYPE = MFMediaType_Audio;
    static const GUID AUDIO_SUBTYPE = MFAudioFormat_PCM;

    HRESULT hr = S_OK;
    WAVEFORMATEX *in_wfx = NULL;
    UINT32 wfx_size;
    hr = MFCreateWaveFormatExFromMFMediaType(in_media_type, &in_wfx, &wfx_size);
    if (FAILED(hr))
        return hr;

    hr = out_mf_media_type->SetUINT32(MF_MT_AUDIO_SAMPLES_PER_SECOND, in_wfx->nSamplesPerSec);
    hr = out_mf_media_type->SetUINT32(MF_MT_AUDIO_NUM_CHANNELS, in_wfx->nChannels);
    hr = out_mf_media_type->SetUINT32(MF_MT_AUDIO_AVG_BYTES_PER_SECOND, in_wfx->nAvgBytesPerSec);
    hr = out_mf_media_type->SetUINT32(MF_MT_AUDIO_BLOCK_ALIGNMENT, in_wfx->nBlockAlign);

    CoTaskMemFree(in_wfx); // The WAVEFORMATEX is allocated by Media Foundation.
    return hr;
}
It would be awesome if somebody could help me or explain where I am wrong.
In my project CaptureManager I faced a similar problem while writing code to record live video from many webcams into one file. After researching Media Foundation for a long time, I found two important facts:
1. Live sources (webcams and microphones) do not start from 0. According to the specification, samples from them should start from a zero time stamp (Live Sources: "The first sample should have a time stamp of zero."), but live sources set the current system time instead.
2. I see from your code that you use a Media Session, an object with the IMFMediaSession interface. I assume you create it with the MFCreateMediaSession function. This function creates the default version of the session, which is optimized for playing media from a file, whose samples start from 0 by default.
In my view, the main problem is that the default Media Session does not check the time stamps of media samples from the source, because samples from a media file start from zero or from the StartPosition. However, live sources do not start from 0; they should, but they do not.
So, my advice: write a class implementing IMFTransform that acts as a "proxy" transform between the source and the encoder. This proxy transform must fix the time stamps of media samples from the live source: when it receives the first media sample, it saves that sample's actual time stamp as a reference time and sets the sample's time stamp to zero; for every subsequent sample from this live source, it subtracts the reference time from the time stamp and sets the result as the sample's time stamp.
Also, check the code that calls IMFFinalizableMediaSink.
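The time stamp arithmetic of the proxy transform described above can be sketched on its own, separate from the IMFTransform plumbing. This is a minimal sketch, not the CaptureManager implementation: the class name TimestampRebaser and its methods are hypothetical, and times are assumed to be in 100-nanosecond units as Media Foundation sample times are.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: rebases live-source time stamps so that the first
// sample seen gets time 0. Times are in 100-nanosecond units.
class TimestampRebaser {
public:
    // Returns the rebased time for a sample whose original time is sampleTime.
    int64_t Rebase(int64_t sampleTime) {
        if (!hasReference_) {
            reference_ = sampleTime;   // remember the first time stamp seen
            hasReference_ = true;
        }
        return sampleTime - reference_;
    }

    // Forgets the reference, for example when the capture is restarted.
    void Reset() { hasReference_ = false; }

private:
    bool hasReference_ = false;
    int64_t reference_ = 0;
};
```

Inside a real proxy MFT, one would read the time with IMFSample::GetSampleTime, pass it through Rebase, and write it back with IMFSample::SetSampleTime. Sharing one rebaser between the audio and video streams keeps their relative offset; giving each stream its own rebaser discards it.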
MP4 metadata might be initialized incorrectly under some conditions; however, in the scenario you described the problem is likely to be the payload data and not the way you set up the pipeline in the first place.
Decoders and converters typically pass sample time stamps through, copying them from input to output, so they do not indicate a failure if something is wrong; you still get output that makes sense written into the file. The sink might have trouble processing your data if you have sample time issues, very long recordings, or an overflow bug, especially with rates expressed with large numerators/denominators. What matters is which sample times the sources produce.
You might want to try shorter recordings, as well as video-only and audio-only recordings, which may help identify the stream that supplies the data leading to the problem.
Additionally, you might want to inspect the atoms/boxes of the resulting MP4 file to identify whether the header boxes contain incorrect data or the data itself is stamped incorrectly, on which track, and how exactly (for example, it starts okay and then shows weird gaps in the middle).
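As a starting point for inspecting the boxes, here is a minimal sketch that lists the top-level boxes of an MP4 file already loaded into memory. The names Mp4Box and ListTopLevelBoxes are hypothetical; the only thing assumed about the format is the standard box header (a 32-bit big-endian size followed by a four-character type, with size 1 meaning a 64-bit size follows and size 0 meaning the box runs to the end of the file).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Mp4Box {
    std::string type;   // four-character code, e.g. "ftyp", "moov", "mdat"
    uint64_t offset;    // byte offset of the box header
    uint64_t size;      // total box size in bytes, header included
};

// Reads an n-byte big-endian unsigned integer starting at pos.
static uint64_t ReadBigEndian(const std::vector<uint8_t>& d, size_t pos, size_t n) {
    uint64_t v = 0;
    for (size_t i = 0; i < n; ++i) v = (v << 8) | d[pos + i];
    return v;
}

// Lists the top-level boxes; stops at the first malformed header.
std::vector<Mp4Box> ListTopLevelBoxes(const std::vector<uint8_t>& data) {
    std::vector<Mp4Box> boxes;
    size_t pos = 0;
    while (data.size() - pos >= 8) {
        uint64_t size = ReadBigEndian(data, pos, 4);
        std::string type(reinterpret_cast<const char*>(&data[pos + 4]), 4);
        uint64_t header = 8;
        if (size == 1) {                       // 64-bit size follows the type
            if (data.size() - pos < 16) break;
            size = ReadBigEndian(data, pos + 8, 8);
            header = 16;
        } else if (size == 0) {                // box runs to the end of the file
            size = data.size() - pos;
        }
        if (size < header || size > data.size() - pos) break;
        boxes.push_back({type, pos, size});
        pos += static_cast<size_t>(size);
    }
    return boxes;
}
```

The duration fields themselves live deeper, in the mvhd box inside moov and in the per-track tkhd and mdhd boxes, so a real inspection would recurse into the container boxes with the same header logic.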

Media Foundation set video interlacing and decode

I have an MOV file and I want to decode it and have all frames as separate images.
So I try to configure an uncompressed media type in the following way:
// configure the source reader
IMFSourceReader* m_pReader;
MFCreateSourceReaderFromURL(filePath, NULL, &m_pReader);
// get the compressed media type
IMFMediaType* pFileVideoMediaType;
m_pReader->GetCurrentMediaType(MF_SOURCE_READER_FIRST_VIDEO_STREAM, &pFileVideoMediaType);
// create new media type for uncompressed type
IMFMediaType* pTypeUncomp;
MFCreateMediaType(&pTypeUncomp);
// copy all settings from compressed to uncompressed type
pFileVideoMediaType->CopyAllItems(pTypeUncomp);
// set the uncompressed video attributes
pTypeUncomp->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB8);
pTypeUncomp->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
// set the new uncompressed type to source reader
m_pReader->SetCurrentMediaType(MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0, pTypeUncomp);
// get the full uncompressed media type
m_pReader->GetCurrentMediaType(MF_SOURCE_READER_FIRST_VIDEO_STREAM, &pTypeUncomp);
I noticed that even though I explicitly set MF_MT_INTERLACE_MODE to MFVideoInterlace_Progressive, the final configuration still has the old mode, MFVideoInterlace_MixedInterlaceOrProgressive.
Afterwards, I loop through all samples and look at their size:
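IMFSample* videoSample = nullptr;
IMFMediaBuffer* mbuffer = nullptr;
LONGLONG llTimeStamp;
DWORD streamIndex, flags;
while (SUCCEEDED(m_pReader->ReadSample(
    MF_SOURCE_READER_FIRST_VIDEO_STREAM, // Stream index.
    0,             // Flags.
    &streamIndex,  // Receives the actual stream index.
    &flags,        // Receives status flags.
    &llTimeStamp,  // Receives the time stamp.
    &videoSample)))  // Receives the sample or NULL.
{
    if (flags & MF_SOURCE_READERF_ENDOFSTREAM)
        break;
    if (!videoSample)
        continue;
    videoSample->ConvertToContiguousBuffer(&mbuffer);
    BYTE* videoData = nullptr;
    DWORD sampleBufferLength = 0;
    mbuffer->Lock(&videoData, nullptr, &sampleBufferLength);
    cout << sampleBufferLength << endl;
    mbuffer->Unlock();
    mbuffer->Release();
    videoSample->Release();
}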
And I get quite different sizes for the samples: from 31 bytes to 18000 bytes.
Even changing the format to MFVideoFormat_RGB32 does not affect the sample sizes.
This question seems to describe the same issue, but its solution does not fix mine.
Any help on why I can't change the interlacing, and how to properly decode video frames and get image data out of the samples?
Many thanks in advance.
In order to make SourceReader convert the samples to RGB you need to create it like this:
IMFAttributes* pAttr = NULL;
MFCreateAttributes(&pAttr, 1);
pAttr->SetUINT32(MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING, TRUE);
IMFSourceReader* m_pReader;
throwIfFailed(MFCreateSourceReaderFromURL(filePath, pAttr, &m_pReader), "Can't create source reader from url");
Later, you shouldn't break out of the loop when MF_SOURCE_READERF_CURRENTMEDIATYPECHANGED occurs. With that, all samples will have the same size.
Alternatively, you can use the MFVideoFormat_NV12 subtype, and then you won't need to specify the MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING attribute when creating the reader.
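A quick way to tell whether the reader is really delivering decoded frames is to compare each sample length with the size an uncompressed frame must have. Sample sizes that swing between 31 and 18000 bytes suggest the data is still compressed, since decoded frames of one format are all roughly the same size. The sketch below is only an illustration: the helper names are hypothetical, the frame is assumed to be tightly packed, and the 640x480 figures in the note are made-up dimensions, not taken from the question.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bytes in one tightly packed uncompressed frame.
// bitsPerPixel: 32 for RGB32, 24 for RGB24, 12 for NV12 (4:2:0).
uint64_t ExpectedFrameBytes(uint32_t width, uint32_t height, uint32_t bitsPerPixel) {
    return static_cast<uint64_t>(width) * height * bitsPerPixel / 8;
}

// True if a sample is at least as large as one decoded frame. Real buffers may
// carry stride padding, so the check is "at least", not "exactly".
bool LooksUncompressed(uint64_t sampleBytes, uint32_t width, uint32_t height,
                       uint32_t bitsPerPixel) {
    return sampleBytes >= ExpectedFrameBytes(width, height, bitsPerPixel);
}
```

For an assumed 640x480 RGB32 stream the expected size is 1228800 bytes, so an 18000-byte sample could not be a decoded frame. In a Media Foundation program, MFCalculateImageSize can compute the same figure from the subtype GUID and the frame dimensions.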

Add Enhanced Video Renderer Stream

I have a DirectShow program that uses the EVR. I would like to add another video stream that basically inserts a picture-in-picture box over the main video stream, but I can't quite figure out how to do it:
// When this is called, the graph is already running with the EVR
// displaying a web cam in stream 0
HRESULT CVideoControl::AddVideoStream(wchar_t* file)
{
    HRESULT hr = S_OK;
    CComPtr<IMFMediaSink> sink;
    CComPtr<IMFStreamSink> stream;
    //hr = pEVR->QueryInterface(__uuidof(IMFMediaSink), (void **) &sink); <- FAILS
    hr = MFCreateVideoRenderer(__uuidof(IMFMediaSink), (void **) &sink);
    hr = sink->AddStreamSink(1234, NULL, &stream);

    CComPtr<IMFGetService> service;
    hr = pEVR->QueryInterface(&service);
    CComPtr<IMFVideoMixerControl> mixer;
    hr = service->GetService(MR_VIDEO_MIXER_SERVICE, IID_PPV_ARGS(&mixer));

    MFVideoNormalizedRect rect = { .25, .25, .5, .5 };
    hr = mixer->SetStreamOutputRect(1234, &rect);
    hr = m_pGraph->RenderFile(file, NULL);
    return hr;
}
Everything returns S_OK except SetStreamOutputRect, which returns "The stream number provided was invalid."
I'm also dubious about the MFCreateVideoRenderer call, as this is a DirectShow program, not Media Foundation.
I'm pretty sure I am way oversimplifying this, but I can't find much documentation on it. Any suggestions?
In a DirectShow program you need to create the EVR with CoCreateInstance and then use its IEVRFilterConfig interface, as the documentation explains:
The EVR filter starts with one input pin, which corresponds to the reference stream. To add pins for substreams, query the filter for the IEVRFilterConfig interface and call IEVRFilterConfig::SetNumberOfStreams. Call this method before connecting any input pins. Pin 0 is always the reference stream. Connect this pin before any other pins, because the format of the reference stream might limit which substream formats are available.
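Once the substream pin exists, the rectangle passed to SetStreamOutputRect is expressed in normalized coordinates. MFVideoNormalizedRect stores left, top, right and bottom (not width and height), so the { .25, .25, .5, .5 } in the question covers the region from 25% to 50% of the destination in each direction. The sketch below shows the conversion from a pixel rectangle; NormalizedRect and ToNormalized are hypothetical stand-ins so the arithmetic can be read without the Windows headers.

```cpp
#include <cassert>

// Mirrors the field order of MFVideoNormalizedRect: left, top, right, bottom,
// each in the range 0.0 to 1.0 relative to the destination area.
struct NormalizedRect {
    float left, top, right, bottom;
};

// Hypothetical helper: converts a pixel rectangle (x, y, w, h) inside a
// destination of destWidth x destHeight pixels to normalized coordinates.
NormalizedRect ToNormalized(int x, int y, int w, int h, int destWidth, int destHeight) {
    assert(destWidth > 0 && destHeight > 0);
    NormalizedRect r;
    r.left   = static_cast<float>(x) / destWidth;
    r.top    = static_cast<float>(y) / destHeight;
    r.right  = static_cast<float>(x + w) / destWidth;
    r.bottom = static_cast<float>(y + h) / destHeight;
    return r;
}
```

For example, a 480x270 box placed at (480, 270) in a 1920x1080 destination maps to left 0.25, top 0.25, right 0.5, bottom 0.5, which is the rectangle used in the question.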

IWICBitmapDecoder::Initialize() failing

I have a byte stream, pBitmap, and I need to create a decoder from it, so I tried the following:
IWICStream *piStream = NULL;
IWICBitmapDecoder *piDecoder = NULL;
//piFactory is my IWICImagingFactory
hr = piFactory->CreateStream(&piStream);
//lRawSize is bufferSize
//pBitmap is my byte buffer
hr = piStream ->InitializeFromMemory(pBitmap, lRawSize);
hr = piFactory->CreateDecoder(GUID_ContainerFormatJpeg,NULL,&piDecoder);
//HERE i got the error.
hr = piDecoder->Initialize(piStream, WICDecodeMetadataCacheOnDemand);
hr returns "component not found".
What could be the problem here?
I was not sure whether the bitmap source I intend to decode is a JPEG or not, so I understand that passing GUID_ContainerFormatJpeg as the container format may not be right.
So I tried IWICImagingFactory::CreateDecoderFromStream:
hr = piFactory->CreateDecoderFromStream(piStream, NULL, WICDecodeMetadataCacheOnDemand, &piDecoder);
But the result was the same.
Then I initialized the stream from a file, which worked fine:
hr = piStream ->InitializeFromFilename(L"C..\\test.jpg",GENERIC_READ);
So the problem should be in initializing the stream.
I also created an encoder, did some processing, and saved the result to a file using WritePixels (without creating a decoder):
hr = piBitmapFrame->WritePixels(
and it saves a fine image, so I can say that pBitmap surely contains image data.
What could be the problem here?
The cause of the error is the use of pointers to different objects. piStreamTemp was initialized from the bitmap array, but piDecoder was initialized using piStream, which is empty and was never properly initialized.
In addition, there is a recommendation to avoid the InitializeFromMemory method, and a workaround for it has been described.