Trying to use an MFT in Media Foundation Encoding - C++

The idea is to use a Media Foundation Transform, such as the Video Stabilization MFT, while transcoding a video with Media Foundation.
When not using an MFT, the code works fine.
Create IMFSourceReader for the source file - OK
Create IMFSinkWriter for the target file - OK
Add a stream to the writer describing the Video - OK
Add the audio stream - OK
Set input types for video and audio - OK
Loop to read samples and send them to the sink writer - OK (a sketch of this loop is shown below).
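A minimal sketch of that read/write loop, assuming error checking is removed, Writer->BeginWriting() has already been called, and the reader stream indexes map one-to-one to the writer stream indexes (Reader and Writer are placeholder names):
for (;;)
{
    DWORD streamIndex = 0, flags = 0;
    LONGLONG timestamp = 0;
    CComPtr<IMFSample> pSample;
    // Pull the next sample from whichever stream has data
    hr = Reader->ReadSample(MF_SOURCE_READER_ANY_STREAM, 0, &streamIndex, &flags, &timestamp, &pSample);
    if (FAILED(hr) || (flags & MF_SOURCE_READERF_ENDOFSTREAM))
        break;   // stop at the first end-of-stream for brevity
    if (pSample)
        hr = Writer->WriteSample(streamIndex, pSample); // assumes reader index == writer index
}
Writer->Finalize();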
When using the MFT, these are the facts. To create the MFT (error checking removed):
CComPtr<IMFTransform> trs;
trs.CoCreateInstance(CLSID_CMSVideoDSPMFT);
std::vector<DWORD> iids;
std::vector<DWORD> oods;
DWORD is = 0, os = 0;
hr = trs->GetStreamCount(&is, &os);
iids.resize(is);
oods.resize(os);
hr = trs->GetStreamIDs(is, iids.data(), os, oods.data());
CComPtr<IMFMediaType> ptype;
CComPtr<IMFMediaType> ptype2;
MFCreateMediaType(&ptype);
MFCreateMediaType(&ptype2);
SourceVideoType->CopyAllItems(ptype);
SourceVideoType->CopyAllItems(ptype2);
ptype->SetUINT32(MF_VIDEODSP_MODE, MFVideoDSPMode_Stabilization);
// LogMediaType(ptype);
ptype2->SetUINT32(MF_VIDEODSP_MODE, MFVideoDSPMode_Stabilization);
// LogMediaType(ptype2);
hr = trs->SetInputType(iids[0], ptype, 0);
auto hr2 = trs->SetOutputType(oods[0], ptype2, 0);
if (SUCCEEDED(hr) && SUCCEEDED(hr2))
{
    VideoStabilizationMFT = trs;
}
This code works - the MFT is successfully configured. However, in my sample processing loop:
// pSample = sample obtained from the reader
CComPtr<IMFSample> pSample2;
LONGLONG dur = 0, tim = 0;
pSample->GetSampleDuration(&dur);
pSample->GetSampleTime(&tim);
trs->ProcessInput(0, pSample, 0);
MFT_OUTPUT_STREAM_INFO si = {};
trs->GetOutputStreamInfo(0, &si);
// Create pSample2 with a buffer large enough for the output stream
MFCreateSample(&pSample2);
CComPtr<IMFMediaBuffer> bb;
MFCreateMemoryBuffer(si.cbSize, &bb);
pSample2->AddBuffer(bb);
// Output data buffer wrapping pSample2
MFT_OUTPUT_DATA_BUFFER db = {};
db.dwStreamID = oods[0];
db.pSample = pSample2;
DWORD st = 0;
hr = trs->ProcessOutput(0, 1, &db, &st);
This last call initially fails with MF_E_TRANSFORM_NEED_MORE_INPUT. I can understand that the MFT needs more than one sample to achieve stabilization, so I skip this sample for the writer.
When the call succeeds, I get a sample with no time or duration. Even if I set the time and duration manually, the sink writer fails with E_INVALIDARG.
What am I missing?
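For reference, a sketch of how the output side of this loop could look, re-applying the saved tim/dur to whatever sample comes out before handing it to the sink writer. Whether the MFT allocates its own output samples depends on MFT_OUTPUT_STREAM_PROVIDES_SAMPLES; Writer and videoStreamIndex stand for the sink writer and its video stream index:
MFT_OUTPUT_STREAM_INFO si = {};
trs->GetOutputStreamInfo(0, &si);
MFT_OUTPUT_DATA_BUFFER db = {};
db.dwStreamID = oods[0];
if (!(si.dwFlags & MFT_OUTPUT_STREAM_PROVIDES_SAMPLES))
    db.pSample = pSample2;                 // only supply a sample if the MFT does not provide its own
DWORD st = 0;
hr = trs->ProcessOutput(0, 1, &db, &st);
if (hr == MF_E_TRANSFORM_NEED_MORE_INPUT)
    continue;                              // feed more input on the next loop iteration
if (SUCCEEDED(hr) && db.pSample)
{
    db.pSample->SetSampleTime(tim);        // re-stamp with the input's time
    db.pSample->SetSampleDuration(dur);    // and duration
    hr = Writer->WriteSample(videoStreamIndex, db.pSample);
    if (db.pSample != pSample2)
        db.pSample->Release();             // release MFT-provided samples
}
if (db.pEvents)
    db.pEvents->Release();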

With the source code I provide here, the sink writer returns S_OK:
VideoStabilizationMFT
If Microsoft is reading this, what are these GUIDs from CLSID_CMSVideoDSPMFT?
Guid : 44A4AB4B-1D0C-4181-9293-E2F37680672E : VT_UI4 = 4
Guid : 8252D735-8CB3-4A2E-A296-894E7B738059 : VT_R8 = 0.869565
Guid : 9B2DEAFE-37EC-468C-90FF-024E22BD6BC6 : VT_UI4 = 0
Guid : B0052692-FC62-4F21-A1DD-B9DFE1CEB9BF : VT_R8 = 0.050000
Guid : C8DA7888-14AA-43AE-BDF2-BF9CC48E12BE : VT_UI4 = 4
Guid : EF77D08F-7C9C-40F3-9127-96F760903367 : VT_UI4 = 0
Guid : F67575DF-EA5C-46DB-80C4-CEB7EF3A1701 : VT_UI4 = 1
Microsoft, are you serious?
According to this documentation: https://learn.microsoft.com/en-us/windows/win32/medfound/video-stabilization-mft
On Windows 10:
The Video Stabilization MFT is MF_SA_D3D11_AWARE; the documentation does not mention this (a query sketch is shown after this list).
The Video Stabilization MFT can fall back to software mode; the documentation does not mention this either (see MF_SA_D3D11_AWARE).
The Video Stabilization MFT supports dynamic format change, yet MFT_SUPPORT_DYNAMIC_FORMAT_CHANGE is not present on IMFTransform::GetAttributes.
The Video Stabilization MFT implements IMFGetService/IMFRealTimeClientEx/IMFShutdown; this is not in the documentation.
The Video Stabilization MFT only handles MFVideoFormat_NV12, while the documentation speaks about MEDIASUBTYPE_YUY2.
The documentation tells you to include Camerauicontrol.h, seriously...
Having said that, this MFT is really good at doing stabilization...
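For example, the D3D11 awareness and the other attributes above can be read straight off the MFT's attribute store (a quick sketch using the trs pointer from the question):
CComPtr<IMFAttributes> mftAttrs;
if (SUCCEEDED(trs->GetAttributes(&mftAttrs)) && mftAttrs)
{
    UINT32 d3dAware = 0;
    mftAttrs->GetUINT32(MF_SA_D3D11_AWARE, &d3dAware);      // reported as 1 on Windows 10
    UINT32 dynFormat = 0;
    mftAttrs->GetUINT32(MFT_SUPPORT_DYNAMIC_FORMAT_CHANGE, &dynFormat); // reported as absent here
}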

This is strange: you set the attribute on the IMFMediaType, not on the Video Stabilization MFT itself:
ptype->SetUINT32(MF_VIDEODSP_MODE, MFVideoDSPMode_Stabilization);
ptype2->SetUINT32(MF_VIDEODSP_MODE, MFVideoDSPMode_Stabilization);
It should be:
Call IMFTransform::GetAttributes on the Video Stabilization MFT to get an IMFAttributes pointer.
Call IMFAttributes::SetUINT32 to set the MF_VIDEODSP_MODE attribute.
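In code, that would look roughly like this (a sketch using the trs pointer from the question):
CComPtr<IMFAttributes> attrs;
hr = trs->GetAttributes(&attrs);
if (SUCCEEDED(hr) && attrs)
{
    // Enable stabilization on the MFT itself, not on the media types
    hr = attrs->SetUINT32(MF_VIDEODSP_MODE, MFVideoDSPMode_Stabilization);
}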

Related

Media Foundation Audio/Video capturing to MPEG4FileSink produces incorrect duration

I am working on a media streaming application using the Media Foundation framework. I've used some samples from the internet and from Anton Polinger's book. Unfortunately, after saving the streams into an mp4 file, the file's metadata is corrupted: it has an incorrect duration (matching how long my PC has been running, 30 hours for instance) and a wrong bitrate. After a long struggle I fixed it for a single stream (video or audio), but when I try to record both audio and video the problem comes back. Something is wrong with my topology, but I can't understand what; perhaps there are some experts here?
I get the audio and video sources, wrap them into an IMFCollection, and create an aggregate source with MFCreateAggregateSource.
I create source nodes for each source in the aggregate source:
Com::IMFTopologyNodePtr TopologyBuilder::CreateSourceNode(Com::IMFStreamDescriptorPtr streamDescriptor)
{
    HRESULT hr = S_OK;
    Com::IMFTopologyNodePtr pNode;
    // Create the topology node, indicating that it must be a source node.
    hr = MFCreateTopologyNode(MF_TOPOLOGY_SOURCESTREAM_NODE, &pNode);
    THROW_ON_FAIL(hr, "Unable to create topology node for source");
    // Associate the node with the source by passing in a pointer to the media source,
    // and indicating that it is the source
    hr = pNode->SetUnknown(MF_TOPONODE_SOURCE, _sourceDefinition->GetMediaSource());
    THROW_ON_FAIL(hr, "Unable to set source as object for topology node");
    // Set the node presentation descriptor attribute of the node by passing
    // in a pointer to the presentation descriptor
    hr = pNode->SetUnknown(MF_TOPONODE_PRESENTATION_DESCRIPTOR, _sourceDefinition->GetPresentationDescriptor());
    THROW_ON_FAIL(hr, "Unable to set MF_TOPONODE_PRESENTATION_DESCRIPTOR to node");
    // Set the node stream descriptor attribute by passing in a pointer to the stream
    // descriptor
    hr = pNode->SetUnknown(MF_TOPONODE_STREAM_DESCRIPTOR, streamDescriptor);
    THROW_ON_FAIL(hr, "Unable to set MF_TOPONODE_STREAM_DESCRIPTOR to node");
    return pNode;
}
After that I connect each source to a transform (H264 encoder and AAC encoder) and to the MPEG4FileSink:
void TopologyBuilder::CreateFileSinkOutputNode(PCWSTR filePath)
{
    HRESULT hr = S_OK;
    DWORD sink_count;
    Com::IMFByteStreamPtr byte_stream;
    Com::IMFTransformPtr transform;
    LPCWSTR lpcwstrFilePath = filePath;
    hr = MFCreateFile(
        MF_ACCESSMODE_WRITE, MF_OPENMODE_FAIL_IF_NOT_EXIST, MF_FILEFLAGS_NONE,
        lpcwstrFilePath, &byte_stream);
    THROW_ON_FAIL(hr, L"Unable to create and open file");
    // Video stream
    Com::IMFMediaTypePtr in_mf_video_media_type = _sourceDefinition->GetCurrentVideoMediaType();
    Com::IMFMediaTypePtr out_mf_media_type = CreateMediaType(MFMediaType_Video, MFVideoFormat_H264);
    hr = CopyType(in_mf_video_media_type, out_mf_media_type);
    THROW_ON_FAIL(hr, L"Unable to copy type parameters");
    if (GetSubtype(in_mf_video_media_type) != MEDIASUBTYPE_H264)
    {
        transform.Attach(CreateAndInitCoderMft(MFT_CATEGORY_VIDEO_ENCODER, out_mf_media_type));
        THROW_ON_NULL(transform);
    }
    if (transform)
    {
        Com::IMFMediaTypePtr transformMediaType;
        hr = transform->GetOutputCurrentType(0, &transformMediaType);
        THROW_ON_FAIL(hr, L"Unable to get current output type");
        UINT32 pcbBlobSize = 0;
        hr = transformMediaType->GetBlobSize(MF_MT_MPEG_SEQUENCE_HEADER, &pcbBlobSize);
        THROW_ON_FAIL(hr, L"Unable to get blob size of MF_MT_MPEG_SEQUENCE_HEADER");
        std::vector<UINT8> blob(pcbBlobSize);
        hr = transformMediaType->GetBlob(MF_MT_MPEG_SEQUENCE_HEADER, &blob.front(), blob.size(), NULL);
        THROW_ON_FAIL(hr, L"Unable to get blob MF_MT_MPEG_SEQUENCE_HEADER");
        hr = out_mf_media_type->SetBlob(MF_MT_MPEG_SEQUENCE_HEADER, &blob.front(), blob.size());
        THROW_ON_FAIL(hr, L"Unable to set blob of MF_MT_MPEG_SEQUENCE_HEADER");
    }
    // Audio stream
    Com::IMFMediaTypePtr out_mf_audio_media_type;
    Com::IMFTransformPtr transformAudio;
    Com::IMFMediaTypePtr mediaTypeTmp = _sourceDefinition->GetCurrentAudioMediaType();
    Com::IMFMediaTypePtr in_mf_audio_media_type;
    if (mediaTypeTmp != NULL)
    {
        std::unique_ptr<MediaTypesFactory> factory(new MediaTypesFactory());
        if (!IsMediaTypeSupportedByAacEncoder(mediaTypeTmp))
        {
            UINT32 channels;
            hr = mediaTypeTmp->GetUINT32(MF_MT_AUDIO_NUM_CHANNELS, &channels);
            THROW_ON_FAIL(hr, L"Unable to get MF_MT_AUDIO_NUM_CHANNELS from source media type");
            in_mf_audio_media_type = factory->CreatePCM(factory->DEFAULT_SAMPLE_RATE, channels);
        }
        else
        {
            in_mf_audio_media_type.Attach(mediaTypeTmp.Detach());
        }
        out_mf_audio_media_type = factory->CreateAAC(in_mf_audio_media_type, factory->HIGH_ENCODED_BITRATE);
        GUID subType = GetSubtype(in_mf_audio_media_type);
        if (subType != MFAudioFormat_AAC)
        {
            // add an AAC encoder
            transformAudio.Attach(CreateAndInitCoderMft(MFT_CATEGORY_AUDIO_ENCODER, out_mf_audio_media_type));
        }
    }
    Com::IMFMediaSinkPtr pFileSink;
    hr = MFCreateMPEG4MediaSink(byte_stream, out_mf_media_type, out_mf_audio_media_type, &pFileSink);
    THROW_ON_FAIL(hr, L"Unable to create mpeg4 media sink");
    Com::IMFTopologyNodePtr pOutputNodeVideo;
    hr = MFCreateTopologyNode(MF_TOPOLOGY_OUTPUT_NODE, &pOutputNodeVideo);
    THROW_ON_FAIL(hr, L"Unable to create output node");
    hr = pFileSink->GetStreamSinkCount(&sink_count);
    THROW_ON_FAIL(hr, L"Unable to get stream sink count from media sink");
    if (sink_count == 0)
    {
        THROW_ON_FAIL(E_UNEXPECTED, L"Sink count should be greater than 0");
    }
    Com::IMFStreamSinkPtr stream_sink_video;
    hr = pFileSink->GetStreamSinkByIndex(0, &stream_sink_video);
    THROW_ON_FAIL(hr, L"Unable to get stream sink by index");
    hr = pOutputNodeVideo->SetObject(stream_sink_video);
    THROW_ON_FAIL(hr, L"Unable to set stream sink as output node object");
    hr = _pTopology->AddNode(pOutputNodeVideo);
    THROW_ON_FAIL(hr, L"Unable to add file sink output node");
    pOutputNodeVideo = AddEncoderIfNeed(_pTopology, transform, in_mf_video_media_type, pOutputNodeVideo);
    _outVideoNodes.push_back(pOutputNodeVideo);
    Com::IMFTopologyNodePtr pOutputNodeAudio;
    if (in_mf_audio_media_type != NULL)
    {
        hr = MFCreateTopologyNode(MF_TOPOLOGY_OUTPUT_NODE, &pOutputNodeAudio);
        THROW_ON_FAIL(hr, L"Unable to create output node");
        Com::IMFStreamSinkPtr stream_sink_audio;
        hr = pFileSink->GetStreamSinkByIndex(1, &stream_sink_audio);
        THROW_ON_FAIL(hr, L"Unable to get stream sink by index");
        hr = pOutputNodeAudio->SetObject(stream_sink_audio);
        THROW_ON_FAIL(hr, L"Unable to set stream sink as output node object");
        hr = _pTopology->AddNode(pOutputNodeAudio);
        THROW_ON_FAIL(hr, L"Unable to add file sink output node");
        if (transformAudio)
        {
            Com::IMFTopologyNodePtr outputTransformNodeAudio;
            AddTransformNode(_pTopology, transformAudio, pOutputNodeAudio, &outputTransformNodeAudio);
            _outAudioNode = outputTransformNodeAudio;
        }
        else
        {
            _outAudioNode = pOutputNodeAudio;
        }
    }
}
When the output type is applied to the audio transform, it has 15 attributes instead of 8, including MF_MT_AVG_BITRATE, which as I understand should only apply to video. In my case it is 192000, which differs from the MF_MT_AVG_BITRATE on the video stream.
My AAC media type is created by this method:
HRESULT MediaTypesFactory::CopyAudioTypeBasicAttributes(IMFMediaType * in_media_type, IMFMediaType * out_mf_media_type) {
    HRESULT hr = S_OK;
    static const GUID AUDIO_MAJORTYPE = MFMediaType_Audio;
    static const GUID AUDIO_SUBTYPE = MFAudioFormat_PCM;
    out_mf_media_type->SetUINT32(MF_MT_AUDIO_BITS_PER_SAMPLE, AUDIO_BITS_PER_SAMPLE);
    WAVEFORMATEX *in_wfx;
    UINT32 wfx_size;
    MFCreateWaveFormatExFromMFMediaType(in_media_type, &in_wfx, &wfx_size);
    hr = out_mf_media_type->SetUINT32(MF_MT_AUDIO_SAMPLES_PER_SECOND, in_wfx->nSamplesPerSec);
    DEBUG_ON_FAIL(hr);
    hr = out_mf_media_type->SetUINT32(MF_MT_AUDIO_NUM_CHANNELS, in_wfx->nChannels);
    DEBUG_ON_FAIL(hr);
    hr = out_mf_media_type->SetUINT32(MF_MT_AUDIO_AVG_BYTES_PER_SECOND, in_wfx->nAvgBytesPerSec);
    DEBUG_ON_FAIL(hr);
    hr = out_mf_media_type->SetUINT32(MF_MT_AUDIO_BLOCK_ALIGNMENT, in_wfx->nBlockAlign);
    DEBUG_ON_FAIL(hr);
    CoTaskMemFree(in_wfx); // free the WAVEFORMATEX allocated by MFCreateWaveFormatExFromMFMediaType
    return hr;
}
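For comparison, a minimal AAC output media type for the Microsoft AAC encoder typically needs only the core attributes below; the sample rate, channel count and bytes-per-second here are assumptions, not values taken from this project:
CComPtr<IMFMediaType> aacType;
MFCreateMediaType(&aacType);
aacType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio);
aacType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_AAC);
aacType->SetUINT32(MF_MT_AUDIO_BITS_PER_SAMPLE, 16);
aacType->SetUINT32(MF_MT_AUDIO_SAMPLES_PER_SECOND, 44100);
aacType->SetUINT32(MF_MT_AUDIO_NUM_CHANNELS, 2);
aacType->SetUINT32(MF_MT_AUDIO_AVG_BYTES_PER_SECOND, 12000); // 96 kbit/s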
It would be awesome if somebody could help me or explain where I am wrong.
Thanks.
In my project CaptureManager I faced a similar problem while writing code for recording live video from many web cams into one file. After a long time researching Media Foundation I found two important facts:
1. Live sources - web cams and microphones - do not start from 0. According to the specification, samples from them should start from a time stamp of zero (Live Sources - "The first sample should have a time stamp of zero.") - but live sources set the current system time instead.
2. I see from your code that you use the Media Session - an object with the IMFMediaSession interface. I think you create it with the MFCreateMediaSession function. This function creates the default version of the session, which is optimized for playing media from a file, whose samples start from 0 by default.
In my view, the main problem is that the default Media Session does not check the time stamps of the media samples from the source, because samples from a media file start from zero or from the StartPosition. Live sources, however, do not start from 0 - they should, or must, but do not.
So, my advice is to write a class implementing IMFTransform which acts as a "proxy" transform between the source and the encoder. This proxy transform must fix the time stamps of the media samples from the live source: when it receives the first media sample from the live source, it saves the actual time stamp of that first sample as a reference time and sets the first sample's time stamp to zero; the reference time is then subtracted from the time stamps of all subsequent samples from this live source before they are set back on those samples.
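The rebasing itself is small; inside such a proxy transform's ProcessInput it could look like this sketch (the member names are hypothetical):
HRESULT ProxyMFT::ProcessInput(DWORD dwInputStreamID, IMFSample* pSample, DWORD dwFlags)
{
    LONGLONG ts = 0;
    if (SUCCEEDED(pSample->GetSampleTime(&ts)))
    {
        if (!m_hasReference)               // first sample from the live source
        {
            m_referenceTime = ts;          // remember its time stamp as the reference
            m_hasReference = true;
        }
        pSample->SetSampleTime(ts - m_referenceTime);  // rebase so the stream starts at zero
    }
    // ... hold the sample for the next ProcessOutput call as usual ...
    return S_OK;
}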
Also, check your code for the call to IMFFinalizableMediaSink.
Regards.
MP4 metadata might under some conditions be initialized incorrectly (e.g. like this), however in the scenario you describe the problem is likely to be the payload data and not the way you set up the pipeline in the first place.
Decoders and converters typically pass sample time stamps through, copying them from input to output, so they do not indicate a failure if something is wrong - you still get output that makes sense written into the file. The sink might have trouble processing your data if you have sample time issues, very long recordings, or an overflow bug, especially with rates expressed using large numerators/denominators. What matters is which sample times the sources produce.
You might want to try shorter recordings, and also video-only and audio-only recordings; that might help identify the stream which supplies the data leading to the problem.
Additionally, you might want to inspect the resulting MP4 file atoms/boxes to identify whether the header boxes have incorrect data or whether the data itself is stamped incorrectly, on which track and how exactly (especially whether it starts okay and then has weird gaps in the middle).

Windows MFT (Media Foundation Transform) decoder not returning proper sample time or duration

To decode an H264 stream with a Windows Media Foundation Transform, the workflow is currently something like this:
IMFSample* sample;   // created beforehand with MFCreateSample
sample->SetTime(time_in_ns);
sample->SetDuration(duration_in_ns);
sample->AddBuffer(buffer);
// Feed IMFSample to decoder
mDecoder->ProcessInput(0, sample, 0);
// Get output from decoder.
/* create outputsample that will receive content */ { ... }
MFT_OUTPUT_DATA_BUFFER output = {0};
output.pSample = outputsample;
DWORD status = 0;
HRESULT hr = mDecoder->ProcessOutput(0, 1, &output, &status);
if (output.pEvents) {
    // We must release this, as per the IMFTransform::ProcessOutput()
    // MSDN documentation.
    output.pEvents->Release();
    output.pEvents = nullptr;
}
if (hr == MF_E_TRANSFORM_STREAM_CHANGE) {
    // Type change, probably geometric aperture change.
    // Reconfigure the decoder output type, so that GetOutputMediaType()
    // reflects the new format.
} else if (hr == MF_E_TRANSFORM_NEED_MORE_INPUT) {
    // Not enough input to produce output.
} else if (!output.pSample) {
    return S_OK;
} else {
    // Process output
}
When we have fed all data to the MFT decoder, we must drain it:
mDecoder->ProcessMessage(MFT_MESSAGE_COMMAND_DRAIN, 0);
Now, one thing with the WMF H264 decoder is that it typically will not output anything before it has been fed over 30 compressed h264 frames, regardless of the size of the h264 sliding window. Latency is very high...
I'm encountering an issue that is very troublesome.
The video is made only of keyframes, has only 15 frames, each 2 s long, and the first frame has a non-zero presentation time (this stream is from live content, so the first frame typically carries an epoch-based time stamp).
So without draining the decoder, nothing will come out of the decoder as it hasn't received enough frames.
However, once the decoder is drained, the decoded frame will come out. HOWEVER, the MFT decoder has set all durations to 33.6ms only and the presentation time of the first sample coming out is always 0.
The original duration and presentation time have been lost.
If you provide over 30 frames to the h264 decoder, then both duration and pts are valid...
I haven't yet found a way to get the WMF decoder to output samples with the proper value.
It appears that if you have to drain the decoder before it has output any samples by itself, then it's totally broken...
Has anyone experienced such problems? How did you get around it?
Thank you in advance
Edit: a sample of the video is available on http://people.mozilla.org/~jyavenard/mediatest/fragmented/1301869.mp4
Playing this video with Firefox will cause it to play extremely quickly due to the problems described above.
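One workaround to consider (a sketch, assuming the output order matches the order the inputs were queued, which holds for an all-keyframe stream like this one) is to remember the input time stamps yourself and re-apply them to the decoded samples; the names follow the snippets above:
#include <deque>
#include <utility>

std::deque<std::pair<LONGLONG, LONGLONG>> pendingTimes;   // (time, duration) of queued inputs

// When feeding the decoder:
pendingTimes.push_back({ time_in_ns, duration_in_ns });
mDecoder->ProcessInput(0, sample, 0);

// When a decoded sample comes out of ProcessOutput:
if (output.pSample && !pendingTimes.empty())
{
    output.pSample->SetSampleTime(pendingTimes.front().first);
    output.pSample->SetSampleDuration(pendingTimes.front().second);
    pendingTimes.pop_front();
}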
I'm not sure that your workflow is correct. I think you should do something like this:
do
{
    ...
    hr = mDecoder->ProcessInput(0, sample, 0);
    if (FAILED(hr))
        break;
    ...
    hr = mDecoder->ProcessOutput(0, 1, &output, &status);
    if (FAILED(hr) && hr != MF_E_TRANSFORM_NEED_MORE_INPUT)
        break;
}
while (hr == MF_E_TRANSFORM_NEED_MORE_INPUT);
if (SUCCEEDED(hr))
{
    // You have a valid decoded frame here
}
The idea is to keep calling ProcessInput/ProcessOutput while ProcessOutput returns MF_E_TRANSFORM_NEED_MORE_INPUT. MF_E_TRANSFORM_NEED_MORE_INPUT means that the decoder needs more input. I think that with this loop you won't need to drain the decoder.

Media Foundation onReadSample wrong size of returned sample

I am working on translating a capture library from DirectShow to Media Foundation. The capture library seemed to work quite well, but I am facing a problem with an integrated webcam on a tablet running 32-bit Windows 8.
When enumerating the capture formats (as explained in the Media Foundation documentation), I get the following supported formats for the camera:
0 : MFVideoFormat_NV12, resolution : 448x252, framerate : 30000x1001
1 : MFVideoFormat_YUY2, resolution : 448x252, framerate : 30000x1001
2 : MFVideoFormat_NV12, resolution : 640x360, framerate : 30000x1001
3 : MFVideoFormat_YUY2, resolution : 640x360, framerate : 30000x1001
4 : MFVideoFormat_NV12, resolution : 640x480, framerate : 30000x1001
5 : MFVideoFormat_YUY2, resolution : 640x480, framerate : 30000x1001
I then set the capture format, in this case the one at index 5, using the following function, as described in the example:
hr = pHandler->SetCurrentMediaType(pType);
This function executed without error. The camera should thus be configured to capture YUY2 at a resolution of 640x480.
In the onReadSample callback, I should receive a sample with a buffer of size:
640 * 480 * sizeof(unsigned char) * 2 = 614400 // YUY2 is encoded on 2 bytes per pixel
However, I get a sample with a buffer of size 169344. Below is part of the callback function.
HRESULT SourceReader::OnReadSample(
    HRESULT hrStatus,
    DWORD dwStreamIndex,
    DWORD dwStreamFlags,
    LONGLONG llTimeStamp,
    IMFSample *pSample // Can be NULL
    )
{
    EnterCriticalSection(&m_critsec);
    if (pSample)
    {
        DWORD expectedBufferSize = 640*480*1*2; // = 614400 (hard-coded for the example)
        IMFMediaBuffer* buffer = NULL;
        HRESULT hr = pSample->ConvertToContiguousBuffer(&buffer);
        if (FAILED(hr))
        {
            //...
            goto done;
        }
        DWORD byteLength = 0;
        BYTE* pixels = NULL;
        hr = buffer->Lock(&pixels, NULL, &byteLength);
        // byteLength is 169344 instead of 614400
        if (byteLength > 0 && byteLength == expectedBufferSize)
        {
            // do something with the image, but it never gets here because byteLength is wrong
        }
        //...
Any advice on why I get a sample of size 169344?
Thanks in advance
Thanks Mgetz for your answer.
I checked the value of MF_MT_INTERLACE_MODE on the media type, and it appears that the video stream contains progressive frames: MF_MT_INTERLACE_MODE returns MFVideoInterlace_Progressive.
hr = pHandler->SetCurrentMediaType(m_pType);
if (FAILED(hr))
{
    //...
}
else
{
    // get info about interlacing
    UINT32 interlaceFormat = MFVideoInterlace_Unknown;
    m_pType->GetUINT32(MF_MT_INTERLACE_MODE, &interlaceFormat);
    //...
So the video stream is not interlaced. I checked again in onReadSample the value of MFSampleExtension_Interlaced to see whether the sample is interlaced, and it appears that the sample is interlaced.
if (pSample && m_bCapture)
{
    // check if the sample is interlaced
    UINT32 isSampleInterlaced = 0;
    pSample->GetUINT32(MFSampleExtension_Interlaced, &isSampleInterlaced);
    if (isSampleInterlaced)
    {
        // enters here
    }
How is it possible that the stream is progressive and the sample is interlaced? I double-checked the value of MF_MT_INTERLACE_MODE in the onReadSample callback as well, and it still gives me MFVideoInterlace_Progressive.
Concerning your first suggestion, I didn't find a way to force the MFT_INPUT_STREAM_WHOLE_SAMPLES flag on the input stream.
Thanks in advance
I still face the issue and I am now investigating the different streams available.
According to the documentation, each media source provides a presentation descriptor from which we can get the streams available. To get the presentation descriptor, we have to call:
HRESULT hr = pSource->CreatePresentationDescriptor(&pPD);
I then request the streams available using the IMFPresentationDescriptor::GetStreamDescriptorCount function:
DWORD nbrStream;
pPD->GetStreamDescriptorCount(&nbrStream);
When requesting this information for the front webcam on an ACER tablet running Windows 8, I found that three streams are available. I looped over these streams, requested their media type handlers and checked the major type. All three streams have MFMediaType_Video as their major type, so they are all video streams. When listing the media types available on the different streams, I found that all the streams support capture at 640x480 (some of the streams have more available media types).
I tried selecting each of the different streams with the appropriate format type (the framework did not return any error), but I still do not receive the correct sample in the callback function...
Any advice on how to progress on the issue?
Finally found the issue: I had to set the media type on the source reader directly, using SourceReader->SetCurrentMediaType(..). That did the trick!
Thanks for your help!
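For anyone hitting the same thing, the call that did the trick looks roughly like this (a sketch, assuming a pSourceReader variable for the IMFSourceReader and the YUY2 640x480 format from the list above):
CComPtr<IMFMediaType> pType;
MFCreateMediaType(&pType);
pType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
pType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_YUY2);
MFSetAttributeSize(pType, MF_MT_FRAME_SIZE, 640, 480);
MFSetAttributeRatio(pType, MF_MT_FRAME_RATE, 30000, 1001);
// Set the type on the reader itself, not only on the media type handler
hr = pSourceReader->SetCurrentMediaType((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM, NULL, pType);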
Without knowing what the input media type descriptor is, we can largely only speculate, but the most likely answer is that you are saying you can handle the stream even though MFT_INPUT_STREAM_WHOLE_SAMPLES is not set on the input stream.
The next most likely cause is interlacing, in which case each frame would be complete but not the full resolution you are assuming. Regardless, you should verify the ENTIRE media type descriptor before accepting it.
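Incidentally, 169344 bytes is exactly 448 x 252 x 3/2, i.e. one NV12 frame at the camera's first listed format, which suggests the type set on the handler never reached the reader. A quick way to verify what is actually negotiated is to read the current type back from the source reader (a sketch, assuming a pSourceReader variable):
CComPtr<IMFMediaType> current;
hr = pSourceReader->GetCurrentMediaType((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM, &current);
if (SUCCEEDED(hr))
{
    GUID subtype = GUID_NULL;
    UINT32 width = 0, height = 0;
    current->GetGUID(MF_MT_SUBTYPE, &subtype);
    MFGetAttributeSize(current, MF_MT_FRAME_SIZE, &width, &height);
    // Compare subtype/width/height with the format you believe you configured
}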

Windows Media Player DSP Plugin Format Negotiation

I am writing an audio DSP plugin for Windows Media Player, with the plugin acting as a DMO. I am trying to get WMP to send me the audio data as mono 22.050 kHz. However, no matter what I do, the player resamples all audio to stereo 44.1 kHz data. Even if the file I'm playing is a 22.050 kHz wave file, I still get 44.1 kHz audio in my plugin.
I specify the data my plugin can handle via the GetInputType/GetOutputType functions, but no matter what happens, by the time SetInputType/SetOutputType is called the format is back to 44.1 kHz. Does anyone have an idea of what is happening? I tried writing ValidateMediaType to only accept the sample rate I want, but then I just get no data at all. My GetInputType function is below:
STDMETHODIMP CWMPIPSpeaker::GetInputType (
    DWORD dwInputStreamIndex,
    DWORD dwTypeIndex,
    DMO_MEDIA_TYPE *pmt)
{
    HRESULT hr = S_OK;
    if ( 0 != dwInputStreamIndex )
    {
        return DMO_E_INVALIDSTREAMINDEX;
    }
    // only support one preferred type
    if ( 0 != dwTypeIndex )
    {
        return DMO_E_NO_MORE_ITEMS;
    }
    if ( NULL == pmt )
    {
        return E_POINTER;
    }
    hr = MoInitMediaType(pmt, sizeof( WAVEFORMATEX ) );
    WAVEFORMATEX* format = ((WAVEFORMATEX*)pmt->pbFormat);
    format->nChannels = 1;
    format->nSamplesPerSec = 22050;
    format->wFormatTag = WAVE_FORMAT_PCM;
    format->wBitsPerSample = 16;
    format->cbSize = 0;
    format->nBlockAlign = (format->nChannels * format->wBitsPerSample) / 8;
    format->nAvgBytesPerSec = format->nBlockAlign * format->nSamplesPerSec;
    pmt->formattype = FORMAT_WaveFormatEx;
    pmt->lSampleSize = format->nBlockAlign;
    pmt->bFixedSizeSamples = true;
    pmt->majortype = MEDIATYPE_Audio;
    pmt->subtype = MEDIASUBTYPE_PCM;
    return hr;
}
Well, unfortunately it appears the problem isn't me. I'm archiving this here for future reference because of all the trouble this issue caused me. I found a detailed report on the problem on an MSDN blog, and it appears that in Vista and later you cannot negotiate media types for DMO plugins, by design. I can't say I agree with this decision, but it means that I must do the conversion myself if I want down-sampled data.
Hopefully this helps anyone else who runs into this "feature".
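If you do end up converting yourself, a naive sketch of going from 16-bit stereo 44.1 kHz to mono 22.05 kHz is below; it averages the two channels and keeps every other frame (a real implementation should low-pass filter before decimating):
#include <cstdint>
#include <vector>

// in: interleaved stereo 16-bit PCM at 44100 Hz, frameCount frames
// returns: mono 16-bit PCM at 22050 Hz
std::vector<int16_t> DownmixAndDecimate(const int16_t* in, size_t frameCount)
{
    std::vector<int16_t> out;
    out.reserve(frameCount / 2);
    for (size_t i = 0; i + 1 < frameCount; i += 2)          // keep every other frame
    {
        int mono = (in[2 * i] + in[2 * i + 1]) / 2;          // average L and R
        out.push_back(static_cast<int16_t>(mono));
    }
    return out;
}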

Filling CMediaType and IMediaSample from AVPacket for h264 video

I have searched and have found almost nothing, so I would really appreciate some help with my question.
I am writing a DirectShow source filter which uses libav to read an FLV file from YouTube and send the h264 packets downstream. But I can't find the appropriate libav structure fields to correctly implement the filter's GetMediaType() and FillBuffer(). Some libav fields are null. As a consequence, the h264 decoder crashes when trying to process the received data.
Where am I wrong? In working with libav or with the DirectShow interfaces? Maybe h264 requires additional processing when working with libav, or maybe I fill the reference time incorrectly? Does anyone have links useful for writing a DirectShow h264 source filter with libav?
Part of GetMediaType():
VIDEOINFOHEADER *pvi = (VIDEOINFOHEADER*) toMediaType->AllocFormatBuffer(sizeof(VIDEOINFOHEADER));
pvi->AvgTimePerFrame = UNITS_PER_SECOND / m_pFormatContext->streams[m_streamNo]->codec->sample_rate; //sample_rate is 0
pvi->dwBitRate = m_pFormatContext->bit_rate;
pvi->rcSource = videoRect;
pvi->rcTarget = videoRect;
//Bitmap
pvi->bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
pvi->bmiHeader.biWidth = videoRect.right;
pvi->bmiHeader.biHeight = videoRect.bottom;
pvi->bmiHeader.biPlanes = 1;
pvi->bmiHeader.biBitCount = m_pFormatContext->streams[m_streamNo]->codec->bits_per_raw_sample;//or should here be bits_per_coded_sample
pvi->bmiHeader.biCompression = FOURCC_H264;
pvi->bmiHeader.biSizeImage = GetBitmapSize(&pvi->bmiHeader);
Part of FillBuffer():
// Get buffer pointer
BYTE* pBuffer = NULL;
if (pSamp->GetPointer(&pBuffer) < 0)
    return S_FALSE;
// Get next packet
AVPacket* pPacket = m_mediaFile.getNextPacket();
if (pPacket->data == NULL)
    return S_FALSE;
// Check packet and buffer size
if (pSamp->GetSize() < pPacket->size)
    return S_FALSE;
// Copy from packet to sample buffer
memcpy(pBuffer, pPacket->data, pPacket->size);
// Set media sample time
REFERENCE_TIME start = m_mediaFile.timeStampToReferenceTime(pPacket->pts);
REFERENCE_TIME duration = m_mediaFile.timeStampToReferenceTime(pPacket->duration);
REFERENCE_TIME end = start + duration;
pSamp->SetTime(&start, &end);
pSamp->SetMediaTime(&start, &end);
P.S. I've debugged my filter with the hax264 decoder, and it crashes on a call to the deprecated libav function img_convert().
Here is the MSDN link you need to build a correct H.264 media type: H.264 Video Types
You have to fill the right fields with the right values.
The AM_MEDIA_TYPE should contain the right MEDIASUBTYPE for h264.
And these are plain wrong :
pvi->bmiHeader.biWidth = videoRect.right;
pvi->bmiHeader.biHeight = videoRect.bottom;
You should use a width/height that is independent of rcSource/rcTarget, since those are only indicators and may even be completely zero if you take them from some other filter.
pvi->bmiHeader.biBitCount = m_pFormatContext->streams[m_streamNo]->codec->bits_per_raw_sample;//or should here be bits_per_coded_sample
This only makes sense if biWidth*biHeight*biBitCount/8 is the true size of the sample. I do not think so...
pvi->bmiHeader.biCompression = FOURCC_H264;
This must also be passed in the AM_MEDIA_TYPE in the subtype parameter.
pvi->bmiHeader.biSizeImage = GetBitmapSize(&pvi->bmiHeader);
This fails because the FOURCC is unknown to the function, and the bit count is plain wrong for this sample, since the sample is not a full frame.
You have to take a look at how the data stream is handled by the downstream h264 filter. This seems to be flawed.
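For reference, a rough sketch of an H.264 media type along the lines of that MSDN page; the width, height and frame duration are placeholders you would take from the libav stream, and FORMAT_VideoInfo2/VIDEOINFOHEADER2 is used here because that is one of the format types the page lists for MEDIASUBTYPE_H264:
CMediaType mt;
mt.SetType(&MEDIATYPE_Video);
mt.SetSubtype(&MEDIASUBTYPE_H264);
mt.SetFormatType(&FORMAT_VideoInfo2);
mt.SetTemporalCompression(TRUE);
mt.SetVariableSize();

VIDEOINFOHEADER2* vih2 = (VIDEOINFOHEADER2*)mt.AllocFormatBuffer(sizeof(VIDEOINFOHEADER2));
ZeroMemory(vih2, sizeof(VIDEOINFOHEADER2));
vih2->bmiHeader.biSize        = sizeof(BITMAPINFOHEADER);
vih2->bmiHeader.biWidth       = codedWidth;          // from the codec context, not from rcSource
vih2->bmiHeader.biHeight      = codedHeight;
vih2->bmiHeader.biCompression = MAKEFOURCC('H','2','6','4');
vih2->AvgTimePerFrame         = avgTimePerFrame;     // in 100 ns units, from the stream frame rate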