I have a DirectShow file source filter which has audio and frame output pins. It is written in C++ based on this tutorial on MSDN. My filter opens the video using the Medialooks MFormats SDK and provides raw data to the output pins. Both pins connect directly to renderer filters when the graph is rendered.
The problem occurs when I run the graph, pause the video, and seek to frame 0. After a call to the ChangeStart method on the output frame pin, FillBuffer is sometimes called three times and frame 1 is shown on the screen instead of frame 0. When it is called twice, the correct frame, frame 0, is shown.
The output pins inherit from the CSourceStream and CSourceSeeking classes. Here are the FillBuffer and ChangeStart methods of my output frame pin:
FillBuffer Method
HRESULT frame_pin::FillBuffer(IMediaSample *sample)
{
    CheckPointer(sample, E_POINTER);
    BYTE *frame_buffer;
    sample->GetPointer(&frame_buffer);

    // Check if the downstream filter is changing the format.
    CMediaType *mt;
    HRESULT hr = sample->GetMediaType(reinterpret_cast<AM_MEDIA_TYPE**>(&mt));
    if (hr == S_OK)
    {
        auto new_width = reinterpret_cast<VIDEOINFOHEADER2*>(mt->pbFormat)->bmiHeader.biWidth;
        auto old_width = reinterpret_cast<VIDEOINFOHEADER2*>(m_mt.pbFormat)->bmiHeader.biWidth;
        if (new_width != old_width)
            format_changed_ = true;
        SetMediaType(mt);
        DeleteMediaType(mt);
    }

    ASSERT(m_mt.formattype == FORMAT_VideoInfo2);
    VIDEOINFOHEADER2 *vih = reinterpret_cast<VIDEOINFOHEADER2*>(m_mt.pbFormat);

    CComPtr<IMFFrame> mf_frame;
    {
        CAutoLock lock(&shared_state_);
        if (source_time_ >= m_rtStop)
            return S_FALSE;

        // mf_reader_ is a member external SDK instance which gets the frame data with this function call
        hr = mf_reader_->SourceFrameConvertedGetByNumber(&av_props_, frame_number_, -1, &mf_frame, CComBSTR(L""));
        if (FAILED(hr))
            return hr;

        REFERENCE_TIME start, stop = 0;
        start = stream_time_;
        stop = static_cast<REFERENCE_TIME>(tc_.get_stop_time() / m_dRateSeeking);
        sample->SetTime(&start, &stop);
        stream_time_ = stop;
        source_time_ += (stop - start);
        frame_number_++;
    }

    if (format_changed_)
    {
        CComPtr<IMFFrame> mf_frame_resized;
        mf_frame->MFResize(eMFCC_YUY2, std::abs(vih->bmiHeader.biWidth), std::abs(vih->bmiHeader.biHeight), 0, &mf_frame_resized, CComBSTR(L""), CComBSTR(L""));
        mf_frame = mf_frame_resized;
    }

    MF_FRAME_INFO mf_frame_info;
    mf_frame->MFAllGet(&mf_frame_info);
    memcpy(frame_buffer, reinterpret_cast<BYTE*>(mf_frame_info.lpVideo), mf_frame_info.cbVideo);
    sample->SetActualDataLength(static_cast<long>(mf_frame_info.cbVideo));
    sample->SetSyncPoint(TRUE);
    sample->SetPreroll(FALSE);

    if (discontinuity_)
    {
        sample->SetDiscontinuity(TRUE);
        discontinuity_ = FALSE;
    }

    return S_OK;
}
ChangeStart Method
HRESULT frame_pin::ChangeStart()
{
    {
        CAutoLock lock(CSourceSeeking::m_pLock);
        tc_.reset();
        stream_time_ = 0;
        source_time_ = m_rtStart;
        frame_number_ = static_cast<int>(m_rtStart / frame_lenght_);
    }
    update_from_seek();
    return S_OK;
}
From the Microsoft DirectShow documentation:
The CSourceSeeking class is an abstract class for implementing seeking in source filters with one output pin.
CSourceSeeking is not recommended for a filter with more than one output pin. The main issue is that only one pin should respond to seeking requests. Typically this requires communication among the pins and the filter.
And you have two output pins in your source filter.
The CSourceSeeking class can be extended to manage more than one output pin with custom coding. Seek commands will arrive at both output pins, so you'll need to decide which pin controls seeking and ignore seek commands arriving at the other output pin, as in the sketch below.
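A minimal sketch of one way to do that, reusing the members from your ChangeStart and adding a hypothetical is_seeking_pin_ flag plus a notify_seek helper on the parent filter (neither exists in the original code):

HRESULT frame_pin::ChangeStart()
{
    // Hypothetical: only the designated master pin reacts to seek requests.
    if (!is_seeking_pin_)
        return S_OK;

    {
        CAutoLock lock(CSourceSeeking::m_pLock);
        tc_.reset();
        stream_time_ = 0;
        source_time_ = m_rtStart;
        frame_number_ = static_cast<int>(m_rtStart / frame_lenght_);
    }

    // Hypothetical: let the filter reposition the other (audio) pin, i.e.
    // flush it and rebase its time stamps to the same start time.
    static_cast<my_source_filter*>(m_pFilter)->notify_seek(m_rtStart);

    update_from_seek();
    return S_OK;
}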
I am working on a media streaming application using the Media Foundation framework. I've used some samples from the internet and from Anton Polinger's book. Unfortunately, after saving streams into an MP4 file the file's metadata is corrupted: it has an incorrect duration (matching my PC's uptime, e.g. 30 hours) and a wrong bitrate. After a long struggle I fixed it for a single stream (video or audio), but when I try to record both audio and video the problem returns. Something is wrong with my topology, but I can't understand what; perhaps there are some experts here?
I get the audio and video sources, wrap them into an IMFCollection, and create an aggregate source with MFCreateAggregateSource.
I create a source node for each stream of the aggregate source:
Com::IMFTopologyNodePtr TopologyBuilder::CreateSourceNode(Com::IMFStreamDescriptorPtr streamDescriptor)
{
    HRESULT hr = S_OK;
    Com::IMFTopologyNodePtr pNode;
    // Create the topology node, indicating that it must be a source node.
    hr = MFCreateTopologyNode(MF_TOPOLOGY_SOURCESTREAM_NODE, &pNode);
    THROW_ON_FAIL(hr, "Unable to create topology node for source");
    // Associate the node with the source by passing in a pointer to the media source,
    // and indicating that it is the source
    hr = pNode->SetUnknown(MF_TOPONODE_SOURCE, _sourceDefinition->GetMediaSource());
    THROW_ON_FAIL(hr, "Unable to set source as object for topology node");
    // Set the node presentation descriptor attribute of the node by passing
    // in a pointer to the presentation descriptor
    hr = pNode->SetUnknown(MF_TOPONODE_PRESENTATION_DESCRIPTOR, _sourceDefinition->GetPresentationDescriptor());
    THROW_ON_FAIL(hr, "Unable to set MF_TOPONODE_PRESENTATION_DESCRIPTOR to node");
    // Set the node stream descriptor attribute by passing in a pointer to the stream
    // descriptor
    hr = pNode->SetUnknown(MF_TOPONODE_STREAM_DESCRIPTOR, streamDescriptor);
    THROW_ON_FAIL(hr, "Unable to set MF_TOPONODE_STREAM_DESCRIPTOR to node");
    return pNode;
}
After that I connect each source to a transform (H264 encoder and AAC encoder) and to the MPEG4 file sink:
void TopologyBuilder::CreateFileSinkOutputNode(PCWSTR filePath)
{
    HRESULT hr = S_OK;
    DWORD sink_count;
    Com::IMFByteStreamPtr byte_stream;
    Com::IMFTransformPtr transform;
    LPCWSTR lpcwstrFilePath = filePath;
    hr = MFCreateFile(
        MF_ACCESSMODE_WRITE, MF_OPENMODE_FAIL_IF_NOT_EXIST, MF_FILEFLAGS_NONE,
        lpcwstrFilePath, &byte_stream);
    THROW_ON_FAIL(hr, L"Unable to create and open file");

    // Video stream
    Com::IMFMediaTypePtr in_mf_video_media_type = _sourceDefinition->GetCurrentVideoMediaType();
    Com::IMFMediaTypePtr out_mf_media_type = CreateMediaType(MFMediaType_Video, MFVideoFormat_H264);
    hr = CopyType(in_mf_video_media_type, out_mf_media_type);
    THROW_ON_FAIL(hr, L"Unable to copy type parameters");
    if (GetSubtype(in_mf_video_media_type) != MEDIASUBTYPE_H264)
    {
        transform.Attach(CreateAndInitCoderMft(MFT_CATEGORY_VIDEO_ENCODER, out_mf_media_type));
        THROW_ON_NULL(transform);
    }
    if (transform)
    {
        Com::IMFMediaTypePtr transformMediaType;
        hr = transform->GetOutputCurrentType(0, &transformMediaType);
        THROW_ON_FAIL(hr, L"Unable to get current output type");
        UINT32 pcbBlobSize = 0;
        hr = transformMediaType->GetBlobSize(MF_MT_MPEG_SEQUENCE_HEADER, &pcbBlobSize);
        THROW_ON_FAIL(hr, L"Unable to get blob size of MF_MT_MPEG_SEQUENCE_HEADER");
        std::vector<UINT8> blob(pcbBlobSize);
        hr = transformMediaType->GetBlob(MF_MT_MPEG_SEQUENCE_HEADER, &blob.front(), blob.size(), NULL);
        THROW_ON_FAIL(hr, L"Unable to get blob MF_MT_MPEG_SEQUENCE_HEADER");
        hr = out_mf_media_type->SetBlob(MF_MT_MPEG_SEQUENCE_HEADER, &blob.front(), blob.size());
        THROW_ON_FAIL(hr, L"Unable to set blob of MF_MT_MPEG_SEQUENCE_HEADER");
    }

    // Audio stream
    Com::IMFMediaTypePtr out_mf_audio_media_type;
    Com::IMFTransformPtr transformAudio;
    Com::IMFMediaTypePtr mediaTypeTmp = _sourceDefinition->GetCurrentAudioMediaType();
    Com::IMFMediaTypePtr in_mf_audio_media_type;
    if (mediaTypeTmp != NULL)
    {
        std::unique_ptr<MediaTypesFactory> factory(new MediaTypesFactory());
        if (!IsMediaTypeSupportedByAacEncoder(mediaTypeTmp))
        {
            UINT32 channels;
            hr = mediaTypeTmp->GetUINT32(MF_MT_AUDIO_NUM_CHANNELS, &channels);
            THROW_ON_FAIL(hr, L"Unable to get MF_MT_AUDIO_NUM_CHANNELS from source media type");
            in_mf_audio_media_type = factory->CreatePCM(factory->DEFAULT_SAMPLE_RATE, channels);
        }
        else
        {
            in_mf_audio_media_type.Attach(mediaTypeTmp.Detach());
        }
        out_mf_audio_media_type = factory->CreateAAC(in_mf_audio_media_type, factory->HIGH_ENCODED_BITRATE);
        GUID subType = GetSubtype(in_mf_audio_media_type);
        if (GetSubtype(in_mf_audio_media_type) != MFAudioFormat_AAC)
        {
            // add encoder to AAC
            transformAudio.Attach(CreateAndInitCoderMft(MFT_CATEGORY_AUDIO_ENCODER, out_mf_audio_media_type));
        }
    }

    Com::IMFMediaSinkPtr pFileSink;
    hr = MFCreateMPEG4MediaSink(byte_stream, out_mf_media_type, out_mf_audio_media_type, &pFileSink);
    THROW_ON_FAIL(hr, L"Unable to create mpeg4 media sink");

    Com::IMFTopologyNodePtr pOutputNodeVideo;
    hr = MFCreateTopologyNode(MF_TOPOLOGY_OUTPUT_NODE, &pOutputNodeVideo);
    THROW_ON_FAIL(hr, L"Unable to create output node");
    hr = pFileSink->GetStreamSinkCount(&sink_count);
    THROW_ON_FAIL(hr, L"Unable to get stream sink count from mediasink");
    if (sink_count == 0)
    {
        THROW_ON_FAIL(E_UNEXPECTED, L"Sink count should be greater than 0");
    }
    Com::IMFStreamSinkPtr stream_sink_video;
    hr = pFileSink->GetStreamSinkByIndex(0, &stream_sink_video);
    THROW_ON_FAIL(hr, L"Unable to get stream sink by index");
    hr = pOutputNodeVideo->SetObject(stream_sink_video);
    THROW_ON_FAIL(hr, L"Unable to set stream sink as output node object");
    hr = _pTopology->AddNode(pOutputNodeVideo);
    THROW_ON_FAIL(hr, L"Unable to add file sink output node");
    pOutputNodeVideo = AddEncoderIfNeed(_pTopology, transform, in_mf_video_media_type, pOutputNodeVideo);
    _outVideoNodes.push_back(pOutputNodeVideo);

    Com::IMFTopologyNodePtr pOutputNodeAudio;
    if (in_mf_audio_media_type != NULL)
    {
        hr = MFCreateTopologyNode(MF_TOPOLOGY_OUTPUT_NODE, &pOutputNodeAudio);
        THROW_ON_FAIL(hr, L"Unable to create output node");
        Com::IMFStreamSinkPtr stream_sink_audio;
        hr = pFileSink->GetStreamSinkByIndex(1, &stream_sink_audio);
        THROW_ON_FAIL(hr, L"Unable to get stream sink by index");
        hr = pOutputNodeAudio->SetObject(stream_sink_audio);
        THROW_ON_FAIL(hr, L"Unable to set stream sink as output node object");
        hr = _pTopology->AddNode(pOutputNodeAudio);
        THROW_ON_FAIL(hr, L"Unable to add file sink output node");
        if (transformAudio)
        {
            Com::IMFTopologyNodePtr outputTransformNodeAudio;
            AddTransformNode(_pTopology, transformAudio, pOutputNodeAudio, &outputTransformNodeAudio);
            _outAudioNode = outputTransformNodeAudio;
        }
        else
        {
            _outAudioNode = pOutputNodeAudio;
        }
    }
}
When the output type is applied to the audio transform, it has 15 attributes instead of 8, including MF_MT_AVG_BITRATE, which as I understand should only be applied to video. In my case it is 192000, and it differs from MF_MT_AVG_BITRATE on the video stream.
My AAC media type is created by this method:
HRESULT MediaTypesFactory::CopyAudioTypeBasicAttributes(IMFMediaType *in_media_type, IMFMediaType *out_mf_media_type)
{
    HRESULT hr = S_OK;
    static const GUID AUDIO_MAJORTYPE = MFMediaType_Audio;
    static const GUID AUDIO_SUBTYPE = MFAudioFormat_PCM;
    out_mf_media_type->SetUINT32(MF_MT_AUDIO_BITS_PER_SAMPLE, AUDIO_BITS_PER_SAMPLE);
    WAVEFORMATEX *in_wfx;
    UINT32 wfx_size;
    MFCreateWaveFormatExFromMFMediaType(in_media_type, &in_wfx, &wfx_size);
    hr = out_mf_media_type->SetUINT32(MF_MT_AUDIO_SAMPLES_PER_SECOND, in_wfx->nSamplesPerSec);
    DEBUG_ON_FAIL(hr);
    hr = out_mf_media_type->SetUINT32(MF_MT_AUDIO_NUM_CHANNELS, in_wfx->nChannels);
    DEBUG_ON_FAIL(hr);
    hr = out_mf_media_type->SetUINT32(MF_MT_AUDIO_AVG_BYTES_PER_SECOND, in_wfx->nAvgBytesPerSec);
    DEBUG_ON_FAIL(hr);
    hr = out_mf_media_type->SetUINT32(MF_MT_AUDIO_BLOCK_ALIGNMENT, in_wfx->nBlockAlign);
    DEBUG_ON_FAIL(hr);
    return hr;
}
It would be awesome if somebody could help me or explain where I am wrong.
Thanks.
In my project CaptureManager I faced a similar problem while writing code for recording live video from many web cams into one file. After a long time researching Media Foundation I found two important facts:
1. Live sources such as web cams and microphones do not start from zero. According to the specification (Live Sources: "The first sample should have a time stamp of zero."), samples from them should start from a zero time stamp, but live sources actually set the current system time.
2. I see from your code that you use the Media Session, an object with the IMFMediaSession interface. I think you create it with the MFCreateMediaSession function. That function creates the default version of the session, which is optimized for playing media from a file, whose samples start from 0 by default.
In my view, the main problem is that the default Media Session does not check the time stamps of media samples from the source, because samples from a media file start from zero or from the StartPosition. Live sources, however, do not start from 0: they should, or must, but do not.
So, my advice: write a class implementing IMFTransform which will be a "proxy" transform between the source and the encoder. This proxy transform must fix the time stamps of media samples from the live source: when it receives the first media sample from the live source, it saves that sample's actual time stamp as a reference time and sets the sample's time stamp to zero; the reference time is then subtracted from the time stamps of all subsequent samples from this live source.
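A minimal sketch of that rebasing step (the full IMFTransform plumbing is omitted; the class and member names are illustrative, not from the original code):

#include <mfidl.h>

class ProxyTransform /* : public IMFTransform — boilerplate omitted */
{
    LONGLONG base_time_ = 0;     // reference time of the first live sample
    bool base_time_set_ = false;

public:
    // Call this on every sample passing through the proxy.
    HRESULT RebaseSampleTime(IMFSample *sample)
    {
        LONGLONG time = 0; // 100-nanosecond units
        HRESULT hr = sample->GetSampleTime(&time);
        if (FAILED(hr))
            return hr;
        if (!base_time_set_)
        {
            base_time_ = time; // remember the live source's start time
            base_time_set_ = true;
        }
        // Shift the stream so the first sample lands at time zero.
        return sample->SetSampleTime(time - base_time_);
    }
};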
Also, check your code for the call to IMFFinalizableMediaSink.
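For reference, a hedged sketch of driving that finalization manually, assuming pFileSink from the question's code (the MPEG-4 sink implements IMFFinalizableMediaSink; BeginFinalize/EndFinalize follow the standard MF async pattern, bridged here to a Win32 event for brevity):

#include <mfidl.h>
#include <atlbase.h>

// Minimal IMFAsyncCallback that signals an event when finalization completes.
class FinalizeWaiter : public IMFAsyncCallback
{
public:
    explicit FinalizeWaiter(HANDLE done) : ref_(1), done_(done) {}
    STDMETHODIMP QueryInterface(REFIID riid, void **ppv)
    {
        if (riid == __uuidof(IMFAsyncCallback) || riid == __uuidof(IUnknown))
        {
            *ppv = this;
            AddRef();
            return S_OK;
        }
        *ppv = nullptr;
        return E_NOINTERFACE;
    }
    STDMETHODIMP_(ULONG) AddRef() { return InterlockedIncrement(&ref_); }
    STDMETHODIMP_(ULONG) Release()
    {
        ULONG r = InterlockedDecrement(&ref_);
        if (r == 0)
            delete this;
        return r;
    }
    STDMETHODIMP GetParameters(DWORD*, DWORD*) { return E_NOTIMPL; }
    STDMETHODIMP Invoke(IMFAsyncResult *result)
    {
        result_ = result; // keep the result for EndFinalize
        SetEvent(done_);
        return S_OK;
    }
    CComPtr<IMFAsyncResult> result_;
private:
    volatile LONG ref_;
    HANDLE done_;
};

// Usage, after the session has stopped delivering samples:
HANDLE done = CreateEvent(nullptr, FALSE, FALSE, nullptr);
CComPtr<IMFFinalizableMediaSink> finalizable;
if (SUCCEEDED(pFileSink->QueryInterface(IID_PPV_ARGS(&finalizable))))
{
    FinalizeWaiter *waiter = new FinalizeWaiter(done);
    if (SUCCEEDED(finalizable->BeginFinalize(waiter, nullptr)))
    {
        WaitForSingleObject(done, INFINITE);
        finalizable->EndFinalize(waiter->result_);
    }
    waiter->Release();
}
CloseHandle(done);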
Regards.
MP4 metadata might under some conditions be initialized incorrectly (e.g. like this), but in the scenario you described the problem is likely to be in the payload data rather than in the way you set up the pipeline in the first place.
Decoders and converters typically pass sample time stamps through, copying them from input to output, so they do not indicate a failure if something is wrong; you still get output that makes sense written into the file. The sink, however, might have trouble processing your data if you have sample time issues, very long recordings, or an overflow bug, especially with rates expressed with large numerators/denominators. What matters is which sample times the sources produce.
You might want to try shorter recordings, and also video-only and audio-only recordings; that might help identify the stream whose data leads to the problem.
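A hedged diagnostic sketch along those lines, assuming mediaSource is one of your IMFMediaSource instances (requires mfreadwrite.h and mfreadwrite.lib): read a few samples directly with an IMFSourceReader and print their time stamps, to see whether a stream starts near zero or at a large system-time-like value:

#include <mfreadwrite.h>
#include <atlbase.h>
#include <cstdio>

CComPtr<IMFSourceReader> reader;
HRESULT hr = MFCreateSourceReaderFromMediaSource(mediaSource, nullptr, &reader);
for (int i = 0; SUCCEEDED(hr) && i < 10; ++i)
{
    DWORD stream = 0, flags = 0;
    LONGLONG time = 0; // 100-nanosecond units
    CComPtr<IMFSample> sample;
    hr = reader->ReadSample(MF_SOURCE_READER_ANY_STREAM, 0,
                            &stream, &flags, &time, &sample);
    if (FAILED(hr) || (flags & MF_SOURCE_READERF_ENDOFSTREAM))
        break;
    wprintf(L"stream %u: time stamp %lld\n", stream, time);
}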
Additionally, you might want to inspect the atoms/boxes of the resulting MP4 file to identify whether the header boxes carry incorrect data or the data itself is stamped incorrectly, on which track, and how exactly (especially whether it starts okay and then shows weird gaps in the middle).
I have built an MP3 decoder DirectShow filter for Windows CE, and I want to measure the performance of the decoder. I found two macros on the MSDN site, https://msdn.microsoft.com/en-IN/library/ms932254.aspx, which are declared in the measure.h header file in the base classes.
The measure.h file explains that these macros expand to nothing unless the PERF macro is defined, but once I enable the macro, I get a link error:
"LNK2019: unresolved external symbol Msr_Start() referneced in
function function "public: virtual long_cdecl
CMP3Decoder::Recieve(Struct IMediaSample
*)"(?Recieve#CMP3Decoder##UAAJPAUIMediaSample###Z)
I tried to dump the symbols in strmbase.lib, but I couldn't find any symbol named Msr_Start in it. I also searched the whole base classes source folder.
Where can I find the definitions of these functions?
Or are there any other ways to measure the performance of the filter?
The CMP3Decoder::Receive() function is as follows:
HRESULT CMP3Decoder::Receive(IMediaSample *pSample)
{
    HRESULT hr;
    ASSERT(pSample);
    if (pSample == NULL || m_MP3DecHandle == NULL)
    {
        return E_FAIL;
    }
    ASSERT(m_pOutput != NULL);

    // Start timing the transform (if PERF is defined)
    MSR_START(m_idTransform);

    // have the derived class transform the data
    hr = MP3StartDecode(pSample); //, pOutSample);

    // Stop the clock and log it (if PERF is defined)
    MSR_STOP(m_idTransform);

    if (FAILED(hr)) {
        //DbgLog((LOG_TRACE,1,TEXT("Error from transform")));
    } else {
        // the Transform() function can return S_FALSE to indicate that the
        // sample should not be delivered; we only deliver the sample if it's
        // really S_OK (same as NOERROR, of course.)
        if (hr == NOERROR) {
            //hr = m_pOutput->Deliver(pOutSample);
            m_bSampleSkipped = FALSE; // last thing no longer dropped
        } else {
            // S_FALSE returned from Transform is a PRIVATE agreement
            // We should return NOERROR from Receive() in this case because returning S_FALSE
            // from Receive() means that this is the end of the stream and no more data should
            // be sent.
            if (S_FALSE == hr) {
                // Release the sample before calling notify to avoid
                // deadlocks if the sample holds a lock on the system
                // such as DirectDraw buffers do
                //pOutSample->Release();
                m_bSampleSkipped = TRUE;
                if (!m_bQualityChanged) {
                    NotifyEvent(EC_QUALITY_CHANGE, 0, 0);
                    m_bQualityChanged = TRUE;
                }
                return NOERROR;
            }
        }
    }

    // release the output buffer. If the connected pin still needs it,
    // it will have addrefed it itself.
    //pOutSample->Release();
    return hr;
}
According to MSDN, you need to link to Strmiids.lib.
Or is there any other ways to measure the performance of the filter?
To measure the performance of a filter, I typically insert a custom trans-in-place filter before and after the filter to be measured. The trans-in-place filters write sample times and the current time, in high resolution, to a log file. You can then calculate the filter's processing time by subtracting the "before" current times from the "after" ones and averaging, etc. The file I/O should only be done after stopping the graph, so you don't interfere with the measurement itself.
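A minimal sketch of such a measurement filter, assuming the DirectShow base classes; the class name and logging details are illustrative:

#include <streams.h>
#include <vector>

class CTimeLogFilter : public CTransInPlaceFilter
{
public:
    CTimeLogFilter(LPUNKNOWN pUnk, HRESULT *phr, REFCLSID clsid)
        : CTransInPlaceFilter(NAME("Time log"), pUnk, clsid, phr, false) {}

    // Accept any input type; we only observe the samples.
    HRESULT CheckInputType(const CMediaType *) override { return S_OK; }

    HRESULT Transform(IMediaSample *pSample) override
    {
        REFERENCE_TIME rtStart = 0, rtStop = 0;
        pSample->GetTime(&rtStart, &rtStop);

        LARGE_INTEGER now;
        QueryPerformanceCounter(&now); // high-resolution host clock

        // Buffer measurements in memory; dump them to a file only after
        // the graph is stopped so file I/O does not skew the numbers.
        log_.push_back({ rtStart, now.QuadPart });
        return S_OK; // pass the sample through unmodified
    }

private:
    struct Entry { REFERENCE_TIME sampleTime; LONGLONG hostTime; };
    std::vector<Entry> log_;
};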
Update:
Dumping the symbols in Strmiids.lib seems to confirm that Msr_xxx functions are not defined in Strmiids.lib. Looks like the MSDN article is incorrect.
I have a DirectShow program that uses the EVR. I would like to add another video stream that inserts a picture-in-picture box over the main video stream, but I can't quite figure out how to do it:
// When this is called, the graph is already running with the EVR
// displaying a web cam in stream 0
HRESULT CVideoControl::AddVideoStream(wchar_t *file)
{
    HRESULT hr;
    CComPtr<IMFMediaSink> sink;
    CComPtr<IMFStreamSink> stream;
    //hr = pEVR->QueryInterface(__uuidof(IMFMediaSink), (void **) &sink); <- FAILS
    hr = MFCreateVideoRenderer(__uuidof(IMFMediaSink), (void **) &sink);
    hr = sink->AddStreamSink(1234, NULL, &stream);
    CComPtr<IMFGetService> service;
    hr = pEVR->QueryInterface(&service);
    CComPtr<IMFVideoMixerControl> mixer;
    hr = service->GetService(MR_VIDEO_MIXER_SERVICE, IID_PPV_ARGS(&mixer));
    MFVideoNormalizedRect rect = { .25, .25, .5, .5 };
    hr = mixer->SetStreamOutputRect(1234, &rect);
    hr = m_pGraph->RenderFile(file, NULL);
    return hr;
}
Everything returns S_OK except the SetStreamOutputRect, which returns "The stream number provided was invalid."
I'm also dubious about the MFCreateVideoRenderer call, as this is a DirectShow program, not Media Foundation.
I'm pretty sure I am way oversimplifying this, but can't find much documentation on this. Any suggestions?
https://msdn.microsoft.com/en-us/library/windows/desktop/aa965247(v=vs.85).aspx
In a DirectShow program you need to create the EVR with CoCreateInstance and then use its IEVRFilterConfig interface, as explained in the link above:
The EVR filter starts with one input pin, which corresponds to the reference stream. To add pins for substreams, query the filter for the IEVRFilterConfig interface and call IEVRFilterConfig::SetNumberOfStreams. Call this method before connecting any input pins. Pin 0 is always the reference stream. Connect this pin before any other pins, because the format of the reference stream might limit which substream formats are available.
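A hedged sketch of that order of operations (error handling elided; this runs before the graph connects anything):

#include <dshow.h>
#include <evr.h>
#include <evr9.h>
#include <mfidl.h>
#include <atlbase.h>

CComPtr<IBaseFilter> pEVR;
HRESULT hr = pEVR.CoCreateInstance(CLSID_EnhancedVideoRenderer);

// Request a second input pin *before* connecting any pins.
CComPtr<IEVRFilterConfig> config;
hr = pEVR->QueryInterface(IID_PPV_ARGS(&config));
hr = config->SetNumberOfStreams(2); // pin 0 = reference, pin 1 = substream

// ... add pEVR to the graph, connect pin 0 (web cam) first, then pin 1 ...

// Position the substream as a picture-in-picture box. For the DirectShow
// EVR the mixer stream ids correspond to the input pins, so the substream
// is stream 1 rather than an arbitrary id like 1234.
CComPtr<IMFGetService> service;
hr = pEVR->QueryInterface(IID_PPV_ARGS(&service));
CComPtr<IMFVideoMixerControl> mixer;
hr = service->GetService(MR_VIDEO_MIXER_SERVICE, IID_PPV_ARGS(&mixer));
MFVideoNormalizedRect rect = { 0.25f, 0.25f, 0.5f, 0.5f };
hr = mixer->SetStreamOutputRect(1, &rect);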
I currently have this code, iterating over the audio session controls of the default device (not shown):
int sessionCount;
hr = audioSessionEnumerator->GetCount(&sessionCount);
if (FAILED(hr)) {
    throw HRESULTException("audioSessionEnumerator->GetCount", hr);
}

IAudioSessionControl *audioSessionControl;
for (int i = 0; i < sessionCount; ++i) {
    hr = audioSessionEnumerator->GetSession(i, &audioSessionControl);
    if (FAILED(hr)) {
        throw HRESULTException("audioSessionEnumerator->GetSession", hr);
    }

    LPWSTR displayName;
    hr = audioSessionControl->GetDisplayName(&displayName);
    if (FAILED(hr)) {
        throw HRESULTException("audioSessionControl->GetDisplayName", hr);
    }

    std::wcout << displayName << std::endl;
    CoTaskMemFree(displayName);
    audioSessionControl->Release();
}
audioSessionEnumerator->Release();
My mixer currently looks like this:
The expected output is:
Steam Client Bootstrapper
melodysheep - The Face of Creation
System Sounds
However, the output seen is:
(blank line)
(blank line)
#%SystemRoot%\System32\AudioSrv.Dll,-202
Which is the same as the output when GetDisplayName is replaced with GetIconPath.
What problem occurs in the code above to cause this issue? If more code must be shown, please inform me.
If you read the remarks for both GetDisplayName and GetIconPath on MSDN, you'll see that the functions can return NULL if no one has set them. The GetIconPath page then remarks that the sndvol application (which you've got a screenshot of) will actually look up the icon of the main window if the value is NULL, and therefore, by induction, it will look up the main window title for the display name if it doesn't exist.
You probably want to query for the IAudioSessionControl2 interface, which has a GetProcessId method that might return the client process id. At that point you can use things like this and this to try to extract the values from the main window, to stay consistent.
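A minimal sketch of that fallback, assuming the audioSessionControl pointer from the loop above (IAudioSessionControl2 comes from audiopolicy.h; error handling is trimmed):

// Query IAudioSessionControl2 for the owning process id; note GetProcessId
// can also return AUDCLNT_S_NO_SINGLE_PROCESS for cross-process sessions.
IAudioSessionControl2 *audioSessionControl2 = nullptr;
hr = audioSessionControl->QueryInterface(
    __uuidof(IAudioSessionControl2),
    reinterpret_cast<void**>(&audioSessionControl2));
if (SUCCEEDED(hr)) {
    DWORD processId = 0;
    if (SUCCEEDED(audioSessionControl2->GetProcessId(&processId))) {
        // Resolve the pid to a window title or executable name with the
        // usual Win32 APIs (EnumWindows + GetWindowThreadProcessId, or
        // OpenProcess + QueryFullProcessImageName) as a display-name fallback.
        std::wcout << L"session owned by PID " << processId << std::endl;
    }
    audioSessionControl2->Release();
}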