Video Recording Hangs on IMFSinkWriter->Finalize(); - c++

I have an issue when finalizing a video recording into an .mp4 using Media Foundation where the call to IMFSinkWriter->Finalize(); hangs forever. It doesn't always happen, and can happen on almost any machine (seen on Windows server, 7, 8, 10). Flush() is called on the audio and video streams before hand and no new samples are added between Flush and Finalize. Any ideas on what could cause Finalize to hang forever?
Things I've tried:
Logging all HRESULTs to check for any issues (was already checking them before proceeding to the next line of code)
Everything comes back as S_OK, not seeing any issues
Added the IMFSinkWriterCallback on the stream to get callbacks when
the stream process markers (adding markers every 10 samples) and finishes Finalize()
Haven't been able to reproduce since adding this but this would give the best information about what's going on when I get it working.
Searched code samples online to see how others are setting up the
Sink Writer and how Finalize() is used
Didn't find many samples and it looks like my code is similar to the ones that were found
Looked at encoders available and used by each system including version of the encoder dll
Encoders varied between AMD H.264 Hardware MFT Encoder and H264 Encoder MFT on machines that could reproduce the issue. Versions didn't seem to matter and some of the machines were up to date with video drivers.
Here are some code samples without any HRESULT checking (that doubled the amount of code so I took it out)
Building the sink sample:
CComPtr<IMFAttributes> pAttr;
::MFCreateAttributes( &pAttr, 4 );
pAttr->SetGUID( MF_TRANSCODE_CONTAINERTYPE, GetFileContainerType() );
pAttr->SetUINT32( MF_LOW_LATENCY, FALSE ); // Allows better multithreading
pAttr->SetUINT32( MF_SINK_WRITER_DISABLE_THROTTLING, TRUE ); // Does not block
pAttr->SetUINT32( MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS, TRUE );
m_pCallback.Attach( new MFSinkWriterCallback() );
pAttr->SetUnknown( MF_SINK_WRITER_ASYNC_CALLBACK, m_pCallback );
::MFCreateSinkWriterFromURL( m_strFilename.c_str(), NULL, pAttr, &m_pSink );
if ( m_pVideoInputType && m_pVideoOutputType )
{
m_pSink->AddStream( m_pVideoOutputType, &m_dwVideoStreamId );
// Attributes for encoding?
CComPtr<IMFAttributes> pAttrVideo;
// Not sure if these are needed
//::MFCreateAttributes( &pAttrVideo, 5 );
m_pSink->SetInputMediaType( m_dwVideoStreamId, m_pVideoInputType, pAttrVideo );
}
if ( m_pAudioInputType && m_pAudioOutputType )
{
m_pSink->AddStream( m_pAudioOutputType, &m_dwAudioStreamId );
// Attributes for encoding?
CComPtr<IMFAttributes> pAttrAudio;
// Not sure if these are needed
//::MFCreateAttributes( &pAttrAudio, 2 );
//pAttrAudio->SetGUID( MF_MT_SUBTYPE, MFAudioFormat_AAC );
//pAttrAudio->SetUINT32( MF_MT_AUDIO_BITS_PER_SAMPLE, 16 );
m_pSink->SetInputMediaType( m_dwAudioStreamId, m_pAudioInputType, pAttrAudio );
}
m_pSink->BeginWriting();
Stopping the recording sample:
if ( m_dwVideoStreamId != (DWORD)-1 )
{
m_sink->Flush( m_dwVideoStreamId );
}
if ( m_dwAudioStreamId != (DWORD)-1 )
{
m_sink->Flush( m_dwAudioStreamId );
}
m_sink->Finalize();

There is a lot of situation where a Media Foundation application can hang :
Calling MFShutDown/CoUninitialize when using Media Foundation objects.
Using a GUI, and doing bad use of windows message pump in multithreaded application.
Bad use of MTA/STA components.
Bad use of critical section/wait for event function.
Forget to call EndXXX() function when BeginXXX() function are used.
Bad use of Callback function.
Forget to call AddRef when necessary, and releasing the object used by another thread.
A bug in Media Foundation (there are some on Windows Seven).
and so on...
When i say a minimal source code, i mean, isolate the source code that do the encoding process, and provide it to Github if it's too large. It's better if we can compile and try the source code, because deadlock are difficult to find.

Related

How could just loading a dll lead to 100 CPU load in my main application?

I have a perfectly working program which connects to a video camera (an IDS uEye camera) and continuously grabs frames from it and displays them.
However, when loading a specific dll before connecting to the camera, the program runs with 100% CPU load. If I load the dll after connecting to the camera, the program runs fine.
int main()
{
INT nRet = IS_NO_SUCCESS;
// init camera (open next available camera)
m_hCam = (HIDS)0;
// (A) Uncomment this for 100% CPU load:
// HMODULE handle = LoadLibrary(L"myInnocentDll.dll");
// This is the call to the 3rdparty camera vendor's library:
nRet = is_InitCamera(&m_hCam, 0);
// (B) Uncomment this instead of (A) and the CPU load won't change
// HMODULE handle = LoadLibrary(L"myInnocentDll.dll");
if (nRet == IS_SUCCESS)
{
/*
* Please note: I have removed all lines which are not necessary for the exploit.
* Therefore this is NOT a full example of how to properly initialize an IDS camera!
*/
is_GetSensorInfo(m_hCam, &m_sInfo);
GetMaxImageSize(m_hCam, &m_s32ImageWidth, &m_s32ImageHeight);
m_nColorMode = IS_CM_BGR8_PACKED;// IS_CM_BGRA8_PACKED;
m_nBitsPerPixel = 24; // 32;
nRet |= is_SetColorMode(m_hCam, m_nColorMode);
// allocate image memory.
if (is_AllocImageMem(m_hCam, m_s32ImageWidth, m_s32ImageHeight, m_nBitsPerPixel, &m_pcImageMemory, &m_lMemoryId) != IS_SUCCESS)
{
return 1;
}
else
{
is_SetImageMem(m_hCam, m_pcImageMemory, m_lMemoryId);
}
}
else
{
return 1;
}
std::thread([&]() {
while (true) {
is_FreezeVideo(m_hCam, IS_WAIT);
/*
* Usually, the image memory would now be grabbed via is_GetImageMem().
* but as it is not needed for the exploit, I removed it as well
*/
}
}).detach();
cv::waitKey(0);
}
Independently of the actually used camera driver, in what way could loading a dll change the performance of it, occupying 100% of all available CPU cores? When using the Visual Studio Diagnostic Tools, the excess CPU time is attributed to "[External Call] SwitchToThread" and not to the myInnocentDll.
Loading just the dll without the camera initialization does not result in 100% CPU load.
I was first thinking of some static initializers in the myInnocentDll.dll configuring some threading behavior, but I did not find anything pointing in this direction. For which aspects should I look for in the code of myInnocentDll.dll?
After a lot of digging I found the answer and it is both frustratingly simple and frustrating by itself:
It is Microsoft's poor support of OpenMP. When I disabled OpenMP in my project, the camera driver runs just fine.
The reason seems to be that the Microsoft compiler uses OpenMP with busy waiting and there is also the possibility to manually configure OMP_WAIT_POLICY, but as I was not depending on OpenMP anyways, disabling was the easiest solution for me.
https://developercommunity.visualstudio.com/content/problem/589564/how-to-control-omp-wait-policy-for-openmp.html
https://support.microsoft.com/en-us/help/2689322/redistributable-package-fix-high-cpu-usage-when-you-run-a-visual-c-201
I still don't understand why the CPU only went up high when using the camera and not when running the rest of my solution, even though the camera library is pre-built and my disabling/enabling of OpenMP compilation cannot have any effect on it. And I also don't understand why they bothered to make a hotfix for VS2010 but have no real fix as of VS2019, which I am using. But the problem is averted.
You can disable CPU idle state in the IDS camera manager and then the minimum CPU load in the windows energy plans is set to 100%
I think this is worth mentioning here, even you solved your problem already.

Media foundation multiple videos playback results in memory-leak & crash after undefined timeframe

So we are using a stack consisting of c++ media foundation code in order to playback video files. An important requirement is the ability to play these videos in constantly repeating sequences, so every single video slot will periodically change the video it is playing. In our current example we are creating 16 HWNDs to render video into and 16 corresponding player objects. The main application loops over all of them one after another and does the following:
Shutdown the last player
Release the object
CoCreateinstance for a new player
Initialize the player with the (old) HWND
Start Playback
The media player is called "MediaPlayer2", this needs to be built and registered as COM (regsvr32). The main application is to be found in the TAPlayer2 Project. It searches for the CLSID of the player in the registry and instantiates it. As current test file we use a test.mp4 that has to reside on the disk like C:\test.mp4
Now everything goes fine initially. The program loops through the players and the video keeps restarting and playing. The memory footprint is normal and all goes smooth. After a timeframe of anything between 20 minutes and 4 days, all of the sudden things will get weird. At this point it seems as if calls to "InitializeRenderer" by the EVR slow down and eventually don't go through anymore at all. With this, also thread count and memory footprint will start to increase drastically and after a certain amount of time depending on existing RAM all the memory will be exhausted and our application crashes, usually somewhere in the GPU driver or near the EVR DLL.
I am happy to try out any other code examples that propose to solve my issue: displaying multiple video windows at the same time, and looping through them like in a playlist. Needs to be running on Windows 10!
I have been going at this for quite a while now and am pretty hard stuck. I uploaded the above mentioned code example and added the link to this post. This should work out of the box afaik. I can also provide code excerpts in here in the thread if that is preferred.
Any help is appreciated, thanks
Thomas
Link to demo project (VS2015): https://filebin.net/l8gl79jrz6fd02vt
edit: the following code from the end of winmain.cpp is used to restart the players:
do
{
for (int i = 0; i < PLAYER_COUNT; i++)
{
hr = g_pPlayer[i]->Shutdown();
SafeRelease(&g_pPlayer[i]);
hr = CoCreateInstance(CLSID_AvasysPlayer, // CLSID of the coclass
NULL, // no aggregation
CLSCTX_INPROC_SERVER, // the server is in-proc
__uuidof(IAvasysPlayer), // IID of the interface we want
(void**)&g_pPlayer[i]); // address of our interface pointer
hr = g_pPlayer[i]->InitPlayer(hwndPlayers[i]);
hr = g_pPlayer[i]->OpenUrl(L"C:\\test.mp4");
}
} while (true);
Some MediaFoundation interface like
IMFMediaSource
IMFMediaSession
IMFMediaSink
need to be Shutdown before Release them.
At this point it seems as if calls to "InitializeRenderer" by the EVR slow down and eventually don't go through anymore at all.
... usually somewhere in the GPU driver or near the EVR DLL.
a good track to make a precise search in your code.
In your file PlayerTopoBuilder.cpp, at CPlayerTopoBuilder::AddBranchToPartialTopology :
if (bVideo)
{
if (false) {
BREAK_ON_FAIL(hr = CreateMediaSinkActivate(pSD, hVideoWnd, &pSinkActivate));
BREAK_ON_FAIL(hr = AddOutputNode(pTopology, pSinkActivate, 0, &pOutputNode));
}
else {
//// try directly create renderer
BREAK_ON_FAIL(hr = MFCreateVideoRenderer(__uuidof(IMFMediaSink), (void**)&pMediaSink));
CComQIPtr<IMFVideoRenderer> pRenderer = pMediaSink;
BREAK_ON_FAIL(hr = pRenderer->InitializeRenderer(nullptr, nullptr));
CComQIPtr<IMFGetService> getService(pRenderer);
BREAK_ON_FAIL(hr = getService->GetService(MR_VIDEO_RENDER_SERVICE, __uuidof(IMFVideoDisplayControl), (void**)&pVideoDisplayControl));
BREAK_ON_FAIL(hr = pVideoDisplayControl->SetVideoWindow(hVideoWnd));
BREAK_ON_FAIL(hr = pMediaSink->GetStreamSinkByIndex(0, &pStreamSink));
BREAK_ON_FAIL(hr = AddOutputNode(pTopology, 0, &pOutputNode, pStreamSink));
}
}
You create a IMFMediaSink with MFCreateVideoRenderer and pMediaSink. pMediaSink is release because of the use of CComPtr, but never Shutdown.
You must keep a reference on the Media Sink and Shutdown/Release it when the Player Shutdown.
Or you can use a different approach with MFCreateVideoRendererActivate.
IMFMediaSink::Shutdown
If the application creates the media sink, it is responsible for calling Shutdown to avoid memory or resource leaks.
In most applications, however, the application creates an activation object for the media sink, and the Media Session uses that object to create the media sink.
In that case, the Media Session — not the application — shuts down the media sink. (For more information, see Activation Objects.)
I also suggest you to use this kind of code at the end of CPlayer::CloseSession (after release all others objects) :
if(m_pSession != NULL){
hr = m_pSession->Shutdown();
ULONG ulMFObjects = m_pSession->Release();
m_pSession = NULL;
assert(ulMFObjects == 0);
}
For the use of MFCreateVideoRendererActivate, you can look at my MFNodePlayer project :
MFNodePlayer
EDIT
I rewrote your program, but i tried to keep your logic and original source code, like CComPtr/Mutex...
MFMultiVideo
Tell me if this program has memory leaks.
It will depend on your answer, but then we can talk about best practices with MediaFoundation.
Another thought :
Your program uses 1 to 16 IMFMediaSession. On a good computer configuration, you could use only one IMFMediasession, i think (Never try to aggregate 16 MFSource).
Visit :
CustomVideoMixer
to understand the other way to do it.
I think your approach to use 16 IMFMediasession is not the best approach on a modern computer. VuVirt talk about this.
EDIT2
I've updated MFMultiVideo using Work Queues.
I think the problem can be that you call MFStartup/MFShutdown for each players.
Just call MFStartup/MFShutdown once in winmain.cpp for example, like my program does.

OpenAL unqueueing error code, incomplete documentation

I am trying to implement streaming audio and I've run into a problem where OpenAL is giving me an error codes seems impossible given the information in the documentation.
int buffersProcessed = 0;
alGetSourcei(m_Source, AL_BUFFERS_PROCESSED, &buffersProcessed);
PrintALError();
int toAddBufferIndex;
// Remove the first buffer from the queue and move it to
//the end after buffering new data.
if (buffersProcessed > 0)
{
ALuint unqueued;
alSourceUnqueueBuffers(m_Source, 1, &unqueued);
/////////////////////////////////
PrintALError(); // Prints AL_INVALID_OPERATION //
/////////////////////////////////
toAddBufferIndex = firstBufferIndex;
}
According to the documentation [PDF], AL_INVALID_OPERATION means: "There is no current context." This seems like it can't be true because OpenAL has been, and continues to play other audio just fine!
Just to be sure, I called ALCcontext* temp = alcGetCurrentContext( ); here and it returned a valid context.
Is there some other error condition that's possible here that's not mentioned in the docs?
More details: The sound source is playing when this code is being called, but the impression I got from reading the spec is you can safely unqueue processed buffers while the source is playing. PrintALError is just a wrapper for alGetError that prints if there is any error.
I am on a Mac (OS 10.8.3), in case it matters.
So far what I've gathered is that it seems this OpenAL implementation incorrectly throws an error if you unqueue a buffer while the source is playing. The spec says that you should be able to unqueue a buffer that has been marked as processing while the source is playing:
Removal of a given queue entry is not possible unless either the source is stopped (in which case then entire queue is considered processed), or if the queue entry has already been processed (AL_PLAYING or AL_PAUSED source).
On that basis I'm gonna say this is probably a bug in my OpenAL implementation. I'm gonna leave the question open in case someone can give a more concrete answer though.
To handle condition for multiple buffers use a loop.
Following works on iOS and linux :
// UN queue used buffers
ALint buffers_processed = 0;
alGetSourcei(streaming_source, AL_BUFFERS_PROCESSED, & buffers_processed); // get source parameter num used buffs
while (buffers_processed > 0) { // we have a consumed buffer so we need to replenish
ALuint unqueued_buffer;
alSourceUnqueueBuffers(streaming_source, 1, & unqueued_buffer);
available_AL_buffer_array_curr_index--;
available_AL_buffer_array[available_AL_buffer_array_curr_index] = unqueued_buffer;
buffers_processed--;
}

Best DirectShow way to capture image from web cam preview ? SampleGrabber is deprecated

I have developed DirectShow C++ app which successfully previews web cam view into provided window. Now I want to capture image from this live web cam preview. I have used graph manager, ICaptureGraphBuilder2, IMoniker etc. for that.
I have searched and found following options:
WIA & Sample Grabber.
Many recommends using SampleGrabber but as per MS's msdn document SampleGrabber is deprecated and one should not use. And I don't want to use WIA API.
So which is the best DirectShow way to capture image from live web cam preview?
Here is a quote from DxSnap sample from DirectShow.NET library:
Use DirectShow to take snapshots from the Still pin of a capture
device. Note the MS encourages you to use WIA for this, but if you
want to do in with DirectShow and C#, here's how.
Note that this sample will only work with devices that output
uncompressed video as RBG24. This will include most webcams, but
probably zero tv tuners.
This is C# code, but you should get the idea as the interfaces are all the same. And there are other samples on how to use Sample Grabber Filter in C++.
Sample Grabber is deprecated, the headers are removed from a couple of latest SDKs, however the runtime components are all there and are going to be there for a long time, or otherwise a multitude of application would be broken (e.g. Video Chat in browser hosted GMail is using Sample Grabber). So basically Sample Grabber is still an easy way to capture snapshots from a web camera, or if you alternatively prefer to follow the latest MS APIs - you would want to look into Media Foundation (09 Jul 2016 update: new Windows Server installations might need one to add "Media Foundation" and/or "Desktop Experience" features to make Media Foundation API available along with DirectShow, and DirectShow Editing Services, Sample Grabber is a part of which. Default installation does not offer qedit.dll out of the box).
Also in C++ you certainly don't have to use Sample Grabber Filter. You can develop a custom filter using DirectShow BaseClasses to be a custom transformation filter or a custom renderer, which which accept incoming video feed and export the frames from the DirectShow pipeline. Another option is to use Sample Grabber sample source code from one of the older SDKs (which is not exact source for OS Sample Grabber, but it is doing the same thing). The point however that Sample Grabber shipped with Windows is still a good option.
Listed on Microsoft's website is an example of how to capture a frame using IVMRWindowlessControl9::GetCurrentImage ... Here's one way of doing it:
IBaseFilter* vmr9ptr; // I'm assuming that you got this pointer already
IVMRWindowlessControl9* controlPtr = NULL;
vmr9ptr->QueryInterface(IID_IVMRWindowlessControl9, (void**)controlPtr);
assert ( controlPtr != NULL );
// Get the current frame
BYTE* lpDib = NULL;
hr = controlPtr->GetCurrentImage(&lpDib);
// If everything is okay, we can create a BMP
if (SUCCEEDED(hr))
{
BITMAPINFOHEADER* pBMIH = (BITMAPINFOHEADER*) lpDib;
DWORD bufSize = pBMIH->biSizeImage;
// Let's create a bmp
BITMAPFILEHEADER bmpHdr;
BITMAPINFOHEADER bmpInfo;
size_t hdrSize = sizeof(bmpHdr);
size_t infSize = sizeof(bmpInfo);
memset(&bmpHdr, 0, hdrSize);
bmpHdr.bfType = ('M' << 8) | 'B';
bmpHdr.bfOffBits = static_cast<DWORD>(hdrSize + infSize);
bmpHdr.bfSize = bmpHdr.bfOffBits + bufSize;
// Builder the bit map info.
memset(&bmpInfo, 0, infSize);
bmpInfo.biSize = static_cast<DWORD>(infSize);
bmpInfo.biWidth = pBMIH->biWidth;
bmpInfo.biHeight = pBMIH->biHeight;
bmpInfo.biPlanes = pBMIH->biPlanes;
bmpInfo.biBitCount = pBMIH->biBitCount;
// boost::shared_arrays are awesome!
boost::shared_array<BYTE> buf(new BYTE[bmpHdr.bfSize]);//(lpDib);
memcpy(buf.get(), &bmpHdr, hdrSize); // copy the header
memcpy(buf.get() + hdrSize, &bmpInfo, infSize); // now copy the info block
memcpy(buf.get() + bmpHdr.bfOffBits, lpDib, bufSize);
// Do something with your image data ... seriously...
CoTaskMemFree(lpDib);
} // All done!
jeez... so much dis-information. if you're previewing in a directshow graph, then it depends on what you're previewing off of. Capture filters have 1, 2, or 3 pins. If it has 1 pin, it's most likely a "capture" pin (no preview pin). For this, if you want to capture and preview at the same time, you should put in a "Smart Tee" filter, and connect the VMR off of the preview pin, and hook up "something that grabs frames" off the capture pin. since you don't want to fool around with DirectShow's crummy pin start/stop stuff (instead, just simply controlling the entire graph's start/stop state). You don't need to use a SampleGrabber, it's a dead-simple filter and you could write it in a few hours (I should know, I'm the one that wrote it). it's simply a CTransInPlace filter that you can set a forced media type for it to accept, and you can set a callback interface on it to call you back when it receives a sample. It's actually simpler to write a NullRenderer which calls you back when it receives a sample, you could write this quite easily.
If the capture filter has 2 pins, it's most likely a capture pin and a still pin. in this case you still need a Smart Tee connected to the source's capture pin, and need to preview off the smart tee's preview pin, and capture samples off the smart tee's capture pin.
(If you don't know what a SmartTee is, it's a filter that plays allocator tricks and only sends a sample down the preview pin if the capture pin isn't super bogged down. It's job is to provide a path for the VMR to render from which won't botch up the allocators between the capture filter and the filters downstream of the capture filter)
If the capture filter has both capture and preview pins, I think you can figure out what you need to do then.
Anyhow, summary: The SampleGrabber is simply a CTransInPlaceFilter. You could write it as a Null Renderer, too, just make sure to fill out some junk in CheckInputType, and to call your callback back in DoRenderSample.

Finding out when the samplegrabber is ready in DirectShow

I am continuing work on my DirectShow application and am just putting the finishing touches on it. What the program is doing is going through a video file in 1 second intervals and capturing from the samplegrabber the current buffer and processing it before moving on. However, I was noticing that I was getting duplicate images in my tests to which I found out that DirectShow has not incremented through the video in that 1 second interval fast enough. My question is if there is a way to check when DirectShow is ready for me to call the samplegrabber to get the current frame and to process it. At the moment I call sleep for 1 second but there has to be a better method. Thank you in advance for the help.
EDIT
I just tried running a check to see if the video's position is equal to the next position that I would like to grab and process. That decreased the number of duplicate frames but I still see them showing up in chunks.
I always let the DS framework handle the processing rate:
in the main application thread, configure the sample grabber callback and then when the callback is triggered, you get the media sample as well as sample time: at this point you can process the sample if the appropriate interval i.e. 1 second has elapsed.
What do you mean you call sleep for a second and from where (which thread) do you call it?
If you're doing this from inside the callback you are effectively blocking the DirectShow pipeline? Perhaps if you could explain your setup in more detail I could be more helpful.
/// Callback that is triggered per media sample
/// Note this all happens in the DirectShow streaming thread!
STDMETHODIMP SampleCB( double dSampleTime, IMediaSample * pSample )
{
// check timestamp and if one second has elapsed process sample accordingly
// else do nothing
...
// Get byte pointer
BYTE* pbData(NULL);
HRESULT hr = pSample->GetPointer(&pbData);
if (FAILED(hr))
{
return hr;
}
...
}
P.S if you want to process the samples as fast as possible, you can set the sample timestamp to NULL in your callback.
// set time to NULL to allow for fast rendering since the
// the video renderer controls the rendering rate according
// to the timestamps
pSample->SetTime(NULL, NULL);
Try setting your graph timer to NULL. It will allow to:
process the file as fast as possible
will relieve you of the issues you have.
Of course, it won't work if you are rendering the file to screen at the same time.