I'm building a pipeline in GStreamer, using a UriDecodeBin. This pipeline is meant to be built and released repeatedly, but I'm having a hard time figuring out how to properly delete it.
For now, I'm trying to figure out what to unref and what not to unref so that all resources of a pipeline containing just a UriDecodeBin are released, but I can't get all of the memory to be released.
My sequence of actions is as follows:
First, I build the pipeline and the UriDecodeBin:
_pipeline = (GstPipeline*)gst_pipeline_new("pipeline");
_bus = gst_pipeline_get_bus(_pipeline); //will be manually observed
_decoder = gst_element_factory_make("uridecodebin", "decoder");
gst_bin_add(GST_BIN(_pipeline), _decoder);
std::string uri = std::string("file://some_video_file");
g_object_set(_decoder, "uri", uri.c_str(), nullptr);
g_signal_connect(_decoder, "no-more-pads", G_CALLBACK(GsDecoder::no_more_pads_cb), (gpointer)this);
Then, I wait for no-more-pads to trigger, and add a blocking probe on each pad:
GstIterator *it = gst_element_iterate_src_pads(_decoder);
GValue padV = G_VALUE_INIT;
while(gst_iterator_next(it, &padV) == GST_ITERATOR_OK){
    GstPad* pad = (GstPad*)g_value_get_object(&padV);
    GstCaps* caps = gst_pad_get_current_caps(pad);
    if (caps_is_audio(caps)){
        _decoderAudioSrcs.push_back(pad); //adding pad to a collection
        gulong id = gst_pad_add_probe(pad, GST_PAD_PROBE_TYPE_BLOCK_DOWNSTREAM, GsDecoder::pad_blocked_cb, (gpointer)this, NULL); //blocking probe
        _decoderAudioProbeIds.push_back(id); //remembering id of the probe, for later
    } else if (caps_is_video(caps)){
        //same with video but file is guaranteed to have a single video track.
        _decoderVideoSrc = pad;
        _decoderVideoProbeId = gst_pad_add_probe(pad, GST_PAD_PROBE_TYPE_BLOCK_DOWNSTREAM, GsDecoder::pad_blocked_cb, (gpointer)this, NULL);
    } else {
        std::cout << "UNKNOWN PAD FOUND" << std::endl;
    }
    gst_caps_unref(caps); //releasing caps
}
At this point I wait until every probe has blocked its pad, and then I unblock the pads:
for (unsigned i = 0; i < _decoderAudioSrcs.size(); i++){
    gst_pad_remove_probe(_decoderAudioSrcs[i], _decoderAudioProbeIds[i]);
}
gst_pad_remove_probe(_decoderVideoSrc, _decoderVideoProbeId);
play(); //setting the pipeline to play
Finally, after some time, I tell the pipeline to delete itself, and this is where I'm a bit confused about what should be deleted and what should not (both in this pipeline and in general).
How should the pipeline be deleted? From what I understand, I have to set the pipeline state to NULL and then unref everything (pads, decoder, bus, pipeline) using g_object_unref. However, the pipeline still ends up with a few references pointing to it (3 in my case) and does not release its resources, neither memory nor threads.
Is there a way to tell GStreamer to fully delete a pipeline and everything related to it, without having to unref everything by hand?
In particular, I would be interested in a pointer to some documentation, since I couldn't find any on the subject (even though the GStreamer docs mention unreffing from time to time).
GStreamer objects are reference counted. Whenever the reference count drops to 0, the object is deleted. The GStreamer documentation usually states the transfer behavior of each function. E.g.
This tells you the transfer is full, so you are supposed to release the object when you no longer use it (as you do in your sample code). If you don't, it will never be deleted.
So the typical use case is if-i-don't-need-it-anymore-i-release-it. That does not mean that it gets deleted. For example, if you put an element into a pipeline, the pipeline holds a reference to it. It takes care of that reference and releases it once the pipeline's own reference count drops to 0. But since you added it to the pipeline and your code itself does not need it anymore, you would usually just release your reference. (One exception: an element fresh from gst_element_factory_make carries a floating reference, which gst_bin_add takes over, so after adding it you hold no reference of your own to release unless you took one with gst_object_ref.)
So you need to be careful about references. If you miss one tiny object that is part of a big pipeline, many objects may never be released because they depend on each other.
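To make the ownership rule concrete, here is a minimal sketch in plain C++ that models it with std::shared_ptr. It is not GStreamer code: Element, Pipeline and element_freed_after_teardown are made-up names, and the model deliberately ignores GStreamer specifics such as floating references. It only illustrates that an object is destroyed when its last owner lets go, and that a single forgotten reference is enough to keep it alive.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for a reference-counted element. NOT a GStreamer type.
struct Element {
    explicit Element(bool* destroyed) : destroyed_(destroyed) {}
    ~Element() { *destroyed_ = true; }
    bool* destroyed_;
};

// Stand-in for a pipeline: adding an element makes the pipeline a co-owner.
struct Pipeline {
    void add(std::shared_ptr<Element> e) { children.push_back(std::move(e)); }
    std::vector<std::shared_ptr<Element>> children;
};

// Returns true if the element was destroyed once the pipeline was torn down.
inline bool element_freed_after_teardown(bool leak_one_reference) {
    bool destroyed = false;
    std::shared_ptr<Element> leaked;  // models a reference nobody released
    {
        auto pipeline = std::make_unique<Pipeline>();
        auto decoder = std::make_shared<Element>(&destroyed);
        pipeline->add(decoder);                    // pipeline now co-owns it
        if (leak_one_reference) leaked = decoder;  // e.g. a forgotten unref
        decoder.reset();      // "I don't need it anymore": still alive
        assert(!destroyed);   // the pipeline keeps it alive
        pipeline.reset();     // pipeline drops its references
    }
    return destroyed;
}
```

Calling element_freed_after_teardown(false) reports that the element was destroyed with the pipeline; with true, the forgotten reference keeps it alive for as long as that reference exists.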
I have a function that copies data from one buffer to another, and I need to synchronize its execution.
My first option is a bad one:
void MainWindow::copyBuffer(VkBuffer srcBuffer, VkBuffer dstBuffer, VkDeviceSize size)
{
    VkCommandBuffer commandBuffer;
    vkAllocateCommandBuffers(logicalDevice, &allocInfo, &commandBuffer);
    //Start recording
    vkBeginCommandBuffer(commandBuffer, &beginInfo);
    vkCmdCopyBuffer(commandBuffer, srcBuffer, dstBuffer, 1, &copyRegion);
    //Run command buffer
    vkQueueSubmit(graphicsQueue, 1, &submitInfo, VK_NULL_HANDLE);
    //Waiting for completion
    vkQueueWaitIdle(graphicsQueue);
    vkFreeCommandBuffers(logicalDevice, commandPool, 1, &commandBuffer);
}
This option is bad because if I call copyBuffer() several times, all the buffers are copied strictly one at a time.
I want to use a fence for each function call so that multiple calls can run in parallel.
So far, this is the only solution I have:
void MainWindow::copyBuffer(VkBuffer srcBuffer, VkBuffer dstBuffer, VkDeviceSize size)
{
    VkCommandBuffer commandBuffer;
    vkAllocateCommandBuffers(logicalDevice, &allocInfo, &commandBuffer);
    //Create fence
    VkFenceCreateInfo fenceInfo{};
    VkFence executionCompleteFence = VK_NULL_HANDLE;
    if (vkCreateFence(logicalDevice, &fenceInfo, VK_NULL_HANDLE, &executionCompleteFence) != VK_SUCCESS) {
        throw MakeErrorInfo("Failed to create fence");
    }
    //Start recording
    vkBeginCommandBuffer(commandBuffer, &beginInfo);
    vkCmdCopyBuffer(commandBuffer, srcBuffer, dstBuffer, 1, &copyRegion);
    //Run command buffer
    vkQueueSubmit(graphicsQueue, 1, &submitInfo, VK_NULL_HANDLE);
    vkWaitForFences(logicalDevice, 1, &executionCompleteFence, VK_TRUE, UINT64_MAX);
    vkResetFences(logicalDevice, 1, &executionCompleteFence);
    vkFreeCommandBuffers(logicalDevice, commandPool, 1, &commandBuffer);
    vkDestroyFence(logicalDevice, executionCompleteFence, VK_NULL_HANDLE);
}
Which of these options is better?
Is the second option written correctly?
Both functions are bad in the same way. They both block the CPU from doing anything until the transfer is done. And they both could be used to potentially submit multiple CBs to the same queue in the same frame, but with different submit commands.
Neither is desirable if performance is something you care about.
Ultimately, what you need to do is have your copyBuffer function not actually perform the copy. You should have a function which builds a command buffer to do a copy. That CB is then stored in a place to be submitted later with other copying CBs. Or better yet, you can have just one copying CB that each command adds to (the first one called in a frame will create the CB).
At some point in the future, before you've submitted the work that will use this data, you need to submit the transfer operations. And the way this works depends on if you're submitting the transfer operations on the same queue as the work that will consume them or not.
If they're on the same queue, then all you need to do is have an event in a command buffer at the end of your batch that synchronizes the transfer operations with their receivers. If you want to be more clever, each transfer operation can have its own event, which the receiving operations will wait on.
And in same-queue transfers, you also want to make sure that you submit the transfers in the same vkQueueSubmit call as the rest of your work. Or to put it another way, you should never make more than one call to vkQueueSubmit for a particular queue in a particular frame.
If you're dealing with separate queues, then things change. A bit. If timeline semaphores aren't an option, you'll need to submit your transfer work before you submit the receiving operations. This is because the transfer batch will need to signal a semaphore that the receiving operation will wait on. And a binary semaphore cannot be waited on until the operation that signals it has been submitted to a queue.
But otherwise, everything else stays the same. Of course, you don't need events since you're synchronizing by semaphore.
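The idea of recording copies first and submitting them together can be sketched without any Vulkan calls. The following is only a plain C++ model under my own assumptions: CopyBatch, record and flush are hypothetical names, record plays the role of adding a vkCmdCopyBuffer to a shared command buffer, and flush plays the role of the single vkQueueSubmit per frame. Synchronization (events, semaphores) is not modelled at all.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// One recorded copy; plays the role of a vkCmdCopyBuffer entry.
struct CopyCommand {
    const Buffer* src;
    Buffer* dst;
    std::size_t size;
};

// Hypothetical helper: collects copies and executes them in one batch.
class CopyBatch {
public:
    // Records the copy. Nothing is executed yet, so the caller never blocks.
    void record(const Buffer& src, Buffer& dst, std::size_t size) {
        assert(size <= src.size() && size <= dst.size());
        commands_.push_back({&src, &dst, size});
    }

    // Executes everything recorded so far as a single "submit".
    void flush() {
        if (commands_.empty()) return;
        for (const CopyCommand& c : commands_)
            std::copy_n(c.src->begin(), c.size, c.dst->begin());
        commands_.clear();
        ++submit_count_;
    }

    std::size_t pending() const { return commands_.size(); }
    int submit_count() const { return submit_count_; }

private:
    std::vector<CopyCommand> commands_;
    int submit_count_ = 0;
};
```

With this shape, any number of record calls during a frame still results in one flush, which is the property the answer is asking for. The source and destination buffers must of course stay alive until the batch has been flushed.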
The two functions are semantically identical and exhibit exactly the same blocking behavior.
The second is slightly better. vkQueueWaitIdle is kind of a debug and out-of-hotspot feature. It might incur a hidden second submit to signal the implicit fence.
You don't need to reset a fence that you subsequently destroy anyway. And you are creating it pre-signaled, which is a bug. You also forgot to pass it to vkQueueSubmit.
We are using a stack of C++ Media Foundation code to play back video files. An important requirement is the ability to play these videos in constantly repeating sequences, so every video slot periodically changes the video it is playing. In our current example we create 16 HWNDs to render video into and 16 corresponding player objects. The main application loops over all of them one after another and does the following:
Shut down the previous player
Release the object
CoCreateInstance for a new player
Initialize the player with the (old) HWND
Start playback
The media player is called "MediaPlayer2", this needs to be built and registered as COM (regsvr32). The main application is to be found in the TAPlayer2 Project. It searches for the CLSID of the player in the registry and instantiates it. As current test file we use a test.mp4 that has to reside on the disk like C:\test.mp4
Now everything goes fine initially. The program loops through the players, and the video keeps restarting and playing. The memory footprint is normal and everything runs smoothly. After anything between 20 minutes and 4 days, things suddenly get weird. At this point it seems as if calls to "InitializeRenderer" by the EVR slow down and eventually don't go through anymore at all. With this, the thread count and memory footprint also start to increase drastically, and after a certain amount of time, depending on the available RAM, all the memory is exhausted and our application crashes, usually somewhere in the GPU driver or near the EVR DLL.
I am happy to try out any other code examples that propose to solve my issue: displaying multiple video windows at the same time, and looping through them like in a playlist. It needs to run on Windows 10!
I have been going at this for quite a while now and am pretty stuck. I uploaded the code example mentioned above and added the link to this post. It should work out of the box, as far as I know. I can also provide code excerpts here in the thread if that is preferred.
Any help is appreciated, thanks
Link to demo project (VS2015): https://filebin.net/l8gl79jrz6fd02vt
edit: the following code from the end of winmain.cpp is used to restart the players:
do
{
    for (int i = 0; i < PLAYER_COUNT; i++)
    {
        hr = g_pPlayer[i]->Shutdown();
        hr = CoCreateInstance(CLSID_AvasysPlayer,      // CLSID of the coclass
                              NULL,                    // no aggregation
                              CLSCTX_INPROC_SERVER,    // the server is in-proc
                              __uuidof(IAvasysPlayer), // IID of the interface we want
                              (void**)&g_pPlayer[i]);  // address of our interface pointer
        hr = g_pPlayer[i]->InitPlayer(hwndPlayers[i]);
        hr = g_pPlayer[i]->OpenUrl(L"C:\\test.mp4");
    }
} while (true);
Some Media Foundation interfaces, like
need to be shut down (Shutdown) before you Release them.
At this point it seems as if calls to "InitializeRenderer" by the EVR slow down and eventually don't go through anymore at all.
... usually somewhere in the GPU driver or near the EVR DLL.
These are good leads for a targeted search in your code.
In your file PlayerTopoBuilder.cpp, at CPlayerTopoBuilder::AddBranchToPartialTopology :
if (bVideo)
{
    if (false) {
        BREAK_ON_FAIL(hr = CreateMediaSinkActivate(pSD, hVideoWnd, &pSinkActivate));
        BREAK_ON_FAIL(hr = AddOutputNode(pTopology, pSinkActivate, 0, &pOutputNode));
    }
    else {
        //// try directly create renderer
        BREAK_ON_FAIL(hr = MFCreateVideoRenderer(__uuidof(IMFMediaSink), (void**)&pMediaSink));
        CComQIPtr<IMFVideoRenderer> pRenderer = pMediaSink;
        BREAK_ON_FAIL(hr = pRenderer->InitializeRenderer(nullptr, nullptr));
        CComQIPtr<IMFGetService> getService(pRenderer);
        BREAK_ON_FAIL(hr = getService->GetService(MR_VIDEO_RENDER_SERVICE, __uuidof(IMFVideoDisplayControl), (void**)&pVideoDisplayControl));
        BREAK_ON_FAIL(hr = pVideoDisplayControl->SetVideoWindow(hVideoWnd));
        BREAK_ON_FAIL(hr = pMediaSink->GetStreamSinkByIndex(0, &pStreamSink));
        BREAK_ON_FAIL(hr = AddOutputNode(pTopology, 0, &pOutputNode, pStreamSink));
    }
}
You create an IMFMediaSink with MFCreateVideoRenderer and store it in pMediaSink. pMediaSink is released because it is a CComPtr, but Shutdown is never called on it.
You must keep a reference to the media sink and call Shutdown, then Release, on it when the player shuts down.
Or you can take a different approach and use MFCreateVideoRendererActivate.
If the application creates the media sink, it is responsible for calling Shutdown to avoid memory or resource leaks.
In most applications, however, the application creates an activation object for the media sink, and the Media Session uses that object to create the media sink.
In that case, the Media Session — not the application — shuts down the media sink. (For more information, see Activation Objects.)
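If the application does create the sink itself, one way to make sure Shutdown always happens before the last Release is a small RAII owner. This is just a sketch under my own assumptions: ShutdownOnRelease and FakeSink are hypothetical names, FakeSink is a stand-in that only logs calls, and nothing here touches the real Media Foundation API. With a real IMFMediaSink you would still have to decide who holds the remaining references.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Fake sink that logs calls; stands in for an object with Shutdown()/Release().
struct FakeSink {
    std::string* log;
    void Shutdown() { *log += "Shutdown;"; }
    void Release()  { *log += "Release;"; }
};

// Hypothetical RAII owner: always calls Shutdown() before Release().
template <typename T>
class ShutdownOnRelease {
public:
    explicit ShutdownOnRelease(T* p = nullptr) : p_(p) {}
    ShutdownOnRelease(const ShutdownOnRelease&) = delete;
    ShutdownOnRelease& operator=(const ShutdownOnRelease&) = delete;
    ShutdownOnRelease(ShutdownOnRelease&& other) noexcept
        : p_(std::exchange(other.p_, nullptr)) {}
    ~ShutdownOnRelease() { reset(); }

    // Shuts the object down, then drops our reference. Safe to call twice.
    void reset() {
        if (p_) {
            p_->Shutdown();
            p_->Release();
            p_ = nullptr;
        }
    }

    T* operator->() const { assert(p_ != nullptr); return p_; }
    T* get() const { return p_; }

private:
    T* p_;
};
```

Storing the sink in such an owner as a member of the player means the order is enforced even on early-exit paths, which is where a forgotten Shutdown usually hides.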
I also suggest using this kind of code at the end of CPlayer::CloseSession (after releasing all other objects):
if(m_pSession != NULL){
    hr = m_pSession->Shutdown();
    ULONG ulMFObjects = m_pSession->Release();
    m_pSession = NULL;
    assert(ulMFObjects == 0);
}
For the use of MFCreateVideoRendererActivate, you can look at my MFNodePlayer project :
I rewrote your program, but I tried to keep your logic and original source code, such as CComPtr, the mutex, and so on.
Tell me if this program has memory leaks.
It will depend on your answer, but then we can talk about best practices with Media Foundation.
Another thought:
Your program uses 1 to 16 IMFMediaSession objects. On a good computer configuration, I think you could use only one IMFMediaSession (I have never tried to aggregate 16 MFSource objects).
Visit :
to understand the other way to do it.
I think using 16 IMFMediaSession objects is not the best approach on a modern computer. VuVirt talks about this.
I've updated MFMultiVideo to use work queues.
I think the problem may be that you call MFStartup/MFShutdown for each player.
Just call MFStartup/MFShutdown once, in winmain.cpp for example, like my program does.
I would like to know how to check whether a sink pad of an element in GStreamer is receiving data.
Whenever it is not receiving data, I would like to reset or restart the pipeline.
Can anybody tell me how to reset or restart the pipeline, what happens when the pipeline is restarted, and how to detect incoming data on a pad?
You may want to break your post into two separate questions. As for restarting the pipeline, you can set the state to NULL and then back to PLAYING, but I recommend restarting the entire process, since so many elements fail to clean up properly.
To detect if buffers are coming in, you could add an identity element at the desired spot of your pipeline and register a callback on it like so. Then in your main thread verify the update time is within the desired range. Perhaps using g_timeout_add().
void ir_data_received(GstElement* identity, GstBuffer* buf, gpointer user_data) {
    my_object *some_useful_pointer = (my_object*)user_data;
    //data is coming in, reset the necessary flag
}
void setup(GstElement * pipeline, my_object *some_useful_pointer) {
    GstElement* identity = gst_bin_get_by_name(GST_BIN(pipeline), "identity");
    if(identity == NULL) {
        //error handling
        return;
    }
    g_signal_connect(G_OBJECT(identity), "handoff", G_CALLBACK(ir_data_received), some_useful_pointer);
    gst_object_unref(identity); //gst_bin_get_by_name returned a new reference
}
Timeout check every second:
gboolean check_status(gpointer user_data) {
    //check if data is coming in and exit system if it's not
    return TRUE; //keep the timeout source installed
}
g_timeout_add(1000, check_status, some_useful_pointer);
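The "flag" the two callbacks share can be as simple as a timestamp. Here is a minimal sketch of that part only, in plain C++ with no GStreamer or GLib calls. DataWatchdog, touch and stalled are names I made up: the handoff callback would call touch(), and the timeout callback would call stalled() and restart the pipeline when it returns true. The explicit time arguments exist only so the logic can be exercised without waiting.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>

// Hypothetical helper, independent of GStreamer: remembers when data was last seen.
class DataWatchdog {
public:
    using Clock = std::chrono::steady_clock;

    explicit DataWatchdog(Clock::duration timeout,
                          Clock::time_point now = Clock::now())
        : timeout_(timeout), last_seen_(now.time_since_epoch().count()) {
        assert(timeout > Clock::duration::zero());
    }

    // Call from the streaming thread, e.g. inside the "handoff" callback.
    void touch(Clock::time_point now = Clock::now()) {
        last_seen_.store(now.time_since_epoch().count(),
                         std::memory_order_relaxed);
    }

    // Call from the main loop, e.g. inside the periodic timeout callback.
    bool stalled(Clock::time_point now = Clock::now()) const {
        const Clock::duration last(last_seen_.load(std::memory_order_relaxed));
        return now.time_since_epoch() - last > timeout_;
    }

private:
    Clock::duration timeout_;
    std::atomic<Clock::rep> last_seen_;  // written and read from different threads
};
```

The atomic is there because the handoff signal is emitted from a streaming thread while the timeout runs in the main loop. A timeout a few times longer than the expected buffer interval avoids false alarms.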
I have a problem with Microsoft's WaveOut API:
edit1: Added Link to sample project:
edit2: removed the link, as it's not representative of the issue
After playing some audio, when I want to terminate a given playback stream, I call the function:
However, even after waveOutClose() is called, sometimes the library will still access memory previously passed to it by waveOutWrite(), causing an invalid memory access.
I then tried to ensure all the buffers are marked as done before freeing the buffer:
if(hWaveOut_ == nullptr)
waveOutReset(hWaveOut_); // infinite-loops, never returns
for(auto it = buffers_.begin(); it != buffers_.end(); ++it)
waveOutUnprepareHeader(hWaveOut_, &it->wavehdr_, sizeof(WAVEHDR));
while( buffers_.empty() == false ) // infinite loops
//Unhandled exception at 0x75629E80 (msvcrt.dll) in app.exe:
// 0xC0000005: Access violation reading location 0xFEEEFEEE.
void PcmPlayback::removeCompletedBuffers() {
    for(auto it = buffers_.begin(); it != buffers_.end();) {
        if( it->wavehdr_.dwFlags & WHDR_DONE ) {
            waveOutUnprepareHeader(hWaveOut_, &it->wavehdr_, sizeof(WAVEHDR));
            it = buffers_.erase(it);
        } else {
            ++it; // not finished yet, look at the next block
        }
    }
}
However, this situation never happens - the buffer never becomes empty. There will be 4-5 blocks remaining with wavehdr_.dwFlags == 18 (I believe this means the blocks are still marked as in playback)
How can I resolve this issue?
# Martin Schlott ("Can you provide the loop where you write the buffer to waveOutWrite?")
It's not quite a loop; instead, I have a function that is called whenever I receive an audio packet over the network:
void PcmPlayback::addData(const std::vector<short> &rhs)
{
    // add new data
    Buffer & buffer = buffers_.back();
    buffer.data_ = rhs;
    ZeroMemory(&buffers_.back().wavehdr_, sizeof(WAVEHDR));
    buffer.wavehdr_.dwBufferLength = buffer.data_.size() * sizeof(short);
    buffer.wavehdr_.lpData = (char *)(buffer.data_.data());
    waveOutPrepareHeader(hWaveOut_, &buffer.wavehdr_, sizeof(WAVEHDR)); // prepare block for playback
    waveOutWrite(hWaveOut_, &buffer.wavehdr_, sizeof(WAVEHDR));
}
The described behavior can happen if you do not call
on every buffer you used before you call
The dwFlags field seems to indicate that the buffers are still enqueued (WHDR_INQUEUE | WHDR_PREPARED). Try:
before unpreparing the buffers.
After analyzing your code, I found two problems/bugs that are not related to waveOut (funny, you use C++11 but the oldest media interface). You use a vector as a buffer. During some calls, the vector is copied! One bug I found is:
typedef std::function<void(std::vector<short>)> CALLBACK_FN;
instead of:
typedef std::function<void(std::vector<short>&)> CALLBACK_FN;
which forces a copy of the vector.
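The copy is easy to observe in isolation. The sketch below compares the storage address the callback sees with the caller's own; the aliases and the two helper functions are hypothetical names of mine, and I use a const reference in the second signature (an assumption on my part, since the callbacks here only read the samples), whereas the typedef above uses a non-const one.

```cpp
#include <cassert>
#include <functional>
#include <vector>

using ByValueFn = std::function<void(std::vector<short>)>;
using ByRefFn   = std::function<void(const std::vector<short>&)>;

// Returns true if the callback saw the caller's own storage (no copy made).
inline bool by_value_shares_storage(const std::vector<short>& samples) {
    assert(!samples.empty());
    bool shared = false;
    const short* original = samples.data();
    ByValueFn fn = [&shared, original](std::vector<short> v) {
        shared = (v.data() == original);
    };
    fn(samples);  // the by-value signature copies the whole vector here
    return shared;
}

inline bool by_ref_shares_storage(const std::vector<short>& samples) {
    assert(!samples.empty());
    bool shared = false;
    const short* original = samples.data();
    ByRefFn fn = [&shared, original](const std::vector<short>& v) {
        shared = (v.data() == original);
    };
    fn(samples);  // only a reference is passed, the storage is the caller's
    return shared;
}
```

This matters for waveOut because WAVEHDR::lpData points into the vector's storage: if the header was prepared against one copy and a different copy is kept alive, the driver ends up reading memory that may already have been freed.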
Try to avoid vectors if you mostly use them as raw buffers. Better to use std::unique_ptr as the buffer pointer.
Your callback in the recorder is not protected by a mutex, nor does it check whether a destructor has already been called. The destruction (mostly) happens during the callback, which leads to an exception.
For your test program, go back to raw pointers and static callbacks before blaming waveOut. Your code is not bad, but the first bug already shows that a small bug will lead to unpredictable errors. As you also organize your buffers in a std::array, I would search for bugs there. I guess you make an unintentional copy of your whole buffer array and unprepare the wrong buffers.
I did not have the time to dig deeper, but I guess those are the problems.
I managed to find my problem in the end; it was caused by multiple bugs and a deadlock. I will document what happened here so people can learn from this in the future.
I was clued in to what was happening when I fixed the bugs in the sample:
call waveInStop() before waveInClose() in ~Recorder.cpp
wait for all buffers to have the WHDR_DONE flag before calling waveOutClose() in ~PcmPlayback.
After doing this, the sample worked fine and did not display the behavior of the WHDR_DONE flag never being marked.
In my main program, that behavior was caused by a deadlock that occurs in the following situation:
I have a vector of objects representing each peer I am streaming audio with
Each Object owns a Playback class
This vector is protected by a mutex
Recorder callback:
send audio packet to each peer.
Remove Peer:
wait for WHDR_DONE flags to be marked
A deadlock occurs when I remove a peer, which locks the mutex, and the recorder callback then tries to acquire the lock too.
Note that this will happen often, because the playback buffer usually holds about 4 * 20 ms of audio while the recorder has a cadence of 20 ms.
In ~PcmPlayback, the buffers will never be marked as WHDR_DONE and any calls to the WaveOut API will never return because the WaveOut API is waiting for the Recorder callback to complete, which is in turn waiting on mutex.lock(), causing a deadlock.
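One common way out of this kind of deadlock is to detach the peer from the shared vector while holding the mutex, but run its (potentially blocking) destructor only after the mutex has been released. The sketch below models just that locking pattern in plain C++. It is not the original program: Peer, PeerTable, broadcast and on_destroy are hypothetical names, and the blocking wait for WHDR_DONE is represented by a callback that needs the same lock the recorder callback would take.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Stand-in for a peer owning a playback object. Its destructor may block
// until some other code path that needs the table's mutex has run.
struct Peer {
    int id;
    std::function<void()> on_destroy;  // simulates the blocking teardown
    ~Peer() { if (on_destroy) on_destroy(); }
};

class PeerTable {
public:
    void add(int id, std::function<void()> on_destroy = {}) {
        std::lock_guard<std::mutex> lock(mutex_);
        peers_.push_back(std::unique_ptr<Peer>(new Peer{id, std::move(on_destroy)}));
    }

    // What the recorder callback does: it needs the lock to walk the peers.
    std::size_t broadcast() {
        std::lock_guard<std::mutex> lock(mutex_);
        return peers_.size();
    }

    void remove(int id) {
        std::unique_ptr<Peer> doomed;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            auto it = std::find_if(peers_.begin(), peers_.end(),
                [id](const std::unique_ptr<Peer>& p) { return p->id == id; });
            if (it == peers_.end()) return;
            doomed = std::move(*it);  // detach while holding the lock...
            peers_.erase(it);
        }
        assert(doomed != nullptr);
        doomed.reset();  // ...destroy after releasing it, so callbacks can proceed
    }

private:
    std::mutex mutex_;
    std::vector<std::unique_ptr<Peer>> peers_;
};
```

Because the lock is no longer held during destruction, the recorder callback can finish its broadcast, the audio API can make progress, and the teardown eventually sees its buffers completed.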
I am trying to implement streaming audio and I've run into a problem where OpenAL is giving me an error code that seems impossible given the information in the documentation.
int buffersProcessed = 0;
alGetSourcei(m_Source, AL_BUFFERS_PROCESSED, &buffersProcessed);
int toAddBufferIndex;
// Remove the first buffer from the queue and move it to
//the end after buffering new data.
if (buffersProcessed > 0)
{
    ALuint unqueued;
    alSourceUnqueueBuffers(m_Source, 1, &unqueued);
    PrintALError(); // Prints AL_INVALID_OPERATION //
    toAddBufferIndex = firstBufferIndex;
}
According to the documentation [PDF], AL_INVALID_OPERATION means: "There is no current context." This seems like it can't be true, because OpenAL has been playing, and continues to play, other audio just fine!
Just to be sure, I called ALCcontext* temp = alcGetCurrentContext( ); here and it returned a valid context.
Is there some other error condition that's possible here that's not mentioned in the docs?
More details: The sound source is playing when this code is being called, but the impression I got from reading the spec is you can safely unqueue processed buffers while the source is playing. PrintALError is just a wrapper for alGetError that prints if there is any error.
I am on a Mac (OS 10.8.3), in case it matters.
So far what I've gathered is that this OpenAL implementation seems to incorrectly raise an error if you unqueue a buffer while the source is playing. The spec says that you should be able to unqueue a buffer that has been marked as processed while the source is playing:
Removal of a given queue entry is not possible unless either the source is stopped (in which case then entire queue is considered processed), or if the queue entry has already been processed (AL_PLAYING or AL_PAUSED source).
On that basis I'm gonna say this is probably a bug in my OpenAL implementation. I'm gonna leave the question open in case someone can give a more concrete answer though.
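For what it's worth, the rule quoted from the spec, together with the usual streaming pattern of recycling every processed buffer instead of just one, can be modelled in a few lines of plain C++. This is a toy model and not the OpenAL API: StreamQueue, mark_processed and recycle_processed are names I invented, and no error codes are modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>

// Toy model of a streaming source's buffer queue. NOT the OpenAL API.
class StreamQueue {
public:
    void queue(unsigned buffer) { queued_.push_back(buffer); }

    // The "mixer" finished playing n more buffers from the front of the queue.
    void mark_processed(std::size_t n) {
        processed_ = std::min(processed_ + n, queued_.size());
    }

    std::size_t processed() const { return processed_; }
    std::size_t size() const { return queued_.size(); }

    // While playing, only an already processed entry may leave the queue.
    bool unqueue(unsigned& out) {
        if (processed_ == 0) return false;
        assert(!queued_.empty());
        out = queued_.front();
        queued_.pop_front();
        --processed_;
        return true;
    }

private:
    std::deque<unsigned> queued_;
    std::size_t processed_ = 0;
};

// Recycles every processed buffer: unqueue, refill, queue again at the back.
template <typename Refill>
std::size_t recycle_processed(StreamQueue& q, Refill refill) {
    std::size_t recycled = 0;
    unsigned buffer = 0;
    while (q.unqueue(buffer)) {
        refill(buffer);   // load the next chunk of audio into this buffer
        q.queue(buffer);  // requeued buffers count as unprocessed again
        ++recycled;
    }
    return recycled;
}
```

The loop terminates because requeued buffers are not processed yet, so each pass through the streaming update handles exactly the buffers that finished since the previous pass.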
To handle the case of multiple processed buffers, use a loop.
The following works on iOS and Linux:
// UN queue used buffers
ALint buffers_processed = 0;
alGetSourcei(streaming_source, AL_BUFFERS_PROCESSED, & buffers_processed); // get source parameter num used buffs
while (buffers_processed > 0) { // we have a consumed buffer so we need to replenish
ALuint unqueued_buffer;
alSourceUnqueueBuffers(streaming_source, 1, & unqueued_buffer);
available_AL_buffer_array[available_AL_buffer_array_curr_index] = unqueued_buffer;