Windows Audio Endpoint API. Getting the names of my Audio Devices - c++

My main goal at the moment is to get detailed information about all of the local machine's audio endpoint devices, that is, the objects representing the audio peripherals. I want to be able to choose which device to record from based on some logic (or eventually allow the user to manually do so).
Here's what I've got so far. I'm pretty new to C++, so dealing with all of these abstract classes is getting a bit tricky; feel free to comment on code quality as well.
//Create vector of IMMDevices
UINT endpointCount = NULL;
(*pCollection).GetCount(&endpointCount);
std::vector<IMMDevice**> IMMDevicePP; //IMMDevice seems to contain all endpoint devices, so why have a collection here?
for (UINT i = 0; i < (endpointCount); i++)
{
    IMMDevice* pp = NULL;
    (*pCollection).Item(i, &pp);
    IMMDevicePP.assign(1, &pp);
}
My more technical goal at present is to get objects that implement this interface: http://msdn.microsoft.com/en-us/library/windows/desktop/dd371414(v=vs.85).aspx
This is a type that is supposed to represent a single audio endpoint device, whereas IMMDevice seems to contain a collection of devices. However, IMMEndpoint only contains a method called GetDataFlow, so I'm unsure whether that will help me. Again, the goal is to easily select which endpoint device to record and stream audio from.
Any suggestions? Am I using the wrong API? This API definitely has good commands for the actual streaming and sampling of the audio, but I'm a bit lost as to how to make sure I'm using the desired device.

WASAPI will allow you to do what you need, so you're using the right API. You're mistaken about IMMDevice representing a collection of audio devices, though; that is IMMDeviceCollection. IMMDevice represents a single audio device. By "device", WASAPI doesn't mean audio card as you might expect; rather it means a single input/output on such a card. For example, an audio card with analog in/out + digital out will show up as 3 IMMDevices, each with its own IMMEndpoint. I'm not sure what detailed info you're after, but it seems to me IMMDevice will provide you with everything you need. Basically, you'll want to do something like this:
Create an IMMDeviceEnumerator
Call EnumAudioEndpoints specifying render, capture or both, to enumerate into an IMMDeviceCollection
Obtain individual IMMDevice instances from IMMDeviceCollection
Device name and description can be queried from IMMDevice using OpenPropertyStore (http://msdn.microsoft.com/en-us/library/windows/desktop/dd370812%28v=vs.85%29.aspx). Additional supported device details can be found here: http://msdn.microsoft.com/en-us/library/windows/desktop/dd370794%28v=vs.85%29.aspx.
IMMDevice instances obtained from IMMDeviceCollection will also be instances of IMMEndpoint; use QueryInterface to switch between the two. However, as you noted, this will only tell you whether you've got your hands on a render or capture device. It is much easier to just ask for what you want directly in EnumAudioEndpoints.
About code quality: use x->f() instead of (*x).f(); although it's technically the same thing, the -> operator is the common way to call a function through an object pointer.
Don't use vector::assign; it replaces the contents of the entire vector on each call, so you'll end up with a collection of size 1 regardless of the number of available devices. Use push_back instead, as in the sketch below.
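Putting those pieces together, a minimal sketch of the enumeration (variable names are illustrative; CoInitializeEx, error handling, and Release calls omitted):
// #include <mmdeviceapi.h> and <vector>
std::vector<IMMDevice*> devices;

IMMDeviceEnumerator* pEnumerator = NULL;
CoCreateInstance(__uuidof(MMDeviceEnumerator), NULL, CLSCTX_ALL,
                 __uuidof(IMMDeviceEnumerator), (void**)&pEnumerator);

// Ask only for active capture endpoints, since that is what we want to record from.
IMMDeviceCollection* pCollection = NULL;
pEnumerator->EnumAudioEndpoints(eCapture, DEVICE_STATE_ACTIVE, &pCollection);

UINT endpointCount = 0;
pCollection->GetCount(&endpointCount);

for (UINT i = 0; i < endpointCount; i++)
{
    IMMDevice* pDevice = NULL;
    pCollection->Item(i, &pDevice);
    devices.push_back(pDevice);   // grows the vector; assign() would overwrite it each time
}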

After enumerating your IMMDevices as Sjoerd stated, you must retrieve the IPropertyStore information for the device. From there you extract the PROPVARIANT object like so:
PROPERTYKEY key;
HRESULT keyResult = (*IMMDeviceProperties[i]).GetAt(p, &key);
then
PROPVARIANT propVari;
HRESULT propVariResult = (*IMMDeviceProperties[i]).GetValue(key, &propVari);
according to these documents:
http://msdn.microsoft.com/en-us/library/windows/desktop/bb761471(v=vs.85).aspx
http://msdn.microsoft.com/en-us/library/windows/desktop/aa380072(v=vs.85).aspx
And finally, to get the friendly name of the audio endpoint device from the PROPVARIANT structure, simply access its pwszVal member, as illustrated here:
http://msdn.microsoft.com/en-us/library/windows/desktop/dd316594(v=vs.85).aspx
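Putting it together, a minimal sketch of the friendly-name lookup (assuming pDevice is one of the IMMDevice pointers from the collection; error handling omitted):
// #include <functiondiscoverykeys_devpkey.h>   // defines PKEY_Device_FriendlyName
IPropertyStore* pProps = NULL;
pDevice->OpenPropertyStore(STGM_READ, &pProps);

PROPVARIANT friendlyName;
PropVariantInit(&friendlyName);
pProps->GetValue(PKEY_Device_FriendlyName, &friendlyName);

// friendlyName.pwszVal now points to something like L"Microphone (USB Audio Device)"
wprintf(L"%s\n", friendlyName.pwszVal);

PropVariantClear(&friendlyName);
pProps->Release();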
All about finding the right documentation!

Related

Media foundation multiple videos playback results in memory-leak & crash after undefined timeframe

So we are using a stack consisting of C++ Media Foundation code in order to play back video files. An important requirement is the ability to play these videos in constantly repeating sequences, so every single video slot will periodically change the video it is playing. In our current example we are creating 16 HWNDs to render video into and 16 corresponding player objects. The main application loops over all of them one after another and does the following:
Shutdown the last player
Release the object
CoCreateInstance for a new player
Initialize the player with the (old) HWND
Start Playback
The media player is called "MediaPlayer2"; this needs to be built and registered as a COM server (regsvr32). The main application is found in the TAPlayer2 project. It searches for the CLSID of the player in the registry and instantiates it. As the current test file we use a test.mp4 that has to reside on disk at C:\test.mp4.
Now everything goes fine initially. The program loops through the players and the video keeps restarting and playing. The memory footprint is normal and all goes smoothly. After a timeframe of anything between 20 minutes and 4 days, all of a sudden things get weird. At this point it seems as if calls to "InitializeRenderer" by the EVR slow down and eventually don't go through anymore at all. Along with this, the thread count and memory footprint start to increase drastically, and after an amount of time that depends on the available RAM, all memory is exhausted and our application crashes, usually somewhere in the GPU driver or near the EVR DLL.
I am happy to try out any other code examples that propose to solve my issue: displaying multiple video windows at the same time and looping through them like in a playlist. It needs to run on Windows 10!
I have been going at this for quite a while now and am pretty hard stuck. I uploaded the above-mentioned code example and added the link to this post. This should work out of the box, afaik. I can also provide code excerpts here in the thread if that is preferred.
Any help is appreciated, thanks
Thomas
Link to demo project (VS2015): https://filebin.net/l8gl79jrz6fd02vt
edit: the following code from the end of winmain.cpp is used to restart the players:
do
{
    for (int i = 0; i < PLAYER_COUNT; i++)
    {
        hr = g_pPlayer[i]->Shutdown();
        SafeRelease(&g_pPlayer[i]);
        hr = CoCreateInstance(CLSID_AvasysPlayer,       // CLSID of the coclass
                              NULL,                     // no aggregation
                              CLSCTX_INPROC_SERVER,     // the server is in-proc
                              __uuidof(IAvasysPlayer),  // IID of the interface we want
                              (void**)&g_pPlayer[i]);   // address of our interface pointer
        hr = g_pPlayer[i]->InitPlayer(hwndPlayers[i]);
        hr = g_pPlayer[i]->OpenUrl(L"C:\\test.mp4");
    }
} while (true);
Some Media Foundation interfaces, like
IMFMediaSource
IMFMediaSession
IMFMediaSink
need to be shut down before you release them.
At this point it seems as if calls to "InitializeRenderer" by the EVR slow down and eventually don't go through anymore at all.
... usually somewhere in the GPU driver or near the EVR DLL.
This gives a good lead for a precise search in your code.
In your file PlayerTopoBuilder.cpp, at CPlayerTopoBuilder::AddBranchToPartialTopology:
if (bVideo)
{
    if (false) {
        BREAK_ON_FAIL(hr = CreateMediaSinkActivate(pSD, hVideoWnd, &pSinkActivate));
        BREAK_ON_FAIL(hr = AddOutputNode(pTopology, pSinkActivate, 0, &pOutputNode));
    }
    else {
        //// try directly create renderer
        BREAK_ON_FAIL(hr = MFCreateVideoRenderer(__uuidof(IMFMediaSink), (void**)&pMediaSink));
        CComQIPtr<IMFVideoRenderer> pRenderer = pMediaSink;
        BREAK_ON_FAIL(hr = pRenderer->InitializeRenderer(nullptr, nullptr));
        CComQIPtr<IMFGetService> getService(pRenderer);
        BREAK_ON_FAIL(hr = getService->GetService(MR_VIDEO_RENDER_SERVICE, __uuidof(IMFVideoDisplayControl), (void**)&pVideoDisplayControl));
        BREAK_ON_FAIL(hr = pVideoDisplayControl->SetVideoWindow(hVideoWnd));
        BREAK_ON_FAIL(hr = pMediaSink->GetStreamSinkByIndex(0, &pStreamSink));
        BREAK_ON_FAIL(hr = AddOutputNode(pTopology, 0, &pOutputNode, pStreamSink));
    }
}
You create an IMFMediaSink with MFCreateVideoRenderer and pMediaSink. pMediaSink is released because of the use of CComPtr, but it is never shut down.
You must keep a reference to the media sink and Shutdown/Release it when the player shuts down.
Or you can use a different approach with MFCreateVideoRendererActivate.
IMFMediaSink::Shutdown
If the application creates the media sink, it is responsible for calling Shutdown to avoid memory or resource leaks.
In most applications, however, the application creates an activation object for the media sink, and the Media Session uses that object to create the media sink.
In that case, the Media Session — not the application — shuts down the media sink. (For more information, see Activation Objects.)
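A sketch of the first approach, with illustrative names (m_pMediaSink and ShutdownSink are not from the demo project):
// Keep the sink alive for the lifetime of the topology,
// e.g. store it as a player member when MFCreateVideoRenderer succeeds:
CComPtr<IMFMediaSink> m_pMediaSink;

// ...and shut it down explicitly on the player's shutdown path:
HRESULT CPlayer::ShutdownSink()
{
    HRESULT hr = S_OK;
    if (m_pMediaSink)
    {
        hr = m_pMediaSink->Shutdown();  // releases the EVR's internal resources
        m_pMediaSink.Release();
    }
    return hr;
}
With MFCreateVideoRendererActivate instead, the Media Session owns this shutdown, as the quote above explains.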
I also suggest using this kind of code at the end of CPlayer::CloseSession (after releasing all other objects):
if (m_pSession != NULL) {
    hr = m_pSession->Shutdown();
    ULONG ulMFObjects = m_pSession->Release();
    m_pSession = NULL;
    assert(ulMFObjects == 0);
}
For the use of MFCreateVideoRendererActivate, you can look at my MFNodePlayer project :
MFNodePlayer
EDIT
I rewrote your program, but I tried to keep your logic and original source code, like CComPtr/Mutex...
MFMultiVideo
Tell me if this program has memory leaks.
It will depend on your answer, but then we can talk about best practices with MediaFoundation.
Another thought:
Your program uses 1 to 16 IMFMediaSessions. On a good computer configuration, I think you could use only one IMFMediaSession (never try to aggregate 16 MFSources).
Visit :
CustomVideoMixer
to understand the other way to do it.
I think your approach of using 16 IMFMediaSessions is not the best approach on a modern computer. VuVirt talks about this.
EDIT2
I've updated MFMultiVideo using Work Queues.
I think the problem may be that you call MFStartup/MFShutdown for each player.
Just call MFStartup/MFShutdown once in winmain.cpp for example, like my program does.
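That is, something along these lines (a sketch; the demo's exact entry point may differ):
// Once, early in winmain.cpp:
HRESULT hr = MFStartup(MF_VERSION);

// ... create, run, and shut down the 16 players as often as needed ...

// Once, just before the process exits:
MFShutdown();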

Get unique identifier for Windows monitors

I have a setup with two regular displays and three projectors connected to a Windows PC. In my Win32 program I need to uniquely identify each monitor and store information for each such that I can retrieve the stored information even after a computer restart.
The EnumDisplayDevices seems to return different device orders after restarting the computer. There is also GetPhysicalMonitorsFromHMONITOR which at least gives me the display's name. However, I need something like a serial number for my projectors, since they are the same model. How can I get such a unique identifier?
EDIT: This is the solution I came up with after reading the answer from user Anders (thanks!):
DISPLAY_DEVICEA dispDevice;
ZeroMemory(&dispDevice, sizeof(dispDevice));
dispDevice.cb = sizeof(dispDevice);

DWORD screenID = 0;
while (EnumDisplayDevicesA(NULL, screenID, &dispDevice, 0))
{
    // important: make copy of DeviceName
    char name[sizeof(dispDevice.DeviceName)];
    strcpy(name, dispDevice.DeviceName);
    if (EnumDisplayDevicesA(name, 0, &dispDevice, EDD_GET_DEVICE_INTERFACE_NAME))
    {
        // at this point dispDevice.DeviceID contains a unique identifier for the monitor
    }
    ++screenID;
}
EnumDisplayDevices with the EDD_GET_DEVICE_INTERFACE_NAME flag should give you a usable string. And if not, you can use this string with the SetupAPI to get the hardware id or driver key or whatever is unique enough for your purpose.
Set this flag to EDD_GET_DEVICE_INTERFACE_NAME (0x00000001) to retrieve the device interface name for GUID_DEVINTERFACE_MONITOR, which is registered by the operating system on a per monitor basis. The value is placed in the DeviceID member of the DISPLAY_DEVICE structure returned in lpDisplayDevice. The resulting device interface name can be used with SetupAPI functions and serves as a link between GDI monitor devices and SetupAPI monitor devices.
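If the interface path alone is not unique enough, here is a sketch of the SetupAPI lookup mentioned above (assumed to run inside the inner if block, where dispDevice.DeviceID holds the interface path; error handling omitted):
// #include <setupapi.h> and link setupapi.lib
HDEVINFO devInfo = SetupDiCreateDeviceInfoList(NULL, NULL);

SP_DEVICE_INTERFACE_DATA ifData;
ZeroMemory(&ifData, sizeof(ifData));
ifData.cbSize = sizeof(ifData);
SetupDiOpenDeviceInterfaceA(devInfo, dispDevice.DeviceID, 0, &ifData);

// Resolve the interface to its devnode; we only need the SP_DEVINFO_DATA,
// so the call is expected to "fail" with ERROR_INSUFFICIENT_BUFFER.
SP_DEVINFO_DATA devData;
ZeroMemory(&devData, sizeof(devData));
devData.cbSize = sizeof(devData);
DWORD required = 0;
SetupDiGetDeviceInterfaceDetailA(devInfo, &ifData, NULL, 0, &required, &devData);

// Hardware id of the monitor device (SPDRP_DRIVER would give the driver key instead).
char hardwareId[512] = {};
SetupDiGetDeviceRegistryPropertyA(devInfo, &devData, SPDRP_HARDWAREID,
                                  NULL, (PBYTE)hardwareId, sizeof(hardwareId), NULL);

SetupDiDestroyDeviceInfoList(devInfo);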

Get encoder name from SinkWriter or ICodecAPI or IMFTransform

I'm using the SinkWriter in order to encode video using media foundation.
After I initialize the SinkWriter, I would like to get the underlying encoder it uses, and print out its name, so I can see what encoder it uses. (In my case, the encoder is most probably the H.264 Video Encoder included in MF).
I can get references to the encoder's ICodecAPI and IMFTransform interface (using pSinkWriter->GetServiceForStream), but I don't know how to get the encoder's friendly name using those interfaces.
Does anyone know how to get the encoder's friendly name from the sinkwriter? Or from its ICodecAPI or IMFTransform interface?
This is far from an effective solution, and I am not 100% sure it works, but what could be done is:
1) At start-up, enumerate all the codecs that could be used (as I understand it, in this case H.264 encoders) and subscribe to a setting-change event:
MFT_REGISTER_TYPE_INFO TransformationOutput = { MFMediaType_Video, MFVideoFormat_H264 };
DWORD nFlags = MFT_ENUM_FLAG_ALL;
UINT32 nCount = 0;
CLSID* pClsids;
MFTEnum( MFT_CATEGORY_VIDEO_ENCODER, nFlags, NULL, &TransformationOutput, NULL, &pClsids, &nCount);
// Ok here we assume nCount is 1 and we got the MS encoder
ICodecAPI *pMsEncoder;
hr = CoCreateInstance(pClsids[0], NULL, CLSCTX_INPROC_SERVER, __uuidof(ICodecAPI), (void**)&pMsEncoder);
// nCodecIds is supposed to be an array of identifiers to distinguish the sender
hr = pMsEncoder->RegisterForEvent(CODECAPI_AVEncVideoOutputFrameRate, (LONG_PTR)&nCodecIds[0]);
2) I'm not 100% sure whether the frame rate setting is also set when the input media type for the stream is set, but you can try to set the same property on the ICodecAPI you retrieved from the SinkWriter. After getting the event, you should be able to identify the codec by comparing lParam1 to the value passed. This is still very poor, though, since it relies on all the encoders supporting event notification, and it requires needless parameter changes if my hypothesis about the event being generated on stream construction is wrong.
Having the IMFTransform alone does not give you a friendly name for the encoder.
One option is to check the transform's output type and compare it to well-known GUIDs to identify the encoder; in particular, you are going to have a subtype of MFVideoFormat_H264 with the H.264 Encoder MFT.
Another option is to reach the CLSID of the encoder (IMFTransform itself does not give it to you, but you might have it otherwise, such as via IMFActivate, by querying the MFT_TRANSFORM_CLSID_Attribute attribute, or via IPersist* interfaces). Then you could look the friendly name up in the registry, or enumerate transforms and find yours in that list by comparing CLSIDs.
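As an illustration of the CLSID route, a sketch under the assumption that the transform exposes MFT_TRANSFORM_CLSID_Attribute on its attribute store (it is not guaranteed to); MFTGetInfo then returns the registered friendly name:
// #include <mfapi.h>, <mfidl.h>, <atlbase.h>; pTransform obtained via GetServiceForStream.
CComPtr<IMFAttributes> pAttributes;
GUID encoderClsid = GUID_NULL;
if (SUCCEEDED(pTransform->GetAttributes(&pAttributes)) && pAttributes &&
    SUCCEEDED(pAttributes->GetGUID(MFT_TRANSFORM_CLSID_Attribute, &encoderClsid)))
{
    // MFTGetInfo looks the registered MFT up by CLSID and returns its friendly name.
    LPWSTR pszName = NULL;
    if (SUCCEEDED(MFTGetInfo(encoderClsid, &pszName, NULL, NULL, NULL, NULL, NULL)))
    {
        wprintf(L"Encoder: %s\n", pszName);
        CoTaskMemFree(pszName);
    }
}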

Best DirectShow way to capture image from web cam preview ? SampleGrabber is deprecated

I have developed a DirectShow C++ app which successfully previews the web cam view in a provided window. Now I want to capture an image from this live web cam preview. I have used the graph manager, ICaptureGraphBuilder2, IMoniker, etc. for that.
I have searched and found following options:
WIA & Sample Grabber.
Many recommend using SampleGrabber, but as per MSDN documentation SampleGrabber is deprecated and one should not use it. And I don't want to use the WIA API.
So which is the best DirectShow way to capture image from live web cam preview?
Here is a quote from DxSnap sample from DirectShow.NET library:
Use DirectShow to take snapshots from the Still pin of a capture device. Note that MS encourages you to use WIA for this, but if you want to do it with DirectShow and C#, here's how.
Note that this sample will only work with devices that output uncompressed video as RGB24. This will include most webcams, but probably zero TV tuners.
This is C# code, but you should get the idea as the interfaces are all the same. And there are other samples on how to use Sample Grabber Filter in C++.
Sample Grabber is deprecated and the headers are removed from a couple of the latest SDKs, however the runtime components are all there and are going to be there for a long time, or otherwise a multitude of applications would be broken (e.g. the video chat in browser-hosted Gmail uses Sample Grabber). So basically Sample Grabber is still an easy way to capture snapshots from a web camera; or, if you prefer to follow the latest MS APIs, you would want to look into Media Foundation (09 Jul 2016 update: new Windows Server installations might need the "Media Foundation" and/or "Desktop Experience" features added to make the Media Foundation API available along with DirectShow and DirectShow Editing Services, which Sample Grabber is a part of. The default installation does not offer qedit.dll out of the box).
Also, in C++ you certainly don't have to use the Sample Grabber filter. You can develop a custom filter using the DirectShow BaseClasses to be a custom transformation filter or a custom renderer, which accepts the incoming video feed and exports the frames from the DirectShow pipeline. Another option is to use the Sample Grabber sample source code from one of the older SDKs (which is not the exact source for the OS Sample Grabber, but it does the same thing). The point, however, is that the Sample Grabber shipped with Windows is still a good option.
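For reference, the classic Sample Grabber wiring for grabbing the latest frame looks roughly like this (a sketch only; error handling omitted, pGraph is assumed to be an IGraphBuilder you already created, and as noted above the qedit.h declarations may have to be supplied manually on newer SDKs):
// #include <qedit.h>, <atlbase.h>, <vector>
// Create the Sample Grabber, force 24-bit RGB, and add it to the graph.
CComPtr<IBaseFilter> pGrabberFilter;
CoCreateInstance(CLSID_SampleGrabber, NULL, CLSCTX_INPROC_SERVER,
                 IID_IBaseFilter, (void**)&pGrabberFilter);
pGraph->AddFilter(pGrabberFilter, L"Sample Grabber");

CComQIPtr<ISampleGrabber> pGrabber(pGrabberFilter);
AM_MEDIA_TYPE mt = {};
mt.majortype = MEDIATYPE_Video;
mt.subtype   = MEDIASUBTYPE_RGB24;
pGrabber->SetMediaType(&mt);
pGrabber->SetBufferSamples(TRUE);   // keep a copy of the most recent frame

// ... connect camera -> Sample Grabber -> renderer (e.g. via ICaptureGraphBuilder2),
// run the graph, and later pull the latest frame on demand:
long cbBuffer = 0;
pGrabber->GetCurrentBuffer(&cbBuffer, NULL);                 // query required size
std::vector<BYTE> frame(cbBuffer);
pGrabber->GetCurrentBuffer(&cbBuffer, (long*)frame.data());  // copy the frame out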
Listed on Microsoft's website is an example of how to capture a frame using IVMRWindowlessControl9::GetCurrentImage ... Here's one way of doing it:
IBaseFilter* vmr9ptr; // I'm assuming that you got this pointer already
IVMRWindowlessControl9* controlPtr = NULL;
vmr9ptr->QueryInterface(IID_IVMRWindowlessControl9, (void**)&controlPtr);
assert(controlPtr != NULL);

// Get the current frame
BYTE* lpDib = NULL;
hr = controlPtr->GetCurrentImage(&lpDib);

// If everything is okay, we can create a BMP
if (SUCCEEDED(hr))
{
    BITMAPINFOHEADER* pBMIH = (BITMAPINFOHEADER*)lpDib;
    DWORD bufSize = pBMIH->biSizeImage;

    // Let's create a bmp
    BITMAPFILEHEADER bmpHdr;
    BITMAPINFOHEADER bmpInfo;
    size_t hdrSize = sizeof(bmpHdr);
    size_t infSize = sizeof(bmpInfo);

    memset(&bmpHdr, 0, hdrSize);
    bmpHdr.bfType = ('M' << 8) | 'B';
    bmpHdr.bfOffBits = static_cast<DWORD>(hdrSize + infSize);
    bmpHdr.bfSize = bmpHdr.bfOffBits + bufSize;

    // Build the bitmap info.
    memset(&bmpInfo, 0, infSize);
    bmpInfo.biSize = static_cast<DWORD>(infSize);
    bmpInfo.biWidth = pBMIH->biWidth;
    bmpInfo.biHeight = pBMIH->biHeight;
    bmpInfo.biPlanes = pBMIH->biPlanes;
    bmpInfo.biBitCount = pBMIH->biBitCount;

    // boost::shared_arrays are awesome!
    boost::shared_array<BYTE> buf(new BYTE[bmpHdr.bfSize]);
    memcpy(buf.get(), &bmpHdr, hdrSize);            // copy the header
    memcpy(buf.get() + hdrSize, &bmpInfo, infSize); // now copy the info block
    // the pixel data returned by GetCurrentImage starts right after its BITMAPINFOHEADER
    memcpy(buf.get() + bmpHdr.bfOffBits, lpDib + infSize, bufSize);

    // Do something with your image data ... seriously...
    CoTaskMemFree(lpDib);
} // All done!
Jeez... so much disinformation. If you're previewing in a DirectShow graph, then it depends on what you're previewing off of. Capture filters have 1, 2, or 3 pins. If a filter has 1 pin, it's most likely a "capture" pin (no preview pin). In that case, if you want to capture and preview at the same time, you should put in a "Smart Tee" filter, connect the VMR off its preview pin, and hook up "something that grabs frames" off its capture pin, since you don't want to fool around with DirectShow's crummy pin start/stop stuff (instead, just control the entire graph's start/stop state). You don't need to use a SampleGrabber; it's a dead-simple filter and you could write it in a few hours (I should know, I'm the one who wrote it). It's simply a CTransInPlace filter that you can set a forced media type on for it to accept, and you can set a callback interface on it to call you back when it receives a sample. It's actually simpler to write a NullRenderer which calls you back when it receives a sample; you could write this quite easily.
If the capture filter has 2 pins, they're most likely a capture pin and a still pin. In this case you still need a Smart Tee connected to the source's capture pin, and you need to preview off the Smart Tee's preview pin and capture samples off its capture pin.
(If you don't know what a Smart Tee is, it's a filter that plays allocator tricks and only sends a sample down the preview pin if the capture pin isn't super bogged down. Its job is to provide a path for the VMR to render from that won't botch up the allocators between the capture filter and the filters downstream of it.)
If the capture filter has both capture and preview pins, I think you can figure out what you need to do then.
Anyhow, to summarize: the SampleGrabber is simply a CTransInPlaceFilter. You could write it as a null renderer too; just make sure to fill out the media type check and to invoke your callback in DoRenderSample.
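As a rough sketch of that last point, using the DirectShow base classes (class and callback names are made up; note that a renderer overrides CheckMediaType rather than a transform's CheckInputType):
// #include <streams.h>   // DirectShow base classes
// A "null renderer" that hands every sample to a user callback instead of drawing it.
class CFrameSink : public CBaseRenderer
{
public:
    typedef void (*FrameCallback)(IMediaSample* pSample, void* pContext);

    CFrameSink(FrameCallback cb, void* ctx, HRESULT* phr)
        : CBaseRenderer(GUID_NULL /* private, unregistered filter */,
                        NAME("Frame Sink"), NULL, phr),
          m_callback(cb), m_context(ctx) {}

    // Accept only the format you intend to handle (e.g. RGB24 video).
    HRESULT CheckMediaType(const CMediaType* pmt) override
    {
        if (*pmt->Type() != MEDIATYPE_Video) return VFW_E_TYPE_NOT_ACCEPTED;
        if (*pmt->Subtype() != MEDIASUBTYPE_RGB24) return VFW_E_TYPE_NOT_ACCEPTED;
        return S_OK;
    }

    // Called for every sample; no rendering, just hand the buffer to the app.
    HRESULT DoRenderSample(IMediaSample* pSample) override
    {
        if (m_callback) m_callback(pSample, m_context);
        return S_OK;
    }

private:
    FrameCallback m_callback;
    void* m_context;
};
An instance of something like this would go downstream of the Smart Tee's capture pin, in place of the stock Sample Grabber plus null renderer pair.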