Sound from mic vs sound from speaker - c++

I want to capture audio from both the mic and the speaker - separately. How can I distinguish between them? I can capture one or the other using the Wave API, e.g., waveInOpen().
When I enumerate the devices using waveInGetNumDevs() and waveInGetDevCaps()/waveOutGetDevCaps(), there seems to be no information related to a particular endpoint device (e.g., mic or speaker). I only see the following, which are adapter devices:
HD Read Audio Input
HD Read Audio Output
Webcam ...

I actually have no real knowledge of the Windows API, so my answer probably isn't the best and there may be better ways.
HRESULT hr = CoInitialize(NULL);
IMMDeviceEnumerator *pEnum = NULL;
hr = CoCreateInstance(__uuidof(MMDeviceEnumerator), NULL, CLSCTX_ALL, __uuidof(IMMDeviceEnumerator), (void**)&pEnum);
if (SUCCEEDED(hr))
{
    IMMDeviceCollection *pDevices = NULL;
    // Enumerate all active endpoint devices.
    // You can choose between eAll, eCapture, or eRender.
    hr = pEnum->EnumAudioEndpoints(eAll, DEVICE_STATE_ACTIVE, &pDevices);
}
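For example (continuing the sketch above, untested), you could ask for capture and render endpoints separately instead of using eAll:
IMMDeviceCollection *pCaptureDevices = NULL;
IMMDeviceCollection *pRenderDevices  = NULL;
// Microphones and other inputs.
hr = pEnum->EnumAudioEndpoints(eCapture, DEVICE_STATE_ACTIVE, &pCaptureDevices);
if (SUCCEEDED(hr))
    // Speakers and other outputs.
    hr = pEnum->EnumAudioEndpoints(eRender, DEVICE_STATE_ACTIVE, &pRenderDevices);
UINT captureCount = 0;
if (SUCCEEDED(hr))
    hr = pCaptureDevices->GetCount(&captureCount);
// pCaptureDevices->Item(i, &pDevice) then gives you each microphone endpoint.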
With that you'd be able to distinguish between input (capture) and output (render).
(That's what you wanted, right?)
The code is taken from this article. You may look at it for the correct API calls and libraries; it might even give you some more information.
Hope that's helpful.

Related

UWP, Media Foundation, choosing specific encoder

I would like to choose a specific encoder in Media Foundation under UWP using c++/cx. Currently I use a SinkWriter and let the system choose a default encoder.
This code returns "class not registered" error under UWP, but it works in a win32 console app:
CoInitializeEx(NULL, COINIT_APARTMENTTHREADED | COINIT_DISABLE_OLE1DDE);
MFStartup(MF_VERSION);
IMFTransform* mtf;
CLSID id;
CLSIDFromString(L"{966F107C-8EA2-425D-B822-E4A71BEF01D7}", &id); // "NVIDIA HEVC Encoder MFT"
//CLSIDFromString(L"{F2F84074-8BCA-40BD-9159-E880F673DD3B}", &id); // "H265 Encoder MFT"
//CLSIDFromString(L"{BC10864D-2B34-408F-912A-102B1B867B6C}", &id); // "IntelĀ« Hardware H265 Encoder MFT"
//HRESULT hr = CoCreateInstance(id, nullptr, CLSCTX_INPROC_SERVER, IID_IMFTransform, (void **)&mtf);
HRESULT hr = CoCreateInstance(id, nullptr, CLSCTX_INPROC_SERVER, IID_PPV_ARGS(&mtf));
I also noticed that MFTEnumEx() is not defined in the header files under UWP, so I can't enumerate the encoders.
I noticed there is C# documentation allowing something like this:
auto codecQuery = ref new Windows::Media::Core::CodecQuery();
But it seems it's not available when using C++/CX.
I would also like to ask the SinkWriter what encoder it actually chose, but this code does not work because ICodecAPI is undefined:
IMFTransform* pEncoder = NULL;
mWriter->GetServiceForStream(MF_SOURCE_READER_FIRST_VIDEO_STREAM, GUID_NULL, IID_IMFTransform, (void**)&pEncoder);
if (pEncoder)
{
ICodecAPI* pCodecApi = NULL;
hr = pEncoder->QueryInterface<ICodecAPI>(&pCodecApi);
}
How can I choose the encoder, or find out which encoder was actually chosen?
Media Foundation does not offer the flexibility to specify an encoder when using the Sink Writer API. You can only instruct it to use or not to use a hardware encoder, via the MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS attribute:
Enables the source reader or sink writer to use hardware-based Media Foundation transforms (MFTs).
Once the Sink Writer is set up, you can use IMFSinkWriterEx::GetTransformForStream to enumerate the transforms the API prepared for the processing and pick the encoder from that enumeration. This will give you an idea of which encoder is actually used.
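For example, a rough sketch (assuming mWriter is your initialized IMFSinkWriter and streamIndex is the video stream index; not tested under UWP):
IMFSinkWriterEx* pWriterEx = NULL;
HRESULT hr = mWriter->QueryInterface(IID_PPV_ARGS(&pWriterEx));
if (SUCCEEDED(hr))
{
    for (DWORD i = 0; ; i++)
    {
        GUID category = GUID_NULL;
        IMFTransform* pTransform = NULL;
        // Fails with MF_E_INVALIDINDEX once there are no more transforms.
        if (FAILED(pWriterEx->GetTransformForStream(streamIndex, i, &category, &pTransform)))
            break;
        if (category == MFT_CATEGORY_VIDEO_ENCODER)
        {
            // This is the encoder the Sink Writer picked; inspect its
            // attributes (IMFTransform::GetAttributes) here.
        }
        pTransform->Release();
    }
    pWriterEx->Release();
}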
The Media Foundation Sink Writer API reserves the right to decide which encoder to use. Typically it prefers a certified, compatible encoder, especially if you enable the Direct3D scenario.
Finally, I am not sure which of these is available for C++/CX, but your code snippets suggest that the mentioned API is available.
To use an encoder of your choice, you are supposed to use the Media Foundation Media Session API, as opposed to the Sink Writer.
Thank you Roman. I tried GetTransformForStream. With the NVIDIA driver I get the following attributes for the IMFTransform:
{206B4FC8-FCF9-4C51-AFE3-9764369E33A0}=1,
{2FB866AC-B078-4942-AB6C-003D05CDA674}=NVIDIA HEVC Encoder MFT,
FRIENDLY_NAME_Attribute=NVIDIA HEVC Encoder MFT,
{3AECB0CC-035B-4BCC-8185-2B8D551EF3AF}=VEN_10DE,
MAJOR_TYPE=Video,
{53476A11-3F13-49FB-AC42-EE2733C96741}=1,
{86A355AE-3A77-4EC4-9F31-01149A4E92DE}=1,
{88A7CB15-7B07-4A34-9128-E64C6703C4D3}=8,
{E3F2E203-D445-4B8C-9211-AE390D3BA017}=2303214,
{E5666D6B-3422-4EB6-A421-DA7DB1F8E207}=1,
{F34B9093-05E0-4B16-993D-3E2A2CDE6AD3}=860522,
SUBTYPE=Base,
{F81A699A-649A-497D-8C73-29F8FED6AD7A}=1,
When disabling the NVIDIA driver I only get:
{86A355AE-3A77-4EC4-9F31-01149A4E92DE}=1
I wonder if the last transform is a list of several transforms? How would I get them? Can I traverse the topology from the Sink Writer?
My PC has the following codecs I could use:
{966F107C-8EA2-425D-B822-E4A71BEF01D7} // "NVIDIA HEVC Encoder MFT"
{F2F84074-8BCA-40BD-9159-E880F673DD3B} // "H265 Encoder MFT"
{BC10864D-2B34-408F-912A-102B1B867B6C} // "Intel® Hardware H265 Encoder MFT"
In the NVIDIA case I get a meaningful string, but apparently not when the encoder isn't NVIDIA (Intel or software).
Now I will also look into the Media Session API as you suggested.

Encode Audio using Sink Writer

I found this article that explains how to encode video using Media Foundation.
I am trying to encode audio using the approach from that article.
I am stuck on setting a correct input media type for the sink writer.
Here is that part:
if (SUCCEEDED(hr))
hr = MFCreateMediaType(&pMediaTypeIn);
if (SUCCEEDED(hr))
hr = pMediaTypeIn->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio);
if (SUCCEEDED(hr))
hr = pMediaTypeIn->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_Float);
if (SUCCEEDED(hr))
hr = pMediaTypeIn->SetUINT32(MF_MT_AUDIO_NUM_CHANNELS, cChannels);
if (SUCCEEDED(hr))
hr = pMediaTypeIn->SetUINT32(MF_MT_AUDIO_SAMPLES_PER_SECOND, sampleRate);
if (SUCCEEDED(hr))
hr = pSinkWriter->SetInputMediaType(streamIndex, pMediaTypeIn, NULL);
My code fails on the last line while setting the input media type.
Any help, how to handle this?
Thanks.
You also need to supply MF_MT_AUDIO_BITS_PER_SAMPLE in the media type. It must be set to 16. Check the input type requirements here: https://msdn.microsoft.com/en-us/library/windows/desktop/dd742785(v=vs.85).aspx. The sample rate must be either 44100 or 48000, and the subtype must be MFAudioFormat_PCM.
The most important thing is that you need to call AddStream on the Sink Writer, prior to calling SetInputMediaType, with an output audio type enumerated by a call to the MFTranscodeGetAudioOutputAvailableTypes function with the MFAudioFormat_AAC subtype, if you need to encode the audio as AAC.
MFCreateSinkWriterFromURL may not be able to guess that you need to encode to an MP4 container from the .m4a extension. You might need to supply the MFTranscodeContainerType_MPEG4 container type using the MF_TRANSCODE_CONTAINERTYPE attribute when calling MFCreateSinkWriterFromURL. Another option would be to use MFCreateMPEG4MediaSink and MFCreateSinkWriterFromMediaSink instead.
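For illustration, a rough sketch of that order (assuming pSinkWriter has already been created, AAC output is the goal, and the channel count and sample rate values are placeholders):
IMFCollection *pAvailableTypes = NULL;
HRESULT hr = MFTranscodeGetAudioOutputAvailableTypes(
    MFAudioFormat_AAC, MFT_ENUM_FLAG_ALL, NULL, &pAvailableTypes);
IMFMediaType *pMediaTypeOut = NULL;
if (SUCCEEDED(hr))
{
    IUnknown *pUnk = NULL;
    hr = pAvailableTypes->GetElement(0, &pUnk); // first available AAC output type
    if (SUCCEEDED(hr))
    {
        hr = pUnk->QueryInterface(IID_PPV_ARGS(&pMediaTypeOut));
        pUnk->Release();
    }
}
DWORD streamIndex = 0;
if (SUCCEEDED(hr))
    hr = pSinkWriter->AddStream(pMediaTypeOut, &streamIndex); // AddStream comes first
IMFMediaType *pMediaTypeIn = NULL;
if (SUCCEEDED(hr))
    hr = MFCreateMediaType(&pMediaTypeIn);
if (SUCCEEDED(hr))
    hr = pMediaTypeIn->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio);
if (SUCCEEDED(hr))
    hr = pMediaTypeIn->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_PCM);
if (SUCCEEDED(hr))
    hr = pMediaTypeIn->SetUINT32(MF_MT_AUDIO_BITS_PER_SAMPLE, 16);
if (SUCCEEDED(hr))
    hr = pMediaTypeIn->SetUINT32(MF_MT_AUDIO_NUM_CHANNELS, 2);
if (SUCCEEDED(hr))
    hr = pMediaTypeIn->SetUINT32(MF_MT_AUDIO_SAMPLES_PER_SECOND, 44100);
if (SUCCEEDED(hr))
    hr = pSinkWriter->SetInputMediaType(streamIndex, pMediaTypeIn, NULL);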

Windows Audio Endpoint API. Getting the names of my Audio Devices

My main goal at the moment is to get detailed information about all of the local machine's Audio Endpoint Devices. That is the objects representing the audio peripherals. I want to be able to choose which device to record from based on some logic (or eventually allow the user to manually do so).
Here's what I've got so far. I'm pretty new to c++ so dealing with all of these abstract classes is getting a bit tricky so feel free to comment on code quality as well.
//Create vector of IMMDevices
UINT endpointCount = NULL;
(*pCollection).GetCount(&endpointCount);
std::vector<IMMDevice**> IMMDevicePP; //IMMDevice seems to contain all endpoint devices, so why have a collection here?
for (UINT i = 0; i < (endpointCount); i++)
{
IMMDevice* pp = NULL;
(*pCollection).Item(i, &pp);
IMMDevicePP.assign(1, &pp);
}
My more technical goal at present is to get objects that implement this interface: http://msdn.microsoft.com/en-us/library/windows/desktop/dd371414(v=vs.85).aspx
This is a type that is supposed to represent a single Audio Endpoint device whereas the IMMDevice seems to contain a collection of devices. However IMMEndpoint only contains a method called GetDataFlow so I'm unsure if that will help me. Again the goal is to easily select which endpoint device to record and stream audio from.
Any suggestions? Am I using the wrong API? This API definitely has good commands for the actual streaming and sampling of the audio but I'm a bit lost as to how to make sure I'm using the desired device.
WASAPI will allow you to do what you need, so you're using the right API. You're mistaken about IMMDevice representing a collection of audio devices though; that is IMMDeviceCollection. IMMDevice represents a single audio device. By "device", WASAPI doesn't mean audio card as you might expect, rather it means a single input/output on such a card. For example, an audio card with analog in/out plus a digital out will show up as 3 IMMDevices, each with its own IMMEndpoint. I'm not sure what detailed info you're after, but it seems to me IMMDevice will provide you with everything you need. Basically, you'll want to do something like this:
Create an IMMDeviceEnumerator
Call EnumAudioEndpoints specifying render, capture or both, to enumerate into an IMMDeviceCollection
Obtain individual IMMDevice instances from IMMDeviceCollection
Device name and description can be queried from IMMDevice using OpenPropertyStore (http://msdn.microsoft.com/en-us/library/windows/desktop/dd370812%28v=vs.85%29.aspx). Additional supported device details can be found here: http://msdn.microsoft.com/en-us/library/windows/desktop/dd370794%28v=vs.85%29.aspx.
IMMDevice instances obtained from IMMDeviceCollection will also be instances of IMMEndpoint, use QueryInterface to switch between the two. However, as you noted, this will only tell you if you've got your hands on a render or capture device. Much easier to only ask for what you want directly on EnumAudioEndpoints.
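For illustration, a minimal sketch of those steps (error handling and Release calls omitted):
IMMDeviceEnumerator* pEnumerator = NULL;
IMMDeviceCollection* pCollection = NULL;
CoCreateInstance(__uuidof(MMDeviceEnumerator), NULL, CLSCTX_ALL,
                 __uuidof(IMMDeviceEnumerator), (void**)&pEnumerator);
// Ask only for capture endpoints (use eRender for outputs, eAll for both).
pEnumerator->EnumAudioEndpoints(eCapture, DEVICE_STATE_ACTIVE, &pCollection);
UINT count = 0;
pCollection->GetCount(&count);
std::vector<IMMDevice*> devices;
for (UINT i = 0; i < count; i++)
{
    IMMDevice* pDevice = NULL;
    pCollection->Item(i, &pDevice);
    devices.push_back(pDevice); // push_back, not assign (see below)
}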
About code quality: use x->f() instead of (*x).f(); although it's technically the same thing, the -> operator is the common way to call a function through an object pointer.
Don't use vector::assign; it replaces the contents of the entire vector on each call, so you'll end up with a collection of size 1 regardless of the number of available devices. Use push_back instead.
After enumerating your IMMDevices as Sjoerd stated, you must retrieve the IPropertyStore
information for the device. From there you have to extract the PROPVARIANT object as such:
PROPERTYKEY key;
HRESULT keyResult = (*IMMDeviceProperties[i]).GetAt(p, &key);
then
PROPVARIANT propVari;
HRESULT propVariResult = (*IMMDeviceProperties[i]).GetValue(key, &propVari);
according to these documents:
http://msdn.microsoft.com/en-us/library/windows/desktop/bb761471(v=vs.85).aspx
http://msdn.microsoft.com/en-us/library/windows/desktop/aa380072(v=vs.85).aspx
And finally, to navigate the large PROPVARIANT structure and get the friendly name of the audio endpoint device, simply access the pwszVal member of the PROPVARIANT structure, as illustrated here:
http://msdn.microsoft.com/en-us/library/windows/desktop/dd316594(v=vs.85).aspx
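Putting those pieces together, a minimal sketch (assuming pDevice is one of the IMMDevice instances obtained from the enumeration; PKEY_Device_FriendlyName comes from functiondiscoverykeys_devpkey.h):
IPropertyStore *pProps = NULL;
HRESULT hr = pDevice->OpenPropertyStore(STGM_READ, &pProps);
PROPVARIANT propVari;
PropVariantInit(&propVari);
if (SUCCEEDED(hr))
    hr = pProps->GetValue(PKEY_Device_FriendlyName, &propVari);
if (SUCCEEDED(hr))
    wprintf(L"Endpoint: %s\n", propVari.pwszVal); // friendly name lives in pwszVal
PropVariantClear(&propVari);
if (pProps) pProps->Release();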
All about finding the right documentation!

Get encoder name from SinkWriter or ICodecAPI or IMFTransform

I'm using the SinkWriter in order to encode video using media foundation.
After I initialize the SinkWriter, I would like to get the underlying encoder it uses and print out its name, so I can see which encoder was picked. (In my case, the encoder is most probably the H.264 Video Encoder included in MF.)
I can get references to the encoder's ICodecAPI and IMFTransform interface (using pSinkWriter->GetServiceForStream), but I don't know how to get the encoder's friendly name using those interfaces.
Does anyone know how to get the encoder's friendly name from the sinkwriter? Or from its ICodecAPI or IMFTransform interface?
This is far from an effective solution and I am not 100% sure it works, but here is what could be done:
1) At start-up, enumerate all the codecs that could be used (as I understand it, in this case H.264 encoders) and subscribe to a setting-change event:
MFT_REGISTER_TYPE_INFO TransformationOutput = { MFMediaType_Video, MFVideoFormat_H264 };
UINT32 nFlags = 0; // the Flags parameter of MFTEnum is reserved and must be zero
UINT32 nCount = 0;
CLSID* pClsids = NULL;
MFTEnum(MFT_CATEGORY_VIDEO_ENCODER, nFlags, NULL, &TransformationOutput, NULL, &pClsids, &nCount);
// Ok, here we assume nCount is 1 and we got the MS encoder
ICodecAPI *pMsEncoder = NULL;
HRESULT hr = CoCreateInstance(pClsids[0], NULL, CLSCTX_INPROC_SERVER, __uuidof(ICodecAPI), (void**)&pMsEncoder);
// nCodecIds is supposed to be an array of identifiers to distinguish the sender
hr = pMsEncoder->RegisterForEvent(&CODECAPI_AVEncVideoOutputFrameRate, (LONG_PTR)&nCodecIds[0]);
2) I am not 100% sure whether the frame rate setting is also applied when the input media type for the stream is set, but you can try to set the same property on the ICodecAPI you retrieved from the SinkWriter. Then, after getting the event, you should be able to identify the codec by comparing lParam1 to the value you passed. This is still a very poor approach, since it relies on all the encoders supporting the event notification, and it requires an unneeded parameter change if my hypothesis about the event being generated on stream construction is wrong.
Having an IMFTransform alone does not give you the friendly name of the encoder.
One of the options you have is to check the transform's output type and compare it to well-known GUIDs to identify the encoder; in particular, you are going to have a subtype of MFVideoFormat_H264 with the H264 Encoder MFT.
Another option is to obtain the CLSID of the encoder (IMFTransform does not give it to you directly, but you might have it otherwise, such as via IMFActivate, by querying the MFT_TRANSFORM_CLSID_Attribute attribute, or via the IPersist* interfaces). Then you could look the CLSID up in the registry for a friendly name, or enumerate the transforms and find yours in that list by comparing CLSIDs.
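As a rough sketch of that second option (assuming pEncoder is the IMFTransform you got from GetServiceForStream, and that it actually exposes MFT_TRANSFORM_CLSID_Attribute, which not every encoder does), you could then ask MFTGetInfo for the registered friendly name:
IMFAttributes* pAttributes = NULL;
GUID clsidEncoder = GUID_NULL;
HRESULT hr = pEncoder->GetAttributes(&pAttributes);
if (SUCCEEDED(hr))
    hr = pAttributes->GetGUID(MFT_TRANSFORM_CLSID_Attribute, &clsidEncoder);
LPWSTR pszFriendlyName = NULL;
if (SUCCEEDED(hr))
    // Looks the CLSID up in the MFT registration and returns its friendly name.
    hr = MFTGetInfo(clsidEncoder, &pszFriendlyName, NULL, NULL, NULL, NULL, NULL);
if (SUCCEEDED(hr))
{
    wprintf(L"Encoder: %s\n", pszFriendlyName);
    CoTaskMemFree(pszFriendlyName);
}
if (pAttributes) pAttributes->Release();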

Windows Phone 8 audio balance change

Please advise whether it's possible to programmatically change the global
audio balance (left/right volume).
I've tried the SetBalance method, but it doesn't change the
audio balance of the currently playing audio stream (using the standard media player app).
The code below is what I pasted into a sample WP8 app. (No, it didn't work.)
Microsoft::WRL::ComPtr<IMFMediaEngineClassFactory> spFactory;
Microsoft::WRL::ComPtr<IMFAttributes> spAttributes;
CoCreateInstance(CLSID_MFMediaEngineClassFactory, nullptr, CLSCTX_INPROC_SERVER, IID_PPV_ARGS(&spFactory));
Microsoft::WRL::ComPtr<IMFMediaEngine> m_spMediaEngine;
Microsoft::WRL::ComPtr<IMFMediaEngineEx> m_spEngineEx;
//MFCreateAttributes(&spAttributes, 1);
const DWORD flags = MF_MEDIA_ENGINE_WAITFORSTABLE_STATE;
spFactory->CreateInstance(flags, spAttributes.Get(), &m_spMediaEngine);
// Query for the Ex interface (note: IMFMediaEngineEx, not IMFMediaEngine).
m_spMediaEngine.As(&m_spEngineEx);
HRESULT res = m_spEngineEx->SetBalance(-1.0);
Thanks in advance!
UPD:
Is this method applicable solely to "my" sound stream, i.e. the one I've created, for example, in my own music player? And does this method have nothing to do with sound streams not owned by my app?