How to get width and height of directshow webcam video stream - c++

I found a bit of code that gets me access to the raw pixel data from my webcam. However I need to know the image width, height, pixel format and preferably the data stride(pitch, memory padding or whatever you want to call it) if its ever gonna be something other than the width * bytes per pixel
#include <windows.h>
#include <dshow.h>
#pragma comment(lib,"Strmiids.lib")
#define DsHook(a,b,c) if (!c##_) { INT_PTR* p=b+*(INT_PTR**)a; VirtualProtect(&c##_,4,PAGE_EXECUTE_READWRITE,&no);\
*(INT_PTR*)&c##_=*p; VirtualProtect(p, 4,PAGE_EXECUTE_READWRITE,&no); *p=(INT_PTR)c; }
// Here you get image video data in buf / len. Process it before calling Receive_ because renderer dealocates it.
HRESULT ( __stdcall * Receive_ ) ( void* inst, IMediaSample *smp ) ;
HRESULT __stdcall Receive ( void* inst, IMediaSample *smp ) {
BYTE* buf; smp->GetPointer(&buf); DWORD len = smp->GetActualDataLength();
//AM_MEDIA_TYPE* info;
//smp->GetMediaType(&info);
HRESULT ret = Receive_ ( inst, smp );
return ret;
}
int WINAPI WinMain(HINSTANCE inst,HINSTANCE prev,LPSTR cmd,int show){
HRESULT hr = CoInitialize(0); MSG msg={0}; DWORD no;
IGraphBuilder* graph= 0; hr = CoCreateInstance( CLSID_FilterGraph, 0, CLSCTX_INPROC,IID_IGraphBuilder, (void **)&graph );
IMediaControl* ctrl = 0; hr = graph->QueryInterface( IID_IMediaControl, (void **)&ctrl );
ICreateDevEnum* devs = 0; hr = CoCreateInstance (CLSID_SystemDeviceEnum, 0, CLSCTX_INPROC, IID_ICreateDevEnum, (void **) &devs);
IEnumMoniker* cams = 0; hr = devs?devs->CreateClassEnumerator (CLSID_VideoInputDeviceCategory, &cams, 0):0;
IMoniker* mon = 0; hr = cams->Next (1,&mon,0); // get first found capture device (webcam?)
IBaseFilter* cam = 0; hr = mon->BindToObject(0,0,IID_IBaseFilter, (void**)&cam);
hr = graph->AddFilter(cam, L"Capture Source"); // add web cam to graph as source
IEnumPins* pins = 0; hr = cam?cam->EnumPins(&pins):0; // we need output pin to autogenerate rest of the graph
IPin* pin = 0; hr = pins?pins->Next(1,&pin, 0):0; // via graph->Render
hr = graph->Render(pin); // graph builder now builds whole filter chain including MJPG decompression on some webcams
IEnumFilters* fil = 0; hr = graph->EnumFilters(&fil); // from all newly added filters
IBaseFilter* rnd = 0; hr = fil->Next(1,&rnd,0); // we find last one (renderer)
hr = rnd->EnumPins(&pins); // because data we are intersted in are pumped to renderers input pin
hr = pins->Next(1,&pin, 0); // via Receive member of IMemInputPin interface
IMemInputPin* mem = 0; hr = pin->QueryInterface(IID_IMemInputPin,(void**)&mem);
DsHook(mem,6,Receive); // so we redirect it to our own proc to grab image data
hr = ctrl->Run();
while ( GetMessage( &msg, 0, 0, 0 ) ) {
TranslateMessage( &msg );
DispatchMessage( &msg );
}
};
Bonus points if you can tell me how get this thing not to render a window but still get me access to the image data.

That's really ugly. Please don't do that. Insert a pass-through filter like the sample grabber instead (as I replied to your other post on the same topic). Connecting the sample grabber to the null renderer gets you the bits in a clean, safe way without rendering the image.
To get the stride, you need to get the media type, either through ISampleGrabber or IPin::ConnectionMediaType. The format block will be either a VIDEOINFOHEADER or a VIDEOINFOHEADER2 (check the format GUID). The bitmapinfo header biWidth and biHeight defines the bitmap dimensions (and hence stride). If the RECT is not empty, then that defines the relevant image area within the bitmap.
I'm going to have to wash my hands now after touching this post.

I am sorry for you. When the interface was created there were probably not the best programmer to it.
// Here you get image video data in buf / len. Process it before calling Receive_ because renderer dealocates it.
BITMAPINFOHEADER bmpInfo; // current bitmap header info
int stride;
HRESULT ( __stdcall * Receive_ ) ( void* inst, IMediaSample *smp ) ;
HRESULT __stdcall Receive ( void* inst, IMediaSample *smp )
{
BYTE* buf; smp->GetPointer(&buf); DWORD len = smp->GetActualDataLength();
HRESULT ret = Receive_ ( inst, smp );
AM_MEDIA_TYPE* info;
HRESULT hr = smp->GetMediaType(&info);
if ( hr != S_OK )
{ //TODO: error }
else
{
if ( type->formattype == FORMAT_VideoInfo )
{
const VIDEOINFOHEADER * vi = reinterpret_cast<VIDEOINFOHEADER*>( type->pbFormat );
const BITMAPINFOHEADER & bmiHeader = vi->bmiHeader;
//! now the bmiHeader.biWidth contains the data stride
stride = bmiHeader.biWidth;
bmpInfo = bmiHeader;
int width = ( vi->rcTarget.right - vi->rcTarget.left );
//! replace the data stride be the actual width
if ( width != 0 )
bmpInfo.biWidth = width;
}
else
{ // unsupported format }
}
DeleteMediaType( info );
return ret;
}

Here's how to add the Null Renderer that suppresses the rendering window. Add directly after creating the IGraphBuilder*
//create null renderer and add null renderer to graph
IBaseFilter *m_pNULLRenderer; hr = CoCreateInstance(CLSID_NullRenderer, NULL, CLSCTX_INPROC_SERVER, IID_IBaseFilter, (void **)&m_pNULLRenderer);
hr = graph->AddFilter(m_pNULLRenderer, L"Null Renderer");
That dshook hack is the only elegant directshow code of which I am aware.
In my experience, the DirectShow API is a convoluted nightmare, requiring hundreds of lines of code to do even the simplest operation, and adapting a whole programming paradigm in order to access your web camera. So if this code does the job for you, as it did for me, use it and enjoy fewer lines of code to maintain.

Related

How to increase mp3 decoding quality (Media Foundation)?

I have file .wav that I need to convert in .mp3 in order to do it I am using MediaFoundation. This is approach that I use:
#include "TV_AudioEncoderMF.h"
#include <windows.h>
#include <windowsx.h>
#include <atlstr.h>
#include <comdef.h>
#include <exception>
#include <mfapi.h>
#include <mfplay.h>
#include <mfreadwrite.h>
#include <mmdeviceapi.h>
#include <Audioclient.h>
#include <mferror.h>
#include <Wmcodecdsp.h>
#pragma comment(lib, "mf.lib")
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfplay.lib")
#pragma comment(lib, "mfreadwrite.lib")
#pragma comment(lib, "mfuuid.lib")
#pragma comment(lib, "wmcodecdspuuid")
TV_AudioEncoderMF::TV_AudioEncoderMF()
{
}
TV_AudioEncoderMF::~TV_AudioEncoderMF()
{
}
template <class T> void SafeRelease(T **ppT)
{
if (*ppT)
{
(*ppT)->Release();
*ppT = nullptr;
}
}
HRESULT TV_AudioEncoderMF::GetOutputMediaTypes(
GUID cAudioFormat,
UINT32 cSampleRate,
UINT32 cBitPerSample,
UINT32 cChannels,
IMFMediaType **ppType
)
{
// Enumerate all codecs except for codecs with field-of-use restrictions.
// Sort the results.
DWORD dwFlags =
(MFT_ENUM_FLAG_ALL & (~MFT_ENUM_FLAG_FIELDOFUSE)) |
MFT_ENUM_FLAG_SORTANDFILTER;
IMFCollection *pAvailableTypes = NULL; // List of audio media types.
IMFMediaType *pAudioType = NULL; // Corresponding codec.
HRESULT hr = MFTranscodeGetAudioOutputAvailableTypes(
cAudioFormat,
dwFlags,
NULL,
&pAvailableTypes
);
// Get the element count.
DWORD dwMTCount;
hr = pAvailableTypes->GetElementCount(&dwMTCount);
// Iterate through the results and check for the corresponding codec.
for (DWORD i = 0; i < dwMTCount; i++)
{
hr = pAvailableTypes->GetElement(i, (IUnknown**)&pAudioType);
GUID majorType;
hr = pAudioType->GetMajorType(&majorType);
GUID subType;
hr = pAudioType->GetGUID(MF_MT_SUBTYPE, &subType);
if (majorType != MFMediaType_Audio || subType != MFAudioFormat_FLAC)
{
continue;
}
UINT32 sampleRate = NULL;
hr = pAudioType->GetUINT32(
MF_MT_AUDIO_SAMPLES_PER_SECOND,
&sampleRate
);
UINT32 bitRate = NULL;
hr = pAudioType->GetUINT32(
MF_MT_AUDIO_BITS_PER_SAMPLE,
&bitRate
);
UINT32 channels = NULL;
hr = pAudioType->GetUINT32(
MF_MT_AUDIO_NUM_CHANNELS,
&channels
);
if (sampleRate == cSampleRate
&& bitRate == cBitPerSample
&& channels == cChannels)
{
// Found the codec.
// Jump out!
break;
}
}
// Add the media type to the caller
*ppType = pAudioType;
(*ppType)->AddRef();
SafeRelease(&pAudioType);
return hr;
}
void TV_AudioEncoderMF::decode()
{
HRESULT hr = S_OK;
// Initialize com interface
CoInitializeEx(0, COINIT_MULTITHREADED);
// Start media foundation
MFStartup(MF_VERSION);
IMFMediaType *pInputType = NULL;
IMFSourceReader *pSourceReader = NULL;
IMFMediaType *pOuputMediaType = NULL;
IMFSinkWriter *pSinkWriter = NULL;
// Create source reader
hr = MFCreateSourceReaderFromURL(
L"D:\\buffer\\del\\out\\test.wav",
NULL,
&pSourceReader
);
// Create sink writer
hr = MFCreateSinkWriterFromURL(
L"D:\\buffer\\del\\out\\test_out.mp3",
NULL,
NULL,
&pSinkWriter
);
// Get media type from source reader
hr = pSourceReader->GetCurrentMediaType(
MF_SOURCE_READER_FIRST_AUDIO_STREAM,
&pInputType
);
// Get sample rate, bit rate and channels
UINT32 sampleRate = NULL;
hr = pInputType->GetUINT32(
MF_MT_AUDIO_SAMPLES_PER_SECOND,
&sampleRate
);
UINT32 bitRate = NULL;
hr = pInputType->GetUINT32(
MF_MT_AUDIO_BITS_PER_SAMPLE,
&bitRate
);
UINT32 channels = NULL;
hr = pInputType->GetUINT32(
MF_MT_AUDIO_NUM_CHANNELS,
&channels
);
// Try to find a media type that is fitting.
hr = GetOutputMediaTypes(
MFAudioFormat_MP3,
sampleRate,
bitRate,
channels,
&pOuputMediaType);
DWORD dwWriterStreamIndex = -1;
// Add the stream
hr = pSinkWriter->AddStream(
pOuputMediaType,
&dwWriterStreamIndex
);
// Set input media type
hr = pSinkWriter->SetInputMediaType(
dwWriterStreamIndex,
pInputType,
NULL
);
// Tell the sink writer to accept data
hr = pSinkWriter->BeginWriting();
// Forever alone loop
while (true)
{
DWORD nStreamIndex, nStreamFlags;
LONGLONG nTime;
IMFSample *pSample;
// Read through the samples until...
hr = pSourceReader->ReadSample(
MF_SOURCE_READER_FIRST_AUDIO_STREAM,
0,
&nStreamIndex,
&nStreamFlags,
&nTime,
&pSample);
if (pSample)
{
hr = pSinkWriter->WriteSample(
dwWriterStreamIndex,
pSample
);
}
// ... we are at the end of the stream...
if (nStreamFlags & MF_SOURCE_READERF_ENDOFSTREAM)
{
// ... and jump out.
break;
}
}
// Call finalize to finish writing.
hr = pSinkWriter->Finalize();
// Done :D
}
Problem is - it is a big difference in audio quality, when I playback (by win standard players) .wav file it sounds good, but when I playback compressed to .mp3 file sound it sounds like person recorded his voice on recorder with a very bad quality.
What is a possible problem here? I don't see any possible way to set out quality, like setOutQualityInPersent(100)
EDIT
void co_AudioEncoderMF::decode()
{
HRESULT hr = S_OK;
// Initialize com interface
CoInitializeEx(0, COINIT_MULTITHREADED);
// Start media foundation
MFStartup(MF_VERSION);
IMFMediaType *pInputType = NULL;
IMFSourceReader *pSourceReader = NULL;
IMFMediaType *pOuputMediaType = NULL;
IMFSinkWriter *pSinkWriter = NULL;
// Create source reader
hr = MFCreateSourceReaderFromURL(
L"D:\\buffer\\del\\out\\test.wav",
NULL,
&pSourceReader
);
// Create sink writer
hr = MFCreateSinkWriterFromURL(
L"D:\\buffer\\del\\out\\test_out.mp3",
NULL,
NULL,
&pSinkWriter
);
// Get media type from source reader
hr = pSourceReader->GetCurrentMediaType(
MF_SOURCE_READER_FIRST_AUDIO_STREAM,
&pInputType
);
// Get sample rate, bit rate and channels
UINT32 sampleRate = NULL;
hr = pInputType->GetUINT32(
MF_MT_AUDIO_SAMPLES_PER_SECOND,
&sampleRate
);
UINT32 bitRate = NULL;
hr = pInputType->GetUINT32(
MF_MT_AUDIO_BITS_PER_SAMPLE,
&bitRate
);
UINT32 channels = NULL;
hr = pInputType->GetUINT32(
MF_MT_AUDIO_NUM_CHANNELS,
&channels
);
// Try to find a media type that is fitting.
hr = GetOutputMediaTypes(
MFAudioFormat_MP3,
sampleRate,
bitRate,
channels,
&pOuputMediaType);
bitRate = bitRate + 2; <------- This line
pOuputMediaType->SetUINT32(MF_MT_AUDIO_BITS_PER_SAMPLE, bitRate); <------- This line
DWORD dwWriterStreamIndex = -1;
// Add the stream
hr = pSinkWriter->AddStream(
pOuputMediaType,
&dwWriterStreamIndex
);
// Set input media type
hr = pSinkWriter->SetInputMediaType(
dwWriterStreamIndex,
pInputType,
NULL
);
// Tell the sink writer to accept data
hr = pSinkWriter->BeginWriting();
// Forever alone loop
while (true)
{
DWORD nStreamIndex, nStreamFlags;
LONGLONG nTime;
IMFSample *pSample;
// Read through the samples until...
hr = pSourceReader->ReadSample(
MF_SOURCE_READER_FIRST_AUDIO_STREAM,
0,
&nStreamIndex,
&nStreamFlags,
&nTime,
&pSample);
if (pSample)
{
hr = pSinkWriter->WriteSample(
dwWriterStreamIndex,
pSample
);
}
// ... we are at the end of the stream...
if (nStreamFlags & MF_SOURCE_READERF_ENDOFSTREAM)
{
// ... and jump out.
break;
}
}
// Call finalize to finish writing.
hr = pSinkWriter->Finalize();
// Done :D
}
EDIT2
There are 2 files - https://drive.google.com/drive/folders/1yzB2u0TvMSnwsTpYnDDPFBDkTB75ZFwM?usp=sharing
Result and orig
This part is just broken:
// Try to find a media type that is fitting.
hr = GetOutputMediaTypes(
MFAudioFormat_MP3,
sampleRate,
bitRate,
channels,
&pOuputMediaType);
bitRate = bitRate + 2; <------- This line
pOuputMediaType->SetUINT32(MF_MT_AUDIO_BITS_PER_SAMPLE, bitRate); <------- This line
To get you back on track, replace the fragment above with:
MFCreateMediaType(&pOuputMediaType);
pOuputMediaType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio);
pOuputMediaType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_MP3);
pOuputMediaType->SetUINT32(MF_MT_AUDIO_AVG_BYTES_PER_SECOND, 128000 / 8);
pOuputMediaType->SetUINT32(MF_MT_AUDIO_NUM_CHANNELS, channels);
pOuputMediaType->SetUINT32(MF_MT_AUDIO_SAMPLES_PER_SECOND, sampleRate);
and you'll start getting proper MP3.
Note that the attributes above are taken directly from documentation: MP3 Audio Encoder. In your application you will need to make sure that target values remain valid and match the documented options. You might need to resample audio, for example.

How to use IAudioClient3 (WASAPI) with Real-Time Work Queue API

I'm working on a lowest-possible latency MIDI synthetizer software. I'm aware of ASIO and other alternatives, but as they have apparently made significant improvements to the WASAPI stack (in shared mode, at least), I'm curious to try it out. I first wrote a simple event-driven version of program, but as that's not the recommended way to do low-latency audio on Windows 10 (according to the docs), I'm trying to migrate to the Real-Time Work Queue API.
The documentation on Low Latency Audio states that it is recommended to use the Real-Time Work Queue API or MFCreateMFByteStreamOnStreamEx with WASAPI in order for the OS to manage work items in a way that will avoid interference from non-audio subsystems. This seems like a good idea, but the latter option seems to require some managed code (demonstrated in this WindowsAudioSession example), which I know nothing about and would preferably avoid (also the header Robytestream.h which has defs for the IRandomAccessStream isn't found on my system either).
The RTWQ example included in the docs is incomplete (doesn't compile as such), and I have made the necessary additions to make it compilable:
class my_rtqueue : IRtwqAsyncCallback {
public:
IRtwqAsyncResult* pAsyncResult;
RTWQWORKITEM_KEY workItemKey;
DWORD WorkQueueId;
STDMETHODIMP GetParameters(DWORD* pdwFlags, DWORD* pdwQueue)
{
HRESULT hr = S_OK;
*pdwFlags = 0;
*pdwQueue = WorkQueueId;
return hr;
}
//-------------------------------------------------------
STDMETHODIMP Invoke(IRtwqAsyncResult* pAsyncResult)
{
HRESULT hr = S_OK;
IUnknown* pState = NULL;
WCHAR className[20];
DWORD bufferLength = 20;
DWORD taskID = 0;
LONG priority = 0;
BYTE* pData;
hr = render_info.renderclient->GetBuffer(render_info.buffer_framecount, &pData);
ERROR_EXIT(hr);
update_buffer((unsigned short*)pData, render_info.framesize_bytes / (2*sizeof(unsigned short))); // 2 channels, sizeof(unsigned short) == 2
hr = render_info.renderclient->ReleaseBuffer(render_info.buffer_framecount, 0);
ERROR_EXIT(hr);
return S_OK;
}
STDMETHODIMP QueryInterface(const IID &riid, void **ppvObject) {
return 0;
}
ULONG AddRef() {
return 0;
}
ULONG Release() {
return 0;
}
HRESULT queue(HANDLE event) {
HRESULT hr;
hr = RtwqPutWaitingWorkItem(event, 1, this->pAsyncResult, &this->workItemKey);
return hr;
}
my_rtqueue() : workItemKey(0) {
HRESULT hr = S_OK;
IRtwqAsyncCallback* callback = NULL;
DWORD taskId = 0;
WorkQueueId = RTWQ_MULTITHREADED_WORKQUEUE;
//WorkQueueId = RTWQ_STANDARD_WORKQUEUE;
hr = RtwqLockSharedWorkQueue(L"Pro Audio", 0, &taskId, &WorkQueueId);
ERROR_THROW(hr);
hr = RtwqCreateAsyncResult(NULL, reinterpret_cast<IRtwqAsyncCallback*>(this), NULL, &pAsyncResult);
ERROR_THROW(hr);
}
int stop() {
HRESULT hr;
if (pAsyncResult)
pAsyncResult->Release();
if (0xFFFFFFFF != this->WorkQueueId) {
hr = RtwqUnlockWorkQueue(this->WorkQueueId);
if (FAILED(hr)) {
printf("Failed with RtwqUnlockWorkQueue 0x%x\n", hr);
return 0;
}
}
return 1;
}
};
And so, the actual WASAPI code (HRESULT error checking is omitted for clarity):
void thread_main(LPVOID param) {
HRESULT hr;
REFERENCE_TIME hnsRequestedDuration = 0;
IMMDeviceEnumerator* pEnumerator = NULL;
IMMDevice* pDevice = NULL;
IAudioClient3* pAudioClient = NULL;
IAudioRenderClient* pRenderClient = NULL;
WAVEFORMATEX* pwfx = NULL;
HANDLE hEvent = NULL;
HANDLE hTask = NULL;
UINT32 bufferFrameCount;
BYTE* pData;
DWORD flags = 0;
hr = RtwqStartup();
// also, hr is checked for errors every step of the way
hr = CoInitialize(NULL);
hr = CoCreateInstance(
CLSID_MMDeviceEnumerator, NULL,
CLSCTX_ALL, IID_IMMDeviceEnumerator,
(void**)&pEnumerator);
hr = pEnumerator->GetDefaultAudioEndpoint(
eRender, eConsole, &pDevice);
hr = pDevice->Activate(
IID_IAudioClient, CLSCTX_ALL,
NULL, (void**)&pAudioClient);
WAVEFORMATEX wave_format = {};
wave_format.wFormatTag = WAVE_FORMAT_PCM;
wave_format.nChannels = 2;
wave_format.nSamplesPerSec = 48000;
wave_format.nAvgBytesPerSec = 48000 * 2 * 16 / 8;
wave_format.nBlockAlign = 2 * 16 / 8;
wave_format.wBitsPerSample = 16;
UINT32 DP, FP, MINP, MAXP;
hr = pAudioClient->GetSharedModeEnginePeriod(&wave_format, &DP, &FP, &MINP, &MAXP);
printf("DefaultPeriod: %u, Fundamental period: %u, min_period: %u, max_period: %u\n", DP, FP, MINP, MAXP);
hr = pAudioClient->InitializeSharedAudioStream(AUDCLNT_STREAMFLAGS_EVENTCALLBACK, MINP, &wave_format, 0);
my_rtqueue* workqueue = NULL;
try {
workqueue = new my_rtqueue();
}
catch (...) {
hr = E_ABORT;
ERROR_EXIT(hr);
}
hr = pAudioClient->GetBufferSize(&bufferFrameCount);
PWAVEFORMATEX wf = &wave_format;
UINT32 current_period;
pAudioClient->GetCurrentSharedModeEnginePeriod(&wf, &current_period);
INT32 FrameSize_bytes = bufferFrameCount * wave_format.nChannels * wave_format.wBitsPerSample / 8;
printf("bufferFrameCount: %u, FrameSize_bytes: %d, current_period: %u\n", bufferFrameCount, FrameSize_bytes, current_period);
hr = pAudioClient->GetService(
IID_IAudioRenderClient,
(void**)&pRenderClient);
render_info.framesize_bytes = FrameSize_bytes;
render_info.buffer_framecount = bufferFrameCount;
render_info.renderclient = pRenderClient;
hEvent = CreateEvent(nullptr, false, false, nullptr);
if (hEvent == INVALID_HANDLE_VALUE) { ERROR_EXIT(0); }
hr = pAudioClient->SetEventHandle(hEvent);
const size_t num_samples = FrameSize_bytes / sizeof(unsigned short);
DWORD taskIndex = 0;
hTask = AvSetMmThreadCharacteristics(TEXT("Pro Audio"), &taskIndex);
if (hTask == NULL) {
hr = E_FAIL;
}
hr = pAudioClient->Start(); // Start playing.
running = 1;
while (running) {
workqueue->queue(hEvent);
}
workqueue->stop();
hr = RtwqShutdown();
delete workqueue;
running = 0;
return 1;
}
This seems to kind of work (ie. audio is being output), but on every other invocation of my_rtqueue::Invoke(), IAudioRenderClient::GetBuffer() returns a HRESULT of 0x88890006 (-> AUDCLNT_E_BUFFER_TOO_LARGE), and the actual audio output is certainly not what I intend it to be.
What issues are there with my code? Is this the right way to use RTWQ with WASAPI?
Turns out there were a number of issues with my code, none of which had really anything to do with Rtwq. The biggest issue was me assuming that the shared mode audio stream was using 16-bit integer samples, when in reality my audio was setup for 32-bit float format (WAVE_FORMAT_IEEE_FLOAT). The currently active shared mode format, period etc. should be fetched like this:
WAVEFORMATEX *wavefmt = NULL;
UINT32 current_period = 0;
hr = pAudioClient->GetCurrentSharedModeEnginePeriod((WAVEFORMATEX**)&wavefmt, &current_period);
wavefmt now contains the output format info of the current shared mode. If the wFormatTag field is equal to WAVE_FORMAT_EXTENSIBLE, one needs to cast WAVEFORMATEX to WAVEFORMATEXTENSIBLE to see what the actual format is. After this, one needs to fetch the minimum period supported by the current setup, like so:
UINT32 DP, FP, MINP, MAXP;
hr = pAudioClient->GetSharedModeEnginePeriod(wavefmt, &DP, &FP, &MINP, &MAXP);
and then initialize the audio stream with the new InitializeSharedAudioStream function:
hr = pAudioClient->InitializeSharedAudioStream(AUDCLNT_STREAMFLAGS_EVENTCALLBACK, MINP, wavefmt, NULL);
... get the buffer's actual size:
hr = pAudioClient->GetBufferSize(&render_info.buffer_framecount);
and use GetCurrentPadding in the Get/ReleaseBuffer logic:
UINT32 pad = 0;
hr = render_info.audioclient->GetCurrentPadding(&pad);
int actual_size = (render_info.buffer_framecount - pad);
hr = render_info.renderclient->GetBuffer(actual_size, &pData);
if (SUCCEEDED(hr)) {
update_buffer((float*)pData, actual_size);
hr = render_info.renderclient->ReleaseBuffer(actual_size, 0);
ERROR_EXIT(hr);
}
The documentation for IAudioClient::Initialize states the following about shared mode streams (I assume it also applies to the new IAudioClient3):
Each time the thread awakens, it should call IAudioClient::GetCurrentPadding to determine how much data to write to a rendering buffer or read from a capture buffer. In contrast to the two buffers that the Initialize method allocates for an exclusive-mode stream that uses event-driven buffering, a shared-mode stream requires a single buffer.
Using GetCurrentPadding solves the problem with AUDCLNT_E_BUFFER_TOO_LARGE, and feeding the buffer with 32-bit float samples instead of 16-bit integers makes the output sound fine on my system (although the effect was quite funky!).
If someone comes up with better/more correct ways to use the Rtwq API, I would love to hear them.

Console app vs Win32 app - DirectSound capture-device enumeration gives different results

I am looking for a reliable method to map a DirectShow capture device GUID to its corresponding waveID value.
I found the following project by Chris_P:
The solution works great, and it relies on an a rather obscure IKsPropertySet interface to retrieve the mapping.
Unfortunately, if I attempt the same technique from a C++/CLI library, the code fails with E_NOTIMPL (this behavior has been described on this question, - see the answer by Vladimir Hmelyoff)
So, I figured that I could write a simple console-based auxiliary app to retrieve the mappings and print them. My library could then launch this auxiliary app and parse the redirected output to obtain the mappings.
The console program runs fine, however, the GUIDs that are being passed to the enumeration callback are completely different to the ones passed by Chris_P's solution.
In fact they all share the same basic structure:
lpGuid = 0x02ad0808 {BDF35A00-B9AC-11D0-A619-00AA00A7C000}
The only variation occurs in the last digits of the GUID, where coincidentally, they match the mapped waveId value.
Another weird thing is that the capture device description is truncated to 31 characters, as if the enumeration was being performed using WaveIn APIs!
It would almost seem that some DirectSound facade is now wrapping the WaveIn API.
Any pointers on what could be happening?, Can I disable this behavior and enumerate the same GUIDS that the WIN32 app is enumerating?
Here is the code for the console application:
#include "stdafx.h"
#include <mmreg.h>
#include <initguid.h>
#include <Dsound.h>
#include <dsconf.h>
static BOOL CALLBACK DSEnumCallback(
LPGUID lpGuid,
LPCTSTR lpcstrDescription,
LPCTSTR lpcstrModule,
LPVOID lpContext
);
static BOOL GetInfoFromDSoundGUID(GUID i_sGUID, DWORD &dwWaveID);
static HRESULT DirectSoundPrivateCreate(OUT LPKSPROPERTYSET * ppKsPropertySet);
typedef WINUSERAPI HRESULT(WINAPI *LPFNDLLGETCLASSOBJECT) (const CLSID &, const IID &, void **);
BOOL GetInfoFromDSoundGUID(GUID i_sGUID, DWORD &dwWaveID) {
LPKSPROPERTYSET pKsPropertySet = NULL;
HRESULT hr;
BOOL retval = FALSE;
PDSPROPERTY_DIRECTSOUNDDEVICE_DESCRIPTION_DATA psDirectSoundDeviceDescription = NULL;
DSPROPERTY_DIRECTSOUNDDEVICE_DESCRIPTION_DATA sDirectSoundDeviceDescription;
memset(&sDirectSoundDeviceDescription, 0, sizeof(sDirectSoundDeviceDescription));
hr = DirectSoundPrivateCreate(&pKsPropertySet);
if( SUCCEEDED(hr) ) {
ULONG ulBytesReturned = 0;
sDirectSoundDeviceDescription.DeviceId = i_sGUID;
// On the first call the final size is unknown so pass the size of the struct in order to receive
// "Type" and "DataFlow" values, ulBytesReturned will be populated with bytes required for struct+strings.
hr = pKsPropertySet->Get(DSPROPSETID_DirectSoundDevice,
DSPROPERTY_DIRECTSOUNDDEVICE_DESCRIPTION,
NULL,
0,
&sDirectSoundDeviceDescription,
sizeof(sDirectSoundDeviceDescription),
&ulBytesReturned
);
if( ulBytesReturned ) {
// On the first call it notifies us of the required amount of memory in order to receive the strings.
// Allocate the required memory, the strings will be pointed to the memory space directly after the struct.
psDirectSoundDeviceDescription = (PDSPROPERTY_DIRECTSOUNDDEVICE_DESCRIPTION_DATA)new BYTE[ulBytesReturned];
*psDirectSoundDeviceDescription = sDirectSoundDeviceDescription;
hr = pKsPropertySet->Get(DSPROPSETID_DirectSoundDevice,
DSPROPERTY_DIRECTSOUNDDEVICE_DESCRIPTION,
NULL,
0,
psDirectSoundDeviceDescription,
ulBytesReturned,
&ulBytesReturned
);
dwWaveID = psDirectSoundDeviceDescription->WaveDeviceId;
delete[] psDirectSoundDeviceDescription;
retval = TRUE;
}
pKsPropertySet->Release();
}
return retval;
}
HRESULT DirectSoundPrivateCreate(OUT LPKSPROPERTYSET * ppKsPropertySet) {
HMODULE hLibDsound = NULL;
LPFNDLLGETCLASSOBJECT pfnDllGetClassObject = NULL;
LPCLASSFACTORY pClassFactory = NULL;
LPKSPROPERTYSET pKsPropertySet = NULL;
HRESULT hr = DS_OK;
// Load dsound.dll
hLibDsound = LoadLibrary(TEXT("dsound.dll"));
if( !hLibDsound ) {
hr = DSERR_GENERIC;
}
// Find DllGetClassObject
if( SUCCEEDED(hr) ) {
pfnDllGetClassObject =
(LPFNDLLGETCLASSOBJECT)GetProcAddress(hLibDsound, "DllGetClassObject");
if( !pfnDllGetClassObject ) {
hr = DSERR_GENERIC;
}
}
// Create a class factory object
if( SUCCEEDED(hr) ) {
hr = pfnDllGetClassObject(CLSID_DirectSoundPrivate, IID_IClassFactory, (LPVOID *)&pClassFactory);
}
// Create the DirectSoundPrivate object and query for an IKsPropertySet
// interface
if( SUCCEEDED(hr) ) {
hr = pClassFactory->CreateInstance(NULL, IID_IKsPropertySet, (LPVOID *)&pKsPropertySet);
}
// Release the class factory
if( pClassFactory ) {
pClassFactory->Release();
}
// Handle final success or failure
if( SUCCEEDED(hr) ) {
*ppKsPropertySet = pKsPropertySet;
} else if( pKsPropertySet ) {
pKsPropertySet->Release();
}
FreeLibrary(hLibDsound);
return hr;
}
BOOL CALLBACK DSEnumCallback(
LPGUID lpGuid,
LPCTSTR lpcstrDescription,
LPCTSTR lpcstrModule,
LPVOID lpContext
) {
LPWSTR psz = NULL;
StringFromCLSID(*lpGuid, &psz);
DWORD WaveID = 0xFFFFFFFF;
if( lpGuid ) {
GUID i_guid = *lpGuid;
GetInfoFromDSoundGUID(i_guid, WaveID);
}
if( WaveID != 0xFFFFFFFF )
wprintf(_T("%d %s\r\n"), WaveID, psz);
if( psz ) {
CoTaskMemFree(psz);
}
return TRUE;
}
int main()
{
DirectSoundCaptureEnumerate(DSEnumCallback, NULL);
Sleep(10000);
return 0;
}
It turns out I was not initializing COM.
I added the following snippet at the beginning of my main() procedure and the program retrieved the expected GUIDs:
HRESULT hr = NULL;
hr = CoInitialize(NULL);
if( FAILED(hr) ) {
printf("Failed to initialize COM");
return -1;
}
So I guess that if COM is not initialized, the DirectSound engine falls back to the WaveIn API (creating a DirectShow facade around it).

How do I encode raw 48khz/32bits PCM to FLAC using Microsoft Media Foundation?

I created a SinkWriter that is able to encode video and audio using Microsoft's Media Foundation Platform.
Video is working fine so far but I have some troubles with audio only.
My PCM source has a sample rate of 48828hz, 32 bits per sample and is mono.
Everything is working well so far except for FLAC.
For instance the MP3 output is working more or less but has a wrong format. Regarding to MSDN (MP3 Audio Encoder) the MP3 encoder only supports 16 bits per sample as input. My PCM source as descriped above has 32 bits per sample.
However the export with MP3 is working cause the MF Platform seems like to have some kind of fallback and is using the MPEG Audio layer 1/2 (mpga) with 2 Channels, 32khz and a bitrate of 320kb/s.
Things start to get weird when I set the MF_MT_SUBTYPE to MFAudioFormat_FLAC. The export is working too but the quality of the audio is aweful. There's a lot of noise but I am able to recognize the audio. Regarding to VLC the FLAC file has a sample rate of 44,1khz, 8 bits per sample and is mono.
Does this mean the FLAC codec isn't able to work with the PCM I provide?
Has anyone had the same problem and was able to fix it?
Update
After doing some more research about this problem it seems like that my PCM Audio with a resolution of 32 Bit is too high. So currently I am trying to convert the 32 Bit PCM to 24 Bit for FLAC and 16 Bit for MP3 but with no luck so far. I keep you updated if I make some progress.
--------
Update 2
I've created a minimal example app that shows the problem I am facing.
It reads the 48khz32bit wave file and tries to encode it to flac.
When executing the hr = pSinkWriter->BeginWriting(); command I get the error 0xc00d36b4 whice means The data specified for the media type is invalid, inconsistent, or not supported by this object.
What am I doing wrong here?
#include "stdafx.h"
#include <windows.h>
#include <windowsx.h>
#include <comdef.h>
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <Mferror.h>
#pragma comment(lib, "ole32")
#pragma comment(lib, "mfplat")
#pragma comment(lib, "mfreadwrite")
#pragma comment(lib, "mfuuid")
using namespace System;
int main(array<System::String ^> ^args)
{
HRESULT hr = CoInitializeEx(0, COINIT_MULTITHREADED);
hr = MFStartup(MF_VERSION);
IMFMediaType *pMediaType;
IMFMediaType *pMediaTypeOut;
IMFSourceReader *pSourceReader;
IMFAttributes *pAttributes;
IMFSinkWriter *pSinkWriter;
hr = MFCreateSourceReaderFromURL(
L"C:\\Temp\\48khz32bit.wav",
NULL,
&pSourceReader
);
hr = MFCreateAttributes(&pAttributes, 1);
hr = pAttributes->SetGUID(
MF_TRANSCODE_CONTAINERTYPE,
MFTranscodeContainerType_WAVE
);
hr = MFCreateSinkWriterFromURL(
L"C:\\Temp\\foo.flac",
NULL,
pAttributes,
&pSinkWriter
);
hr = pSourceReader->GetCurrentMediaType(
MF_SOURCE_READER_FIRST_AUDIO_STREAM,
&pMediaType);
hr = MFCreateMediaType(&pMediaTypeOut);
hr = pMediaTypeOut->SetGUID(
MF_MT_MAJOR_TYPE,
MFMediaType_Audio
);
hr = pMediaTypeOut->SetGUID(
MF_MT_SUBTYPE,
MFAudioFormat_FLAC
);
hr = pMediaTypeOut->SetUINT32(
MF_MT_AUDIO_SAMPLES_PER_SECOND,
48000
);
hr = pMediaTypeOut->SetUINT32(
MF_MT_AUDIO_NUM_CHANNELS,
1
);
hr = pMediaTypeOut->SetUINT32(
MF_MT_AUDIO_BITS_PER_SAMPLE,
32
);
hr = pMediaTypeOut->SetUINT32(
MF_MT_AUDIO_AVG_BYTES_PER_SECOND,
(((32 + 7) / 8) * 1) * 48000
);
hr = pMediaTypeOut->SetUINT32(
MF_MT_AUDIO_BLOCK_ALIGNMENT,
((32 + 7) / 8) * 1
);
DWORD nWriterStreamIndex = -1;
hr = pSinkWriter->AddStream(pMediaTypeOut, &nWriterStreamIndex);
hr = pSinkWriter->BeginWriting();
_com_error err(hr);
LPCTSTR errMsg = err.ErrorMessage();
for (;;)
{
DWORD nStreamIndex, nStreamFlags;
LONGLONG nTime;
IMFSample *pSample;
hr = pSourceReader->ReadSample(
MF_SOURCE_READER_FIRST_AUDIO_STREAM,
0,
&nStreamIndex,
&nStreamFlags,
&nTime,
&pSample);
if (pSample)
{
OutputDebugString(L"Write sample...\n");
hr = pSinkWriter->WriteSample(
nWriterStreamIndex,
pSample
);
}
if (nStreamFlags & MF_SOURCE_READERF_ENDOFSTREAM)
{
break;
}
}
hr = pSinkWriter->Finalize();
return 0;
}
--------
Update 3
I added the solution as answer.
--------
Initialize SinkWriter
HRESULT SinkWriter::InitializeSinkWriter(IMFSinkWriter **ppWriter, DWORD *pStreamIndex, DWORD *pAudioStreamIndex, LPCWSTR filename)
{
*ppWriter = NULL;
*pStreamIndex = NULL;
*pAudioStreamIndex = NULL;
IMFSinkWriter *pSinkWriter = NULL;
// Attributes
IMFAttributes *pAttributes;
HRESULT hr = S_OK;
DX::ThrowIfFailed(
MFCreateAttributes(
&pAttributes,
3
)
);
#if defined(ENABLE_HW_ACCELERATION)
CComPtr<ID3D11Device> device;
D3D_FEATURE_LEVEL levels[] = { D3D_FEATURE_LEVEL_11_1, D3D_FEATURE_LEVEL_11_0 };
#if defined(ENABLE_HW_DRIVER)
DX::ThrowIfFailed(
D3D11CreateDevice(
nullptr,
D3D_DRIVER_TYPE_HARDWARE,
nullptr,
(0 * D3D11_CREATE_DEVICE_SINGLETHREADED) | D3D11_CREATE_DEVICE_VIDEO_SUPPORT,
levels,
ARRAYSIZE(levels),
D3D11_SDK_VERSION,
&device,
nullptr,
nullptr
)
);
const CComQIPtr<ID3D10Multithread> pMultithread = device;
pMultithread->SetMultithreadProtected(TRUE);
#else
DX::ThrowIfFailed(
D3D11CreateDevice(
nullptr,
D3D_DRIVER_TYPE_NULL,
nullptr,
D3D11_CREATE_DEVICE_SINGLETHREADED,
levels,
ARRAYSIZE(levels),
D3D11_SDK_VERSION,
&device,
nullptr,
nullptr)
);
#endif
UINT token;
CComPtr<IMFDXGIDeviceManager> pManager;
DX::ThrowIfFailed(
MFCreateDXGIDeviceManager(
&token,
&pManager
)
);
DX::ThrowIfFailed(
pManager->ResetDevice(
device,
token
)
);
DX::ThrowIfFailed(
pAttributes->SetUnknown(
MF_SOURCE_READER_D3D_MANAGER,
pManager
)
);
DX::ThrowIfFailed(
pAttributes->SetUINT32(
MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS,
TRUE
)
);
#if (WINVER >= 0x0602)
DX::ThrowIfFailed(
pAttributes->SetUINT32(
MF_SOURCE_READER_ENABLE_ADVANCED_VIDEO_PROCESSING,
TRUE
)
);
#endif
#else
DX::ThrowIfFailed(
pAttributes->SetUINT32(
MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS,
TRUE
)
);
DX::ThrowIfFailed(
pAttributes->SetUINT32(
MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING,
TRUE
)
);
#endif
DX::ThrowIfFailed(
MFCreateSinkWriterFromURL(
filename,
NULL,
pAttributes,
&pSinkWriter
)
);
if (m_vFormat != VideoFormat::SWFV_NONE)
{
DX::ThrowIfFailed(
InitializeVideoCodec(
pSinkWriter,
pStreamIndex
)
);
}
if (m_audFormat != AudioFormat::SWAF_NONE)
{
DX::ThrowIfFailed(
InitializeAudioCodec(
pSinkWriter,
pAudioStreamIndex
)
);
}
// Tell the sink writer to start accepting data.
DX::ThrowIfFailed(
pSinkWriter->BeginWriting()
);
// Return the pointer to the caller.
*ppWriter = pSinkWriter;
(*ppWriter)->AddRef();
SAFE_RELEASE(pSinkWriter);
return hr;
}
Initialize Audio Codec
HRESULT SinkWriter::InitializeAudioCodec(IMFSinkWriter *pSinkWriter, DWORD *pStreamIndex)
{
// Audio media types
IMFMediaType *pAudioTypeOut = NULL;
IMFMediaType *pAudioTypeIn = NULL;
DWORD audioStreamIndex;
HRESULT hr = S_OK;
// Set the output audio type.
DX::ThrowIfFailed(
MFCreateMediaType(
&pAudioTypeOut
)
);
DX::ThrowIfFailed(
pAudioTypeOut->SetGUID(
MF_MT_MAJOR_TYPE,
MFMediaType_Audio
)
);
DX::ThrowIfFailed(
pAudioTypeOut->SetGUID(
MF_MT_SUBTYPE,
AUDIO_SUBTYPE
)
);
DX::ThrowIfFailed(
pSinkWriter->AddStream(
pAudioTypeOut,
&audioStreamIndex
)
);
// Set the input audio type
DX::ThrowIfFailed(
MFCreateMediaType(
&pAudioTypeIn
)
);
DX::ThrowIfFailed(
pAudioTypeIn->SetGUID(
MF_MT_MAJOR_TYPE,
AUDIO_MAJOR_TYPE
)
);
DX::ThrowIfFailed(
pAudioTypeIn->SetGUID(
MF_MT_SUBTYPE,
MFAudioFormat_PCM
)
);
DX::ThrowIfFailed(
pAudioTypeIn->SetUINT32(
MF_MT_AUDIO_NUM_CHANNELS,
AUDIO_NUM_CHANNELS
)
);
DX::ThrowIfFailed(
pAudioTypeIn->SetUINT32(
MF_MT_AUDIO_BITS_PER_SAMPLE,
AUDIO_BITS_PER_SAMPLE
)
);
DX::ThrowIfFailed(
pAudioTypeIn->SetUINT32(
MF_MT_AUDIO_BLOCK_ALIGNMENT,
AUDIO_BLOCK_ALIGNMENT
)
);
DX::ThrowIfFailed(
pAudioTypeIn->SetUINT32(
MF_MT_AUDIO_SAMPLES_PER_SECOND,
AUDIO_SAMPLES_PER_SECOND
)
);
DX::ThrowIfFailed(
pAudioTypeIn->SetUINT32(
MF_MT_AUDIO_AVG_BYTES_PER_SECOND,
AUDIO_AVG_BYTES_PER_SECOND
)
);
DX::ThrowIfFailed(
pSinkWriter->SetInputMediaType(
audioStreamIndex,
pAudioTypeIn,
NULL
)
);
*pStreamIndex = audioStreamIndex;
SAFE_RELEASE(pAudioTypeOut);
SAFE_RELEASE(pAudioTypeIn);
return hr;
}
Push audio data
HRESULT SinkWriter::PushAudio(UINT32* data)
{
HRESULT hr = S_FALSE;
if (m_isInitializing)
{
return hr;
}
IMFSample *pSample = NULL;
IMFMediaBuffer *pBuffer = NULL;
BYTE *pMem = NULL;
size_t cbBuffer = m_bufferLength * sizeof(short);
// Create a new memory buffer.
hr = MFCreateMemoryBuffer(cbBuffer, &pBuffer);
// Lock the buffer and copy the audio frame to the buffer.
if (SUCCEEDED(hr))
{
hr = pBuffer->Lock(&pMem, NULL, NULL);
}
if (SUCCEEDED(hr))
{
CopyMemory(pMem, data, cbBuffer);
}
if (pBuffer)
{
pBuffer->Unlock();
}
if (m_vFormat == VideoFormat::SWFV_NONE && m_audFormat == AudioFormat::SWAF_WAV)
{
DWORD cbWritten = 0;
if (SUCCEEDED(hr))
{
hr = m_pByteStream->Write(pMem, cbBuffer, &cbWritten);
}
if (SUCCEEDED(hr))
{
m_cbWrittenByteStream += cbWritten;
}
}
else
{
// Set the data length of the buffer.
if (SUCCEEDED(hr))
{
hr = pBuffer->SetCurrentLength(cbBuffer);
}
// Create media sample and add the buffer to the sample.
if (SUCCEEDED(hr))
{
hr = MFCreateSample(&pSample);
}
if (SUCCEEDED(hr))
{
hr = pSample->AddBuffer(pBuffer);
}
// Set the timestamp and the duration.
if (SUCCEEDED(hr))
{
hr = pSample->SetSampleTime(m_cbRtStartVideo);
}
if (SUCCEEDED(hr))
{
hr = pSample->SetSampleDuration(m_cbRtDurationVideo);
}
// Send the sample to the Sink Writer
if (SUCCEEDED(hr))
{
hr = m_pSinkWriter->WriteSample(m_audioStreamIndex, pSample);
}
/*if (SUCCEEDED(hr))
{
m_cbRtStartAudio += m_cbRtDurationAudio;
}*/
SAFE_RELEASE(pSample);
SAFE_RELEASE(pBuffer);
}
return hr;
}
So, Microsoft introduced a FLAC Media Foundation Transform (MFT) Encoder CLSID_CMSFLACEncMFT in Windows 10, but the codec remains undocumented at the moment.
Supported Media Formats in Media Foundation is similarly out of date and does not reflect presence of recent additions.
I am not aware of any comment on this, and my opinion is that the codec is added for internal use but the implementation is merely a standard Media Foundation components without licensing restrictions, so the codecs are unrestricted too by, for example, field of use limitations.
This stock codec seems to be limited to 8, 16 and 24 bit PCM input options (that is, not 32 bits/sample - you need to resample respectively). The codec is capable to accept up to 8 channels and flexible samples per second rate (48828 Hz is okay).
Even though the codec (transform) seems to be working, if you want to produce a file, you also need a suitable container format (multiplexer) which is compatible with MFAudioFormat_FLAC (the identifier has 7 results on Google Search at the moment of the post, which basically means noone is even aware of the codec). Outdated documentation does not reflect actual support for FLAC in stock media sinks.
I borrowed a custom media sink that writes a raw MFT output payload into a file, and such FLAC output is playable as the FLAC frames contain necessary information to parse the bitstream for playback.
For the reference, the file itself is: 20180224-175524.flac.
An obvious candidate among stock media sinks WAVE Media Sink is unable to accept FLAC input. Nevertheless it potentially could, the implementation is presumably limited to simpler audio formats.
AVI media sink might possibly take FLAC audio, but it seems to be impossible to create an audio only AVI.
Among other media sink there is however a media sink which can process FLAC: MPEG-4 File Sink. Again, despite the outdated documentation, the media sink takes FLAC input, so you should be able to create .MP4 files with FLAC audio track.
Sample file: 20180224-184012.mp4. "FLAC (framed)"
To sum it up:
FLAC encoder MFT is present in Windows 10 and is available for use; lacks proper documentation though
One needs to take care of conversion of input to compatible format (no direct support for 32-bit PCM)
It is possible to manage MFT directly and consume MFT output, then obtain FLAC bitstream
Alternatively, it is possible to use stock MP4 media sink to produce output with FLAC audio track
Alternatively, it is possible to develop a custom media sink and consume FLAC bitstream from upstream encoder connection
Potentially, the codec is compatible with Transcode API, however the restrictions above apply. The container type needs to be MFTranscodeContainerType_MPEG4 in particular.
The codec is apparently compatible with Media Session API, presumably it is good for use with Sink Writer API either.
In your code as you attempt to use Sink Writer API you should similarly either have MP4 output with input possibly converted to compatible format in your code (compatible PCM or compatible FLAC with encoder MFT managed on your side). Knowing that MP4 media sink overall is capable to create FLAC audio track you should be able to debug fine details in your code and fit the components to work together.
Finally I was able to solve the problem. It wasn't that hard to be honest. But that is always the case if you know how to achieve something ;).
I created a copy and paste example below to give an idea how to implement FLAC encoding with Microsoft Media Foundation.
The missing piece of the puzzle was the MFTranscodeGetAudioOutputAvailableTypes. This function lists all the available output formats from an audio encoder.
If you are not sure what MFTs are supported by the operation system you can call MFTEnumEx function first. This gives you a list of all the available MFTs. In my case with windows 10 there's the FLAC MFT that is defined like this.
Name: Microsoft FLAC Audio Encoder MFT
Input Types: 1 items:
Audio-PCM
Class identifier: 128509e9-c44e-45dc-95e9-c255b8f466a6
Output Types: 1 items:
Audio-0000f1ac-0000-0010-8000-00aa00389b71
Transform Flags: 1
Transform Category: Audio Encoder
So the next thing I did was to create the source reader and get the current media type. The important values for me are sample rate, bit rate and channels.
Then I created a GetOutputMediaTypes function that needs the requested audio format, sample rate, bit rate, channels and a reference to the IMFMediaType.
The MFTranscodeGetAudioOutputAvailableTypes function returns all available types for the MFAudioFormat_flac GUID.
After getting the count of the available media types with hr = pAvailableTypes->GetElementCount(&dwMTCount); I am able to iterate through them and check if a type is supporting my request. If that's the case I return the media type.
The last part is the easiest one.
First add the output media type to the sinkwriter to get the stream index.
DWORD dwWriterStreamIndex = -1;
// Add the stream
hr = pSinkWriter->AddStream(
pOuputMediaType,
&dwWriterStreamIndex
);
Then set the input type and call pSinkWriter->BeginWriting(); so the sinkwriter starts to accepting data.
// Set input media type
hr = pSinkWriter->SetInputMediaType(
dwWriterStreamIndex,
pInputType,
NULL
);
// Tell the sink writer to accept data
hr = pSinkWriter->BeginWriting();
If the output and input media type is correctly set, BeginWriting should return 0 as HRESULT.
We should get no error because we are using the media type the function MFTranscodeGetAudioOutputAvailableTypes is providing.
The last step is to read all samples from the source reader and write it through the sinkwriter into the flac container.
Done :)
I hope I could help with this answer.
Also thanks to Roman R.
Update
This sample is only working with Audio-PCM formats from 4 bits to 24 bits. If you want to encode an 32 Bit Audio-PCM you have to resample it first and then encode it.
--------
Here's the minimal example app.
#include <windows.h>
#include <windowsx.h>
#include <atlstr.h>
#include <comdef.h>
#include <exception>
#include <mfapi.h>
#include <mfplay.h>
#include <mfreadwrite.h>
#include <mmdeviceapi.h>
#include <Audioclient.h>
#include <mferror.h>
#include <Wmcodecdsp.h>
#pragma comment(lib, "mf.lib")
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfplay.lib")
#pragma comment(lib, "mfreadwrite.lib")
#pragma comment(lib, "mfuuid.lib")
#pragma comment(lib, "wmcodecdspuuid")
inline void ThrowIfFailed(HRESULT hr)
{
if (FAILED(hr))
{
// Get the error message
_com_error err(hr);
LPCTSTR errMsg = err.ErrorMessage();
OutputDebugString(L"################################## ERROR ##################################\n");
OutputDebugString(errMsg);
OutputDebugString(L"\n################################## ----- ##################################\n");
CStringA sb(errMsg);
// Set a breakpoint on this line to catch DirectX API errors
throw std::exception(sb);
}
}
template <class T> void SafeRelease(T **ppT)
{
if (*ppT)
{
(*ppT)->Release();
*ppT = nullptr;
}
}
using namespace System;
HRESULT GetOutputMediaTypes(
GUID cAudioFormat,
UINT32 cSampleRate,
UINT32 cBitPerSample,
UINT32 cChannels,
IMFMediaType **ppType
)
{
// Enumerate all codecs except for codecs with field-of-use restrictions.
// Sort the results.
DWORD dwFlags =
(MFT_ENUM_FLAG_ALL & (~MFT_ENUM_FLAG_FIELDOFUSE)) |
MFT_ENUM_FLAG_SORTANDFILTER;
IMFCollection *pAvailableTypes = NULL; // List of audio media types.
IMFMediaType *pAudioType = NULL; // Corresponding codec.
HRESULT hr = MFTranscodeGetAudioOutputAvailableTypes(
cAudioFormat,
dwFlags,
NULL,
&pAvailableTypes
);
// Get the element count.
DWORD dwMTCount;
hr = pAvailableTypes->GetElementCount(&dwMTCount);
// Iterate through the results and check for the corresponding codec.
for (DWORD i = 0; i < dwMTCount; i++)
{
hr = pAvailableTypes->GetElement(i, (IUnknown**)&pAudioType);
GUID majorType;
hr = pAudioType->GetMajorType(&majorType);
GUID subType;
hr = pAudioType->GetGUID(MF_MT_SUBTYPE, &subType);
if (majorType != MFMediaType_Audio || subType != MFAudioFormat_FLAC)
{
continue;
}
UINT32 sampleRate = NULL;
hr = pAudioType->GetUINT32(
MF_MT_AUDIO_SAMPLES_PER_SECOND,
&sampleRate
);
UINT32 bitRate = NULL;
hr = pAudioType->GetUINT32(
MF_MT_AUDIO_BITS_PER_SAMPLE,
&bitRate
);
UINT32 channels = NULL;
hr = pAudioType->GetUINT32(
MF_MT_AUDIO_NUM_CHANNELS,
&channels
);
if (sampleRate == cSampleRate
&& bitRate == cBitPerSample
&& channels == cChannels)
{
// Found the codec.
// Jump out!
break;
}
}
// Add the media type to the caller
*ppType = pAudioType;
(*ppType)->AddRef();
SafeRelease(&pAudioType);
return hr;
}
int main(array<System::String ^> ^args)
{
HRESULT hr = S_OK;
// Initialize com interface
ThrowIfFailed(
CoInitializeEx(0, COINIT_MULTITHREADED)
);
// Start media foundation
ThrowIfFailed(
MFStartup(MF_VERSION)
);
IMFMediaType *pInputType = NULL;
IMFSourceReader *pSourceReader = NULL;
IMFMediaType *pOuputMediaType = NULL;
IMFSinkWriter *pSinkWriter = NULL;
// Create source reader
hr = MFCreateSourceReaderFromURL(
L"C:\\Temp\\48khz24bit.wav",
NULL,
&pSourceReader
);
// Create sink writer
hr = MFCreateSinkWriterFromURL(
L"C:\\Temp\\foo.flac",
NULL,
NULL,
&pSinkWriter
);
// Get media type from source reader
hr = pSourceReader->GetCurrentMediaType(
MF_SOURCE_READER_FIRST_AUDIO_STREAM,
&pInputType
);
// Get sample rate, bit rate and channels
UINT32 sampleRate = NULL;
hr = pInputType->GetUINT32(
MF_MT_AUDIO_SAMPLES_PER_SECOND,
&sampleRate
);
UINT32 bitRate = NULL;
hr = pInputType->GetUINT32(
MF_MT_AUDIO_BITS_PER_SAMPLE,
&bitRate
);
UINT32 channels = NULL;
hr = pInputType->GetUINT32(
MF_MT_AUDIO_NUM_CHANNELS,
&channels
);
// Try to find a media type that is fitting.
hr = GetOutputMediaTypes(
MFAudioFormat_FLAC,
sampleRate,
bitRate,
channels,
&pOuputMediaType);
DWORD dwWriterStreamIndex = -1;
// Add the stream
hr = pSinkWriter->AddStream(
pOuputMediaType,
&dwWriterStreamIndex
);
// Set input media type
hr = pSinkWriter->SetInputMediaType(
dwWriterStreamIndex,
pInputType,
NULL
);
// Tell the sink writer to accept data
hr = pSinkWriter->BeginWriting();
// Forever alone loop
for (;;)
{
DWORD nStreamIndex, nStreamFlags;
LONGLONG nTime;
IMFSample *pSample;
// Read through the samples until...
hr = pSourceReader->ReadSample(
MF_SOURCE_READER_FIRST_AUDIO_STREAM,
0,
&nStreamIndex,
&nStreamFlags,
&nTime,
&pSample);
if (pSample)
{
OutputDebugString(L"Write sample...\n");
hr = pSinkWriter->WriteSample(
dwWriterStreamIndex,
pSample
);
}
// ... we are at the end of the stream...
if (nStreamFlags & MF_SOURCE_READERF_ENDOFSTREAM)
{
// ... and jump out.
break;
}
}
// Call finalize to finish writing.
hr = pSinkWriter->Finalize();
// Done :D
return 0;
}

Windows Media Foundation recording audio

I'm using the windows media foundation api to enumerate both my microphones and available cameras, which both work.
Here is my enumeration code:
class deviceInput {
public:
deviceInput( REFGUID source );
~deviceInput();
int listDevices(bool refresh = false);
IMFActivate *getDevice(unsigned int deviceId);
const WCHAR *getDeviceName(unsigned int deviceId);
private:
void Clear();
HRESULT EnumerateDevices();
UINT32 m_count;
IMFActivate **m_devices;
REFGUID m_source;
};
deviceInput::deviceInput( REFGUID source )
: m_devices( NULL )
, m_count( 0 )
, m_source( source )
{ }
deviceInput::~deviceInput()
{
Clear();
}
int deviceInput::listDevices(bool refresh)
{
if ( refresh || !m_devices ) {
if ( FAILED(this->EnumerateDevices()) ) return -1;
}
return m_count;
}
IMFActivate *deviceInput::getDevice(unsigned int deviceId)
{
if ( deviceId >= m_count ) return NULL;
IMFActivate *device = m_devices[deviceId];
device->AddRef();
return device;
}
const WCHAR *deviceInput::getDeviceName(unsigned int deviceId)
{
if ( deviceId >= m_count ) return NULL;
HRESULT hr = S_OK;
WCHAR *devName = NULL;
UINT32 length;
hr = m_devices[deviceId]->GetAllocatedString( MF_DEVSOURCE_ATTRIBUTE_FRIENDLY_NAME, &devName, &length );
if ( FAILED(hr) ) return NULL;
return devName;
}
void deviceInput::Clear()
{
if ( m_devices ) {
for (UINT32 i = 0; i < m_count; i++) SafeRelease( &m_devices[i] );
CoTaskMemFree( m_devices );
}
m_devices = NULL;
m_count = 0;
}
HRESULT deviceInput::EnumerateDevices()
{
HRESULT hr = S_OK;
IMFAttributes *pAttributes = NULL;
Clear();
hr = MFCreateAttributes(&pAttributes, 1);
if ( SUCCEEDED(hr) ) hr = pAttributes->SetGUID( MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE, m_source );
if ( SUCCEEDED(hr) ) hr = MFEnumDeviceSources( pAttributes, &m_devices, &m_count );
SafeRelease( &pAttributes );
return hr;
}
To grab audio or camera capture devices, I specify either MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_AUDCAP_GUID or MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_GUID and that works no problem, and I can grab the names of the devices, as well as the IMFActivate. I have code to record the webcam to an output video file, however, I'm having a tough time figuring out how to record the audio to a file. I'm under the impression that I need to use an IMFSinkWriter, but I can't find any examples that use an audio capture IMFActivate and IMFSinkWriter.
I'm not much of a windows api programmer, so I'm sure there's a fairly straight forward answer, but COM stuff is just a bit over my head. As far as audio format, I don't really care, as long as it gets into a file - can be wav, wma, or whatever. Even though I'm recording video, I need the video and audio files separate, so I can't just figure out how to add the audio into my video encoding.
I apologize for the late response, and I hope you can still find this valuable. I recently completed a project similar to yours (recording webcam video along with a selected microphone to a single video file with audio). The key is to creating an aggregate media source.
// http://msdn.microsoft.com/en-us/library/windows/desktop/dd388085(v=vs.85).aspx
HRESULT CreateAggregateMediaSource(IMFMediaSource *videoSource,
IMFMediaSource *audioSource,
IMFMediaSource **aggregateSource)
{
*aggregateSource = nullptr;
IMFCollection *pCollection = nullptr;
HRESULT hr = ::MFCreateCollection(&pCollection);
if (S_OK == hr)
hr = pCollection->AddElement(videoSource);
if (S_OK == hr)
hr = pCollection->AddElement(audioSource);
if (S_OK == hr)
hr = ::MFCreateAggregateSource(pCollection, aggregateSource);
SafeRelease(&pCollection);
return hr;
}
When configuring the sink writer, you will add 2 streams (one for audio and one for video).
Of course, you will also configure the writer correctly for the input stream types.
HRESULT hr = S_OK;
IMFMediaType *videoInputType = nullptr;
IMFMediaType *videoOutputType = nullptr;
DWORD videoOutStreamIndex = 0u;
DWORD audioOutStreamIndex = 0u;
IMFSinkWriter *writer = nullptr;
// [other create and configure writer]
if (S_OK == hr))
hr = writer->AddStream(videoOutputType, &videoOutStreamIndex);
// [more configuration code]
if (S_OK == hr)
hr = writer->AddStream(audioOutputType, &audioOutStreamIndex);
Then when reading the samples, you will need to pay close attention to the reader streamIndex, and sending them to the writer appropriately. You will also need to pay close attention to the format that the codec expects. For instance, IEEE float vs PCM, etc. Good luck, and I hope it is not too late.
Did you have hard time to manage DirectShow audio capture in Record directshow audio device to file?
Capturing with Media Foundation is hardly any simpler. Not even mentioning that in general there are a lot more resources on DirectShow out there....
MSDN offers you a WavSink Sample that implements audio capture into file:
Shows how to implement a custom media sink in Microsoft Media Foundation. The sample implements an archive sink that writes uncompressed PCM audio to a .wav file.
I am not sure why they decided to not make this a standard component. Having Media Foundation inferior to DirectShow in many ways, they could at least make this small thing as an advantage. Anyway, you have the sample and it looks like a good start.