capturing bluetooth audio data with WASAPI - c++

I'm working on a small project where I need to mix incoming data from my internal microphone and a bluetooth headset (Bose). I thought I'd use WASAPI for it. However, whilst I can easily read out my internal microphone, this is not the case for my Bluetooth headphones.
I followed the example given by the Docs almost to the letter, however I made a small change to be able to choose my own "input device" (internal microphone or headset) using th ID they have been given.
Tho method is the following:
HRESULT RecordAudioStreamBLE(MyAudioSink *pMySink, LPWSTR pwszID)
{
HRESULT hr;
REFERENCE_TIME hnsRequestedDuration = REFTIMES_PER_SEC;
REFERENCE_TIME hnsActualDuration;
UINT32 bufferFrameCount;
UINT32 numFramesAvailable;
IMMDeviceEnumerator *pEnumerator = NULL;
IMMDevice *pDevice = NULL;
IAudioClient *pAudioClient = NULL;
IAudioCaptureClient *pCaptureClient = NULL;
WAVEFORMATEX *pwfx = NULL;
UINT32 packetLength = 0;
BOOL bDone = FALSE;
BYTE *pData;
DWORD flags;
hr = CoInitialize(0);
hr = CoCreateInstance(
CLSID_MMDeviceEnumerator, NULL,
CLSCTX_ALL, IID_IMMDeviceEnumerator,
(void**)&pEnumerator);
EXIT_ON_ERROR(hr)
hr = pEnumerator->GetDevice(pwszID, &pDevice);
EXIT_ON_ERROR(hr)
hr = pDevice->Activate(IID_IAudioClient, CLSCTX_ALL,
NULL, (void**)&pAudioClient);
EXIT_ON_ERROR(hr)
hr = pAudioClient->GetMixFormat(&pwfx);
EXIT_ON_ERROR(hr)
hr = pAudioClient->Initialize(AUDCLNT_SHAREMODE_SHARED,
0, hnsRequestedDuration,
0, pwfx, NULL);
EXIT_ON_ERROR(hr)
// Get the size of the allocated buffer.
hr = pAudioClient->GetBufferSize(&bufferFrameCount);
EXIT_ON_ERROR(hr)
hr = pAudioClient->GetService(IID_IAudioCaptureClient,
(void**)&pCaptureClient);
EXIT_ON_ERROR(hr)
// Notify the audio sink which format to use.
hr = pMySink->SetFormat(pwfx);
EXIT_ON_ERROR(hr)
// Calculate the actual duration of the allocated buffer.
hnsActualDuration = (double)REFTIMES_PER_SEC * bufferFrameCount / pwfx->nSamplesPerSec;
hr = pAudioClient->Start(); // Start recording.
EXIT_ON_ERROR(hr)
// Each loop fills about half of the shared buffer.
while (bDone == FALSE)
{
// Sleep for half the buffer duration.
Sleep(hnsActualDuration/REFTIMES_PER_MILLISEC/2);
hr = pCaptureClient->GetNextPacketSize(&packetLength);
EXIT_ON_ERROR(hr)
printf("packet size = %d\n", packetLength);
while (packetLength != 0)
{
// Get the available data in the shared buffer.
hr = pCaptureClient->GetBuffer( &pData,
&numFramesAvailable,
&flags, NULL, NULL);
EXIT_ON_ERROR(hr)
if (flags & AUDCLNT_BUFFERFLAGS_SILENT)
{
pData = NULL; // Tell CopyData to write silence.
}
// Copy the available capture data to the audio sink.
hr = pMySink->CopyData(pData, numFramesAvailable, &bDone);
EXIT_ON_ERROR(hr)
hr = pCaptureClient->ReleaseBuffer(numFramesAvailable);
EXIT_ON_ERROR(hr)
hr = pCaptureClient->GetNextPacketSize(&packetLength);
EXIT_ON_ERROR(hr)
}
}
hr = pAudioClient->Stop(); // Stop recording.
EXIT_ON_ERROR(hr)
Exit:
printf("%s\n", hr);
CoTaskMemFree(pwfx);
SAFE_RELEASE(pEnumerator)
SAFE_RELEASE(pDevice)
SAFE_RELEASE(pAudioClient)
SAFE_RELEASE(pCaptureClient)
return hr;
}
When doing this with the ID for the internal microphone, everything works normally. However, if the ID of the bluetooth device is used I get the following output: (I first list all the active devices and choose the ID in the terminal:
Endpoint 0: "S24D330 (Intel(R) Display Audio)" ({0.0.0.00000000}.{0f483f83-6e29-482e-94b5-fb9cc257a03d})
Endpoint 1: "Hoofdtelefoon (LE-My headphone Hands-Free AG Audio)" ({0.0.0.00000000}.{75c8e0c3-2e44-4538-940d-f8c2ae6424ca})
Endpoint 2: "Hoofdtelefoon (My headphone Stereo)" ({0.0.0.00000000}.{849b4cc2-ed72-4d40-9725-d1ce6b4abfa0})
Endpoint 3: "Luidsprekers (2- High Definition Audio Device)" ({0.0.0.00000000}.{cb8d7625-257e-4fd0-84b8-26de6aeb1e1b})
Endpoint 4: "Microfoon (2- High Definition Audio Device)" ({0.0.1.00000000}.{5edec961-7a46-4554-bdcd-43fb7d9a9d9a})
Endpoint 5: "Hoofdtelefoon (LE-My headphone Hands-Free AG Audio)" ({0.0.1.00000000}.{f937a0fa-1475-495b-81be-7aec0c1c7ea5})
Which input would you like?5
samples per second 16000
packet size = 0
packet size = 0
packet size = 0
packet size = 0
packet size = 0
packet size = 0
packet size = 0
packet size = 0
packet size = 0
^C
The packet size is printed in the loop in the method shown above. Shows that the packet size each time is just 0.
Does anybody know how to fix this, and just get "regular" data out of it?
Do I maybe need to use a different API? Speed is key however.
With kind regards

Many Bluetooth headsets support both the A2DP profile for stereo playback and the Hands-Free protocol for bidirectional mono communication, but not at the same time.
If one of these protocols is active, the other will stall.
I suspect the problem in your case is that something is playing to the A2DP endpoint "Hoofdtelefoon (My headphone Stereo)" and that is causing all activity on the two HF endpoints to stall.

Related

How do I encode raw 48khz/32bits PCM to FLAC using Microsoft Media Foundation?

I created a SinkWriter that is able to encode video and audio using Microsoft's Media Foundation Platform.
Video is working fine so far but I have some troubles with audio only.
My PCM source has a sample rate of 48828hz, 32 bits per sample and is mono.
Everything is working well so far except for FLAC.
For instance the MP3 output is working more or less but has a wrong format. Regarding to MSDN (MP3 Audio Encoder) the MP3 encoder only supports 16 bits per sample as input. My PCM source as descriped above has 32 bits per sample.
However the export with MP3 is working cause the MF Platform seems like to have some kind of fallback and is using the MPEG Audio layer 1/2 (mpga) with 2 Channels, 32khz and a bitrate of 320kb/s.
Things start to get weird when I set the MF_MT_SUBTYPE to MFAudioFormat_FLAC. The export is working too but the quality of the audio is aweful. There's a lot of noise but I am able to recognize the audio. Regarding to VLC the FLAC file has a sample rate of 44,1khz, 8 bits per sample and is mono.
Does this mean the FLAC codec isn't able to work with the PCM I provide?
Has anyone had the same problem and was able to fix it?
Update
After doing some more research about this problem it seems like that my PCM Audio with a resolution of 32 Bit is too high. So currently I am trying to convert the 32 Bit PCM to 24 Bit for FLAC and 16 Bit for MP3 but with no luck so far. I keep you updated if I make some progress.
--------
Update 2
I've created a minimal example app that shows the problem I am facing.
It reads the 48khz32bit wave file and tries to encode it to flac.
When executing the hr = pSinkWriter->BeginWriting(); command I get the error 0xc00d36b4 whice means The data specified for the media type is invalid, inconsistent, or not supported by this object.
What am I doing wrong here?
#include "stdafx.h"
#include <windows.h>
#include <windowsx.h>
#include <comdef.h>
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <Mferror.h>
#pragma comment(lib, "ole32")
#pragma comment(lib, "mfplat")
#pragma comment(lib, "mfreadwrite")
#pragma comment(lib, "mfuuid")
using namespace System;
int main(array<System::String ^> ^args)
{
HRESULT hr = CoInitializeEx(0, COINIT_MULTITHREADED);
hr = MFStartup(MF_VERSION);
IMFMediaType *pMediaType;
IMFMediaType *pMediaTypeOut;
IMFSourceReader *pSourceReader;
IMFAttributes *pAttributes;
IMFSinkWriter *pSinkWriter;
hr = MFCreateSourceReaderFromURL(
L"C:\\Temp\\48khz32bit.wav",
NULL,
&pSourceReader
);
hr = MFCreateAttributes(&pAttributes, 1);
hr = pAttributes->SetGUID(
MF_TRANSCODE_CONTAINERTYPE,
MFTranscodeContainerType_WAVE
);
hr = MFCreateSinkWriterFromURL(
L"C:\\Temp\\foo.flac",
NULL,
pAttributes,
&pSinkWriter
);
hr = pSourceReader->GetCurrentMediaType(
MF_SOURCE_READER_FIRST_AUDIO_STREAM,
&pMediaType);
hr = MFCreateMediaType(&pMediaTypeOut);
hr = pMediaTypeOut->SetGUID(
MF_MT_MAJOR_TYPE,
MFMediaType_Audio
);
hr = pMediaTypeOut->SetGUID(
MF_MT_SUBTYPE,
MFAudioFormat_FLAC
);
hr = pMediaTypeOut->SetUINT32(
MF_MT_AUDIO_SAMPLES_PER_SECOND,
48000
);
hr = pMediaTypeOut->SetUINT32(
MF_MT_AUDIO_NUM_CHANNELS,
1
);
hr = pMediaTypeOut->SetUINT32(
MF_MT_AUDIO_BITS_PER_SAMPLE,
32
);
hr = pMediaTypeOut->SetUINT32(
MF_MT_AUDIO_AVG_BYTES_PER_SECOND,
(((32 + 7) / 8) * 1) * 48000
);
hr = pMediaTypeOut->SetUINT32(
MF_MT_AUDIO_BLOCK_ALIGNMENT,
((32 + 7) / 8) * 1
);
DWORD nWriterStreamIndex = -1;
hr = pSinkWriter->AddStream(pMediaTypeOut, &nWriterStreamIndex);
hr = pSinkWriter->BeginWriting();
_com_error err(hr);
LPCTSTR errMsg = err.ErrorMessage();
for (;;)
{
DWORD nStreamIndex, nStreamFlags;
LONGLONG nTime;
IMFSample *pSample;
hr = pSourceReader->ReadSample(
MF_SOURCE_READER_FIRST_AUDIO_STREAM,
0,
&nStreamIndex,
&nStreamFlags,
&nTime,
&pSample);
if (pSample)
{
OutputDebugString(L"Write sample...\n");
hr = pSinkWriter->WriteSample(
nWriterStreamIndex,
pSample
);
}
if (nStreamFlags & MF_SOURCE_READERF_ENDOFSTREAM)
{
break;
}
}
hr = pSinkWriter->Finalize();
return 0;
}
--------
Update 3
I added the solution as answer.
--------
Initialize SinkWriter
HRESULT SinkWriter::InitializeSinkWriter(IMFSinkWriter **ppWriter, DWORD *pStreamIndex, DWORD *pAudioStreamIndex, LPCWSTR filename)
{
*ppWriter = NULL;
*pStreamIndex = NULL;
*pAudioStreamIndex = NULL;
IMFSinkWriter *pSinkWriter = NULL;
// Attributes
IMFAttributes *pAttributes;
HRESULT hr = S_OK;
DX::ThrowIfFailed(
MFCreateAttributes(
&pAttributes,
3
)
);
#if defined(ENABLE_HW_ACCELERATION)
CComPtr<ID3D11Device> device;
D3D_FEATURE_LEVEL levels[] = { D3D_FEATURE_LEVEL_11_1, D3D_FEATURE_LEVEL_11_0 };
#if defined(ENABLE_HW_DRIVER)
DX::ThrowIfFailed(
D3D11CreateDevice(
nullptr,
D3D_DRIVER_TYPE_HARDWARE,
nullptr,
(0 * D3D11_CREATE_DEVICE_SINGLETHREADED) | D3D11_CREATE_DEVICE_VIDEO_SUPPORT,
levels,
ARRAYSIZE(levels),
D3D11_SDK_VERSION,
&device,
nullptr,
nullptr
)
);
const CComQIPtr<ID3D10Multithread> pMultithread = device;
pMultithread->SetMultithreadProtected(TRUE);
#else
DX::ThrowIfFailed(
D3D11CreateDevice(
nullptr,
D3D_DRIVER_TYPE_NULL,
nullptr,
D3D11_CREATE_DEVICE_SINGLETHREADED,
levels,
ARRAYSIZE(levels),
D3D11_SDK_VERSION,
&device,
nullptr,
nullptr)
);
#endif
UINT token;
CComPtr<IMFDXGIDeviceManager> pManager;
DX::ThrowIfFailed(
MFCreateDXGIDeviceManager(
&token,
&pManager
)
);
DX::ThrowIfFailed(
pManager->ResetDevice(
device,
token
)
);
DX::ThrowIfFailed(
pAttributes->SetUnknown(
MF_SOURCE_READER_D3D_MANAGER,
pManager
)
);
DX::ThrowIfFailed(
pAttributes->SetUINT32(
MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS,
TRUE
)
);
#if (WINVER >= 0x0602)
DX::ThrowIfFailed(
pAttributes->SetUINT32(
MF_SOURCE_READER_ENABLE_ADVANCED_VIDEO_PROCESSING,
TRUE
)
);
#endif
#else
DX::ThrowIfFailed(
pAttributes->SetUINT32(
MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS,
TRUE
)
);
DX::ThrowIfFailed(
pAttributes->SetUINT32(
MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING,
TRUE
)
);
#endif
DX::ThrowIfFailed(
MFCreateSinkWriterFromURL(
filename,
NULL,
pAttributes,
&pSinkWriter
)
);
if (m_vFormat != VideoFormat::SWFV_NONE)
{
DX::ThrowIfFailed(
InitializeVideoCodec(
pSinkWriter,
pStreamIndex
)
);
}
if (m_audFormat != AudioFormat::SWAF_NONE)
{
DX::ThrowIfFailed(
InitializeAudioCodec(
pSinkWriter,
pAudioStreamIndex
)
);
}
// Tell the sink writer to start accepting data.
DX::ThrowIfFailed(
pSinkWriter->BeginWriting()
);
// Return the pointer to the caller.
*ppWriter = pSinkWriter;
(*ppWriter)->AddRef();
SAFE_RELEASE(pSinkWriter);
return hr;
}
Initialize Audio Codec
HRESULT SinkWriter::InitializeAudioCodec(IMFSinkWriter *pSinkWriter, DWORD *pStreamIndex)
{
// Audio media types
IMFMediaType *pAudioTypeOut = NULL;
IMFMediaType *pAudioTypeIn = NULL;
DWORD audioStreamIndex;
HRESULT hr = S_OK;
// Set the output audio type.
DX::ThrowIfFailed(
MFCreateMediaType(
&pAudioTypeOut
)
);
DX::ThrowIfFailed(
pAudioTypeOut->SetGUID(
MF_MT_MAJOR_TYPE,
MFMediaType_Audio
)
);
DX::ThrowIfFailed(
pAudioTypeOut->SetGUID(
MF_MT_SUBTYPE,
AUDIO_SUBTYPE
)
);
DX::ThrowIfFailed(
pSinkWriter->AddStream(
pAudioTypeOut,
&audioStreamIndex
)
);
// Set the input audio type
DX::ThrowIfFailed(
MFCreateMediaType(
&pAudioTypeIn
)
);
DX::ThrowIfFailed(
pAudioTypeIn->SetGUID(
MF_MT_MAJOR_TYPE,
AUDIO_MAJOR_TYPE
)
);
DX::ThrowIfFailed(
pAudioTypeIn->SetGUID(
MF_MT_SUBTYPE,
MFAudioFormat_PCM
)
);
DX::ThrowIfFailed(
pAudioTypeIn->SetUINT32(
MF_MT_AUDIO_NUM_CHANNELS,
AUDIO_NUM_CHANNELS
)
);
DX::ThrowIfFailed(
pAudioTypeIn->SetUINT32(
MF_MT_AUDIO_BITS_PER_SAMPLE,
AUDIO_BITS_PER_SAMPLE
)
);
DX::ThrowIfFailed(
pAudioTypeIn->SetUINT32(
MF_MT_AUDIO_BLOCK_ALIGNMENT,
AUDIO_BLOCK_ALIGNMENT
)
);
DX::ThrowIfFailed(
pAudioTypeIn->SetUINT32(
MF_MT_AUDIO_SAMPLES_PER_SECOND,
AUDIO_SAMPLES_PER_SECOND
)
);
DX::ThrowIfFailed(
pAudioTypeIn->SetUINT32(
MF_MT_AUDIO_AVG_BYTES_PER_SECOND,
AUDIO_AVG_BYTES_PER_SECOND
)
);
DX::ThrowIfFailed(
pSinkWriter->SetInputMediaType(
audioStreamIndex,
pAudioTypeIn,
NULL
)
);
*pStreamIndex = audioStreamIndex;
SAFE_RELEASE(pAudioTypeOut);
SAFE_RELEASE(pAudioTypeIn);
return hr;
}
Push audio data
HRESULT SinkWriter::PushAudio(UINT32* data)
{
HRESULT hr = S_FALSE;
if (m_isInitializing)
{
return hr;
}
IMFSample *pSample = NULL;
IMFMediaBuffer *pBuffer = NULL;
BYTE *pMem = NULL;
size_t cbBuffer = m_bufferLength * sizeof(short);
// Create a new memory buffer.
hr = MFCreateMemoryBuffer(cbBuffer, &pBuffer);
// Lock the buffer and copy the audio frame to the buffer.
if (SUCCEEDED(hr))
{
hr = pBuffer->Lock(&pMem, NULL, NULL);
}
if (SUCCEEDED(hr))
{
CopyMemory(pMem, data, cbBuffer);
}
if (pBuffer)
{
pBuffer->Unlock();
}
if (m_vFormat == VideoFormat::SWFV_NONE && m_audFormat == AudioFormat::SWAF_WAV)
{
DWORD cbWritten = 0;
if (SUCCEEDED(hr))
{
hr = m_pByteStream->Write(pMem, cbBuffer, &cbWritten);
}
if (SUCCEEDED(hr))
{
m_cbWrittenByteStream += cbWritten;
}
}
else
{
// Set the data length of the buffer.
if (SUCCEEDED(hr))
{
hr = pBuffer->SetCurrentLength(cbBuffer);
}
// Create media sample and add the buffer to the sample.
if (SUCCEEDED(hr))
{
hr = MFCreateSample(&pSample);
}
if (SUCCEEDED(hr))
{
hr = pSample->AddBuffer(pBuffer);
}
// Set the timestamp and the duration.
if (SUCCEEDED(hr))
{
hr = pSample->SetSampleTime(m_cbRtStartVideo);
}
if (SUCCEEDED(hr))
{
hr = pSample->SetSampleDuration(m_cbRtDurationVideo);
}
// Send the sample to the Sink Writer
if (SUCCEEDED(hr))
{
hr = m_pSinkWriter->WriteSample(m_audioStreamIndex, pSample);
}
/*if (SUCCEEDED(hr))
{
m_cbRtStartAudio += m_cbRtDurationAudio;
}*/
SAFE_RELEASE(pSample);
SAFE_RELEASE(pBuffer);
}
return hr;
}
So, Microsoft introduced a FLAC Media Foundation Transform (MFT) Encoder CLSID_CMSFLACEncMFT in Windows 10, but the codec remains undocumented at the moment.
Supported Media Formats in Media Foundation is similarly out of date and does not reflect presence of recent additions.
I am not aware of any comment on this, and my opinion is that the codec is added for internal use but the implementation is merely a standard Media Foundation components without licensing restrictions, so the codecs are unrestricted too by, for example, field of use limitations.
This stock codec seems to be limited to 8, 16 and 24 bit PCM input options (that is, not 32 bits/sample - you need to resample respectively). The codec is capable to accept up to 8 channels and flexible samples per second rate (48828 Hz is okay).
Even though the codec (transform) seems to be working, if you want to produce a file, you also need a suitable container format (multiplexer) which is compatible with MFAudioFormat_FLAC (the identifier has 7 results on Google Search at the moment of the post, which basically means noone is even aware of the codec). Outdated documentation does not reflect actual support for FLAC in stock media sinks.
I borrowed a custom media sink that writes a raw MFT output payload into a file, and such FLAC output is playable as the FLAC frames contain necessary information to parse the bitstream for playback.
For the reference, the file itself is: 20180224-175524.flac.
An obvious candidate among stock media sinks WAVE Media Sink is unable to accept FLAC input. Nevertheless it potentially could, the implementation is presumably limited to simpler audio formats.
AVI media sink might possibly take FLAC audio, but it seems to be impossible to create an audio only AVI.
Among other media sink there is however a media sink which can process FLAC: MPEG-4 File Sink. Again, despite the outdated documentation, the media sink takes FLAC input, so you should be able to create .MP4 files with FLAC audio track.
Sample file: 20180224-184012.mp4. "FLAC (framed)"
To sum it up:
FLAC encoder MFT is present in Windows 10 and is available for use; lacks proper documentation though
One needs to take care of conversion of input to compatible format (no direct support for 32-bit PCM)
It is possible to manage MFT directly and consume MFT output, then obtain FLAC bitstream
Alternatively, it is possible to use stock MP4 media sink to produce output with FLAC audio track
Alternatively, it is possible to develop a custom media sink and consume FLAC bitstream from upstream encoder connection
Potentially, the codec is compatible with Transcode API, however the restrictions above apply. The container type needs to be MFTranscodeContainerType_MPEG4 in particular.
The codec is apparently compatible with Media Session API, presumably it is good for use with Sink Writer API either.
In your code as you attempt to use Sink Writer API you should similarly either have MP4 output with input possibly converted to compatible format in your code (compatible PCM or compatible FLAC with encoder MFT managed on your side). Knowing that MP4 media sink overall is capable to create FLAC audio track you should be able to debug fine details in your code and fit the components to work together.
Finally I was able to solve the problem. It wasn't that hard to be honest. But that is always the case if you know how to achieve something ;).
I created a copy and paste example below to give an idea how to implement FLAC encoding with Microsoft Media Foundation.
The missing piece of the puzzle was the MFTranscodeGetAudioOutputAvailableTypes. This function lists all the available output formats from an audio encoder.
If you are not sure what MFTs are supported by the operation system you can call MFTEnumEx function first. This gives you a list of all the available MFTs. In my case with windows 10 there's the FLAC MFT that is defined like this.
Name: Microsoft FLAC Audio Encoder MFT
Input Types: 1 items:
Audio-PCM
Class identifier: 128509e9-c44e-45dc-95e9-c255b8f466a6
Output Types: 1 items:
Audio-0000f1ac-0000-0010-8000-00aa00389b71
Transform Flags: 1
Transform Category: Audio Encoder
So the next thing I did was to create the source reader and get the current media type. The important values for me are sample rate, bit rate and channels.
Then I created a GetOutputMediaTypes function that needs the requested audio format, sample rate, bit rate, channels and a reference to the IMFMediaType.
The MFTranscodeGetAudioOutputAvailableTypes function returns all available types for the MFAudioFormat_flac GUID.
After getting the count of the available media types with hr = pAvailableTypes->GetElementCount(&dwMTCount); I am able to iterate through them and check if a type is supporting my request. If that's the case I return the media type.
The last part is the easiest one.
First add the output media type to the sinkwriter to get the stream index.
DWORD dwWriterStreamIndex = -1;
// Add the stream
hr = pSinkWriter->AddStream(
pOuputMediaType,
&dwWriterStreamIndex
);
Then set the input type and call pSinkWriter->BeginWriting(); so the sinkwriter starts to accepting data.
// Set input media type
hr = pSinkWriter->SetInputMediaType(
dwWriterStreamIndex,
pInputType,
NULL
);
// Tell the sink writer to accept data
hr = pSinkWriter->BeginWriting();
If the output and input media type is correctly set, BeginWriting should return 0 as HRESULT.
We should get no error because we are using the media type the function MFTranscodeGetAudioOutputAvailableTypes is providing.
The last step is to read all samples from the source reader and write it through the sinkwriter into the flac container.
Done :)
I hope I could help with this answer.
Also thanks to Roman R.
Update
This sample is only working with Audio-PCM formats from 4 bits to 24 bits. If you want to encode an 32 Bit Audio-PCM you have to resample it first and then encode it.
--------
Here's the minimal example app.
#include <windows.h>
#include <windowsx.h>
#include <atlstr.h>
#include <comdef.h>
#include <exception>
#include <mfapi.h>
#include <mfplay.h>
#include <mfreadwrite.h>
#include <mmdeviceapi.h>
#include <Audioclient.h>
#include <mferror.h>
#include <Wmcodecdsp.h>
#pragma comment(lib, "mf.lib")
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfplay.lib")
#pragma comment(lib, "mfreadwrite.lib")
#pragma comment(lib, "mfuuid.lib")
#pragma comment(lib, "wmcodecdspuuid")
inline void ThrowIfFailed(HRESULT hr)
{
if (FAILED(hr))
{
// Get the error message
_com_error err(hr);
LPCTSTR errMsg = err.ErrorMessage();
OutputDebugString(L"################################## ERROR ##################################\n");
OutputDebugString(errMsg);
OutputDebugString(L"\n################################## ----- ##################################\n");
CStringA sb(errMsg);
// Set a breakpoint on this line to catch DirectX API errors
throw std::exception(sb);
}
}
template <class T> void SafeRelease(T **ppT)
{
if (*ppT)
{
(*ppT)->Release();
*ppT = nullptr;
}
}
using namespace System;
HRESULT GetOutputMediaTypes(
GUID cAudioFormat,
UINT32 cSampleRate,
UINT32 cBitPerSample,
UINT32 cChannels,
IMFMediaType **ppType
)
{
// Enumerate all codecs except for codecs with field-of-use restrictions.
// Sort the results.
DWORD dwFlags =
(MFT_ENUM_FLAG_ALL & (~MFT_ENUM_FLAG_FIELDOFUSE)) |
MFT_ENUM_FLAG_SORTANDFILTER;
IMFCollection *pAvailableTypes = NULL; // List of audio media types.
IMFMediaType *pAudioType = NULL; // Corresponding codec.
HRESULT hr = MFTranscodeGetAudioOutputAvailableTypes(
cAudioFormat,
dwFlags,
NULL,
&pAvailableTypes
);
// Get the element count.
DWORD dwMTCount;
hr = pAvailableTypes->GetElementCount(&dwMTCount);
// Iterate through the results and check for the corresponding codec.
for (DWORD i = 0; i < dwMTCount; i++)
{
hr = pAvailableTypes->GetElement(i, (IUnknown**)&pAudioType);
GUID majorType;
hr = pAudioType->GetMajorType(&majorType);
GUID subType;
hr = pAudioType->GetGUID(MF_MT_SUBTYPE, &subType);
if (majorType != MFMediaType_Audio || subType != MFAudioFormat_FLAC)
{
continue;
}
UINT32 sampleRate = NULL;
hr = pAudioType->GetUINT32(
MF_MT_AUDIO_SAMPLES_PER_SECOND,
&sampleRate
);
UINT32 bitRate = NULL;
hr = pAudioType->GetUINT32(
MF_MT_AUDIO_BITS_PER_SAMPLE,
&bitRate
);
UINT32 channels = NULL;
hr = pAudioType->GetUINT32(
MF_MT_AUDIO_NUM_CHANNELS,
&channels
);
if (sampleRate == cSampleRate
&& bitRate == cBitPerSample
&& channels == cChannels)
{
// Found the codec.
// Jump out!
break;
}
}
// Add the media type to the caller
*ppType = pAudioType;
(*ppType)->AddRef();
SafeRelease(&pAudioType);
return hr;
}
int main(array<System::String ^> ^args)
{
HRESULT hr = S_OK;
// Initialize com interface
ThrowIfFailed(
CoInitializeEx(0, COINIT_MULTITHREADED)
);
// Start media foundation
ThrowIfFailed(
MFStartup(MF_VERSION)
);
IMFMediaType *pInputType = NULL;
IMFSourceReader *pSourceReader = NULL;
IMFMediaType *pOuputMediaType = NULL;
IMFSinkWriter *pSinkWriter = NULL;
// Create source reader
hr = MFCreateSourceReaderFromURL(
L"C:\\Temp\\48khz24bit.wav",
NULL,
&pSourceReader
);
// Create sink writer
hr = MFCreateSinkWriterFromURL(
L"C:\\Temp\\foo.flac",
NULL,
NULL,
&pSinkWriter
);
// Get media type from source reader
hr = pSourceReader->GetCurrentMediaType(
MF_SOURCE_READER_FIRST_AUDIO_STREAM,
&pInputType
);
// Get sample rate, bit rate and channels
UINT32 sampleRate = NULL;
hr = pInputType->GetUINT32(
MF_MT_AUDIO_SAMPLES_PER_SECOND,
&sampleRate
);
UINT32 bitRate = NULL;
hr = pInputType->GetUINT32(
MF_MT_AUDIO_BITS_PER_SAMPLE,
&bitRate
);
UINT32 channels = NULL;
hr = pInputType->GetUINT32(
MF_MT_AUDIO_NUM_CHANNELS,
&channels
);
// Try to find a media type that is fitting.
hr = GetOutputMediaTypes(
MFAudioFormat_FLAC,
sampleRate,
bitRate,
channels,
&pOuputMediaType);
DWORD dwWriterStreamIndex = -1;
// Add the stream
hr = pSinkWriter->AddStream(
pOuputMediaType,
&dwWriterStreamIndex
);
// Set input media type
hr = pSinkWriter->SetInputMediaType(
dwWriterStreamIndex,
pInputType,
NULL
);
// Tell the sink writer to accept data
hr = pSinkWriter->BeginWriting();
// Forever alone loop
for (;;)
{
DWORD nStreamIndex, nStreamFlags;
LONGLONG nTime;
IMFSample *pSample;
// Read through the samples until...
hr = pSourceReader->ReadSample(
MF_SOURCE_READER_FIRST_AUDIO_STREAM,
0,
&nStreamIndex,
&nStreamFlags,
&nTime,
&pSample);
if (pSample)
{
OutputDebugString(L"Write sample...\n");
hr = pSinkWriter->WriteSample(
dwWriterStreamIndex,
pSample
);
}
// ... we are at the end of the stream...
if (nStreamFlags & MF_SOURCE_READERF_ENDOFSTREAM)
{
// ... and jump out.
break;
}
}
// Call finalize to finish writing.
hr = pSinkWriter->Finalize();
// Done :D
return 0;
}

Playing audio through WASAPI - IAudioRenderClient and getting static

I'm trying to learn how to play audio through my default playback device using WASAPI. My test is capturing audio from my default device and sending it to the playback device to render so that I can hear myself talk through my speakers.
It appears that the data I'm getting from my capture device is working fine, but when I send it to the IAudioRenderClient buffer, sometimes I get just pure static and sometimes I just get silence.
If you think you need to see my capture device logic, just ask. I haven't posted it since it appears to me to be a playback device issue.
My devices are working properly in other apps. So it's not a device problem.
Can anyone tell me what I'm doing wrong?
Playback Device Init
#define REFTIMES_PER_SEC 48000
bool AudioManager::InitPlaybackDevice()
{
HRESULT hr;
WAVEFORMATEX* pWfx = nullptr;
BYTE* pData = nullptr;
UINT32 nNumFramesPadding = 0;
// Query the Enumerator for the default playback device
hr = m_pEnumerator->GetDefaultAudioEndpoint(eRender, eCommunications, &m_pPlaybackDevice);
EXIT_ON_ERROR_SHUTDOWN(hr);
hr = m_pPlaybackDevice->Activate(IID_IAudioClient, CLSCTX_ALL, NULL, (void**)&m_pAudioRenderClient);
EXIT_ON_ERROR_SHUTDOWN(hr);
hr = m_pAudioRenderClient->GetMixFormat(&pWfx);
EXIT_ON_ERROR_SHUTDOWN(hr);
hr = m_pAudioRenderClient->Initialize(AUDCLNT_SHAREMODE_SHARED, 0, REFTIMES_PER_SEC, 0, pWfx, NULL);
EXIT_ON_ERROR_SHUTDOWN(hr);
hr = m_pAudioRenderClient->GetBufferSize(&m_nPlaybackBufferFrameCount);
EXIT_ON_ERROR_SHUTDOWN(hr);
hr = m_pAudioRenderClient->GetService(IID_IAudioRenderClient, (void**)&m_pAudioRenderClientObj);
EXIT_ON_ERROR_SHUTDOWN(hr);
hr = m_pAudioRenderClient->GetCurrentPadding(&nNumFramesPadding);
EXIT_ON_ERROR_SHUTDOWN(hr);
hr = m_pAudioRenderClientObj->GetBuffer(m_nPlaybackBufferFrameCount - nNumFramesPadding, &pData);
EXIT_ON_ERROR_SHUTDOWN(hr);
hr = m_pAudioRenderClientObj->ReleaseBuffer(m_nPlaybackBufferFrameCount - nNumFramesPadding, AUDCLNT_BUFFERFLAGS_SILENT);
EXIT_ON_ERROR_SHUTDOWN(hr);
hr = m_pAudioRenderClient->Start();
EXIT_ON_ERROR(hr);
return true;
}
Sending audio data to the playback device
bool AudioManager::SendSampleToPlaybackDevice(BYTE * i_pData, UINT32 i_nSize)
{
if (m_pAudioRenderClientObj != nullptr)
{
HRESULT hr;
BYTE* pData = nullptr;
UINT32 nNumFramesPadding = 0;
int nNumFrames = 0;
hr = m_pAudioRenderClient->GetCurrentPadding(&nNumFramesPadding);
EXIT_ON_ERROR(hr);
nNumFrames = min(i_nSize, m_nPlaybackBufferFrameCount - nNumFramesPadding);
hr = m_pAudioRenderClientObj->GetBuffer(nNumFrames, &pData);
EXIT_ON_ERROR(hr);
if(pData != nullptr)
memcpy(pData, i_pData, nNumFrames);
hr = m_pAudioRenderClientObj->ReleaseBuffer(nNumFrames, 0);
EXIT_ON_ERROR(hr);
return true;
}
return false;
}

Directshow Application crash after exiting of main()

I am writing directshow application. Below is the code that works fine but crashes with the error message "App.exe has stopped working". The entire code is written below. Please note that I am using Windows SDK 7.0 which does not have atlbase.h and hence I cannot use CComPtr<IBaseFilter> myFilter; type of pointer declaration which is supposed to clear memory at exit.
EDIT: The application crashes only if I connect all filters explicitly. In this case, the destructor of my filter is not called. If I just connect source filter to renderer (which will internally connect my filter and a demux filter), the destructor of my filter is called and there is no crash. I have put the macro MANUAL_CONNECT over the code which causes the crash. I have removed RemoveFilter call and replaced it with Release call.
I am writing the application code here:
void main(WORD32 argc, CHAR *argv[])
{
// Initialize the COM library.
HRESULT hr = CoInitialize(NULL);
if (FAILED(hr))
{
printf("ERROR - Could not initialize COM library");
return;
}
{
IGraphBuilder *pGraph = NULL;
IFileSourceFilter *pFileSourceFilter = NULL;
IMediaControl *pControl = NULL;
IMediaEvent *pEvent = NULL;
IBaseFilter *pSource = NULL;
IBaseFilter *pVideoDecode = NULL;
IBaseFilter *pVideoRenderer = NULL;
IEnumPins *pEnumPins = NULL;
IPin *pPinIn = NULL;
IPin *pPinOut = NULL;
ULONG fetched;
PIN_INFO PinInfo;
IEnumFilters *pEnum = NULL;
BOOL stop = FALSE;
int i;
// Create the filter graph manager and query for interfaces.
hr = CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC_SERVER, IID_IGraphBuilder, (void **)&pGraph);
// Create the filter graph manager and query for interfaces.
hr = CoCreateInstance(CLSID_AsyncReader, NULL, CLSCTX_INPROC_SERVER, IID_IBaseFilter, (void **)&pSource);
hr = pGraph->AddFilter(pSource, NULL);
hr = pSource->QueryInterface(IID_IFileSourceFilter, (void**)&pFileSourceFilter);
hr = pFileSourceFilter->Load(L"input.mp4", NULL);
// Create Ittiam HEVC Decoder instance
hr = CoCreateInstance(CLSID_ivdec, NULL, CLSCTX_INPROC_SERVER, IID_IBaseFilter, (void **)&pVideoDecode);
// Create Video Renderer instance. We have used default video renderer
hr = CoCreateInstance(CLSID_VideoRendererDefault, NULL, CLSCTX_INPROC_SERVER, IID_IBaseFilter, (void **)&pVideoRenderer);
// Add decoder filter to the filter graph
hr = pGraph->AddFilter(pVideoDecode, NULL);
// Add renderer filter to the filter graph
hr = pGraph->AddFilter(pVideoRenderer, NULL);
/**************************************************************/
/* -- Connecting source filter to demux filter starts here -- */
/**************************************************************/
// Enumerate pins of the source filter
hr = pSource->EnumPins(&pEnumPins);
hr = pEnumPins->Reset();
// Get pin of source filter. Source filter has only output pin, so no check required
hr = pEnumPins->Next(1, &pPinOut, &fetched);
hr = pEnumPins->Release();
#if MANUAL_CONNECT
// Enumerate pins of the decoder filter
hr = pVideoDecode->EnumPins(&pEnumPins);
hr = pEnumPins->Reset();
// Get pin of decoder filter. Decoder filter has 2 pins, so ensure the selected pin is input pin.
// If not, get another pin
hr = pEnumPins->Next(1, &pPinIn, &fetched);
hr = pPinIn->QueryPinInfo(&PinInfo);
if(PINDIR_OUTPUT == PinInfo.dir)
{
hr = pPinIn->Release();
hr = pEnumPins->Next(1, &pPinIn, &fetched);
}
// Connect output pin of demux filter to input pin of decoder filter
hr = pGraph->Connect(pPinOut, pPinIn);
/*************************************************************/
/* -- Connecting demux filter to decoder filter ends here -- */
/*************************************************************/
/******************************************************************/
/* -- Connecting decoder filter to renderer filter starts here -- */
/******************************************************************/
// Enumerate pins of the decoder filter
hr = pVideoDecode->EnumPins(&pEnumPins);
hr = pEnumPins->Reset();
// Get pin of decoder filter. Decoder filter has 2 pins, so ensure the selected pin is output pin.
// If not, get another pin
hr = pEnumPins->Next(1, &pPinOut, &fetched);
hr = pPinOut->QueryPinInfo(&PinInfo);
if(PINDIR_INPUT == PinInfo.dir)
{
hr = pPinOut->Release();
hr = pEnumPins->Next(1, &pPinOut, &fetched);
}
hr = pEnumPins->Release();
#endif
// Enumerate pins of the renderer filter
hr = pVideoRenderer->EnumPins(&pEnumPins);
hr = pEnumPins->Reset();
// Get pin of renderer filter. Renderer filter has only input pin, so no check required
hr = pEnumPins->Next(1, &pPinIn, &fetched);
hr = pPinIn->QueryPinInfo(&PinInfo);
// Connect output pin of decoder filter to input pin of renderer filter
hr = pGraph->Connect(pPinOut, pPinIn);
/****************************************************************/
/* -- Connecting decoder filter to renderer filter ends here -- */
/****************************************************************/
hr = pGraph->QueryInterface(IID_IMediaControl, (void **)&pControl);
hr = pGraph->QueryInterface(IID_IMediaEvent, (void **)&pEvent);
// Run the graph.
hr = pControl->Run();
if (SUCCEEDED(hr))
{
// Wait for completion.
long evCode;
pEvent->WaitForCompletion(INFINITE, &evCode);
// Note: Do not use INFINITE in a real application, because it
// can block indefinitely.
}
hr = pControl->Stop();
hr = pSource->Release();
hr = pVideoDecode->Release();
hr = pControl->Release();
hr = pEvent->Release();
hr = pGraph->Release();
}
CoUninitialize();
printf("Exiting main!!\n");
}
I have removed error checks from the post but I have all error checks in my code. I can see the Exiting main!! print but than the application crashes. Any suggestions on how to debug this? Please let me know if any information is missing. I am using Microsoft Visual C++ 2010 Express for my development.
You must terminate all COM activity (specifically: release all COM interface pointers) on the thread before calling CoUninitialize, you don't do it.
See, for example, this code and its Release calls at the _tmain bottom.
It makes sense to use more recent version of Visual Studio (2013, 2015), where free community edition already includes ATL and you can enjoy automatic COM interface reference management using CComPtr and friends. This is the first advise to those who uses raw COM interface pointers and experience issues managing them incorrectly.
See also:
Not releasing filter com object causing a crash
Unreleased DirectShow CSource filter makes program crash at process shutdown
Why does CoUninitialize cause an error on exit?
UPDATE: Using raw pointers inaccurately, you keep having leaks:
hr = pGraph->Connect(pPinOut, pPinIn);
// ...
hr = pEnumPins->Next(1, &pPinOut, &fetched);
IEnumPin::Next call overwrites pPinOut pointer and leaks a reference.

DirectShow Memory Leak

I'm experiencing an important memory leak in a camera application that uses an EVR (Enhanced Video Renderer). The leak happens when I prepare, run, stop and unprepare a graph several times (several MB every cycle, depending on the video resolution).
I simplified the code as much as I could; you can download it from here:
https://daptech.box.com/s/6csm91vhcawiw18u42kq
The graph has a video capture filter, a SampleGrabber filter and a renderer. When connecting, it creates an AVI Decompressor filter to complete the graph.
Here are some excerpts...
Declarations:
CComPtr<IGraphBuilder> pGraphBuilder;
CComPtr<IBaseFilter> pSource; // The capture device filter
CComPtr<IBaseFilter> pSampleGrabberFilter; // Filter to capture data that flows through the graph
CComPtr<IBaseFilter> pRenderer; // EVR
Graph preparation (validation removed):
hr = m_p->pGraphBuilder.CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC);
hr = m_p->pSampleGrabberFilter.CoCreateInstance(CLSID_SampleGrabber, NULL, CLSCTX_INPROC);
CComQIPtr<ISampleGrabber, &IID_ISampleGrabber> pSampleGrabber(m_p->pSampleGrabberFilter);
hr = pSampleGrabber->SetBufferSamples(FALSE); // Don't copy image data for better performances
// Setup the callback
hr = pSampleGrabber->SetCallback(&m_p->cb, 1); // 0: SampleCB() is called, so that we directly access the buffer. 1: BufferCB() is called. The doc states that the function always makes a copy of the data.
CComPtr<IPin> pSampleGrabberInputPin( FindPin(m_p->pSampleGrabberFilter, PINDIR_INPUT, NULL) );
CComPtr<IPin> pSampleGrabberOutputPin( FindPin(m_p->pSampleGrabberFilter, PINDIR_OUTPUT, NULL) );
hr = m_p->pGraphBuilder->AddFilter(m_p->pSampleGrabberFilter, L"SampleGrabber");
hr = FindVideoCaptureDevice(m_p->pSource, (unsigned)iCameraIndex);
CComPtr<IPin> pSourceOutputPin( FindPin(m_p->pSource, PINDIR_OUTPUT, L"CAPTURE") );
hr = m_p->pGraphBuilder->AddFilter(m_p->pSource, L"Camera");
hr = m_p->pRenderer.CoCreateInstance(CLSID_EnhancedVideoRenderer, NULL, CLSCTX_INPROC);
hr = m_p->pGraphBuilder->AddFilter(m_p->pRenderer, L"Renderer");
hr = ConfigEVR(m_p->pRenderer, m_staticPreview.m_hWnd, uWidth, uHeight);
CComPtr<IPin> pRendererInputPin( FindPin(m_p->pRenderer, PINDIR_INPUT, NULL) );
// Now select the format:
if (!SelectFormat(pSourceOutputPin, uWidth, uHeight, bCompression))
{ MessageBox(L"No appropriate format found.\n"); return; }
// Connection:
hr = m_p->pGraphBuilder->Connect(pSourceOutputPin, pSampleGrabberInputPin);
hr = m_p->pGraphBuilder->Connect(pSampleGrabberOutputPin, pRendererInputPin);
This function is called to "unprepare" the graph:
void FreeDirectShowGraph()
{
if (pSource) DisconnectDownstream(pSource);
RemoveAndRelease(pRenderer);
RemoveAndRelease(pSampleGrabberFilter);
RemoveAndRelease(pSource);
if (pGraphBuilder) pGraphBuilder.Release();
}
Any clue would be appreciated!

How to render and save captured Video/Audio into a custom file/filter format in DirectShow?

Basiclly, I want to capture audio/video. Run it through a mp4 muxer and save it to a file on disk. Before I used ICaptureGraphBuilder2, but it seems unusuable when saving to custom formats.
My code so far,
I enumerate video/audio devices. In this sample I only try to capture audio. I get the correct device and use GetPin to enumerate the filters pins to get its output pin.
hr = pMoniker2->BindToObject(0, 0, IID_IBaseFilter, (void**)&pSrc2);
hr = pGraph->AddFilter(pSrc2, L"AudioCap");
hr = GetPin(pSrc2, PINDIR_OUTPUT, &outPin);
This is the custom filter, a MP4 muxer. It properly loads and I can get the input pin and connect it with my output pin. So far so good.
HRESULT hr = CreateObjectFromPath(TEXT("c:\\filters\\mp4mux.dll"), clsid, &pUnk);
if (SUCCEEDED(hr))
{
IBaseFilterPtr pFilter = pUnk;
HRESULT hr = pGraph->AddFilter(pFilter, L"Private Filter");
hr = GetPin(pFilter, PINDIR_INPUT, &inPin);
}
hr = pGraph->Connect(outPin, inPin);
This is where I get lost, I can't find out how to take the next steps to Render and save the output to a file on disk. So any help with the next steps would be greatfully appreciated, thanks in advance!
EDIT: Filesink code
AM_MEDIA_TYPE mType;
mType.majortype = MEDIATYPE_Video;
mType.subtype = MEDIASUBTYPE_H264;
mType.bFixedSizeSamples = FALSE;
mType.bTemporalCompression = TRUE;
mType.lSampleSize = 0;
mType.formattype = FORMAT_None;
mType.pUnk = NULL;
mType.cbFormat = 0;
mType.pbFormat = NULL;
//Not 100% sure about the setup of the media format.
IBaseFilter * iFiltera = NULL;
IFileSinkFilter* iFilter = NULL;
IGraphBuilder *pGraph;
hr = pMoniker2->BindToObject(0, 0, IID_IBaseFilter, (void**)&pSrc2); //audio capture
hr = CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC_SERVER, IID_IGraphBuilder, (void**)&pGraph);
hr = CoCreateInstance(CLSID_FileWriter, NULL, CLSCTX_INPROC_SERVER, IID_IBaseFilter, (void**)&iFiltera);
hr = pBuild->SetFiltergraph(pGraph);
hr = pGraph->AddFilter(pSrc2, L"AudioCap");
hr = GetPin(pSrc2, PINDIR_OUTPUT, &outPin); //ADDED
hr = pGraph->AddFilter(iFiltera, L"FileWriter");
hr = iFiltera->QueryInterface(IID_IFileSinkFilter, (void**)&iFilter);
iFilter->SetFileName((LPCOLESTR)"c:\\wav\\tester.mp4", NULL); //UPDATED mType set to NULL
HRESULT hr = CreateObjectFromPath(TEXT("c:\\filters\\mp4mux.dll"), clsid, &pUnk);
IBaseFilterPtr pFilter = pUnk;
if (SUCCEEDED(hr))
{
HRESULT hr = pGraph->AddFilter(pFilter, L"Private Filter");
hr = GetPin(pFilter, PINDIR_INPUT, &inPin); //mux in
hr = GetPin(pFilter, PINDIR_OUTPUT, &mOutPin); //mux out
hr = GetPin(iFiltera, PINDIR_INPUT, &filePin); // filewriter in
}
hr = pGraph->Connect(outPin, inPin); //connect audio out and mux in
hr = pGraph->Connect(mOutPin, filePin); //connect mux out and file in; Error 0x80040217(VFW_E_CANNOT_CONNECT?) //works now
//ADDED code
IMediaControl *pMC = NULL;
hr = pGraph->QueryInterface(IID_IMediaControl, (void **)&pMC);
hr = pMC->Run();
Sleep(4000);
hr = pMC->Stop();
You need to have an idea what filter graph topology you need for specific task. You are doing capture, here - fine. So you have audio capture filter you provided code snippet for. Then you either compress audio (where your preferred choice should be AAC AKA MPEG-4 Part 3, provided that you are to produce MP4 file), or keep audio uncompressed PCM. Then you connect MPEG-4 multiplexer as you do. The multiplexer produces output stream and you are expected to complete the pipeline with File Writer Filter.
You can manually build the chain in GraphEdit SDK application (or there are alternate richer tools). Your filter graph looks like this:
Note that you can expose filter graph in your application, and connect to it remotely and inspect the topology. This makes debugging much easier. Starting/stopping the filter graph (IMediaControl::Run, ::Stop from code) creates you the file.
My understanding is that you get lost immediately after adding multiplexer. Now you need to find its output pin, add File Writer, query its IFileSinkFilter, set the destination file name using it, find its input pin, connect the two unconnected pins (mux output, writer input). Your pipeline is ready to run.