WASAPI: Identify non-active channels on loopback recording - c++

I have a DSP software which captures the audio playing using the WASAPI api in shared loopback mode.
hr = _pAudioClient->Initialize(AUDCLNT_SHAREMODE_SHARED, AUDCLNT_STREAMFLAGS_LOOPBACK, 0, 0, _pFormat, 0);
This part works fine, but now I want to be able to detect the number of channels actually playing. In other words how would I be able to detect if the audio playing is in stereo, 5.1, 7.1?
The problem is:
* Since loopback have to use shared mode there could be multiple sources playing.
* This analysis has to be done in real-time. Can't wait until playback is done.
* Detect the difference between a channel not used at all by any playback source and a channel that is temporarily silent
The best solution in my mind would be If I could retrieve a list of all playback source/sub mixes and query them each for the number of channels. That way I don't have to analyse the audio data stream itself.

Loopback recording takes place in mix format defined on the endpoint, so regardless of what the original audio format was you get the data in the mix format, mixed from possibly multiple played sources and also converted to such shared format.
Device Formats
Loopback Recording
WASAPI loopback contains the mix of all audio being played...
The GetMixFormat method retrieves the stream format that the audio engine uses for its internal processing of shared-mode streams...
After an application has used GetMixFormat or IsFormatSupported to find an appropriate format for a shared-mode or exclusive-mode stream, the application can call the Initialize method to initialize a stream with that format. An application that attempts to initialize a shared-mode stream with a format that is not identical to the mix format obtained from the GetMixFormat method, but that has the same number of channels and the same sample rate as the mix format, is likely to succeed. Before calling Initialize, the application can call IsFormatSupported to verify that Initialize will accept the format.
That is, even though WASAPI offers some flexibility in audio format, channel configuration and sample rate are defined by shared format when it comes to loopback capture.
As you are getting the mix, you cannot really identify "non-active" channels: this information is lost during mixing to shared format.
Also, the actual shared format can be configured interactively via Control Panel:

Ok I now have a solution to my problem. As far as I know you can not detect sub-mixes in the shared mix so the only option was to analyze the audio stream/capture buffer.
First during my main capture loop I set the current timestamp for all channels playing.
const time_t now = Date::getCurrentTimeMillis();
//Iterate all capture frames
for (i = 0; i < numFramesAvailable; ++i) {
for (j = 0; j < _nChannelsIn; ++j) {
//Identify which channels are playing.
if (pCaptureBuffer[j] != 0) {
_pUsedChannels[j] = now;
}
}
}
Then every second I call this function which evaluates if a channel has played the last second. Based upon which channels are playing I can do conditional routing.
void checkUsedChannels() {
const time_t now = Date::getCurrentTimeMillis();
//Compare now against last used timestamp and determine active channels
for (size_t i = 0; i < _nChannelsIn; ++i) {
if (now - _pUsedChannels[i] > 1000) {
_pUsedChannels[i] = 0;
}
}
//Update conditional routing
for (const Input *pInut : _inputs) {
pInut->evalConditions();
}
}
Very simple solution but it appears to be working.

Related

Core Audio Ring Buffer Data comes out blank

I am working off a demo from the book "Learning Core Audio: A Hands-On Guide to Audio Programming for Mac and iOS." Chapter 8 shows how to set up a simple AudioUnit graph to play through from the AUHAL input unit to an output unit. This setup doesn't actually connect the audio units; instead, both units use a callback and pass audio data through an instance of CARingBuffer. I'm coding for MacOS 10.15.6, and using code directly from the publisher here. Here's a picture of how it works:
The code builds and runs, but I get no audio. Note that later, after introducing a speech synthesis unit, I do get playback, so I know the basics are working.
InputRenderProc asks the AUHAL unit for input and stores it in the ring buffer.
MyAUGraphPlayer *player = (MyAUGraphPlayer*) inRefCon;
// have we ever logged input timing? (for offset calculation)
if (player->firstInputSampleTime < 0.0) {
player->firstInputSampleTime = inTimeStamp->mSampleTime;
if ((player->firstOutputSampleTime > -1.0) &&
(player->inToOutSampleTimeOffset < 0.0)) {
player->inToOutSampleTimeOffset = player->firstInputSampleTime - player->firstOutputSampleTime;
}
}
// render into our buffer
OSStatus inputProcErr = noErr;
inputProcErr = AudioUnitRender(player->inputUnit,
ioActionFlags,
inTimeStamp,
inBusNumber,
inNumberFrames,
player->inputBuffer);
if (! inputProcErr) {
inputProcErr = player->ringBuffer->Store(player->inputBuffer,
inNumberFrames,
inTimeStamp->mSampleTime);
UInt32 sz = sizeof(player->inputBuffer);
printf ("stored %d frames at time %f (%d bytes)\n", inNumberFrames, inTimeStamp->mSampleTime, sz);
for (int i = 0; i < player->inputBuffer->mNumberBuffers; i++ ){
//printf("stored audio string[%d]: %s\n", i, player->inputBuffer->mBuffers[i].mData);
}
}
If I uncomment the printf statement, I see what looks like audio data being stored.
stored audio string[1]: #P'\274a\353\273\336^\274x\205 \2741\330B\2747'\274\371\361U\274\346\274\274}\212C\274\334\365%\274\261\367\273\340\307/\274E
stored 512 frames at time 134610.000000 (8 bytes)
However, when I fetch from the ring buffer in the GraphRenderCallback like this...
MyAUGraphPlayer *player = (MyAUGraphPlayer*) inRefCon;
// have we ever logged output timing? (for offset calculation)
if (player->firstOutputSampleTime < 0.0) {
player->firstOutputSampleTime = inTimeStamp->mSampleTime;
if ((player->firstInputSampleTime > -1.0) &&
(player->inToOutSampleTimeOffset < 0.0)) {
player->inToOutSampleTimeOffset = player->firstInputSampleTime - player->firstOutputSampleTime;
}
}
// copy samples out of ring buffer
OSStatus outputProcErr = noErr;
// new CARingBuffer doesn't take bool 4th arg
outputProcErr = player->ringBuffer->Fetch(ioData,
inNumberFrames,
inTimeStamp->mSampleTime + player->inToOutSampleTimeOffset);
I get nothing (I know I can't expect proper null-terminated string output, but I thought I'd see something).
fetched 512 frames at time 160776.000000
fetched audio string[0, size 2048]: xx
fetched audio string[1, size 2048]: xx
fetched 512 frames at time 161288.000000
fetched audio string[0, size 2048]: xx
fetched audio string[1, size 2048]: xx
This is not a permission problem; I have other non-AudioUnit code that can get mic input. In addition, I created a plist that makes this app prompt for mic access every time, so I know that is working. I cannot understand why data goes into this ring buffer, but never comes out.
These days you need to declare that you want to use the microphone, providing an explanation string. This wasn't the case in 2012 when Learning Core Audio was published.
In short, you now need to:
add an NSMicrophoneUsageDescription string to your Info.plist
add sandboxing capability and enable Audio Input
The sample code you're using is a command line tool, so adding an Info.plist to it in Xcode isn't as simple as with a .app package. Also the code does not seem to work if you run it from Xcode. In my case it has to be run for Terminal.app. This may be due to the fact that my Terminal has microphone permissions (viewable in System Preferences > Security & Privacy > Microphone). You can and probably should explicitly request microphone access from the user (yourself in this case!) by using requestAccessForMediaType on an AVCaptureDevice. That's right, AVFoundation code in a Core Audio tutorial, what's the world coming to.
There are more details on the above steps in this answer
p.s. I think the person who thought capturing zeroes instead of returning an error was a good idea is probably good friends with whoever invented returning HTTP 200 with an error code in the body.

Raw files aren't playing, or are playing incorrectly - Oboe (Android-ndk)

I'm attempting to Play a Raw (int16 PCM) encoded audio file in my android application. I've been following and reading through the Oboe documentation/samples to try to get one of my own audio files to play.
The audio file I need to play is roughly 6kb, or 1592 frames (stereo).
Either no sound plays, or sound/jitter plays on startup (with varying output - see bellow)
Troubleshooting
update
I have switched to floats for buffer queuing, instead of keeping everything to int16_t (and converting back to int16_t when done), although now I'm back to no sound.
The audio seems to be either not playing, or playing on startup (which is wrong). The sound should play after I press 'start'.
When the app was implemented with int16_t only, the premature sound was relative to how big the buffer size was. If the buffer size is smaller than the audio file, the sound is very fast and clipped (more drone-like at lower buffer sizes). Bigger than the Raw audio size it seems like it plays on a loop and gets quieter at higher buffer sizes. The sound would also get "softer" when the start button is pressed. I'm not even entirely sure this means the raw audio was playing, it could just be random nonsense jitters from Android.
When filling the buffers with floats, and converting to int16_t afterwards, no audio is played.
(I have tried running systrace, but I honestly don't know what I'm looking for)
The stream opens fine.
The buffer size fails to be ajusted in createPlaybackStream() (although somehow it still sets it to twice the burst size)
The stream starts fine.
The Raw resources are being loaded fine.
Implementation
What I am currently trying in the builder:
Setting the callback to this, or onAudioReady()
Setting the performance mode to LowLatency
Setting the sharing mode to Exclusive
Setting the buffer capacity to (anything bigger than my audio file frame count)
Setting the burst size (frames per call back) to (anything equal to or lower than the buffer capacity / 2)
I am using the Player class and the AAssetManager class from the Rhythm Game sample here: https://github.com/google/oboe/blob/master/samples/RhythmGame. I am using these classes to load my resources and play the sound. Player.renderAudio writes the audio data to the output buffer.
Here are the relevant methods from my audio engine:
void AudioEngine::createPlaybackStream() {
// // Load the RAW PCM data files into memory
std::shared_ptr<AAssetDataSource> soundSource(AAssetDataSource::newFromAssetManager(assetManager, "sound.raw", ChannelCount::Mono));
if (soundSource == nullptr) {
LOGE("Could not load source data for sound");
return;
}
sound = std::make_shared<Player>(soundSource);
AudioStreamBuilder builder;
builder.setCallback(this);
builder.setPerformanceMode(PerformanceMode::LowLatency);
builder.setSharingMode(SharingMode::Exclusive);
builder.setChannelCount(mChannelCount);
Result result = builder.openStream(&stream);
if (result == Result::OK && stream != nullptr) {
mSampleRate = stream->getSampleRate();
mFramesPerBurst = stream->getFramesPerBurst();
int channelCount = stream->getChannelCount();
if (channelCount != mChannelCount) {
LOGW("Requested %d channels but received %d", mChannelCount, channelCount);
}
// Set the buffer size to (burst size * 2) - this will give us the minimum possible latency while minimizing underruns
stream->setBufferSizeInFrames(mFramesPerBurst * 2);
if (setBufferSizeResult != Result::OK) {
LOGW("Failed to set buffer size. Error: %s", convertToText(setBufferSizeResult.error()));
}
// Start the stream - the dataCallback function will start being called
result = stream->requestStart();
if (result != Result::OK) {
LOGE("Error starting stream. %s", convertToText(result));
}
} else {
LOGE("Failed to create stream. Error: %s", convertToText(result));
}
}
DataCallbackResult AudioEngine::onAudioReady(AudioStream *audioStream, void *audioData, int32_t numFrames) {
int16_t *outputBuffer = static_cast<int16_t *>(audioData);
sound->renderAudio(outputBuffer, numFrames);
return DataCallbackResult::Continue;
}
// When the 'start' button is pressed, it calls this method with true
// There should be no sound on app start-up until this button is pressed
// Sound stops when 'stop' is pressed
setPlaying(bool isPlaying) {
sound->setPlaying(isPlaying);
}
Setting the buffer capacity to (anything bigger than my audio file frame count)
You don't need to set the buffer capacity. This will be set automatically at a reasonable level for you. Typically ~3000 frames. Note that buffer capacity is different from buffer size which defaults to 2*framesPerBurst.
Setting the burst size (frames per call back) to (anything equal to or lower than the buffer capacity / 2)
Again, don't do this. onAudioReady will be called every time the stream requires more audio data and numFrames indicates how many frames you should supply. If you override this value with a value which isn't an exact ratio of the audio device's native burst size (typical values are 128, 192 and 240 frames depending on underlying hardware) then you may get audio glitches.
I have switched to floats for buffer queuing
The format which you need to supply data in is determined by the audio stream and it is only known after the stream has been opened. You can get it by calling stream->getFormat().
In the RhythmGame sample (at least the version you're referring to) here's how the formats work:
Source file is converted from 16-bit to float inside AAssetDataSource::newFromAssetManager (floats are the preferred format for any kind of signal processing)
If the stream format is 16-bit then convert it back inside onAudioReady
1592 frames (stereo).
You said that your source was stereo but you're specifying it as mono here:
std::shared_ptr soundSource(AAssetDataSource::newFromAssetManager(assetManager, "sound.raw", ChannelCount::Mono));
Without doubt that will cause audio problems because the AAssetDataSource will have a value for numFrames which is double the correct value. This will cause audio glitches because half the time you'll be playing random parts of system memory.

input delay with PortAudio callback and ASIO sdk

I'm trying to get the input from my guitar to be played through my computer using the portaudio library and the ASIO sdk.
I have been following some of the tutorials on the official website to get the basics set up. Currently I got it working so that portaudio is listening to the right input and output device and I have the callback setup to just output the input and do nothing with it like this:
static int paTestCallback(const void *inputBuffer, void *outputBuffer, unsigned long framesPerBuffer, const PaStreamCallbackTimeInfo* timeInfo, PaStreamCallbackFlags statusFlags, void *userData)
{
float *out = (float*)outputBuffer;
float* in = (float*)inputBuffer;
for (int i = 0; i<framesPerBuffer; i++)
{
*out++ = *in++; /* left */
*out++ = *in++; /* right */
}
return 0;
}
This callback is setup by calling this:
PaError error = Pa_OpenDefaultStream(&stream, 2, 2, paFloat32, 44100, paFramesPerBufferUnspecified, paTestCallback, &data);
Pa_StartStream(stream);
Now, this does work but I have a lot of delay (about 0.5s) when I strike a string on my guitar and when I hear it through the monitors.
Is there a way to solve this delay? Do I need to rewrite the callback method?
EDIT:
So, I got the delay to be a lot better using this code instead of the basic Pa_OpenDefaultStream()
int defaultIn = Pa_GetDefaultInputDevice();
int defaultOut = Pa_GetDefaultOutputDevice();
PaStreamParameters *inParam = new PaStreamParameters();
inParam->channelCount = 2;
inParam->device = defaultIn;
inParam->sampleFormat = paFloat32;
inParam->suggestedLatency = 0.05;
PaStreamParameters *outParam = new PaStreamParameters();
outParam->channelCount = 2;
outParam->device = defaultOut;
outParam->sampleFormat = paFloat32;
outParam->suggestedLatency = 0;
error = Pa_OpenStream(&stream, inParam, outParam, 44100, paFramesPerBufferUnspecified, paNoFlag, paTestCallback, &data);
if (error != paNoError) {
Logger::log("[PortAudioManager] Could not open default stream. Exiting function...");
return;
}
Pa_StartStream(stream);
There is still a little bit of delay though, mostly noticeable when playing more then just a single note.
EDIT:
I figured out with the help of Ross-Bencina that the windows default input device and output device doesn't change anything to the index of the host api's in PortAudio. I seemed to be using MME all this time. I did the following to get the right index for the ASIO device:
int hostNr = Pa_GetHostApiCount();
std::vector<const PaHostApiInfo*> infoVertex;
for (int t = 0; t < hostNr; ++t) {
infoVertex.push_back(Pa_GetHostApiInfo(t));
}
Then I just checked which is the one with ASIO and set the suggestedLatency in both PaStreamParameters to 0 and the delay is now gone and sound is good (although it's mono for now).
You are on the right track using paFramesPerBufferUnspecified.
The ASIO latency behavior depends on the driver. There are two possibilities:
The ASIO driver lets the code (i.e. PortAudio) request a latency (possibly with some constraints). PortAudio finds to best match between a supported driver buffer size and the latency that you request.
The other possibility is that your audio interface does not provide programmatic control over latency settings. Instead, latency is only selectable from the driver's ASIO control panel UI (and the driver will force a fixed buffer size on PortAudio). In this case, you should investigate the driver control panel UI to set the lowest workable latency.
In either case, your approach with Pa_OpenStream is close to optimal, but you should request zero latency for both input and output (in your edit you're requesting 50ms input latency, zero output latency). The end result will be that PortAudio selects the lowest available ASIO buffer size. If this turns out to be unstable (audio glitches) then you'll need to increase the requested latency.
include/pa_asio.h exposes a host-API-specific interface for querying the ASIO buffer sizes allowed by the driver (be aware that this can change if you change settings in the control panel). It also provides a function to display the driver's control panel UI.
EDIT: Note that Pa_GetDefaultInputDevice() and Pa_GetDefaultOutputDevice() will only return ASIO devices if you built PortAudio for ASIO only. If you included any other more common APIs in the build (e.g. WMME or DirectSound) they will be given priority as the (lowest common denominator) default device. You could add a check that you are actually accessing the ASIO device:
assert(Pa_GetHostApiInfo(Pa_GetDeviceInfo(Pa_GetDefaultOutputDevice())->hostApi)->type == paASIO);
If PortAudio is compiled with support for multiple host APIs: To get the default ASIO device: enumerate host APIs using Pa_GetHostApiCount and Pa_GetHostApiInfo to find the ASIO host API. Then pull the default ASIO device indices from the returned PaHostApiInfo struct.

Synchronizing input pins in directshow

I am creating a directshow filter which's purpose is to take 3 input pins and create a video which shows alternately vidoe from the first source, the second source and the third source, in a fixed time internal.
So if i have three webcam connected to my filter, i want the final video for example to show 5 seconds of the first cam, five seconds of the second cam, and so on...
I have tried two approaches:
Approach one
I use a class TimeManager. This class has a function isItPinsTurn(pinname). This functions returns true or false regarding if the pin is supposed to send sample to the output. To do this the TimeManager creates a new thread which sleeps every x seconds.
After it slept it changes to the current active inputpin to the next.
The result is that every x seconds the isItPinSTurn(pinname) function returns another pin. This way every pin only seconds output to the outputpin when it is its turn, hence i get the desired videos with x intervalls between the input cam.
The problem with this approach
Sleep doesn't seem to work in directshow filters. I get a runtime error:
abort() has been called
Approach two
I use the samples GetMediaTime method and a buffer which keeps track of how much video samples in terms of its mediatime, has already been sent to the output pin. This is best illustrated with code:
void MyFilter::acceptFilterInput(LPCWSTR pinname, IMediaSample* sample)
{
mylogger->LogDebug("In acceptFIlterInput", L"D:\\TEMP\\yc.log");
if (wcscmp(pinname, this->currentInputPin) == 0)
{
outpin->Deliver(sample);
LONGLONG timestart;
LONGLONG timeend;
sample->GetTime(&timestart, &timeend);
*mediaTimeBuffer += timeend - timestart;
if (*mediaTimeBuffer > this->MEDIATIME)
{
this->SetNextPinActive(pinname);
*mediaTimeBuffer = 0;
}
}
}
When the filter starts the currentInputPin is set to pin0 (the first). Calls to acceptFilterInput (which is called by the the input pins receie function) adjust the mediaTimeBUffer with the size of the MediaSample-MediaTime. If this buffer is higher than MEDIATIME (which can for example be 5 (seconds)), the buffer is set back to zero and the next pin is set active.
Problems with this approach
I am not even sure if CMediaSample->GetMediaTime returns the data i need, as it seems to return negative numbers, which doesn't seem to make much sense. I didn't find useful information about the return value of GetMediaTime on the web.
You are expected to block execution (incoming calls to IPin::Receive) on input streams so that other streams could catch up on their own streaming threads. You typically achieve this by either using wait/synchronization APIs and functions, or by holding references on media samples so that input peer would block on empty allocator waiting for a media sample (buffer) to get available.
Yes Sleep works well, although polling is the worst of possible options.
Approach two does not make sense for me because I don't see any real synchronization there: there is no execution blocking, and there is no making pin active. You cannot force data on the input pin, you only can wait to get called with new media sample. So you should block accepting data on one input stream/pin until you get data on another.
Some useful relevant information on multiplexing:
How to make a DirectShow Muxer Filter - Part 1
How to make a DirectShow Muxer Filter - Part 2
GDCL MPEG-4 Multiplexer - available in source, and can multiplex data from 2+ streams

How to reduce latency when streaming x264

I would like to produce a zerolatency live video stream and play it in VLC player with as little latency as possible.
This are the settings I currently use:
x264_param_default_preset( &m_Params, "veryfast", "zerolatency" );
m_Params.i_threads = 2;
m_Params.b_sliced_threads = true;
m_Params.i_width = m_SourceWidth;
m_Params.i_height = m_SourceHeight;
m_Params.b_intra_refresh = 1;
m_Params.b_vfr_input = true;
m_Params.i_timebase_num = 1;
m_Params.i_timebase_den = 1000;
m_Params.i_fps_num = 1;
m_Params.i_fps_den = 60;
m_Params.rc.i_vbv_max_bitrate = 512;
m_Params.rc.i_vbv_buffer_size = 256;
m_Params.rc.f_vbv_buffer_init = 1.1f;
m_Params.rc.i_rc_method = X264_RC_CRF;
m_Params.rc.f_rf_constant = 24;
m_Params.rc.f_rf_constant_max = 35;
m_Params.b_annexb = 0;
m_Params.b_repeat_headers = 0;
m_Params.b_aud = 0;
x264_param_apply_profile( &m_Params, "high" );
Using those settings, I have the following issues:
VLC shows lots of missing frames (see screenshot, "verloren"). I am not sure if this is an issue.
If I set a value <200ms for the network stream delay in VLC, VLC renders a few frames and than stops to decode/render frames.
If I set a value >= 200ms for the network stream delay in VLC, everything looks good so far but the latency is, obviously, 200ms, which is too high.
Question:
Which settings (x264lib and VLC) should I use in order to encode and stream with as little latency as possible?
On your x264 settings: many are redundant ie already contained in "zerolatency". However, as best as I can tell, your encoding latency is nevertheless zero frames, ie you put one frame in and you immediately (as soon as your CPU has finished encoding it, anyway) get one frame out. It never waits for a newer frame in order to give an encoded older frame (the way it would with lookahead, for example).
On why vlc pauses unless you give it a large network delay: The problem is that your combination of rate control and vbv settings when encoding is not ideal. What you want to do for low latency encode is to use CBR, and set the VBV buffer to the size of one frame, exactly. This enables a special VBV calculation, if you look in the x264 source.
You may also try not setting anything timing related whatsoever (no fps, no vbv) and use CRF with zerolatency. The results would depend on what container the video is packaged in for streaming.
Read this for more info: http://x264dev.multimedia.cx/archives/249
If you want to have the fastest possible encoding, then delete everything after
x264_param_default_preset( &m_Params, "veryfast", "zerolatency" );
and change veryfast to ultrafast. The rest is because of network delay + decoding.