waveOutWrite callback creates choppy audio - C++

I have four buffers that I am using for audio playback in a synthesizer. I submit two buffers initially, and then in the callback routine I write data into the next buffer and then submit that buffer.
When I generate each buffer I'm just putting a sine wave into it whose period is a multiple of the buffer length.
When I execute, I hear brief pauses between each buffer. I've increased the buffer size to 16K samples at 44100 Hz, so I can clearly hear that the whole buffer is playing, but there is an interruption between each one.
What I think is happening is that the callback function is only called when ALL buffers that have been written are complete. I need the synthesis to stay ahead of the playback, so I need a callback when each buffer is completed.
How do people usually solve this problem?
Update: I've been asked to add code. Here's what I have:
First I connect to the WaveOut device:
// Always grab the mapped wave device.
UINT deviceId = WAVE_MAPPER;
// This is an excellent tutorial:
// http://planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=4422&lngWId=3
WAVEFORMATEX wfx;
wfx.nSamplesPerSec = 44100;
wfx.wBitsPerSample = 16;
wfx.nChannels = 1;
wfx.cbSize = 0;
wfx.wFormatTag = WAVE_FORMAT_PCM;
wfx.nBlockAlign = (wfx.wBitsPerSample >> 3) * wfx.nChannels;
wfx.nAvgBytesPerSec = wfx.nBlockAlign * wfx.nSamplesPerSec;

_waveChangeEventHandle = CreateMutex(NULL, false, NULL);

MMRESULT res;
res = waveOutOpen(&_wo, deviceId, &wfx, (DWORD_PTR)WavCallback,
                  (DWORD_PTR)this, CALLBACK_FUNCTION);
I initialize the four frames I'll be using:
for (int i = 0; i < _numFrames; ++i)
{
    WAVEHDR *header = _outputFrames + i;
    ZeroMemory(header, sizeof(WAVEHDR));
    // Buffer length is in bytes. We have 2 bytes per sample.
    header->dwBufferLength = _codeSpec->OutputNumSamples * 2;
    header->lpData = (LPSTR)malloc(2 * _codeSpec->OutputNumSamples);
    ZeroMemory(header->lpData, 2 * _codeSpec->OutputNumSamples);
    res = waveOutPrepareHeader(_wo, header, sizeof(WAVEHDR));
    if (res != MMSYSERR_NOERROR)
    {
        printf("Error preparing header: %d\n", res - MMSYSERR_BASE);
    }
}
SubmitBuffer();
SubmitBuffer();
Here is the SubmitBuffer code:
void Vodec::SubmitBuffer()
{
    WAVEHDR *header = _outputFrames + _curFrame;
    MMRESULT res;
    res = waveOutWrite(_wo, header, sizeof(WAVEHDR));
    if (res != MMSYSERR_NOERROR)
    {
        if (res == WAVERR_STILLPLAYING) // note: ==, the original had an accidental assignment here
        {
            printf("Cannot write when still playing.");
        }
        else
        {
            printf("Error calling waveOutWrite: %d\n", res - WAVERR_BASE);
        }
    }
    _curFrame = (_curFrame + 1) & 0x3;
    if (_pointQueue != NULL)
    {
        RenderQueue();
        _nextFrame = (_nextFrame + 1) & 0x3;
    }
}
And here is my callback code:
void CALLBACK Vodec::WavCallback(HWAVEOUT hWaveOut,
                                 UINT uMsg,
                                 DWORD_PTR dwInstance, // DWORD_PTR, so the instance pointer survives on 64-bit builds
                                 DWORD_PTR dwParam1,
                                 DWORD_PTR dwParam2)
{
    // Only listen for end-of-block messages.
    if (uMsg != WOM_DONE) return;
    Vodec *instance = (Vodec *)dwInstance;
    instance->SubmitBuffer();
}
The RenderQueue code is pretty simple - just copies a piece of a template buffer into the output buffer:
void Vodec::RenderQueue()
{
    double white = _pointQueue->White;
    white = 10.0; // For now just override with a constant value
    int numSamples = _codeSpec->OutputNumSamples;
    signed short int *data = (signed short int *)_outputFrames[_nextFrame].lpData;
    for (int i = 0; i < numSamples; ++i)
    {
        Sample x = white * _noise->Samples[i];
        data[i] = (signed short int)(x);
    }
    _sampleOffset += numSamples;
    if (_sampleOffset >= _pointQueue->DurationInSamples)
    {
        _sampleOffset = 0;
        _pointQueue = _pointQueue->next;
    }
}
UPDATE: Mostly solved the issue. I need to increment _nextFrame along with _curFrame (not conditionally): the playback position was getting ahead of the write position.
However, when I decrease the playback buffer to 1024 samples, it gets choppy again; at 2048 samples it is clear. This happens in both Debug and Release builds.
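For reference, a minimal sketch of the fixed SubmitBuffer (same names as above; RenderQueue still only runs when there is queued data):
void Vodec::SubmitBuffer()
{
    WAVEHDR *header = _outputFrames + _curFrame;
    MMRESULT res = waveOutWrite(_wo, header, sizeof(WAVEHDR));
    if (res != MMSYSERR_NOERROR)
    {
        printf("Error calling waveOutWrite: %d\n", res - WAVERR_BASE);
    }
    // Advance BOTH indices unconditionally, so the write position
    // can never fall behind the playback position.
    _curFrame = (_curFrame + 1) & 0x3;
    _nextFrame = (_nextFrame + 1) & 0x3;
    if (_pointQueue != NULL)
    {
        RenderQueue();
    }
}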

1024 samples is only about 23 ms of audio data. WaveOut is a pretty high-level API from Windows Vista onwards; if you want low-latency audio playback, you should use the Core Audio APIs (WASAPI), which can get latencies down to 10 ms in shared mode and 3 ms in exclusive mode. Glitching also depends on the other processes running on your system, in other words, on how frequently your audio thread gets scheduled so it can produce data. You should also look at the Multimedia Class Scheduler Service (MMCSS) and the AvSetMmThreadCharacteristics function.
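As a rough sketch of that last suggestion, a thread that renders audio can register itself with MMCSS like this ("Pro Audio" is one of the standard MMCSS task profiles; error handling trimmed):
#include <windows.h>
#include <avrt.h> // link with Avrt.lib

void AudioThreadBody()
{
    // Ask MMCSS to boost this thread's scheduling priority for audio work.
    DWORD taskIndex = 0;
    HANDLE mmcssHandle = AvSetMmThreadCharacteristics(TEXT("Pro Audio"), &taskIndex);

    // ... render / submit audio buffers here ...

    if (mmcssHandle != NULL)
        AvRevertMmThreadCharacteristics(mmcssHandle);
}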

C++ - XAudio2 - Cracking sound when trying to play a continuous sine sound

Edit: Today I found out that I only encounter this problem when I use my headphones with a cord. It's not the headphones themselves that are the problem, because the same headphones can also be used wirelessly, and then the problem is gone, just as with my other wireless headphones. I would prefer working with a cord, though, as it has a smaller delay. So I hope someone can help me with this mystery given this additional information.
I have this code to play a sine wave sound. I constantly hear clicking sounds when I play it. I'm pretty sure it's because the playback of the buffers is not lining up perfectly, because when you change the value of l to a larger value (for instance 44100) the clicks are further apart. I think I have followed the explanation of how to use the callbacks on the Microsoft website as accurately as possible. I create three source voices that take turns playing: one is playing while the next is ready and the one after that is being made. I use a total time (tt) to put into the sin() function, so the first byte of the next buffer should align perfectly with the last byte of the current one.
Does anyone know what's going wrong?
P.S.: The many similar questions did not answer my question. In short: I'm not modifying the playing buffer (I don't think so, at least); there should be no discontinuity at the border between one buffer and the next; and I'm not adjusting the frequency during playback either. So I don't think this is a duplicate.
#include <xaudio2.h>
#include <cmath>    // for sin()
#include <iostream>

#define PI 3.14159265358979323846f
#define l 4410 //0.1 seconds

IXAudio2MasteringVoice* pMasterVoice;
IXAudio2* pXAudio2;
IXAudio2SourceVoice* pSourceVoice[3];
XAUDIO2_BUFFER buffer;
WAVEFORMATEX wfx;
XAUDIO2_VOICE_STATE state;
BYTE pDataBuffer[2*l];
BYTE bytw[2];
int pow16[2];
float w[l];
int i, p;
float tt, ampl;

class VoiceCallback : public IXAudio2VoiceCallback {
public:
    HANDLE hBufferEndEvent;
    VoiceCallback() : hBufferEndEvent(CreateEvent(NULL, FALSE, FALSE, NULL)) {}
    ~VoiceCallback() { CloseHandle(hBufferEndEvent); }
    //Called when the voice has just finished playing a contiguous audio stream.
    void STDMETHODCALLTYPE OnStreamEnd() { SetEvent(hBufferEndEvent); }
    //Unused methods are stubs
    void STDMETHODCALLTYPE OnVoiceProcessingPassEnd() {}
    void STDMETHODCALLTYPE OnVoiceProcessingPassStart(UINT32 SamplesRequired) {}
    void STDMETHODCALLTYPE OnBufferEnd(void* pBufferContext) {}
    void STDMETHODCALLTYPE OnBufferStart(void* pBufferContext) {}
    void STDMETHODCALLTYPE OnLoopEnd(void* pBufferContext) {}
    void STDMETHODCALLTYPE OnVoiceError(void* pBufferContext, HRESULT Error) {}
};
VoiceCallback voiceCallback[3];

int main() {
    CoInitializeEx(nullptr, COINIT_MULTITHREADED);
    pXAudio2 = nullptr;
    XAudio2Create(&pXAudio2, 0, XAUDIO2_DEFAULT_PROCESSOR);
    pMasterVoice = nullptr;
    pXAudio2->CreateMasteringVoice(&pMasterVoice);

    tt = 0, p = 660, ampl = 2000;
    pow16[0] = 16;
    pow16[1] = 4096;

    wfx = {0};
    wfx.wFormatTag = WAVE_FORMAT_PCM;
    wfx.nChannels = (WORD)1;            //mono
    wfx.nSamplesPerSec = (DWORD)44100;  //sample rate
    wfx.wBitsPerSample = (WORD)16;      //16 bit (signed)
    wfx.nBlockAlign = (WORD)2;          //2 bytes per sample
    wfx.nAvgBytesPerSec = (DWORD)88200; //samplerate*blockalign
    wfx.cbSize = (WORD)0;

    i = 0;
    while (true) {
        for (int t = 0; t < l; t++) {
            tt = (float)(t + i*l); //total time
            w[t] = sin(2.f*PI*tt/p)*ampl;
            int intw = (int)w[t];
            if (intw < 0) {
                intw += 65536; // two's complement; the original had 65535 here, an off-by-one
            }
            bytw[0] = 0; bytw[1] = 0;
            for (int k = 1; k >= 0; k--) {
                //turn integer into a little endian byte array
                bytw[k] += (BYTE)(16*(intw/pow16[k]));
                intw -= bytw[k]*(pow16[k]/16);
                bytw[k] += (BYTE)(intw/(pow16[k]/16));
                intw -= (intw/(pow16[k]/16))*pow16[k]/16;
            }
            pDataBuffer[2*t] = bytw[0];
            pDataBuffer[2*t + 1] = bytw[1];
        }
        buffer.AudioBytes = 2*l; //number of bytes per buffer
        buffer.pAudioData = pDataBuffer;
        buffer.Flags = XAUDIO2_END_OF_STREAM;

        if (i > 2) {
            pSourceVoice[i%3]->DestroyVoice();
        }
        pSourceVoice[i%3] = nullptr;
        pXAudio2->CreateSourceVoice(&pSourceVoice[i%3], &wfx, 0, XAUDIO2_DEFAULT_FREQ_RATIO, &voiceCallback[i%3], NULL, NULL);
        pSourceVoice[i%3]->SubmitSourceBuffer(&buffer);

        if (i > 1) {
            //wait until the current one is done playing
            while (pSourceVoice[(i - 2)%3]->GetState(&state), state.BuffersQueued > 0) {
                WaitForSingleObjectEx(voiceCallback[(i - 2)%3].hBufferEndEvent, INFINITE, TRUE);
            }
        }
        if (i > 0) {
            //play the next one while you're writing the one after that in the next iteration
            pSourceVoice[(i - 1)%3]->Start(0);
        }
        i++;
    }
}
If you want the sound to loop, then either submit multiple data packets to the same source voice, or set a loop value so it automatically restarts the existing audio packet; a sketch follows below. If you let a source voice run out of data, you are going to hear a break between buffers, depending on the latency of your audio output system.
Furthermore, creating and destroying source voices is a relatively expensive operation, so doing it in a loop like this is not particularly efficient.
See DirectX Tool Kit for Audio for a complete example of XAudio2 usage, as well as the latest version of the XAudio2 samples from the legacy DirectX SDK on GitHub.
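A rough sketch of the single-voice approach (this assumes the VoiceCallback above is changed to signal its event from OnBufferEnd instead of OnStreamEnd; fillSineBlock is a hypothetical stand-in for the synthesis loop in the question):
// One source voice and three rotating blocks; the voice is never destroyed
// and never runs dry, so there is no gap between buffers.
BYTE blocks[3][2*l];
IXAudio2SourceVoice* voice = nullptr;
pXAudio2->CreateSourceVoice(&voice, &wfx, 0, XAUDIO2_DEFAULT_FREQ_RATIO, &voiceCallback[0]);
voice->Start(0);
for (int n = 0;; ++n) {
    int block = n % 3;
    fillSineBlock(blocks[block], n); // hypothetical: writes 2*l bytes of sine data
    XAUDIO2_BUFFER buf = {0};        // note: no XAUDIO2_END_OF_STREAM flag
    buf.AudioBytes = 2*l;
    buf.pAudioData = blocks[block];
    voice->SubmitSourceBuffer(&buf);
    // Keep at most two buffers queued; a block is only rewritten
    // after OnBufferEnd has fired for it.
    XAUDIO2_VOICE_STATE st;
    voice->GetState(&st);
    while (st.BuffersQueued > 2) {
        WaitForSingleObjectEx(voiceCallback[0].hBufferEndEvent, INFINITE, TRUE);
        voice->GetState(&st);
    }
}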

ESP32 i2s_read returns empty buffer after calling this function

I am trying to record audio from an INMP441 which is connected to an ESP32, but returning the buffer containing the bytes the microphone read always yields something that is NULL.
The code for setting up i2s and the microphone is this:
// i2s config
const i2s_config_t i2s_config = {
    .mode = i2s_mode_t(I2S_MODE_MASTER | I2S_MODE_RX),  // receive
    .sample_rate = SAMPLE_RATE,                         // 44100 (44.1 kHz)
    .bits_per_sample = I2S_BITS_PER_SAMPLE_32BIT,       // 32 bits per sample
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,        // use left channel
    .communication_format = i2s_comm_format_t(I2S_COMM_FORMAT_I2S | I2S_COMM_FORMAT_I2S_MSB),
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,           // interrupt level 1
    .dma_buf_count = 64,                                // number of buffers
    .dma_buf_len = SAMPLES_PER_BUFFER};                 // 512

// pin config
const i2s_pin_config_t pin_config = {
    .bck_io_num = gpio_sck,            // serial clock, sck (gpio 33)
    .ws_io_num = gpio_ws,              // word select, ws (gpio 32)
    .data_out_num = I2S_PIN_NO_CHANGE, // only used for speakers
    .data_in_num = gpio_sd             // serial data, sd (gpio 34)
};

// config i2s driver and pins
// this function must be called before any read/write
esp_err_t err = i2s_driver_install(I2S_PORT, &i2s_config, 0, NULL);
if (err != ESP_OK)
{
    Serial.printf("Failed installing the driver: %d\n", err);
}
err = i2s_set_pin(I2S_PORT, &pin_config);
if (err != ESP_OK)
{
    Serial.printf("Failed setting pin: %d\n", err);
}
Serial.println("I2S driver installed! :-)");
Setting up the i2s stuff is no problem at all. The tricky part for me is reading from the i2s:
// 44.1 kHz * bytes per sample * time in seconds = total size in bytes
const size_t recordSize = (SAMPLE_RATE * I2S_BITS_PER_SAMPLE_32BIT / 8) * recordTime; // recordTime = 5 s
// size in bytes
size_t totalReadSize = 0;
// 32 bits per sample set in config * 1024 samples per buffer = total bits per buffer
char *samples = (char *)calloc(totalBitsPerBuffer, sizeof(char));
// number of bytes read
size_t bytesRead;

Serial.println("Start recording...");
// read until wanted size is reached
while (totalReadSize < recordSize)
{
    // read to buffer
    esp_err_t err = i2s_read(I2S_PORT, (void *)samples, totalBitsPerBuffer, &bytesRead, portMAX_DELAY);
    // check if an error occurred; if so, stop recording
    if (err != ESP_OK)
    {
        Serial.println("Error while recording!");
        break;
    }
    // check if bytes read works → yes
    /*
    for (int i = 0; i < bytesRead; i++)
    {
        uint8_t sample = (uint8_t) samples[i];
        Serial.print(sample);
    } */
    // add read size to total read size
    totalReadSize += bytesRead;
    // Serial.printf("Currently recorded %d%% \n", totalReadSize * 100 / recordSize);
}

// convert bytes to MB
double_t totalReadSizeMB = (double_t)totalReadSize / 1e+6;
Serial.printf("Total read size: %fMb\n", totalReadSizeMB);
Serial.println("Samples deref");
Serial.println(*samples);
Serial.println("Samples");
Serial.println(samples);
return samples;
Using this code leads to the following output:
I2S driver installed! :-)
Start recording...
Total read size: 0.884736Mb
Samples deref
␀
Samples
When I uncomment the part where I iterate over the bytes read part I get something like this:
200224231255255224210022418725525522493000902552550238002241392542552241520020425225508050021624525501286700194120022461104022421711102242271030018010402242510000188970224141930022291022410185022487830021679001127500967200666902241776600246610224895902244757022418353002224802242274302249741022419339009435001223102242432602243322022412120001241402245911022418580084402248325525522461252255044249255224312452552242212372552241272352550342302552241212262552242112212550252216255014621325501682092550112205255224161202255224237198255224235194255224231922552248518725501141832550421812552241951762550144172255018168255034164255224173157255018215525522455152255028148255021014425505214025522487137255014613225522412112825502361252550180120255018011725522451172550252113255224133111255061082550248105255224891042552249910125522439972550138942552242279225503287255224101832552242478125522410178255224231732552244970255224336525501766225501426125502325625522424553255224109492550186[...]
This shows that the microphone is able to record, but I can't return the actual value of the buffer.
While writing this code I consulted the official documentation and some code that seems to work elsewhere.
I am also new to C++ and am not used to working with pointers.
Does anyone know what the problem could be?
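One hint for readers of the code above: each i2s_read() call writes to the start of samples again, so the buffer only ever holds the most recent chunk, and Serial.println(samples) treats raw PCM as a C string and stops at the first zero byte. A minimal sketch of collecting the whole recording at an advancing offset (names reused from the question; error handling trimmed):
// Allocate the whole recording up front and advance a write offset,
// so earlier chunks are not overwritten by later reads.
char *samples = (char *)calloc(recordSize, sizeof(char));
size_t totalReadSize = 0;
while (totalReadSize < recordSize)
{
    size_t bytesRead = 0;
    size_t toRead = (recordSize - totalReadSize < (size_t)totalBitsPerBuffer)
                        ? (recordSize - totalReadSize)
                        : (size_t)totalBitsPerBuffer;
    esp_err_t err = i2s_read(I2S_PORT, (void *)(samples + totalReadSize),
                             toRead, &bytesRead, portMAX_DELAY);
    if (err != ESP_OK)
        break;
    totalReadSize += bytesRead;
}
// The result is raw PCM, not a NUL-terminated string: inspect individual
// bytes or write them to a file instead of printing with Serial.println(samples).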

Oboe Async Audio Extraction

I am trying to build an NDK-based C++ low-latency audio player which will handle three operations for multiple audios.
Play from assets.
Stream from an online source.
Play from local device storage.
Starting from one of the Oboe samples provided by Google, I added another function to the class NDKExtractor.cpp to extract audio from a URL and render it to the audio device while reading from the source at the same time.
int32_t NDKExtractor::decode(char *file, uint8_t *targetData, AudioProperties targetProperties) {
    LOGD("Using NDK decoder: %s", file);

    // Extract the audio frames
    AMediaExtractor *extractor = AMediaExtractor_new();
    // Using this method instead of AMediaExtractor_setDataSourceFd() as used
    // for asset files in the rhythm game example
    media_status_t amresult = AMediaExtractor_setDataSource(extractor, file);
    if (amresult != AMEDIA_OK) {
        LOGE("Error setting extractor data source, err %d", amresult);
        return 0;
    }

    // Specify our desired output format by creating it from our source
    AMediaFormat *format = AMediaExtractor_getTrackFormat(extractor, 0);
    int32_t sampleRate;
    if (AMediaFormat_getInt32(format, AMEDIAFORMAT_KEY_SAMPLE_RATE, &sampleRate)) {
        LOGD("Source sample rate %d", sampleRate);
        if (sampleRate != targetProperties.sampleRate) {
            LOGE("Input (%d) and output (%d) sample rates do not match. "
                 "NDK decoder does not support resampling.",
                 sampleRate,
                 targetProperties.sampleRate);
            return 0;
        }
    } else {
        LOGE("Failed to get sample rate");
        return 0;
    }

    int32_t channelCount;
    if (AMediaFormat_getInt32(format, AMEDIAFORMAT_KEY_CHANNEL_COUNT, &channelCount)) {
        LOGD("Got channel count %d", channelCount);
        if (channelCount != targetProperties.channelCount) {
            LOGE("NDK decoder does not support different "
                 "input (%d) and output (%d) channel counts",
                 channelCount,
                 targetProperties.channelCount);
        }
    } else {
        LOGE("Failed to get channel count");
        return 0;
    }

    const char *formatStr = AMediaFormat_toString(format);
    LOGD("Output format %s", formatStr);

    const char *mimeType;
    if (AMediaFormat_getString(format, AMEDIAFORMAT_KEY_MIME, &mimeType)) {
        LOGD("Got mime type %s", mimeType);
    } else {
        LOGE("Failed to get mime type");
        return 0;
    }

    // Obtain the correct decoder
    AMediaCodec *codec = nullptr;
    AMediaExtractor_selectTrack(extractor, 0);
    codec = AMediaCodec_createDecoderByType(mimeType);
    AMediaCodec_configure(codec, format, nullptr, nullptr, 0);
    AMediaCodec_start(codec);

    // DECODE
    bool isExtracting = true;
    bool isDecoding = true;
    int32_t bytesWritten = 0;

    while (isExtracting || isDecoding) {
        if (isExtracting) {
            // Obtain the index of the next available input buffer
            ssize_t inputIndex = AMediaCodec_dequeueInputBuffer(codec, 2000);
            //LOGV("Got input buffer %d", inputIndex);

            // The input index acts as a status if it's negative
            if (inputIndex < 0) {
                if (inputIndex == AMEDIACODEC_INFO_TRY_AGAIN_LATER) {
                    // LOGV("Codec.dequeueInputBuffer try again later");
                } else {
                    LOGE("Codec.dequeueInputBuffer unknown error status");
                }
            } else {
                // Obtain the actual buffer and read the encoded data into it
                size_t inputSize;
                uint8_t *inputBuffer = AMediaCodec_getInputBuffer(codec, inputIndex, &inputSize);
                //LOGV("Sample size is: %d", inputSize);

                ssize_t sampleSize = AMediaExtractor_readSampleData(extractor, inputBuffer, inputSize);
                auto presentationTimeUs = AMediaExtractor_getSampleTime(extractor);

                if (sampleSize > 0) {
                    // Enqueue the encoded data
                    AMediaCodec_queueInputBuffer(codec, inputIndex, 0, sampleSize,
                                                 presentationTimeUs, 0);
                    AMediaExtractor_advance(extractor);
                } else {
                    LOGD("End of extractor data stream");
                    isExtracting = false;

                    // We need to tell the codec that we've reached the end of the stream
                    AMediaCodec_queueInputBuffer(codec, inputIndex, 0, 0,
                                                 presentationTimeUs,
                                                 AMEDIACODEC_BUFFER_FLAG_END_OF_STREAM);
                }
            }
        }

        if (isDecoding) {
            // Dequeue the decoded data
            AMediaCodecBufferInfo info;
            ssize_t outputIndex = AMediaCodec_dequeueOutputBuffer(codec, &info, 0);

            if (outputIndex >= 0) {
                // Check whether the end-of-stream flag was set earlier
                if (info.flags & AMEDIACODEC_BUFFER_FLAG_END_OF_STREAM) {
                    LOGD("Reached end of decoding stream");
                    isDecoding = false;
                } else {
                    // Valid index, acquire the buffer
                    size_t outputSize;
                    uint8_t *outputBuffer = AMediaCodec_getOutputBuffer(codec, outputIndex, &outputSize);
                    /*LOGV("Got output buffer index %d, buffer size: %d, info size: %d writing to pcm index %d",
                         outputIndex,
                         outputSize,
                         info.size,
                         m_writeIndex);*/

                    // copy the data out of the buffer
                    memcpy(targetData + bytesWritten, outputBuffer, info.size);
                    bytesWritten += info.size;
                    AMediaCodec_releaseOutputBuffer(codec, outputIndex, false);
                }
            } else {
                // The outputIndex doubles as a status return if its value is < 0
                switch (outputIndex) {
                    case AMEDIACODEC_INFO_TRY_AGAIN_LATER:
                        LOGD("dequeueOutputBuffer: try again later");
                        break;
                    case AMEDIACODEC_INFO_OUTPUT_BUFFERS_CHANGED:
                        LOGD("dequeueOutputBuffer: output buffers changed");
                        break;
                    case AMEDIACODEC_INFO_OUTPUT_FORMAT_CHANGED:
                        LOGD("dequeueOutputBuffer: output outputFormat changed");
                        format = AMediaCodec_getOutputFormat(codec);
                        LOGD("outputFormat changed to: %s", AMediaFormat_toString(format));
                        break;
                }
            }
        }
    }

    // Clean up
    AMediaFormat_delete(format);
    AMediaCodec_delete(codec);
    AMediaExtractor_delete(extractor);
    return bytesWritten;
}
Now the problem I am facing is that this code first extracts all the audio data and saves it into a buffer, which then becomes part of AFileDataSource, a class I derived from the DataSource class in the same sample.
Only after it is done extracting the whole file does playback start, through onAudioReady() of the Oboe AudioStreamBuilder.
What I need is to play the chunks of audio buffer as they stream in.
Optional query: aside from the question above, this setup blocks the UI even though I created a foreground service to communicate with the NDK functions that execute this code. Any thoughts on this?
You probably solved this already, but for future readers...
You need a FIFO buffer to store the decoded audio. You can use Oboe's FIFO buffer, e.g. oboe::FifoBuffer.
You can have a low/high watermark for the buffer and a state machine, so you start decoding when the buffer is almost empty and stop decoding when it's full (you'll figure out the other states you need); a sketch follows below.
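As an illustrative sketch of that state machine (kLowWatermark, kHighWatermark, running, fifo and decodeNextChunkInto are all hypothetical names, not from the Oboe sample):
#include <chrono>
#include <thread>

enum class DecodeState { Filling, Full };

void decodeLoop() {
    DecodeState state = DecodeState::Filling;
    while (running) {
        size_t buffered = fifo.size();              // samples currently queued
        if (state == DecodeState::Filling && buffered >= kHighWatermark)
            state = DecodeState::Full;              // buffer full: pause decoding
        else if (state == DecodeState::Full && buffered <= kLowWatermark)
            state = DecodeState::Filling;           // drained low: resume decoding

        if (state == DecodeState::Filling)
            decodeNextChunkInto(fifo);              // hypothetical: decode + push samples
        else
            std::this_thread::sleep_for(std::chrono::milliseconds(5));
    }
}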
As a side note, I implemented such a player only to find out some time later that the AAC codec is broken on some devices (Xiaomi and Amazon come to mind), so I had to throw away the AMediaCodec/AMediaExtractor parts and use an AAC library instead.
You have to implement a ring buffer (or use the one implemented in the Oboe example, LockFreeQueue.h) and, from the extracting thread, copy the decoded data into buffers that you push onto the ring buffer. On the other end of the ring buffer, the audio thread gets that data from the queue and copies it into the audio buffer. This happens in the onAudioReady(oboe::AudioStream *oboeStream, void *audioData, int32_t numFrames) callback that you have to implement in your class (see the Oboe docs and the sketch below). Be sure to follow all the good practices on the audio thread (don't allocate/deallocate memory there, no mutexes, no file I/O, etc.).
Optional query: a service doesn't run in a separate thread, so obviously if you call it from the UI thread it blocks the UI. Look at other types of services: an IntentService, or a service with a Messenger that launches a separate thread on the Java side; or you can create threads on the C++ side using std::thread.
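A minimal sketch of that callback, assuming a lock-free FIFO of 16-bit samples (the Player class, mFifo queue and its non-blocking read() are illustrative names, not from the Oboe sample):
// The audio callback only drains a FIFO that the decoding thread fills;
// nothing here allocates, locks, or blocks.
oboe::DataCallbackResult
Player::onAudioReady(oboe::AudioStream *oboeStream, void *audioData, int32_t numFrames) {
    auto *out = static_cast<int16_t *>(audioData);
    const int32_t samplesNeeded = numFrames * oboeStream->getChannelCount();
    int32_t got = mFifo.read(out, samplesNeeded); // hypothetical non-blocking read
    // Underrun: pad with silence instead of blocking the audio thread.
    for (int32_t n = got; n < samplesNeeded; ++n) out[n] = 0;
    return oboe::DataCallbackResult::Continue;
}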

Create CMSampleBufferRef from an AudioInputIOProc

I have an AudioInputIOProc that I'm getting an AudioBufferList from. I need to convert this AudioBufferList to a CMSampleBufferRef.
Here's the code I've written so far:
- (void)handleAudioSamples:(const AudioBufferList*)samples numSamples:(UInt32)numSamples hostTime:(UInt64)hostTime {
    // Create a CMSampleBufferRef from the list of samples, which we'll own
    AudioStreamBasicDescription monoStreamFormat;
    memset(&monoStreamFormat, 0, sizeof(monoStreamFormat));
    monoStreamFormat.mSampleRate = 44100;
    monoStreamFormat.mFormatID = kAudioFormatMPEG4AAC;
    monoStreamFormat.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked | kAudioFormatFlagIsNonInterleaved;
    monoStreamFormat.mBytesPerPacket = 4;
    monoStreamFormat.mFramesPerPacket = 1;
    monoStreamFormat.mBytesPerFrame = 4;
    monoStreamFormat.mChannelsPerFrame = 2;
    monoStreamFormat.mBitsPerChannel = 16;

    CMFormatDescriptionRef format = NULL;
    OSStatus status = CMAudioFormatDescriptionCreate(kCFAllocatorDefault, &monoStreamFormat, 0, NULL, 0, NULL, NULL, &format);
    if (status != noErr) {
        // really shouldn't happen
        return;
    }

    mach_timebase_info_data_t tinfo;
    mach_timebase_info(&tinfo);
    UInt64 _hostTimeToNSFactor = (double)tinfo.numer / tinfo.denom;
    uint64_t timeNS = (uint64_t)(hostTime * _hostTimeToNSFactor);
    CMTime presentationTime = CMTimeMake(timeNS, 1000000000);
    CMSampleTimingInfo timing = { CMTimeMake(1, 44100), kCMTimeZero, kCMTimeInvalid };

    CMSampleBufferRef sampleBuffer = NULL;
    status = CMSampleBufferCreate(kCFAllocatorDefault, NULL, false, NULL, NULL, format, numSamples, 1, &timing, 0, NULL, &sampleBuffer);
    if (status != noErr) {
        // couldn't create the sample buffer
        NSLog(@"Failed to create sample buffer");
        CFRelease(format);
        return;
    }

    // add the samples to the buffer
    status = CMSampleBufferSetDataBufferFromAudioBufferList(sampleBuffer,
                                                            kCFAllocatorDefault,
                                                            kCFAllocatorDefault,
                                                            0,
                                                            samples);
    if (status != noErr) {
        NSLog(@"Failed to add samples to sample buffer");
        CFRelease(sampleBuffer);
        CFRelease(format);
        NSLog(@"Error status code: %d", status);
        return;
    }

    [self addAudioFrame:sampleBuffer];

    NSLog(@"Original sample buf size: %ld for %d samples from %d buffers, first buffer has size %d", CMSampleBufferGetTotalSampleSize(sampleBuffer), numSamples, samples->mNumberBuffers, samples->mBuffers[0].mDataByteSize);
    NSLog(@"Original sample buf has %ld samples", CMSampleBufferGetNumSamples(sampleBuffer));
}
Now, I'm unsure how to calculate the numSamples given this function definition of an AudioInputIOProc:
OSStatus AudioTee::InputIOProc(AudioDeviceID inDevice, const AudioTimeStamp *inNow, const AudioBufferList *inInputData, const AudioTimeStamp *inInputTime, AudioBufferList *outOutputData, const AudioTimeStamp *inOutputTime, void *inClientData)
This definition exists in the AudioTee.cpp file in WavTap.
The error I'm getting is a CMSampleBufferError_RequiredParameterMissing error with the error code -12731 when I try to call CMSampleBufferSetDataBufferFromAudioBufferList.
Update:
To clarify on the problem a bit, the following is the format of the audio data I'm getting from the AudioDeviceIOProc:
Channels: 2, Sample Rate: 44100, Precision: 32-bit, Sample Encoding: 32-bit Signed Integer PCM, Endian Type: little, Reverse Nibbles: no, Reverse Bits: no
I'm getting an AudioBufferList* that holds all the audio data (for 30 seconds of video), which I need to convert to CMSampleBufferRefs and add to a video (30 seconds long) that is being written to disk via an AVAssetWriterInput.
Three things look wrong:
You declare the format ID as kAudioFormatMPEG4AAC, but configure the stream as LPCM. So try
monoStreamFormat.mFormatID = kAudioFormatLinearPCM;
You also call the format "mono" when it's configured as stereo.
Why use mach_timebase_info, which could leave gaps in your audio presentation timestamps? Use the sample count instead:
CMTime presentationTime = CMTimeMake(numSamplesProcessed, 44100);
Your CMSampleTimingInfo looks wrong, and you're not using presentationTime. You set the buffer's duration to 1 sample when it can be numSamples, and its presentation time to zero, which can't be right. Something like this would make more sense:
CMSampleTimingInfo timing = { CMTimeMake(numSamples, 44100), presentationTime, kCMTimeInvalid };
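Putting those fixes together, the format and timing setup might look roughly like this (a sketch assuming interleaved stereo, which is why kAudioFormatFlagIsNonInterleaved is dropped; numSamplesProcessed is a running counter you'd maintain yourself):
AudioStreamBasicDescription asbd = {0};
asbd.mSampleRate       = 44100;
asbd.mFormatID         = kAudioFormatLinearPCM;  // LPCM, matching the flags below
asbd.mFormatFlags      = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked;
asbd.mChannelsPerFrame = 2;                      // stereo
asbd.mBitsPerChannel   = 16;
asbd.mBytesPerFrame    = 4;                      // 2 channels * 2 bytes
asbd.mFramesPerPacket  = 1;
asbd.mBytesPerPacket   = 4;

// Timestamps derived from the running sample count, so consecutive
// buffers butt up against each other with no gaps.
CMTime presentationTime = CMTimeMake(numSamplesProcessed, 44100);
CMSampleTimingInfo timing = { CMTimeMake(numSamples, 44100), presentationTime, kCMTimeInvalid };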
And some questions:
Does your AudioBufferList have the expected 2 AudioBuffers?
Do you have a runnable version of this?
p.s. I'm guilty of it myself, but allocating memory on the audio thread is considered harmful in audio dev.

FFmpeg + OpenAL - playback streaming sound from video won't work

I am decoding an OGG video (theora & vorbis as codecs) and want to show it on the screen (using Ogre 3D) while playing its sound. I can decode the image stream just fine and the video plays perfectly with the correct frame rate, etc.
However, I cannot get the sound to play at all with OpenAL.
Edit: I managed to make the playing sound resemble the actual audio in the video at least somewhat. Updated sample code.
Edit 2: I was able to get "almost" correct sound now. I had to set OpenAL to use AL_FORMAT_STEREO_FLOAT32 (after initializing the extension) instead of just STEREO16. Now the sound is "only" extremely high pitched and stuttering, but at the correct speed.
Here is how I decode audio packets (in a background thread, the equivalent works just fine for the image stream of the video file):
//------------------------------------------------------------------------------
int decodeAudioPacket(AVPacket& p_packet, AVCodecContext* p_audioCodecContext, AVFrame* p_frame,
                      FFmpegVideoPlayer* p_player, VideoInfo& p_videoInfo)
{
    // Decode audio frame
    int got_frame = 0;
    int decoded = avcodec_decode_audio4(p_audioCodecContext, p_frame, &got_frame, &p_packet);
    if (decoded < 0)
    {
        p_videoInfo.error = "Error decoding audio frame.";
        return decoded;
    }

    // Frame is complete, store it in audio frame queue
    if (got_frame)
    {
        int bufferSize = av_samples_get_buffer_size(NULL, p_audioCodecContext->channels, p_frame->nb_samples,
                                                    p_audioCodecContext->sample_fmt, 0);
        int64_t duration = p_frame->pkt_duration;
        int64_t dts = p_frame->pkt_dts;

        if (staticOgreLog)
        {
            staticOgreLog->logMessage("Audio frame bufferSize / duration / dts: "
                    + boost::lexical_cast<std::string>(bufferSize) + " / "
                    + boost::lexical_cast<std::string>(duration) + " / "
                    + boost::lexical_cast<std::string>(dts), Ogre::LML_NORMAL);
        }

        // Create the audio frame
        AudioFrame* frame = new AudioFrame();
        frame->dataSize = bufferSize;
        frame->data = new uint8_t[bufferSize];
        if (p_frame->channels == 2)
        {
            memcpy(frame->data, p_frame->data[0], bufferSize >> 1);
            memcpy(frame->data + (bufferSize >> 1), p_frame->data[1], bufferSize >> 1);
        }
        else
        {
            memcpy(frame->data, p_frame->data, bufferSize);
        }
        double timeBase = ((double)p_audioCodecContext->time_base.num) / (double)p_audioCodecContext->time_base.den;
        frame->lifeTime = duration * timeBase;

        p_player->addAudioFrame(frame);
    }

    return decoded;
}
So, as you can see, I decode the frame and memcpy it into my own struct, AudioFrame. Now, when the sound is played, I use these audio frames like this:
int numBuffers = 4;
ALuint buffers[4];
alGenBuffers(numBuffers, buffers);
ALenum success = alGetError();
if (success != AL_NO_ERROR)
{
    CONSOLE_LOG("Error on alGenBuffers : " + Ogre::StringConverter::toString(success) + alGetString(success));
    return;
}

// Fill a number of data buffers with audio from the stream
std::vector<AudioFrame*> audioBuffers;
std::vector<unsigned int> audioBufferSizes;
unsigned int numReturned = FFMPEG_PLAYER->getDecodedAudioFrames(numBuffers, audioBuffers, audioBufferSizes);

// Assign the data buffers to the OpenAL buffers
for (unsigned int i = 0; i < numReturned; ++i)
{
    alBufferData(buffers[i], _streamingFormat, audioBuffers[i]->data, audioBufferSizes[i], _streamingFrequency);
    success = alGetError();
    if (success != AL_NO_ERROR)
    {
        CONSOLE_LOG("Error on alBufferData : " + Ogre::StringConverter::toString(success) + alGetString(success)
                    + " size: " + Ogre::StringConverter::toString(audioBufferSizes[i]));
        return;
    }
}

// Queue the buffers into OpenAL
alSourceQueueBuffers(_source, numReturned, buffers);
success = alGetError();
if (success != AL_NO_ERROR)
{
    CONSOLE_LOG("Error queuing streaming buffers: " + Ogre::StringConverter::toString(success) + alGetString(success));
    return;
}

alSourcePlay(_source);
The format and frequency I give to OpenAL are AL_FORMAT_STEREO_FLOAT32 (it is a stereo sound stream, and I did initialize the FLOAT32 extension) and 48000 (which is the sample rate of the AVCodecContext of the audio stream).
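Here is roughly how that extension check can look (a sketch; AL_FORMAT_STEREO_FLOAT32 is not in core OpenAL, so its enum value is queried at runtime):
// Query the AL_EXT_FLOAT32 extension before using float formats.
ALenum streamingFormat = AL_FORMAT_STEREO16; // safe fallback
if (alIsExtensionPresent("AL_EXT_FLOAT32"))
{
    streamingFormat = alGetEnumValue("AL_FORMAT_STEREO_FLOAT32");
}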
And during playback, I do the following to refill OpenAL's buffers:
ALint numBuffersProcessed;

// Check if OpenAL is done with any of the queued buffers
alGetSourcei(_source, AL_BUFFERS_PROCESSED, &numBuffersProcessed);
if (numBuffersProcessed <= 0)
    return;

// Fill a number of data buffers with audio from the stream
std::vector<AudioFrame*> audioBuffers;
std::vector<unsigned int> audioBufferSizes;
unsigned int numFilled = FFMPEG_PLAYER->getDecodedAudioFrames(numBuffersProcessed, audioBuffers, audioBufferSizes);

// Assign the data buffers to the OpenAL buffers
ALuint buffer;
for (unsigned int i = 0; i < numFilled; ++i)
{
    // Pop the oldest queued buffer from the source,
    // fill it with the new data, then re-queue it
    alSourceUnqueueBuffers(_source, 1, &buffer);
    ALenum success = alGetError();
    if (success != AL_NO_ERROR)
    {
        CONSOLE_LOG("Error Unqueuing streaming buffers: " + Ogre::StringConverter::toString(success));
        return;
    }

    alBufferData(buffer, _streamingFormat, audioBuffers[i]->data, audioBufferSizes[i], _streamingFrequency);
    success = alGetError();
    if (success != AL_NO_ERROR)
    {
        CONSOLE_LOG("Error on re- alBufferData: " + Ogre::StringConverter::toString(success));
        return;
    }

    alSourceQueueBuffers(_source, 1, &buffer);
    success = alGetError();
    if (success != AL_NO_ERROR)
    {
        CONSOLE_LOG("Error re-queuing streaming buffers: " + Ogre::StringConverter::toString(success) + " "
                    + alGetString(success));
        return;
    }
}

// Make sure the source is still playing,
// and restart it if needed.
ALint playStatus;
alGetSourcei(_source, AL_SOURCE_STATE, &playStatus);
if (playStatus != AL_PLAYING)
    alSourcePlay(_source);
As you can see, I do quite heavy error checking. But I do not get any errors, neither from OpenAL nor from FFmpeg.
Edit: What I hear somewhat resembles the actual audio from the video, but VERY high pitched and stuttering VERY much. Also, it seems to be playing on top of TV noise. Very strange. Plus, it is playing much slower than the correct audio would.
Edit: 2 After using AL_FORMAT_STEREO_FLOAT32, the sound plays at the correct speed, but is still very high pitched and stuttering (though less than before).
The video itself is not broken; it can be played fine in any player. OpenAL can also play *.wav files just fine in the same application, so it is working too.
Any ideas what could be wrong here or how to do this correctly?
My only guess is that somehow, FFmpeg's decode function does not produce data OpenAL can read. But this is as far as the FFmpeg decoding example goes, so I don't know what's missing. As I understand it, the decode_audio4 function decodes the frame to raw data. And OpenAL should be able to work with raw data (or rather, doesn't work with anything else).
So, I finally figured out how to do it. Gee, what a mess. It was a hint from a user on the libav-users mailing list that put me on the correct path.
Here are my mistakes:
Using the wrong format in the alBufferData function. I used AL_FORMAT_STEREO16 (as that is what every single streaming example with OpenAL uses). I should have used AL_FORMAT_STEREO_FLOAT32, as the video I stream is Ogg and vorbis is stored in floating points. And using swr_convert to convert from AV_SAMPLE_FMT_FLTP to AV_SAMPLE_FMT_S16 just crashes. No idea why.
Not using swr_convert to convert the decoded audio frame to the target format. After trying to use swr_convert to convert from FLTP to S16 (which simply crashed without a reason given), I assumed it was broken. But after figuring out my first mistake, I tried again, converting from FLTP to FLT (non-planar), and then it worked! So OpenAL uses an interleaved format, not planar. Good to know.
So here is the decodeAudioPacket function that is working for me with Ogg video, vorbis audio stream:
int decodeAudioPacket( AVPacket& p_packet, AVCodecContext* p_audioCodecContext, AVFrame* p_frame,
SwrContext* p_swrContext, uint8_t** p_destBuffer, int p_destLinesize,
FFmpegVideoPlayer* p_player, VideoInfo& p_videoInfo)
{
// Decode audio frame
int got_frame = 0;
int decoded = avcodec_decode_audio4(p_audioCodecContext, p_frame, &got_frame, &p_packet);
if (decoded < 0)
{
p_videoInfo.error = "Error decoding audio frame.";
return decoded;
}
if(decoded <= p_packet.size)
{
/* Move the unread data to the front and clear the end bits */
int remaining = p_packet.size - decoded;
memmove(p_packet.data, &p_packet.data[decoded], remaining);
av_shrink_packet(&p_packet, remaining);
}
// Frame is complete, store it in audio frame queue
if (got_frame)
{
int outputSamples = swr_convert(p_swrContext,
p_destBuffer, p_destLinesize,
(const uint8_t**)p_frame->extended_data, p_frame->nb_samples);
int bufferSize = av_get_bytes_per_sample(AV_SAMPLE_FMT_FLT) * p_videoInfo.audioNumChannels
* outputSamples;
int64_t duration = p_frame->pkt_duration;
int64_t dts = p_frame->pkt_dts;
if (staticOgreLog)
{
staticOgreLog->logMessage("Audio frame bufferSize / duration / dts: "
+ boost::lexical_cast<std::string>(bufferSize) + " / "
+ boost::lexical_cast<std::string>(duration) + " / "
+ boost::lexical_cast<std::string>(dts), Ogre::LML_NORMAL);
}
// Create the audio frame
AudioFrame* frame = new AudioFrame();
frame->dataSize = bufferSize;
frame->data = new uint8_t[bufferSize];
memcpy(frame->data, p_destBuffer[0], bufferSize);
double timeBase = ((double)p_audioCodecContext->time_base.num) / (double)p_audioCodecContext->time_base.den;
frame->lifeTime = duration * timeBase;
p_player->addAudioFrame(frame);
}
return decoded;
}
And here is how I initialize the context and the destination buffer:
// Initialize SWR context
SwrContext* swrContext = swr_alloc_set_opts(NULL,
        audioCodecContext->channel_layout, AV_SAMPLE_FMT_FLT, audioCodecContext->sample_rate,
        audioCodecContext->channel_layout, audioCodecContext->sample_fmt, audioCodecContext->sample_rate,
        0, NULL);
int result = swr_init(swrContext);

// Create destination sample buffer
uint8_t** destBuffer = NULL;
int destBufferLinesize;
av_samples_alloc_array_and_samples(&destBuffer,
                                   &destBufferLinesize,
                                   videoInfo.audioNumChannels,
                                   2048,
                                   AV_SAMPLE_FMT_FLT,
                                   0);