So I'm picking up C++ after a long hiatus, and I had the idea to create a program that can generate music from strings of numbers at runtime (I was inspired by the musical renditions of Pi some people have done), with the eventual goal of some sort of procedural music generation software.
So far I have been able to make a really primitive version of this with the Beep() function, just feeding through the first so-many digits of Pi as a test. Works like a charm.
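For reference, the primitive version looks something like this (a sketch of the idea, not my exact code):

#include <windows.h>
#include <cmath>

int main()
{
    const char* piDigits = "314159265358979";
    for ( const char* p = piDigits; *p; ++p )
    {
        // map each digit 0-9 onto a semitone above A4 (440 Hz)
        double freq = 440.0 * pow( 2.0, ( *p - '0' ) / 12.0 );
        Beep( (DWORD)freq, 200 ); // 200 ms per note
    }
    return 0;
}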
What I'm looking for now is how I could kick it up a notch and get some higher-quality sound (because Beep() literally is the most primitive sound... ever), and I realized I have absolutely no idea how to do this. What I need is either a library or some sort of API that can:
1) Generate sound without a pre-existing file. I want the result to be 100% generated by code and not rely on any samples, optimally.
2) If I could get something going that would be capable of playing multiple sounds at a time, like being able to play chords or a melody with a beat, that would be nice.
3) And if I could in any way control the wave it plays (kinda like chiptune mixers can) via an equation or some other sort of data, that'd be super helpful.
I don't know if this is a weird request or whether I just researched it using the wrong terms, but I just wasn't able to find anything along these lines, or at least nothing that was well documented at all. :/
If anyone can help, I'd really appreciate it.
EDIT: Also, apparently I'm just super not used to asking stuff on forums, my target platform is Windows (7, specifically, although I wouldn't think that matters).
I use portaudio (http://www.portaudio.com/). It will let you create PCM streams in a portable way. Then you just push the samples into the stream, and they will play.
#edit: using PortAudio is pretty easy. You initialize the library. I use floating point samples to make it super easy. I do it like this:
PaError err = Pa_Initialize();
if ( err != paNoError )
    return false;

mPaParams.device = Pa_GetDefaultOutputDevice();
if ( mPaParams.device == paNoDevice )
    return false;

mPaParams.channelCount = NUM_CHANNELS;
mPaParams.sampleFormat = paFloat32;
mPaParams.suggestedLatency =
    Pa_GetDeviceInfo( mPaParams.device )->defaultLowOutputLatency;
mPaParams.hostApiSpecificStreamInfo = NULL;
Then, later, when you want to play sounds, you create a stream: 2 channels for stereo, at 44.1 kHz (CD quality, what mp3 audio typically uses):
PaError err = Pa_OpenStream( &mPaStream,
                             NULL,        // no input
                             &mPaParams,
                             44100,       // sample rate
                             NUM_FRAMES,  // frames per buffer
                             paNoFlag,    // stream flags
                             sndCallback,
                             this );
Then you implement the callback to fill the PCM audio stream. The callback is a C function, but I just call through to my C++ class to handle the audio. I ripped this from my code, and it may not be 100% correct now as I removed a ton of stuff you won't care about. But it works kind of like this:
static int sndCallback( const void* inputBuffer,
                        void* outputBuffer,
                        unsigned long framesPerBuffer,
                        const PaStreamCallbackTimeInfo* timeInfo,
                        PaStreamCallbackFlags statusFlags,
                        void* userData )
{
    Snd* snd = (Snd*)userData;
    return snd->callback( (float*)outputBuffer, framesPerBuffer );
}
u32 Snd::callback( float* outbuf, u32 nFrames )
{
    mPlayMutex.lock(); // use mutexes because this is async code!

    // clear the output buffer
    memset( outbuf, 0, nFrames * NUM_CHANNELS * sizeof( float ));

    // mix all the sounds.
    if ( mChannels.size() )
    {
        // I have multiple audio sources I'm mixing. That's what mChannels is.
        for ( u32 i = 0; i < mChannels.size(); i++ )
        {
            for ( u32 j = 0; j < nFrames * NUM_CHANNELS; j++ )
            {
                float f = outbuf[j] + getNextSample( i ); // <------------------- your code here!!!
                if ( f > 1.0f ) f = 1.0f;   // clamp it so you don't get clipping.
                if ( f < -1.0f ) f = -1.0f;
                outbuf[j] = f;
            }
        }
    }

    mPlayMutex.unlock();
    return paContinue; // return paComplete (1) when you are done playing audio.
}
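If you want a quick way to test that getNextSample( i ) hook, a sine oscillator is the simplest thing to drop in. A minimal sketch, assuming you add mPhase and mFreq members yourself (the buffer is interleaved, so a real generator would also keep per-channel state):

#include <cmath>

float Snd::getNextSample( u32 channel )
{
    (void)channel; // a real version would keep per-channel phase and frequency
    float sample = 0.2f * sinf( mPhase );             // 0.2f leaves headroom against clipping
    mPhase += 2.0f * 3.14159265f * mFreq / 44100.0f;  // advance one sample at 44.1 kHz
    if ( mPhase > 2.0f * 3.14159265f )
        mPhase -= 2.0f * 3.14159265f;                 // wrap to avoid float drift
    return sample;
}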
I answered a very similar question on this earlier this week: Note Synthesis, Harmonics (Violin, Piano, Guitar, Bass), Frequencies, MIDI. In your case, since you don't want to rely on samples, the wavetable method is out. Your simplest option would be to dynamically vary the frequency and amplitude of sinusoids over time, which is easy but will sound pretty terrible (like a cheap Theremin). Your only real option for something better is a more sophisticated synthesis algorithm, such as one of the physical modelling ones (e.g. Karplus-Strong). That would be an interesting project, but be warned that it does require something of a mathematical background.
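To give a flavour of what Karplus-Strong involves, here is a minimal sketch of the core idea (names and parameters are illustrative, not from any library): fill a delay line one period long with noise, then repeatedly output samples and average adjacent ones, which acts as a decaying low-pass filter and sounds like a plucked string.

#include <cstdlib>
#include <vector>

// Minimal Karplus-Strong pluck: returns one note's worth of mono samples.
std::vector<float> pluck( float frequency, float sampleRate, float seconds )
{
    size_t period = static_cast<size_t>( sampleRate / frequency );
    std::vector<float> delay( period );
    for ( size_t i = 0; i < period; ++i )                    // excite with white noise
        delay[i] = 2.0f * rand() / RAND_MAX - 1.0f;

    size_t total = static_cast<size_t>( seconds * sampleRate );
    std::vector<float> out( total );
    for ( size_t i = 0; i < total; ++i )
    {
        float cur  = delay[i % period];
        float next = delay[( i + 1 ) % period];
        out[i] = cur;
        delay[i % period] = 0.996f * 0.5f * ( cur + next );  // average + decay = low-pass
    }
    return out;
}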
You can indeed use something like Portaudio as Rafael has mentioned to physically get the sound out of the PC, in fact I think Portaudio is the best option for that. But generating the data so that it sounds musical is by far your biggest challenge.
Is there a way to decompress sound files using the FMOD library in c++?
I'm developing a sound editor using the FMOD Engine library, but I've run into a problem with compressed audio files, specifically OGG files.
For now I'm just reading the raw data using FMOD::Sound::readData(), and then normalize it and display it to the screen using SFML. This works fine with WAV files, because they are not compressed, but I need to do more steps for compressed formats.
This is what I'm doing now:
FMOD_RESULT r;
FMOD::System* m_fmodSystem = nullptr;
int m_maxChannels = 64;
// Create fmod system
r = FMOD::System_Create(&m_fmodSystem);
FMOD_ERROR_CHECK(r);
// Initialize system
r = m_fmodSystem->init(m_maxChannels, FMOD_INIT_NORMAL, nullptr);
FMOD_ERROR_CHECK(r);
// Create sound
FMOD::Sound* soundResource = nullptr;
FMOD::Channel* channel = nullptr;
r = m_fmodSystem->createSound("640709__chobesha__laser-gun-sound.ogg",
FMOD_DEFAULT | FMOD_OPENONLY,
nullptr,
&soundResource);
FMOD_ERROR_CHECK(r);
// Get sound length in raw bytes
unsigned int audioLength = 0;
FMOD_TIMEUNIT timeUnit = FMOD_TIMEUNIT_RAWBYTES;
r = soundResource->getLength(&audioLength, timeUnit);
FMOD_ERROR_CHECK(r);
// Read sound data
char* audioBuffer = new char[audioLength];
unsigned int readData = 0;
r = soundResource->readData(reinterpret_cast<void*>(audioBuffer),
audioLength,
&readData);
FMOD_ERROR_CHECK(r);
signed short* interpretedData = reinterpret_cast<signed short*>(audioBuffer);
// Analyze data to find the range for normalization
signed short maxValue = -32767;
signed short minValue = 32767;
int interpretedDataSize = readData / sizeof(signed short);
for (int i = 0; i < interpretedDataSize; ++i) {
if (interpretedData[i] > maxValue) {
maxValue = interpretedData[i];
}
if (interpretedData[i] < minValue) {
minValue = interpretedData[i];
}
}
float maxValF = static_cast<float>(maxValue);
float minValF = static_cast<float>(minValue);
float* normalizedArray = new float[interpretedDataSize];
// Normalize data
float maxAbsValF = abs(maxValF);
maxAbsValF = maxAbsValF > abs(minValF) ? maxAbsValF : abs(minValF);
for (int i = 0; i < interpretedDataSize; ++i) {
normalizedArray[i] = interpretedData[i] / maxAbsValF;
}
I read in other posts and in the FMOD documentation that I can use the FMOD_CREATESAMPLE flag to tell createSound to decompress the data at load time instead of at play time, but it doesn't work in the current structure of my code. I'm guessing the FMOD_OPENONLY flag prevents the sound from being closed, and therefore it never gets the chance to decompress, or something like that; that's what I got from the documentation.
The problem with not using the FMOD_OPENONLY flag is that I cannot read the data using the readData function, or it returns an error.
Searching, I found that I can use the lock function to help it decompress and to get a pointer to the sound's data, but even with all of this it still appears to be compressed. I don't know if I'm missing something.
This is version 2 of the code, with these modifications:
// Create sound
FMOD::Sound* soundResource = nullptr;
FMOD::Channel* channel = nullptr;
r = m_fmodSystem->createSound("640709__chobesha__laser-gun-sound.ogg",
FMOD_DEFAULT | FMOD_CREATESAMPLE,
nullptr,
&soundResource);
FMOD_ERROR_CHECK(r);
// Get sound length in raw bytes
unsigned int audioLength = 0;
FMOD_TIMEUNIT timeUnit = FMOD_TIMEUNIT_RAWBYTES;
r = soundResource->getLength(&audioLength, timeUnit);
FMOD_ERROR_CHECK(r);
// Read sound data. lock() hands back pointers into FMOD's own sample
// buffer, so no new[] allocation is needed here.
void* ptr1 = nullptr;
void* ptr2 = nullptr;
unsigned int len1 = 0, len2 = 0;
r = soundResource->lock(0, audioLength, &ptr1, &ptr2, &len1, &len2);
FMOD_ERROR_CHECK(r);
// ... read or copy the data here ...
r = soundResource->unlock(ptr1, ptr2, len1, len2);
FMOD_ERROR_CHECK(r);
This is the graph I get for a WAV sound
The left side is the sound loaded in the Audacity app, and the right side is my graph.
This is the graph for the first try with the OGG file
And this is the graph for the OGG file with the modifications
From what I can see, the first is the same as the second, so I'm assuming both are compressed and my changes did nothing.
Does anyone know a better way to decompress and read the raw data of a sound, preferably using this FMOD library? If it's not possible with FMOD, what is the best way to decompress any sound file, knowing its format?
This answer is just a couple minority-position takes and a sketchy description of a process I once used. Maybe the thoughts are worth consideration.
One thought: a person who is editing sound (your target audience?) has the know-how to decompress files (e.g., using Audacity), so perhaps adding this capability (handling all possible incoming audio formats) is a lower priority?
Another thought: there are likely many libraries for decompressing sound available. You could employ one of them prior to presenting the results to FMOD. I just did a search on github for "ogg c++" and was shown 51 repositories.
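For instance, stb_vorbis (the single-file public-domain decoder from https://github.com/nothings/stb) can hand you the whole file as 16-bit PCM in one call. A sketch, assuming you've dropped stb_vorbis.c into the project:

#include <cstdlib>
#include "stb_vorbis.c" // single-file OGG/Vorbis decoder

bool loadOggAsPcm( const char* path )
{
    int channels = 0, sampleRate = 0;
    short* pcm = nullptr;
    // decodes the entire file to interleaved 16-bit samples
    int frames = stb_vorbis_decode_filename( path, &channels, &sampleRate, &pcm );
    if ( frames < 0 )
        return false;       // decode failed
    // pcm now holds frames * channels samples; normalize/display as in the WAV path
    free( pcm );            // stb_vorbis allocates with malloc
    return true;
}

That decompressed buffer could then be handed to FMOD, or analyzed directly.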
In my own experience, for an application I wrote about seven years ago, I tweaked some code from a Vorbis decoder so that it handed me raw PCM rather than writing out a .wav. With OGG, PCM data is what gets compressed, so the decoder restores PCM and then wraps it in a .wav container; I found the point in the code where that conversion happens and edited it out, leaving the data in decompressed PCM form.
My application was built to accept PCM, so I actually ended up saving an intermediate step.
I created an application a couple of years ago that allowed me to process audio by downmixing a 6-channel or 8-channel (a.k.a. 5.1 or 7.1) signal to matrix-encoded stereo. For that purpose I used the portaudio library, with great results. This is an example of the open-stream call and the callback to downmix a 7.1 signal:
Pa_OpenStream(&Flujo, &inputParameters, &outParameters, SAMPLE_RATE, 1, paClipOff, ptrFunction, NULL);
Notice the framesPerBuffer value of just one (1). This is my callback function:
int downmixed8channels(const void *input, void *output, unsigned long framesPerBuffer, const PaStreamCallbackTimeInfo * info, PaStreamCallbackFlags state, void * userData)
{
(void)userData;
(void)info;
(void)state;
(void)framesBuffer;
float *ptrInput = (float*)input;
float *ptrOutput = (float*)ouput;
/*This is a struct to identify samples*/
AudioSamples->L = ptrInput[0];
AudioSamples->R = ptrInput[1];
AudioSamples->C = ptrInput[2];
AudioSamples->LFE = ptrInput[3];
AudioSamples->RL = ptrInput[4];
AudioSamples->RR = ptrInput[5];
AudioSamples->SL = ptrInput[6];
AudioSamples->SR = ptrInput[7];
Encoder->8channels(AudioSamples->L,
AudioSamples->R,
AudioSamples->C,
AudioSamples->LFE,
MuestrasdeAudio->SL,
MuestrasdeAudio->SR,
MuestrasdeAudio->RL,
MuestrasdeAudio->RR,);
ptrOutput[0] = Encoder->gtLT();
ptrOutput[1] = Encoder->gtRT();
return paContinue;
}
As you can see, the order set by the index in the output and input buffers corresponds to a discrete channel; in the case of the output, 0 = left channel, 1 = right channel. This used to work well, until Windows 10 2004 arrived. Since I updated my system to this new version, my audio glitches and I get artifacts like those.
Those are captures from the sound of the channel-test window under the audio device panel of Windows. From the images it is clear my program is dropping frames, so my first attempt to solve this was to use a buffer larger than one to hold samples, process them, and send them. (The reason I did not use a buffer size larger than one in the first place was that the program would drop frames.)
But before implementing that, I did a proof of concept with no audio processing at all: simply passing data from input to output. For that I set the output channelCount parameter to 8, just like the input, resulting in something as simple as this:
for (int i = 0; i < FramesPerBuffer /* 1000 */; i++)
{
    ptrOutput[i] = ptrInput[i]; // pass the input straight through to the output
}
But the program was still dropping samples.
Next I used two callbacks: one to write the input into a buffer, and a second to read it and send it to the output.
(void)info;
(void)userData;
(void)state;
(void)output;
float* ptrInput = (float*)input;
for (int i = 0; i < FRAME_SIZE; i++)
{
buffer_input[i] = ptrInput[i];
}
return paContinue;
Callback to store.
(void)info;
(void)userData;
(void)state;
(void)input;
float* ptrOutput = (float*)output;
w = 0;
for (int i = 0; i < FRAME_SIZE; i += 6) // 6 interleaved samples per 5.1 frame
{
    AudioSamples->L   = buffer_input[i];
    AudioSamples->R   = buffer_input[i + 1];
    AudioSamples->C   = buffer_input[i + 2];
    AudioSamples->LFE = buffer_input[i + 3];
    AudioSamples->SL  = buffer_input[i + 4];
    AudioSamples->SR  = buffer_input[i + 5];
    Encoder->Encoder(AudioSamples->L, AudioSamples->R, AudioSamples->C, AudioSamples->LFE,
                     AudioSamples->SL, AudioSamples->SR);
    bufferTransformed[w++] = Encoder->getLT(); // two stereo samples out per input frame
    bufferTransformed[w++] = Encoder->getRT();
}
w = 0;
for (int i = 0; i < FRAME_REDUCED; i++)
{
    ptrOutput[i] = bufferTransformed[i];
}
return paContinue;
Callback for processing
The processing callback uses a reduced framesPerBuffer, since two channels are fewer than eight: in portaudio a frame is composed of one sample for each audio channel, so 1000 frames of 8-channel input are 8000 samples, while the same 1000 frames downmixed to stereo are only 2000 samples.
This also did not work, and it raises the first problem: how do I synchronize the two callbacks? After all of this, what recommendation or advice can you give me to solve this issue?
Notes: the sample rate must be the same for both devices (I implemented logic in the program to ensure this), and the bit depth is also the same; I am using paFloat32. The portaudio build is the modified one used by Audacity, since I wanted to use their implementation of WASAPI loopback.
Thanks very much in advance!
In the end I did not have to change my callback functions in any way. What solved it was increasing the .suggestedLatency parameter of the input and output parameters to 1.0. Even the devices' defaultLowOutputLatency or defaultHighOutputLatency values were causing too much glitching; I tested until 1.0 turned out to be the sweet spot, and higher values did not seem to improve things.
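In code, with the names from the question, that's just:

inputParameters.suggestedLatency = 1.0; // in seconds; the device default latencies glitched here
outParameters.suggestedLatency   = 1.0;
Pa_OpenStream(&Flujo, &inputParameters, &outParameters, SAMPLE_RATE,
              1, paClipOff, ptrFunction, NULL);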
TL;DR: Increase the suggestedLatency until the glitching is gone.
I have no experience in audio programming, and C++ is quite a low-level language, so I'm having a little trouble with it. I work with the ASIO SDK 2.3, downloaded from http://www.steinberg.net/en/company/developers.html.
I am writing my own host based on the example inside the SDK.
For now I've managed to go through the whole sample and it looks like it's working. I have an external sound card connected to my PC. I've successfully loaded the driver for this device, configured it, handled callbacks, converted the data from analog to digital, etc.: common stuff.
And here's the part where I am stuck now:
When I play some track via my device I can see bars moving in the mixer (device's software). So device is connected in right way. In my code I've picked the inputs and outputs with the names of the bars that are moving in mixer. I've also used ASIOCreateBuffers() to create buffer for each input/output.
Now correct me if I am wrong:
When ASIOStart() is called and the driver is in the running state, and I feed a sound signal into my external device, I believe the buffers get filled with data, right?
I am reading the documentation but I am a bit lost: how can I access the data sent by the device to the application, stored in the INPUT buffers? Or the signal? I need it for signal analysis, or maybe recording in the future.
EDIT: If I've made it too complicated, then in a nutshell my question is: how can I access the input stream data from code? I don't see any objects/callbacks in the documentation letting me do so.
The hostsample in the ASIO SDK is pretty close to what you need. In the bufferSwitchTimeInfo callback there is some code like this:
for (int i = 0; i < asioDriverInfo.inputBuffers + asioDriverInfo.outputBuffers; i++)
{
int ch = asioDriverInfo.bufferInfos[i].channelNum;
if (asioDriverInfo.bufferInfos[i].isInput == ASIOTrue)
{
char* buf = asioDriver.bufferInfos[i].buffers[index];
....
Inside of that if block asioDriver.bufferInfos[i].buffers[index] is a pointer to the raw audio data (index is a parameter to the method).
The format of the buffer depends on the driver, and it can be discovered by testing asioDriverInfo.channelInfos[i].type. The formats will be 32-bit int LSB first, 32-bit int MSB first, and so on. You can find the list of values in the ASIOSampleType enum in asio.h. At this point you'll want to convert the samples to some common format for downstream signal-processing code. If you're doing signal processing you'll probably want to convert to double. The file host\asioconvertsample.cpp will give you some idea of what's involved in the conversion. The most common format you're going to encounter on a PC is probably INT32 LSB. Here is how you'd convert it to double.
for (int i = 0; i < asioDriverInfo.inputBuffers + asioDriverInfo.outputBuffers; i++)
{
int ch = asioDriverInfo.bufferInfos[i].channelNum;
if (asioDriverInfo.bufferInfos[i].isInput == ASIOTrue)
{
switch (asioDriverInfo.channelInfos[i].type)
{
case ASIOSTInt32LSB:
{
double* pDoubleBuf = new double[_bufferSize];
for (int i = 0 ; i < _bufferSize ; ++i)
{
pDoubleBuf[i] = *(int*)asioDriverInfo.bufferInfos.buffers[index] / (double)0x7fffffff;
}
// now pDoubleBuf contains one channels worth of samples in the range of -1.0 to 1.0.
break;
}
// and so on...
Thank you very much. Your answer helped quite a lot, but as I'm a bit inexperienced with C++ :P I find it a bit problematic.
In general I've written my own host based on hostsample. I didn't implement the asioDriverInfo structure and I use plain variables for now.
My first problem was:
char* buf = asioDriver.bufferInfos[i].buffers[index];
as I got an error that I can't convert void* to char*, but this probably solved the problem:
char* buf = static_cast<char*>(bufferInfos[i].buffers[doubleBufferIndex]);
My second problem is with the data conversion. I've checked the file you recommended, but I find it a bit of black magic. For now I am trying to follow your example:
for (int i = 0; i < inputBuffers + outputBuffers; i++)
{
if (bufferInfos[i].isInput)
{
switch (channelInfos[i].type)
{
case ASIOSTInt32LSB:
{
double* pDoubleBuf = new double[buffSize];
for (int j = 0 ; j < buffSize ; ++j)
{
pDoubleBuf[j] = bufferInfos[i].buffers[doubleBufferIndex] / (double)0x7fffffff;
}
break;
}
}
}
I get error there:
pDoubleBuf[j] = bufferInfos[i].buffers[doubleBufferIndex] / (double)0x7fffffff;
which is:
error C2296: '/' : illegal, left operand has type 'void *'
What I don't get is that in your example there is no array index after bufferInfos: asioDriverInfo.bufferInfos.buffers[index]. And even if I fix that... what type should I cast it to, to make it work? :P
PS. I am sure the ASIOSTInt32LSB data type is the right one for my PC.
The ASIO input and output buffers are accessible through void pointers, but using memcpy or memmove to access the I/O buffers creates a memory copy, which is to be avoided if you are doing real-time processing. I would suggest casting the pointer to int* so you can access the samples directly.
It's also very slow in real-time processing to convert samples one by one when you have 100+ audio channels, while AVX2 is supported on most CPUs.
_mm256_loadu_si256() and _mm256_cvtepi32_ps() will do the conversion much faster.
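A sketch of that conversion for 32-bit little-endian samples (assuming, for brevity, a buffer length that is a multiple of 8):

#include <immintrin.h>

// Convert 32-bit int ASIO samples to floats in [-1, 1], eight at a time with AVX2.
void int32ToFloatAvx2( const int* in, float* out, int count )
{
    const __m256 scale = _mm256_set1_ps( 1.0f / 2147483648.0f ); // 1 / 2^31
    for ( int i = 0; i < count; i += 8 )
    {
        __m256i v = _mm256_loadu_si256( reinterpret_cast<const __m256i*>( in + i ) );
        _mm256_storeu_ps( out + i, _mm256_mul_ps( _mm256_cvtepi32_ps( v ), scale ) );
    }
}

You would call it with the cast suggested above, e.g. int32ToFloatAvx2( (int*)bufferInfos[i].buffers[doubleBufferIndex], floatBuf, buffSize );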
I've been trying for several months to figure out how this works. I have a program I'm developing that takes an mp3 file in and outputs PCM to ALSA for playback, using the mpg123 library, where the main code is this:
while (mpg123_read(mh, buffer, buffer_size, &done) == MPG123_OK)
    sendoutput(dev, buffer, done);
Now, my attempts have been based on using the avutil/avcodec libraries on the buffer to reduce/increase the number of samples per second. The result is awful and isn't audible. In a previous question someone advised me to increase my PC's performance, but if a simple program like VLC can do this on old computers, why can't I?
And for the problem of seeking to a position in the audio file, how can I achieve this?
Edit
I'm adding some pieces of code to try to explain.
SampleConversion.c
#define LENGTH_MS 1000 // how many milliseconds of speech to store (0.5 s : x = 1 s : 44100, so x = 22050 samples to store)
#define RATE 44100 // the sampling rate (input)
struct AVResampleContext* audio_cntx = 0;
//(LENGTH_MS*RATE*16*CHANNELS)/8000
void inizializeResample(int inRate, int outRate)
{
audio_cntx = av_resample_init( outRate, //out rate
inRate, //in rate
16, //filter length
10, //phase count
0, //linear FIR filter
0.8 ); //cutoff frequency
assert( audio_cntx && "Failed to create resampling context!");
}
void resample(char dataIn[],char dataOut[],int nsamples)
{
int samples_consumed;
int samples_output = av_resample( audio_cntx, //resample context
(short*)dataOut, //buffout
(short*)dataIn, //buffin
&samples_consumed, //&consumed
nsamples, //nb_samples
sizeof(dataOut)/2,//lenout sizeof(out_buffer)/2 (Right?)
0);//is_last
assert( samples_output > 0 && "Error calling av_resample()!" );
}
void endResample()
{
av_resample_close( audio_cntx );
}
My edited play function (Mpg123.c)
if (isPaused==0 && mpg123_read(mh, buffer, buffer_size, &done) == MPG123_OK)
{
int i=0; char * resBuffer=malloc(sizeof(buffer));
//resBuffer=&buffer[0];
resample(buffer,resBuffer,44100);
if((ao_play(dev, (char*)resBuffer, done)==0)){
return 1;
}
}
Both pieces of code were written by me, so nobody has ever suggested improvements to them the way they did in the previous question (although I do not know if they are right, sigh).
Edit2: Updated with changes
In the call to av_resample, samples_consumed is never read, so any unconsumed frames are skipped.
Furthermore, nsamples is the constant value 44100 instead of the actual number of frames read (done from mpg123_read).
sizeof(dataOut) is wrong; it's the size of a pointer.
is_last is wrong at the end of the input.
In the play function, sizeof(buffer) is likely to be wrong, depending on the definition of buffer.
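Putting those fixes together, the call would look roughly like this sketch (av_resample works on one channel at a time, so interleaved stereo would need de-interleaving first, which is omitted here; the parameter names are illustrative):

// nbytesIn is the 'done' byte count from mpg123_read; samples are 16-bit.
int resampleBlock( char dataIn[], char dataOut[], int nbytesIn, int maxBytesOut, int isLast )
{
    int consumed = 0;
    int written = av_resample( audio_cntx,
                               (short*)dataOut,
                               (short*)dataIn,
                               &consumed,
                               nbytesIn / 2,     // actual samples read, not a constant
                               maxBytesOut / 2,  // real capacity of dataOut in samples
                               isLast );         // nonzero only for the final block
    // 'consumed' may be less than nbytesIn / 2; carry the leftover samples
    // over and prepend them to the next block instead of dropping them.
    return written * 2;                          // bytes to hand to ao_play()
}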
I wrote a small sound-playing library with PortAudio on Linux. It's for a small game, so there are lots of little sounds when various things happen. I open a stream for each wav file to play by calling Pa_OpenStream(). On Linux this call takes around 10 ms on average. On Windows, however, it typically takes 40 to 70 ms; worse, the first call takes 1.3 seconds, and after that it will occasionally take 1.3 seconds again. I haven't been able to find anything consistent about why it hangs, except that it happens on every first call. The Windows build actually runs fine on Wine.
I assume this has to do with differences in the underlying sound API in use in different systems. But oddly enough I haven't found any information anywhere, despite extensive searching.
Here's my play function:
int play(const char * sN)
{
float threshold = .01f;
char * soundName = (char*)sN;
float g = glfwGetTime();
updatePlayer();
float g2 = glfwGetTime();
if (g2-g > threshold) printf("updatePlayer: %f/", g2 - g);
if (!paused && (int)streams.size() < maxStreams && !mute)
{
streamStr * ss = new streamStr;
g = glfwGetTime();
if (g-g2 > threshold) printf("new stream: %f/", g - g2);
PaError err;
sfData * sdata = getData(soundName);
ss->sfd = sdata;
g2 = glfwGetTime();
if (g2-g > threshold)printf("getData: %f/", g2 - g);
err = Pa_OpenStream(&(ss->stream), 0, &sdata->outputParameters, sdata->sfInfo.samplerate, paFramesPerBufferUnspecified, paNoFlag, PaCallback, ss);
if (err)
{
printf("PortAudio error opening output: %s\n", Pa_GetErrorText(err));
delete ss;
return 1;
}
g = glfwGetTime();
if (g-g2 > threshold)
printf("Pa_OpenStream: %f/", g - g2);
Pa_StartStream(ss->stream);
g2 = glfwGetTime();
if (g2-g > threshold)printf("Pa_StartStream: %f/", g2 - g);
addStreams(ss);
g = glfwGetTime();
if (g-g2 > threshold)printf("addStreams: %f", g - g2);
//Pa_SetStreamFinishedCallback(ss, finishedCallback);
printf("\n");
}
return 0;
}
IDK why it's taking that long (because I don't know Windows), but I can say you are going about this the wrong way. Specifically, you shouldn't make any timing assumptions about opening a new stream. For example, I would expect similar issues (albeit to a much lesser degree) on OS X.
The correct implementation would be to always have a stream open, playing silence. Then, when you need to play a sound, you can play it right away. For best latency, you should pre-load the first few buffers from the file so you don't need to access the disk when playback starts. I don't know what the exact overhead is on windows for opening a stream (I'm sure it depends on the API), but on some versions of OS X, it's huge (the entire kernel switches into preemptive mode if no audio was running before).
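A sketch of the idea, with a hypothetical mixActiveSounds() standing in for something like the Snd::callback mixer shown earlier on this page:

#include <cstring>
#include "portaudio.h"

void mixActiveSounds( float* out, unsigned long frames ); // hypothetical mixer

// One long-lived stereo output stream; the callback never stops.
static int persistentCallback( const void*, void* output, unsigned long frames,
                               const PaStreamCallbackTimeInfo*,
                               PaStreamCallbackFlags, void* )
{
    float* out = (float*)output;
    std::memset( out, 0, frames * 2 * sizeof( float ) ); // silence by default (2 = stereo)
    mixActiveSounds( out, frames );                      // adds whatever is currently playing
    return paContinue;                                   // keep running even when idle
}

Open and start this stream once at startup; play() then just marks a sound as active (ideally with its first buffers pre-loaded) instead of calling Pa_OpenStream.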
That said, 1.3 seconds is insane. I recommend asking on the mailing list. Be sure to say which host API you are using, because you didn't say that here, and for Windows it matters. Also say which version of Windows.
To minimise startup latency for this use-case (i.e. expecting StartStream() to give minimum startup latency) you should use the paPrimeOutputBuffersUsingStreamCallback stream flag. Otherwise the initial buffers will be zero and the time it takes for the sound to hit the DACs will include playing out the buffer length of zeros (which would be around 80ms on Windows WMME or DirectSound with the default PA settings).
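With the Pa_OpenStream call from the question, that just means swapping the flag in:

err = Pa_OpenStream( &(ss->stream), 0, &sdata->outputParameters,
                     sdata->sfInfo.samplerate, paFramesPerBufferUnspecified,
                     paPrimeOutputBuffersUsingStreamCallback, // prime buffers via the callback
                     PaCallback, ss );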