Manage the playback speed and position of an MP3 - C++

For several months I've been trying to figure out how this works. In the program I'm developing, an MP3 file goes in and PCM comes out, which is sent to ALSA for playback. I use the mpg123 library, where the main code is this:
while (mpg123_read (mh, buffer, buffer_size, & done) == MPG123_OK)
sendoutput (dev, buffer, done);
Now, my attempts have been based on using the avutil/avcodec libraries on the buffer to reduce or increase the number of samples per second. The result is awful and isn't audible. In a previous question someone advised me to upgrade my PC, but if a simple program like VLC can do this on old computers, why can't I?
And for the problem of seeking to a position in the audio file, how can I achieve that?
Edit
I've added some code to try to explain.
SampleConversion.c
#define LENGTH_MS 1000 // how many milliseconds of speech to store (0.5 s : x = 1 : 44100, so x = 22050 samples to store)
#define RATE 44100 // the sampling rate (input)
struct AVResampleContext* audio_cntx = 0;
//(LENGTH_MS*RATE*16*CHANNELS)/8000
void inizializeResample(int inRate, int outRate)
{
audio_cntx = av_resample_init( outRate, //out rate
inRate, //in rate
16, //filter length
10, //log2 phase count
0, //linear FIR filter (0 = off)
0.8 ); //cutoff frequency
assert( audio_cntx && "Failed to create resampling context!");
}
void resample(char dataIn[],char dataOut[],int nsamples)
{
int samples_consumed;
int samples_output = av_resample( audio_cntx, //resample context
(short*)dataOut, //buffout
(short*)dataIn, //buffin
&samples_consumed, //&consumed
nsamples, //nb_samples
sizeof(dataOut)/2,//lenout sizeof(out_buffer)/2 (Right?)
0);//is_last
assert( samples_output > 0 && "Error calling av_resample()!" );
}
void endResample()
{
av_resample_close( audio_cntx );
}
My edited play function (Mpg123.c)
if (isPaused==0 && mpg123_read(mh, buffer, buffer_size, &done) == MPG123_OK)
{
int i=0; char * resBuffer=malloc(sizeof(buffer));
//resBuffer=&buffer[0];
resample(buffer,resBuffer,44100);
if((ao_play(dev, (char*)resBuffer, done)==0)){
return 1;
}
}
Both pieces of code were written by me, so I can't say anyone ever suggested improvements to them, unlike in the previous question (although I don't know if they are right, sigh).
Edit2: Updated with changes

In the call to av_resample, samples_consumed is never read, so any unconsumed frames are skipped.
Furthermore, nsamples is the constant value 44100 instead of the actual number of samples read (derived from done, the byte count returned by mpg123_read).
sizeof(dataOut) is wrong; it's the size of a pointer.
is_last is wrong at the end of the input.
In the play function, sizeof(buffer) is likely to be wrong, depending on the definition of buffer.
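Putting those points together, a corrected loop might look like the sketch below. It is only a sketch: av_resample() processes a single channel, so this assumes mono 16-bit PCM (stereo would need per-channel buffers), and OUT_CAPACITY is a hypothetical constant standing in for the real size of resBuffer in samples. Since done from mpg123_read() is a byte count, there are done/2 input samples:
int in_samples = done / 2;   // 16-bit samples, assuming mono
int consumed_total = 0;
while (consumed_total < in_samples) {
    int consumed = 0;
    int produced = av_resample(audio_cntx,
                               (short*)resBuffer,               // output buffer
                               (short*)buffer + consumed_total, // unconsumed input
                               &consumed,
                               in_samples - consumed_total,     // input samples left
                               OUT_CAPACITY,                    // output capacity in samples
                               0);                              // is_last: 1 only on the final call
    if (produced <= 0)
        break;
    ao_play(dev, (char*)resBuffer, produced * 2); // ao_play takes bytes, not samples
    consumed_total += consumed;
}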

Related

PortAudio To FFmpeg Resampling Resulting in Segmentation Fault

I am getting audio from a microphone with PortAudio (PA). I then need to resample this audio to 44,100 Hz. I'm attempting to do this with FFmpeg. Currently, the mic I'm testing with has a sample rate of 48,000 Hz, but this won't always be the case when the application is used. Anyway, whenever I attempt to resample with swr_convert, I get a segmentation fault. I am initializing the SwrContext with
this->swr_ctx = swr_alloc_set_opts(
nullptr, // No current context
num_channels, // The number of channels I'm getting from PA
AV_SAMPLE_FMT_S16, // 16 bit Signed, should correspond to paInt16
FINAL_SAMPLE_RATE, // 44100
num_channels, // The number of channels I'm getting from PA
AV_SAMPLE_FMT_S16, // 16 bit Signed, should correspond to paInt16
this->source_sample_rate, // Mic I'm testing with currently is 48000, but depends on source
0, // Logging offset (0 is what examples use, so I did too)
nullptr // "parent logging context, can be NULL"
);
I know PA is working right, as the project works if I hard-code the sample rate in other aspects of this project. The callback looks like this
auto paCallback( const void *inputBuffer, void *outputBuffer, unsigned long framesPerBuffer, const PaStreamCallbackTimeInfo* timeInfo, PaStreamCallbackFlags statusFlags, void *userData ) -> int {
// Calls class's callback handler
return ((Audio*)userData)->classPaCallback((const uint8_t **)inputBuffer);
}
// Class callback handler
auto Audio::classPaCallback(const uint8_t **inputBuffer) -> int {
// This line throws SIGSEGV
int out_count = swr_convert(swr_ctx, this->resample_buffer, BUFFER_CHUNK_SIZE, inputBuffer, this->source_buffer_size);
if (out_count < 0) {
throw std::runtime_error("Error resampling audio");
}
// Add data to buffers to handle outside of callback context (This is a special context according to PA docs)
return 0;
}
Playing around with the swr_convert line, I changed the out_count and in_count parameters (BUFFER_CHUNK_SIZE and this->source_buffer_size) to 0 to make sure that the code would at least run, that worked. I then changed one of them to 1, and left the other at 0, to test which buffer access was throwing the SIGSEGV, and it was thrown when the in_count (buffer from PA) was not 0. What am I doing wrong when passing the audio from PA to FFMpeg?
I do know that the PA audio is "interleaved" (i.e. input[0] is the first sample from channel 0, input[1] is the first sample from channel 1, etc.). Is this also the format that FFmpeg uses, or should I create a different SwrContext for each channel?
In case this is helpful, this->resample_buffer is successfully initialized with
av_samples_alloc_array_and_samples(
&this->resample_buffer, // Buffer
nullptr, // "linesize", not used from what I could tell
this->num_channels, // number of channels expected
BUFFER_CHUNK_SIZE, // Number of frames to be stored per channel
AV_SAMPLE_FMT_S16, // 16 bit Signed, should correspond to paInt16
0 // For alignment, not needed here
);
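Worth noting (my observation, not part of the original post): AV_SAMPLE_FMT_S16 is a packed format, so swr_convert() expects a single data plane of interleaved samples, the same layout PortAudio delivers, while the planar formats (AV_SAMPLE_FMT_S16P and friends) use one plane per channel. Also, the in_count argument is a per-channel frame count, not a byte count. A call with an interleaved PA buffer could look roughly like this (resample_chunk, pa_samples and num_input_frames are hypothetical names):
#include <cstdint>
extern "C" {
#include <libswresample/swresample.h>
}

// Sketch: feed one interleaved S16 PortAudio buffer to swr_convert().
int resample_chunk(SwrContext *swr_ctx, uint8_t **resample_buffer,
                   const int16_t *pa_samples, int num_input_frames) {
    // Packed S16: a single plane holding interleaved frames.
    const uint8_t *in_planes[1] = { reinterpret_cast<const uint8_t *>(pa_samples) };
    return swr_convert(swr_ctx, resample_buffer, BUFFER_CHUNK_SIZE,
                       in_planes, num_input_frames); // frames per channel, not bytes
}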
Since resampling directly after receiving the data from PortAudio (PA) was not working, and no one pointed out an obvious error right away, I figured I wasn't making any "dumb" mistakes. After a bit of a break, I tried again, but instead of resampling right away, I first added the data to the appropriate buffer (based on which channel it came from in PA). Then, when removing the data from the buffer, I applied the resample. Doing it this way let me resample one channel at a time, so I didn't have to work out how to pass multiple channels to FFmpeg. For anyone who comes across this in the future, the code ended up looking like this
auto *converted_sample = (uint16_t *) malloc(sizeof(uint16_t) * this->converted_sample_max);
if (converted_sample == nullptr) {
throw std::runtime_error("Failed to allocate memory for converted sample");
}
uint16_t *sample = buffer.pop(); // Gets data from circular buffer
if (sample == nullptr) {
free(converted_sample); // IMPORTANT to not incur a memory leak
return return_data{nullptr, 0};
}
if (swr_ctx == nullptr) {
free(converted_sample); // IMPORTANT to not incur a memory leak
throw std::runtime_error("swr_ctx is not initialized");
}
int frames = swr_convert(swr_ctx, (uint8_t **)&converted_sample, this->converted_sample_max, (const uint8_t **)&sample, BUFFER_CHUNK_SIZE);
free(sample); // In my case, it is the job of whoever takes the data out of the circular buffer to free() it
if (frames < 0) {
free(converted_sample); // Prevent memory leak
throw std::runtime_error("no frames converted");
}
return return_data{converted_sample, frames}; // A struct I made
}
this->converted_sample_max is initialized with
this->converted_sample_max = av_rescale_rnd(BUFFER_CHUNK_SIZE, FINAL_SAMPLE_RATE, source_sample_rate, AV_ROUND_UP);
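For completeness, here is a minimal initialization sketch for the context used above (an illustration under my assumptions, not the exact code from the project). Two things are easy to miss: swr_alloc_set_opts() takes channel layouts rather than channel counts, and the returned context must still be initialized with swr_init() before swr_convert() can be used:
extern "C" {
#include <libswresample/swresample.h>
#include <libavutil/channel_layout.h>
}

SwrContext *make_swr_context(int num_channels, int in_rate, int out_rate) {
    // Derive a layout from the channel count; the layout parameters are not counts.
    int64_t layout = av_get_default_channel_layout(num_channels);
    SwrContext *ctx = swr_alloc_set_opts(
        nullptr,
        layout, AV_SAMPLE_FMT_S16, out_rate,  // output side
        layout, AV_SAMPLE_FMT_S16, in_rate,   // input side
        0, nullptr);                          // log offset, log context
    if (ctx == nullptr || swr_init(ctx) < 0) { // swr_init() is mandatory
        swr_free(&ctx);
        return nullptr;
    }
    return ctx;
}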

Fixing Real Time Audio with PortAudio in Windows 10

I created an application a couple of years ago that allowed me to process audio by downmixing a 6-channel or 8-channel signal, a.k.a. 5.1 or 7.1, to matrix-encoded stereo. For that purpose I used the PortAudio library with great results. This is an example of the open-stream call and the callback used to downmix a 7.1 signal:
Pa_OpenStream(&Flujo, &inputParameters, &outParameters, SAMPLE_RATE, 1, paClipOff, ptrFunction, NULL);
Notice the framesPerBuffer value of just one (1). This is my callback function:
int downmixed8channels(const void *input, void *output, unsigned long framesPerBuffer, const PaStreamCallbackTimeInfo * info, PaStreamCallbackFlags state, void * userData)
{
(void)userData;
(void)info;
(void)state;
(void)framesPerBuffer;
float *ptrInput = (float*)input;
float *ptrOutput = (float*)output;
/*This is a struct to identify samples*/
AudioSamples->L = ptrInput[0];
AudioSamples->R = ptrInput[1];
AudioSamples->C = ptrInput[2];
AudioSamples->LFE = ptrInput[3];
AudioSamples->RL = ptrInput[4];
AudioSamples->RR = ptrInput[5];
AudioSamples->SL = ptrInput[6];
AudioSamples->SR = ptrInput[7];
Encoder->encode8Channels(AudioSamples->L,
AudioSamples->R,
AudioSamples->C,
AudioSamples->LFE,
AudioSamples->SL,
AudioSamples->SR,
AudioSamples->RL,
AudioSamples->RR);
ptrOutput[0] = Encoder->gtLT();
ptrOutput[1] = Encoder->gtRT();
return paContinue;
}
As you can see, the order set by the index into the output and input buffers corresponds to a discrete channel; in the case of the output, 0 = left channel, 1 = right channel. This used to work well until Windows 10 2004 arrived; since I updated my system to this new version my audio glitches and I get artifacts like those.
Those are captures of the sound from the channel test window under the Windows audio device panel. From the images it is clear my program is dropping frames, so my first attempt at solving this was to use a buffer larger than one to hold samples, process them, and send them; the reason I did not use a buffer size larger than one in the first place was that the program would drop frames.
But before implementing that, I did a proof of concept that included no audio processing at all, just simple passing of data from input to output; for that I set the output channelCount parameter to 8, just like the input, resulting in something as simple as this:
for (int i = 0; i < FramesPerBuffer /*1000*/; i++)
{
ptrOutput[i] = ptrInput[i];
}
but the program is still dropping samples.
Next I used two callbacks: one for writing to a buffer and a second one to read it and send it to the output.
(void)info;
(void)userData;
(void)state;
(void)output;
float* ptrInput = (float*)input;
for (int i = 0; i < FRAME_SIZE; i++)
{
buffer_input[i] = ptrInput[i];
}
return paContinue;
Callback to store.
(void)info;
(void)userData;
(void)state;
(void)output;
float* ptrOutput = (float*)output;
for (int i = 0; i < FRAME_SIZE; i += 6)
{
AudioSamples->L = buffer_input[i];
AudioSamples->R = buffer_input[i + 1];
AudioSamples->C = buffer_input[i + 2];
AudioSamples->LFE = buffer_input[i + 3];
AudioSamples->SL = buffer_input[i + 4];
AudioSamples->SR = buffer_input[i + 5];
Encoder->Encoder(AudioSamples->L, AudioSamples->R, AudioSamples->C, AudioSamples->LFE,
AudioSamples->SL, AudioSamples->SR);
bufferTransformed[w++] = Encoder->getLT();
bufferTransformed[w++] = Encoder->getRT();
}
w = 0;
for (int i = 0; i < FRAME_REDUCED; i++)
{
ptrOutput[i] = bufferTransformed[i];
}
return paContinue;
Callback for processing
The processing callback uses a reduced frames-per-buffer value, since two channels are fewer than eight; in PortAudio a frame seems to be composed of one sample for each audio channel, so the same number of frames takes only a quarter as many samples after the eight channels are downmixed to two.
This also did not work, and the first problem remains: how do I synchronize the two callbacks? After all of this, what recommendation or advice can you give me to solve this issue?
Notes: the sample rate must be the same for both devices; I implemented logic in the program to enforce this. The bit depth is also the same; I am using paFloat32. The PortAudio build is the modified one used by Audacity, since I wanted to use their implementation of WASAPI loopback.
Thanks very much in advance!
In the end I did not have to change my callback functions in any way. What solved it was increasing the ".suggestedLatency" parameter of the input and output parameters to 1.0; even the devices' defaultLowOutputLatency and defaultHighOutputLatency values were causing too much glitching. I tested until 1.0 turned out to be the sweet spot; higher values did not seem to improve anything.
TL;DR: Increase the suggestedLatency until the glitching is gone.
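In code, that amounts to something like the following (a sketch reusing names from the question; the device setup details are my assumptions):
PaStreamParameters inputParameters{};
inputParameters.device = Pa_GetDefaultInputDevice();
inputParameters.channelCount = 8;
inputParameters.sampleFormat = paFloat32;
inputParameters.suggestedLatency = 1.0;   // seconds; the value that stopped the glitching
inputParameters.hostApiSpecificStreamInfo = NULL;

PaStreamParameters outParameters{};
outParameters.device = Pa_GetDefaultOutputDevice();
outParameters.channelCount = 2;
outParameters.sampleFormat = paFloat32;
outParameters.suggestedLatency = 1.0;     // match the input side
outParameters.hostApiSpecificStreamInfo = NULL;

Pa_OpenStream(&Flujo, &inputParameters, &outParameters, SAMPLE_RATE, 1, paClipOff, ptrFunction, NULL);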

IMFTransform::ProcessOutput Efficiency

I've noticed that, as apparently documented, IMFTransform::ProcessOutput() for a resampler can only output one sample per call! I guess it's more oriented toward large-frame-size video coding. Given that all the code I have been looking at as reference for related audio playback allocates one IMFMediaBuffer per call of ProcessOutput, this seems a little insane and terrible architecture - unless I am missing something?
It is especially bad from the point of view of media buffer usage. For example a SourceReader decoding my test MP3 gives me chunks of about 64KB in one sample with one buffer. Which is sensible. But GetOutputStreamInfo() is requesting a media buffer of just 24 bytes per call for ProcessOutput().
64KB chunks => chopped into many 24B chunks => passed to further processing seems very daft overhead (the resampler would be incurring a lot of overhead for every 24 bytes, and forcing that overhead further down the pipeline if it's not consolidated).
From https://learn.microsoft.com/en-us/windows/win32/api/mftransform/nf-mftransform-imftransform-processoutput
It says:
The MFT cannot return more than one sample per stream in a single call to ProcessOutput
The MFT writes the output data to the start of the buffer, overwriting any data that already exists in the buffer
So it's not even the case it can append to the end of partially full buffer attached to the sample.
I could create my own pooling object that supports the media buffer interface but pointer-bumps into a vanilla locked media buffer, I guess. The only other option is seemingly to lock/copy those 24 bytes to another, larger buffer for processing. But this all seems excessive, and at the wrong granularity.
What is the best way to deal with this?
Here is a simplified sketch of my test so far:
...
status = transform->ProcessInput(0, sample, 0);
sample->Release();
while(1)
{
MFT_OUTPUT_STREAM_INFO outDetails{};
MFT_OUTPUT_DATA_BUFFER outData{};
IMFMediaBuffer* outBuffer;
IMFSample* outSample;
DWORD outStatus;
status = transform->GetOutputStreamInfo(0, &outDetails);
status = MFCreateAlignedMemoryBuffer(outDetails.cbSize, outDetails.cbAlignment, &outBuffer);
status = MFCreateSample(&outSample);
status = outSample->AddBuffer(outBuffer);
outBuffer->Release();
outData.pSample = outSample;
status = transform->ProcessOutput(0, 1, &outData, &outStatus);
if (status == MF_E_TRANSFORM_NEED_MORE_INPUT)
break;
...
}
I wrote some code for you to prove that the audio resampler is capable of processing large audio blocks at once. It is a good, efficient processing style:
winrt::com_ptr<IMFTransform> Transform;
winrt::check_hresult(CoCreateInstance(CLSID_CResamplerMediaObject, nullptr, CLSCTX_ALL, IID_PPV_ARGS(Transform.put())));
WAVEFORMATEX InputWaveFormatEx { WAVE_FORMAT_PCM, 1, 44100, 44100 * 2, 2, 16 };
WAVEFORMATEX OutputWaveFormatEx { WAVE_FORMAT_PCM, 1, 48000, 48000 * 2, 2, 16 };
winrt::com_ptr<IMFMediaType> InputMediaType;
winrt::check_hresult(MFCreateMediaType(InputMediaType.put()));
winrt::check_hresult(MFInitMediaTypeFromWaveFormatEx(InputMediaType.get(), &InputWaveFormatEx, sizeof InputWaveFormatEx));
winrt::com_ptr<IMFMediaType> OutputMediaType;
winrt::check_hresult(MFCreateMediaType(OutputMediaType.put()));
winrt::check_hresult(MFInitMediaTypeFromWaveFormatEx(OutputMediaType.get(), &OutputWaveFormatEx, sizeof OutputWaveFormatEx));
winrt::check_hresult(Transform->SetInputType(0, InputMediaType.get(), 0));
winrt::check_hresult(Transform->SetOutputType(0, OutputMediaType.get(), 0));
MFT_OUTPUT_STREAM_INFO OutputStreamInfo { };
winrt::check_hresult(Transform->GetOutputStreamInfo(0, &OutputStreamInfo));
_A(!(OutputStreamInfo.dwFlags & MFT_OUTPUT_STREAM_SINGLE_SAMPLE_PER_BUFFER));
DWORD const InputMediaBufferSize = InputWaveFormatEx.nAvgBytesPerSec;
winrt::com_ptr<IMFMediaBuffer> InputMediaBuffer;
winrt::check_hresult(MFCreateMemoryBuffer(InputMediaBufferSize, InputMediaBuffer.put()));
winrt::check_hresult(InputMediaBuffer->SetCurrentLength(InputMediaBufferSize));
winrt::com_ptr<IMFSample> InputSample;
winrt::check_hresult(MFCreateSample(InputSample.put()));
winrt::check_hresult(InputSample->AddBuffer(InputMediaBuffer.get()));
winrt::check_hresult(Transform->ProcessInput(0, InputSample.get(), 0));
DWORD const OutputMediaBufferCapacity = OutputWaveFormatEx.nAvgBytesPerSec;
winrt::com_ptr<IMFMediaBuffer> OutputMediaBuffer;
winrt::check_hresult(MFCreateMemoryBuffer(OutputMediaBufferCapacity, OutputMediaBuffer.put()));
winrt::check_hresult(OutputMediaBuffer->SetCurrentLength(0));
winrt::com_ptr<IMFSample> OutputSample;
winrt::check_hresult(MFCreateSample(OutputSample.put()));
winrt::check_hresult(OutputSample->AddBuffer(OutputMediaBuffer.get()));
MFT_OUTPUT_DATA_BUFFER OutputDataBuffer { 0, OutputSample.get() };
DWORD Status;
winrt::check_hresult(Transform->ProcessOutput(0, 1, &OutputDataBuffer, &Status));
DWORD OutputMediaBufferSize = 0;
winrt::check_hresult(OutputMediaBuffer->GetCurrentLength(&OutputMediaBufferSize));
You can see that after feeding one second of input, the output holds [almost] one second of data as expected.
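To actually consume the converted data, the media buffer would then be locked and read (a short sketch continuing the code above):
// Read back the converted 48 kHz PCM.
BYTE* Data = nullptr;
DWORD MaxLength = 0, CurrentLength = 0;
winrt::check_hresult(OutputMediaBuffer->Lock(&Data, &MaxLength, &CurrentLength));
// ... consume CurrentLength bytes at Data ...
winrt::check_hresult(OutputMediaBuffer->Unlock());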

Realtime streaming with QAudioOutput

I am working on a C++ project to read/process/play raw audio from a microphone array system, with its own C++ API. I am using Qt to program the software.
From this post about Real Time Streaming With QAudioOutput (Qt), I wanted to follow up and ask for advice about what to do if the raw audio data comes from a function call that takes about 1000 ms (1 s) to complete. How would I still be able to achieve real-time audio playback?
It takes about a second because I had read that, when writing to the QIODevice returned by QAudioOutput::start(), it is advisable to write a period's worth of bytes to prevent buffer underrun/overrun. http://cell0907.blogspot.sg/2012/10/qt-audio-output.html
I have set up a QByteArray and QDataStream to stream the data received from the function call.
The API is CcmXXX()
Reading the data from the microphone array returns an array of 32 bit integers
Of the 32 bits in each integer, 24 bits are resolution and the 8 LSBs are padded with zeros.
It comes in block sizes (set at 1024 samples) x 40 microphones
Each chunk writes about one block, till the number of bytes written reaches close to the period size / free amount of bytes.
Tested: Connected my slots to a notify of about 50ms, to write one period worth of bytes. QByteArray in circular buffer style. Added a mutex lock/unlock at the read/write portions.
Result: Very short split ms of actual audio played, lots of jittering and non-recorded sounds.
Please do offer feedback on how I could improve my code.
Setting up QAudioFormat
void MainWindow::init_audio_format(){
m_format.setSampleRate(48000); //(8000, 11025, 16000, 22050, 32000, 44100, 48000, 88200, 96000, 192000)
m_format.setByteOrder(QAudioFormat::LittleEndian);
m_format.setChannelCount(1);
m_format.setCodec("audio/pcm");
m_format.setSampleSize(32); //(8, 16, 24, 32, 48, 64)
m_format.setSampleType(QAudioFormat::SignedInt); //(SignedInt, UnSignedInt, Float)
m_device = QAudioDeviceInfo::defaultOutputDevice();
QAudioDeviceInfo info(m_device);
if (!info.isFormatSupported(m_format)) {
qWarning() << "Raw audio format not supported by backend, cannot play audio.";
return;
}
}
Initialising Audio and QByteArray/Datastream
void MainWindow::init_audio_output(){
m_bytearray.resize(65536);
mstream = new QDataStream(&m_bytearray,QIODevice::ReadWrite);
mstream->setByteOrder(QDataStream::LittleEndian);
audio = new QAudioOutput(m_device,m_format,this);
audio->setBufferSize(131072);
audio->setNotifyInterval(50);
m_audiodevice = audio->start();
connect(audio,SIGNAL(notify()),this,SLOT(slot_writedata()));
read_frames();
}
Slot:
void MainWindow::slot_writedata(){
QMutex mutex;
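// NOTE: a function-local QMutex is recreated on every call, so it serializes nothing;
// to be effective it would have to be a class member shared by every reader/writer of the buffer.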
mutex.lock();
read_frames();
mutex.unlock();
}
To read the frames:
void MainWindow::read_frames(){
qint32* buffer;
int frameSize, byteCount=0;
DWORD tdFrames, fdFrames;
float fvalue = 0;
qint32 q32value;
frameSize = 40 * mBlockSize; //40 mics
buffer = new qint32[frameSize]; // match the qint32* declaration above
int periodBytes = audio->periodSize();
int freeBytes = audio->bytesFree();
int chunks = qMin(periodBytes/mBlockSize,freeBytes/mBlockSize);
CcmStartInput();
while(chunks){
CcmReadFrames(buffer,NULL,frameSize,0,&tdFrames,&fdFrames,NULL,CCM_WAIT);
if(tdFrames==0){
break;
}
int diffBytes = periodBytes - byteCount;
if(diffBytes>=(int)sizeof(q32value)*mBlockSize){
for(int x=0;x<mBlockSize;x++){
q32value = (quint32)buffer[x]/256;
*mstream << q32value;
byteCount+=sizeof(q32value);
}
}
else{
for(int x=0;x<(diffBytes/(int)sizeof(q32value));x++){
q32value = (quint32)buffer[x]/256;
*mstream << q32value;
byteCount+=sizeof(q32value);
}
}
--chunks;
}
CcmStopInput();
mPosEnd = mPos + byteCount;
write_frames();
mPos += byteCount;
if(mPos >= m_bytearray.length()){
mPos = 0;
mstream->device()->seek(0); //change mstream pointer back to bytearray start
}
delete[] buffer; // free the frame buffer allocated above
}
To write the frames:
void MainWindow::write_frames()
{
int len = m_bytearray.length() - mPos;
int bytesWritten = mPosEnd - mPos;
if(len>=audio->periodSize()){
m_audiodevice->write(m_bytearray.data()+mPos, bytesWritten);
}
else{
w_data.replace(0,qAbs(len),m_bytearray.data()+mPos);
w_data.replace(qAbs(len),audio->periodSize()-qAbs(len),m_bytearray.data());
m_audiodevice->write(w_data.data(),audio->periodSize());
}
}
Audio support in Qt is actually quite rudimentary. The goal is to have media playback at the lowest possible implementation and maintenance cost. The situation is especially bad on Windows, where I think the ancient MME API is still employed for audio playback.
As a result, the Qt audio API is very far from real time, making it particularly ill-suited for such applications. I recommend using PortAudio or RtAudio, which you can still wrap in Qt-style IO devices if you wish. This will give you access to better-performing platform audio APIs and much better playback performance at very low latency.
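For scale, the playback side of a PortAudio-based replacement can be as small as the sketch below (myCallback and myRingBuffer are hypothetical stand-ins for a callback draining the same circular buffer the Qt code fills):
#include <portaudio.h>

PaStream *stream = nullptr;
Pa_Initialize();
Pa_OpenDefaultStream(&stream,
                     0, 1,                         // no input, mono output
                     paInt32, 48000,               // matches the 32-bit, 48 kHz format above
                     paFramesPerBufferUnspecified, // let PortAudio pick the buffer size
                     myCallback, &myRingBuffer);
Pa_StartStream(stream);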

Generating Sounds at Runtime with C++

So I'm picking up C++ after a long hiatus, and I had the idea to create a program which can generate music based upon strings of numbers at runtime (I was inspired by the composition of Pi done by some people), with the eventual goal being some sort of procedural music generation software.
So far I have been able to make a really primitive version of this with the Beep() function and just feeding through the first so and so digits of Pi as a test. Works like a charm.
What I'm looking for now is how I could kick it up a notch and get some higher quality sound being made (because Beep() literally is the most primitive sound... ever) and I realized I have absolutely no idea how to do this. What I need is either a library or some sort of API that can:
1) Generate sound without pre-existing file. I want the result to be 100% generated by code and not rely on any samples, optimally.
2) If I could get something going that would be capable of playing multiple sounds at a time, like be able to play chords or a melody with a beat, that would be nice.
3) and If I could in any way control the wave it plays (kinda like chiptune mixers can) via equation or some other sort of data, that'd be super helpful.
I don't know if this is a weird request or I just researched it using the wrong terms, but I just wasn't able to find anything along these lines or at least nothing that was well documented at all. :/
If anyone can help, I'd really appreciate it.
EDIT: Also, apparently I'm just super not used to asking stuff on forums, my target platform is Windows (7, specifically, although I wouldn't think that matters).
I use portaudio (http://www.portaudio.com/). It will let you create PCM streams in a portable way. Then you just push the samples into the stream, and they will play.
#edit: using PortAudio is pretty easy. You initialize the library. I use floating point samples to make it super easy. I do it like this:
PaError err = Pa_Initialize();
if ( err != paNoError )
return false;
mPaParams.device = Pa_GetDefaultOutputDevice();
if ( mPaParams.device == paNoDevice )
return false;
mPaParams.channelCount = NUM_CHANNELS;
mPaParams.sampleFormat = paFloat32;
mPaParams.suggestedLatency =
Pa_GetDeviceInfo( mPaParams.device )->defaultLowOutputLatency;
mPaParams.hostApiSpecificStreamInfo = NULL;
Then later, when you want to play sounds, you create a stream: 2 channels for stereo, at 44.1 kHz, good for MP3 audio:
PaError err = Pa_OpenStream( &mPaStream,
NULL, // no input
&mPaParams,
44100, // params
NUM_FRAMES, // frames per buffer
0,
sndCallback,
this
);
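One step the excerpt leaves out: after opening, the stream still has to be started before the callback begins firing (same style as above):
err = Pa_StartStream( mPaStream );
if ( err != paNoError )
    return false;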
Then you implement the callback to fill the PCM audio stream. The callback is a C function, but I just call through to my C++ class to handle the audio. I ripped this from my code, and it may not be 100% correct now as I removed a ton of stuff you won't care about. But it works kind of like this:
static int sndCallback( const void* inputBuffer,
void* outputBuffer,
unsigned long framesPerBuffer,
const PaStreamCallbackTimeInfo* timeInfo,
PaStreamCallbackFlags statusFlags,
void* userData )
{
Snd* snd = (Snd*)userData;
return snd->callback( (float*)outputBuffer, framesPerBuffer );
}
u32 Snd::callback( float* outbuf, u32 nFrames )
{
mPlayMutex.lock(); // use mutexes because this is async code!
// clear the output buffer
memset( outbuf, 0, nFrames * NUM_CHANNELS * sizeof( float ));
// mix all the sounds.
if ( mChannels.size() )
{
// I have multiple audio sources I'm mixing. That's what mChannels is.
for ( s32 i = mChannels.size(); i > 0; i-- )
{
for ( u32 j = 0; j < nFrames * NUM_CHANNELS; j++ )
{
float f = outbuf[j] + getNextSample( i ); // <------------------- your code here!!!
if ( f > 1.0 ) f = 1.0; // clamp it so you don't get clipping.
if ( f < -1.0 ) f = -1.0;
outbuf[j] = f;
}
}
}
mPlayMutex.unlock();
return paContinue; // return paComplete when you are done playing audio.
}
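For the "your code here" line, the simplest thing to drop in is a sine oscillator whose frequency and amplitude you vary over time (a sketch; mAmplitude, mFrequency and mPhase are hypothetical members of the Snd class, and the stream is assumed to run at 44.1 kHz):
float Snd::getNextSample( s32 channel )
{
    (void)channel; // a real mixer would keep per-channel oscillator state
    float s = mAmplitude * sinf( mPhase );
    mPhase += 2.0f * 3.14159265f * mFrequency / 44100.0f; // advance one sample
    if ( mPhase > 2.0f * 3.14159265f )
        mPhase -= 2.0f * 3.14159265f; // wrap to preserve float precision
    return s; // change mFrequency/mAmplitude between calls to shape notes
}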
I answered a very similar question on this earlier this week: Note Synthesis, Harmonics (Violin, Piano, Guitar, Bass), Frequencies, MIDI. In your case, if you don't want to rely on samples, then the wavetable method is out. So your simplest option would be to dynamically vary the frequency and amplitude of sinusoids over time, which is easy but will sound pretty terrible (like a cheap theremin). Your only real option would be a more sophisticated synthesis algorithm such as one of the physical modelling ones (e.g. Karplus-Strong). That would be an interesting project, but be warned that it does require something of a mathematical background.
You can indeed use something like PortAudio, as Rafael has mentioned, to physically get the sound out of the PC; in fact I think PortAudio is the best option for that. But generating the data so that it sounds musical is by far your biggest challenge.