How can I quantitatively measure gstreamer H264 latency between source and display?

I have a project where we are using gstreamer, x264, etc., to multicast a video stream over a local network to multiple receivers (dedicated computers attached to monitors). We're using gstreamer on both the video source (camera) systems and the display monitors.
We're using RTP, payload 96, and libx264 to encode the video stream (no audio).
But now I need to quantify the latency between (as close as possible to) frame acquisition and display.
Does anyone have suggestions that use the existing software?
Ideally I'd like to be able to run the testing software for a few hours to generate enough statistics to quantify the system. Meaning that I can't do one-off tests like pointing the source camera at the receiving display monitor while it displays a high-resolution clock, and manually calculating the difference...
I do realise that using a pure software-only solution, I will not be able to quantify the video acquisition delay (i.e. CCD to framebuffer).
I can arrange that the system clocks on the source and display systems are synchronised to a high accuracy (using PTP), so I will be able to trust the system clocks (else I will use some software to track the difference between the system clocks and remove this from the test results).
In case it helps, the project applications are written in C++, so I can use C event callbacks, if available, and consider embedding the system time in a custom header (e.g. frame xyz, encoded at time TTT), using the same information on the receiver to calculate the difference.

I have a solution to this:
I wrote a gstreamer filter plugin (based on the plugin templates) that saves the system time when a frame is captured (and makes a mark on the video buffer) before passing it on to the H.264 encoder and network transport.
On the receiving side, I locate the mark (which gives me a 1-of-20 index) and again note the system time.
It should then be a relatively simple exercise to correlate indices and compare system times. As long as the two systems' clocks are reasonably in sync (or have a known difference), I can calculate the difference, which is the latency.
The filter's source property is set differently on the sender and the receiver, selecting between marking (sender) and detecting (receiver) behaviour.
/* chain function
 * this function does the actual processing: marking (sender) or
 * mark detection (receiver)
 */
static GstFlowReturn
gst_my_filter_chain (GstPad * pad, GstBuffer * buf)
{
  GstMyFilter *filter;
  struct timeval nowTimeval;
  guint8 *data;
  int i, j, offset;

  filter = GST_MYFILTER (GST_OBJECT_PARENT (pad));

  /* On the first buffer, record the start time and read the frame size
   * from the caps. */
  if (filter->startTime == 0) {
    GstCaps *caps;
    gint width, height;
    const GstStructure *str;

    filter->startTime = GST_BUFFER_TIMESTAMP (buf);
    gettimeofday (&filter->startTimeval, NULL);
    filter->startTimeUL =
        (filter->startTimeval.tv_sec * 1e6 + filter->startTimeval.tv_usec) / 1e3; /* milliseconds */
    filter->index = 0;

    caps = GST_BUFFER_CAPS (buf);
    str = gst_caps_get_structure (caps, 0);
    if (!gst_structure_get_int (str, "width", &width) ||
        !gst_structure_get_int (str, "height", &height)) {
      g_print ("No width/height available\n");
    } else {
      g_print ("The video size of this set of capabilities is %dx%d\n",
          width, height);
      filter->width = width;
      filter->height = height;
    }
  }

  /* Log the wall-clock time at which this buffer passed through. */
  gettimeofday (&nowTimeval, NULL);
  unsigned long timeNow =
      (nowTimeval.tv_sec * 1e6 + nowTimeval.tv_usec) / 1e3;     /* milliseconds */
  if (filter->silent == FALSE) {
    fprintf (filter->ofp, "%20lu,", timeNow);
  }

  data = GST_BUFFER_DATA (buf);
  if (filter->source) {
    /* Sender: paint a 10x10 dark square; its horizontal position encodes
     * the frame index modulo 20. */
    offset = filter->index % 20;
    for (i = 0; i < 10; i++) {
      for (j = 0; j < 10; j++) {
        data[(i + 20) * filter->width + j + offset * 10] = 23;
      }
    }
    fprintf (filter->ofp, " %u", offset);
  } else {
    /* Receiver: scan the middle row of the 20 candidate squares and pick
     * the darkest one; its position recovers the index modulo 20. */
    unsigned long sum;
    unsigned long min = (unsigned long) -1;
    unsigned int minpos = 0;
    int k;
    for (k = 0; k < 20; k++) {
      sum = 0;
      i = 5;                    /* in the middle of the box row */
      for (j = 0; j < 10; j++) {
        sum += data[(i + 20) * filter->width + j + k * 10];
      }
      if (sum < min) {
        min = sum;
        minpos = k;
      }
    }
    fprintf (filter->ofp, " %u", minpos);
  }
  fprintf (filter->ofp, "\n");
  filter->index++;

  /* push the (possibly marked) buffer downstream otherwise unchanged */
  return gst_pad_push (filter->srcpad, buf);
}
Usage is as follows:
Sender / server:
GST_DEBUG="*:2" gst-launch-0.10 -v --gst-plugin-path=../../src/.libs videotestsrc num-buffers=100 ! myfilter src=1 ! x264enc tune=zerolatency speed-preset=fast ! rtph264pay ! udpsink port=3000 host=127.0.0.1
Receiver / client:
GST_DEBUG="*:2" gst-launch-0.10 -v --gst-plugin-path=../../src/.libs udpsrc port=3000 ! "application/x-rtp, media=(string)video, encoding-name=(string)H264, payload=(int)96" ! gstrtpjitterbuffer do-lost=true ! rtph264depay ! ffdec_h264 ! myfilter src=0 ! ffmpegcolorspace ! ximagesink
Obviously, in the testing implementation I am not going to be using localhost (127.0.0.1)!
I use the --gst-plugin-path option because my timing filter is not installed system-wide.
The project requires latency that is as small as possible, ideally 100 ms or less. Now, with some numbers, I can start fine-tuning the required parameters to minimise the latency.

I have done this before by writing a simple application that renders sequential numbers (say, mod 60) and displays them on the screen. Then you can point your camera at the monitor, have one of your client machines render that stream to a second monitor, take a picture with your phone, and look at the two numbers to compute your latency.

The latency-clock project has been brought to my attention, and I think it provides a much better solution!
It embeds a binary representation of the current time into the image buffer, and extracts that binary image on decode.
Obviously the system clocks must be synchronised!

Related

Fixing Real Time Audio with PortAudio in Windows 10

I created an application a couple of years ago that allowed me to process audio by downmixing 6-channel or 8-channel (a.k.a. 5.1 or 7.1) audio to matrix-encoded stereo. For that purpose I used the portaudio library, with great results. This is an example of the open-stream call and the callback that downmixes a 7.1 signal:
Pa_OpenStream(&Flujo, &inputParameters, &outParameters, SAMPLE_RATE, 1, paClipOff, ptrFunction, NULL);
Notice the framesPerBuffer value of just one (1). This is my callback function:
int downmixed8channels(const void *input, void *output, unsigned long framesPerBuffer,
                       const PaStreamCallbackTimeInfo *info, PaStreamCallbackFlags state,
                       void *userData)
{
    (void)userData;
    (void)info;
    (void)state;
    (void)framesPerBuffer;
    float *ptrInput = (float*)input;
    float *ptrOutput = (float*)output;
    /* This is a struct to identify samples */
    AudioSamples->L = ptrInput[0];
    AudioSamples->R = ptrInput[1];
    AudioSamples->C = ptrInput[2];
    AudioSamples->LFE = ptrInput[3];
    AudioSamples->RL = ptrInput[4];
    AudioSamples->RR = ptrInput[5];
    AudioSamples->SL = ptrInput[6];
    AudioSamples->SR = ptrInput[7];
    /* "8channels" is not a valid C++ identifier; the real method name is
     * assumed to be something like this */
    Encoder->Encode8Channels(AudioSamples->L,
                             AudioSamples->R,
                             AudioSamples->C,
                             AudioSamples->LFE,
                             AudioSamples->SL,
                             AudioSamples->SR,
                             AudioSamples->RL,
                             AudioSamples->RR);
    ptrOutput[0] = Encoder->getLT();
    ptrOutput[1] = Encoder->getRT();
    return paContinue;
}
As you can see, the order set by the index in the output and input buffers corresponds to a discrete channel;
in the case of the output, 0 = left channel, 1 = right channel. This used to work well until Windows 10 2004: since I updated my system to this new version, my audio glitches and I get artifacts like those shown.
Those are captures of the sound from the channel-test window under the audio device panel of Windows. From the images it is clear my program is dropping frames, so my first attempt to solve this was to use a buffer larger than one frame to hold samples, process them, and send them; the reason I did not use a buffer size larger than one in the first place was that the program would drop frames.
But before implementing that, I did a proof of concept that included no audio processing at all, just a simple pass of data from input to output. For that I set the output channelCount parameter to 8, just like the input, resulting in something as simple as this:
for (int i = 0; i < framesPerBuffer /*1000*/; i++)
{
    ptrOutput[i] = ptrInput[i];  /* was "ptrOutput[i] = ptrOutput[i]", a self-assignment */
}
But the program was still dropping samples.
Next I used two callbacks: one to write into a buffer, and a second one to read it and send it to the output.
(void)info;
(void)userData;
(void)state;
(void)output;
float* ptrInput = (float*)input;
for (int i = 0; i < FRAME_SIZE; i++)
{
    buffer_input[i] = ptrInput[i];
}
return paContinue;
Callback to store.
(void)info;
(void)userData;
(void)state;
float* ptrOutput = (float*)output;
int w = 0;
for (int i = 0; i < FRAME_SIZE; )
{
    /* de-interleave one 6-sample frame; i advances inside the loop */
    AudioSamples->L   = buffer_input[i++];
    AudioSamples->R   = buffer_input[i++];
    AudioSamples->C   = buffer_input[i++];
    AudioSamples->LFE = buffer_input[i++];
    AudioSamples->SL  = buffer_input[i++];
    AudioSamples->SR  = buffer_input[i++];
    Encoder->Encoder(AudioSamples->L, AudioSamples->R, AudioSamples->C, AudioSamples->LFE,
                     AudioSamples->SL, AudioSamples->SR);
    bufferTransformed[w++] = Encoder->getLT();
    bufferTransformed[w++] = Encoder->getRT();
}
for (int i = 0; i < FRAME_REDUCED; i++)
{
    ptrOutput[i] = bufferTransformed[i];
}
return paContinue;
Callback for processing
The processing callback uses a reduced frames-per-buffer count, since two channels are fewer than eight; in portaudio a frame is composed of one sample for each audio channel.
This also did not work, and the first problem is how to synchronise the two callbacks. After all of this, what recommendation or advice can you give me to solve this issue?
Notes: the sample rate must be the same for both devices (I implemented logic in the program to enforce this), and the bit depth is also the same; I am using paFloat32.
The portaudio build is the modified one used by Audacity, since I wanted to use their implementation of WASAPI loopback.
Thanks very much in advance!
At the end of the day I did not have to change my callback functions in any way. What solved it was increasing the .suggestedLatency parameter of the input and output parameters to 1.0; even the devices' defaultLowOutputLatency or defaultHighOutputLatency values were causing too much glitching. I tested until I found that 1.0 was the sweet spot; higher values did not seem to improve anything.
TL;DR: Increase suggestedLatency until the glitching is gone.

Realtime streaming with QAudioOutput

I am working on a C++ project to read/process/play raw audio from a microphone array system, with its own C++ API. I am using Qt to program the software.
From this post about Real Time Streaming With QAudioOutput (Qt), I wanted to follow up and ask for advice about what to do if the raw audio data comes from a function call that takes about 1000 ms (1 s) to complete. How would I still be able to achieve real-time audio playback?
It takes about a second to process because I had read that, when writing to the QIODevice returned by QAudioOutput::start(), it is advisable to write a period's worth of bytes to prevent buffer underrun/overrun. http://cell0907.blogspot.sg/2012/10/qt-audio-output.html
I have set up a QByteArray and QDataStream to stream the data received from the function call.
The API is CcmXXX()
Reading the data from the microphone array returns an array of 32-bit integers.
Of the 32 bits, 24 bits are sample resolution and the 8 LSBs are zero padding.
The data comes in block sizes (set at 1024 samples) x 40 microphones.
Each chunk writes about one block, until the number of bytes written approaches the period size / the number of free bytes.
Tested: connected my slot to a notify() interval of about 50 ms to write one period's worth of bytes, with the QByteArray used in circular-buffer style, and a mutex lock/unlock added at the read/write portions.
Result: very short split-millisecond bursts of actual audio played, with lots of jittering and missing sound.
Please do offer feedback on how I could improve my code.
Setting up QAudioFormat
void MainWindow::init_audio_format(){
    m_format.setSampleRate(48000); //(8000, 11025, 16000, 22050, 32000, 44100, 48000, 88200, 96000, 192000)
    m_format.setByteOrder(QAudioFormat::LittleEndian);
    m_format.setChannelCount(1);
    m_format.setCodec("audio/pcm");
    m_format.setSampleSize(32); //(8, 16, 24, 32, 48, 64)
    m_format.setSampleType(QAudioFormat::SignedInt); //(SignedInt, UnSignedInt, Float)
    m_device = QAudioDeviceInfo::defaultOutputDevice();
    QAudioDeviceInfo info(m_device);
    if (!info.isFormatSupported(m_format)) {
        qWarning() << "Raw audio format not supported by backend, cannot play audio.";
        return;
    }
}
Initialising audio and the QByteArray/QDataStream
void MainWindow::init_audio_output(){
    m_bytearray.resize(65536);
    mstream = new QDataStream(&m_bytearray, QIODevice::ReadWrite);
    mstream->setByteOrder(QDataStream::LittleEndian);
    audio = new QAudioOutput(m_device, m_format, this);
    audio->setBufferSize(131072);
    audio->setNotifyInterval(50);
    m_audiodevice = audio->start();
    connect(audio, SIGNAL(notify()), this, SLOT(slot_writedata()));
    read_frames();
}
Slot:
void MainWindow::slot_writedata(){
    // NB: a function-local QMutex protects nothing, since each call locks
    // its own object; m_mutex here is assumed to be a class member.
    m_mutex.lock();
    read_frames();
    m_mutex.unlock();
}
To read the frames:
void MainWindow::read_frames(){
    qint32* buffer;
    int frameSize, byteCount=0;
    DWORD tdFrames, fdFrames;
    qint32 q32value;
    frameSize = 40 * mBlockSize; // 40 mics
    buffer = new qint32[frameSize]; // was "new int[frameSize]"
    int periodBytes = audio->periodSize();
    int freeBytes = audio->bytesFree();
    int chunks = qMin(periodBytes/mBlockSize, freeBytes/mBlockSize);
    CcmStartInput();
    while(chunks){
        CcmReadFrames(buffer,NULL,frameSize,0,&tdFrames,&fdFrames,NULL,CCM_WAIT);
        if(tdFrames==0){
            break;
        }
        int diffBytes = periodBytes - byteCount;
        if(diffBytes >= (int)sizeof(q32value)*mBlockSize){
            for(int x=0; x<mBlockSize; x++){
                q32value = (quint32)buffer[x]/256; // drop the 8 zero-padded LSBs
                *mstream << q32value;              // was "(qint32)fvalue", which wrote only zeros
                byteCount += sizeof(q32value);
            }
        }
        else{
            for(int x=0; x<(diffBytes/(int)sizeof(q32value)); x++){
                q32value = (quint32)buffer[x]/256;
                *mstream << q32value;
                byteCount += sizeof(q32value);
            }
        }
        --chunks;
    }
    CcmStopInput();
    delete[] buffer; // was leaked on every call
    mPosEnd = mPos + byteCount;
    write_frames();
    mPos += byteCount;
    if(mPos >= m_bytearray.length()){
        mPos = 0;
        mstream->device()->seek(0); // move the stream back to the start of the bytearray
    }
}
To write the frames:
void MainWindow::write_frames()
{
    int len = m_bytearray.length() - mPos;
    int bytesWritten = mPosEnd - mPos;
    if(len >= audio->periodSize()){
        m_audiodevice->write(m_bytearray.data()+mPos, bytesWritten);
    }
    else{
        w_data.replace(0, qAbs(len), m_bytearray.data()+mPos);
        w_data.replace(qAbs(len), audio->periodSize()-qAbs(len), m_bytearray.data());
        m_audiodevice->write(w_data.data(), audio->periodSize());
    }
}
Audio support in Qt is actually quite rudimentary. The goal is to have media playback at the lowest possible implementation and maintenance cost. The situation is especially bad on Windows, where I think the ancient MME API is still employed for audio playback.
As a result, the Qt audio API is very far from realtime, making it particularly ill-suited for such applications. I recommend using portaudio or rtaudio, which you can still wrap in Qt style IO devices if you will. This will give you access to better performing platform audio APIs and much better playback performance at very low latency.

Manage the playback speed and position of an MP3

I've been trying for several months to figure out how this works. I have a program that I'm developing: an mp3 file goes in, and out comes PCM that goes to ALSA for playback. I use the mpg123 library, where the main code is this:
while (mpg123_read(mh, buffer, buffer_size, &done) == MPG123_OK)
    sendoutput(dev, buffer, done);
Now, my attempts have been based on using the avutil/avcodec libraries on the buffer to reduce/increase the number of samples per second. The result is awful and isn't audible. In a previous question someone advised me to increase my PC's performance, but if a simple program like VLC can do this on old computers, why can't I?
And for the problem of position (seeking) in the audio file, how can I achieve this?
Edit
I'll add some pieces of code to try to explain.
SampleConversion.c
#define LENGTH_MS 1000  // how many milliseconds of speech to store; 0.5 s : x = 1 : 44100, so x = 22050 samples to store
#define RATE 44100      // the sampling rate (input)
struct AVResampleContext* audio_cntx = 0;
//(LENGTH_MS*RATE*16*CHANNELS)/8000
void inizializeResample(int inRate, int outRate)
{
    audio_cntx = av_resample_init( outRate, //out rate
        inRate, //in rate
        16,     //filter length
        10,     //phase count
        0,      //linear FIR filter
        0.8 );  //cutoff frequency
    assert( audio_cntx && "Failed to create resampling context!");
}
void resample(char dataIn[], char dataOut[], int nsamples)
{
    int samples_consumed;
    int samples_output = av_resample( audio_cntx, //resample context
        (short*)dataOut,   //buffout
        (short*)dataIn,    //buffin
        &samples_consumed, //&consumed
        nsamples,          //nb_samples
        sizeof(dataOut)/2, //lenout sizeof(out_buffer)/2 (Right?)
        0);                //is_last
    assert( samples_output > 0 && "Error calling av_resample()!" );
}
void endResample()
{
    av_resample_close( audio_cntx );
}
My edited play function (Mpg123.c)
if (isPaused==0 && mpg123_read(mh, buffer, buffer_size, &done) == MPG123_OK)
{
    int i=0;
    char * resBuffer = malloc(sizeof(buffer));
    //resBuffer = &buffer[0];
    resample(buffer, resBuffer, 44100);
    if((ao_play(dev, (char*)resBuffer, done)==0)){
        return 1;
    }
}
Both pieces of code are my own, so I cannot ask anybody who ever suggested improvements, as in the previous question (although I do not know if they are right, sigh).
Edit2: Updated with changes
In the call to av_resample, samples_consumed is never read, so any unconsumed frames are skipped.
Furthermore, nsamples is the constant value 44100 instead of the actual number of frames read (done from mpg123_read).
sizeof(dataOut) is wrong; it's the size of a pointer.
is_last is wrong at the end of the input.
In the play function, sizeof(buffer) is likely to be wrong, depending on the definition of buffer.

Why does the output of MP3 decoding sound so delayed? (with the ffmpeg mp3lame lib)

I'm recording sound and encoding it to MP3 with the ffmpeg library, then decoding the MP3 data right away and playing the decoded data, but it sounds very delayed.
Here are the codes:
The encode function's first parameter accepts the raw PCM data; len = 44100.
Encode parameters:
cntx_->channels = 1;
cntx_->sample_rate = 44100;
cntx_->sample_fmt = 6; // 6 == AV_SAMPLE_FMT_S16P
cntx_->channel_layout = AV_CH_LAYOUT_MONO;
cntx_->bit_rate = 8000;
err_ = avcodec_open2(cntx_, codec_, NULL);
vector<unsigned char> encode(unsigned char* encode_data, unsigned int len)
{
    vector<unsigned char> ret;
    AVPacket avpkt;
    av_init_packet(&avpkt);
    unsigned int len_encoded = 0;
    int data_left = len / 2;
    int miss_c = 0, i = 0;
    while (data_left > 0)
    {
        int sz = data_left > cntx_->frame_size ? cntx_->frame_size : data_left;
        mp3_frame_->nb_samples = sz;
        mp3_frame_->format = cntx_->sample_fmt;
        mp3_frame_->channel_layout = cntx_->channel_layout;
        int needed_size = av_samples_get_buffer_size(NULL, 1,
            mp3_frame_->nb_samples, cntx_->sample_fmt, 1);
        int r = avcodec_fill_audio_frame(mp3_frame_, 1, cntx_->sample_fmt,
            encode_data + len_encoded, needed_size, 0);
        int gotted = -1;
        r = avcodec_encode_audio2(cntx_, &avpkt, mp3_frame_, &gotted);
        if (gotted){
            i++;
            ret.insert(ret.end(), avpkt.data, avpkt.data + avpkt.size);
        }
        else if (gotted == 0){
            miss_c++;
        }
        len_encoded += needed_size;
        data_left -= sz;
        av_free_packet(&avpkt);
    }
    return ret;
}
std::vector<unsigned char> decode(unsigned char* data, unsigned int len)
{
    std::vector<unsigned char> ret;
    AVPacket avpkt;
    av_init_packet(&avpkt);
    avpkt.data = data;
    avpkt.size = len;
    AVFrame* pframe = av_frame_alloc();
    while (avpkt.size > 0){
        int goted = -1;
        av_frame_unref(pframe);
        int used = avcodec_decode_audio4(cntx_, pframe, &goted, &avpkt);
        if (goted){
            ret.insert(ret.end(), pframe->data[0], pframe->data[0] + pframe->linesize[0]);
            avpkt.data += used;
            avpkt.size -= used;
            avpkt.dts = avpkt.pts = AV_NOPTS_VALUE;
        }
        else if (goted == 0){
            avpkt.data += used;
            avpkt.size -= used;
            avpkt.dts = avpkt.pts = AV_NOPTS_VALUE;
        }
        else if (goted < 0){
            break;
        }
    }
    av_frame_free(&pframe);
    return ret;
}
Suppose it's the 100th call to encode(data, len); that "frame" would appear only around the 150th or later call to decode, and that latency is not acceptable. It seems the mp3lame encoder keeps sample data back for later use, which is not what I want.
I don't know what is going wrong. Thank you for any information.
Today I debugged the code again and can post some detail:
encode: each PCM sample frame has len = 23040, which is 10 times the MP3 frame size, yet each call to encode outputs only 9 frames. This causes decode to output 20736 samples; 1 frame (2304 bytes) is lost, and the sound is noisy.
If MP3 or MP2 encoding is not suitable for real-time voice transfer, which encoder should I choose?
"Suppose it's the 100th call to encode(data, len); that frame would appear only around the 150th or later call to decode; the latency is not acceptable."
Understand how the codec works and adjust your expectations accordingly.
MP3 is a lossy codec. It works by converting your time-domain PCM data to the frequency domain. This conversion alone requires time (because frequency components do not exist in any instant of time... they can only exist over a period of time). At a simple level, it then uses a handful of algorithms to determine what spectral information to keep, and what to throw away. Each MP3 frame is hundreds of samples long in duration: 576 is as low as you can typically go, and twice that is typical; at 44.1 kHz, a 1152-sample frame already spans 1152 / 44100 ≈ 26 ms, before any other buffering.
Now that you have the minimum time for creating frames, MP3 also uses what is called a bit reservoir. If a complex passage requires more bandwidth, it borrows unused bandwidth from neighboring frames. To facilitate this, a buffer of many frames is required.
On top of all the codec work, FFmpeg itself has buffering (for detection of input and what not), and there are buffers in your pipes to and from FFmpeg. I would imagine the codec itself may also employ general buffering on the input and output.
Finally, you're decoding the stream and playing it back, which means that most of the same kinds of buffers used in encoding are now used for decoding. And, we're not even talking about the several hundred milliseconds of latency you have for getting the audio data through a sound card and out the analog to your speaker.
You have an unrealistic expectation, and while it is possible to tweak some things to reduce latency (such as disabling the bit reservoir), it will result in a poor quality stream and will not be of low latency anyway.

Generating Sounds at Runtime with C++

So I'm picking up C++ after a long hiatus, and I had the idea to create a program which can generate music based upon strings of numbers at runtime (I was inspired by the composition of Pi done by some people), with the eventual goal being some sort of procedural music generation software.
So far I have been able to make a really primitive version of this with the Beep() function and just feeding through the first so and so digits of Pi as a test. Works like a charm.
What I'm looking for now is how I could kick it up a notch and get some higher-quality sound (because Beep() is literally the most primitive sound... ever), and I realized I have absolutely no idea how to do this. What I need is either a library or some sort of API that can:
1) Generate sound without pre-existing file. I want the result to be 100% generated by code and not rely on any samples, optimally.
2) If I could get something going that would be capable of playing multiple sounds at a time, like be able to play chords or a melody with a beat, that would be nice.
3) and If I could in any way control the wave it plays (kinda like chiptune mixers can) via equation or some other sort of data, that'd be super helpful.
I don't know if this is a weird request or I just researched it using the wrong terms, but I just wasn't able to find anything along these lines or at least nothing that was well documented at all. :/
If anyone can help, I'd really appreciate it.
EDIT: Also, apparently I'm just super not used to asking stuff on forums, my target platform is Windows (7, specifically, although I wouldn't think that matters).
I use portaudio (http://www.portaudio.com/). It will let you create PCM streams in a portable way. Then you just push the samples into the stream, and they will play.
Edit: using PortAudio is pretty easy. You initialize the library. I use floating-point samples to make it super easy. I do it like this:
PaError err = Pa_Initialize();
if ( err != paNoError )
    return false;

mPaParams.device = Pa_GetDefaultOutputDevice();
if ( mPaParams.device == paNoDevice )
    return false;

mPaParams.channelCount = NUM_CHANNELS;
mPaParams.sampleFormat = paFloat32;
mPaParams.suggestedLatency =
    Pa_GetDeviceInfo( mPaParams.device )->defaultLowOutputLatency;
mPaParams.hostApiSpecificStreamInfo = NULL;
Then later, when you want to play sounds, you create a stream: 2 channels for stereo, at 44.1 kHz, good for mp3 audio:
PaError err = Pa_OpenStream( &mPaStream,
                             NULL,        // no input
                             &mPaParams,
                             44100,       // sample rate
                             NUM_FRAMES,  // frames per buffer
                             0,           // stream flags
                             sndCallback,
                             this );
Then you implement the callback to fill the PCM audio stream. The callback is a C function, but I just call through to my C++ class to handle the audio. I ripped this from my code, and it may not be 100% correct now, as I removed a ton of stuff you won't care about, but it works kind of like this:
static int sndCallback( const void* inputBuffer,
                        void* outputBuffer,
                        unsigned long framesPerBuffer,
                        const PaStreamCallbackTimeInfo* timeInfo,
                        PaStreamCallbackFlags statusFlags,
                        void* userData )
{
    Snd* snd = (Snd*)userData;
    return snd->callback( (float*)outputBuffer, framesPerBuffer );
}

u32 Snd::callback( float* outbuf, u32 nFrames )
{
    mPlayMutex.lock(); // use mutexes because this is async code!

    // clear the output buffer
    memset( outbuf, 0, nFrames * NUM_CHANNELS * sizeof( float ));

    // mix all the sounds.
    if ( mChannels.size() )
    {
        // I have multiple audio sources I'm mixing. That's what mChannels is.
        for ( s32 i = mChannels.size(); i > 0; i-- )
        {
            for ( u32 j = 0; j < nFrames * NUM_CHANNELS; j++ )
            {
                float f = outbuf[j] + getNextSample( i ); // <------- your code here!!!
                if ( f > 1.0f ) f = 1.0f;   // clamp it so you don't get clipping.
                if ( f < -1.0f ) f = -1.0f;
                outbuf[j] = f;
            }
        }
    }

    mPlayMutex.unlock();
    return paContinue; // return paComplete when you are done playing audio.
}
I answered a very similar question earlier this week: Note Synthesis, Harmonics (Violin, Piano, Guitar, Bass), Frequencies, MIDI. In your case, if you don't want to rely on samples, then the wavetable method is out. So your simplest option would be to dynamically vary the frequency and amplitude of sinusoids over time, which is easy but will sound pretty terrible (like a cheap theremin). Your only real option would be a more sophisticated synthesis algorithm, such as one of the physical modelling ones (e.g. Karplus-Strong). That would be an interesting project, but be warned that it does require something of a mathematical background.
You can indeed use something like Portaudio, as Rafael has mentioned, to physically get the sound out of the PC; in fact I think Portaudio is the best option for that. But generating data that sounds musical is by far your biggest challenge.