Realtime streaming with QAudioOutput - c++

I am working on a C++ project to read/process/play raw audio from a microphone array system, with its own C++ API. I am using Qt to program the software.
From this post about Real Time Streaming With QAudioOutput (Qt), I wanted to follow up and ask for advice about what to do if the Raw Audio Data comes from a function call that takes about 1000ms (1 sec) to process? How would I still be able to achieve the real time audio playback.
It takes about about a second to process because I had read that when writing to QIODevice::QAudioFormat->start(); it is advisable to use a period's worth of bytes to prevent buffer underrun / overrun. http://cell0907.blogspot.sg/2012/10/qt-audio-output.html
I have set up a QByteArray and QDataStream to stream the data received from the function call.
The API is CcmXXX()
Reading the data from the microphone array returns an array of 32 bit integers
Of the 32 bit integers, 24 bits resolution, 8 bits LSB padded zeros.
It comes in block sizes (set at 1024 samples) x 40 microphones
Each chunk writes about one block, till the number of bytes written reaches close to the period size / free amount of bytes.
Tested: Connected my slots to a notify of about 50ms, to write one period worth of bytes. QByteArray in circular buffer style. Added a mutex lock/unlock at the read/write portions.
Result: Very short split ms of actual audio played, lots of jittering and non-recorded sounds.
Please do offer feedback on how I could improve my code.
Setting up QAudioFormat
void MainWindow::init_audio_format(){
m_format.setSampleRate(48000); //(8000, 11025, 16000, 22050, 32000, 44100, 48000, 88200, 96000, 192000
m_format.setByteOrder(QAudioFormat::LittleEndian);
m_format.setChannelCount(1);
m_format.setCodec("audio/pcm");
m_format.setSampleSize(32); //(8, 16, 24, 32, 48, 64)
m_format.setSampleType(QAudioFormat::SignedInt); //(SignedInt, UnSignedInt, Float)
m_device = QAudioDeviceInfo::defaultOutputDevice();
QAudioDeviceInfo info(m_device);
if (!info.isFormatSupported(m_format)) {
qWarning() << "Raw audio format not supported by backend, cannot play audio.";
return;
}
}
Initialising Audio and QByteArray/Datastream
void MainWindow::init_audio_output(){
m_bytearray.resize(65536);
mstream = new QDataStream(&m_bytearray,QIODevice::ReadWrite);
mstream->setByteOrder(QDataStream::LittleEndian);
audio = new QAudioOutput(m_device,m_format,this);
audio->setBufferSize(131072);
audio->setNotifyInterval(50);
m_audiodevice = audio->start();
connect(audio,SIGNAL(notify()),this,SLOT(slot_writedata()));
read_frames();
}
Slot:
void MainWindow::slot_writedata(){
QMutex mutex;
mutex.lock();
read_frames();
mutex.unlock();
}
To read the frames:
void MainWindow::read_frames(){
qint32* buffer;
int frameSize, byteCount=0;
DWORD tdFrames, fdFrames;
float fvalue = 0;
qint32 q32value;
frameSize = 40 * mBlockSize; //40 mics
buffer = new int[frameSize];
int periodBytes = audio->periodSize();
int freeBytes = audio->bytesFree();
int chunks = qMin(periodBytes/mBlockSize,freeBytes/mBlockSize);
CcmStartInput();
while(chunks){
CcmReadFrames(buffer,NULL,frameSize,0,&tdFrames,&fdFrames,NULL,CCM_WAIT);
if(tdFrames==0){
break;
}
int diffBytes = periodBytes - byteCount;
if(diffBytes>=(int)sizeof(q32value)*mBlockSize){
for(int x=0;x<mBlockSize;x++){
q32value = (quint32)buffer[x]/256;
*mstream << (qint32)fvalue;
byteCount+=sizeof(q32value);
}
}
else{
for(int x=0;x<(diffBytes/(int)sizeof(q32value));x++){
q32value = (quint32)buffer[x]/256;
*mstream << (qint32) fvalue;
byteCount+=sizeof(q32value);
}
}
--chunks;
}
CcmStopInput();
mPosEnd = mPos + byteCount;
write_frames();
mPos += byteCount;
if(mPos >= m_bytearray.length()){
mPos = 0;
mstream->device()->seek(0); //change mstream pointer back to bytearray start
}
}
To write the frames:
void MainWindow::write_frames()
{
int len = m_bytearray.length() - mPos;
int bytesWritten = mPosEnd - mPos;
if(len>=audio->periodSize()){
m_audiodevice->write(m_bytearray.data()+mPos, bytesWritten);
}
else{
w_data.replace(0,qAbs(len),m_bytearray.data()+mPos);
w_data.replace(qAbs(len),audio->periodSize()-abs(len),m_bytearray.data());
m_audiodevice->write(w_data.data(),audio->periodSize());
}
}

Audio support in Qt is actually quite rudimentary. The goal is to have media playback at the lowest possible implementation and maintenance cost. The situation is especially bad on windows, where I think the ancient MME API is still employed for audio playback.
As a result, the Qt audio API is very far from realtime, making it particularly ill-suited for such applications. I recommend using portaudio or rtaudio, which you can still wrap in Qt style IO devices if you will. This will give you access to better performing platform audio APIs and much better playback performance at very low latency.

Related

How to concurrently write to a file in c++(in other words, whats the fastest way to write to a file)

I'm building a graphics engine, and I need to write te result image to a .bmp file. I'm storing the pixels in a vector<Color>. While also saving the width and the heigth of the image. Currently I'm writing the image as follows(I didn't write this code myself):
std::ostream &img::operator<<(std::ostream &out, EasyImage const &image) {
//temporaryily enable exceptions on output stream
enable_exceptions(out, std::ios::badbit | std::ios::failbit);
//declare some struct-vars we're going to need:
bmpfile_magic magic;
bmpfile_header file_header;
bmp_header header;
uint8_t padding[] =
{0, 0, 0, 0};
//calculate the total size of the pixel data
unsigned int line_width = image.get_width() * 3; //3 bytes per pixel
unsigned int line_padding = 0;
if (line_width % 4 != 0) {
line_padding = 4 - (line_width % 4);
}
//lines must be aligned to a multiple of 4 bytes
line_width += line_padding;
unsigned int pixel_size = image.get_height() * line_width;
//start filling the headers
magic.magic[0] = 'B';
magic.magic[1] = 'M';
file_header.file_size = to_little_endian(pixel_size + sizeof(file_header) + sizeof(header) + sizeof(magic));
file_header.bmp_offset = to_little_endian(sizeof(file_header) + sizeof(header) + sizeof(magic));
file_header.reserved_1 = 0;
file_header.reserved_2 = 0;
header.header_size = to_little_endian(sizeof(header));
header.width = to_little_endian(image.get_width());
header.height = to_little_endian(image.get_height());
header.nplanes = to_little_endian(1);
header.bits_per_pixel = to_little_endian(24);//3bytes or 24 bits per pixel
header.compress_type = 0; //no compression
header.pixel_size = pixel_size;
header.hres = to_little_endian(11811); //11811 pixels/meter or 300dpi
header.vres = to_little_endian(11811); //11811 pixels/meter or 300dpi
header.ncolors = 0; //no color palette
header.nimpcolors = 0;//no important colors
//okay that should be all the header stuff: let's write it to the stream
out.write((char *) &magic, sizeof(magic));
out.write((char *) &file_header, sizeof(file_header));
out.write((char *) &header, sizeof(header));
//okay let's write the pixels themselves:
//they are arranged left->right, bottom->top, b,g,r
// this is the main bottleneck
for (unsigned int i = 0; i < image.get_height(); i++) {
//loop over all lines
for (unsigned int j = 0; j < image.get_width(); j++) {
//loop over all pixels in a line
//we cast &color to char*. since the color fields are ordered blue,green,red they should be written automatically
//in the right order
out.write((char *) &image(j, i), 3 * sizeof(uint8_t));
}
if (line_padding > 0)
out.write((char *) padding, line_padding);
}
//okay we should be done
return out;
}
As you can see, the pixels are being written one by one. This is quite slow, I put some timers in my program, and found that the writing was my main bottleneck.
I tried to write entire (horizontal) lines, but I did not find how to do it(best I found was this.
Secondly, I wanted to write to the file using multithreading(not sure if I need to use threading or processing). using openMP. But that means I need to specify which byte address to write to, I think, which I couldn't solve.
Latstly, I thought about immidiatly writing to the file whenever I drew an object, but then I had the same issue with writing to specific locations in the file.
So, my question is: what's the best(fastest) way to tackle this problem. (Compiling this for windows and linux)
The fastest method to write to a file is to use hardware assist. Write your output to memory (a.k.a. buffer), then tell the hardware device to transfer from memory to the file (disk).
The next fastest method is to write all the data to a buffer then block write the data to the file. If you want other tasks or threads to execute during your writing, then create a thread that writes the buffer to the file.
When writing to a file, the more data per transaction, the more efficient the write will be. For example, 1 write of 1024 bytes is faster than 1024 writes of one byte.
The idea is to keep the data streaming. Slowing down the transfer rate may be faster than a burst write, delay, burst write, delay, etc.
Remember that the disk is essentially a serial device (unless you have a special hard drive). Bits are laid down on the platters using a bit stream. Writing data in parallel will have adverse effects because the head will have to be moved between the parallel activities.
Remember that if you use more than one core, there will be more traffic on the data bus. The transfer to the file will have to pause while other threads/tasks are using the data bus. So, if you can, block all tasks, then transfer your data. :-)
I've written programs that copy from slow memory to fast memory, then transferred from fast memory to the hard drive. That was also using interrupts (threads).
Summary
Fast writing to a file involves:
Keep the data streaming; minimize the pauses.
Write in binary mode (no translations, please).
Write in blocks (format into memory as necessary before writing the block).
Maximize the data in a transaction.
Use separate writing thread, if you want other tasks running "concurrently".
The hard drive is a serial device, not parallel. Bits are written to the platters in a serial stream.

Fixing Real Time Audio with PortAudio in Windows 10

I created an application a couple of years ago that allowed me to process audio by downmixing a 6 channel or 8 channel a.k.a 5.1 as 7.1 as matrixed stereo encoded for that purpose I used the portaudio library with great results this is an example of the open stream function and callback to downmix a 7.1 signal
Pa_OpenStream(&Flujo, &inputParameters, &outParameters, SAMPLE_RATE, 1, paClipOff, ptrFunction, NULL);
Notice the use of framesPerBuffer value of just one (1), this is my callback function
int downmixed8channels(const void *input, void *output, unsigned long framesPerBuffer, const PaStreamCallbackTimeInfo * info, PaStreamCallbackFlags state, void * userData)
{
(void)userData;
(void)info;
(void)state;
(void)framesBuffer;
float *ptrInput = (float*)input;
float *ptrOutput = (float*)ouput;
/*This is a struct to identify samples*/
AudioSamples->L = ptrInput[0];
AudioSamples->R = ptrInput[1];
AudioSamples->C = ptrInput[2];
AudioSamples->LFE = ptrInput[3];
AudioSamples->RL = ptrInput[4];
AudioSamples->RR = ptrInput[5];
AudioSamples->SL = ptrInput[6];
AudioSamples->SR = ptrInput[7];
Encoder->8channels(AudioSamples->L,
AudioSamples->R,
AudioSamples->C,
AudioSamples->LFE,
MuestrasdeAudio->SL,
MuestrasdeAudio->SR,
MuestrasdeAudio->RL,
MuestrasdeAudio->RR,);
ptrOutput[0] = Encoder->gtLT();
ptrOutput[1] = Encoder->gtRT();
return paContinue;
}
As you can see the order set by the index in the output and input buffer correspond to a discrete channel
in the case of the output 0 = Left channel, 1 = right Channel. This used to work well, until entered Windows 10 2004, since I updated my system to this new version my audio glitch and I get artifacts like those
Those are captures from the sound of the channel test window under the audio device panel of windows. By the images is clear my program is dropping frames, so the first try to solve this was to use a larger buffer than one to hold samples process them and send then, the reason I did not use a buffer size larger than one in the first place was that the program would drop frames.
But before implementing a I did a proof of concept, would not include audio processing at all, of simple passing of data from input to ouput, for that I set the oputput channelCount parameters to 8 just like the input, resulting in something as simple as this.
for (int i = 0; i < FramesPerBuffer /*1000*/; i++)
{
ptrOutput[i] = ptrOutput[i];
}
but still the program is still dropping samples.
Next I used two callbacks one for writing to a buffer and a second one to read it and send it to output
(void)info;
(void)userData;
(void)state;
(void)output;
float* ptrInput = (float*)input;
for (int i = 0; i < FRAME_SIZE; i++)
{
buffer_input[i] = ptrInput[i];
}
return paContinue;
Callback to store.
(void)info;
(void)userData;
(void)state;
(void)output;
float* ptrOutput = (float*)output;
for (int i = 0; i < FRAME_SIZE; i++)
{
AudioSamples->L = (buffer_input[i] );
AudioSamples->R = (buffer_input[i++]);
AudioSamples->C = (buffer_input[i++] );
AudioSamples->LFE = (buffer_input[i++]);
AudioSamples->SL = (buffer_input[i++] );
AudioSamples->SR = (buffer_input[i++]);
Encoder->Encoder(AudioSamples->L, AudioSamples->R, AudioSamples->C, AudioSamples->LFE,
AudioSamples->SL, AudioSamples->SR);
bufferTransformed[w] = (Encoder->getLT() );
bufferTransformed[w++] = (Encoder->getRT() );
}
w = 0;
for (int i = 0; i < FRAME_REDUCED; i++)
{
ptrOutput[i] = buffer_Transformed[i];
}
return paContinue;
Callback for processing
The processing callback use a reduced frames per buffer since 2 channel is less than eight since it seems in portaudio a frame is composed of a sample for each audio channel.
This also did not work, since the first problem, is how to syncronize the two callback?, after all of this, what recommendation or advice, can you give me to solve this issue,
Notes: the samplerate must be same for both devices, I implemeted logic in the program to prevent this, the bitdepth is also the same I am using paFloat32,
.The portaudio is the modified one use by audacity, since I wanted to use their implementation of WASAPI
loopback
Thank very much in advance!.
At the end of the day it I did not have to change my callbacks functions in any way, what solved it, was changing or increasing the parameter ".suggestedLatency" of the input and output parameters, to 1.0, even the devices defaultLowOutputLatency or defaultHighOutputLatency values where causing to much glitching, I test it until 1.0 was de sweepspot, higher values did not seen to improve.
TL;DR Increased the suggestedLatency until the glitching is gone.

Realtime streaming with QAudioOutput (qt)

I want to play real-time sounds responding with no appreciable lag to user interaction.
To have low latency I have to send small chunks of pcm data.
What I am doing:
QAudioFormat format;
format.setSampleRate(22050);
format.setChannelCount(1);
format.setSampleSize(16);
format.setCodec("audio/pcm");
format.setByteOrder(QAudioFormat::LittleEndian);
format.setSampleType(QAudioFormat::SignedInt);
QAudioDeviceInfo info(QAudioDeviceInfo::defaultOutputDevice());
if (!info.isFormatSupported(format)) {
qWarning()<<"raw audio format not supported by backend, cannot play audio.";
return;
}
qAudioOutput = new QAudioOutput(format, NULL);
qAudioDevice=qAudioOutput->start();
and later
void Enqueue(TYPESAMPLEPCM *data,int countBytes){
while(qAudioOutput->bytesFree()<countBytes){
Sleep(1);
}
qAudioDevice->write((char *)data,countBytes);
}
The chunks of data are 256 bytes (128 samples that would give "granularity" of around 6 milliseconds.
Enqueue is called from a loop in a thread with high priority that provides the data chunks. There's no latency there as the speed it calls Enqueue is by far faster than rendering the audio data.
But it looks to me there's a buffer underrun situation because the sound plays but with kind of a "crackling" regular noise.
If I raise the chunk size to 256 samples the problem almost disappears. Only some crackling at the beginning (?)
The platform is Windows and Qt 5.3.
Is that the right procedure or I am missing something?
The issue is about
void Enqueue(TYPESAMPLEPCM *data,int countBytes){
while(qAudioOutput->bytesFree()<countBytes){
Sleep(1);
}
qAudioDevice->write((char *)data,countBytes);
}
being a little naive.
First of all, Sleep(1);. You are on windows. The problem is that windows is not a realtime os, and is expected to have a time resolution around 10 - 15 ms. Which means when there is no place for incoming audio you are going to sleep lot more than you expect.
Second. Do you really need to sleep when audio output cannot consume the amount of data which was provided? What you really want to is to provide some audio after the audio output has consumed some. In concrete terms, it means :
Setting the QAudioOutput notify interval, ie the period at which the system will consume the audio data and tell you about it.
Getting notified about QAudioOutput consuming some data. Which is in a slot connected to QAudioOutput::notify()
Buffering data chunks which come from your high priority thread when audio output is full.
This give :
QByteArray samplebuffer;
//init code
{
qAudioOutput = new QAudioOutput(format, NULL);
...
qAudioOutput->setNotifyInterval(128); //play with this number
connect(qAudioOutput, SIGNAL(notify), someobject, SLOT(OnAudioNotify));
...
qAudioDevice=qAudioOutput->start();
}
void EnqueueLock(TYPESAMPLEPCM *data,int countBytes)
{
//lock mutex
samplebuffer.append((char *)data,countBytes);
tryWritingSomeSampleData();
//unlock mutex
}
//slot
void SomeClass::OnAudioNotify()
{
//lock mutex
tryWritingSomeSampleData()
//unlock mutex
}
void SomeClass::tryWritingSomeSampleData()
{
int towritedevice = min(qAudioOutput->bytesFree(), samplebuffer.size());
if(towritedevice > 0)
{
qAudioDevice->write(samplebuffer.data(),towritedevice);
samplebuffer.remove(0,towritedevice); //pop front what is written
}
}
As you see you need to protect samplebuffer from concurrent access. Provide adequate mutex.

Manage the playback speed and position of an MP3

I'm trying for several months to figure out how it works. I have a program that I'm developing, I have an mp3 file in and out I have the pcm that goes to "alsa" for playback. Using the library mpg123 where the main code is this:
while (mpg123_read (mh, buffer, buffer_size, & done) == MPG123_OK)
sendoutput (dev, buffer, done);
Now, my attempts have been based on the use of library avutil/avcodec on the buffer for reducing/increase the number of samples in one second. The result is awful and isn't audibly. In a previous question someone advised me to increase my PC performance but if a simple program like "VLC" can do this on old computers why I can't?
And for the problem of position in the audio file how can I achieve this?
Edit
I Add some piece of code to try to explain.
SampleConversion.c
#define LENGTH_MS 1000 // how many milliseconds of speech to store 0,5s:x=1:44100 x=22050 sample da memorizzare
#define RATE 44100 // the sampling rate (input)
struct AVResampleContext* audio_cntx = 0;
//(LENGTH_MS*RATE*16*CHANNELS)/8000
void inizializeResample(int inRate, int outRate)
{
audio_cntx = av_resample_init( outRate, //out rate
inRate, //in rate
16, //filter length
10, //phase count
0, //linear FIR filter
0.8 ); //cutoff frequency
assert( audio_cntx && "Failed to create resampling context!");
}
void resample(char dataIn[],char dataOut[],int nsamples)
{
int samples_consumed;
int samples_output = av_resample( audio_cntx, //resample context
(short*)dataOut, //buffout
(short*)dataIn, //buffin
&samples_consumed, //&consumed
nsamples, //nb_samples
sizeof(dataOut)/2,//lenout sizeof(out_buffer)/2 (Right?)
0);//is_last
assert( samples_output > 0 && "Error calling av_resample()!" );
}
void endResample()
{
av_resample_close( audio_cntx );
}
My edited play function (Mpg123.c)
if (isPaused==0 && mpg123_read(mh, buffer, buffer_size, &done) == MPG123_OK)
{
int i=0; char * resBuffer=malloc(sizeof(buffer));
//resBuffer=&buffer[0];
resample(buffer,resBuffer,44100);
if((ao_play(dev, (char*)resBuffer, done)==0)){
return 1;
}
}
Both codes are made by me so I can not ask anybody ever suggested improvements as in the previous question (although I do not know if they are right, sigh)
Edit2: Updated with changes
In the call to av_resample, samples_consumed is never read, so any unconsumed frames are skipped.
Furthermore, nsamples is the constant value 44100 instead of the actual number of frames read (done from mpg123_read).
sizeof(dataOut) is wrong; it's the size of a pointer.
is_last is wrong at the end of the input.
In the play function, sizeof(buffer) is likely to be wrong, depending on the definition of buffer.

why the output of mp3 decode sounds so delayed?(with ffmpeg mp3lame lib)

i'm recording sound and encoding to mp3 with ffmpeg lib. then decode the mp3 data right away, play the decode data, but it's sounds so delayed.
here are the codes:
the function encode first parameter accepts the raw pcm data, len = 44100.
encode parameters:
cntx_->channels = 1;
cntx_->sample_rate = 44100;
cntx_->sample_fmt = 6;
cntx_->channel_layout = AV_CH_LAYOUT_MONO;
cntx_->bit_rate = 8000;
err_ = avcodec_open2(cntx_, codec_, NULL);
vector<unsigned char> encode(unsigned char* encode_data, unsigned int len)
{
vector<unsigned char> ret;
AVPacket avpkt;
av_init_packet(&avpkt);
unsigned int len_encoded = 0;
int data_left = len / 2;
int miss_c = 0, i = 0;
while (data_left > 0)
{
int sz = data_left > cntx_->frame_size ? cntx_->frame_size : data_left;
mp3_frame_->nb_samples = sz;
mp3_frame_->format = cntx_->sample_fmt;
mp3_frame_->channel_layout = cntx_->channel_layout;
int needed_size = av_samples_get_buffer_size(NULL, 1,
mp3_frame_->nb_samples, cntx_->sample_fmt, 1);
int r = avcodec_fill_audio_frame(mp3_frame_, 1, cntx_->sample_fmt, encode_data + len_encoded, needed_size, 0);
int gotted = -1;
r = avcodec_encode_audio2(cntx_, &avpkt, mp3_frame_, &gotted);
if (gotted){
i++;
ret.insert(ret.end(), avpkt.data, avpkt.data + avpkt.size);
}
else if (gotted == 0){
miss_c++;
}
len_encoded += needed_size;
data_left -= sz;
av_free_packet(&avpkt);
}
return ret;
}
std::vector<unsigned char> decode(unsigned char* data, unsigned int len)
{
std::vector<unsigned char> ret;
AVPacket avpkt;
av_init_packet(&avpkt);
avpkt.data = data;
avpkt.size = len;
AVFrame* pframe = av_frame_alloc();
while (avpkt.size > 0){
int goted = -1;av_frame_unref(pframe);
int used = avcodec_decode_audio4(cntx_, pframe, &goted, &avpkt);
if (goted){
ret.insert(ret.end(), pframe->data[0], pframe->data[0] + pframe->linesize[0]);
avpkt.data += used;
avpkt.size -= used;
avpkt.dts = avpkt.pts = AV_NOPTS_VALUE;
}
else if (goted == 0){
avpkt.data += used;
avpkt.size -= used;
avpkt.dts = avpkt.pts = AV_NOPTS_VALUE;
}
else if(goted < 0){
break;
}
}
av_frame_free(&pframe);
return ret;
}
Suppose it's the 100th call to encode(data, len), this "frame" would appear in 150th or later in the decode call, the latency is not acceptable. It seems the mp3lame encoder would keep the sample data for later use, but not my desire.
I don't know what is going wrong. Thank you for any information.
today i debug the code again and post some detail:
encode: each pcm sample frame len = 23040 ,which is 10 times of mp3 frame size, each time call encode only output 9 frames, this output cause decode output 20736 samples, 1 frame(2304 bytes) is lost, and the sound is noisy.
if the mp3 or mp2 encode is not suitable for real time voice transfer, which encoder should i choose?
Suppose it's the 100th call to encode(data, len), this "frame" would appear in 150th or later in the decode call, the latency is not acceptable.
Understand how the codec works and adjust your expectations accordingly.
MP3 is a lossy codec. It works by converting your time domain PCM data to the frequency domain. This conversion alone requires time (because frequency components do not exist in any instant of time... they can only exist over a period of time). At a simple level, it then uses a handful of algorithms to determine what spectral information to keep, and what to throw away. Each MP3 frame is hundreds of samples long in duration. (576 is as low as you can typically go. Twice that is a typical number.)
Now that you have the minimum time for creating frames, MP3 also uses what is called a bit reservoir. If a complex passage requires more bandwidth, it borrows unused bandwidth from neighboring frames. To facilitate this, a buffer of many frames is required.
On top of all the codec work, FFmpeg itself has buffering (for detection of input and what not), and there are buffers in your pipes to and from FFmpeg. I would imagine the codec itself may also employ general buffering on the input and output.
Finally, you're decoding the stream and playing it back, which means that most of the same kinds of buffers used in encoding are now used for decoding. And, we're not even talking about the several hundred milliseconds of latency you have for getting the audio data through a sound card and out the analog to your speaker.
You have an unrealistic expectation, and while it is possible to tweak some things to reduce latency (such as disabling the bit reservoir), it will result in a poor quality stream and will not be of low latency anyway.