Realtime streaming with QAudioOutput (qt) - c++

I want to play real-time sounds responding with no appreciable lag to user interaction.
To have low latency I have to send small chunks of pcm data.
What I am doing:
QAudioFormat format;
format.setSampleRate(22050);
format.setChannelCount(1);
format.setSampleSize(16);
format.setCodec("audio/pcm");
format.setByteOrder(QAudioFormat::LittleEndian);
format.setSampleType(QAudioFormat::SignedInt);
QAudioDeviceInfo info(QAudioDeviceInfo::defaultOutputDevice());
if (!info.isFormatSupported(format)) {
qWarning()<<"raw audio format not supported by backend, cannot play audio.";
return;
}
qAudioOutput = new QAudioOutput(format, NULL);
qAudioDevice=qAudioOutput->start();
and later
void Enqueue(TYPESAMPLEPCM *data,int countBytes){
while(qAudioOutput->bytesFree()<countBytes){
Sleep(1);
}
qAudioDevice->write((char *)data,countBytes);
}
The chunks of data are 256 bytes (128 samples), which would give a "granularity" of around 6 milliseconds.
Enqueue is called from a loop in a thread with high priority that provides the data chunks. There's no latency there as the speed it calls Enqueue is by far faster than rendering the audio data.
But it looks to me there's a buffer underrun situation because the sound plays but with kind of a "crackling" regular noise.
If I raise the chunk size to 256 samples the problem almost disappears. Only some crackling at the beginning (?)
The platform is Windows and Qt 5.3.
Is that the right procedure, or am I missing something?

The issue is about
void Enqueue(TYPESAMPLEPCM *data,int countBytes){
while(qAudioOutput->bytesFree()<countBytes){
Sleep(1);
}
qAudioDevice->write((char *)data,countBytes);
}
being a little naive.
First of all, Sleep(1). You are on Windows. The problem is that Windows is not a realtime OS, and you should expect a timer resolution of around 10 - 15 ms. This means that when there is no room for incoming audio, you are going to sleep a lot longer than you expect.
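To put numbers on this, here is a rough sketch of the arithmetic, using the format from the question (22050 Hz, mono, 16-bit, 256-byte chunks) and assuming a sleep that really lasts 15 ms, the upper figure quoted above. The function names are made up for illustration.

```cpp
#include <cassert>

// Duration in milliseconds of a PCM chunk of `bytes` bytes.
double chunkDurationMs(int bytes, int sampleRate, int channels, int bytesPerSample) {
    assert(sampleRate > 0 && channels > 0 && bytesPerSample > 0);
    const double frames = static_cast<double>(bytes) / (channels * bytesPerSample);
    return 1000.0 * frames / sampleRate;
}

// How many chunks the device plays out while the feeding thread is asleep.
double chunksPlayedDuringSleep(double sleepMs, double chunkMs) {
    assert(chunkMs > 0.0);
    return sleepMs / chunkMs;
}
```

With these inputs, chunkDurationMs(256, 22050, 1, 2) is about 5.8 ms, so a single 15 ms sleep spans more than two and a half chunks. If the device holds less than that when the thread goes to sleep, an underrun is the expected result.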
Second, do you really need to sleep when the audio output cannot consume the amount of data you provided? What you really want is to provide more audio after the audio output has consumed some. In concrete terms, this means:
Setting the QAudioOutput notify interval, i.e. the period at which the system will consume the audio data and tell you about it.
Getting notified when QAudioOutput has consumed some data, in a slot connected to QAudioOutput::notify().
Buffering data chunks which come from your high priority thread when audio output is full.
This gives:
QByteArray samplebuffer;
//init code
{
qAudioOutput = new QAudioOutput(format, NULL);
...
qAudioOutput->setNotifyInterval(128); //interval in milliseconds; play with this number
connect(qAudioOutput, SIGNAL(notify()), someobject, SLOT(OnAudioNotify()));
...
qAudioDevice=qAudioOutput->start();
}
void EnqueueLock(TYPESAMPLEPCM *data,int countBytes)
{
//lock mutex
samplebuffer.append((char *)data,countBytes);
tryWritingSomeSampleData();
//unlock mutex
}
//slot
void SomeClass::OnAudioNotify()
{
//lock mutex
tryWritingSomeSampleData();
//unlock mutex
}
void SomeClass::tryWritingSomeSampleData()
{
int towritedevice = qMin(qAudioOutput->bytesFree(), samplebuffer.size());
if(towritedevice > 0)
{
qint64 written = qAudioDevice->write(samplebuffer.data(),towritedevice);
if(written > 0)
samplebuffer.remove(0,(int)written); //pop front only what was actually written
}
}
As you can see, you need to protect samplebuffer from concurrent access, so guard it with a suitable mutex.
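For anyone who wants to try the buffering logic without an audio device, here is a minimal sketch in plain C++ (no Qt). FakeDevice and Feeder are hypothetical stand-ins for QAudioOutput and the class holding samplebuffer; only the "write what fits, keep the rest" logic is meant to carry over.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the audio device: a fixed-size buffer
// that reports its free space and is drained by playback.
class FakeDevice {
public:
    explicit FakeDevice(std::size_t capacity) : capacity_(capacity), used_(0) {}
    std::size_t bytesFree() const { return capacity_ - used_; }
    std::size_t write(const char*, std::size_t n) {
        const std::size_t accepted = std::min(n, bytesFree());
        used_ += accepted;
        return accepted;
    }
    void consume(std::size_t n) { used_ -= std::min(n, used_); }  // playback drains data
private:
    std::size_t capacity_;
    std::size_t used_;
};

// Holds the bytes the device could not accept yet.
class Feeder {
public:
    explicit Feeder(FakeDevice& dev) : dev_(dev) {}

    // Called from the producer thread.
    void enqueue(const char* data, std::size_t n) {
        assert(data != nullptr || n == 0);
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.insert(pending_.end(), data, data + n);
        flushLocked();
    }

    // Called when the device reports that it consumed some data.
    void onNotify() {
        std::lock_guard<std::mutex> lock(mutex_);
        flushLocked();
    }

    std::size_t pending() {
        std::lock_guard<std::mutex> lock(mutex_);
        return pending_.size();
    }

private:
    // Caller must hold mutex_.
    void flushLocked() {
        const std::size_t n = std::min(dev_.bytesFree(), pending_.size());
        if (n == 0) return;
        const std::size_t written = dev_.write(pending_.data(), n);
        pending_.erase(pending_.begin(),
                       pending_.begin() + static_cast<std::ptrdiff_t>(written));
    }

    FakeDevice& dev_;
    std::mutex mutex_;
    std::vector<char> pending_;
};
```

Enqueuing 300 bytes into a 256-byte device leaves 44 bytes pending; once the device has consumed some data, the next onNotify() pushes the remainder. Nothing ever sleeps.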

Related

Realtime streaming with QAudioOutput

I am working on a C++ project to read/process/play raw audio from a microphone array system, with its own C++ API. I am using Qt to program the software.
Following up on this post about Real Time Streaming With QAudioOutput (Qt), I wanted to ask for advice about what to do if the raw audio data comes from a function call that takes about 1000 ms (1 sec) to process. How would I still be able to achieve real-time audio playback?
It takes about a second to process because I had read that, when writing to the QIODevice returned by QAudioOutput::start(), it is advisable to write a period's worth of bytes to prevent buffer underrun / overrun. http://cell0907.blogspot.sg/2012/10/qt-audio-output.html
I have set up a QByteArray and QDataStream to stream the data received from the function call.
The API is CcmXXX()
Reading the data from the microphone array returns an array of 32 bit integers
Of the 32 bit integers, 24 bits resolution, 8 bits LSB padded zeros.
It comes in block sizes (set at 1024 samples) x 40 microphones
Each chunk writes about one block, till the number of bytes written reaches close to the period size / free amount of bytes.
Tested: Connected my slots to a notify of about 50ms, to write one period worth of bytes. QByteArray in circular buffer style. Added a mutex lock/unlock at the read/write portions.
Result: Only very short bursts (a few ms) of the actual audio play, with lots of jitter and sounds that were never recorded.
Please do offer feedback on how I could improve my code.
Setting up QAudioFormat
void MainWindow::init_audio_format(){
m_format.setSampleRate(48000); //(8000, 11025, 16000, 22050, 32000, 44100, 48000, 88200, 96000, 192000)
m_format.setByteOrder(QAudioFormat::LittleEndian);
m_format.setChannelCount(1);
m_format.setCodec("audio/pcm");
m_format.setSampleSize(32); //(8, 16, 24, 32, 48, 64)
m_format.setSampleType(QAudioFormat::SignedInt); //(SignedInt, UnSignedInt, Float)
m_device = QAudioDeviceInfo::defaultOutputDevice();
QAudioDeviceInfo info(m_device);
if (!info.isFormatSupported(m_format)) {
qWarning() << "Raw audio format not supported by backend, cannot play audio.";
return;
}
}
Initialising Audio and QByteArray/Datastream
void MainWindow::init_audio_output(){
m_bytearray.resize(65536);
mstream = new QDataStream(&m_bytearray,QIODevice::ReadWrite);
mstream->setByteOrder(QDataStream::LittleEndian);
audio = new QAudioOutput(m_device,m_format,this);
audio->setBufferSize(131072);
audio->setNotifyInterval(50);
m_audiodevice = audio->start();
connect(audio,SIGNAL(notify()),this,SLOT(slot_writedata()));
read_frames();
}
Slot:
void MainWindow::slot_writedata(){
QMutex mutex;
mutex.lock();
read_frames();
mutex.unlock();
}
To read the frames:
void MainWindow::read_frames(){
qint32* buffer;
int frameSize, byteCount=0;
DWORD tdFrames, fdFrames;
float fvalue = 0;
qint32 q32value;
frameSize = 40 * mBlockSize; //40 mics
buffer = new int[frameSize];
int periodBytes = audio->periodSize();
int freeBytes = audio->bytesFree();
int chunks = qMin(periodBytes/mBlockSize,freeBytes/mBlockSize);
CcmStartInput();
while(chunks){
CcmReadFrames(buffer,NULL,frameSize,0,&tdFrames,&fdFrames,NULL,CCM_WAIT);
if(tdFrames==0){
break;
}
int diffBytes = periodBytes - byteCount;
if(diffBytes>=(int)sizeof(q32value)*mBlockSize){
for(int x=0;x<mBlockSize;x++){
q32value = (quint32)buffer[x]/256;
*mstream << (qint32)fvalue;
byteCount+=sizeof(q32value);
}
}
else{
for(int x=0;x<(diffBytes/(int)sizeof(q32value));x++){
q32value = (quint32)buffer[x]/256;
*mstream << (qint32) fvalue;
byteCount+=sizeof(q32value);
}
}
--chunks;
}
CcmStopInput();
mPosEnd = mPos + byteCount;
write_frames();
mPos += byteCount;
if(mPos >= m_bytearray.length()){
mPos = 0;
mstream->device()->seek(0); //change mstream pointer back to bytearray start
}
}
To write the frames:
void MainWindow::write_frames()
{
int len = m_bytearray.length() - mPos;
int bytesWritten = mPosEnd - mPos;
if(len>=audio->periodSize()){
m_audiodevice->write(m_bytearray.data()+mPos, bytesWritten);
}
else{
w_data.replace(0,qAbs(len),m_bytearray.data()+mPos);
w_data.replace(qAbs(len),audio->periodSize()-abs(len),m_bytearray.data());
m_audiodevice->write(w_data.data(),audio->periodSize());
}
}
Audio support in Qt is actually quite rudimentary. The goal is to have media playback at the lowest possible implementation and maintenance cost. The situation is especially bad on Windows, where I think the ancient MME API is still employed for audio playback.
As a result, the Qt audio API is very far from realtime, making it particularly ill-suited for such applications. I recommend using PortAudio or RtAudio, which you can still wrap in Qt-style IO devices if you want. This will give you access to better-performing platform audio APIs and much better playback performance at very low latency.
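Whatever output API is used, the sample conversion in the question is worth checking separately. Here is a minimal sketch, assuming the microphone samples are signed, left-justified 24-bit values in a 32-bit container (as the question describes) and that 16-bit output is acceptable. The function names are made up for illustration. It also shows what casting to an unsigned type before dividing, as read_frames() does, does to a negative sample. Note too that, as posted, the loop streams fvalue (which is never assigned) rather than q32value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Signed division keeps the sign. 65536 = 2^16 drops the 16 low bits
// of the 32-bit container, leaving a full-scale 16-bit sample.
inline std::int16_t toInt16(std::int32_t leftJustified) {
    return static_cast<std::int16_t>(leftJustified / 65536);
}

// Mirrors the pattern in the question: cast to unsigned, then divide by 256.
// A negative sample comes out as a large positive value.
inline std::int32_t unsignedDivide(std::int32_t sample) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(sample) / 256u);
}

// Convert a block of samples in one pass.
void convertBlock(const std::int32_t* in, std::int16_t* out, std::size_t count) {
    assert((in != nullptr && out != nullptr) || count == 0);
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = toInt16(in[i]);
    }
}
```

For the input -256 (the smallest negative 24-bit step in this layout), unsignedDivide returns 16777215 instead of -1, which would be heard as a loud click rather than near silence.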

ffmpeg C API - creating queue of frames

I have created using the C API of ffmpeg a C++ application that reads frames from a file and writes them to a new file. Everything works fine, as long as I write immediately the frames to the output. In other words, the following structure of the program outputs the correct result (I put only the pseudocode for now, if needed I can also post some real snippets but the classes that I have created for handling the ffmpeg functionalities are quite large):
AVFrame* frame = av_frame_alloc();
int got_frame;
// readFrame returns 0 if file is ended, got frame = 1 if
// a complete frame has been extracted
while(readFrame(inputfile,frame, &got_frame)) {
if (got_frame) {
// I actually do some processing here
writeFrame(outputfile,frame);
}
}
av_frame_free(&frame);
The next step has been to parallelize the application and, as a consequence, frames are not written immediately after they are read (I do not want to go into the details of the parallelization). In this case problems arise: there is some flickering in the output, as if some frames get repeated randomly. However, the number of frames and the duration of the output video remains correct.
What I am trying to do now is to separate completely the reading from writing in the serial implementation in order to understand what is going on. I am creating a queue of pointers to frames:
std::queue<AVFrame*> queue;
int ret = 1, got_frame;
while (ret) {
AVFrame* frame = av_frame_alloc();
ret = readFrame(inputfile,frame,&got_frame);
if (got_frame)
queue.push(frame);
}
To write frames to the output file I do:
while (!queue.empty()) {
frame = queue.front();
queue.pop();
writeFrame(outputFile,frame);
av_frame_free(&frame);
}
The result in this case is an output video with the correct duration and number of frames that is only a repetition of the last 3 (I think) frames of the video.
My guess is that something might go wrong because of the fact that in the first case I use always the same memory location for reading frames, while in the second case I allocate many different frames.
Any suggestions on what could be the problem?
Ah, so I'm assuming that readFrame() is a wrapper around libavformat's av_read_frame() and libavcodec's avcodec_decode_video2(), is that right?
From the documentation:
When AVCodecContext.refcounted_frames is set to 1, the frame is
reference counted and the returned reference belongs to the caller.
The caller must release the frame using av_frame_unref() when the
frame is no longer needed.
and:
When
AVCodecContext.refcounted_frames is set to 0, the returned reference
belongs to the decoder and is valid only until the next call to this
function or until closing or flushing the decoder.
It follows from this that you need to set AVCodecContext.refcounted_frames to 1. The default is 0, so my gut feeling is that setting it to 1 will fix your problem. Don't forget to call av_frame_unref() on the pictures after use to prevent memory leaks, and also don't forget to free your AVFrame in this loop if got_frame = 0, again to prevent memory leaks:
while (ret) {
AVFrame* frame = av_frame_alloc();
ret = readFrame(inputfile,frame,&got_frame);
if (got_frame)
queue.push(frame);
else
av_frame_free(&frame);
}
(Or alternatively you could implement some cache for frame so you only realloc it if the previous object was pushed in the queue.)
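To see why queued frames can all end up showing the last pictures, here is a toy model in plain C++. It does not use the ffmpeg API: ToyDecoder, Picture, and both functions are hypothetical, and the model assumes the decoder reuses a single internal buffer when it owns the returned data.

```cpp
#include <cassert>
#include <memory>
#include <queue>
#include <vector>

struct Picture { int id; };

// Hypothetical stand-in for a decoder with one internal picture buffer.
class ToyDecoder {
public:
    // Decoder-owned result: valid only until the next call overwrites it.
    const Picture* decodeBorrowed(int frameNumber) {
        scratch_.id = frameNumber;
        return &scratch_;
    }
    // Caller-owned result: each call returns an independent reference.
    std::shared_ptr<const Picture> decodeOwned(int frameNumber) {
        return std::make_shared<Picture>(Picture{frameNumber});
    }
private:
    Picture scratch_{-1};
};

// Queue borrowed pointers, then read them back after decoding has finished.
std::vector<int> queueBorrowed(int count) {
    assert(count >= 0);
    ToyDecoder dec;
    std::queue<const Picture*> q;
    for (int i = 0; i < count; ++i) q.push(dec.decodeBorrowed(i));
    std::vector<int> seen;
    while (!q.empty()) { seen.push_back(q.front()->id); q.pop(); }
    return seen;
}

// Same, but each queue entry owns its picture.
std::vector<int> queueOwned(int count) {
    assert(count >= 0);
    ToyDecoder dec;
    std::queue<std::shared_ptr<const Picture>> q;
    for (int i = 0; i < count; ++i) q.push(dec.decodeOwned(i));
    std::vector<int> seen;
    while (!q.empty()) { seen.push_back(q.front()->id); q.pop(); }
    return seen;
}
```

In this model queueBorrowed(3) yields 2, 2, 2 while queueOwned(3) yields 0, 1, 2. A real decoder may cycle through a few internal buffers rather than one, which would fit the "last 3 frames" symptom, but that part is a guess.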
There's nothing obviously wrong with your pseudocode. The problem almost certainly lies in how you lock the queue between threads.
Your memory allocation seems sane to me. Do you maybe do something else in between reading and writing the frames?
Is queue the same queue in the routines that read and write the frames?

How to get the latest frames in ffmpeg, not the next frame

I have an application which connects to an RTSP camera and processes some of the frames of video. Depending on the camera resolution and frame rate, I don't need to process all the frames and sometimes my processing takes a while. I've designed things so that when the frame is read, it's passed off to a work queue for another thread to deal with. However, depending on system load/resolution/frame rate/network/file system/etc., I occasionally find cases where the program doesn't keep up with the camera.
I've found that with ffmpeg (I'm using the latest git drop from mid October and running on Windows), being a couple of seconds behind is fine and you keep getting the next frame, next frame, etc. However, once you get, say, 15-20 seconds behind, the frames you get from ffmpeg occasionally have corruption. That is, what is returned as the next frame often has graphical glitches (streaking at the bottom of the frame, etc.).
What I'd like to do is put in a check, somehow, to detect if I'm more than X frames behind the live stream and, if so, flush the cached frames out and start fetching the latest/current frames.
My current snippet of my frame buffer reading thread (C++) :
while(runThread)
{
av_init_packet(&(newPacket));
int errorCheck = av_read_frame(context, &(newPacket));
if (errorCheck < 0)
{
// error
}
else
{
int frameFinished = 0;
int decodeCode = avcodec_decode_video2(ccontext, actualFrame, &frameFinished, &newPacket);
if (decodeCode <0)
{
// error
}
else
if (decodeCode == 0)
{
// no frame could be decompressed / decoded / etc
}
else
if ((decodeCode > 0) && (frameFinished))
{
// do my processing / copy the frame off for later processing / etc
}
else
{
// decoded some data, but frame was not finished...
// Save data and reconstitute the pieces somehow??
// Given that we free the packet, I doubt there is any way to use this partial information
}
av_free_packet(&(newPacket));
}
}
I've googled and looked through the ffmpeg documents for some function I can call to flush things and let me catch up, but I can't seem to find anything. This same sort of solution would be needed if you wanted to only occasionally monitor a video source (e.g., if you only wanted to snag one frame per second or per minute). The only thing I could come up with is disconnecting from the camera and reconnecting. However, I still need a way to detect if the frames I am receiving are old.
Ideally, I'd be able to do something like this :
while(runThread)
{
av_init_packet(&(newPacket));
// Not a real function, but I'd like to do something like this
if (av_check_frame_buffer_size(context) > 30_frames)
{
// flush frame buffer.
av_frame_buffer_flush(context);
}
int errorCheck = av_read_frame(context, &(newPacket));
...
}
}
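On the application side, one way to approximate this is to bound the work queue and drop the oldest entries when it fills up. The sketch below is plain C++ with a hypothetical LatestFrames class. It does not touch any buffering inside ffmpeg or the network stack; it only limits how far the processing thread can fall behind the reading thread. It is meant for decoded frames: dropping compressed packets before decoding would generally break inter-frame prediction. The class is not thread-safe as written and would need the same lock as the existing work queue.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Bounded queue that keeps only the newest `maxPending` items.
template <typename Frame>
class LatestFrames {
public:
    explicit LatestFrames(std::size_t maxPending)
        : maxPending_(maxPending), dropped_(0) {
        assert(maxPending > 0);
    }

    // Add a frame; if the queue is full, discard the oldest one first.
    void push(const Frame& frame) {
        if (queue_.size() == maxPending_) {
            queue_.pop_front();
            ++dropped_;
        }
        queue_.push_back(frame);
    }

    // Remove the oldest remaining frame; returns false if the queue is empty.
    bool pop(Frame& out) {
        if (queue_.empty()) return false;
        out = queue_.front();
        queue_.pop_front();
        return true;
    }

    std::size_t size() const { return queue_.size(); }
    std::size_t dropped() const { return dropped_; }

private:
    std::size_t maxPending_;
    std::size_t dropped_;
    std::deque<Frame> queue_;
};
```

The dropped() counter gives a cheap signal that the consumer is behind, which could be used to decide when a reconnect is worth it.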

Serial communication read not working at all time

I am writing a C++ application for half-duplex communication to download data from a device. The following is the class I am using for serial communication.
class CSerialCommHelper
{
HANDLE m_pPortHandle; //Handle to the COM port
HANDLE m_hReadThread; //Handle to the Read thread
HANDLE m_hPortMutex; //Handle to Port Mutex
std::wstring m_strPortName; //Portname
COMMTIMEOUTS m_CommTimeouts; //Communication Timeout Structure
_DCB dcb; //Device Control Block
DWORD m_dwThreadID; //Thread ID
string m_strBuffer;
public:
CSerialCommHelper();
HRESULT Open();
HRESULT ConfigPort();
static void * ReadThread(void *);
HRESULT Write(const unsigned char *,DWORD);
string GetFrameFromBuffer();
HRESULT Close();
~CSerialCommHelper(void);
};
The ReadThread function is as follows:
void * CSerialCommHelper::ReadThread(void * pObj)
{
CSerialCommHelper *pCSerialCommHelper =(CSerialCommHelper *)pObj;
DWORD dwBytesTransferred =0;
DWORD byte=0;
while (pCSerialCommHelper->m_pPortHandle != INVALID_HANDLE_VALUE)
{
pCSerialCommHelper->m_strBuffer.clear();
pCSerialCommHelper->m_usBufSize=0;
WaitForSingleObject(pCSerialCommHelper->m_hPortMutex,INFINITE);
do
{
dwBytesTransferred = 0;
ReadFile (pCSerialCommHelper->m_pPortHandle,&byte,1,&dwBytesTransferred,NULL);
if (dwBytesTransferred == 1)
{
pCSerialCommHelper->m_strBuffer.push_back((char)byte);
pCSerialCommHelper->m_usBufSize++;
continue;
}
}
while ((dwBytesTransferred == 1) && (pCSerialCommHelper->m_pPortHandle != INVALID_HANDLE_VALUE));
ReleaseMutex(pCSerialCommHelper->m_hPortMutex);
Sleep(2);
}
ExitThread(0);
return 0;
}
The Write function waits for ReadThread to release the mutex and then writes the data to the port.
GetFrameFromBuffer is called from the application that uses CSerialCommHelper, and it returns the m_strBuffer string.
My problem appears whenever I try to download a huge amount of data: I lose some data frames.
I get a response from the device within 0.0468 to 0.1716 seconds.
After analysing different error scenarios, I found that timing is not the problem, as other frames are downloaded at the same time intervals.
The function that calls GetFrameFromBuffer keeps calling it until it gets a filled string.
It seems like these two statements should not be in your outer while loop:
pCSerialCommHelper->m_strBuffer.clear();
pCSerialCommHelper->m_usBufSize=0;
Your inner while loop reads bytes as long as they're immediately available, and the outer loop does a Sleep(2) the moment the inner loop doesn't give you a byte.
If you're waiting until an entire packet is available, it seems like you should keep looping until you get all the bytes, without clearing partway through the process.
I don't really know the ReadFile API, but I'm guessing that ReadFile might return 0 if there's no bytes immediately available, or at least available by whatever timeout you specified when opening the serial device.
ReleaseMutex(pCSerialCommHelper->m_hPortMutex);
Sleep(2);
That Sleep() call is hiding the real problem. It is never correct in threaded code, always a band-aid for a timing bug.
You certainly seem to have one, and that m_hPortMutex spells doom as well. If you do in fact have multiple threads trying to read from the serial port, then they are going to start fighting over that mutex. The outcome will be very poor: each thread will get a handful of bytes from the port. But clearly you want to read a frame of data. There is zero hope that you can glue the handfuls of bytes that each thread gets back together into a frame, because you've lost their sequence. So sleeping for a while seemed like a workaround: it injects delays that can give you a better shot at reading a frame. Usually, not always. You also wrote it in the wrong place.
This code is just broken. Delete the Sleep(). Do not exit the loop until you've read the entire frame.
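A minimal sketch of "keep the bytes until the frame is complete", in plain C++ without the Win32 calls. The question does not say how frames are delimited, so the terminator byte here is purely an assumption, and FrameAssembler is a hypothetical helper. The point is only that partial data survives from one read to the next instead of being cleared.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Accumulates bytes across reads and emits a frame each time the
// (assumed) terminator byte is seen.
class FrameAssembler {
public:
    explicit FrameAssembler(char terminator) : terminator_(terminator) {}

    // Feed the bytes from one read; returns every frame completed by them.
    std::vector<std::string> feed(const char* data, std::size_t n) {
        assert(data != nullptr || n == 0);
        std::vector<std::string> frames;
        for (std::size_t i = 0; i < n; ++i) {
            if (data[i] == terminator_) {
                frames.push_back(partial_);
                partial_.clear();  // cleared only once a frame is complete
            } else {
                partial_.push_back(data[i]);
            }
        }
        return frames;
    }

    // Bytes received so far that do not form a complete frame yet.
    std::size_t pendingBytes() const { return partial_.size(); }

private:
    char terminator_;
    std::string partial_;
};
```

Feeding "AB", then "C\nDE", then "\n" produces the frames "ABC" and "DE", however the bytes happen to be split across reads. A length-prefixed protocol would need a different completion test but the same accumulation.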

Prevent frame dropping while saving frames to disk

I am trying to write C++ code which saves incoming video frames to disk. Asynchronously arriving frames are pushed onto queue by a producer thread. The frames are popped off the queue by a consumer thread. Mutual exclusion of producer and consumer is done using a mutex. However, I still notice frames being dropped. The dropped frames (likely) correspond to instances when producer tries to push the current frame onto queue but cannot do so since consumer holds the lock. Any suggestions ? I essentially do not want the producer to wait. A waiting consumer is okay for me.
EDIT-0 : Alternate idea which does not involve locking. Will this work ?
Producer initially enqueues n seconds worth of video. n can be some small multiple of frame-rate.
As long as queue contains >= n seconds worth of video, consumer dequeues on a frame by frame basis and saves to disk.
When the video is done, the queue is flushed to disk.
EDIT-1: The frames arrive at ~ 15 fps.
EDIT-2 : Outline of code :
Main driver code
// Main function
void LVD::DumpFrame(const IplImage *frame)
{
// Copies frame into internal buffer.
// buffer object is a wrapper around OpenCV's IplImage
Initialize(frame);
// (Producer thread) -- Pushes buffer onto queue
// Thread locks queue, pushes buffer onto queue, unlocks queue and dies
PushBufferOntoQueue();
// (Consumer thread) -- Pop off queue and save to disk
// Thread locks queue, pops it, unlocks queue,
// saves popped buffer to disk and dies
DumpQueue();
++m_frame_id;
}
void LVD::Initialize(const IplImage *frame)
{
if(NULL == m_buffer) // first iteration
m_buffer = new ImageBuffer(frame);
else
m_buffer->Copy(frame);
}
Producer
void LVD::PushBufferOntoQueue()
{
m_queingThread = ::CreateThread( NULL, 0, ThreadFuncPushImageBufferOntoQueue, this, 0, &m_dwThreadID);
}
DWORD WINAPI LVD::ThreadFuncPushImageBufferOntoQueue(void *arg)
{
LVD* videoDumper = reinterpret_cast<LVD*>(arg);
LocalLock ll( &videoDumper->m_que_lock, 60*1000 );
videoDumper->m_frameQue.push(*(videoDumper->m_buffer));
ll.Unlock();
return 0;
}
Consumer
void LVD::DumpQueue()
{
m_dumpingThread = ::CreateThread( NULL, 0, ThreadFuncDumpFrames, this, 0, &m_dwThreadID);
}
DWORD WINAPI LVD::ThreadFuncDumpFrames(void *arg)
{
LVD* videoDumper = reinterpret_cast<LVD*>(arg);
LocalLock ll( &videoDumper->m_que_lock, 60*1000 );
if(videoDumper->m_frameQue.size() > 0 )
{
videoDumper->m_save_frame=videoDumper->m_frameQue.front();
videoDumper->m_frameQue.pop();
}
ll.Unlock();
stringstream ss;
ss << videoDumper->m_saveDir.c_str() << "\\";
ss << videoDumper->m_startTime.c_str() << "\\";
ss << setfill('0') << setw(6) << videoDumper->m_frame_id;
ss << ".png";
videoDumper->m_save_frame.SaveImage(ss.str().c_str());
return 0;
}
Note:
(1) I cannot use C++11. Therefore, Herb Sutter's DDJ article is not an option.
(2) I found a reference to an unbounded single producer-consumer queue. However, the author(s) state that enqueue(adding frames) is probably not wait-free.
(3) I also found liblfds, a C-library but not sure if it will serve my purpose.
The queue cannot be the problem. Video frames arrive at 16 msec intervals, at worst. Your queue only needs to store a pointer to a frame. Adding/removing one in a thread-safe way can never take more than a microsecond.
You'll need to look for another explanation and solution. Video does forever present a fire-hose problem. Disk drives are not generally fast enough to keep up with an uncompressed video stream. So if your consumer cannot keep up with the producer, then something is going to give, with a dropped frame the likely outcome when you (correctly) prevent the queue from growing without bound.
Be sure to consider encoding the video. Real-time MPEG and AVC encoders are available. After they compress the stream you should not have a problem keeping up with the disk.
Circular buffer is definitely a good alternative. If you make it use a 2^n size, you can also use this trick to update the pointers:
inline int update_index(int x)
{
return (x + 1) & (size-1); // size is the buffer length and must be a power of two
}
That way, there is no need to use expensive compare (and consequential jumps) or divide (the single most expensive integer operation in any processor - not counting "fill/copy large chunks of memory" type operations).
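Here is how that index trick might sit inside a small ring buffer. This is a sketch with a hypothetical Ring class, written without C++11 features since the question rules them out. It is single-threaded as written: sharing it between a producer and a consumer thread would still need a lock or suitable atomic operations around head_ and tail_.

```cpp
#include <cassert>
#include <cstddef>

// Fixed-size ring buffer. Size must be a power of two so that
// "& (Size - 1)" wraps the index without a compare or a divide.
template <typename T, std::size_t Size>
class Ring {
public:
    Ring() : head_(0), tail_(0) {
        assert(Size >= 2 && (Size & (Size - 1)) == 0);
    }

    // Returns false when full. One slot stays empty to tell full from empty.
    bool push(const T& value) {
        const std::size_t next = (head_ + 1) & (Size - 1);
        if (next == tail_) return false;
        data_[head_] = value;
        head_ = next;
        return true;
    }

    // Returns false when empty.
    bool pop(T& out) {
        if (tail_ == head_) return false;
        out = data_[tail_];
        tail_ = (tail_ + 1) & (Size - 1);
        return true;
    }

    std::size_t count() const { return (head_ - tail_) & (Size - 1); }

private:
    T data_[Size];
    std::size_t head_;
    std::size_t tail_;
};
```

A Ring<int, 4> holds at most three items; a push on a full ring fails instead of overwriting, which is where a frame would be dropped.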
When dealing with video (or graphics in general), it is essential to do "buffer management". Typically, this means tracking the state of the "framebuffer" and avoiding copying content more than necessary.
The typical approach is to allocate 2 or 3 video-buffers (or frame buffers, or what you call it). A buffer can be owned by either the producer or the consumer. The transfer is ONLY the ownership. So when the video-driver signals that "this buffer is full", the ownership is now with the consumer, that will read the buffer and store it to disk [or whatever]. When the storing is finished, the buffer is given back ("freed") so that the producer can re-use it. Copying the data out of the buffer is expensive [takes time], so you don't want to do that unless it's ABSOLUTELY necessary.
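A rough sketch of that ownership hand-off, again without C++11 features. BufferPool and FrameBuffer are hypothetical names, and locking is left out to keep the idea visible: in real use, every method would run under the queue lock. Only pointers move between the lists; the pixel data is never copied by the pool. The pool itself must not be copied, because the lists point into its own storage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct FrameBuffer {
    std::vector<unsigned char> pixels;
    int frameId;
};

// A fixed set of buffers whose ownership moves between producer and consumer.
class BufferPool {
public:
    BufferPool(std::size_t count, std::size_t bytesPerFrame) : buffers_(count) {
        assert(count > 0);
        for (std::size_t i = 0; i < count; ++i) {
            buffers_[i].pixels.resize(bytesPerFrame);
            buffers_[i].frameId = -1;
            free_.push_back(&buffers_[i]);
        }
    }

    // Producer: get an empty buffer, or NULL if all are in use
    // (the point at which a frame would have to be dropped).
    FrameBuffer* acquire() {
        if (free_.empty()) return NULL;
        FrameBuffer* b = free_.back();
        free_.pop_back();
        return b;
    }

    // Producer: hand a filled buffer over to the consumer.
    void submit(FrameBuffer* b) { filled_.push_back(b); }

    // Consumer: take the oldest filled buffer, or NULL if none is waiting.
    FrameBuffer* take() {
        if (filled_.empty()) return NULL;
        FrameBuffer* b = filled_.front();
        filled_.erase(filled_.begin());
        return b;
    }

    // Consumer: give the buffer back once it has been saved.
    void release(FrameBuffer* b) { free_.push_back(b); }

private:
    std::vector<FrameBuffer> buffers_;
    std::vector<FrameBuffer*> free_;
    std::vector<FrameBuffer*> filled_;
};
```

With two buffers, a third acquire() fails until the consumer releases one, so the number of buffers directly sets how long a slow disk write can be tolerated before frames are lost.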