Prevent frame dropping while saving frames to disk - C++

I am trying to write C++ code which saves incoming video frames to disk. Asynchronously arriving frames are pushed onto a queue by a producer thread. The frames are popped off the queue by a consumer thread. Mutual exclusion of the producer and consumer is done using a mutex. However, I still notice frames being dropped. The dropped frames (likely) correspond to instances when the producer tries to push the current frame onto the queue but cannot do so because the consumer holds the lock. Any suggestions? I essentially do not want the producer to wait; a waiting consumer is okay for me.
EDIT-0: An alternate idea which does not involve locking. Will this work?
The producer initially enqueues n seconds' worth of video, where n can be some small multiple of the frame rate.
As long as the queue contains >= n seconds' worth of video, the consumer dequeues frame by frame and saves to disk.
When the video is done, the queue is flushed to disk.
EDIT-1: The frames arrive at ~ 15 fps.
EDIT-2: Outline of the code:
Main driver code
// Main function
void LVD::DumpFrame(const IplImage *frame)
{
    // Copies frame into internal buffer.
    // buffer object is a wrapper around OpenCV's IplImage
    Initialize(frame);

    // (Producer thread) -- Pushes buffer onto queue.
    // Thread locks queue, pushes buffer onto queue, unlocks queue and dies.
    PushBufferOntoQueue();

    // (Consumer thread) -- Pops off queue and saves to disk.
    // Thread locks queue, pops it, unlocks queue,
    // saves popped buffer to disk and dies.
    DumpQueue();

    ++m_frame_id;
}

void LVD::Initialize(const IplImage *frame)
{
    if (NULL == m_buffer) // first iteration
        m_buffer = new ImageBuffer(frame);
    else
        m_buffer->Copy(frame);
}
Producer
void LVD::PushBufferOntoQueue()
{
    m_queingThread = ::CreateThread(NULL, 0, ThreadFuncPushImageBufferOntoQueue, this, 0, &m_dwThreadID);
}

DWORD WINAPI LVD::ThreadFuncPushImageBufferOntoQueue(void *arg)
{
    LVD *videoDumper = reinterpret_cast<LVD *>(arg);
    LocalLock ll(&videoDumper->m_que_lock, 60 * 1000);
    videoDumper->m_frameQue.push(*(videoDumper->m_buffer));
    ll.Unlock();
    return 0;
}
Consumer
void LVD::DumpQueue()
{
    m_dumpingThread = ::CreateThread(NULL, 0, ThreadFuncDumpFrames, this, 0, &m_dwThreadID);
}

DWORD WINAPI LVD::ThreadFuncDumpFrames(void *arg)
{
    LVD *videoDumper = reinterpret_cast<LVD *>(arg);

    LocalLock ll(&videoDumper->m_que_lock, 60 * 1000);
    if (videoDumper->m_frameQue.size() > 0)
    {
        videoDumper->m_save_frame = videoDumper->m_frameQue.front();
        videoDumper->m_frameQue.pop();
    }
    ll.Unlock();

    stringstream ss;
    ss << videoDumper->m_saveDir.c_str() << "\\";
    ss << videoDumper->m_startTime.c_str() << "\\";
    ss << setfill('0') << setw(6) << videoDumper->m_frame_id;
    ss << ".png";

    videoDumper->m_save_frame.SaveImage(ss.str().c_str());
    return 0;
}
Note:
(1) I cannot use C++11, so Herb Sutter's DDJ article is not an option.
(2) I found a reference to an unbounded single-producer/single-consumer queue. However, the authors state that enqueue (adding frames) is probably not wait-free.
(3) I also found liblfds, a C library, but I am not sure it will serve my purpose.

The queue cannot be the problem. Video frames arrive at 16 ms intervals, at worst. Your queue only needs to store a pointer to a frame; adding or removing one in a thread-safe way can never take more than a microsecond.
You'll need to look for another explanation and solution. Video has always presented a fire-hose problem: disk drives are generally not fast enough to keep up with an uncompressed video stream. So if your consumer cannot keep up with the producer, something is going to give, with a dropped frame being the likely outcome when you (correctly) prevent the queue from growing without bound.
Be sure to consider encoding the video. Real-time MPEG and AVC encoders are available. After they compress the stream you should not have a problem keeping up with the disk.

A circular buffer is definitely a good alternative. If you make it a 2^n size, you can also use this trick to update the indices:
inline int update_index(int x)
{
    return (x + 1) & (size - 1);
}
That way, there is no need for an expensive compare (and the consequent branch) or a divide (the single most expensive integer operation in any processor, not counting "fill/copy large chunks of memory" type operations).
When dealing with video (or graphics in general) it is essential to do "buffer management". Typically, this is a case of tracking the state of the "framebuffer" and avoiding copying content more than necessary.
The typical approach is to allocate 2 or 3 video buffers (or frame buffers, or whatever you call them). A buffer can be owned by either the producer or the consumer, and the transfer is ONLY of ownership. So when the video driver signals "this buffer is full", ownership passes to the consumer, which reads the buffer and stores it to disk [or whatever]. When the storing is finished, the buffer is given back ("freed") so that the producer can re-use it. Copying the data out of the buffer is expensive [takes time], so you don't want to do that unless it's ABSOLUTELY necessary.

Related

Producer-Consumer producer creating 2 elements POSIX Semaphores

3 Consumers 2 producers. Reading and writing to one buffer.
Producer A is pushing 1 element to buffer (length N) and Producer B is pushing 2 elements to buffer. No active waiting. I can't use System V semaphores.
Sample code for producer A:
void producerA()
{
    while (1) {
        sem_wait(full);
        sem_wait(mutex);
        Data *newData = (Data *) malloc(sizeof(Data));
        newData->val = generateRandomletter();
        newData->A = false;
        newData->B = false;
        newData->C = false;
        *((Data *) mem + tail) = *newData;
        ++elements;
        tail = (tail + 1) % N;
        sem_post(mutex);
        sem_post(empty);
    }
}
Consumers look similar except they read or consume but that's irrelevant.
I am having a lot of trouble with Producer B. Obviously I can't do things like
sem_wait(full); sem_wait(full);
I also tried having a different semaphore for producer B that would be upped the first time there are 2 or more free spots in the buffer, but that didn't work out because I still needed to properly lower and raise the full and empty semaphores.
In what ways can I solve this problem?
https://gist.github.com/RobPiwowarek/65cb9896c109699c70217ba014b9ed20
That would be solution to the entire problem I had.
TL;DR:
The easiest synchronisation I could provide was using the semaphores full and empty to represent the number of elements pushed to the buffer. However, that kind of solution does not work with POSIX semaphores if I have a producer that creates 2 elements.
My solution is a different concept.
The outline of a process comes down to:
while (1) {
    down(mutex);
    size = get size
    if (condition related to size based on what process this is)
    {
        do your job;
        updateSize(int diff); // this can up() specific semaphores
                              // based on size
                              // each process has its own semaphore
        up(mutex);
    }
    else
    {
        up(mutex);
        down(process's own semaphore);
        continue;
    }
}
I hope this will be useful to someone in the future.

ffmpeg C API - creating queue of frames

Using the C API of ffmpeg, I have created a C++ application that reads frames from a file and writes them to a new file. Everything works fine as long as I write the frames to the output immediately. In other words, the following structure of the program produces the correct result (I show only pseudocode for now; if needed I can also post some real snippets, but the classes I created for handling the ffmpeg functionality are quite large):
AVFrame* frame = av_frame_alloc();
int got_frame;

// readFrame returns 0 if the file has ended, got_frame = 1 if
// a complete frame has been extracted
while (readFrame(inputfile, frame, &got_frame)) {
    if (got_frame) {
        // I actually do some processing here
        writeFrame(outputfile, frame);
    }
}
av_frame_free(&frame);
The next step was to parallelize the application; as a consequence, frames are no longer written immediately after they are read (I do not want to go into the details of the parallelization). In this case problems arise: there is some flickering in the output, as if some frames were repeated randomly. However, the number of frames and the duration of the output video remain correct.
What I am trying to do now is to completely separate reading from writing in the serial implementation, in order to understand what is going on. I am creating a queue of pointers to frames:
std::queue<AVFrame*> queue;
int ret = 1, got_frame;
while (ret) {
    AVFrame* frame = av_frame_alloc();
    ret = readFrame(inputfile, frame, &got_frame);
    if (got_frame)
        queue.push(frame);
}
To write frames to the output file I do:
while (!queue.empty()) {
    frame = queue.front();
    queue.pop();
    writeFrame(outputFile, frame);
    av_frame_free(&frame);
}
The result in this case is an output video with the correct duration and number of frames that is only a repetition of the last 3 (I think) frames of the video.
My guess is that something goes wrong because in the first case I always use the same memory location for reading frames, while in the second case I allocate many different frames.
Any suggestions on what could be the problem?
Ah, so I'm assuming that readFrame() is a wrapper around libavformat's av_read_frame() and libavcodec's avcodec_decode_video2(), is that right?
From the documentation:
When AVCodecContext.refcounted_frames is set to 1, the frame is
reference counted and the returned reference belongs to the caller.
The caller must release the frame using av_frame_unref() when the
frame is no longer needed.
and:
When
AVCodecContext.refcounted_frames is set to 0, the returned reference
belongs to the decoder and is valid only until the next call to this
function or until closing or flushing the decoder.
Obviously, it follows from this that you need to set AVCodecContext.refcounted_frames to 1. The default is 0, so my gut feeling is that setting it to 1 will fix your problem. Don't forget to use av_frame_unref() on the pictures after use to prevent memory leaks, and also don't forget to free your AVFrame in this loop if got_frame = 0, again to prevent memory leaks:
while (ret) {
    AVFrame* frame = av_frame_alloc();
    ret = readFrame(inputfile, frame, &got_frame);
    if (got_frame)
        queue.push(frame);
    else
        av_frame_free(&frame);
}
(Alternatively, you could implement a cache for frame so you only re-allocate it if the previous object was pushed into the queue.)
There's nothing obviously wrong with your pseudocode. The problem almost certainly lies in how you lock the queue between threads.
Your memory allocation seems sane to me. Do you maybe do something else in between reading and writing the frames?
Is queue the same queue in the routines that read and write the frames?

Realtime streaming with QAudioOutput (qt)

I want to play real-time sounds responding with no appreciable lag to user interaction.
To have low latency I have to send small chunks of pcm data.
What I am doing:
QAudioFormat format;
format.setSampleRate(22050);
format.setChannelCount(1);
format.setSampleSize(16);
format.setCodec("audio/pcm");
format.setByteOrder(QAudioFormat::LittleEndian);
format.setSampleType(QAudioFormat::SignedInt);

QAudioDeviceInfo info(QAudioDeviceInfo::defaultOutputDevice());
if (!info.isFormatSupported(format)) {
    qWarning() << "raw audio format not supported by backend, cannot play audio.";
    return;
}
qAudioOutput = new QAudioOutput(format, NULL);
qAudioDevice = qAudioOutput->start();
and later
void Enqueue(TYPESAMPLEPCM *data, int countBytes)
{
    while (qAudioOutput->bytesFree() < countBytes) {
        Sleep(1);
    }
    qAudioDevice->write((char *)data, countBytes);
}
The chunks of data are 256 bytes (128 samples), which would give a "granularity" of around 6 milliseconds.
Enqueue is called from a loop in a high-priority thread that provides the data chunks. There is no latency on that side, as Enqueue is called far faster than the audio data is rendered.
But it looks to me like there is a buffer-underrun situation, because the sound plays but with a kind of regular "crackling" noise.
If I raise the chunk size to 256 samples the problem almost disappears, leaving only some crackling at the beginning (?)
The platform is Windows and Qt 5.3.
Is that the right procedure, or am I missing something?
The issue is about
void Enqueue(TYPESAMPLEPCM *data, int countBytes)
{
    while (qAudioOutput->bytesFree() < countBytes) {
        Sleep(1);
    }
    qAudioDevice->write((char *)data, countBytes);
}
being a little naive.
First of all, Sleep(1);. You are on Windows. The problem is that Windows is not a real-time OS, and its timer resolution is expected to be around 10-15 ms. This means that when there is no room for incoming audio, you are going to sleep a lot longer than you expect.
Second: do you really need to sleep when the audio output cannot consume the amount of data provided? What you really want is to provide more audio after the audio output has consumed some. In concrete terms, that means:
Setting the QAudioOutput notify interval, i.e. the period at which the system will consume the audio data and tell you about it.
Getting notified when QAudioOutput has consumed some data, in a slot connected to QAudioOutput::notify().
Buffering the data chunks that come from your high-priority thread while the audio output is full.
This gives:
QByteArray samplebuffer;

// init code
{
    qAudioOutput = new QAudioOutput(format, NULL);
    ...
    qAudioOutput->setNotifyInterval(128); // play with this number
    connect(qAudioOutput, SIGNAL(notify()), someobject, SLOT(OnAudioNotify()));
    ...
    qAudioDevice = qAudioOutput->start();
}

void EnqueueLock(TYPESAMPLEPCM *data, int countBytes)
{
    // lock mutex
    samplebuffer.append((char *)data, countBytes);
    tryWritingSomeSampleData();
    // unlock mutex
}

// slot
void SomeClass::OnAudioNotify()
{
    // lock mutex
    tryWritingSomeSampleData();
    // unlock mutex
}

void SomeClass::tryWritingSomeSampleData()
{
    int towritedevice = min(qAudioOutput->bytesFree(), samplebuffer.size());
    if (towritedevice > 0)
    {
        qAudioDevice->write(samplebuffer.data(), towritedevice);
        samplebuffer.remove(0, towritedevice); // pop front what was written
    }
}
As you can see, you need to protect samplebuffer from concurrent access. Provide an adequate mutex.

How to synchronise read_handler calls of sock.async_read_some to a specific frequency?

How can I synchronise the read_handler calls of sock.async_read_some to a specific frequency, while reading streams of 812 bytes (which are streamed at a 125 Hz frequency)?
I have a problem related to reading a stream from a robot. I am very new to Boost.Asio and have very little knowledge of this concept. Here is a sample block from my code. What the read_handler does is process the data coming from the robot. This loop should execute every 8 ms, which is my sampling time, and by the time it starts to execute, reading of the data stream from the robot should be complete. When I look at the robot's stream, data comes every 8 ms, so the robot data is OK. But the execution of read_handler is somehow problematic. For instance, one loop starts at time = 0, the second loop starts at time = 2, the third at time = 16, the fourth at time = 18, and the fifth again at time = 32. So the triggering time of the loop drifts on every second call, but on the third it synchronizes again to a multiple of 8 ms. What I need is for read_handler to trigger every 8 ms (when the data arrives), but it only hits this sampling time on every second call (a total of 16 ms). This is crucial, since I am making computations and later feeding a command back to the robot (a control system). This code segment is not detailed with sending commands etc.; it contains only very basic data processing.
So, what might be causing these variations between calls, and how can I fix it?
I searched the net and Stack Overflow, but I couldn't find another timing issue like the one I am facing.
void read_handler(const boost::system::error_code &ec, std::size_t bytes_transferred)
{
    if (!ec)
    {
        thisLoopStart = clock();
        loopInstant[iterationNum] = diffclock(startTime, endLoopTime);
        std::cout << "Byte transfered: " << bytes_transferred << std::endl;
        printf("Byte transfered: %d", bytes_transferred);
        printf("Byte transfered: %d", bytes_transferred);
        printf("Byte transfered: %d\n", bytes_transferred);
        //std::cout << std::string(buffer.data(), bytes_transferred) << std::endl;
        char myTempDoubleTime[8];
        for (int j = 0; j < 1; j++)
        {
            for (int i = 0; i < 8; i++)
            {
                myTempDoubleTime[7-i] = buffer[4+i+8*j]; // 636
            }
            memcpy(&controllerTime[iterationNum], myTempDoubleTime, 8);
        }
        endLoopTime = clock();
        thisLoopDuration = diffclock(thisLoopStart, endLoopTime);
        loopTimes[iterationNum] = thisLoopDuration;
        if (iterationNum++ > 500)
        {
            //io_service.~io_service();
            //io_service.reset();
            //io_service.run();
            exitThread = 1;
            printf("Program terminates...\n");
            GlobalObjects::myFTSystem->StopAcquisition();
            for (int i = 1; i < iterationNum; i++)
                fprintf(LoopDurations, "%f\t%f\t%f\n", loopTimes[i], controllerTime[i], loopInstant[i]);
            fclose(LoopDurations);
            closeConnectionToServer();
            printf("Connection is closed...\n");
            io_service.stop();
        }
        sock.async_read_some(boost::asio::buffer(buffer), read_handler);
    }
}
If the incoming stream timing is controlled by the robot itself, then you shouldn't be worrying about trying to read specifically at such and such time. If you're expecting a burst of 812 bytes from the robot every X seconds, simply keep async_reading from your client socket. boost::asio will invoke your callback as soon as the read is complete.
As for your mysterious delay, try explicitly stating the size of your buffer in your call to async_read_some like so:
sock.async_read_some(boost::asio::buffer(buffer, 812), read_handler);
If you're sure you're always transmitting enough data to fill such a buffer, then this should cause your callback to be invoked consistently, because the buffer supplied to boost::asio is full. If this doesn't solve your problem, then do as sehe suggested and implement a deadline_timer that gives you finer time-based control over your asynchronous ops.
Edit
You should also be checking the bytes_transferred in your OnRead handler to ensure that you've made a complete read from the robot. Right now you're just printing it. You could have an incomplete read which means you should immediately start reading again repeatedly from the socket until you're sure you've consumed all of the data you're expecting. Otherwise, you're going to screw yourself up by trying to act on incomplete data, most likely failing there, then starting up another ::async_read assuming you're starting a clean new read when really you're just going to read old data you ignored and left on the socket, and begin fragmenting your reads.
This could explain why you're seeing inconsistent times that are both shorter and longer than your expected interval. Explicitly specifying buffer size and checking the bytes_transferred that you're passed in the handler will guarantee that you catch such a problem. Also look at docs for completion_condition types you can pass to ::asio such as ::transfer_exactly(num_bytes), but I'm not sure if those apply to async read ops.

Serial communication read not working all the time

I am writing a C++ application for half-duplex communication to download data from a device. The following is the class I am using for serial communication.
class CSerialCommHelper
{
    HANDLE m_pPortHandle;          // Handle to the COM port
    HANDLE m_hReadThread;          // Handle to the Read thread
    HANDLE m_hPortMutex;           // Handle to Port Mutex
    std::wstring m_strPortName;    // Port name
    COMMTIMEOUTS m_CommTimeouts;   // Communication Timeout Structure
    _DCB dcb;                      // Device Control Block
    DWORD m_dwThreadID;            // Thread ID
    string m_strBuffer;
public:
    CSerialCommHelper();
    HRESULT Open();
    HRESULT ConfigPort();
    static void * ReadThread(void *);
    HRESULT Write(const unsigned char *, DWORD);
    string GetFrameFromBuffer();
    HRESULT Close();
    ~CSerialCommHelper(void);
};
The ReadThread and Write functions are as follows:
void * CSerialCommHelper::ReadThread(void * pObj)
{
    CSerialCommHelper *pCSerialCommHelper = (CSerialCommHelper *)pObj;
    DWORD dwBytesTransferred = 0;
    DWORD byte = 0;
    while (pCSerialCommHelper->m_pPortHandle != INVALID_HANDLE_VALUE)
    {
        pCSerialCommHelper->m_strBuffer.clear();
        pCSerialCommHelper->m_usBufSize = 0;
        WaitForSingleObject(pCSerialCommHelper->m_hPortMutex, INFINITE);
        do
        {
            dwBytesTransferred = 0;
            ReadFile(pCSerialCommHelper->m_pPortHandle, &byte, 1, &dwBytesTransferred, NULL);
            if (dwBytesTransferred == 1)
            {
                pCSerialCommHelper->m_strBuffer.push_back((char)byte);
                pCSerialCommHelper->m_usBufSize++;
                continue;
            }
        } while ((dwBytesTransferred == 1) && (pCSerialCommHelper->m_pPortHandle != INVALID_HANDLE_VALUE));
        ReleaseMutex(pCSerialCommHelper->m_hPortMutex);
        Sleep(2);
    }
    ExitThread(0);
    return 0;
}
The Write function waits for the read thread to release the mutex and then writes the data to the port.
GetFrameFromBuffer is called by the application that uses CSerialCommHelper, and it returns the m_strBuffer string.
My problem is that whenever I try to download a huge amount of data, I lose some data frames.
I get a response from the device within 0.0468 to 0.1716 seconds.
After analysing different error scenarios, I came to know that it is not a timing problem, as other frames are downloaded at the same time interval.
The function that calls GetFrameFromBuffer calls it continuously until it gets a filled string.
It seems like these two statements should not be in your outer while loop:
pCSerialCommHelper->m_strBuffer.clear();
pCSerialCommHelper->m_usBufSize=0;
Your inner while loop reads bytes as long as they're immediately available, and the outer loop does a Sleep(2) the moment the inner loop doesn't give you a byte.
If you're waiting until an entire packet is available, it seems like you should keep looping until you get all the bytes, without clearing partway through the process.
I don't really know the ReadFile API, but I'm guessing that ReadFile might return 0 if there's no bytes immediately available, or at least available by whatever timeout you specified when opening the serial device.
ReleaseMutex(pCSerialCommHelper->m_hPortMutex);
Sleep(2);
That Sleep() call is hiding the real problem. It is never correct in threaded code, always a band-aid for a timing bug.
You certainly seem to have one, and that m_hPortMutex spells doom as well. If you do in fact have multiple threads trying to read from the serial port, then they are going to start fighting over that mutex. The outcome will be very poor: each thread will get a handful of bytes from the port. But clearly you want to read a frame of data. There is zero hope of gluing the handfuls of bytes that each thread gets back together into a frame; you've lost their sequence. So sleeping for a while seemed like a workaround: it injects delays that can give you a better shot at reading a frame. Usually, not always. You also wrote it in the wrong place.
This code is just broken. Delete the Sleep(). Do not exit the loop until you've read the entire frame.