I have an application which connects to an RTSP camera and processes some of the frames of video. Depending on the camera resolution and frame rate, I don't need to process all the frames and sometimes my processing takes a while. I've designed things so that when the frame is read, its passed off to a work queue for another thread to deal with. However, depending on system load/resolution/frame rate/network/file system/etc, I occasionally will find cases where the program doesn't keep up with the camera.
I've found that with ffmpeg(I'm using the latest git drop from mid october and running on windows) that being a couple seconds behind is fine and you keep getting the next frame, next frame, etc. However, once you get, say, 15-20 seconds behind that frames you get from ffmpeg occasionally have corruption. That is, what is returned as the next frame often has graphical glitches (streaking of the bottom of the frame, etc).
What I'd like to do is put in a check, somehow, to detect if I'm more than X frames behind the live stream and if so, flush the caches frames out and start fetching the latest/current frames.
My current snippet of my frame buffer reading thread (C++) :
while(runThread)
{
av_init_packet(&(newPacket));
int errorCheck = av_read_frame(context, &(newPacket));
if (errorCheck < 0)
{
// error
}
else
{
int frameFinished = 0;
int decodeCode = avcodec_decode_video2(ccontext, actualFrame, &frameFinished, &newPacket);
if (decodeCode <0)
{
// error
}
else
if (decodeCode == 0)
{
// no frame could be decompressed / decoded / etc
}
else
if ((decodeCode > 0) && (frameFinished))
{
// do my processing / copy the frame off for later processing / etc
}
else
{
// decoded some data, but frame was not finished...
// Save data and reconstitute the pieces somehow??
// Given that we free the packet, I doubt there is any way to use this partial information
}
av_free_packet(&(newPacket));
}
}
I've google'd and looked through the ffmpeg documents for some function I can call to flush things and enable me to catch up but I can't seem to find anything. This same sort of solution would be needed if you wanted to only occasionally monitor a video source(eg, if you only wanted to snag one frame per second or per minute). The only thing I could come up with is disconnecting from the camera and reconnecting. However, I still need a way to detect if the frames I am receiving are old.
Ideally, I'd be able to do something like this :
while(runThread)
{
av_init_packet(&(newPacket));
// Not a real function, but I'd like to do something like this
if (av_check_frame_buffer_size(context) > 30_frames)
{
// flush frame buffer.
av_frame_buffer_flush(context);
}
int errorCheck = av_read_frame(context, &(newPacket));
...
}
}
Related
I have created using the C API of ffmpeg a C++ application that reads frames from a file and writes them to a new file. Everything works fine, as long as I write immediately the frames to the output. In other words, the following structure of the program outputs the correct result (I put only the pseudocode for now, if needed I can also post some real snippets but the classes that I have created for handling the ffmpeg functionalities are quite large):
AVFrame* frame = av_frame_alloc();
int got_frame;
// readFrame returns 0 if file is ended, got frame = 1 if
// a complete frame has been extracted
while(readFrame(inputfile,frame, &got_frame)) {
if (got_frame) {
// I actually do some processing here
writeFrame(outputfile,frame);
}
}
av_frame_free(&frame);
The next step has been to parallelize the application and, as a consequence, frames are not written immediately after they are read (I do not want to go into the details of the parallelization). In this case problems arise: there is some flickering in the output, as if some frames get repeated randomly. However, the number of frames and the duration of the output video remains correct.
What I am trying to do now is to separate completely the reading from writing in the serial implementation in order to understand what is going on. I am creating a queue of pointers to frames:
std::queue<AVFrame*> queue;
int ret = 1, got_frame;
while (ret) {
AVFrame* frame = av_frame_alloc();
ret = readFrame(inputfile,frame,&got_frame);
if (got_frame)
queue.push(frame);
}
To write frames to the output file I do:
while (!queue.empty()) {
frame = queue.front();
queue.pop();
writeFrame(outputFile,frame);
av_frame_free(&frame);
}
The result in this case is an output video with the correct duration and number of frames that is only a repetition of the last 3 (I think) frames of the video.
My guess is that something might go wrong because of the fact that in the first case I use always the same memory location for reading frames, while in the second case I allocate many different frames.
Any suggestions on what could be the problem?
Ah, so I'm assuming that readFrame() is a wrapper around libavformat's av_read_frame() and libavcodec's avcodec_decode_video2(), is that right?
From the documentation:
When AVCodecContext.refcounted_frames is set to 1, the frame is
reference counted and the returned reference belongs to the caller.
The caller must release the frame using av_frame_unref() when the
frame is no longer needed.
and:
When
AVCodecContext.refcounted_frames is set to 0, the returned reference
belongs to the decoder and is valid only until the next call to this
function or until closing or flushing the decoder.
Obviously, from this it follows from this that you need to set AVCodecContext.refcounted_frames to 1. The default is 0, so my gut feeling is you need to set it to 1 and that will fix your problem. Don't forget to use av_fame_unref() on the pictures after use to prevent memleaks, and also don't forget to free your AVFrame in this loop if got_frame = 0 - again to prevent memleaks:
while (ret) {
AVFrame* frame = av_frame_alloc();
ret = readFrame(inputfile,frame,&got_frame);
if (got_frame)
queue.push(frame);
else
av_frame_free(frame);
}
(Or alternatively you could implement some cache for frame so you only realloc it if the previous object was pushed in the queue.)
There's nothing obviously wrong with your pseudocode. The problem almost certainly lies in how you lock the queue between threads.
Your memory allocation seems same to me. Do you maybe do something else in between reading and writing the frames?
Is queue the same queue in the routines that read and write the frames?
I'm developing an application in C++ with a GUI in OpenGL. I have a GUI with a background, some squared textures over the background and some text written next to the textures. When I run the program textures start to flash in blocks of n, and a string is printed on the gui in order to identify which block is flashing (I don't know how to better explain, is the P300Speller paradigm). I'm finding a problem due to flashes because the GUI doesn't update as long as I don't move mouse over the window! Can someone help me?
I'll try to post and explain some code. flashSquare is the function which makes textures flash. I use a boolean vector (showSquare) to indicate if the nth texture has to be normal or flashed (I load different textures) and a stack (flashHelperS) filled with integers which represent the stimulus sequence and the order of the flashing textures.
void flashSquare(int square){
if (firstTime){ // send stimulus command
sendToServer(START_STIMULUS);
fillStackS(); // Fill stack of stimuli
firstTime = false;
pausa = false;
}
if (pausa == false){
cout << " Stack size : " << flashHelperS.size() << endl;
if (!flashHelperS.empty()){ // If the stack is not empty
int tmp = flashHelperS.top(); // select the next texture to flash
flashHelperS.pop(); // and remove it from stack
showSquare[tmp] = !showSquare[tmp]; // change bool of current
// flashing texture
cout << showSquare[tmp] << endl;
if (showSquare[tmp]){
counterSquares[tmp]++;
sendToServer(tmp + 1);
}
square = tmp;
glutPostRedisplay(); // Redisplay to show current texture flashed
if (flashHelperS.size() >= 0) { //If the stack is not empty repeat
glutTimerFunc(interface->getFlashFrequency(), flashSquare, 0);
}
}
else{ // If the stack is empty reset
initCounterSquare();
sendToServer(END_STIMULUS);
pausa = true;
if (interface->getMaxSessions() == currentSession){
sendToServer(END_CALIBRATION);
drawOnScreen(); // Draw a string on the screen
firstTime = true;
}
else currentSession++;
}
}
}
For example if stack has 8 items and maxSessions is 3, it means that glutTimerFunc calls flashSquare 8 times, making all the 8 texture flash, than I send to server END_STIMULUS code and repeat this operation other 2 times (because maxSessions is 3). When server receives END_STIMULUS and it is ready to receive more data it sends a code to my client, RESTART_S. When I receive RESTART_S (stack is empty and currentSession < 3) I call flashSquare again to start a new train of stimuli and flash.
if (idFromClient[0] == RESTART_S) {
sendToServer(FLASH_S);
firstTime = true;
flashSquare(0);
}
When I call flashSquare manually the first time it works, but when I receive from server RESTART_S flashSquare is correctly called, indeed it sends to server START_STIMULS, firstTime = false, pausa = false, and it fills the stack. The execution of program goes on to the print of stack size and I can verify that all variables are correctly assigned (debugger shows that the function is called but it does nothing). The problem is that if I don't move mouse over the window with the textures, even if the window is focused, the execution doesn't go on.
I want to play real-time sounds responding with no appreciable lag to user interaction.
To have low latency I have to send small chunks of pcm data.
What I am doing:
QAudioFormat format;
format.setSampleRate(22050);
format.setChannelCount(1);
format.setSampleSize(16);
format.setCodec("audio/pcm");
format.setByteOrder(QAudioFormat::LittleEndian);
format.setSampleType(QAudioFormat::SignedInt);
QAudioDeviceInfo info(QAudioDeviceInfo::defaultOutputDevice());
if (!info.isFormatSupported(format)) {
qWarning()<<"raw audio format not supported by backend, cannot play audio.";
return;
}
qAudioOutput = new QAudioOutput(format, NULL);
qAudioDevice=qAudioOutput->start();
and later
void Enqueue(TYPESAMPLEPCM *data,int countBytes){
while(qAudioOutput->bytesFree()<countBytes){
Sleep(1);
}
qAudioDevice->write((char *)data,countBytes);
}
The chunks of data are 256 bytes (128 samples that would give "granularity" of around 6 milliseconds.
Enqueue is called from a loop in a thread with high priority that provides the data chunks. There's no latency there as the speed it calls Enqueue is by far faster than rendering the audio data.
But it looks to me there's a buffer underrun situation because the sound plays but with kind of a "crackling" regular noise.
If I raise the chunk size to 256 samples the problem almost disappears. Only some crackling at the beginning (?)
The platform is Windows and Qt 5.3.
Is that the right procedure or I am missing something?
The issue is about
void Enqueue(TYPESAMPLEPCM *data,int countBytes){
while(qAudioOutput->bytesFree()<countBytes){
Sleep(1);
}
qAudioDevice->write((char *)data,countBytes);
}
being a little naive.
First of all, Sleep(1);. You are on windows. The problem is that windows is not a realtime os, and is expected to have a time resolution around 10 - 15 ms. Which means when there is no place for incoming audio you are going to sleep lot more than you expect.
Second. Do you really need to sleep when audio output cannot consume the amount of data which was provided? What you really want to is to provide some audio after the audio output has consumed some. In concrete terms, it means :
Setting the QAudioOutput notify interval, ie the period at which the system will consume the audio data and tell you about it.
Getting notified about QAudioOutput consuming some data. Which is in a slot connected to QAudioOutput::notify()
Buffering data chunks which come from your high priority thread when audio output is full.
This give :
QByteArray samplebuffer;
//init code
{
qAudioOutput = new QAudioOutput(format, NULL);
...
qAudioOutput->setNotifyInterval(128); //play with this number
connect(qAudioOutput, SIGNAL(notify), someobject, SLOT(OnAudioNotify));
...
qAudioDevice=qAudioOutput->start();
}
void EnqueueLock(TYPESAMPLEPCM *data,int countBytes)
{
//lock mutex
samplebuffer.append((char *)data,countBytes);
tryWritingSomeSampleData();
//unlock mutex
}
//slot
void SomeClass::OnAudioNotify()
{
//lock mutex
tryWritingSomeSampleData()
//unlock mutex
}
void SomeClass::tryWritingSomeSampleData()
{
int towritedevice = min(qAudioOutput->bytesFree(), samplebuffer.size());
if(towritedevice > 0)
{
qAudioDevice->write(samplebuffer.data(),towritedevice);
samplebuffer.remove(0,towritedevice); //pop front what is written
}
}
As you see you need to protect samplebuffer from concurrent access. Provide adequate mutex.
I have a while loop which displays things on window using openGL, but the animation is too fast compared to how it runs on other computers, so I need something in the loop which will allow to display only 1/40 seconds after previous display, how do I do that? (I'm c++ noob)
You need to check the time at the beginning of you loop, check the time again at the end of the loop after you've finished all of your rendering and update logic and then Sleep() for the difference between the elapsed time and the target frame time (25ms for 40 fps).
This is some code I used in C++ with the SDL library. Basically you need to have a function to start a timer at the start of your loop (StartFpsTimer()) and a function to wait enough time till the next frame is due based on the constant frame rate that you want to have (WaitTillNextFrame()).
The m_oTimer object is a simple timer object that you can start, stop, pause.
GAME_ENGINE_FPS is the frame rate that you would like to have.
// Sets the timer for the main loop
void StartFpsTimer()
{
m_oTimer.Start();
}
// Waits till the next frame is due (to call the loop at regular intervals)
void WaitTillNextFrame()
{
if(this->m_oTimer.GetTicks() < 1000.0 / GAME_ENGINE_FPS) {
delay((1000.0 / GAME_ENGINE_FPS) - m_oTimer.GetTicks());
}
}
while (this->IsRunning())
{
// Starts the fps timer
this->StartFpsTimer();
// Input
this->HandleEvents();
// Logic
this->Update();
// Rendering
this->Draw();
// Wait till the next frame
this->WaitTillNextFrame();
}
I am trying to write C++ code which saves incoming video frames to disk. Asynchronously arriving frames are pushed onto queue by a producer thread. The frames are popped off the queue by a consumer thread. Mutual exclusion of producer and consumer is done using a mutex. However, I still notice frames being dropped. The dropped frames (likely) correspond to instances when producer tries to push the current frame onto queue but cannot do so since consumer holds the lock. Any suggestions ? I essentially do not want the producer to wait. A waiting consumer is okay for me.
EDIT-0 : Alternate idea which does not involve locking. Will this work ?
Producer initially enqueues n seconds worth of video. n can be some small multiple of frame-rate.
As long as queue contains >= n seconds worth of video, consumer dequeues on a frame by frame basis and saves to disk.
When the video is done, the queue is flushed to disk.
EDIT-1: The frames arrive at ~ 15 fps.
EDIT-2 : Outline of code :
Main driver code
// Main function
void LVD::DumpFrame(const IplImage *frame)
{
// Copies frame into internal buffer.
// buffer object is a wrapper around OpenCV's IplImage
Initialize(frame);
// (Producer thread) -- Pushes buffer onto queue
// Thread locks queue, pushes buffer onto queue, unlocks queue and dies
PushBufferOntoQueue();
// (Consumer thread) -- Pop off queue and save to disk
// Thread locks queue, pops it, unlocks queue,
// saves popped buffer to disk and dies
DumpQueue();
++m_frame_id;
}
void LVD::Initialize(const IplImage *frame)
{
if(NULL == m_buffer) // first iteration
m_buffer = new ImageBuffer(frame);
else
m_buffer->Copy(frame);
}
Producer
void LVD::PushBufferOntoQueue()
{
m_queingThread = ::CreateThread( NULL, 0, ThreadFuncPushImageBufferOntoQueue, this, 0, &m_dwThreadID);
}
DWORD WINAPI LVD::ThreadFuncPushImageBufferOntoQueue(void *arg)
{
LVD* videoDumper = reinterpret_cast<LVD*>(arg);
LocalLock ll( &videoDumper->m_que_lock, 60*1000 );
videoDumper->m_frameQue.push(*(videoDumper->m_buffer));
ll.Unlock();
return 0;
}
Consumer
void LVD::DumpQueue()
{
m_dumpingThread = ::CreateThread( NULL, 0, ThreadFuncDumpFrames, this, 0, &m_dwThreadID);
}
DWORD WINAPI LVD::ThreadFuncDumpFrames(void *arg)
{
LVD* videoDumper = reinterpret_cast<LVD*>(arg);
LocalLock ll( &videoDumper->m_que_lock, 60*1000 );
if(videoDumper->m_frameQue.size() > 0 )
{
videoDumper->m_save_frame=videoDumper->m_frameQue.front();
videoDumper->m_frameQue.pop();
}
ll.Unlock();
stringstream ss;
ss << videoDumper->m_saveDir.c_str() << "\\";
ss << videoDumper->m_startTime.c_str() << "\\";
ss << setfill('0') << setw(6) << videoDumper->m_frame_id;
ss << ".png";
videoDumper->m_save_frame.SaveImage(ss.str().c_str());
return 0;
}
Note:
(1) I cannot use C++11. Therefore, Herb Sutter's DDJ article is not an option.
(2) I found a reference to an unbounded single producer-consumer queue. However, the author(s) state that enqueue(adding frames) is probably not wait-free.
(3) I also found liblfds, a C-library but not sure if it will serve my purpose.
The queue cannot be the problem. Video frames arrive at 16 msec intervals, at worst. Your queue only needs to store a pointer to a frame. Adding/removing one in a thread-safe way can never take more than a microsecond.
You'll need to look for another explanation and solution. Video does forever present a fire-hose problem. Disk drives are not generally fast enough to keep up with an uncompressed video stream. So if your consumer cannot keep up with the producer then something is going go give. With a dropped frame the likely outcome when you (correctly) prevent the queue from growing without bound.
Be sure to consider encoding the video. Real-time MPEG and AVC encoders are available. After they compress the stream you should not have a problem keeping up with the disk.
Circular buffer is definitely a good alternative. If you make it use a 2^n size, you can also use this trick to update the pointers:
inline int update_index(int x)
{
return (x + 1) & (size-1);
}
That way, there is no need to use expensive compare (and consequential jumps) or divide (the single most expensive integer operation in any processor - not counting "fill/copy large chunks of memory" type operations).
When dealing with video (or graphics in general) it is essential to do "buffer management". Typically, this is a case of tracking state of the "framebuffer" and avoiding to copy content more than necessary.
The typical approach is to allocate 2 or 3 video-buffers (or frame buffers, or what you call it). A buffer can be owned by either the producer or the consumer. The transfer is ONLY the ownership. So when the video-driver signals that "this buffer is full", the ownership is now with the consumer, that will read the buffer and store it to disk [or whatever]. When the storing is finished, the buffer is given back ("freed") so that the producer can re-use it. Copying the data out of the buffer is expensive [takes time], so you don't want to do that unless it's ABSOLUTELY necessary.