ALSA: cannot recovery from underrun, prepare failed: Broken pipe - c++

I'm writing a program that reads from two mono ALSA devices and writes them to one stereo ALSA device.
I use three threads and a ping-pong buffer to manage them: two reading threads and one writing thread. The device configurations are as follows:
// Capture ALSA device
alsaBufferSize = 16384;
alsaCaptureChunkSize = 4096;
bitsPerSample = 16;
samplingFrequency = 24000;
numOfChannels = 1;
block = true;
// Playback device (only list params that are different from above)
alsaBufferSize = 16384 * 2;
numOfChannels = 2;
Each reading thread writes the ping buffer and then the pong buffer. The writing thread waits until either buffer is ready, locks it, reads from it, and then unlocks it.
But when I run this program, an xrun occurs and cannot be recovered.
ALSA lib pcm.c:7316:(snd_pcm_recover) underrun occurred
ALSA lib pcm.c:7319:(snd_pcm_recover) cannot recovery from underrun, prepare failed: Broken pipe
Below is my code for writing to the ALSA playback device:
bool CALSAWriter::writen(uint8_t** a_pOutputBuffer, uint32_t a_rFrames)
{
    bool ret = false;
    // 1. write audio chunk to ALSA
    const snd_pcm_uframes_t alsaCaptureChunkSize = static_cast<snd_pcm_uframes_t>(a_rFrames); //(m_pALSACfg->alsaCaptureChunkSize);
    const snd_pcm_sframes_t writenFrames = snd_pcm_writen(m_pALSAHandle, (void**)a_pOutputBuffer, alsaCaptureChunkSize);
    if (0 < writenFrames)
    {// write succeeded
        ret = true;
    }
    else
    {// write failed
        logPrint("CALSAWriter WRITE FAILED for writen farmes = %ld ", writenFrames);
        ret = false;
        const int alsaWriteError = static_cast<int>(writenFrames);// alsa error is of int type
        if (ALSA_OK == snd_pcm_recover(m_pALSAHandle, alsaWriteError, 0))
        {// recovery succeeded
            // only recovery was done, no write at all was done
            // (a_rFrames is passed by value, so the caller never sees this)
            a_rFrames = 0;
        }
        else
        {// recovery failed
            logPrint("CALSAWriter: failed to recover from ALSA write error: %s (%i)", snd_strerror(alsaWriteError), alsaWriteError);
            ret = false;
        }
    }
    // 2. check current buffer load
    snd_pcm_sframes_t framesInBuffer = 0;
    snd_pcm_sframes_t delayedFrames = 0;
    const int availError = snd_pcm_avail_delay(m_pALSAHandle, &framesInBuffer, &delayedFrames);
    if (availError < 0)
    {// e.g. in the XRUN state: the two output values are not valid
        logPrint("write: snd_pcm_avail_delay failed: %s (%i)", snd_strerror(availError), availError);
        return ret;
    }
    // integer division truncates; cast is safe, buffer size is no bigger than uint32_t
    const int32_t ONE_HUNDRED_PERCENTS = 100;
    const uint32_t bufferLoadInPercents = ONE_HUNDRED_PERCENTS *
        static_cast<int32_t>(framesInBuffer) / static_cast<int32_t>(m_pALSACfg->alsaBufferSize);
    logPrint("write: ALSA buffer percentage: %u, delayed frames: %ld", bufferLoadInPercents, delayedFrames);
    return ret;
}
Other diagnostic info:
02:53:00.465047 log info V 1 [write: ALSA buffer percentage: 75, delayed frames: 4096]
02:53:00.635758 log info V 1 [write: ALSA buffer percentage: 74, delayed frames: 4160]
02:53:00.805714 log info V 1 [write: ALSA buffer percentage: 74, delayed frames: 4152]
02:53:00.976781 log info V 1 [write: ALSA buffer percentage: 74, delayed frames: 4144]
02:53:01.147948 log info V 1 [write: ALSA buffer percentage: 0, delayed frames: 0]
02:53:01.317113 log error V 1 [CALSAWriter WRITE FAILED for writen farmes = -32 ]
02:53:01.317795 log error V 1 [CALSAWriter: failed to recover from ALSA write error: Broken pipe (-32)]
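A note on reading the "buffer percentage" lines above: for a playback stream, the avail value reported by snd_pcm_avail_delay() is the number of frames that can be written, i.e. the free space, so a high percentage means the buffer is mostly empty rather than mostly full. The function also returns a negative error code on failure (for example once the stream is in the XRUN state), and in that case its outputs should not be trusted; an unchecked call could leave the initial 0/0 values in place, which would be consistent with the last "0, 0" line. A minimal sketch of a fill-level calculation, assuming both arguments are in frames (playbackFillPercent is a hypothetical helper, not an ALSA function):

```cpp
#include <cstdint>

// Hypothetical helper: percentage of a PLAYBACK ring buffer that holds queued audio,
// given ALSA's avail (free frames) and the buffer size, both in frames.
// Integer division truncates toward zero.
uint32_t playbackFillPercent(long availFrames, long bufferSizeFrames)
{
    if (bufferSizeFrames <= 0) {
        return 0; // avoid division by zero on a bad configuration
    }
    long filledFrames = bufferSizeFrames - availFrames;
    if (filledFrames < 0) {
        filledFrames = 0; // clamp defensively in case avail exceeds the buffer size
    }
    if (filledFrames > bufferSizeFrames) {
        filledFrames = bufferSizeFrames; // clamp if avail is reported as negative
    }
    return static_cast<uint32_t>(100 * filledFrames / bufferSizeFrames);
}
```

The delay value, by contrast, already counts frames still waiting to be played, so it can be logged as-is.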

It took me about 3 days to find the solution. Thanks to @CL. for the tip that "writen is called too late".
Thread switching time is not constant.
Insert an empty (silent) buffer before invoking "writen" for the first time. Its duration can be any value long enough to absorb the thread-switching delay; I set it to 150ms.
Alternatively, you can raise the thread priority, although I couldn't do this in my setup. See "ALSA: Ways to prevent underrun for speaker".
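A minimal sketch of building that silent prefill, assuming the playback stream uses signed 16-bit samples (so silence is all-zero samples) and non-interleaved access, which is what the snd_pcm_writen() call in the question implies. SilentPlanes and makeSilence are hypothetical names; only the pointer array is meant to be handed to ALSA.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: `frames` frames of S16 silence, one plane per channel,
// laid out for a non-interleaved stream (the layout snd_pcm_writen() expects).
struct SilentPlanes {
    std::size_t frames = 0;
    std::vector<std::vector<int16_t>> planes; // zero-filled, one per channel

    // Per-channel pointers into THIS object's planes; call again after copying.
    std::vector<void*> pointers()
    {
        std::vector<void*> ptrs;
        for (auto& plane : planes) {
            ptrs.push_back(plane.data());
        }
        return ptrs;
    }
};

// Hypothetical helper: `ms` milliseconds of silence at `rateHz` for `channels` channels.
SilentPlanes makeSilence(unsigned rateHz, unsigned channels, unsigned ms)
{
    SilentPlanes s;
    s.frames = static_cast<std::size_t>(rateHz) * ms / 1000;
    s.planes.assign(channels, std::vector<int16_t>(s.frames, 0));
    return s;
}
```

At 24000 Hz, 150ms is 3600 frames per channel. Before the first real write, something like auto silence = makeSilence(24000, 2, 150); auto ptrs = silence.pointers(); snd_pcm_writen(handle, ptrs.data(), silence.frames); would queue that cushion, and its return value should be checked like any other write.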
Problem diagnosis:
What actually happens:
"readi" returns every 171ms (4096/24000 ≈ 0.171 s), and the reading thread then marks the buffer as ready.
Once a buffer is ready, "writen" is invoked in the writing thread and the buffer is copied to the ALSA playback device. It then takes the playback device 171ms to play that part of the buffer.
If the playback device finishes playing everything in its buffer before new data is written, an underrun occurs.
The real scenario here:
At 0ms, "readi" starts. At 171ms "readi" finishes.
At 172ms (1ms for thread switching), "writen" starts. At 343ms an underrun will happen if no new buffer has been written.
At 171ms, "readi" starts again. At 342ms "readi" finishes.
This time, thread switching takes 2ms, so "writen" starts at 344ms, but the underrun has already occurred at 343ms.
When CPU load is high, there is no guarantee how long thread switching takes. That's why inserting an empty buffer on the first write helps. It turns the scenario into:
At 0ms, "readi" starts. At 171ms "readi" finishes.
At 172ms (1ms for thread switching), "writen" starts with a 150ms-long buffer. At 493ms (172 + 150 + 171) an underrun will happen if no new buffer has been written.
At 171ms, "readi" starts again. At 342ms "readi" finishes.
This time, thread switching takes 50ms, so "writen" starts at 392ms, well before 493ms, and no underrun occurs at all.
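The two timelines can be reproduced with a toy model. This is a simplification, not ALSA's actual behaviour: writes are treated as instantaneous, playback is assumed to start at the first write, and the chunk length is rounded to 171ms as in the text. firstUnderrun is a hypothetical helper.

```cpp
#include <cstddef>
#include <vector>

// Toy model of the timelines above; all times in milliseconds.
// Read i finishes at (i + 1) * chunkMs and write i starts switchMs[i] later.
// Returns the index of the first write that arrives after the queued audio
// has run out, or -1 if no underrun occurs.
int firstUnderrun(double chunkMs, double prefillMs, const std::vector<double>& switchMs)
{
    double playedUntil = 0.0; // time at which the queued audio runs out
    for (std::size_t i = 0; i < switchMs.size(); ++i) {
        const double writeStart = (i + 1) * chunkMs + switchMs[i];
        if (i == 0) {
            // First write: optional silent prefill plus the first chunk.
            playedUntil = writeStart + prefillMs + chunkMs;
        } else if (writeStart > playedUntil) {
            return static_cast<int>(i); // the device ran dry before this write
        } else {
            playedUntil += chunkMs;     // one more chunk queued in time
        }
    }
    return -1;
}
```

For example, firstUnderrun(171, 0, {1, 2}) reports the second write (index 1) as late, while firstUnderrun(171, 150, {1, 50}) reports no underrun. In this model a later write is on time as long as its switching delay exceeds the first one by no more than the prefill, so the cushion only has to cover the worst-case jitter.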


Higher Than Expected Inherent Latency in OpenAL Playback (Windows, C++)

TL;DR: A simple echo program that records and plays back audio immediately shows higher-than-expected latency.
I am working on a real-time audio broadcasting application. I have decided to use OpenAL to both capture and play back audio samples. I am planning on sending UDP packets with raw PCM data across a LAN. My ideal latency between recording on one machine and playing back on another machine is 30ms (a lofty goal).
As a test, I made a small program that records samples from a microphone and immediately plays them back to the host speakers. I did this to test the baseline latency.
However, I'm seeing an inherent latency of about 65 - 70 ms from simply recording the audio and playing it back.
I have reduced the buffer size that OpenAL uses to 100 samples at 44100 samples per second. Ideally, this would yield a latency of 2 - 3 ms.
I have yet to try this on another platform (macOS / Linux) to determine whether this is an OpenAL issue or a Windows issue.
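For reference, the 2 - 3 ms estimate is just frames divided by rate. framesToMs is a hypothetical helper; the queue figure is only an upper bound for the OpenAL source queue, assuming every queued buffer holds 100 samples, and it ignores any buffering in the driver or the Windows audio engine.

```cpp
// Hypothetical helper: duration of `frames` sample frames at `rateHz`, in milliseconds.
constexpr double framesToMs(double frames, double rateHz)
{
    return 1000.0 * frames / rateHz;
}

// One 100-sample chunk at 44100 Hz: about 2.27 ms.
constexpr double kChunkMs = framesToMs(100, 44100);
// At most 10 such buffers queued on the source at once: about 22.7 ms.
constexpr double kQueueMs = framesToMs(100 * 10, 44100);
```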
Here's the code:
#include <AL/al.h>
#include <AL/alc.h>
#include <list>

using std::list;

#define FREQ 44100   // Sample rate
#define CAP_SIZE 100 // How much to capture at a time (affects latency)
#define NUM_BUFFERS 10

int main(int argC, char* argV[])
{
    list<ALuint> bufferQueue; // A quick and dirty queue of buffer objects
    ALuint helloBuffer[NUM_BUFFERS], helloSource;

    ALCdevice* audioDevice = alcOpenDevice(NULL); // Request default audio device
    ALCcontext* audioContext = alcCreateContext(audioDevice, NULL); // Create the audio context
    alcMakeContextCurrent(audioContext); // the al* calls below need a current context

    // Request the default capture device with a ~2ms buffer
    ALCdevice* inputDevice = alcCaptureOpenDevice(NULL, FREQ, AL_FORMAT_MONO16, CAP_SIZE);

    alGenBuffers(NUM_BUFFERS, &helloBuffer[0]); // Create some buffer-objects

    // Queue our buffers onto an STL list
    for (int ii = 0; ii < NUM_BUFFERS; ++ii) {
        bufferQueue.push_back(helloBuffer[ii]);
    }

    alGenSources(1, &helloSource); // Create a sound source

    short* buffer = new short[CAP_SIZE]; // A buffer to hold captured audio
    ALCint samplesIn = 0;           // How many samples are captured
    ALint availBuffers = 0;         // Buffers to be recovered
    ALuint myBuff;                  // The buffer we're using
    ALuint buffHolder[NUM_BUFFERS]; // An array to catch the unqueued buffers

    alcCaptureStart(inputDevice); // Begin capturing

    bool done = false;
    while (!done) { // Main loop
        // Poll for recoverable buffers
        alGetSourcei(helloSource, AL_BUFFERS_PROCESSED, &availBuffers);
        if (availBuffers > 0) {
            alSourceUnqueueBuffers(helloSource, availBuffers, buffHolder);
            for (int ii = 0; ii < availBuffers; ++ii) {
                // Push the recovered buffers back on the queue
                bufferQueue.push_back(buffHolder[ii]);
            }
        }
        // Poll for captured audio
        alcGetIntegerv(inputDevice, ALC_CAPTURE_SAMPLES, 1, &samplesIn);
        if (samplesIn >= CAP_SIZE) {
            // Grab the sound: at most CAP_SIZE samples, so `buffer` cannot overflow
            alcCaptureSamples(inputDevice, buffer, CAP_SIZE);
            // Stuff the captured data in a buffer-object
            if (!bufferQueue.empty()) { // We just drop the data if no buffers are available
                myBuff = bufferQueue.front(); bufferQueue.pop_front();
                alBufferData(myBuff, AL_FORMAT_MONO16, buffer, (ALsizei)(CAP_SIZE * sizeof(short)), FREQ);
                // Queue the buffer
                alSourceQueueBuffers(helloSource, 1, &myBuff);
                // Restart the source if needed
                // (if we take too long and the queue dries up,
                //  the source stops playing).
                ALint sState = 0;
                alGetSourcei(helloSource, AL_SOURCE_STATE, &sState);
                if (sState != AL_PLAYING) {
                    alSourcePlay(helloSource);
                }
            }
        }
    }

    // Stop capture
    alcCaptureStop(inputDevice);
    alcCaptureCloseDevice(inputDevice);

    // Stop the sources
    alSourceStopv(1, &helloSource);
    alSourcei(helloSource, AL_BUFFER, 0);

    // Clean-up
    alDeleteSources(1, &helloSource);
    alDeleteBuffers(NUM_BUFFERS, &helloBuffer[0]);
    delete[] buffer;
    alcMakeContextCurrent(NULL);
    alcDestroyContext(audioContext);
    alcCloseDevice(audioDevice);

    return 0;
}
Here is an image of the waveform showing the delay between the input sound and the resulting echo. This example shows a latency of about 70ms.
[Image: waveform with 70ms echo delay]
System specs:
Intel Core i7-9750H
24 GB Ram
Windows 10 Home: V 2004 - Build 19041.508
Sound driver: Realtek Audio (Driver version 10.0.19041.1)
Input device: Logitech G330 Headset
Issue is reproducible on other Windows Systems.
I tried to use PortAudio to do a similar thing and achieved a similar result. I concluded that this is caused by the Windows audio drivers.
I rebuilt PortAudio with ASIO audio only and installed the ASIO4ALL audio driver. This achieved an acceptable latency of <10ms.
I ultimately resolved this issue by ditching OpenAL in favor of PortAudio and Steinberg ASIO. I installed ASIO4ALL and rebuilt PortAudio to accept only ASIO device drivers, which required the ASIO SDK from Steinberg (I followed the guide here). This allowed me to achieve a latency of between 5 and 10 ms.
This post helped a lot: input delay with PortAudio callback and ASIO sdk

Using FTDI D2xx and Thorlabs APT communication protocol results in delays on Linux

I am trying to communicate with Thorlabs TDC001 controllers (APT DC servo controller) using the FTDI D2XX driver on Linux. However, when I send write commands, large delays (1-2 seconds) occur before the command is actually executed on the TDC001.
This is particularly visible when the connected linear stage is moving and a new position command is sent: it takes 1-2 seconds until the stage actually changes direction. Also, if I request DCSTATUSUPDATE (which gives position and velocity) and then read out the FTDI queue, I do not get the right data. Only if I wait 1 second between the request and the read do I get correct data, and even then it describes the past rather than the current state. I added the C++ code for this case.
I need live data and faster execution of write commands for closed-loop control.
I'm not sure whether the problem is on the Thorlabs side or the FTDI side. Everything works except for the large delays. Other commands, e.g. MOVE_STOP, respond immediately. Also, if I send a new position command right after homing finishes, it is executed immediately.
Whenever I call FT_GetStatus, there is nothing in the Tx or Rx queue except the 20 bytes in Rx for DCSTATUSUPDATE.
The references for D2XX and APT communication protocol can be found here:
FTDI Programmer's Guide
Thorlabs APT Communication Protocol
The initialization function:
bool CommunicationFunctions::initializeKeyHandle(string serialnumber){
    /*
     * This function initializes the TDC motor controller and finds its corresponding keyhandle.
     */
    keyHandle = NULL;
    // To open the device, the vendor and product ID must be set correctly
    ftStatus = FT_SetVIDPID(0x403,0xfaf0);
    //Open device:
    const char* tmp = serialnumber.c_str();
    int numAttempts=0;
    while (keyHandle ==0){
        ftStatus = FT_OpenEx(const_cast<char*>(tmp),FT_OPEN_BY_SERIAL_NUMBER, &keyHandle);
        if (numAttempts++>20){
            cerr << "Device Could Not Be Opened";
            return false;
        }
    }
    // Set baud rate to 115200
    ftStatus = FT_SetBaudRate(keyHandle,115200);
    // 8 data bits, 1 stop bit, no parity
    ftStatus = FT_SetDataCharacteristics(keyHandle, FT_BITS_8, FT_STOP_BITS_1, FT_PARITY_NONE);
    // Pre purge dwell 50ms.
    usleep(50000);
    // Purge the device.
    ftStatus = FT_Purge(keyHandle, FT_PURGE_RX | FT_PURGE_TX);
    // Post purge dwell 50ms.
    usleep(50000);
    // Reset device.
    ftStatus = FT_ResetDevice(keyHandle);
    // Set flow control to RTS/CTS.
    ftStatus = FT_SetFlowControl(keyHandle, FT_FLOW_RTS_CTS, 0, 0);
    // Set RTS.
    ftStatus = FT_SetRts(keyHandle);
    return true;
}
How I read out my data:
bool CommunicationFunctions::read_tdc(int32_t* position, uint16_t* velocity){
    uint8_t RxBuffer[256] = {}; // stack buffer, so the early returns below cannot leak it
    DWORD RxBytes = 0;
    DWORD BytesReceived = 0;
    DWORD written = 0;
    // Manually request status update:
    uint8_t req_statusupdate[6] = {0x90,0x04,0x01,0x00,0x50,0x01};
    ftStatus = FT_Write(keyHandle, req_statusupdate, (DWORD)6, &written);
    if(ftStatus != FT_OK){
        cerr << "Command could not be transmitted: Request status update" << endl;
        return false;
    }
    // sleep(1); //**this sleep() would lead to right result, but I don't want this delay**
    // Get number of bytes in queue of TDC001
    ftStatus = FT_GetQueueStatus(keyHandle, &RxBytes);
    // Check if there are bytes in queue before reading them, otherwise do
    // not read anything in
    if(ftStatus == FT_OK && RxBytes > 0){
        const DWORD toRead = RxBytes < 256 ? RxBytes : 256;
        ftStatus = FT_Read(keyHandle, RxBuffer, toRead, &BytesReceived);
    }
    if(ftStatus != FT_OK){
        cerr << "Read device failed!" << endl;
        return false;
    }
    // Check if enough bytes are received, i.e. if signal is right
    if(!(BytesReceived >= 6)){
        cerr << "Error in bytes received" << endl;
        return false;
    }
    // Look for correct message in RxBuffer and read out velocity and position
    // (parsing code not shown)
    return true;
}
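The parsing step is not shown above. A hedged sketch, assuming the reply is MGMSG_MOT_GET_DCSTATUSUPDATE (ID 0x0491, sent as 0x91 0x04) with a 6-byte header followed by 14 data bytes: channel (16-bit), position (signed 32-bit), velocity (16-bit), reserved (16-bit) and status bits (32-bit), all little-endian. Please check this layout against the APT Communication Protocol document before relying on it; the 20-byte total matches the 20 bytes seen in the Rx queue. parseDcStatusUpdate is a hypothetical helper, and it keeps the last complete message it finds in case several replies have queued up.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper. Assumed layout (verify against the APT protocol docs):
//   header: 0x91 0x04 0x0E 0x00 <dest|0x80> <source>
//   data:   chan(u16) position(i32) velocity(u16) reserved(u16) status(u32)
// All multi-byte fields little-endian. Keeps the LAST match in the buffer.
bool parseDcStatusUpdate(const uint8_t* buf, std::size_t len,
                         int32_t* position, uint16_t* velocity)
{
    const std::size_t kHeaderLen = 6;
    const std::size_t kMsgLen = kHeaderLen + 14;
    bool found = false;
    std::size_t i = 0;
    while (i + kMsgLen <= len) {
        const bool isStatus = buf[i] == 0x91 && buf[i + 1] == 0x04 &&
                              buf[i + 2] == 0x0E && buf[i + 3] == 0x00;
        if (!isStatus) {
            ++i; // not a status message here: resynchronise byte by byte
            continue;
        }
        const uint8_t* d = buf + i + kHeaderLen; // start of the data packet
        const uint32_t rawPos = uint32_t(d[2]) | (uint32_t(d[3]) << 8) |
                                (uint32_t(d[4]) << 16) | (uint32_t(d[5]) << 24);
        std::memcpy(position, &rawPos, sizeof(rawPos)); // reinterpret as signed
        *velocity = static_cast<uint16_t>(d[6] | (d[7] << 8));
        found = true;
        i += kMsgLen; // continue after this message
    }
    return found;
}
```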

VIDIOC_DQBUF hangs on camera disconnection

My application uses v4l2 running in a separate thread. If a camera gets disconnected, the user is given an appropriate message before the thread terminates cleanly. This works in the vast majority of cases. However, if execution is inside the VIDIOC_DQBUF ioctl when the camera is disconnected, the ioctl never returns, causing the entire thread to lock up.
My system is as follows:
Linux Kernel: 4.12.0
OS: Fedora 25
Compiler: gcc-7.1
The following is a simplified example of the problem function.
// Get Raw Buffer from the camera
void v4l2_Processor::get_Raw_Frame(void* buffer)
{
    struct v4l2_buffer buf;
    memset(&buf, 0, sizeof (buf));
    buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;

    // Grab next frame
    if (ioctl(m_FD, VIDIOC_DQBUF, &buf) < 0)
    {   // If the camera becomes disconnected when the execution is
        // in the above ioctl, then the ioctl never returns.
        std::cerr << "Error in DQBUF\n";
        return; // buf.index is not valid here
    }

    // Copy the frame out before re-queuing: after VIDIOC_QBUF the driver may refill this buffer
    memcpy(buffer, m_Buffers[buf.index].buff, ...);

    // Queue for next frame
    if (ioctl(m_FD, VIDIOC_QBUF, &buf) < 0)
    {
        std::cerr << "Error in QBUF\n";
    }
}
Can anybody shed any light on why this ioctl locks up and what I might do to solve this problem?
I appreciate any help offered.
I am currently having the same issue. However, my entire thread doesn't lock up: the ioctl times out (15s), but that's way too long.
Is there a way to query V4L2 (without hanging) whether video is streaming, or at least to change the ioctl timeout?
@Amanda you can change the dequeue timeout in the v4l2_capture driver source and rebuild the kernel/kernel module.
Modify the timeout in the dequeue function:
if (!wait_event_interruptible_timeout(cam->enc_queue,
cam->enc_counter != 0,
50 * HZ)) // Modify this constant
Best of luck!
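An alternative that avoids rebuilding the kernel: open the device with O_NONBLOCK (with that flag, VIDIOC_DQBUF returns EAGAIN instead of blocking when no buffer is ready) and wait on the file descriptor with poll() and a timeout of your choosing before each dequeue. A minimal sketch; waitForFrame is a hypothetical helper, and whether a disconnected camera reports POLLERR/POLLHUP depends on the driver, so the timeout is the backstop.

```cpp
#include <cerrno>
#include <poll.h>
#include <unistd.h> // read()/write()/close() for the caller's fd handling

enum class WaitResult { Ready, Timeout, Error };

// Hypothetical helper: wait up to timeoutMs for `fd` to become readable.
// Meant to be called before ioctl(fd, VIDIOC_DQBUF, ...) on a device opened
// with O_NONBLOCK, so the capture thread never blocks without a bound.
WaitResult waitForFrame(int fd, int timeoutMs)
{
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    int rc;
    do {
        rc = poll(&pfd, 1, timeoutMs); // restart if interrupted by a signal
    } while (rc < 0 && errno == EINTR);

    if (rc < 0) {
        return WaitResult::Error;
    }
    if (rc == 0) {
        return WaitResult::Timeout; // caller decides whether the camera is gone
    }
    if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL)) {
        return WaitResult::Error;   // device error, hang-up or invalid fd
    }
    return WaitResult::Ready;       // a buffer should be ready to dequeue
}
```

In the capture loop, a Timeout or Error result lets the thread report the disconnection and exit instead of sitting inside VIDIOC_DQBUF.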

Does v4l2 camera capture with mmap ring buffer make sense for tracking application

I'm working with the V4L2 API to capture images from a raw sensor on an embedded platform. My capture routine is based on the example in [1]. The proposed streaming method uses mmapped buffers as a ring buffer.
During initialization, buffers (4 by default) are requested with the VIDIOC_REQBUFS ioctl and then queued with VIDIOC_QBUF. The entire streaming procedure is described in [2]. As soon as streaming starts, the driver fills the queued buffers with data. The timestamp in the v4l2_buffer struct indicates when the first byte was captured, which in my case gives an interval of approximately 8.3 ms (120 fps) between buffers. So far so good.
What I would expect of a ring buffer is that new captures automatically overwrite older ones in a circular fashion, but that is not what happens. A new frame is assigned to a buffer only when that buffer is queued again (VIDIOC_QBUF) after it has been dequeued (VIDIOC_DQBUF) and processed (demosaicing, tracking step, ...). Even when I meet the timing condition (processing < 8.3 ms), dequeuing doesn't give me the latest captured frame but the oldest one (FIFO order), i.e. the one captured 3 × 8.3 ms before the current one. If the timing condition is not met, the gap grows even larger, because the buffers are not overwritten.
So I have several questions:
1. Does it even make sense for this tracking application to have a ring buffer, given that I don't really need a history of frames? I doubt it, but with the proposed mmap method most drivers require a minimum number of buffers to be requested.
2. Should a separate thread continuously call DQBUF and QBUF so that the buffers get overwritten? How could this be accomplished?
3. As a workaround, one could probably dequeue and requeue all buffers on every capture, but this doesn't sound right. Could someone with more experience in real-time capture and streaming point to the "proper" way to do this?
4. Currently, I'm doing the preprocessing step (demosaicing) between DQBUF and QBUF, and the tracking step afterwards. Should the tracking step also be executed before QBUF is called again?
So the main code basically calls Capture() and Track() sequentially in a while loop. The Capture routine looks as follows:
cv::Mat v4l2Camera::Capture( size_t timeout ) {
    fd_set fds;
    FD_ZERO(&fds); // fd_set must be cleared before FD_SET
    FD_SET(mFD, &fds);
    struct timeval tv;
    tv.tv_sec = 0;
    tv.tv_usec = 0;
    const bool threaded = true; //false;
    // proper register settings
    if( timeout > 0 )
    {
        tv.tv_sec = timeout / 1000;
        tv.tv_usec = (timeout - (tv.tv_sec * 1000)) * 1000;
    }
    const int result = select(mFD + 1, &fds, NULL, NULL, &tv);
    if( result == -1 )
    {
        //if (EINTR == errno)
        printf("v4l2 -- select() failed (errno=%i) (%s)\n", errno, strerror(errno));
        return cv::Mat();
    }
    else if( result == 0 )
    {
        if( timeout > 0 )
            printf("v4l2 -- select() timed out...\n");
        return cv::Mat(); // timeout, not necessarily an error (TRY_AGAIN)
    }
    // dequeue input buffer from V4L2
    struct v4l2_buffer buf;
    memset(&buf, 0, sizeof(v4l2_buffer));
    buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;
    if( xioctl(mFD, VIDIOC_DQBUF, &buf) < 0 )
    {
        printf("v4l2 -- ioctl(VIDIOC_DQBUF) failed (errno=%i) (%s)\n", errno, strerror(errno));
        return cv::Mat();
    }
    if( buf.index >= mBufferCountMMap )
    {
        printf("v4l2 -- invalid mmap buffer index (%u)\n", buf.index);
        return cv::Mat();
    }
    // emit ringbuffer entry
    printf("v4l2 -- received %ux%u video frame (index=%u)\n", mWidth, mHeight, (uint32_t)buf.index);
    void* image_ptr = mBuffersMMap[buf.index].ptr;
    // frame processing (& tracking step)
    cv::Mat demosaic_mat = demosaic(image_ptr,mSize,mDepth,1);
    // re-queue buffer to V4L2
    if( xioctl(mFD, VIDIOC_QBUF, &buf) < 0 )
        printf("v4l2 -- ioctl(VIDIOC_QBUF) failed (errno=%i) (%s)\n", errno, strerror(errno));
    return demosaic_mat;
}
As my knowledge is limited regarding capturing and streaming video I appreciate any help.
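On questions 2 and 3, one common pattern (sketched here under stated assumptions, not the only option) is to open the device with O_NONBLOCK and, once select() reports readiness, dequeue every filled buffer that is available, immediately re-queue all but the newest, and process only the newest. The driver keeps filling the re-queued buffers, and the frame handed to tracking is the most recent one available. dequeue and requeue are hypothetical callbacks standing in for VIDIOC_DQBUF (returning nothing on EAGAIN) and VIDIOC_QBUF; the returned index still has to be re-queued by the caller after processing.

```cpp
#include <functional>
#include <optional>

// Hypothetical "latest frame" dequeue. dequeue() returns the index of a filled
// buffer, or std::nullopt when none is left (EAGAIN in non-blocking mode).
// Every older buffer is handed straight back to the driver via requeue().
std::optional<unsigned> dequeueLatest(const std::function<std::optional<unsigned>()>& dequeue,
                                      const std::function<void(unsigned)>& requeue)
{
    std::optional<unsigned> newest;
    while (std::optional<unsigned> idx = dequeue()) {
        if (newest) {
            requeue(*newest); // an older frame: give it back to the driver
        }
        newest = idx;
    }
    return newest; // caller processes this buffer, then re-queues it
}
```

With this approach the number of requested buffers mainly sets how many frames the driver can capture while one is being processed, rather than how stale the processed frame is.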

C++/C FFmpeg artifact build up across video frames

I am building a recorder for capturing video and audio in separate threads (using Boost thread groups) using FFmpeg 2.8.6 on Ubuntu 16.04. I followed the demuxing_decoding example here:
Video capture specifics:
I am reading H264 from a Logitech C920 webcam and writing the video to a raw file. The issue I notice with the video is that artifacts seem to build up across frames until a particular frame resets them. Here are my frame-grabbing and decoding functions:
// Used for injecting decoding functions for different media types, allowing
// for a generic decode loop
typedef std::function<int(AVPacket*, int*, int)> PacketDecoder;

/**
 * Decodes a video packet.
 * If the decoding operation is successful, returns the number of bytes decoded,
 * else returns the result of the decoding process from ffmpeg
 */
int decode_video_packet(AVPacket *packet,
                        int *got_frame,
                        int cached){
    int ret = 0;
    int decoded = packet->size;

    *got_frame = 0;

    //Decode video frame
    ret = avcodec_decode_video2(video_decode_context,
                                video_frame, got_frame, packet);
    if (ret < 0) {
        //FFmpeg users should use av_err2str
        char errbuf[128];
        av_strerror(ret, errbuf, sizeof(errbuf));
        std::cerr << "Error decoding video frame " << errbuf << std::endl;
        decoded = ret;
    } else {
        if (*got_frame) {
            video_frame->pts = av_frame_get_best_effort_timestamp(video_frame);

            //Write to log file
            AVRational *time_base = &video_decode_context->time_base;
            log_frame(video_frame, time_base,
                      video_frame->coded_picture_number, video_log_stream);

#if( DEBUG )
            std::cout << "Video frame " << ( cached ? "(cached)" : "" )
                      << " coded:" << video_frame->coded_picture_number
                      << " pts:" << video_frame->pts << std::endl;
#endif

            /*Copy decoded frame to destination buffer:
             *This is required since rawvideo expects non aligned data*/
            av_image_copy(/* ... */ (const uint8_t **)(video_frame->data), /* ... */);

            //Write to rawvideo file
            // ...

            //Unref the refcounted frame
            // ...
        }
    }

    return decoded;
}

/**
 * Grabs frames in a loop and decodes them using the specified decoding function
 */
int process_frames(AVFormatContext *context,
                   PacketDecoder packet_decoder) {
    int ret = 0;
    int got_frame;
    AVPacket packet;

    //Initialize packet, set data to NULL, let the demuxer fill it
    av_init_packet(&packet);
    packet.data = NULL;
    packet.size = 0;

    // read frames from the file
    for (;;) {
        ret = av_read_frame(context, &packet);
        if (ret < 0) {
            if (ret == AVERROR(EAGAIN)) {
                continue;
            } else {
                break;
            }
        }

        //Convert timing fields to the decoder timebase
        unsigned int stream_index = packet.stream_index;

        AVPacket orig_packet = packet;
        do {
            ret = packet_decoder(&packet, &got_frame, 0);
            if (ret < 0) {
                break;
            }
            packet.data += ret;
            packet.size -= ret;
        } while (packet.size > 0);
        av_free_packet(&orig_packet);

        if(stop_recording == true) {
            break;
        }
    }

    //Flush cached frames
    std::cout << "Flushing frames" << std::endl;
    packet.data = NULL;
    packet.size = 0;
    do {
        packet_decoder(&packet, &got_frame, 1);
    } while (got_frame);

    av_log(0, AV_LOG_INFO, "Done processing frames\n");
    return ret;
}
How do I go about debugging the underlying issue?
Is it possible that running the decoding code in a thread other than the one in which the decoding context was opened is causing the problem?
Am I doing something wrong in the decoding code?
Things I have tried/found:
I found this thread that is about the same problem here: FFMPEG decoding artifacts between keyframes
(I cannot post samples of my corrupted frames due to privacy issues, but the image linked to in that question depicts the same issue I have)
However, the answer to the question is posted by the OP without specific details about how the issue was fixed. The OP only mentions that he wasn't 'preserving the packets correctly', but nothing about what was wrong or how to fix it. I do not have enough reputation to post a comment seeking clarification.
I was initially passing the packet into the decoding function by value, but switched to passing by pointer on the off chance that the packet freeing was being done incorrectly.
I found another question about debugging decoding issues, but couldn't find anything conclusive: How is video decoding corruption debugged?
I'd appreciate any insight. Thanks a lot!
[EDIT] In response to Ronald's answer, I am adding a little more information that wouldn't fit in a comment:
I am only calling decode_video_packet() from the thread processing video frames; the other thread, which processes audio frames, calls a similar decode_audio_packet() function. So only one thread calls each function. I should mention that I have set thread_count in the decoding context to 1; otherwise I get a segfault in malloc.c while flushing the cached frames.
I can see this being a problem if process_frames and the frame decoder function ran on separate threads, but that is not the case. Is there a specific reason why it would matter whether the freeing is done inside the function or after it returns? I believe the freeing function is passed a copy of the original packet because an audio packet may need several decode calls if the decoder doesn't consume it entirely in one call.
A general problem is that the corruption does not occur all the time. I can debug better if it is deterministic. Otherwise, I can't even say if a solution works or not.
A few things to check:
are you running multiple threads that are calling decode_video_packet()? If you are: don't do that! FFmpeg has built-in support for multi-threaded decoding, and you should let FFmpeg do threading internally and transparently.
you are calling av_free_packet() right after calling the frame decoder function, but at that point it may not yet have had a chance to copy the contents. You should probably let decode_video_packet() free the packet instead, after calling avcodec_decode_video2().
General debugging advice:
run it without any threading and see if that works;
if it does, and with threading it fails, use thread debuggers such as tsan or helgrind to help in finding race conditions that point to your code.
it can also help to know whether the output you're getting is reproducible (this suggests a non-threading-related bug in your code) or changes from one run to the next (this suggests a race condition in your code).
And yes, the periodic clean-ups are because of keyframes.
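To check the reproducibility point, one option is to log a hash of every decoded frame (for example of the bytes written to the rawvideo file) and diff the logs from two runs over the same input, such as an H264 stream saved from the camera once and decoded repeatedly. Identical hashes suggest a deterministic bug; differing ones suggest a race. A minimal sketch using 64-bit FNV-1a; frameHash is a hypothetical helper and any stable hash would do.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: 64-bit FNV-1a hash of a byte range.
uint64_t frameHash(const uint8_t* data, std::size_t size)
{
    uint64_t hash = 14695981039346656037ULL; // FNV-1a 64-bit offset basis
    for (std::size_t i = 0; i < size; ++i) {
        hash ^= data[i];
        hash *= 1099511628211ULL;            // FNV-1a 64-bit prime
    }
    return hash;
}
```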