VIDIOC_DQBUF hangs on camera disconnection - c++

My application uses v4l2 running in a separate thread. If a camera gets disconnected, the user is given an appropriate message before the thread terminates cleanly. This works in the vast majority of cases. However, if execution is inside the VIDIOC_DQBUF ioctl when the camera is disconnected, the ioctl never returns, causing the entire thread to lock up.
My system is as follows:
Linux Kernel: 4.12.0
OS: Fedora 25
Compiler: gcc-7.1
The following is a simplified example of the problem function.
// Get Raw Buffer from the camera
void v4l2_Processor::get_Raw_Frame(void* buffer)
{
    struct v4l2_buffer buf;
    memset(&buf, 0, sizeof(buf));
    buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;

    // Grab next frame
    if (ioctl(m_FD, VIDIOC_DQBUF, &buf) < 0)
    {   // If the camera becomes disconnected when the execution is
        // in the above ioctl, then the ioctl never returns.
        std::cerr << "Error in DQBUF\n";
    }

    // Queue for next frame
    if (ioctl(m_FD, VIDIOC_QBUF, &buf) < 0)
    {
        std::cerr << "Error in QBUF\n";
    }

    memcpy(buffer, m_Buffers[buf.index].buff,
           m_Buffers[buf.index].buf_length);
}
Can anybody shed any light on why this ioctl locks up and what I might do to solve this problem?
I appreciate any help offered.
Amanda

I am currently having the same issue. However, my entire thread doesn't lock up; the ioctl times out (15 s), but that's way too long.
Is there a way to query V4L2 (that won't hang) whether video is streaming? Or at least a way to change the ioctl timeout?
UPDATE:
@Amanda: you can change the timeout of the dequeue in the v4l2_capture driver source and rebuild the kernel/kernel module.
Modify the timeout in the dequeue function:
if (!wait_event_interruptible_timeout(cam->enc_queue,
                                      cam->enc_counter != 0,
                                      50 * HZ))  // Modify this constant
Best of luck!
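A user-space alternative worth noting (a sketch of my own, not part of the answer above, and it assumes the descriptor was opened with O_NONBLOCK): wait on the file descriptor with select() and a bounded timeout before calling VIDIOC_DQBUF, so a disconnect or stall surfaces within the timeout instead of blocking forever.

// Sketch only: bounded wait before DQBUF; assumes fd was opened with O_NONBLOCK.
#include <sys/select.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>
#include <cstring>
#include <cerrno>
#include <iostream>

bool wait_and_dequeue(int fd, v4l2_buffer& buf, int timeout_ms)
{
    fd_set fds;
    FD_ZERO(&fds);
    FD_SET(fd, &fds);

    timeval tv;
    tv.tv_sec  = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    // select() returns 0 on timeout and -1 on error (e.g. the device node went away)
    const int rc = select(fd + 1, &fds, nullptr, nullptr, &tv);
    if (rc <= 0)
        return false;   // caller decides whether to retry or treat the camera as gone

    std::memset(&buf, 0, sizeof(buf));
    buf.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;
    if (ioctl(fd, VIDIOC_DQBUF, &buf) < 0) {
        std::cerr << "DQBUF failed: " << std::strerror(errno) << "\n";
        return false;
    }
    return true;
}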

Related

Using FTDI D2xx and Thorlabs APT communication protocol results in delays on Linux

I am trying to communicate with Thorlabs TDC001 controllers (APT DC servo controllers) using the FTDI D2XX driver on Linux. However, when I send write commands, large delays (1-2 seconds) occur until the command is actually executed on the TDC001.
In particular, this can be observed when the connected linear stage is moving and a new position command is sent: it takes 1-2 seconds until the stage actually changes direction. Also, if I request DCSTATUSUPDATE (which gives position and velocity) and then read out the FTDI queue, I do not get the right data. Only if I wait 1 second between requesting and reading do I get the correct data, but for a point in the past. I added the C++ code for this case.
I need live-data and faster execution of writing commands for closed-loop control.
I'm not sure if the problem is on the side of Thorlabs or FTDI. Everything works, except for the large delays. There are other commands, e.g. MOVE_STOP, which respond immediately. Also, if I send a new position command right after finishing homing, it is executed immediately.
Whenever I ask for FT_GetStatus, there is nothing else in the Tx or Rx queue except the 20 bytes in Rx for DCSTATUSUPDATE.
The references for D2XX and APT communication protocol can be found here:
FTDI Programmer's Guide
Thorlabs APT Communication Protocol
The initialization function:
bool CommunicationFunctions::initializeKeyHandle(string serialnumber)
{
    //INITIALIZATION//
    /*
     * This function initializes the TDC motor controller and finds its corresponding keyhandle.
     */
    keyHandle = NULL;
    // To open the device, the vendor and product ID must be set correctly
    ftStatus = FT_SetVIDPID(0x403, 0xfaf0);

    // Open device:
    const char* tmp = serialnumber.c_str();
    int numAttempts = 0;
    while (keyHandle == 0) {
        ftStatus = FT_OpenEx(const_cast<char*>(tmp), FT_OPEN_BY_SERIAL_NUMBER, &keyHandle);
        if (numAttempts++ > 20) {
            cerr << "Device Could Not Be Opened";
            return false;
        }
    }
    // Set baud rate to 115200
    ftStatus = FT_SetBaudRate(keyHandle, 115200);
    // 8 data bits, 1 stop bit, no parity
    ftStatus = FT_SetDataCharacteristics(keyHandle, FT_BITS_8, FT_STOP_BITS_1, FT_PARITY_NONE);
    // Pre purge dwell 50ms (note: usleep takes microseconds)
    usleep(50000);
    // Purge the device.
    ftStatus = FT_Purge(keyHandle, FT_PURGE_RX | FT_PURGE_TX);
    // Post purge dwell 50ms.
    usleep(50000);
    // Reset device.
    ftStatus = FT_ResetDevice(keyHandle);
    // Set flow control to RTS/CTS.
    ftStatus = FT_SetFlowControl(keyHandle, FT_FLOW_RTS_CTS, 0, 0);
    // Set RTS.
    ftStatus = FT_SetRts(keyHandle);
    return true;
}
How I read out my data:
bool CommunicationFunctions::read_tdc(int32_t* position, uint16_t* velocity)
{
    uint8_t* RxBuffer = new uint8_t[256]();
    DWORD RxBytes;
    DWORD BytesReceived = 0;
    DWORD written = 0;

    // Manually request status update:
    uint8_t req_statusupdate[6] = {0x90, 0x04, 0x01, 0x00, 0x50, 0x01};
    ftStatus = FT_Write(keyHandle, req_statusupdate, (DWORD)6, &written);
    if (ftStatus != FT_OK) {
        cerr << "Command could not be transmitted: Request status update" << endl;
        return false;
    }

    // sleep(1); // this sleep() would lead to the right result, but I don't want this delay

    // Get number of bytes in queue of TDC001
    FT_GetQueueStatus(keyHandle, &RxBytes);
    // Check if there are bytes in the queue before reading them, otherwise do not read anything
    if (RxBytes > 0) {
        ftStatus = FT_Read(keyHandle, RxBuffer, RxBytes, &BytesReceived);
        if (ftStatus != FT_OK) {
            cerr << "Read device failed!" << endl;
            return false;
        }
    }

    // Check if enough bytes are received, i.e. if the signal is right
    if (!(BytesReceived >= 6)) {
        cerr << "Error in bytes received" << endl;
        return false;
    }

    // Look for correct message in RxBuffer and read out velocity and position
    getPosVel(position, velocity, RxBuffer);

    // Delete receive buffer
    delete[] RxBuffer;
    RxBuffer = NULL;
    return true;
}
If I use the read_tdc function after homing and during a move to an absolute position, I just get the "Homing completed" message on the first attempt. When I try read_tdc again, I get an old value (probably the one from before). I don't understand what happens here. Why does this old data even remain in the queue (latency is 10 ms)? Can anybody help me to get faster responses and reactions?
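Since the fixed sleep(1) is what the question wants to avoid, one hedged idea (an assumption on my part, not a verified fix) is to poll FT_GetQueueStatus with a short deadline and only call FT_Read once the expected 20-byte DCSTATUSUPDATE reply has arrived. waitForBytes is a hypothetical helper name:

// Sketch: poll the D2XX receive queue until 'expected' bytes arrive or we give up.
#include "ftd2xx.h"
#include <chrono>
#include <thread>

bool waitForBytes(FT_HANDLE handle, DWORD expected, int deadline_ms)
{
    DWORD inQueue = 0;
    const auto start = std::chrono::steady_clock::now();
    while (std::chrono::steady_clock::now() - start <
           std::chrono::milliseconds(deadline_ms)) {
        if (FT_GetQueueStatus(handle, &inQueue) != FT_OK)
            return false;                               // device error
        if (inQueue >= expected)
            return true;                                // enough data to read
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return false;                                       // deadline expired
}

In read_tdc this would sit between FT_Write and FT_Read, e.g. waitForBytes(keyHandle, 20, 50), so the read happens as soon as the reply is complete instead of after a fixed one-second sleep.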

Does v4l2 camera capture with mmap ring buffer make sense for tracking application

I'm working with the v4l2 API for capturing images from a raw sensor on an embedded platform. My capture routine is based on the example at [1]. The proposed method for streaming uses mmapped buffers as a ring buffer.
For initialization, buffers (default = 4) are requested using ioctl with the VIDIOC_REQBUFS identifier. Subsequently, they are queued using VIDIOC_QBUF. The entire streaming procedure is described here [2]. As soon as streaming starts, the driver fills the queued buffers with data. The timestamp of the v4l2_buffer struct indicates the time of the first byte captured, which in my case results in a time interval of approximately 8.3 ms (=120 fps) between buffers. So far so good.
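For reference, the initialization described above is the standard V4L2 mmap streaming setup; a condensed sketch (fd and buffers[] are placeholder names, includes and error checks omitted):

struct v4l2_requestbuffers req;
memset(&req, 0, sizeof(req));
req.count  = 4;                          // default of 4 buffers
req.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
req.memory = V4L2_MEMORY_MMAP;
ioctl(fd, VIDIOC_REQBUFS, &req);         // request the buffer ring

for (unsigned i = 0; i < req.count; ++i) {
    struct v4l2_buffer buf;
    memset(&buf, 0, sizeof(buf));
    buf.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;
    buf.index  = i;
    ioctl(fd, VIDIOC_QUERYBUF, &buf);    // get offset/length for mmap
    buffers[i].length = buf.length;
    buffers[i].start  = mmap(NULL, buf.length, PROT_READ | PROT_WRITE,
                             MAP_SHARED, fd, buf.m.offset);
    ioctl(fd, VIDIOC_QBUF, &buf);        // hand the empty buffer to the driver
}

enum v4l2_buf_type type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
ioctl(fd, VIDIOC_STREAMON, &type);       // start streaming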
Now what I would expect of a ring buffer is that new captures automatically overwrite older ones in a circular fashion. But this is not what happens. A new frame is only assigned to a buffer once that buffer has been queued again (VIDIOC_QBUF) after being dequeued (VIDIOC_DQBUF) and processed (demosaic, tracking step, ...). Even if I do meet the timing condition (processing < 8.3 ms), I don't get the latest captured frame when dequeuing but the oldest queued one (FIFO order), i.e. the one from 3×8.3 ms before the current one. If the timing condition is not met, the time span gets even larger, as the buffers are not overwritten.
So I have several questions:
1. Does it even make sense for this tracking application to use a ring buffer, given that I don't really need a history of frames? I doubt it, but with the proposed mmap method drivers usually require a minimum number of buffers to be requested.
2. Should a separate thread continuously DQBUF and QBUF to accomplish the buffer overwrite? How could this be accomplished? (See the sketch at the end of this question.)
3. As a workaround one could probably dequeue and requeue all buffers on every capture, but this doesn't sound right. Is there someone with more experience in real time capture and streaming who can point to the "proper" way to go?
4. Also currently, I'm doing the preprocessing step (demosaicing) between DQBUF and QBUF and the tracking step afterwards. Should the tracking step also be executed before QBUF is called again?
So the main code basically performs Capture() and Track() subsequently in a while loop. The Capture routine looks as follows:
cv::Mat v4l2Camera::Capture( size_t timeout )
{
    fd_set fds;
    FD_ZERO(&fds);
    FD_SET(mFD, &fds);

    struct timeval tv;
    tv.tv_sec = 0;
    tv.tv_usec = 0;

    const bool threaded = true; //false;

    // proper register settings
    this->format2registerSetting();

    if( timeout > 0 )
    {
        tv.tv_sec = timeout / 1000;
        tv.tv_usec = (timeout - (tv.tv_sec * 1000)) * 1000;
    }

    const int result = select(mFD + 1, &fds, NULL, NULL, &tv);
    if( result == -1 )
    {
        //if (EINTR == errno)
        printf("v4l2 -- select() failed (errno=%i) (%s)\n", errno, strerror(errno));
        return cv::Mat();
    }
    else if( result == 0 )
    {
        if( timeout > 0 )
            printf("v4l2 -- select() timed out...\n");
        return cv::Mat(); // timeout, not necessarily an error (TRY_AGAIN)
    }

    // dequeue input buffer from V4L2
    struct v4l2_buffer buf;
    memset(&buf, 0, sizeof(v4l2_buffer));
    buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP; //V4L2_MEMORY_USERPTR;

    if( xioctl(mFD, VIDIOC_DQBUF, &buf) < 0 )
    {
        printf("v4l2 -- ioctl(VIDIOC_DQBUF) failed (errno=%i) (%s)\n", errno, strerror(errno));
        return cv::Mat();
    }
    if( buf.index >= mBufferCountMMap )
    {
        printf("v4l2 -- invalid mmap buffer index (%u)\n", buf.index);
        return cv::Mat();
    }

    // emit ringbuffer entry
    printf("v4l2 -- received %ux%u video frame (index=%u)\n", mWidth, mHeight, (uint32_t)buf.index);
    void* image_ptr = mBuffersMMap[buf.index].ptr;

    // frame processing (& tracking step)
    cv::Mat demosaic_mat = demosaic(image_ptr, mSize, mDepth, 1);

    // re-queue buffer to V4L2
    if( xioctl(mFD, VIDIOC_QBUF, &buf) < 0 )
        printf("v4l2 -- ioctl(VIDIOC_QBUF) failed (errno=%i) (%s)\n", errno, strerror(errno));

    return demosaic_mat;
}
As my knowledge is limited regarding capturing and streaming video I appreciate any help.
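Regarding question 2, here is a hedged sketch (my own, under the assumptions that mFD was opened with O_NONBLOCK and that xioctl and mBuffersMMap are as in the code above): drain every buffer the driver has already filled, requeue all but the newest one immediately, and hand only the newest frame to the tracking step. This keeps latency at one frame without needing a second thread.

// Sketch: dequeue everything that is ready and keep only the newest buffer.
bool dequeueLatest(int fd, v4l2_buffer& latest)
{
    bool have = false;
    for (;;) {
        v4l2_buffer buf;
        memset(&buf, 0, sizeof(buf));
        buf.type   = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        buf.memory = V4L2_MEMORY_MMAP;

        if (xioctl(fd, VIDIOC_DQBUF, &buf) < 0)
            break;                                // EAGAIN: queue drained, 'latest' is the newest frame

        if (have)
            xioctl(fd, VIDIOC_QBUF, &latest);     // older frame: give it straight back to the driver

        latest = buf;
        have = true;
    }
    return have;   // caller processes mBuffersMMap[latest.index], then QBUFs 'latest'
}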

C++/C FFmpeg artifact build up across video frames

Context:
I am building a recorder for capturing video and audio in separate threads (using Boost thread groups) using FFmpeg 2.8.6 on Ubuntu 16.04. I followed the demuxing_decoding example here: https://www.ffmpeg.org/doxygen/2.8/demuxing_decoding_8c-example.html
Video capture specifics:
I am reading H264 off a Logitech C920 webcam and writing the video to a raw file. The issue I notice with the video is that there seems to be a build-up of artifacts across frames until a particular frame resets them. Here are my frame-grabbing and decoding functions:
// Used for injecting decoding functions for different media types, allowing
// for a generic decode loop
typedef std::function<int(AVPacket*, int*, int)> PacketDecoder;

/**
 * Decodes a video packet.
 * If the decoding operation is successful, returns the number of bytes decoded,
 * else returns the result of the decoding process from ffmpeg
 */
int decode_video_packet(AVPacket *packet,
                        int *got_frame,
                        int cached)
{
    int ret = 0;
    int decoded = packet->size;
    *got_frame = 0;

    // Decode video frame
    ret = avcodec_decode_video2(video_decode_context,
                                video_frame, got_frame, packet);
    if (ret < 0) {
        // FFmpeg users should use av_err2str
        char errbuf[128];
        av_strerror(ret, errbuf, sizeof(errbuf));
        std::cerr << "Error decoding video frame " << errbuf << std::endl;
        decoded = ret;
    } else {
        if (*got_frame) {
            video_frame->pts = av_frame_get_best_effort_timestamp(video_frame);

            // Write to log file
            AVRational *time_base = &video_decode_context->time_base;
            log_frame(video_frame, time_base,
                      video_frame->coded_picture_number, video_log_stream);

#if( DEBUG )
            std::cout << "Video frame " << ( cached ? "(cached)" : "" )
                      << " coded:" << video_frame->coded_picture_number
                      << " pts:" << video_frame->pts << std::endl;
#endif

            /* Copy decoded frame to destination buffer:
             * This is required since rawvideo expects non aligned data */
            av_image_copy(video_dest_attr.video_destination_data,
                          video_dest_attr.video_destination_linesize,
                          (const uint8_t **)(video_frame->data),
                          video_frame->linesize,
                          video_decode_context->pix_fmt,
                          video_decode_context->width,
                          video_decode_context->height);

            // Write to rawvideo file
            fwrite(video_dest_attr.video_destination_data[0],
                   1,
                   video_dest_attr.video_destination_bufsize,
                   video_out_file);

            // Unref the refcounted frame
            av_frame_unref(video_frame);
        }
    }
    return decoded;
}
/**
 * Grabs frames in a loop and decodes them using the specified decoding function
 */
int process_frames(AVFormatContext *context,
                   PacketDecoder packet_decoder)
{
    int ret = 0;
    int got_frame;
    AVPacket packet;

    // Initialize packet, set data to NULL, let the demuxer fill it
    av_init_packet(&packet);
    packet.data = NULL;
    packet.size = 0;

    // read frames from the file
    for (;;) {
        ret = av_read_frame(context, &packet);
        if (ret < 0) {
            if (ret == AVERROR(EAGAIN)) {
                continue;
            } else {
                break;
            }
        }

        // Convert timing fields to the decoder timebase
        unsigned int stream_index = packet.stream_index;
        av_packet_rescale_ts(&packet,
                             context->streams[stream_index]->time_base,
                             context->streams[stream_index]->codec->time_base);

        AVPacket orig_packet = packet;
        do {
            ret = packet_decoder(&packet, &got_frame, 0);
            if (ret < 0) {
                break;
            }
            packet.data += ret;
            packet.size -= ret;
        } while (packet.size > 0);
        av_free_packet(&orig_packet);

        if (stop_recording == true) {
            break;
        }
    }

    // Flush cached frames
    std::cout << "Flushing frames" << std::endl;
    packet.data = NULL;
    packet.size = 0;
    do {
        packet_decoder(&packet, &got_frame, 1);
    } while (got_frame);

    av_log(0, AV_LOG_INFO, "Done processing frames\n");
    return ret;
}
Questions:
How do I go about debugging the underlying issue?
Is it possible that running the decoding code in a thread other than the one in which the decoding context was opened is causing the problem?
Am I doing something wrong in the decoding code?
Things I have tried/found:
I found this thread that is about the same problem here: FFMPEG decoding artifacts between keyframes
(I cannot post samples of my corrupted frames due to privacy issues, but the image linked to in that question depicts the same issue I have)
However, the answer to that question was posted by the OP without specific details about how the issue was fixed. The OP only mentions that he wasn't 'preserving the packets correctly', but says nothing about what was wrong or how to fix it. I do not have enough reputation to post a comment seeking clarification.
I was initially passing the packet into the decoding function by value, but switched to passing by pointer on the off chance that the packet freeing was being done incorrectly.
I found another question about debugging decoding issues, but couldn't find anything conclusive: How is video decoding corruption debugged?
I'd appreciate any insight. Thanks a lot!
[EDIT] In response to Ronald's answer, I am adding a little more information that wouldn't fit in a comment:
I am only calling decode_video_packet() from the thread processing video frames; the other thread processing audio frames calls a similar decode_audio_packet() function. So only one thread calls the function. I should mention that I have set the thread_count in the decoding context to 1, failing which I would get a segfault in malloc.c while flushing the cached frames.
I can see this being a problem if process_frames and the frame-decoder function ran on separate threads, which is not the case. Is there a specific reason why it would matter whether the freeing is done within the function or after it returns? I believe the freeing function is passed a copy of the original packet because multiple decode calls may be required for an audio packet, in case the decoder doesn't decode the entire audio packet at once.
A general problem is that the corruption does not occur all the time. I can debug better if it is deterministic. Otherwise, I can't even say if a solution works or not.
A few things to check:
are you running multiple threads that are calling decode_video_packet()? If you are: don't do that! FFmpeg has built-in support for multi-threaded decoding, and you should let FFmpeg do threading internally and transparently.
you are calling av_free_packet() right after calling the frame decoder function, but at that point it may not yet have had a chance to copy the contents. You should probably let decode_video_packet() free the packet instead, after calling avcodec_decode_video2().
General debugging advice:
run it without any threading and see if that works;
if it does, and with threading it fails, use thread debuggers such as tsan or helgrind to help in finding race conditions that point to your code.
it can also help to know whether the output you're getting is reproducible (this suggests a non-threading-related bug in your code) or changes from one run to the other (this suggests a race condition in your code).
And yes, the periodic clean-ups are because of keyframes.
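As a hedged illustration of the second point above (my sketch, not the answerer's code; it assumes the question's globals video_decode_context and video_frame), the packet can be freed inside the decode helper, only after avcodec_decode_video2() has consumed all of its bytes:

// Sketch: the decode helper owns the packet and frees it once fully consumed.
int decode_and_free_video_packet(AVPacket *packet, int *got_frame)
{
    AVPacket orig = *packet;          // remember the original data pointer and size
    int total = 0;
    do {
        int ret = avcodec_decode_video2(video_decode_context,
                                        video_frame, got_frame, packet);
        if (ret < 0) {
            total = ret;
            break;
        }
        // ... handle *got_frame exactly as in decode_video_packet() ...
        packet->data += ret;          // advance past the bytes the decoder consumed
        packet->size -= ret;
        total += ret;
    } while (packet->size > 0);
    av_free_packet(&orig);            // free only once the decoder is done with it
    return total;
}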

ALSA: cannot recovery from underrun, prepare failed: Broken pipe

I'm writing a program that reads from two mono ALSA devices and writes them to one stereo ALSA device.
I use three threads and a ping-pong buffer to manage them: two reading threads and one writing thread. Their configurations are as follows:
// Capture ALSA device
alsaBufferSize = 16384;
alsaCaptureChunkSize = 4096;
bitsPerSample = 16;
samplingFrequency = 24000;
numOfChannels = 1;
block = true;
accessType = SND_PCM_ACCESS_RW_INTERLEAVED;
// Playback device (only list params that are different from above)
alsaBufferSize = 16384 * 2;
numOfChannels = 2;
accessType = SND_PCM_ACCESS_RW_NON_INTERLEAVED;
The two reading threads write the ping buffer and then the pong buffer. The writing thread waits for either buffer to become ready, locks it, reads from it, and then unlocks it.
But when I run this program, an xrun appears and can't be recovered:
ALSA lib pcm.c:7316:(snd_pcm_recover) underrun occurred
ALSA lib pcm.c:7319:(snd_pcm_recover) cannot recovery from underrun, prepare failed: Broken pipe
Below is my code for writing to ALSA playback device:
bool CALSAWriter::writen(uint8_t** a_pOutputBuffer, uint32_t a_rFrames)
{
    bool ret = false;

    // 1. write audio chunk to ALSA
    const snd_pcm_sframes_t alsaCaptureChunkSize = static_cast<snd_pcm_sframes_t>(a_rFrames); //(m_pALSACfg->alsaCaptureChunkSize);
    const snd_pcm_sframes_t writenFrames = snd_pcm_writen(m_pALSAHandle, (void**)a_pOutputBuffer, alsaCaptureChunkSize);
    if (0 < writenFrames)
    {   // write succeeded
        ret = true;
    }
    else
    {   // write failed
        logPrint("CALSAWriter WRITE FAILED for writen farmes = %d ", writenFrames);
        ret = false;
        const int alsaReadError = static_cast<int>(writenFrames); // alsa error is of int type
        if (ALSA_OK == snd_pcm_recover(m_pALSAHandle, alsaReadError, 0))
        {   // recovery succeeded
            a_rFrames = 0; // only recovery was done, no write at all was done
        }
        else
        {
            logPrint("CALSAWriter: failed to recover from ALSA write error: %s (%i)", snd_strerror(alsaReadError), alsaReadError);
            ret = false;
        }
    }

    // 2. check current buffer load
    snd_pcm_sframes_t framesInBuffer = 0;
    snd_pcm_sframes_t delayedFrames = 0;
    snd_pcm_avail_delay(m_pALSAHandle, &framesInBuffer, &delayedFrames);

    // round to nearest int, cast is safe, buffer size is no bigger than uint32_t
    const int32_t ONE_HUNDRED_PERCENTS = 100;
    const uint32_t bufferLoadInPercents = ONE_HUNDRED_PERCENTS *
        static_cast<int32_t>(framesInBuffer) / static_cast<int32_t>(m_pALSACfg->alsaBufferSize);
    logPrint("write: ALSA buffer percentage: %u, delayed frames: %d", bufferLoadInPercents, delayedFrames);

    return ret;
}
Other diagnostic info:
02:53:00.465047 log info V 1 [write: ALSA buffer percentage: 75, delayed frames: 4096]
02:53:00.635758 log info V 1 [write: ALSA buffer percentage: 74, delayed frames: 4160]
02:53:00.805714 log info V 1 [write: ALSA buffer percentage: 74, delayed frames: 4152]
02:53:00.976781 log info V 1 [write: ALSA buffer percentage: 74, delayed frames: 4144]
02:53:01.147948 log info V 1 [write: ALSA buffer percentage: 0, delayed frames: 0]
02:53:01.317113 log error V 1 [CALSAWriter WRITE FAILED for writen farmes = -32 ]
02:53:01.317795 log error V 1 [CALSAWriter: failed to recover from ALSA write error: Broken pipe (-32)]
It took me about 3 days to find a solution. Thanks to @CL. for the tip that "writen is called too late".
Issue:
Thread switching time is not constant.
Solution:
Insert an empty (silence) buffer before you invoke "writen" for the first time. The length of this buffer can be any value large enough to cover thread-switching jitter; I set it to 150 ms.
Alternatively, you can set the thread priority to high (which I could not do). Refer to ALSA: Ways to prevent underrun for speaker.
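A minimal sketch of that pre-fill (an assumption on my part, using the playback parameters from the question: 24 kHz, 16-bit samples, 2 channels, non-interleaved access; prefillSilence is a hypothetical helper):

// Sketch: write ~150 ms of silence before the first real writen() call,
// so the playback buffer never starts empty.
#include <alsa/asoundlib.h>
#include <cstdint>
#include <vector>

bool prefillSilence(snd_pcm_t* handle, unsigned rateHz = 24000, unsigned ms = 150)
{
    const snd_pcm_uframes_t frames = rateHz * ms / 1000;   // ~3600 frames of silence
    std::vector<int16_t> silence(frames, 0);               // zeros for one channel
    // non-interleaved write: one pointer per channel (stereo -> 2 pointers)
    void* bufs[2] = { silence.data(), silence.data() };
    const snd_pcm_sframes_t rc = snd_pcm_writen(handle, bufs, frames);
    return rc == static_cast<snd_pcm_sframes_t>(frames);
}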
Problem diagnostic:
The fact is:
"readi" returns every 171 ms (4096/24000 ≈ 0.171 s). The reading thread then marks the buffer as ready.
Once a buffer is ready, "writen" is invoked in the writing thread and the buffer is copied to the ALSA playback device. It takes the playback device 171 ms to play that part of the buffer.
If the playback device finishes playing the entire buffer and no new buffer has been written, an underrun occurs.
The real scenario here:
At 0 ms, "readi" starts. At 171 ms, "readi" finishes.
At 172 ms (1 ms for thread switching), "writen" starts. At 343 ms an underrun would happen if no new buffer is written.
At 171 ms, "readi" starts again. At 342 ms, "readi" finishes.
This time, thread switching takes 2 ms: the underrun occurs at 343 ms, before "writen" can start at 344 ms.
When CPU load is high, there is no guarantee how long thread switching will take. That's why you can insert an empty buffer at the first write, which turns the scenario into:
At 0 ms, "readi" starts. At 171 ms, "readi" finishes.
At 172 ms (1 ms for thread switching), "writen" starts with a 150 ms-long buffer. An underrun would now only happen at 493 ms if no new buffer is written.
At 171 ms, "readi" starts again. At 342 ms, "readi" finishes.
This time, even if thread switching takes 50 ms and "writen" only starts at 392 ms, no underrun occurs at all.

valgrind/helgrind gets killed on stress test

I'm making a web server on Linux in C++ with pthreads. I tested it with valgrind for leaks and memory problems - all fixed. I tested it with helgrind for thread problems - all fixed. Now I'm trying a stress test, and I get a problem when the program is run under helgrind:
valgrind --tool=helgrind ./chats
It just dies in random places with the text "Killed", as it would when I kill it with kill -9. The only report I sometimes get from helgrind is that the program exits while still holding some locks, which is normal when it gets killed.
When checking for leaks:
valgrind --leak-check=full ./chats
it's more stable, but I managed to make it die once with a few hundred concurrent connections.
I tried running the program alone and couldn't make it crash at all. I tried up to 250 concurrent connections. Each thread delays by 100 ms to make it easier to have multiple connections open at the same time. No crash.
In all cases neither the threads nor the connections get above 10, and I see it crash even with 2 connections, but never with only one connection at a time (including the main thread and one helper thread, that's 3 threads in total).
Is it possible that the problem only happens when run under helgrind, or does helgrind just make it more likely to show?
What could be the reason that a program gets killed (by the kernel?): allocating too much memory, too many file descriptors?
I tested a bit more and I found out that it only dies when the client times out and closes the connection. So here is the code which detects that the client closed the socket:
void *TcpClient::run(){
    int ret;
    struct timeval tv;
    char * buff = (char *)malloc(10001);
    int br;

    colorPrintf(TC_GREEN, "new client starting: %d\n", sockFd);
    while(isRunning()){
        tv.tv_sec = 0;
        tv.tv_usec = 500*1000;
        FD_SET(sockFd, &readFds);
        ret = select(sockFd+1, &readFds, NULL, NULL, &tv);
        if(ret < 0){
            //select error
            continue;
        }else if(ret == 0){
            // no data to read
            continue;
        }
        br = read(sockFd, buff, 10000);
        if (br <= 0){
            // client disconnected (0) or read error (<0)
            setRunning(false);
            break;
        }
        buff[br] = 0;
        if (reader != NULL){
            reader->tcpRead(this, std::string(buff, br));
        }else{
            readBuffer.append(buff, br);
        }
        //printf("received: %s\n", buff);
    }
    free(buff);
    sendFeedback((void *)1);
    colorPrintf(TC_RED, "closing client socket: %d\n", sockFd);
    ::close(sockFd);
    sockFd = -1;
    return NULL;
}

// this method writes to socket
bool TcpClient::write(std::string data){
    int bw;
    int dataLen = data.length();
    bw = ::write(sockFd, data.data(), dataLen);
    if (bw != dataLen){
        return false; // I don't close the socket in this case, maybe I should
    }
    return true;
}
P.S. Threads are:
main thread. connections are accepted here.
one helper thread which listens for signals and sends signals. It blocks signal reception for the app and manually polls the signal queue. The reason is that it's hard to handle signals when using threads. I found this technique here on Stack Overflow and it seems to work fine in other projects.
client connection threads
The full code is pretty big, but I can post chunks if someone is interested.
Update:
I managed to trigger the problem with only one connection. It's all happening in the client thread. This is what I do:
I read/parse the headers. I put a delay before writing so the client can time out (which triggers the problem).
Here the client times out and leaves (probably closing the socket).
I write back headers
I write back the html code.
Here is how I write back
bw = ::write(sockFd, data.data(), dataLen);
// bw is = dataLen = 108 when writing the headers
// then the second write, for the HTML, kills the program. There is a message before and after write()
bw = ::write(sockFd, data.data(), dataLen); // doesn't go past this point second time
Update 2: Got it :)
gdb says:
Program received signal SIGPIPE, Broken pipe.
[Switching to Thread 0x41401940 (LWP 10554)]
0x0000003ac2e0d89b in write () from /lib64/libpthread.so.0
Question 1: What should I do to avoid receiving this signal?
Question 2: How do I know that the remote side disconnected while writing? For reads, select() reports that there is data but the read returns 0. What about write?
Well, I just had to handle the SIGPIPE signal; then write returned -1, so I close the socket and quit the thread gracefully. Works like a charm.
I guess the easiest way is to set signal handler of SIGPIPE to SIG_IGN:
signal(SIGPIPE, SIG_IGN);
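Putting the two pieces together, a minimal sketch (safeWrite is a hypothetical helper, not the poster's exact code): ignore SIGPIPE once at startup, then treat write() returning -1 with errno == EPIPE as "the remote side has gone away" and close the socket gracefully.

// Sketch: SIGPIPE ignored, disconnect detected from the write() return value.
#include <csignal>
#include <cerrno>
#include <unistd.h>
#include <string>

void installSigpipeHandler()
{
    signal(SIGPIPE, SIG_IGN);     // write() now fails with EPIPE instead of killing the process
}

// returns false when the write failed; errno == EPIPE means the peer disconnected
bool safeWrite(int sockFd, const std::string& data)
{
    const ssize_t bw = ::write(sockFd, data.data(), data.length());
    if (bw < 0) {
        if (errno == EPIPE) {
            // remote side closed the connection: close sockFd and stop the thread
            return false;
        }
        return false;             // other write error
    }
    return static_cast<size_t>(bw) == data.length();
}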
Note that the first write was successful and didn't kill the program. If you have a similar problem, check whether you are writing once or multiple times. If you are not familiar with gdb, this is how to do it:
gdb ./your-program
> run
and gdb will tell you all about signals and segfaults.