On x86 (either 64-bit or 32-bit) Linux --
for example:
void signal_handler(int) {
// want to know where the program is interrupted ...
}
int main() {
...
signal(SIGALRM, signal_handler);
alarm(5);
...
printf(...); <------- at this point, we trigger signal_handler
...
}
In signal_handler, how can we know we are interrupted at printf in main()?
Use sigaction with SA_SIGINFO set in sa_flags.
Prototype code:
#define _GNU_SOURCE 1 /* To pick up REG_RIP */
#include <stdio.h>
#include <signal.h>
#include <assert.h>
static void
handler(int signo, siginfo_t *info, void *context)
{
const ucontext_t *con = (ucontext_t *)context;
/* I know, never call printf from a signal handler. Meh. */
printf("IP: %lx\n", con->uc_mcontext.gregs[REG_RIP]);
}
int
main(int argc, char *argv[])
{
struct sigaction sa = { };
sa.sa_flags = SA_SIGINFO;
sa.sa_sigaction = handler;
assert(sigaction(SIGINT, &sa, NULL) == 0);
for (;;);
return 0;
}
Run it and hit Ctrl-C. (Use Ctrl-\ to terminate...)
This is for x86_64. For 32-bit x86, use REG_EIP instead of REG_RIP.
[edit]
Of course, if you are actually in a library function (like printf) or a system call (like write), the RIP/EIP register might point somewhere funny...
You might want to use libunwind to crawl the stack.
Depending on your OS / Platform, it could be in a variety of areas:
On the current stack a set number of registers deep
On an/the interrupt stack
Within some sort of callback associated with your signal routine...
Without having additional info, I don't think we can track this much further. Adding a C/C++ tag to your might generate more responses and views.
In signal_handler, how can we know we are interrupted at printf in main()?
You can't, at least not from a C, C++, or POSIX perspective. Your operating system may provide OS-specific calls that let you poke at the stack. That's more than a bit dubious, though.
With timer-based signals, which instruction triggered the signal is a toss of the coin.
Workaround idea: if there are only a small number of places which can trigger a signal handler or you are only interested in which bigger block it happened you could maintain that in a variable.
entered = 1; // or entered = ENTER_PRINTF1;
printf(....);
Related
I'm currently working on a project running on a heavily modified version of Linux patched to be able to access a VMEbus. Most of the bus-handling is done, I have a VMEAccess class that uses mmap to write at a specific address of /dev/mem so a driver can pull that data and push it onto the bus.
When the program starts, it has no idea where the slave board it's looking for is located on the bus so it must find it by poking around: it tries to read every address one by one, if a device is connected there the read method returns some data but if there isn't anything connected a SIGBUS signal will be sent to the program.
I tried several solutions (mostly using signal handling) but after some time, I decided on using jumps. The first longjmp() call works fine but the second call to VMEAccess::readWord() gives me a Bus Error even though my handler should prevent the program from crashing.
Here's my code:
#include <iostream>
#include <string>
#include <sstream>
#include <csignal>
#include <cstdlib>
#include <csignal>
#include <csetjmp>
#include "types.h"
#include "VME_access.h"
VMEAccess *busVME;
int main(int argc, char const *argv[]);
void catch_sigbus (int sig);
void exit_function(int sig);
volatile BOOL bus_error;
volatile UDWORD offset;
jmp_buf env;
int main(int argc, char const *argv[])
{
sigemptyset(&sigBusHandler.sa_mask);
struct sigaction sigIntHandler;
sigIntHandler.sa_handler = exit_function;
sigemptyset(&sigIntHandler.sa_mask);
sigIntHandler.sa_flags = 0;
sigaction(SIGINT, &sigIntHandler, NULL);
/* */
struct sigaction sigBusHandler;
sigBusHandler.sa_handler = catch_sigbus;
sigemptyset(&sigBusHandler.sa_mask);
sigBusHandler.sa_flags = 0;
sigaction(SIGBUS, &sigBusHandler, NULL);
busVME = new VMEAccess(VME_SHORT);
offset = 0x01FE;
setjmp(env);
printf("%d\n", sigismember(&sigBusHandler.sa_mask, SIGBUS));
busVME->readWord(offset);
sleep(1);
printf("%#08x\n", offset+0xC1000000);
return 0;
}
void catch_sigbus (int sig)
{
offset++;
printf("%#08x\n", offset);
longjmp(env, 1);
}
void exit_function(int sig)
{
delete busVME;
exit(0);
}
As mentioned in the comments, using longjmp in a signal handler is a bad idea. After doing the jump out of a signal handler your program is effectively still in the signal handler. So calling non-async-signal-safe functions leads to undefined behavior for example. Using siglongjmp won't really help here, quoting man signal-safety:
If a signal handler interrupts the execution of an unsafe function, and the handler terminates via a call to longjmp(3) or siglongjmp(3) and the program subsequently calls an unsafe function, then the behavior of the program is undefined.
And just for example, this (siglongjmp) did cause some problems in libcurl code in the past, see here: error: longjmp causes uninitialized stack frame
I'd suggest to use a regular loop and modify the exit condition in the signal handler (you modify the offset there anyway) instead. Something like the following (pseudo-code):
int had_sigbus = 0;
int main(int argc, char const *argv[])
{
...
for (offset = 0x01FE; offset is sane; ++offset) {
had_sigbus = 0;
probe(offset);
if (!had_sigbus) {
// found
break;
}
}
...
}
void catch_sigbus(int)
{
had_sigbus = 1;
}
This way it's immediately obvious that there is a loop, and the whole logic is much easier to follow. And there are no jumps, so it should work for more than one probe :) But obviously probe() must handle the failed call (the one interrupted with SIGBUS) internally too - and probably return an error. If it does return an error using the had_sigbus function might be not necessary at all.
In C++11, what is the safest (and perferrably most efficient) way to execute unsafe code on a signal being caught, given a type of request-loop (as part of a web request loop)? For example, on catching a SIGUSR1 from a linux command line: kill -30 <process pid>
It is acceptable for the 'unsafe code' to be run on the next request being fired, and no information is lost if the signal is fired multiple times before the unsafe code is run.
For example, my current code is:
static bool globalFlag = false;
void signalHandler(int sig_num, siginfo_t * info, void * context) {
globalFlag = true;
}
void doUnsafeThings() {
// thigns like std::vector push_back, new char[1024], set global vars, etc.
}
void doRegularThings() {
// read filesystem, read global variables, etc.
}
void main(void) {
// set up signal handler (for SIGUSR1) ...
struct sigaction sigact;
sigact.sa_sigaction = onSyncSignal;
sigact.sa_flags = SA_RESTART | SA_SIGINFO;
sigaction(SIGUSR1, &sigact, (struct sigaction *)NULL);
// main loop ...
while(acceptMoreRequests()) { // blocks until new request received
if (globalFlag) {
globalFlag = false;
doUnsafeThings();
}
doRegularThings();
}
}
where I know there could be problems in the main loop testing+setting the globalFlag boolean.
Edit: The if (globalFlag) test will be run in a fairly tight loop, and an 'occasional' false negative is acceptable. However, I suspect there's no optimisation over Basile Starynkevitch's solution anyway?
You should declare your flag
static volatile sig_atomic_t globalFlag = 0;
See e.g. sig_atomic_t, this question and don't forget the volatile qualifier. (It may have been spelled sigatomic_t for C).
On Linux (specifically) you could use signalfd(2) to get a filedescriptor for the signal, and that fd can be poll(2)-ed by your event loop.
Some event loop libraries (libevent, libev ...) know how to handle signals.
And there is also the trick of setting up a pipe (see pipe(2) and pipe(7) for more) at initialization, and just write(2)-ing some byte on it in the signal handler. The event loop would poll and read that pipe. Such a trick is recommended by Qt.
Read also signal(7) and signal-safety(7) (it explains what are the limited set of functions or syscalls usable inside a signal handler)....
BTW, correctness is more important than efficiency. In general, you get few signals (e.g. most programs get a signal once every second at most, not every millisecond).
This is my code :
#define _OPEN_SYS
#include <unistd.h>
#include <stdio.h>
#include <signal.h>
#include <time.h>
volatile int footprint = 0;
void catcher(int signum) {
puts("inside signal catcher!");
alarm(0);
footprint = 1;
return;
}
main() {
printf("footprint=%d\n", footprint);
struct sigaction sact;
sigemptyset(&sact.sa_mask);
sact.sa_flags = 0;
sact.sa_handler = catcher;
if (footprint == 0) {
puts("the signal catcher never gained control");
sigaction(SIGALRM, &sact, NULL);
printf("before loop");
alarm(5); /* timer will pop in five seconds */
while (true);
} else
puts("the signal catcher gained control");
printf("after loop");
}
my output is :
footprint=0
the signal catcher never gained control
before loopinside signal catcher!
and the application keep running forever , I need someway to break this loop , I'm using similar code to make timeout for sybase statement execution as OCCI doesn't support timeout.
Signals such as SIGALRM will interrupt most system calls (but beware of automatically restartable calls). You cannot rely on them to interrupt your syscall-free loop. And even when it does, execution resumes after a signal, so your code happily goes right back to looping.
In fact, your code is not even valid C++ (!!!). Section 1.10p24 of the Standard says:
The implementation may assume that any thread will eventually do one
of the following:
terminate,
make a call to a library I/O function,
access or modify a volatile object, or
perform a synchronization operation or
an atomic operation.
Alex's suggestion of while ( footprint == 0 ) ; will at least correct this defect.
A loop such as while (true); can't be interrupted, except by terminating the thread executing it. The loop has to be coded to check for an interrupt condition and exit.
As Alex mentioned in a comment, while ( footprint == 0 ) ; would correctly implement a loop checking for the given signal handler.
Just being pedantic, footprint should be declared sig_atomic_t not int, but it probably doesn't matter.
I want to catch SIGBUS, my code is shown below:
#include <stdlib.h>
#include <signal.h>
#include <iostream>
#include <stdio.h>
void catch_sigbus (int sig)
{
//std::cout << "SIGBUS" << std::endl;
printf("SIGBUS\n");
exit(-1);
}
int main(int argc, char **argv) {
signal (SIGBUS, catch_sigbus);
int *iptr;
char *cptr;
#if defined(__GNUC__)
# if defined(__i386__)
/* Enable Alignment Checking on x86 */
__asm__("pushf\norl $0x40000,(%esp)\npopf");
# elif defined(__x86_64__)
/* Enable Alignment Checking on x86_64 */
__asm__("pushf\norl $0x40000,(%rsp)\npopf");
# endif
#endif
/* malloc() always provides aligned memory */
cptr = (char*)malloc(sizeof(int) + 1);
/* Increment the pointer by one, making it misaligned */
iptr = (int *) ++cptr;
/* Dereference it as an int pointer, causing an unaligned access */
*iptr = 42;
return 0;
}
When I use printf, it can catch by calling catch_sigbus, but when I use cout, it cannot. So anybody could help me? I run on Ubuntu 12.04.
I have another question. When I catch SIGBUS, how can I get si_code? BUS_ADRALN/BUS_ADRERR/BUS_OBJERR
You can't use printf or cout in a signal handler. Nor can you call exit. You got lucky with printf this time, but you weren't as lucky with cout. If your program is in a different state maybe cout will work and printf won't. Or maybe neither, or both. Check the documentation of your operating system to see which functions are signal safe (if it exists, it's often very badly documented).
Your safest bet in this case is to call write to STDERR_FILENO directly and then call _exit (not exit, that one is unsafe in a signal handler). On some systems it's safe to call fprintf to stderr, but I'm not sure if glibc is one of them.
Edit: To answer your added question, you need to set up your signal handlers with sigaction to get the additional information. This example is as far as I'd go inside a signal handler, I included an alternative method if you want to go advanced. Notice that write is theoretically unsafe because it will break errno, but since we're doing _exit it will be safe in this particular case:
#include <stdlib.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
void
bus_handler(int sig, siginfo_t *si, void *vuctx)
{
char buf[2];
#if 1
/*
* I happen to know that si_code can only be 1, 2 or 3 on this
* particular system, so we only need to handle one digit.
*/
buf[0] = '0' + si->si_code;
buf[1] = '\n';
write(STDERR_FILENO, buf, sizeof(buf));
#else
/*
* This is a trick I sometimes use for debugging , this will
* be visible in strace while not messing with external state too
* much except breaking errno.
*/
write(-1, NULL, si->si_code);
#endif
_exit(1);
}
int
main(int argc, char **argv)
{
struct sigaction sa;
char *cptr;
int *iptr;
memset(&sa, 0, sizeof(sa));
sa.sa_sigaction = bus_handler;
sa.sa_flags = SA_SIGINFO;
sigfillset(&sa.sa_mask);
sigaction(SIGBUS, &sa, NULL);
#if defined(__GNUC__)
# if defined(__i386__)
/* Enable Alignment Checking on x86 */
__asm__("pushf\norl $0x40000,(%esp)\npopf");
# elif defined(__x86_64__)
/* Enable Alignment Checking on x86_64 */
__asm__("pushf\norl $0x40000,(%rsp)\npopf");
# endif
#endif
/* malloc() always provides aligned memory */
cptr = (char*)malloc(sizeof(int) + 1);
/* Increment the pointer by one, making it misaligned */
iptr = (int *) ++cptr;
/* Dereference it as an int pointer, causing an unaligned access */
*iptr = 42;
return 0;
}
According to your comment, you're trying to debug SIGBUS, and there's existing question about that already:
Debugging SIGBUS on x86 Linux
That lists a number of possible causes. But probably the most likely reason for SIGBUS is, you have memory corruption and it messes things up so you get SIGBUS later... So debugging the signal might not be helpful at nailing the bug. That being said, use debugger to catch the point in code where signal is thrown, and see if it is a pointer. That's a good starting point for trying to find out the cause.
recently I set out to port ucos-ii to Ubuntu PC.
As we know, it's not possible to simulate the "process" in the ucos-ii by simply adding a flag in "while" loop in the pthread's call-back function to perform pause and resume(like the solution below). Because the "process" in ucos-ii can be paused or resumed at any time!
How to sleep or pause a PThread in c on Linux
I have found one solution on the web-site below, but it can't be built because it's out of date. It uses the process in Linux to simulate the task(acts like the process in our Linux) in ucos-ii.
http://www2.hs-esslingen.de/~zimmerma/software/index_uk.html
If pthread can act like the process which can be paused and resumed at any time, please tell me some related functions, I can figure it out myself. If it can't, I think I should focus on the older solution. Thanks a lot.
The Modula-3 garbage collector needs to suspend pthreads at an arbitrary time, not just when they are waiting on a condition variable or mutex. It does it by registering a (Unix) signal handler that suspends the thread and then using pthread_kill to send a signal to the target thread. I think it works (it has been reliable for others but I'm debugging an issue with it right now...) It's a bit kludgy, though....
Google for ThreadPThread.m3 and look at the routines "StopWorld" and "StartWorld". Handler itself is in ThreadPThreadC.c.
If stopping at specific points with a condition variable is insufficient, then you can't do this with pthreads. The pthread interface does not include suspend/resume functionality.
See, for example, answer E.4 here:
The POSIX standard provides no mechanism by which a thread A can suspend the execution of another thread B, without cooperation from B. The only way to implement a suspend/restart mechanism is to have B check periodically some global variable for a suspend request and then suspend itself on a condition variable, which another thread can signal later to restart B.
That FAQ answer goes on to describe a couple of non-standard ways of doing it, one in Solaris and one in LinuxThreads (which is now obsolete; do not confuse it with current threading on Linux); neither of those apply to your situation.
On Linux you can probably setup custom signal handler (eg. using signal()) that will contain wait for another signal (eg. using sigsuspend()). You then send the signals using pthread_kill() or tgkill(). It is important to use so-called "realtime signals" for this, because normal signals like SIGUSR1 and SIGUSR2 don't get queued, which means that they can get lost under high load conditions. You send a signal several times, but it gets received only once, because before while signal handler is running, new signals of the same kind are ignored. So if you have concurent threads doing PAUSE/RESUME , you can loose RESUME event and cause deadlock. On the other hand, the pending realtime signals (like SIGRTMIN+1 and SIGRTMIN+2) are not deduplicated, so there can be several same rt signals in queue at the same time.
DISCLAIMER: I had not tried this yet. But in theory it should work.
Also see man 7 signal-safety. There is a list of calls that you can safely call in signal handlers. Fortunately sigsuspend() seems to be one of them.
UPDATE: I have working code right here:
//Filename: pthread_pause.c
//Author: Tomas 'Harvie' Mudrunka 2021
//Build: CFLAGS=-lpthread make pthread_pause; ./pthread_pause
//Test: valgrind --tool=helgrind ./pthread_pause
//I've wrote this code as excercise to solve following stack overflow question:
// https://stackoverflow.com/questions/9397068/how-to-pause-a-pthread-any-time-i-want/68119116#68119116
#define _GNU_SOURCE //pthread_yield() needs this
#include <signal.h>
#include <pthread.h>
//#include <pthread_extra.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <unistd.h>
#include <errno.h>
#include <sys/resource.h>
#include <time.h>
#define PTHREAD_XSIG_STOP (SIGRTMIN+0)
#define PTHREAD_XSIG_CONT (SIGRTMIN+1)
#define PTHREAD_XSIGRTMIN (SIGRTMIN+2) //First unused RT signal
pthread_t main_thread;
sem_t pthread_pause_sem;
pthread_once_t pthread_pause_once_ctrl = PTHREAD_ONCE_INIT;
void pthread_pause_once(void) {
sem_init(&pthread_pause_sem, 0, 1);
}
#define pthread_pause_init() (pthread_once(&pthread_pause_once_ctrl, &pthread_pause_once))
#define NSEC_PER_SEC (1000*1000*1000)
// timespec_normalise() from https://github.com/solemnwarning/timespec/
struct timespec timespec_normalise(struct timespec ts)
{
while(ts.tv_nsec >= NSEC_PER_SEC) {
++(ts.tv_sec); ts.tv_nsec -= NSEC_PER_SEC;
}
while(ts.tv_nsec <= -NSEC_PER_SEC) {
--(ts.tv_sec); ts.tv_nsec += NSEC_PER_SEC;
}
if(ts.tv_nsec < 0) { // Negative nanoseconds isn't valid according to POSIX.
--(ts.tv_sec); ts.tv_nsec = (NSEC_PER_SEC + ts.tv_nsec);
}
return ts;
}
void pthread_nanosleep(struct timespec t) {
//Sleep calls on Linux get interrupted by signals, causing premature wake
//Pthread (un)pause is built using signals
//Therefore we need self-restarting sleep implementation
//IO timeouts are restarted by SA_RESTART, but sleeps do need explicit restart
//We also need to sleep using absolute time, because relative time is paused
//You should use this in any thread that gets (un)paused
struct timespec wake;
clock_gettime(CLOCK_MONOTONIC, &wake);
t = timespec_normalise(t);
wake.tv_sec += t.tv_sec;
wake.tv_nsec += t.tv_nsec;
wake = timespec_normalise(wake);
while(clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &wake, NULL)) if(errno!=EINTR) break;
return;
}
void pthread_nsleep(time_t s, long ns) {
struct timespec t;
t.tv_sec = s;
t.tv_nsec = ns;
pthread_nanosleep(t);
}
void pthread_sleep(time_t s) {
pthread_nsleep(s, 0);
}
void pthread_pause_yield() {
//Call this to give other threads chance to run
//Wait until last (un)pause action gets finished
sem_wait(&pthread_pause_sem);
sem_post(&pthread_pause_sem);
//usleep(0);
//nanosleep(&((const struct timespec){.tv_sec=0,.tv_nsec=1}), NULL);
//pthread_nsleep(0,1); //pthread_yield() is not enough, so we use sleep
pthread_yield();
}
void pthread_pause_handler(int signal) {
//Do nothing when there are more signals pending (to cleanup the queue)
//This is no longer needed, since we use semaphore to limit pending signals
/*
sigset_t pending;
sigpending(&pending);
if(sigismember(&pending, PTHREAD_XSIG_STOP)) return;
if(sigismember(&pending, PTHREAD_XSIG_CONT)) return;
*/
//Post semaphore to confirm that signal is handled
sem_post(&pthread_pause_sem);
//Suspend if needed
if(signal == PTHREAD_XSIG_STOP) {
sigset_t sigset;
sigfillset(&sigset);
sigdelset(&sigset, PTHREAD_XSIG_STOP);
sigdelset(&sigset, PTHREAD_XSIG_CONT);
sigsuspend(&sigset); //Wait for next signal
} else return;
}
void pthread_pause_enable() {
//Having signal queue too deep might not be necessary
//It can be limited using RLIMIT_SIGPENDING
//You can get runtime SigQ stats using following command:
//grep -i sig /proc/$(pgrep binary)/status
//This is no longer needed, since we use semaphores
//struct rlimit sigq = {.rlim_cur = 32, .rlim_max=32};
//setrlimit(RLIMIT_SIGPENDING, &sigq);
pthread_pause_init();
//Prepare sigset
sigset_t sigset;
sigemptyset(&sigset);
sigaddset(&sigset, PTHREAD_XSIG_STOP);
sigaddset(&sigset, PTHREAD_XSIG_CONT);
//Register signal handlers
//signal(PTHREAD_XSIG_STOP, pthread_pause_handler);
//signal(PTHREAD_XSIG_CONT, pthread_pause_handler);
//We now use sigaction() instead of signal(), because it supports SA_RESTART
const struct sigaction pause_sa = {
.sa_handler = pthread_pause_handler,
.sa_mask = sigset,
.sa_flags = SA_RESTART,
.sa_restorer = NULL
};
sigaction(PTHREAD_XSIG_STOP, &pause_sa, NULL);
sigaction(PTHREAD_XSIG_CONT, &pause_sa, NULL);
//UnBlock signals
pthread_sigmask(SIG_UNBLOCK, &sigset, NULL);
}
void pthread_pause_disable() {
//This is important for when you want to do some signal unsafe stuff
//Eg.: locking mutex, calling printf() which has internal mutex, etc...
//After unlocking mutex, you can enable pause again.
pthread_pause_init();
//Make sure all signals are dispatched before we block them
sem_wait(&pthread_pause_sem);
//Block signals
sigset_t sigset;
sigemptyset(&sigset);
sigaddset(&sigset, PTHREAD_XSIG_STOP);
sigaddset(&sigset, PTHREAD_XSIG_CONT);
pthread_sigmask(SIG_BLOCK, &sigset, NULL);
sem_post(&pthread_pause_sem);
}
int pthread_pause(pthread_t thread) {
sem_wait(&pthread_pause_sem);
//If signal queue is full, we keep retrying
while(pthread_kill(thread, PTHREAD_XSIG_STOP) == EAGAIN) usleep(1000);
pthread_pause_yield();
return 0;
}
int pthread_unpause(pthread_t thread) {
sem_wait(&pthread_pause_sem);
//If signal queue is full, we keep retrying
while(pthread_kill(thread, PTHREAD_XSIG_CONT) == EAGAIN) usleep(1000);
pthread_pause_yield();
return 0;
}
void *thread_test() {
//Whole process dies if you kill thread immediately before it is pausable
//pthread_pause_enable();
while(1) {
//Printf() is not async signal safe (because it holds internal mutex),
//you should call it only with pause disabled!
//Will throw helgrind warnings anyway, not sure why...
//See: man 7 signal-safety
pthread_pause_disable();
printf("Running!\n");
pthread_pause_enable();
//Pausing main thread should not cause deadlock
//We pause main thread here just to test it is OK
pthread_pause(main_thread);
//pthread_nsleep(0, 1000*1000);
pthread_unpause(main_thread);
//Wait for a while
//pthread_nsleep(0, 1000*1000*100);
pthread_unpause(main_thread);
}
}
int main() {
pthread_t t;
main_thread = pthread_self();
pthread_pause_enable(); //Will get inherited by all threads from now on
//you need to call pthread_pause_enable (or disable) before creating threads,
//otherwise first (un)pause signal will kill whole process
pthread_create(&t, NULL, thread_test, NULL);
while(1) {
pthread_pause(t);
printf("PAUSED\n");
pthread_sleep(3);
printf("UNPAUSED\n");
pthread_unpause(t);
pthread_sleep(1);
/*
pthread_pause_disable();
printf("RUNNING!\n");
pthread_pause_enable();
*/
pthread_pause(t);
pthread_unpause(t);
}
pthread_join(t, NULL);
printf("DIEDED!\n");
}
I am also working on library called "pthread_extra", which will have stuff like this and much more. Will publish soon.
UPDATE2: This is still causing deadlocks when calling pause/unpause rapidly (removed sleep() calls). Printf() implementation in glibc has mutex, so if you suspend thread which is in middle of printf() and then want to printf() from your thread which plans to unpause that thread later, it will never happen, because printf() is locked. Unfortunately i've removed the printf() and only run empty while loop in the thread, but i still get deadlocks under high pause/unpause rates. and i don't know why. Maybe (even realtime) Linux signals are not 100% safe. There is realtime signal queue, maybe it just overflows or something...
UPDATE3: i think i've managed to fix the deadlock, but had to completely rewrite most of the code. Now i have one (sig_atomic_t) variable per each thread which holds state whether that thread should be running or not. Works kinda like condition variable. pthread_(un)pause() transparently remembers this for each thread. I don't have two signals. now i only have one signal. handler of that signal looks at that variable and only blocks on sigsuspend() when that variable says the thread should NOT run. otherwise it returns from signal handler. in order to suspend/resume the thread i now set the sig_atomic_t variable to desired state and call that signal (which is common for both suspend and resume). It is important to use realtime signals to be sure handler will actualy run after you've modified the state variable. Code is bit complex because of the thread status database. I will share the code in separate solution as soon as i manage to simplify it enough. But i want to preserve the two signal version in here, because it kinda works, i like the simplicity and maybe people will give us more insight on how to optimize it.
UPDATE4: I've fixed the deadlock in original code (no need for helper variable holding the status) by using single handler for two signals and optimizing signal queue a bit. There is still some problem with printf() shown by helgrind, but it is not caused by my signals, it happens even when i do not call pause/unpause at all. Overall this was only tested on LINUX, not sure how portable the code is, because there seem to be some undocumented behaviour of signal handlers which was originaly causing the deadlock.
Please note that pause/unpause cannot be nested. if you pause 3 times, and unpause 1 time, the thread WILL RUN. If you need such behaviour, you should create some kind of wrapper which will count the nesting levels and signal the thread accordingly.
UPDATE5: I've improved robustness of the code by following changes: I ensure proper serialization of pause/unpause calls by use of semaphores. This hopefuly fixes last remaining deadlocks. Now you can be sure that when pause call returns, the target thread is actualy already paused. This also solves issues with signal queue overflowing. Also i've added SA_RESTART flag, which prevents internal signals from causing interuption of IO waits. Sleeps/delays still have to be restarted manualy, but i provide convenient wrapper called pthread_nanosleep() which does just that.
UPDATE6: i realized that simply restarting nanosleep() is not enough, because that way timeout does not run when thread is paused. Therefore i've modified pthread_nanosleep() to convert timeout interval to absolute time point in the future and sleep until that. Also i've hidden semaphore initialization, so user does not need to do that.
Here is example of thread function within a class with pause/resume functionality...
class SomeClass
{
public:
// ... construction/destruction
void Resume();
void Pause();
void Stop();
private:
static void* ThreadFunc(void* pParam);
pthread_t thread;
pthread_mutex_t mutex;
pthread_cond_t cond_var;
int command;
};
SomeClass::SomeClass()
{
pthread_mutex_init(&mutex, NULL);
pthread_cond_init(&cond_var, NULL);
// create thread in suspended state..
command = 0;
pthread_create(&thread, NULL, ThreadFunc, this);
}
SomeClass::~SomeClass()
{
// we should stop the thread and exit ThreadFunc before calling of blocking pthread_join function
// also it prevents the mutex staying locked..
Stop();
pthread_join(thread, NULL);
pthread_cond_destroy(&cond_var);
pthread_mutex_destroy(&mutex);
}
void* SomeClass::ThreadFunc(void* pParam)
{
SomeClass* pThis = (SomeClass*)pParam;
timespec time_ns = {0, 50*1000*1000}; // 50 milliseconds
while(1)
{
pthread_mutex_lock(&pThis->mutex);
if (pThis->command == 2) // command to stop thread..
{
// be sure to unlock mutex before exit..
pthread_mutex_unlock(&pThis->mutex);
return NULL;
}
else if (pThis->command == 0) // command to pause thread..
{
pthread_cond_wait(&pThis->cond_var, &pThis->mutex);
// dont forget to unlock the mutex..
pthread_mutex_unlock(&pThis->mutex);
continue;
}
if (pThis->command == 1) // command to run..
{
// normal runing process..
fprintf(stderr, "*");
}
pthread_mutex_unlock(&pThis->mutex);
// it's important to give main thread few time after unlock 'this'
pthread_yield();
// ... or...
//nanosleep(&time_ns, NULL);
}
pthread_exit(NULL);
}
void SomeClass::Stop()
{
pthread_mutex_lock(&mutex);
command = 2;
pthread_cond_signal(&cond_var);
pthread_mutex_unlock(&mutex);
}
void SomeClass::Pause()
{
pthread_mutex_lock(&mutex);
command = 0;
// in pause command we dont need to signal cond_var because we not in wait state now..
pthread_mutex_unlock(&mutex);
}
void SomeClass::Resume()
{
pthread_mutex_lock(&mutex);
command = 1;
pthread_cond_signal(&cond_var);
pthread_mutex_unlock(&mutex);
}