Can POSIX timers safely modify C++ STL objects? - c++

I'm attempting to write a C++ "wrapper" for the POSIX timer system on Linux, so that my C++ program can set timeouts for things (such as waiting for a message to arrive over the network) using the system clock, without dealing with POSIX's ugly C interface. It seems to work most of the time, but occasionally my program will segfault after several minutes of running successfully. The problem seems to be that my LinuxTimerManager object (or one of its member objects) gets its memory corrupted, but unfortunately the problem refuses to appear if I run the program under Valgrind, so I'm stuck staring at my code to try to figure out what's wrong with it.
Here's the core of my timer-wrapper implementation:
LinuxTimerManager.h:
namespace util {
using timer_id_t = int;
class LinuxTimerManager {
private:
timer_id_t next_id;
std::map<timer_id_t, timer_t> timer_handles;
std::map<timer_id_t, std::function<void(void)>> timer_callbacks;
std::set<timer_id_t> cancelled_timers;
friend void timer_signal_handler(int signum, siginfo_t* info, void* ucontext);
public:
LinuxTimerManager();
timer_id_t register_timer(const int delay_ms, std::function<void(void)> callback);
void cancel_timer(const timer_id_t timer_id);
};
void timer_signal_handler(int signum, siginfo_t* info, void* ucontext);
}
LinuxTimerManager.cpp:
namespace util {
LinuxTimerManager* tm_instance;
LinuxTimerManager::LinuxTimerManager() : next_id(0) {
tm_instance = this;
struct sigaction sa = {0};
sa.sa_flags = SA_SIGINFO;
sa.sa_sigaction = timer_signal_handler;
sigemptyset(&sa.sa_mask);
int success_flag = sigaction(SIGRTMIN, &sa, NULL);
assert(success_flag == 0);
}
void timer_signal_handler(int signum, siginfo_t* info, void* ucontext) {
timer_id_t timer_id = info->si_value.sival_int;
auto cancelled_location = tm_instance->cancelled_timers.find(timer_id);
//Only fire the callback if the timer is not in the cancelled set
if(cancelled_location == tm_instance->cancelled_timers.end()) {
tm_instance->timer_callbacks.at(timer_id)();
} else {
tm_instance->cancelled_timers.erase(cancelled_location);
}
tm_instance->timer_callbacks.erase(timer_id);
timer_delete(tm_instance->timer_handles.at(timer_id));
tm_instance->timer_handles.erase(timer_id);
}
timer_id_t LinuxTimerManager::register_timer(const int delay_ms, std::function<void(void)> callback) {
struct sigevent timer_event = {0};
timer_event.sigev_notify = SIGEV_SIGNAL;
timer_event.sigev_signo = SIGRTMIN;
timer_event.sigev_value.sival_int = next_id;
timer_t timer_handle;
int success_flag = timer_create(CLOCK_REALTIME, &timer_event, &timer_handle);
assert(success_flag == 0);
timer_handles[next_id] = timer_handle;
timer_callbacks[next_id] = callback;
struct itimerspec timer_spec = {0};
timer_spec.it_interval.tv_sec = 0;
timer_spec.it_interval.tv_nsec = 0;
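// Split the delay so that tv_nsec stays below one second, as timer_settime requires
timer_spec.it_value.tv_sec = delay_ms / 1000;
timer_spec.it_value.tv_nsec = (delay_ms % 1000) * 1000000L;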
timer_settime(timer_handle, 0, &timer_spec, NULL);
return next_id++;
}
void LinuxTimerManager::cancel_timer(const timer_id_t timer_id) {
if(timer_handles.find(timer_id) != timer_handles.end()) {
cancelled_timers.emplace(timer_id);
}
}
}
When my program crashes, the segfault always comes from timer_signal_handler(), usually the lines tm_instance->timer_callbacks.erase(timer_id) or tm_instance->timer_handles.erase(timer_id). The actual segfault is thrown from somewhere deep in the std::map implementation (i.e. stl_tree.h).
Could my memory corruption be caused by a race condition between different timer signals modifying the same LinuxTimerManager? I thought only one timer signal was delivered at a time, but maybe I misunderstood the man pages. Is it just generally unsafe to make a Linux signal handler modify a complex C++ object like std::map?

The signal can occur in the middle of e.g. malloc or free and thus most calls which do interesting things with containers could result in reentering the memory allocation support while its data structures are in an arbitrary state. (As pointed out in the comments, most functions are not safe to call in asynchronous signal handlers. malloc and free are just examples.) Reentering a component in this fashion leads to pretty much arbitrary failure.
Libraries cannot be made safe against this behavior without blocking signals for the entire process during any operations within the library. Doing that is prohibitively expensive, both in the overhead of managing the signal mask and in the amount of time signals would be blocked. (It has to be for the entire process as a signal handler should not block on locks. If a thread handling a signal calls into a library protected by mutexes while another thread holds a mutex the signal handler needs, the handler will block. It is very hard to avoid deadlock when this can happen.)
Designs which work around this typically have a dedicated thread which waits for a specific event and then does the processing. The signal handler only notifies that thread, and you have to use semaphores to synchronize between the thread and the signal handler.
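A minimal sketch of that design, under stated assumptions: the handler only records the timer id in a lock-free atomic and calls sem_post (both async-signal-safe), while a worker thread waits on the semaphore and is the only code that touches the std::map. The names g_timer_sem, g_last_timer_id, on_timer_signal and deliver_one_timer_event are hypothetical, and the one-slot hand-off would drop ids if signals arrived faster than the worker drains them.

```cpp
#include <semaphore.h>
#include <signal.h>
#include <unistd.h>
#include <atomic>
#include <cassert>
#include <cerrno>
#include <map>
#include <thread>

static sem_t g_timer_sem;                    // posted by the handler, awaited by the worker
static std::atomic<int> g_last_timer_id{-1}; // hypothetical one-slot hand-off (lock-free atomic)

// Handler: only async-signal-safe work, no containers, no allocation.
static void on_timer_signal(int, siginfo_t* info, void*) {
    int saved_errno = errno;
    g_last_timer_id.store(info->si_value.sival_int);
    sem_post(&g_timer_sem);
    errno = saved_errno;
}

// Sends one queued signal carrying timer_id and lets a worker thread record it.
// Returns how many times the worker saw that id.
int deliver_one_timer_event(int timer_id) {
    sem_init(&g_timer_sem, 0, 0);
    struct sigaction sa {};
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = on_timer_signal;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGRTMIN, &sa, nullptr);
    assert(rc == 0);
    (void)rc;

    std::map<int, int> fired;                // modified only by the worker thread
    std::thread worker([&fired] {
        // sem_wait can be interrupted by the signal itself, so retry on EINTR.
        while (sem_wait(&g_timer_sem) == -1 && errno == EINTR) {}
        ++fired[g_last_timer_id.load()];
    });

    union sigval value;
    value.sival_int = timer_id;
    sigqueue(getpid(), SIGRTMIN, value);     // stands in for a timer expiry
    worker.join();

    int count = fired.count(timer_id) ? fired[timer_id] : 0;
    sem_destroy(&g_timer_sem);
    return count;
}
```

Calling deliver_one_timer_event(7) from main should report one recorded event for id 7. A real timer manager would keep the worker running in a loop and use a signal-safe queue (for example a pipe) instead of a single slot.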

Related

Multithreaded C++: force read from memory, bypassing cache

I'm working on a personal hobby-time game engine and I'm working on a multithreaded batch executor. I was originally using a concurrent lockless queue and std::function all over the place to facilitate communication between the master and slave threads, but decided to scrap it in favor of a lighter-weight way of doing things that give me tight control over memory allocation: function pointers and memory pools.
Anyway, I've run into a problem:
The function pointer, no matter what I try, is only getting read correctly by one thread while the others read a null pointer and thus fail an assert.
I'm fairly certain this is a problem with caching. I have confirmed that all threads have the same address for the pointer. I've tried declaring it as volatile, intptr_t, std::atomic, and tried all sorts of casting-fu and the threads all just seem to ignore it and continue reading their cached copies.
I've modeled the master and slave in a model checker to make sure the concurrency is good, and there is no livelock or deadlock (provided that the shared variables all synchronize correctly).
void Executor::operator() (int me) {
while (true) {
printf("Slave %d waiting.\n", me);
{
std::unique_lock<std::mutex> lock(batch.ready_m);
while(!batch.running) batch.ready.wait(lock);
running_threads++;
}
printf("Slave %d running.\n", me);
BatchFunc func = batch.func;
assert(func != nullptr);
int index;
if (batch.store_values) {
while ((index = batch.item.fetch_add(1)) < batch.n_items) {
void* data = reinterpret_cast<void*>(batch.data_buffer + index * batch.item_size);
func(batch.share_data, data);
}
}
else {
while ((index = batch.item.fetch_add(1)) < batch.n_items) {
void** data = reinterpret_cast<void**>(batch.data_buffer + index * batch.item_size);
func(batch.share_data, *data);
}
}
// at least one thread finished, so make sure we won't loop back around
batch.running = false;
if (running_threads.fetch_sub(1) == 1) { // I am the last one
batch.done = true; // therefore all threads are done
batch.complete.notify_all();
}
}
}
void Executor::run_batch() {
assert(!batch.running);
if (batch.func == nullptr || batch.n_items == 0) return;
batch.item.store(0);
batch.running = true;
batch.done = false;
batch.ready.notify_all();
printf("Master waiting.\n");
{
std::unique_lock<std::mutex> lock(batch.complete_m);
while (!batch.done) batch.complete.wait(lock);
}
printf("Master ready.\n");
batch.func = nullptr;
batch.n_items = 0;
}
batch.func is being set by another function
template<typename SharedT, typename ItemT>
void set_batch_job(void(*func)(const SharedT*, ItemT*), const SharedT& share_data, bool byValue = true) {
static_assert(sizeof(SharedT) <= SHARED_DATA_MAXSIZE, "Shared data too large");
static_assert(std::is_pod<SharedT>::value, "Shared data type must be POD");
assert(std::is_pod<ItemT>::value || !byValue);
assert(!batch.running);
batch.func = reinterpret_cast<volatile BatchFunc>(func);
memcpy(batch.share_data, (void*) &share_data, sizeof(SharedT));
batch.store_values = byValue;
if (byValue) {
batch.item_size = sizeof(ItemT);
}
else { // store pointers instead of values
batch.item_size = sizeof(ItemT*);
}
batch.n_items = 0;
}
and here is the struct (and typedef) that it's dealing with
typedef void(*BatchFunc)(const void*, void*);
struct JobBatch {
volatile BatchFunc func;
void* const share_data = operator new(SHARED_DATA_MAXSIZE);
intptr_t const data_buffer = reinterpret_cast<intptr_t>(operator new (EXEC_DATA_BUFFER_SIZE));
volatile size_t item_size;
std::atomic<int> item; // Index into the data array
volatile int n_items = 0;
std::condition_variable complete; // slave -> master signal
std::condition_variable ready; // master -> slave signal
std::mutex complete_m;
std::mutex ready_m;
bool store_values = false;
volatile bool running = false; // there is work to do in the batch
volatile bool done = false; // there is no work left to do
JobBatch();
} batch;
How do I make sure that all the necessary reads and writes to batch.func get synchronized properly between threads?
Just in case it matters: I'm using Visual Studio and compiling an x64 Debug Windows executable. Intel i5, Windows 10, 8GB RAM.
So I did a little reading on the C++ memory model and I managed to hack together a solution using atomic_thread_fence. Everything is probably super broken because I'm crazy and shouldn't roll my own system here, but hey, it's fun to learn!
Basically, whenever you're done writing things that you want other threads to see, you need to call atomic_thread_fence(std::memory_order_release).
On the receiving thread(s), you call atomic_thread_fence(std::memory_order_acquire) before reading shared data.
In my case, release should be done immediately before waiting on a condition variable and acquire should be done immediately before using data written by other threads.
This ensures that the writes on one thread are seen by the others.
I'm no expert, so this is probably not the right way to tackle the problem and will likely be faced with certain doom. For instance, I still have a deadlock/livelock problem to sort out.
tl;dr: it's not exactly a cache thing: threads may not have their data totally in sync with each other unless you enforce that with atomic memory fences.
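For comparison, here is a small sketch of the same publish/consume idea without standalone fences. A fence only synchronizes two threads when it is paired with an atomic operation that the other thread actually observes, so it is often simpler to put the ordering on the flag itself: a release store after the plain writes and an acquire load before the plain reads. The Batch struct, twice and run_published_job below are hypothetical stand-ins, not the engine code from the question.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

using BatchFunc = int (*)(int);

struct Batch {
    BatchFunc func = nullptr;        // plain data, published through `ready`
    std::atomic<bool> ready{false};  // the only variable both threads poll
};

static int twice(int x) { return 2 * x; }

// Publishes a function pointer to a worker thread and returns the worker's result.
int run_published_job(int arg) {
    Batch batch;
    int result = 0;

    std::thread worker([&] {
        // Acquire load: once it returns true, the earlier write to func is visible.
        while (!batch.ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        assert(batch.func != nullptr);
        result = batch.func(arg);
    });

    batch.func = &twice;                                 // plain write
    batch.ready.store(true, std::memory_order_release);  // publish it

    worker.join();                                       // join makes `result` visible here
    return result;
}
```

With these definitions, run_published_job(21) should return 42. Writing the flag while holding the same mutex that the waiting thread uses with its condition variable would give the same visibility guarantee without any explicit memory ordering.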

c++11 shared_ptr using in multi-threads

Recently I have been designing a high-performance, event-driven, multi-threaded framework in C++11. It mainly relies on C++11 facilities such as std::thread, std::condition_variable, std::mutex and std::shared_ptr. The framework has three basic components: job, worker and streamline, much like a real factory. When a user builds a business model on the server side, they only need to think about the data and its processor. Once the model is established, the user only needs to write a data class that inherits from job and a processor class that inherits from worker.
For example:
class Data : public job {};
class Processor : public worker {};
When the server receives data, it creates a Data object with auto data = std::make_shared<Data>() in the data-source callback thread and calls streamline.job_dispatch to hand the processor and the data over to another thread. The user does not have to think about freeing memory. streamline.job_dispatch mainly does the following:
void evd_thread_pool::job_dispatch(std::shared_ptr<evd_thread_job> job) {
auto task = std::make_shared<evd_task_wrap>(job);
task->worker = streamline.worker;
// worker has been registered in streamline first of all
{
std::unique_lock<std::mutex> lck(streamline.mutex);
streamline.task_list.push_back(std::move(task));
}
streamline.cv.notify_all();
}
The evd_task_wrap used in job_dispatch is defined as:
struct evd_task_wrap {
std::shared_ptr<evd_thread_job> order;
std::shared_ptr<evd_thread_processor> worker;
evd_task_wrap(std::shared_ptr<evd_thread_job>& o)
:order(o) {}
};
Finally, the evd_task_wrap is handed to the processing thread through task_list, which is a std::list object. The processing thread mainly does the following:
void evd_factory_impl::thread_proc() {
std::shared_ptr<evd_task_wrap> wrap = nullptr;
while (true) {
{
std::unique_lock<std::mutex> lck(streamline.mutex);
if (streamline.task_list.empty())
streamline.cv.wait(lck,
[&]()->bool{return !streamline.task_list.empty();});
wrap = std::move(streamline.task_list.front());
streamline.task_list.pop_front();
}
if (-1 == wrap->order->get_type())
break;
wrap->worker->process_task(wrap->order);
wrap.reset();
}
}
But I don't know why the process often crashes in thread_proc. The core dump shows that sometimes wrap is an empty shared_ptr, or that a segmentation fault happened in _Sp_counted_ptr_inplace::_M_dispose, which is called from wrap.reset(). I suspect a thread-synchronization problem with shared_ptr in this scenario, although I know the control block of shared_ptr is thread-safe. And of course the shared_ptr objects in job_dispatch and thread_proc are different objects, even though they point to the same storage. Does anyone have a more specific suggestion on how to solve this problem? Or is there a similar lightweight framework with automatic memory management using C++11?
An example of process_task:
void log_handle::process_task(std::shared_ptr<crx::evd_thread_job> job) {
auto j = std::dynamic_pointer_cast<log_job>(job);
j->log->Printf(0, j->print_str.c_str());
write(STDOUT_FILENO, j->print_str.c_str(), j->print_str.size());
}
class log_factory {
public:
log_factory(const std::string& name);
virtual ~log_factory();
void print_ts(const char *format, ...) { //here dispatch the job
char log_buf[4096] = {0};
va_list args;
va_start(args, format);
vsprintf(log_buf, format, args);
va_end(args);
auto job = std::make_shared<log_job>(log_buf, &m_log);
m_log_th.job_dispatch(job);
}
public:
E15_Log m_log;
std::shared_ptr<log_handle> m_log_handle;
crx::evd_thread_pool m_log_th;
};
I detected a problem in your code, which may or may not be related:
You use notify_all from your condition variable. That will awaken ALL threads from sleep. It is OK if you wrap your wait in a while loop, like:
while (streamline.task_list.empty())
streamline.cv.wait(lck, [&]()->bool{return !streamline.task_list.empty();});
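But if the wait is guarded only by an if and called without a predicate, all threads leave the wait. If you dispatch a single product and have several consumer threads, all but one thread will call wrap = std::move(streamline.task_list.front()); while the task list is empty, which is undefined behavior. Note that the predicate overload of wait used in your code already re-checks the condition in a loop (it behaves like while (!pred()) wait(lck);), so this may not be the cause of your crash.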
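As an illustration of a consumer loop that stays correct under notify_all, here is a minimal sketch of a blocking queue. It relies on the predicate overload of std::condition_variable::wait, which is specified to keep waiting until the predicate returns true, so no thread pops from an empty container. TaskQueue and drain_with_consumers are hypothetical names, and plain int tasks stand in for the shared_ptr payloads from the question.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

class TaskQueue {
public:
    void push(int task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(task);
        }
        cv_.notify_all();  // every consumer wakes, but only those that find work proceed
    }

    int pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        // Equivalent to: while (tasks_.empty()) cv_.wait(lock);
        cv_.wait(lock, [this] { return !tasks_.empty(); });
        int task = tasks_.front();
        tasks_.pop_front();
        return task;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<int> tasks_;
};

// Pushes tasks 1..n_tasks, lets n_consumers threads drain them, returns the sum processed.
long drain_with_consumers(int n_consumers, int n_tasks) {
    TaskQueue queue;
    std::atomic<long> sum{0};
    std::vector<std::thread> consumers;
    for (int i = 0; i < n_consumers; ++i) {
        consumers.emplace_back([&] {
            for (;;) {
                int task = queue.pop();
                if (task == -1) break;  // sentinel, like get_type() == -1 in the question
                sum.fetch_add(task);
            }
        });
    }
    for (int t = 1; t <= n_tasks; ++t) queue.push(t);
    for (int i = 0; i < n_consumers; ++i) queue.push(-1);  // one sentinel per consumer
    for (auto& c : consumers) c.join();
    return sum.load();
}
```

Each task is consumed exactly once regardless of how many threads notify_all wakes, because the emptiness check and the pop happen under the same lock.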

set flag in signal handler

In C++11, what is the safest (and preferably most efficient) way to execute unsafe code when a signal is caught, given a request loop (as part of a web request loop)? For example, on catching a SIGUSR1 sent from a Linux command line: kill -30 <process pid>
It is acceptable for the 'unsafe code' to be run on the next request being fired, and no information is lost if the signal is fired multiple times before the unsafe code is run.
For example, my current code is:
static bool globalFlag = false;
void signalHandler(int sig_num, siginfo_t * info, void * context) {
globalFlag = true;
}
void doUnsafeThings() {
// things like std::vector push_back, new char[1024], set global vars, etc.
}
void doRegularThings() {
// read filesystem, read global variables, etc.
}
int main() {
// set up signal handler (for SIGUSR1) ...
struct sigaction sigact;
sigemptyset(&sigact.sa_mask); // do not block any extra signals while the handler runs
sigact.sa_sigaction = signalHandler;
sigact.sa_flags = SA_RESTART | SA_SIGINFO;
sigaction(SIGUSR1, &sigact, (struct sigaction *)NULL);
// main loop ...
while(acceptMoreRequests()) { // blocks until new request received
if (globalFlag) {
globalFlag = false;
doUnsafeThings();
}
doRegularThings();
}
}
where I know there could be problems in the main loop testing+setting the globalFlag boolean.
Edit: The if (globalFlag) test will be run in a fairly tight loop, and an 'occasional' false negative is acceptable. However, I suspect there's no optimisation over Basile Starynkevitch's solution anyway?
You should declare your flag
static volatile sig_atomic_t globalFlag = 0;
See e.g. sig_atomic_t, this question and don't forget the volatile qualifier. (It may have been spelled sigatomic_t for C).
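A small sketch of that flag pattern, assuming a single-threaded request loop: the handler only sets a volatile sig_atomic_t, and the loop checks and clears it before doing the deferred work. run_requests and on_sigusr1 are hypothetical names, and raise() stands in for an external kill.

```cpp
#include <signal.h>
#include <cassert>
#include <vector>

static volatile sig_atomic_t g_flag = 0;

// The handler does nothing except set the flag.
static void on_sigusr1(int) { g_flag = 1; }

// Simulates n_requests loop iterations; two signals arrive just before
// iteration signal_at. Returns how many times the deferred work ran.
int run_requests(int n_requests, int signal_at) {
    struct sigaction sa {};
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    int rc = sigaction(SIGUSR1, &sa, nullptr);
    assert(rc == 0);
    (void)rc;

    std::vector<int> deferred_runs;  // "unsafe" work happens outside the handler
    for (int request = 0; request < n_requests; ++request) {
        if (request == signal_at) {
            raise(SIGUSR1);          // two signals before the next check ...
            raise(SIGUSR1);
        }
        if (g_flag) {                // ... are seen as a single pending request
            g_flag = 0;
            deferred_runs.push_back(request);
        }
    }
    return static_cast<int>(deferred_runs.size());
}
```

Two signals that arrive before the next check collapse into one run of the deferred work, which matches the requirement in the question that no information is lost when the signal fires several times.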
On Linux (specifically) you could use signalfd(2) to get a filedescriptor for the signal, and that fd can be poll(2)-ed by your event loop.
Some event loop libraries (libevent, libev ...) know how to handle signals.
And there is also the trick of setting up a pipe (see pipe(2) and pipe(7) for more) at initialization, and just write(2)-ing some byte on it in the signal handler. The event loop would poll and read that pipe. Such a trick is recommended by Qt.
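The pipe trick can be sketched as follows. This is only an illustration under stated assumptions: g_pipe_fds, on_signal_write_pipe and wait_for_signal_via_pipe are hypothetical names, raise() stands in for an external signal, and a real event loop would poll the read end together with its other file descriptors.

```cpp
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>

static int g_pipe_fds[2] = {-1, -1};  // [0] read end (event loop), [1] write end (handler)

// Handler: write(2) is async-signal-safe; errno is preserved for the interrupted code.
static void on_signal_write_pipe(int) {
    int saved_errno = errno;
    const char byte = 1;
    ssize_t written = write(g_pipe_fds[1], &byte, 1);
    (void)written;                    // a full pipe already means a wake-up is pending
    errno = saved_errno;
}

// Sets up the pipe and handler, raises SIGUSR1, then polls the pipe like an event loop would.
bool wait_for_signal_via_pipe(int timeout_ms) {
    if (pipe(g_pipe_fds) != 0) return false;
    // Non-blocking ends: the handler never blocks, and draining stops when empty.
    fcntl(g_pipe_fds[0], F_SETFL, fcntl(g_pipe_fds[0], F_GETFL) | O_NONBLOCK);
    fcntl(g_pipe_fds[1], F_SETFL, fcntl(g_pipe_fds[1], F_GETFL) | O_NONBLOCK);

    struct sigaction sa {};
    sa.sa_handler = on_signal_write_pipe;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sigaction(SIGUSR1, &sa, nullptr);

    raise(SIGUSR1);

    struct pollfd pfd;
    pfd.fd = g_pipe_fds[0];
    pfd.events = POLLIN;
    pfd.revents = 0;
    int ready = poll(&pfd, 1, timeout_ms);
    bool got_signal = (ready == 1) && (pfd.revents & POLLIN);

    char buffer[16];
    while (read(g_pipe_fds[0], buffer, sizeof buffer) > 0) {}  // drain pending bytes

    close(g_pipe_fds[0]);
    close(g_pipe_fds[1]);
    return got_signal;
}
```

After poll reports the pipe readable, the loop is back in ordinary code and can safely run anything, including allocation and container updates.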
Also read signal(7) and signal-safety(7), which lists the limited set of functions and syscalls that may be used inside a signal handler.
BTW, correctness is more important than efficiency. In general, you get few signals (e.g. most programs get a signal once every second at most, not every millisecond).

c++ winapi threads

These days I'm trying to learn more things about threads in windows. I thought about making this practical application:
Let's say there are several threads started when a button "Start" is pressed. Assume these threads are intensive (they keep running / have always something to work on).
This app would also have a "Stop" button. When this button is pressed all the threads should close in a nice way: free resources and abandon work and return the state they were before the "Start" button was pressed.
Another requirement is that the functions run by the threads shouldn't contain any instruction checking whether the "Stop" button was pressed. The function running in the thread shouldn't care about the stop button.
Language: C++
OS: Windows
Problems:
WrapperFunc(function, param)
{
// what to write here ?
// if i write this:
function(param);
// i cannot stop the function from executing
}
How should I construct the wrapper function so that I can stop the thread properly?
( without using TerminateThread or some other functions )
What if the programmer allocates some memory dynamically? How can I free it before closing the thread? (Note that when I press the "Stop" button the thread is still processing data.)
I thought about overloading the new operator, or imposing the use of a predefined function for dynamic allocation. This, however, constrains the programmer who uses this API, which is not what I want.
Thank you
Edit: Skeleton to describe the functionality I'd like to achieve.
struct wrapper_data
{
void* (*function)(LPVOID);
LPVOID *params;
};
/*
this function should make sure that the threads stop properly
( free memory allocated dynamically etc )
*/
void* WrapperFunc(LPVOID *arg)
{
wrapper_data *data = (wrapper_data*) arg;
// what to write here ?
// if i write this:
data->function(data->params);
// i cannot stop the function from executing
delete data;
}
// will have exactly the same arguments as CreateThread
MyCreateThread(..., function, params, ...)
{
// this should create a thread that runs the wrapper function
wrapper_data *data = new wrapper_data;
data->function = function;
data->params = params;
CreateThread(..., WrapperFunc, (LPVOID) data, ...);
}
thread_function(LPVOID *data)
{
while(1)
{
//do stuff
}
}
// as you can see I want it to be completely invisible
// to the programmer who uses this
MyCreateThread(..., thread_function, (LPVOID) params,...);
One solution is to have some kind of signal that tells the threads to stop working. Often this can be a global boolean variable that is normally false; when it is set to true, it tells the threads to stop. As for the cleanup, do it when the thread's main loop is done, before returning from the thread.
I.e. something like this:
volatile bool gStopThreads = false; // Defaults to false, threads should not stop
void thread_function()
{
while (!gStopThreads)
{
// Do some stuff
}
// All processing done, clean up after my self here
}
As for the cleaning up bit, if you keep the data inside a struct or a class, you can forcibly kill them from outside the threads and just either delete the instances if you allocated them dynamically or let the system handle it if created e.g. on the stack or as global objects. Of course, all data your thread allocates (including files, sockets etc.) must be placed in this structure or class.
A way of keeping the stopping functionality in the wrapper, is to have the actual main loop in the wrapper, together with the check for the stop-signal. Then in the main loop just call a doStuff-like function that does the actual processing. However, if it contains operations that might take time, you end up with the first problem again.
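Here is a portable sketch of that wrapper idea. It deliberately swaps in std::thread and std::atomic<bool> for CreateThread and a volatile flag, so it is an illustration of the structure rather than WinAPI code. StoppableWorker and run_worker_until are hypothetical names, and the user-supplied step is assumed to return quickly.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>

class StoppableWorker {
public:
    // `step` is the user's function; it never looks at the stop flag itself.
    explicit StoppableWorker(std::function<void()> step)
        : stop_(false),
          thread_([this, step] {
              while (!stop_.load()) step();  // the wrapper owns the loop and the check
          }) {}

    ~StoppableWorker() { stop(); }

    void stop() {
        stop_.store(true);
        if (thread_.joinable()) thread_.join();  // the current step finishes first
    }

private:
    std::atomic<bool> stop_;  // declared before thread_ so it is initialized first
    std::thread thread_;
};

// Runs a counting step until at least min_steps were executed, then stops the worker.
long run_worker_until(long min_steps) {
    std::atomic<long> steps{0};
    StoppableWorker worker([&steps] { steps.fetch_add(1); });
    while (steps.load() < min_steps) std::this_thread::yield();
    worker.stop();
    return steps.load();
}
```

The limitation mentioned above still applies: stop() only takes effect between two calls of step, so a long-running step delays the shutdown.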
See my answer to this similar question:
How do I guarantee fast shutdown of my win32 app?
Basically, you can use QueueUserAPC to queue a proc which throws an exception. The exception should bubble all the way up to a 'catch' in your thread proc.
As long as any libraries you're using are reasonably exception-aware and use RAII, this works remarkably well. I haven't successfully got this working with boost::threads however, as it doesn't put suspended threads into an alertable wait state, so QueueUserAPC can't wake them.
If you don't want the programmer of the function that the thread executes to deal with the "stop" event, make the thread execute a function of your own that deals with the "stop" event and, as long as that event isn't signaled, calls the programmer's function.
In other words the "while(!event)" will be in a function that calls the "job" function.
Code Sample.
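#include <windows.h>
#include <tchar.h>
#include <atomic>
typedef void (*JobFunction)(LPVOID params); // The prototype of the function to execute inside the thread
struct structFunctionParams
{
int iCounter;
structFunctionParams()
{
iCounter = 0;
}
};
struct structJobParams
{
std::atomic<bool> bStop; // written by the main thread, read by the worker thread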
JobFunction pFunction;
LPVOID pFunctionParams;
structJobParams()
{
bStop = false;
pFunction = NULL;
pFunctionParams = NULL;
}
};
DWORD WINAPI ThreadProcessJob(IN LPVOID pParams)
{
structJobParams* pJobParams = (structJobParams*)pParams;
while(!pJobParams->bStop)
{
// Execute the "programmer" function
pJobParams->pFunction(pJobParams->pFunctionParams);
}
return 0;
}
void ThreadFunction(LPVOID pParams)
{
// Do Something....
((structFunctionParams*)pParams)->iCounter ++;
}
int _tmain(int argc, _TCHAR* argv[])
{
structFunctionParams stFunctionParams;
structJobParams stJobParams;
stJobParams.pFunction = &ThreadFunction;
stJobParams.pFunctionParams = &stFunctionParams;
DWORD dwIdThread = 0;
HANDLE hThread = CreateThread(
NULL,
0,
ThreadProcessJob,
(LPVOID) &stJobParams, 0, &dwIdThread);
if(hThread)
{
// Give it 5 seconds to work
Sleep(5000);
stJobParams.bStop = true; // Signal to Stop
WaitForSingleObject(hThread, INFINITE); // Wait to finish
CloseHandle(hThread);
}
}

How to pause a pthread ANY TIME I want?

Recently I set out to port ucos-ii to an Ubuntu PC.
As we know, it's not possible to simulate the "process" in the ucos-ii by simply adding a flag in "while" loop in the pthread's call-back function to perform pause and resume(like the solution below). Because the "process" in ucos-ii can be paused or resumed at any time!
How to sleep or pause a PThread in c on Linux
I have found one solution on the web-site below, but it can't be built because it's out of date. It uses the process in Linux to simulate the task(acts like the process in our Linux) in ucos-ii.
http://www2.hs-esslingen.de/~zimmerma/software/index_uk.html
If pthread can act like the process which can be paused and resumed at any time, please tell me some related functions, I can figure it out myself. If it can't, I think I should focus on the older solution. Thanks a lot.
The Modula-3 garbage collector needs to suspend pthreads at an arbitrary time, not just when they are waiting on a condition variable or mutex. It does it by registering a (Unix) signal handler that suspends the thread and then using pthread_kill to send a signal to the target thread. I think it works (it has been reliable for others but I'm debugging an issue with it right now...) It's a bit kludgy, though....
Google for ThreadPThread.m3 and look at the routines "StopWorld" and "StartWorld". Handler itself is in ThreadPThreadC.c.
If stopping at specific points with a condition variable is insufficient, then you can't do this with pthreads. The pthread interface does not include suspend/resume functionality.
See, for example, answer E.4 here:
The POSIX standard provides no mechanism by which a thread A can suspend the execution of another thread B, without cooperation from B. The only way to implement a suspend/restart mechanism is to have B check periodically some global variable for a suspend request and then suspend itself on a condition variable, which another thread can signal later to restart B.
That FAQ answer goes on to describe a couple of non-standard ways of doing it, one in Solaris and one in LinuxThreads (which is now obsolete; do not confuse it with current threading on Linux); neither of those apply to your situation.
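The cooperative mechanism described in the quoted FAQ answer could look roughly like this. It is a sketch using std::mutex and std::condition_variable in place of the raw pthread calls; SuspendGate, checkpoint and suspend_and_resume_once are hypothetical names, and thread B can only be suspended at the points where it calls checkpoint().

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

class SuspendGate {
public:
    void request_suspend() {
        std::lock_guard<std::mutex> lock(mutex_);
        suspended_ = true;
    }
    void resume() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            suspended_ = false;
        }
        cv_.notify_all();
    }
    // Called by thread B at points where it is safe to stop.
    void checkpoint() {
        std::unique_lock<std::mutex> lock(mutex_);
        if (!suspended_) return;
        parked_ = true;
        cv_.wait(lock, [this] { return !suspended_; });
        parked_ = false;
    }
    bool parked() {
        std::lock_guard<std::mutex> lock(mutex_);
        return parked_;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool suspended_ = false;
    bool parked_ = false;
};

// Thread A suspends thread B, checks that B makes no progress, then resumes it.
bool suspend_and_resume_once() {
    SuspendGate gate;
    std::atomic<long> progress{0};
    std::atomic<bool> quit{false};
    std::thread b([&] {
        while (!quit.load()) {
            gate.checkpoint();
            progress.fetch_add(1);
        }
    });

    gate.request_suspend();
    while (!gate.parked()) std::this_thread::yield();  // wait until B is parked
    long before = progress.load();
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    bool frozen = (progress.load() == before);

    gate.resume();
    while (progress.load() == before) std::this_thread::yield();  // B runs again
    quit.store(true);
    b.join();
    return frozen;
}
```

This does not give the "pause at any time" behavior the question asks for, which is why the other answers fall back on signals.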
On Linux you can probably set up a custom signal handler (e.g. using signal()) that waits for another signal (e.g. using sigsuspend()). You then send the signals using pthread_kill() or tgkill(). It is important to use so-called "realtime signals" for this, because normal signals like SIGUSR1 and SIGUSR2 don't get queued, which means that they can get lost under high load. You may send a signal several times, yet it is received only once, because standard signals of the same kind that arrive while one is already pending are merged into one. So if you have concurrent threads doing PAUSE/RESUME, you can lose a RESUME event and cause a deadlock. On the other hand, pending realtime signals (like SIGRTMIN+1 and SIGRTMIN+2) are not deduplicated, so several identical realtime signals can be in the queue at the same time.
DISCLAIMER: I have not tried this yet. But in theory it should work.
Also see man 7 signal-safety. There is a list of calls that you can safely call in signal handlers. Fortunately sigsuspend() seems to be one of them.
UPDATE: I have working code right here:
//Filename: pthread_pause.c
//Author: Tomas 'Harvie' Mudrunka 2021
//Build: CFLAGS=-lpthread make pthread_pause; ./pthread_pause
//Test: valgrind --tool=helgrind ./pthread_pause
//I wrote this code as an exercise to solve the following Stack Overflow question:
// https://stackoverflow.com/questions/9397068/how-to-pause-a-pthread-any-time-i-want/68119116#68119116
#define _GNU_SOURCE //pthread_yield() needs this
#include <signal.h>
#include <pthread.h>
//#include <pthread_extra.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <unistd.h>
#include <errno.h>
#include <sys/resource.h>
#include <time.h>
#define PTHREAD_XSIG_STOP (SIGRTMIN+0)
#define PTHREAD_XSIG_CONT (SIGRTMIN+1)
#define PTHREAD_XSIGRTMIN (SIGRTMIN+2) //First unused RT signal
pthread_t main_thread;
sem_t pthread_pause_sem;
pthread_once_t pthread_pause_once_ctrl = PTHREAD_ONCE_INIT;
void pthread_pause_once(void) {
sem_init(&pthread_pause_sem, 0, 1);
}
#define pthread_pause_init() (pthread_once(&pthread_pause_once_ctrl, &pthread_pause_once))
#define NSEC_PER_SEC (1000*1000*1000)
// timespec_normalise() from https://github.com/solemnwarning/timespec/
struct timespec timespec_normalise(struct timespec ts)
{
while(ts.tv_nsec >= NSEC_PER_SEC) {
++(ts.tv_sec); ts.tv_nsec -= NSEC_PER_SEC;
}
while(ts.tv_nsec <= -NSEC_PER_SEC) {
--(ts.tv_sec); ts.tv_nsec += NSEC_PER_SEC;
}
if(ts.tv_nsec < 0) { // Negative nanoseconds isn't valid according to POSIX.
--(ts.tv_sec); ts.tv_nsec = (NSEC_PER_SEC + ts.tv_nsec);
}
return ts;
}
void pthread_nanosleep(struct timespec t) {
//Sleep calls on Linux get interrupted by signals, causing premature wake
//Pthread (un)pause is built using signals
//Therefore we need self-restarting sleep implementation
//IO timeouts are restarted by SA_RESTART, but sleeps do need explicit restart
//We also need to sleep using absolute time, because relative time is paused
//You should use this in any thread that gets (un)paused
struct timespec wake;
clock_gettime(CLOCK_MONOTONIC, &wake);
t = timespec_normalise(t);
wake.tv_sec += t.tv_sec;
wake.tv_nsec += t.tv_nsec;
wake = timespec_normalise(wake);
while(clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &wake, NULL)) if(errno!=EINTR) break;
return;
}
void pthread_nsleep(time_t s, long ns) {
struct timespec t;
t.tv_sec = s;
t.tv_nsec = ns;
pthread_nanosleep(t);
}
void pthread_sleep(time_t s) {
pthread_nsleep(s, 0);
}
void pthread_pause_yield() {
//Call this to give other threads chance to run
//Wait until last (un)pause action gets finished
sem_wait(&pthread_pause_sem);
sem_post(&pthread_pause_sem);
//usleep(0);
//nanosleep(&((const struct timespec){.tv_sec=0,.tv_nsec=1}), NULL);
//pthread_nsleep(0,1); //pthread_yield() is not enough, so we use sleep
pthread_yield();
}
void pthread_pause_handler(int signal) {
//Do nothing when there are more signals pending (to cleanup the queue)
//This is no longer needed, since we use semaphore to limit pending signals
/*
sigset_t pending;
sigpending(&pending);
if(sigismember(&pending, PTHREAD_XSIG_STOP)) return;
if(sigismember(&pending, PTHREAD_XSIG_CONT)) return;
*/
//Post semaphore to confirm that signal is handled
sem_post(&pthread_pause_sem);
//Suspend if needed
if(signal == PTHREAD_XSIG_STOP) {
sigset_t sigset;
sigfillset(&sigset);
sigdelset(&sigset, PTHREAD_XSIG_STOP);
sigdelset(&sigset, PTHREAD_XSIG_CONT);
sigsuspend(&sigset); //Wait for next signal
} else return;
}
void pthread_pause_enable() {
//Having signal queue too deep might not be necessary
//It can be limited using RLIMIT_SIGPENDING
//You can get runtime SigQ stats using following command:
//grep -i sig /proc/$(pgrep binary)/status
//This is no longer needed, since we use semaphores
//struct rlimit sigq = {.rlim_cur = 32, .rlim_max=32};
//setrlimit(RLIMIT_SIGPENDING, &sigq);
pthread_pause_init();
//Prepare sigset
sigset_t sigset;
sigemptyset(&sigset);
sigaddset(&sigset, PTHREAD_XSIG_STOP);
sigaddset(&sigset, PTHREAD_XSIG_CONT);
//Register signal handlers
//signal(PTHREAD_XSIG_STOP, pthread_pause_handler);
//signal(PTHREAD_XSIG_CONT, pthread_pause_handler);
//We now use sigaction() instead of signal(), because it supports SA_RESTART
const struct sigaction pause_sa = {
.sa_handler = pthread_pause_handler,
.sa_mask = sigset,
.sa_flags = SA_RESTART,
.sa_restorer = NULL
};
sigaction(PTHREAD_XSIG_STOP, &pause_sa, NULL);
sigaction(PTHREAD_XSIG_CONT, &pause_sa, NULL);
//UnBlock signals
pthread_sigmask(SIG_UNBLOCK, &sigset, NULL);
}
void pthread_pause_disable() {
//This is important for when you want to do some signal unsafe stuff
//Eg.: locking mutex, calling printf() which has internal mutex, etc...
//After unlocking mutex, you can enable pause again.
pthread_pause_init();
//Make sure all signals are dispatched before we block them
sem_wait(&pthread_pause_sem);
//Block signals
sigset_t sigset;
sigemptyset(&sigset);
sigaddset(&sigset, PTHREAD_XSIG_STOP);
sigaddset(&sigset, PTHREAD_XSIG_CONT);
pthread_sigmask(SIG_BLOCK, &sigset, NULL);
sem_post(&pthread_pause_sem);
}
int pthread_pause(pthread_t thread) {
sem_wait(&pthread_pause_sem);
//If signal queue is full, we keep retrying
while(pthread_kill(thread, PTHREAD_XSIG_STOP) == EAGAIN) usleep(1000);
pthread_pause_yield();
return 0;
}
int pthread_unpause(pthread_t thread) {
sem_wait(&pthread_pause_sem);
//If signal queue is full, we keep retrying
while(pthread_kill(thread, PTHREAD_XSIG_CONT) == EAGAIN) usleep(1000);
pthread_pause_yield();
return 0;
}
void *thread_test() {
//Whole process dies if you kill thread immediately before it is pausable
//pthread_pause_enable();
while(1) {
//Printf() is not async signal safe (because it holds internal mutex),
//you should call it only with pause disabled!
//Will throw helgrind warnings anyway, not sure why...
//See: man 7 signal-safety
pthread_pause_disable();
printf("Running!\n");
pthread_pause_enable();
//Pausing main thread should not cause deadlock
//We pause main thread here just to test it is OK
pthread_pause(main_thread);
//pthread_nsleep(0, 1000*1000);
pthread_unpause(main_thread);
//Wait for a while
//pthread_nsleep(0, 1000*1000*100);
pthread_unpause(main_thread);
}
}
int main() {
pthread_t t;
main_thread = pthread_self();
pthread_pause_enable(); //Will get inherited by all threads from now on
//You need to call pthread_pause_enable() (or disable) before creating threads,
//otherwise the first (un)pause signal will kill the whole process
pthread_create(&t, NULL, thread_test, NULL);
while(1) {
pthread_pause(t);
printf("PAUSED\n");
pthread_sleep(3);
printf("UNPAUSED\n");
pthread_unpause(t);
pthread_sleep(1);
/*
pthread_pause_disable();
printf("RUNNING!\n");
pthread_pause_enable();
*/
pthread_pause(t);
pthread_unpause(t);
}
pthread_join(t, NULL);
printf("DIEDED!\n");
}
I am also working on a library called "pthread_extra", which will include this and much more. I will publish it soon.
UPDATE2: This still deadlocks when pause/unpause are called rapidly (with the sleep() calls removed). The printf() implementation in glibc takes a mutex, so if you suspend a thread in the middle of printf() and then call printf() from the thread that is supposed to unpause it later, the unpause never happens, because printf() is locked. Unfortunately, even after removing the printf() and running only an empty while loop in the thread, I still get deadlocks at high pause/unpause rates, and I don't know why. Maybe Linux signals (even realtime ones) are not 100% safe. There is a realtime signal queue; maybe it just overflows or something...
UPDATE3: I think I've managed to fix the deadlock, but I had to rewrite most of the code. Now I have one sig_atomic_t variable per thread that holds whether that thread should be running or not. It works somewhat like a condition variable. pthread_(un)pause() transparently remembers this for each thread. Instead of two signals I now have only one. The handler of that signal looks at the variable and blocks in sigsuspend() only when the variable says the thread should NOT run; otherwise it returns from the signal handler. To suspend or resume a thread, I now set the sig_atomic_t variable to the desired state and send that signal (which is common to both suspend and resume). It is important to use realtime signals to be sure the handler will actually run after you've modified the state variable. The code is a bit complex because of the thread status database. I will share it in a separate solution as soon as I manage to simplify it enough. But I want to preserve the two-signal version here, because it kind of works, I like its simplicity, and maybe people will give us more insight into how to optimize it.
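The single-signal design described in UPDATE3 might be sketched roughly as follows. This is not the author's code: the names g_should_run, g_is_parked, install_state_handler() and set_running() are hypothetical, there is one global flag for a single worker thread instead of a per-thread status database, and std::atomic<int> is swapped in for sig_atomic_t because the flag is written from a different thread than the one running the handler.

```cpp
#include <atomic>
#include <cerrno>
#include <pthread.h>
#include <signal.h>

// Assumption: a single worker thread, so one global flag is enough.
static std::atomic<int> g_should_run{1};  // desired state: 1 = run, 0 = paused
static std::atomic<int> g_is_parked{0};   // set while the handler is blocked

// One handler for both "pause" and "resume": it only looks at the flag.
static void state_signal_handler(int) {
    int saved_errno = errno;              // sigsuspend() sets errno to EINTR
    sigset_t wait_mask;
    sigfillset(&wait_mask);
    sigdelset(&wait_mask, SIGRTMIN);      // let the next state signal wake us
    while (!g_should_run.load()) {
        g_is_parked.store(1);
        sigsuspend(&wait_mask);           // sleep until another signal arrives
    }
    g_is_parked.store(0);
    errno = saved_errno;
}

// Install the handler for the (hypothetically chosen) realtime signal.
int install_state_handler() {
    struct sigaction sa = {};
    sa.sa_handler = state_signal_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGRTMIN, &sa, nullptr);
}

// Set the desired state first, then send the signal so the handler re-checks it.
int set_running(pthread_t thread, bool run) {
    g_should_run.store(run ? 1 : 0);
    return pthread_kill(thread, SIGRTMIN);
}
```

Calling set_running(worker, false) parks the worker inside the handler and set_running(worker, true) releases it. Because realtime signals are queued rather than merged, a resume sent right after a pause still results in a handler run that sees the latest flag value.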
UPDATE4: I've fixed the deadlock in the original code (no need for a helper variable holding the status) by using a single handler for the two signals and optimizing the signal queue a bit. Helgrind still reports a problem with printf(), but it is not caused by my signals; it happens even when I do not call pause/unpause at all. Overall, this was only tested on Linux, and I am not sure how portable the code is, because there seems to be some undocumented behaviour of signal handlers that was originally causing the deadlock.
Please note that pause/unpause calls cannot be nested. If you pause 3 times and unpause once, the thread WILL RUN. If you need nesting, you should create some kind of wrapper that counts the nesting levels and signals the thread accordingly.
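A nesting wrapper of the kind suggested above could look roughly like this. It is only a sketch: NestedPause is a hypothetical name, and the two callbacks stand in for pthread_pause()/pthread_unpause() on one particular thread so that the counting logic can be read on its own.

```cpp
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical helper: counts nested pause requests for ONE thread and only
// forwards the outermost pause and the final matching unpause.
class NestedPause {
public:
    NestedPause(std::function<void()> do_pause, std::function<void()> do_unpause)
        : do_pause_(std::move(do_pause)), do_unpause_(std::move(do_unpause)) {}

    void pause() {
        std::lock_guard<std::mutex> lock(m_);
        if (depth_++ == 0) do_pause_();    // only the first request really pauses
    }

    void unpause() {
        std::lock_guard<std::mutex> lock(m_);
        if (depth_ == 0) return;           // unbalanced unpause: ignore it
        if (--depth_ == 0) do_unpause_();  // only the last unpause really resumes
    }

    int depth() const {
        std::lock_guard<std::mutex> lock(m_);
        return depth_;
    }

private:
    mutable std::mutex m_;
    int depth_ = 0;
    std::function<void()> do_pause_;
    std::function<void()> do_unpause_;
};
```

With the code above it would be constructed as something like NestedPause np([&]{ pthread_pause(t); }, [&]{ pthread_unpause(t); }); after three np.pause() calls the thread then stays paused until the third np.unpause().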
UPDATE5: I've improved the robustness of the code with the following changes: I ensure proper serialization of pause/unpause calls by using semaphores. This hopefully fixes the last remaining deadlocks. Now you can be sure that when the pause call returns, the target thread is actually already paused. This also solves the issues with the signal queue overflowing. I've also added the SA_RESTART flag, which prevents the internal signals from interrupting IO waits. Sleeps/delays still have to be restarted manually, but I provide a convenient wrapper called pthread_nanosleep() which does just that.
UPDATE6: I realized that simply restarting nanosleep() is not enough, because that way the timeout does not keep running while the thread is paused. Therefore I've modified pthread_nanosleep() to convert the timeout interval to an absolute point in time in the future and sleep until that. I've also hidden the semaphore initialization, so the user does not need to do it.
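The idea from UPDATE6 can be illustrated with clock_nanosleep() and TIMER_ABSTIME. This is a sketch under assumptions, not the author's pthread_nanosleep(): the function name sleep_until_deadline() is made up, CLOCK_MONOTONIC is my choice of clock, and the interval is assumed to be non-negative.

```cpp
#include <cerrno>
#include <time.h>

// Hypothetical helper: sleep for a relative interval by converting it to an
// absolute CLOCK_MONOTONIC deadline, so time spent interrupted (or paused in a
// signal handler) still counts towards the timeout.
// Returns 0 on success or an errno-style error code.
int sleep_until_deadline(long sec, long nsec) {
    timespec deadline;
    if (clock_gettime(CLOCK_MONOTONIC, &deadline) != 0) return errno;

    deadline.tv_sec += sec;
    deadline.tv_nsec += nsec;
    // Normalize so that tv_nsec stays below one second.
    deadline.tv_sec += deadline.tv_nsec / 1000000000L;
    deadline.tv_nsec %= 1000000000L;

    int rc;
    do {
        // With TIMER_ABSTIME the same deadline is reused after every EINTR,
        // so restarting never extends the total sleep.
        rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, nullptr);
    } while (rc == EINTR);
    return rc;  // clock_nanosleep() returns the error number directly
}
```

Restarting a relative nanosleep() with the remaining time would also work for plain interruptions, but the remaining time does not shrink while the thread sits in the pause handler, which is the problem the absolute deadline avoids.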
Here is an example of a thread function within a class with pause/resume functionality:
class SomeClass
{
public:
// ... construction/destruction
void Resume();
void Pause();
void Stop();
private:
static void* ThreadFunc(void* pParam);
pthread_t thread;
pthread_mutex_t mutex;
pthread_cond_t cond_var;
int command;
};
SomeClass::SomeClass()
{
pthread_mutex_init(&mutex, NULL);
pthread_cond_init(&cond_var, NULL);
// create thread in suspended state..
command = 0;
pthread_create(&thread, NULL, ThreadFunc, this);
}
SomeClass::~SomeClass()
{
// stop the thread so that ThreadFunc returns before we call the blocking pthread_join()
// this also prevents the mutex from staying locked
Stop();
pthread_join(thread, NULL);
pthread_cond_destroy(&cond_var);
pthread_mutex_destroy(&mutex);
}
void* SomeClass::ThreadFunc(void* pParam)
{
SomeClass* pThis = static_cast<SomeClass*>(pParam);
timespec time_ns = {0, 50*1000*1000}; // 50 milliseconds
while(1)
{
pthread_mutex_lock(&pThis->mutex);
if (pThis->command == 2) // command to stop the thread
{
// be sure to unlock the mutex before exiting
pthread_mutex_unlock(&pThis->mutex);
return NULL;
}
else if (pThis->command == 0) // command to pause the thread
{
// sleeps until Resume() or Stop() signals cond_var; a spurious wakeup is
// harmless because the loop re-checks command after the continue
pthread_cond_wait(&pThis->cond_var, &pThis->mutex);
// don't forget to unlock the mutex
pthread_mutex_unlock(&pThis->mutex);
continue;
}
if (pThis->command == 1) // command to run
{
// normal running process
fprintf(stderr, "*");
}
pthread_mutex_unlock(&pThis->mutex);
// it's important to give the main thread some time after unlocking the mutex
// (sched_yield() is the standard replacement for the non-standard pthread_yield())
sched_yield();
// ... or...
//nanosleep(&time_ns, NULL);
}
pthread_exit(NULL);
}
void SomeClass::Stop()
{
pthread_mutex_lock(&mutex);
command = 2;
pthread_cond_signal(&cond_var);
pthread_mutex_unlock(&mutex);
}
void SomeClass::Pause()
{
pthread_mutex_lock(&mutex);
command = 0;
// for the pause command we don't need to signal cond_var, because the thread is not waiting on it right now
pthread_mutex_unlock(&mutex);
}
void SomeClass::Resume()
{
pthread_mutex_lock(&mutex);
command = 1;
pthread_cond_signal(&cond_var);
pthread_mutex_unlock(&mutex);
}
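For comparison, the same command-flag pattern can be written with the C++11 standard library instead of raw pthreads. This is only a sketch of an alternative, not a replacement the answer proposes: PausableWorker and iterations() are hypothetical names, and the "work" is just a counter increment standing in for the fprintf() call.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical C++11 counterpart of SomeClass: starts paused, like the original.
class PausableWorker {
public:
    PausableWorker() : thread_([this] { run(); }) {}
    ~PausableWorker() { set(Command::Stop); thread_.join(); }

    void Resume() { set(Command::Run); }
    void Pause()  { set(Command::Pause); }
    long iterations() const { return iterations_.load(); }

private:
    enum class Command { Pause, Run, Stop };

    void set(Command c) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            command_ = c;
        }
        cond_.notify_one();  // wake the worker if it is waiting
    }

    void run() {
        for (;;) {
            {
                std::unique_lock<std::mutex> lock(mutex_);
                // The predicate form re-checks the flag after spurious wakeups.
                cond_.wait(lock, [this] { return command_ != Command::Pause; });
                if (command_ == Command::Stop) return;
            }
            ++iterations_;  // one unit of work, done outside the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable cond_;
    Command command_ = Command::Pause;
    std::atomic<long> iterations_{0};
    std::thread thread_;  // declared last so the other members exist before it starts
};
```

Doing the work outside the lock means Pause() never has to wait for a work item to finish before it can set the flag, at the cost that one item already in progress may still complete after Pause() returns.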