I'm working in C++ on Unix.
Say I have a long-running function that does something, for example reading data from a file and parsing it. In this function I keep count of the items I read from the file in a local variable, num_read.
I want to catch Ctrl+C in a custom signal handler and print the value of num_read.
The only way I can think of is allocating num_read on the heap and storing its address in a global variable that my signal handler can access. Is there a more elegant way?
The answer is no. There is no way of communicating between a signal handler and the rest of the code except through global variables.
Also, you can only do a very, very limited number of things in a signal handler. You cannot use << on an std::ostream, for example, nor can you call printf. The usual way of handling signals under Unix is to catch them in a separate thread. The alternative (which works on other OSes as well) is to define a global variable of type volatile sig_atomic_t, which is set in the signal handler and polled in the main loop. (In your case, for example, you might poll it every time you update num_read.)
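A minimal sketch of the polling approach, assuming the input is already available as a list of lines. The names install_report_handler, parse_lines and ParseResult are hypothetical; num_read stays local, and only a volatile sig_atomic_t flag is shared with the handler.

```cpp
#include <csignal>
#include <cstdio>
#include <string>
#include <vector>

// Set by the handler; read and reset only by the parsing loop.
static volatile std::sig_atomic_t g_report_requested = 0;

// The handler does nothing except set the flag.
extern "C" void on_sigint(int) {
    g_report_requested = 1;
}

void install_report_handler() {
    std::signal(SIGINT, on_sigint);
}

struct ParseResult {
    long num_read;  // items read
    long reports;   // how many times num_read was printed
};

// Hypothetical parse loop: num_read is a local variable, and the report is
// printed from ordinary code (not from the handler), so fprintf is fine here.
ParseResult parse_lines(const std::vector<std::string>& lines) {
    long num_read = 0;
    long reports = 0;
    for (const std::string& line : lines) {
        (void)line;  // real parsing would happen here
        ++num_read;
        if (g_report_requested) {  // poll the flag after each update
            g_report_requested = 0;
            std::fprintf(stderr, "num_read = %ld\n", num_read);
            ++reports;
        }
    }
    return ParseResult{num_read, reports};
}
```

Note that once this handler is installed, Ctrl+C no longer terminates the program; if you still want that, the loop can check the flag and return early instead of (or after) printing.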
Besides the traditional Unix approach with signal handlers, there is another: since Linux kernel 2.6.22 the signalfd() function is available. It gives you an ordinary file descriptor that you can poll (using select, poll, or epoll) for incoming signals; the signals must first be blocked with sigprocmask or pthread_sigmask so that they are not also delivered the usual way. When you handle a signal like this, none of the usual signal-handler restrictions apply: it's just ordinary user-space code, so you can call whatever you want.
As far as I know, OS X has a similar feature in kqueue (search this site or the internet for EVFILT_SIGNAL and kqueue).
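A Linux-only sketch of the signalfd approach, assuming a single-threaded program (in a multithreaded one you would block the signals with pthread_sigmask before starting any threads). make_signal_fd and wait_for_signal are hypothetical helper names.

```cpp
#include <csignal>
#include <poll.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

// Creates a signalfd that reports SIGINT and SIGTERM. The signals are blocked
// first; otherwise they would still be delivered the traditional way.
int make_signal_fd() {
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGINT);
    sigaddset(&mask, SIGTERM);
    if (sigprocmask(SIG_BLOCK, &mask, nullptr) == -1) return -1;
    return signalfd(-1, &mask, SFD_CLOEXEC);
}

// Waits up to timeout_ms for a signal on fd. Returns the signal number,
// 0 on timeout, or -1 on error. This is ordinary code, not a handler, so
// printf, iostreams, malloc and so on may be used by the caller afterwards.
int wait_for_signal(int fd, int timeout_ms) {
    pollfd pfd{fd, POLLIN, 0};
    int ready = poll(&pfd, 1, timeout_ms);
    if (ready <= 0) return ready;  // 0 = timeout, -1 = error
    signalfd_siginfo info;
    if (read(fd, &info, sizeof info) != static_cast<ssize_t>(sizeof info)) return -1;
    return static_cast<int>(info.ssi_signo);
}
```

In a real program the descriptor would typically be added to the same select/poll/epoll set as your other file descriptors, so signals are handled in the main event loop.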
I searched a lot, but nothing answered my question. I read that it's not safe to use cout in signal handlers like this:
void ctrlZHandler(int sig_num) {
//SIGTSTP-18
std::cout << "smash: got ctrl-Z" << std::endl;
SmallShell::route_signal(sig_num);
}
will it solve the problem if I move the printing inside route_signal?
Is there a list of safe-to-call functions in C++11?
What if the only solution is to use write? Can you show me a short example? And let's say route_signal has 100 print statements: should I replace them all with write()? That sounds exhausting, with the need to allocate and free memory...
The reason using std::cout inside signal handlers isn't recommended is that a signal can interrupt your running code at any point, and operator<< on std::cout is not reentrant.
This means that if you are in the middle of executing operator<< on std::cout when a signal arrives, and the handler also uses it, the result is undefined.
So, no. Moving it into route_signal would not solve this; you should replace every use of std::cout in it!
One workaround is to set a flag recording that the signal was received, and to produce the output outside the signal handler after it has returned.
Signal handlers need to run quickly and be reentrant, which is why they shouldn’t call output stream functions like cout <<, either directly or indirectly.
If you are doing this temporarily under controlled conditions for testing, it might be okay, but make sure the signal you are handling is not triggered again until the handler has finished and be aware that stream functions can be slow, which might mess up your tests as well.
will it solve the problem if I move the printing inside route_signal?
No.
Is there a list of safe-to-call functions in C++11?
For practical purposes, the only safe thing you can do is set a volatile sig_atomic_t or lock-free atomic flag inside a signal handler. (N3690 intro.execution §1.9 ¶6)
I'm no C or C++ language lawyer, but I believe anything permitted in a conforming C application is allowed in a C++11 signal handler. However, that set is very, very limited: abort, quick_exit, _Exit, and signal (the last only with the number of the signal being handled as its first argument). (ISO/IEC 9899:2011 §7.14.1.1 ¶5).
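A small sketch of the lock-free atomic alternative mentioned above, assuming C++17 (for is_always_lock_free). Unlike incrementing a volatile sig_atomic_t, fetch_add is a single atomic read-modify-write, so the handler can safely count signals. install_counter and signal_count are hypothetical names.

```cpp
#include <atomic>
#include <csignal>

// Refuse to compile on a platform where std::atomic<int> would need a lock,
// since a signal handler must never try to take a lock.
static_assert(std::atomic<int>::is_always_lock_free,
              "std::atomic<int> must be lock-free to be used in a signal handler");

static std::atomic<int> g_signal_count{0};

// Counts deliveries; a plain lock-free atomic operation is allowed here.
extern "C" void count_signal(int) {
    g_signal_count.fetch_add(1, std::memory_order_relaxed);
}

void install_counter(int sig) {
    std::signal(sig, count_signal);
}

// Read from ordinary code, e.g. once per iteration of the main loop.
int signal_count() {
    return g_signal_count.load();
}
```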
What if the only solution is to use write? Can you show me a short example? And let's say route_signal has 100 print statements: should I replace them all with write()? That sounds exhausting, with the need to allocate and free memory...
A better solution is to redesign your program to use sigwait, or to check a flag that is safely set inside the signal handler.
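A sketch of the sigwait design, assuming POSIX threads: the signals are blocked in main before any thread starts, so every thread inherits the mask, and one dedicated thread receives them synchronously with sigwait. After sigwait returns, that thread is running ordinary code, so printf (or route_signal with all its printing) is fine there. The helper names, including request_shutdown, are hypothetical.

```cpp
#include <atomic>
#include <csignal>
#include <cstdio>
#include <pthread.h>
#include <signal.h>
#include <thread>
#include <unistd.h>

// Blocks SIGINT and SIGTERM in the calling thread. Call this in main()
// before starting any other thread so that all threads inherit the mask.
sigset_t block_shutdown_signals() {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigaddset(&set, SIGTERM);
    pthread_sigmask(SIG_BLOCK, &set, nullptr);
    return set;
}

// Dedicated signal thread: sigwait() returns the pending signal as a plain
// function result, so everything after it runs outside any signal handler.
std::thread start_signal_thread(sigset_t set, std::atomic<int>& last_signal) {
    return std::thread([set, &last_signal] {
        int sig = 0;
        if (sigwait(&set, &sig) == 0) {
            std::printf("received signal %d, shutting down\n", sig);
            last_signal.store(sig);
        }
    });
}

// Simulates an external "kill <pid>": a process-directed SIGTERM.
void request_shutdown() {
    kill(getpid(), SIGTERM);
}
```

raise() is deliberately not used to simulate the signal: in a multithreaded program it targets the calling thread, which has the signal blocked, so the dedicated thread would never see it.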
If you insist on using write, and if you trust that it is safe to call inside a signal handler in your C++ implementation (which it probably is, since POSIX lists write as async-signal-safe, but which, again, is not guaranteed by C++ itself), then you simply have a coding problem. You'll need to do the formatting yourself, bearing in mind that even on POSIX-conforming systems malloc and free are not async-signal-safe. It can certainly be done.
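A minimal sketch of doing the formatting yourself with write(), assuming a POSIX system: the message is built in a fixed stack buffer, with no malloc, printf or iostreams. format_decimal and safe_report are hypothetical helper names.

```cpp
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Formats value in decimal into out (no NUL terminator) and returns the
// number of characters written; out must have room for 20 characters.
std::size_t format_decimal(long value, char* out) {
    char digits[20];
    std::size_t n = 0;
    // Work with the magnitude as unsigned so the most negative long does not overflow.
    unsigned long mag = value < 0 ? 0UL - static_cast<unsigned long>(value)
                                  : static_cast<unsigned long>(value);
    do {
        digits[n++] = static_cast<char>('0' + mag % 10);
        mag /= 10;
    } while (mag != 0);
    std::size_t len = 0;
    if (value < 0) out[len++] = '-';
    while (n > 0) out[len++] = digits[--n];  // digits were produced in reverse
    return len;
}

// Writes "<label><value>\n" to fd using only write(). Labels longer than
// 100 characters are truncated so the stack buffer cannot overflow.
void safe_report(int fd, const char* label, long value) {
    int saved_errno = errno;  // write() may change errno; restore it for the interrupted code
    char buf[128];
    std::size_t len = 0;
    while (*label != '\0' && len < 100) buf[len++] = *label++;
    len += format_decimal(value, buf + len);
    buf[len++] = '\n';
    std::size_t off = 0;
    while (off < len) {  // handle partial writes
        ssize_t w = write(fd, buf + off, len - off);
        if (w < 0 && errno == EINTR) continue;
        if (w <= 0) break;
        off += static_cast<std::size_t>(w);
    }
    errno = saved_errno;
}
```

From a handler you could then call something like safe_report(STDERR_FILENO, "num_read = ", value). Whatever value the handler reads should itself be a volatile sig_atomic_t or a lock-free atomic, for the reasons discussed above.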
I am still a little confused as to why exactly it is unsafe to receive a signal and call a non async safe function from within that signal handler. Could someone explain the reasoning behind this and possibly try and give me some references that I can follow to read up more on this myself?
In other words, I am asking why it is unsafe to, say, call printf from within a signal handler. Is it because of intra-process issues and possible race conditions resulting from two unprotected calls to printf, or is it because of inter-process races for the same resource (in this example, stdout)? Say a thread within process A is calling printf and another thread receives the signal and then calls printf. Is it possibly because the kernel will not know what to do, since it cannot distinguish between the two calls?
Say a thread within process A is calling printf and another thread
receives the signal and then calls printf. Is it possibly because the
kernel here will not know what to do because it will not be able to
distinguish between the two calls.
It's not the kernel that will have issues. It's your application itself. printf is not a kernel function. It's a function in the C library that your application uses. printf is actually a fairly complicated function. It supports a wide variety of output formatting.
The end result of this formatting is a string that's written to standard output, and that process involves some work of its own. The formatted string is written into the output buffer of the internal stdout file handle. The buffer gets flushed (and only at this point does the kernel take over and write a chunk of data to a file) when certain conditions occur: when the buffer is full, or, if stdout is line-buffered (as it typically is when connected to a terminal), whenever a newline character is written.
All of that is supported by the output buffer's internal data structures, which you don't have to worry about because they're the C library's job. Now, a signal can arrive at any point while printf does its work. And I mean any time. It might very well arrive while printf is in the middle of updating the output buffer's internal data structures, which are then in a temporarily inconsistent state because printf hasn't finished updating them.
For example, on modern C/C++ implementations printf may not be async-signal-safe, but it is thread-safe. Multiple threads can use printf to write to standard output. It's the threads' responsibility to coordinate among themselves so that the combined output actually makes sense and isn't randomly interleaved from multiple threads, but that's beside the point.
The point is that printf is thread safe, and that typically means that somewhere there's a mutex involved in the process. So, the sequence of events that might occur is:
printf acquires the internal mutex.
printf proceeds with its work with formatting the string and writing it to stdout's output buffer.
Before printf is done and can release the acquired mutex, a signal arrives.
Now, the internal mutex is locked. The thing about signal handlers is that it's generally not specified which thread in a process gets to handle the signal. A given implementation might pick a thread at random, or it might always pick the thread that's currently running. In any case, it can certainly pick the thread that holds printf's mutex here to handle the signal.
So now, your signal handler runs, and it also decides to call printf. Because printf's internal mutex is locked, the thread has to wait for the mutex to get unlocked.
And wait.
And wait.
Because, if you were keeping track of things: the mutex is locked by the thread that was interrupted to service the signal. The mutex won't get unlocked until that thread resumes running. But that won't happen until the signal handler returns, and the signal handler is now waiting for the mutex to get unlocked.
You're boned.
Now, of course, printf might use the equivalent of a recursive mutex (std::recursive_mutex, in C++ terms) to avoid this particular deadlock, but then the handler's printf would go ahead and operate on buffer data structures that are in an inconsistent state, and even that won't prevent every deadlock a signal could introduce.
To summarize: it's "unsafe to receive a signal and call a non async safe function from within that signal handler" because that is what not being async-signal-safe means. The signal is an asynchronous event that can interrupt any code, including that very function partway through, and only async-signal-safe functions are guaranteed to cope with that. Water is wet because it's water, and an async-unsafe function cannot be called from an asynchronous signal handler.
I am looking for some info about reentrancy, and I came across signals and threads. What is the difference between the two?
Please advise.
Many thanks.
You are comparing apples and oranges. Signal programming is event-driven programming and can be used to influence threads. However, signals can be used just as well in a single-threaded application.
To understand signals, it is best to start by thinking about a single-threaded program. This program is doing whatever it does with its one thread, and then a signal is delivered to it. If the program has registered a signal handler (a function to call) for that signal, the program's normal execution is put on hold briefly while that handler runs (very much like a hardware interrupt makes the operating system stop and run an interrupt service routine). So with the code:
#include <stdio.h>
#include <signal.h>
#include <unistd.h> // for alarm
// volatile sig_atomic_t is the type C guarantees a handler may assign to
volatile sig_atomic_t x = 0;
void foo(int sig_num) {
    x = sig_num;
}
int main(void) {
    unsigned long long count = 0;
    signal(SIGALRM, foo);
    alarm(1); // This is a POSIX function and may not be in all hosted
              // C implementations.
              // SIGALRM will be sent to this process in 1 second.
    while (!x) {
        printf("not x\n");
        count++;
    }
    printf("x is %i and count = %llu\n", (int)x, count);
    return 0;
}
The program will loop until it receives a signal; here alarm arranges for SIGALRM to be sent after one second (how signals are sent in general may differ by platform). When SIGALRM arrives, foo sets x and the loop exits. Exactly where in the loop foo is called is not predictable. It could happen between the print and the increment of count, just after the while condition is tested, during the print, and in lots of other places. This is why signals can pose concurrency or reentrancy problems: they can change things without the other code knowing that it happened.
The reason x was declared volatile is that without it, a compiler could conclude that nothing in the loop changes x, read it only once, and effectively optimize the loop test away. Specifying volatile tells the compiler that this variable can be changed by unseen forces, such as signal handlers or, in the case of memory-mapped device control registers, hardware. (For data shared between threads, volatile is not enough; C11 and C++11 provide atomics and mutexes for that.)
It was pretty easy to make sure that x was handled properly by both the signal handler and the main execution code because x is just an integer (loading and storing it probably compiles to a single assembly instruction each, which is the property sig_atomic_t exists to guarantee), it was only altered by one thing (the signal handler, not the main code), and it was only used as a simple boolean. If x were some other type, such as a string, the signal handler could overwrite part of it while the main code was in the middle of reading it, since signals can interrupt you at any time. This could have results as bad as someone freezing time while you were in the middle of brushing your teeth, replacing your toothbrush with a cobra, and then unfreezing time.
A little bit more on signals: they are part of the C language, but most of their use is not really covered by C. Many of the Linux, Unix, and POSIX functions that have to do with signals are not part of the C language, but it is difficult to come up with reasonable (and small) examples of signal use that don't rely on something outside the C standard, which is why I used the alarm function. The raise function, which is part of standard C, can be used to send a signal to your own process, but it is harder to build examples around.
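For what it's worth, here is a small sketch that uses only standard signal facilities (std::signal and std::raise from <csignal>), so it should work on any hosted implementation. The standard specifies that raise does not return until the handler has returned, so no polling loop is needed. raise_and_record is a hypothetical name.

```cpp
#include <csignal>

// The handler is only allowed to assign to a volatile sig_atomic_t like this one.
static volatile std::sig_atomic_t g_last_signal = 0;

extern "C" void record_signal(int sig) {
    g_last_signal = sig;
}

// Installs the handler, sends sig to ourselves, and returns what the handler
// recorded (or -1 on failure). By the time std::raise returns, the handler
// has already run.
int raise_and_record(int sig) {
    g_last_signal = 0;
    if (std::signal(sig, record_signal) == SIG_ERR) return -1;
    if (std::raise(sig) != 0) return -1;
    return g_last_signal;
}
```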
As scary as signals may seem now, most systems provide additional functions (POSIX sigaction, for example) that make them much easier to use.
threads, finally
Threads execute concurrently, while signals interrupt. Some threading libraries actually implement threads in a way where this is not really the case, but it is best to think of threads this way. Since computer programs are very limited in their ability to see what is going on, threads can get in each other's way just as signal handlers can get in the way of the main execution code (probably more often than signal handlers do, in fact).
Imagine that you are about to brush your teeth again, but this time you are deaf and blind. Now your roommate, who is also deaf and blind, comes in to fix the sink with some silicone sealer. Just as you reach for the toothpaste, he lays the tube of silicone right on top of the tube of toothpaste, and you grab the silicone instead. Remember, since you are both blind and deaf (and somehow not bumping into each other), you both assume that no one else is using the sink, so you never realize that you have just put silicone on your toothbrush, and your roommate doesn't realize that he is trying to fill the cracks between the tile and the back of the sink with toothpaste.
Luckily there are ways that threads can communicate to each other that something is currently in use so other threads should stay away (like locking the door while you brush your teeth).
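In C++ the "lock on the door" is usually a std::mutex. A minimal sketch, where the Sink class and run_roommates are hypothetical names: several threads update a shared counter, and the mutex makes each update exclusive. Keep in mind, from the printf discussion above, that a mutex like this must never be taken inside a signal handler.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// A shared resource that only one thread may use at a time.
class Sink {
public:
    void use(int amount) {
        std::lock_guard<std::mutex> lock(door_);  // lock the door while using the sink
        uses_ += amount;
    }
    long uses() {
        std::lock_guard<std::mutex> lock(door_);
        return uses_;
    }
private:
    std::mutex door_;
    long uses_ = 0;
};

// Starts n_threads threads that each use the sink `times` times, then
// returns the total. Without the mutex, updates could be lost.
long run_roommates(int n_threads, int times) {
    Sink sink;
    std::vector<std::thread> threads;
    for (int i = 0; i < n_threads; ++i)
        threads.emplace_back([&sink, times] {
            for (int j = 0; j < times; ++j) sink.use(1);
        });
    for (std::thread& t : threads) t.join();
    return sink.uses();
}
```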
A thread lives inside a process, whereas signals belong to the wider system: the kernel or another process can send a signal to a process as a whole, or to a specific thread inside a process.
This is a pretty basic scenario, but I'm not finding many helpful resources. I have a C++ program running on Linux that does file processing: it reads lines, applies various transformations, and writes data into a database. There are certain variables (stored in the database) that affect the processing, which I currently read on every iteration because I want processing to be as up to date as possible, though a slight lag is OK. But those variables change pretty rarely, and the reads become expensive over time (10+ million rows a day). I could space out the reads to every n iterations, or simply restart the program when a variable changes, but those seem hackish.
What I would like to do instead is have the program trigger a reread of the variables when it receives a SIGHUP. Everything I'm reading about signal handling is talking about the C signal library which I'm not sure how to tie in to my program's classes. The Boost signal libraries seem to be more about inter-object communication rather than handling OS signals.
Can anybody help? It seems like this should be incredibly simple, but I'm pretty rusty with C++.
I would handle it just like you might handle it in C. I think it's perfectly fine to have a stand-alone signal handler function, since you'll just be posting to a semaphore or setting a variable or some such, which another thread or object can inspect to determine if it needs to re-read the settings.
#include <signal.h>
#include <stdio.h>
/* or you might use a semaphore to notify a waiting thread */
static volatile sig_atomic_t sig_caught = 0;
void handle_sighup(int signum)
{
    /* in case we registered this handler for multiple signals */
    if (signum == SIGHUP) {
        sig_caught = 1;
    }
}
int main(int argc, char* argv[])
{
    /* you may also prefer sigaction() instead of signal() */
    signal(SIGHUP, handle_sighup);
    while(1) {
        if (sig_caught) {
            sig_caught = 0;
            printf("caught a SIGHUP. I should re-read settings.\n");
        }
    }
    return 0;
}
You can test sending a SIGHUP by using kill -1 `pidof yourapp`.
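If you do switch to sigaction(), a sketch of the same handler might look like this, assuming a POSIX system; install_sighup_handler and consume_sighup are hypothetical names. SA_RESTART asks the kernel to restart slow system calls (such as a blocking read) that the signal interrupts, instead of failing them with EINTR.

```cpp
#include <csignal>
#include <cstring>
#include <signal.h>

static volatile sig_atomic_t sig_caught = 0;

extern "C" void handle_sighup(int) {
    sig_caught = 1;
}

// Registers handle_sighup for SIGHUP with sigaction() instead of signal().
bool install_sighup_handler() {
    struct sigaction sa;
    std::memset(&sa, 0, sizeof sa);
    sa.sa_handler = handle_sighup;
    sigemptyset(&sa.sa_mask);   // block no extra signals while the handler runs
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGHUP, &sa, nullptr) == 0;
}

// Called from the processing loop: returns true once per received SIGHUP,
// at which point the caller re-reads its settings.
bool consume_sighup() {
    if (!sig_caught) return false;
    sig_caught = 0;
    return true;
}
```

A SIGHUP arriving exactly between the test and the reset is merged with the previous one, which is harmless when all it means is "re-read the settings".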
I'd recommend checking out this link which gives the details on registering a signal.
Unless I'm mistaken, one important thing to remember is that every non-static member function takes an implicit this parameter, which means non-static member functions can't be signal handlers. I believe you'll need to register either a static member function or a free (global) function. From there, if you have a specific object whose member function should perform your update, you'll need a way to reference that object.
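One way to tie this into a class, sketched under the assumption that there is a single settings object per process: a static member function serves as the handler and only sets a class-wide flag, and a normal member function, called from the processing loop, does the actual re-read. SettingsReloader and its members are hypothetical names, and reload() is a placeholder for the database read.

```cpp
#include <csignal>

class SettingsReloader {
public:
    // Static member functions have no implicit `this`, so this one can be
    // passed to std::signal. All it does is set a flag.
    static void on_sighup(int) { reload_requested_ = 1; }

    void install() { std::signal(SIGHUP, &SettingsReloader::on_sighup); }

    // Call from the processing loop (ordinary code, so any work is allowed).
    // Returns true if the settings were re-read.
    bool poll() {
        if (!reload_requested_) return false;
        reload_requested_ = 0;
        reload();
        return true;
    }

    int reload_count() const { return reload_count_; }

private:
    void reload() { ++reload_count_; }  // placeholder for re-reading the database

    // Class-wide, because the handler cannot know which object to notify.
    static volatile std::sig_atomic_t reload_requested_;
    int reload_count_ = 0;
};

volatile std::sig_atomic_t SettingsReloader::reload_requested_ = 0;
```

Checking the flag costs only one read of a variable per iteration, so doing it for every row is much cheaper than re-reading the settings from the database.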
There are several possibilities; it would not necessarily be overkill to implement all of them:
Respond to a specific signal, just like C does. C++ works the same way. See the documentation for signal().
Trigger when the modification timestamp of some file changes, such as the database file if it is stored in a flat file.
Trigger once per hour, or once per day (whatever makes sense).
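The timestamp trigger could be sketched with C++17 std::filesystem, assuming some file or directory whose modification time changes when the settings do (which path to watch is up to you). FileChangeTrigger is a hypothetical name.

```cpp
#include <chrono>  // file_time_type is a std::chrono time point
#include <filesystem>
#include <system_error>
#include <utility>

namespace fs = std::filesystem;

// Remembers a path's last modification time and reports when it changes.
class FileChangeTrigger {
public:
    explicit FileChangeTrigger(fs::path path) : path_(std::move(path)) {
        changed();  // record the initial timestamp
    }

    // Returns true if the modification time differs from the previous check.
    bool changed() {
        std::error_code ec;
        fs::file_time_type t = fs::last_write_time(path_, ec);
        if (ec) return false;  // missing or unreadable: treat as "no change"
        bool differs = have_time_ && t != last_;
        last_ = t;
        have_time_ = true;
        return differs;
    }

private:
    fs::path path_;
    fs::file_time_type last_{};
    bool have_time_ = false;
};
```

Calling changed() every n iterations, or when a timer expires, combines this option with the periodic one; each check is a single stat call rather than a database query.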
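You can define a Boost.Signals2 signal corresponding to the OS signal and connect your slot to it to invoke the respective handler. Emit the Boost signal from ordinary code (for example, once the main loop sees a flag set by a plain OS-level handler), not from inside the OS signal handler itself, because Boost.Signals2 uses locks and is not async-signal-safe. If you already use Boost.Asio, its signal_set class delivers OS signals to an ordinary completion handler.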