I am looking for some info about reentrancy, and I came across signals and threads. What is the difference between the two?
Please advise.
Many thanks.
You are comparing apples and oranges. Signal programming is event-driven programming and can be used to influence threads. However, the signal programming paradigm can also be used in a single-threaded application.
To understand signals it is best to start by thinking about a single-threaded program. This program is doing whatever it does with its one thread when a signal is delivered to it. If the program has registered a signal handler (a function to call) for that signal, then normal execution is put on hold for a little bit while the handler runs (very much like a hardware interrupt interrupts the operating system to run interrupt service routines). So with the code:
#include <stdio.h>
#include <signal.h>
#include <unistd.h> // for alarm
volatile int x = 0;
void foo(int sig_num) {
    x = sig_num;
}
int main(void) {
    unsigned long long count = 0;
    signal(SIGALRM, foo);
    alarm(1); // This is a POSIX function and may not be in all hosted
              // C implementations.
              // SIGALRM will be sent to this process in 1 second.
    while (!x) {
        printf("not x\n");
        count++;
    }
    printf("x is %i and count = %llu\n", x, count);
}
The program will loop until someone sends it a signal (how this happens may differ by platform). If the signal SIGALRM is sent then foo will set x and the loop will exit. Exactly where in the loop foo is called is not clear. It could happen just between the print and incrementing the count, just after the while conditional is tested, during the print, ... lots of places, really. This is why signals may pose a concurrency or reentrancy problem -- they can change things without the other code knowing that it happened.
The reason that x was declared as volatile was that without that many compilers might think "hey, no one in main changes x and main doesn't call any other functions, so x never changes" and optimize out the loop test. Specifying volatile tells the C compiler that this variable can be changed by unseen forces (such as signal handlers, other threads, or sometimes even hardware in the case of memory mapped device control registers).
It was pretty easy to make sure that x was looked after properly between the signal handler and the main execution code because x is just an integer (load and store for it were probably a single assembly instruction each), it was only altered by one thing (the signal handler, rather than the main code) in this case, and it was only used as a simple boolean value. If x were some other type, such as a string, then since signals can interrupt you at any time, the signal handler may overwrite part of the string while the main code is in the middle of reading it. This could have results as bad as someone freezing time while you were in the middle of brushing your teeth, replacing your toothbrush with a cobra, and then unfreezing time.
A little bit more on signals -- they are part of the C language, but most of their use is not really covered by C. Many of the Linux, Unix, and POSIX functions that have to do with signals are not part of the C language, but it is difficult to come up with reasonable (and small) examples of signal use that don't rely on something outside the C standard, which is why I used the alarm function. The raise function, which is part of C, can be used to send a signal to yourself, but it is more difficult to make examples for.
As scary as signals may seem now, most systems have more functions that make them much easier to use.
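For what it's worth, here is a minimal sketch of raise that stays within the standard library. The names record_signal and raise_and_report are made up for illustration, and SIGTERM is an arbitrary choice of signal.

```cpp
#include <csignal>

// Written by the handler, read by normal code afterwards.
static volatile std::sig_atomic_t last_signal = 0;

// Hypothetical handler: just remembers which signal arrived.
extern "C" void record_signal(int sig_num) {
    last_signal = sig_num;
}

// Installs the handler, sends SIGTERM to this very process, and
// returns the signal number that the handler recorded.
int raise_and_report() {
    last_signal = 0;
    std::signal(SIGTERM, record_signal);
    std::raise(SIGTERM);            // the handler runs before raise() returns
    std::signal(SIGTERM, SIG_DFL);  // put the default action back
    return last_signal;
}
```

Calling raise_and_report() should hand back SIGTERM, because raise does not return until the handler has finished.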
threads, finally
Threads execute concurrently, while signals interrupt. While there are some threading libraries that actually implement threading in such a way that this is not really the case, it is best to think of threads this way. Since computer programs are actually very limited in their ability to see what is going on, threads can get in each other's way just like signal handlers can get in the way of the main execution code (probably more often than signal handlers, though).
Imagine that you are about to brush your teeth again, but this time you are deaf and blind. Now your roommate, who is also deaf and blind, comes in to fix the sink with some silicone sealer. Just as you reach for the toothpaste he lays down the tube of silicone right on top of the tube of toothpaste and you grab the tube of silicone instead of the toothpaste. Remember, since you are both blind and deaf (and somehow not bumping into each other) you both assume that no one else is using the sink, so you never realize that you have just put the silicone on your toothbrush, and your roommate doesn't realize that he is trying to fill the cracks between the tile and the back of the sink with toothpaste.
Luckily there are ways that threads can communicate to each other that something is currently in use so other threads should stay away (like locking the door while you brush your teeth).
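As a rough sketch of "locking the door", assuming C++11's std::thread and std::mutex are available (the function name count_with_lock is made up for illustration): several threads bump one shared counter, and the mutex makes sure only one of them touches it at a time.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Starts num_threads threads that each add 1 to a shared counter
// increments_per_thread times, and returns the final total.
long count_with_lock(int num_threads, int increments_per_thread) {
    long counter = 0;
    std::mutex counter_mutex;  // the "bathroom door lock"
    std::vector<std::thread> workers;

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // Only one thread at a time gets past this line.
                std::lock_guard<std::mutex> lock(counter_mutex);
                ++counter;
            }
        });
    }
    for (auto& w : workers) {
        w.join();  // wait for every thread to finish
    }
    return counter;
}
```

Without the lock_guard line the increments from different threads could overlap and some would be lost; with it, the total should come out to num_threads * increments_per_thread.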
Threads live inside a process, whereas signals belong to the wider system: a signal can be delivered to a process or to a specific thread inside a process.
Related
Say I have a function whose prototype looks like this, belonging to class container_class:
std::vector<int> container_class::func(int param);
The function may or may not go into an infinite loop on certain inputs; it is impossible to tell in advance which inputs will succeed and which will loop forever. The function is in a library whose source I do not have and which I cannot modify (this is a bug and will be fixed in the next release in a few months, but for now I need a way to work around it), so solutions which modify the function or class will not work.
I've tried isolating the function using std::async and std::future, and using a while loop to constantly check the state of the thread:
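container_class c; // note: "container_class c();" would declare a function
long start = get_current_time(); // get the current time in ms
// std::launch::async forces a real thread; with the default policy the task
// may be deferred, and wait_for() would then never report ready
auto future = std::async(std::launch::async, &container_class::func, &c, 2);
while(future.wait_for(0ms) != std::future_status::ready) {
    if(get_current_time() - start > 1000) {
        //forcibly terminate future
    }
    sleep(2);
}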
This code has many problems. One is that I can't forcibly terminate the std::future object (and the thread that it represents).
At the far extreme, if I can't find any other solution, I can isolate the function in its own executable, run it, and then check its state and terminate it appropriately. However, I would rather not do this.
How can I accomplish this? Is there a better way than what I'm doing right now?
You are out of luck, sorry.
First off, C++ doesn't even guarantee you there will be a thread for future execution. Although it would be extremely hard (probably impossible) to implement all std::async guarantees in a single thread, there is no direct prohibition of that, and also, there is certainly no guarantee that there will be a thread per async call. Because of that, there is no way to cancel the async execution.
Second, there is no such way even in the lowest level of thread implementation. While pthread_cancel exists, it won't protect you from infinite loops not visiting cancellation points, for example.
You cannot arbitrarily kill a thread in POSIX, and the C++ thread model is based on it. A process really can't be a scheduler of its own threads, and while sometimes it is a pain, it is what it is.
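To illustrate the point about cancellation points, here is a small POSIX sketch (sleepy_worker and cancel_sleepy_worker are made-up names). The worker can be cancelled only because it keeps calling sleep(), which is a cancellation point; a loop that never reaches one would keep running and pthread_join would block forever.

```cpp
#include <pthread.h>
#include <unistd.h>

// Hypothetical worker: loops forever, but visits a cancellation point
// (sleep) on every iteration, so a cancel request can take effect.
extern "C" void* sleepy_worker(void*) {
    for (;;) {
        sleep(1);
    }
}

// Starts the worker, asks for cancellation, and reports whether the
// thread really ended because it was cancelled.
bool cancel_sleepy_worker() {
    pthread_t tid;
    if (pthread_create(&tid, nullptr, sleepy_worker, nullptr) != 0) {
        return false;
    }
    pthread_cancel(tid);  // only a request; acted on at a cancellation point

    void* result = nullptr;
    if (pthread_join(tid, &result) != 0) {
        return false;
    }
    return result == PTHREAD_CANCELED;
}
```

This works for code you control. It does not help with a third-party function that spins without ever calling anything that counts as a cancellation point.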
I'm presently moving back to C++ from Java. There are some areas of C++ where higher performance can be achieved by doing more computation on the stack. And some recursive algorithms operate more efficiently on the stack than on the heap.
Obviously the stack is a resource, and if I am going to use it, I should ensure that I do not consume too much (to the point of crashing my program).
I'm running Xcode, and wrote the following simple program:
#include <csignal>
static volatile sig_atomic_t interrupted = 0;
long stack_test(long limit){
    if((limit>0)&&(interrupted==0))
        return stack_test(limit-1)+1; // program crashes here with EXC_BAD_ACCESS...
    else
        return 0;
}
void signal_handler(int sig){
    interrupted = 1;
}
int main(int argc, char* argv[]){
    signal(SIGSEGV,&signal_handler);
    stack_test(1000000);
    signal(SIGSEGV,SIG_DFL);
}
The documentation states that on BSD, stack limits can be checked by using getrlimit(), and that when the stack limit is reached, a SIGSEGV signal is raised. I tried installing the above handler for this signal, but instead my program stops at the next iteration with EXC_BAD_ACCESS (code=2, ...).
Am I taking the wrong approach here, or is there a better way?
This has the same problem in Java as it does in C++. You are way over-committing to the stack.
"And some recursive algorithms operate more efficiently on the stack than on the heap."
Indeed, and they are commonly of the divide and conquer type.
The usefulness of recursion is to reduce the computation to a more manageable computation with each call. limit - 1 is not such a candidate.
If your question is only about the signal, I unfortunately can't offer you any advice on your system.
Your signal handler can't do much to fix the stack overflow. Setting your interrupted flag doesn't help. When your signal handler returns, the instruction that tried to write to an address beyond the end of the stack resumes and it's still going to attempt to write beyond the end of the stack. Your code won't get back to the part which checks your interrupted flag.
With great care and a lot of architecture-specific code, your signal handler could potentially change the context of the thread which encountered the signal such that, when it resumes, it will be at a different point in the code.
You could also use setjmp() and longjmp() to accomplish this at a coarser granularity.
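A rough sketch of the jump-out-of-the-handler idea, assuming a POSIX system. The names run_guarded, fake_fault and harmless are made up, and fake_fault only raises SIGSEGV by hand rather than really overflowing the stack. The handler is given its own stack with sigaltstack() and SA_ONSTACK, since after a real overflow the normal stack has no room left to run a handler on.

```cpp
#include <setjmp.h>
#include <signal.h>

static sigjmp_buf g_recover_point;
static char g_alt_stack[64 * 1024];  // separate stack for the handler

// Handler: abandon the faulting code and jump back to run_guarded().
extern "C" void on_segv(int) {
    siglongjmp(g_recover_point, 1);
}

// Hypothetical stand-ins for real work.
void fake_fault() { raise(SIGSEGV); }
void harmless() {}

// Runs fn() and returns true if it finished, false if SIGSEGV arrived.
// Caution: the jump skips destructors of anything fn() had on its stack.
bool run_guarded(void (*fn)()) {
    stack_t ss{};
    ss.ss_sp = g_alt_stack;
    ss.ss_size = sizeof g_alt_stack;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, nullptr) != 0) return false;

    struct sigaction sa{};
    struct sigaction old{};
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_ONSTACK;  // run the handler on g_alt_stack
    if (sigaction(SIGSEGV, &sa, &old) != 0) return false;

    if (sigsetjmp(g_recover_point, 1) == 0) {
        fn();                               // normal path
        sigaction(SIGSEGV, &old, nullptr);
        return true;
    }
    // We get here via siglongjmp() from the handler.
    sigaction(SIGSEGV, &old, nullptr);
    return false;
}
```

Here run_guarded(harmless) should return true and run_guarded(fake_fault) should return false. Whether the program is in a sane state after jumping out of a real fault is a separate question, so treat this as a last resort.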
A different approach would be to set up a thread to use a stack that your code allocated, using pthread_attr_setstack() (which replaces the obsolete pthread_attr_setstackaddr()) or pthread_attr_setstacksize() prior to pthread_create(). You would run your code in that secondary thread and not the main one. You could set the last page or two of the stack you allocated to be non-writable using mprotect(). Then, your signal handler could set the interrupted flag and also set those pages to be writable. That should give you enough headroom that the resumed code can execute without re-raising the signal, get far enough to check the flag, and return gracefully. Note that this is a one-time last resort, unless you can find a good point to set those guard pages non-writable again.
I have a third-party function which I use in my program. I can't replace it; it's in a dynamic library, so I also can't edit it. The problem is that it sometimes runs for too long.
So, can I do anything to stop this function from running if it runs more than 10 seconds for example? (It's OK to close program in this scenario.)
PS. I have Linux, and this program won't have to be ported anywhere else.
What I want is something like this:
#include <stdio.h>
#include <stdlib.h>
void func1 (void) // I can not change contents of this.
{
    int i; // random
    while (i % 2 == 0);
}
int main ()
{
    setTryTime(10000);
    timeTry{
        func1();
    } catchTime {
        puts("function executed too long, aborting..");
    }
    return 0;
}
Sure. And you'd do it just the way you suggested in your title: "signals".
Specifically, an "alarm" signal:
http://linux.die.net/man/2/alarm
http://beej.us/guide/bgipc/output/html/multipage/signals.html
If you really have to do this, you probably want to spawn a process that does nothing but invoke the function and return its result to the caller. If it runs too long, you can kill that process.
By putting it into its own process, you stand a decent (not great, but decent) chance of cleaning up at least most of what it was doing, so when it dies unexpectedly it probably won't make a complete mess of things that will lead to later problems.
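A minimal sketch of the separate-process idea on Linux, under the assumption that the function needs no arguments and returns nothing (run_with_timeout, quick_task and stuck_task are made-up names). The child arms alarm() before calling the function, so the kernel terminates it with SIGALRM if it overruns; the parent only waits and looks at how the child ended.

```cpp
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-ins for the third-party function.
void quick_task() {}
void stuck_task() {
    volatile unsigned long spins = 0;
    for (;;) { spins = spins + 1; }  // never returns
}

// Runs fn() in a child process. Returns true if fn() returned within
// `seconds`, false if the child was killed (or something else failed).
bool run_with_timeout(void (*fn)(), unsigned seconds) {
    pid_t pid = fork();
    if (pid < 0) return false;

    if (pid == 0) {                 // child
        signal(SIGALRM, SIG_DFL);   // default action: terminate the process
        alarm(seconds);
        fn();
        _exit(0);                   // finished in time
    }

    int status = 0;                 // parent
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return false;
    }
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Getting a result back from the child would need a pipe or shared memory, which is left out here.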
The potential problem with forcefully cancelling a running function is that it may "own" resources that it intended to return later. The kind of resources that can be problems include:
heap memory allocations (free store)
shared memory segments
threads
sockets
file handles
locks
Some of these resources are managed on a per-process basis, so letting the function run in a different process (perhaps using fork) makes it easier to kill cleanly. Other resources can outlive a process, and really must be cleaned up explicitly. Depending on your operating system, it's also possible that the function may be part-way through interacting with some hardware driver or device, and killing it unexpectedly may leave that driver or device in a bizarre state such that it won't work until after a restart.
If you happen to know that the function doesn't use any of these kinds of resources, then you can kill it confidently. But it's hard to guarantee that: in a large system with many such decisions - which the compiler can't check - evolution of code in functions like func1() is likely to introduce dependencies on such resources.
If you must do this, I'd suggest running it in a different process or thread, and using kill() for processes, pthread_kill if func1() has some support for terminating when a flag is set asynchronously, or the non-portable pthread_cancel if there's really no other choice.
This is a pretty basic scenario but I'm not finding too many helpful resources. I have a C++ program running in Linux that does file processing: it reads lines, does various transformations, and writes data into a database. There are certain variables (stored in the database) that affect the processing, which I'm currently reading at every iteration because I want processing to be as up to date as possible, but a slight lag is OK. But those variables change pretty rarely, and the reads are expensive over time (10 million plus rows a day). I could space out the reads to every n iterations or simply restart the program when a variable changes, but those seem hackish.
What I would like to do instead is have the program trigger a reread of the variables when it receives a SIGHUP. Everything I'm reading about signal handling is talking about the C signal library which I'm not sure how to tie in to my program's classes. The Boost signal libraries seem to be more about inter-object communication rather than handling OS signals.
Can anybody help? It seems like this should be incredibly simple, but I'm pretty rusty with C++.
I would handle it just like you might handle it in C. I think it's perfectly fine to have a stand-alone signal handler function, since you'll just be posting to a semaphore or setting a variable or some such, which another thread or object can inspect to determine if it needs to re-read the settings.
#include <signal.h>
#include <stdio.h>
/* or you might use a semaphore to notify a waiting thread */
static volatile sig_atomic_t sig_caught = 0;
void handle_sighup(int signum)
{
    /* in case we registered this handler for multiple signals */
    if (signum == SIGHUP) {
        sig_caught = 1;
    }
}
int main(int argc, char* argv[])
{
    /* you may also prefer sigaction() instead of signal() */
    signal(SIGHUP, handle_sighup);
    while(1) {
        if (sig_caught) {
            sig_caught = 0;
            printf("caught a SIGHUP. I should re-read settings.\n");
        }
    }
    return 0;
}
You can test sending a SIGHUP by using kill -1 `pidof yourapp`.
I'd recommend checking out this link which gives the details on registering a signal.
Unless I'm mistaken, one important thing to remember is that any non-static member function expects an implicit object (this) parameter, which means non-static member functions can't be signal handlers. I believe you'll need to register either a static member function or some kind of global function. From there, if you have a specific object's member function you want to take care of your update, you'll need a way to reference that object.
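One way this could look, as a sketch only: a hypothetical Settings class whose static member function is the handler. The handler does nothing but set a flag, and an ordinary member function checks the flag from the processing loop. Strictly speaking a handler should have C linkage, but a static member function is accepted by the mainstream Linux compilers.

```cpp
#include <signal.h>

// Hypothetical class; the names are made up for illustration.
class Settings {
public:
    // Registers the static handler for SIGHUP. Returns false on failure.
    static bool install() {
        struct sigaction sa{};
        sa.sa_handler = &Settings::on_sighup;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = SA_RESTART;  // restart interrupted system calls
        return sigaction(SIGHUP, &sa, nullptr) == 0;
    }

    // Call this from the processing loop. Returns true if a reload
    // was pending (and has now been carried out).
    bool reload_if_requested() {
        if (!reload_requested_) return false;
        reload_requested_ = 0;
        ++reload_count_;  // stand-in for re-reading the database values
        return true;
    }

    int reload_count() const { return reload_count_; }

private:
    // Static, so it has no hidden "this" parameter.
    static void on_sighup(int) { reload_requested_ = 1; }

    static volatile sig_atomic_t reload_requested_;
    int reload_count_ = 0;
};

volatile sig_atomic_t Settings::reload_requested_ = 0;
```

Call Settings::install() once at startup and settings.reload_if_requested() once per iteration; checking a flag is far cheaper than hitting the database every time.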
There are several possibilities; it would not necessarily be overkill to implement all of them:
Respond to a specific signal, just like C does. C++ works the same way. See the documentation for signal().
Trigger on the modification timestamp of some file changing, like the database if it is stored in a flat file.
Trigger once per hour, or once per day (whatever makes sense).
You can define a Boost signal corresponding to the OS signal and tie the Boost signal to your slot to invoke the respective handler.
Is the following safe?
I am new to threading and I want to delegate a time consuming process to a separate thread in my C++ program.
Using the boost libraries I have written code something like this:
thrd = new boost::thread(boost::bind(&myclass::mymethod, this, &finished_flag));
Where finished_flag is a boolean member of my class. When the thread is finished it sets the value and the main loop of my program checks for a change in that value.
I assume that this is okay because I only ever start one thread, and that thread is the only thing that changes the value (except for when it is initialised before I start the thread).
So is this okay, or am I missing something and need to use locks, mutexes, etc.?
You never mentioned the type of finished_flag...
If it's a straight bool, then it might work, but it's certainly bad practice, for several reasons. First, some compilers will cache the reads of the finished_flag variable, since the compiler doesn't always pick up the fact that it's being written to by another thread. You can get around this by declaring the bool volatile, but that's taking us in the wrong direction. Even if reads and writes are happening as you'd expect, there's nothing to stop the OS scheduler from interleaving the two threads half way through a read / write. That might not be such a problem here where you have one read and one write op in separate threads, but it's a good idea to start as you mean to carry on.
If, on the other hand, it's a thread-safe type, like a CEvent in MFC (or the equivalent in boost), then you should be fine. This is the best approach: use thread-safe synchronization objects for inter-thread communication, even for simple flags.
Instead of using a member variable to signal that the thread is done, why not use a condition? You are already using the boost libraries, and condition is part of the thread library.
Check it out. It allows the worker thread to 'signal' that it has finished, and the main thread can check during execution if the condition has been signaled and then do whatever it needs to do with the completed work. There are examples in the link.
As a general rule I would never assume that a resource will only be modified by one thread. You might know what it is for, but someone else might not - causing no end of grief when the main thread thinks that the work is done and tries to access data that is not correct! It might even delete it while the worker thread is still using it, causing the app to crash. Using a condition will help with this.
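Here is a small sketch of the condition idea. It uses the C++11 standard library (std::condition_variable, std::mutex, std::thread) rather than Boost, on the assumption that a C++11 compiler is available; the Boost classes have a very similar shape. The name compute_and_wait and the 6 * 7 "work" are made up for illustration.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Starts a worker, waits until it reports completion through a
// condition variable, and returns the value the worker produced.
int compute_and_wait() {
    std::mutex m;
    std::condition_variable cv;
    bool finished = false;  // protected by m
    int result = 0;         // protected by m

    std::thread worker([&] {
        int value = 6 * 7;  // stand-in for the time-consuming work
        {
            std::lock_guard<std::mutex> lock(m);
            result = value;
            finished = true;
        }
        cv.notify_one();    // wake the waiting thread
    });

    int seen = 0;
    {
        std::unique_lock<std::mutex> lock(m);
        // The predicate guards against spurious wakeups and against the
        // worker finishing before we start waiting.
        cv.wait(lock, [&] { return finished; });
        seen = result;
    }
    worker.join();  // the worker must be gone before m and cv are destroyed
    return seen;
}
```

The flag is still there, but it is only ever touched with the mutex held, and the waiting thread sleeps instead of polling.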
Looking at the thread documentation, you could also call thread.timed_join in the main thread. timed_join will wait for a specified amount of time for the thread to 'join' (join means that the thread has finished).
I don't mean to be presumptive, but it seems like the purpose of your finished_flag variable is to pause the main thread (at some point) until the thread thrd has completed.
The easiest way to do this is to use boost::thread::join
// launch the thread...
thrd = new boost::thread(boost::bind(&myclass::mymethod, this, &finished_flag));
// ... do other things maybe ...
// wait for the thread to complete
thrd->join();
If you really want to get into the details of communication between threads via shared memory, even declaring a variable volatile won't be enough, even if the compiler does use appropriate access semantics to ensure that it won't get a stale version of data after checking the flag. The CPU can issue reads and writes out of order (x86 usually doesn't, but PPC definitely does), and there is nothing in C++98/03 that allows the compiler to generate code to order memory accesses appropriately.
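For readers on a newer compiler: C++11 added a memory model and std::atomic, which cover exactly this gap. A minimal sketch, assuming C++11 (the name poll_atomic_flag and the value 42 are made up for illustration):

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// The worker writes a result and then sets an atomic flag; the caller
// polls the flag and only then reads the result.
int poll_atomic_flag() {
    std::atomic<bool> finished{false};
    int result = 0;  // plain int: published by the release store below

    std::thread worker([&] {
        result = 42;  // stand-in for the real work
        // release: everything written above is visible to a thread that
        // sees `true` through an acquire load
        finished.store(true, std::memory_order_release);
    });

    while (!finished.load(std::memory_order_acquire)) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    int seen = result;  // safe: ordered after the acquire load
    worker.join();      // still join, so the thread has really ended
    return seen;
}
```

The release/acquire pair is what tells both the compiler and the CPU not to reorder the write to result past the flag.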
Herb Sutter's Effective Concurrency series has an extremely in depth look at how the C++ world intersects the multicore/multiprocessor world.
Having the thread set a flag (or signal an event) before it exits is a race condition. The thread has not necessarily returned to the OS yet, and may still be executing.
For example, consider a program that loads a dynamic library (pseudocode):
lib = loadLibrary("someLibrary");
fun = getFunction("someFunction");
fun();
unloadLibrary(lib);
And let's suppose that this library uses your thread:
void someFunction() {
    volatile bool finished_flag = false;
    thrd = new boost::thread(boost::bind(&myclass::mymethod, this, &finished_flag));
    while(!finished_flag) { // ignore the polling loop, it's beside the point
        sleep();
    }
    delete thrd;
}
void myclass::mymethod(volatile bool* finished_flag) {
    // do stuff
    *finished_flag = true;
}
When myclass::mymethod() sets finished_flag to true, myclass::mymethod() hasn't returned yet. At the very least, it still has to execute a "return" instruction of some sort (if not much more: destructors, exception handler management, etc.). If the thread executing myclass::mymethod() gets pre-empted before that point, someFunction() will return to the calling program, and the calling program will unload the library. When the thread executing myclass::mymethod() gets scheduled to run again, the address containing the "return" instruction is no longer valid, and the program crashes.
The solution would be for someFunction() to call thrd->join() before returning. This would ensure that the thread has returned to the OS and is no longer executing.