Changing Thread Task? - c++

I know you cannot kill a boost thread, but can you change its task?
Currently I have an array of 8 threads. When a button is pressed, these threads are assigned a task. The task each thread is assigned is completely independent of the main thread and of the other threads. None of the threads has to wait or anything like that, so an interruption point is never reached.
What I need is to be able, at any time, to change the task that each thread is doing. Is this possible? I have tried looping through the array of threads and pointing each thread object at a new one, but of course that doesn't do anything to the old threads.
I know you can interrupt pThreads, but I cannot find a working link to download the library to check it out.

A thread is not some sort of magical object that can be made to do things. It is a separate path of execution through your code. Your code cannot be made to jump arbitrarily around its codebase unless you specifically program it to do so. And even then, it can only be done within the rules of C++ (i.e., by calling functions).
You cannot kill a boost::thread because killing a thread would utterly wreck some of the most fundamental assumptions a programmer makes. You now have to take into account the possibility that the next line doesn't execute for reasons that you can neither predict nor prevent.
This isn't like exception handling, where C++ specifically requires destructors to be called, and you have the ability to catch exceptions and do special cleanup. You're talking about executing one piece of code, then suddenly inserting a call to some random function in the middle of already compiled code. That's not going to work.
If you want to be able to change the "task" of a thread, then you need to build that thread with "tasks" in mind. It needs to check every so often that it hasn't been given a new task, and if it has, then it switches to doing that. You will have to define when this switching is done, and what state the world is in when switching happens.
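A minimal sketch of a thread built with "tasks" in mind, assuming standard C++11 threads rather than boost::thread. The class name SwitchableWorker and its assign() method are hypothetical. It also assumes each task can be expressed as a short step that returns quickly, so the step boundary is the well-defined point at which switching happens.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical worker: runs the current task one short step at a time and
// checks between steps whether a different task has been assigned.
class SwitchableWorker {
public:
    using Step = std::function<void()>;  // one short unit of work that returns

    SwitchableWorker() : thread_([this] { run(); }) {}

    ~SwitchableWorker() {
        stop_ = true;    // observed at the next step boundary
        thread_.join();
    }

    // Replace the current task; takes effect at the next step boundary.
    void assign(Step step) {
        assert(step != nullptr);
        std::lock_guard<std::mutex> lock(mutex_);
        step_ = std::make_shared<Step>(std::move(step));
    }

private:
    void run() {
        while (!stop_) {
            std::shared_ptr<Step> current;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                current = step_;  // pick up whatever task is assigned right now
            }
            if (current) {
                (*current)();     // run one step without holding the lock
            } else {
                std::this_thread::yield();  // nothing assigned yet
            }
        }
    }

    std::mutex mutex_;
    std::shared_ptr<Step> step_;
    std::atomic<bool> stop_{false};
    std::thread thread_;  // declared last so the members above exist before it starts
};
```

With this shape, the button handler would call assign() on each of the 8 workers. A step that never returns can still not be switched away from, which is the point the answer makes.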

Related

Is there a reason a thread shouldn't access other thread's stack?

I was just playing with Intel Parallel inspector on my project, and it displays a warning:
One or more threads in the application accessed the stack of another
thread. This may indicate one or more bugs in your application.
I do indeed have some objects that are allocated on stack shared between threads. I don't see why this is a problem. Any hints?
It's not wrong, it's just possibly wrong. Tools like Intel Parallel Inspector that provide additional diagnostics for your program must make a tradeoff between false positives and false negatives. In this case, it seems that the developers thought that accessing the stack of another thread was much more likely to be an error (low false positive rate if reported) than not (high false negative rate if not reported).
Valgrind is another example of a tool that can signal errors in code that is correct.
The real question here is, "what is the other thread doing?" If you think, "maybe it will return from that function and the stack frame will be invalid," then you are doing parallel programming wrong. No answer about multithreaded behavior should be qualified with "maybe". You had better make sure that that thread doesn't return, for example, by making it wait on a semaphore or condition variable, or by making it join with the other threads.
Discussion
Pubby: "AFAIK it's hugely inefficient."
The only reason it would be inefficient is because you might have multiple cores modifying the same cache lines, which is the same problem you have with other kinds of shared memory.
Collin: How do you know the stack frame is still good in another thread?
If you use something in multiple threads, you use some kind of synchronization mechanism to ensure that it's not modified in an invalid way. This situation is no different.
H2CO3: Well, is there a reason you should not walk into another person's house?
If we're going to play with analogies, I'd say that the process is the house, and the threads are the people in the house. If Dave keeps a list of chores in his room, I'll go into his room every time I need to look at the list. If he stops keeping it there, he'd better tell me or else I'll start writing on random pieces of paper on his desk.
Conclusion
It's a matter of style whether this program behavior is acceptable or not.
Imagine this -- a thread is executing and a method is called which has a local (stack) variable (an object). It adds this object to a work queue, a queue which is processed by a separate thread.
That thread gets to the item added by the first thread and accesses the object, on the stack, of the first thread.
What has the first thread done in the meantime? It may have exited the method and freed up that stack space. That freed space may or may not be re-used. The second thread accessing the stack of the first thread may or may not work correctly, depending on timing and the call graph.
If you know the stack variable will exist while the second thread processes it then it can be safe to do; for example, if Thread 1 queues a stack variable and then blocks until Thread 2 notifies it has finished processing, that is a safe operation.
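A small sketch of the safe case described above, assuming standard C++11 threads. The function name sum_on_other_thread is hypothetical. The result variable lives on the calling thread's stack, and the caller blocks in join() until the second thread has finished with it.

```cpp
#include <cassert>
#include <numeric>
#include <thread>
#include <vector>

// Sums the values on a second thread while both the input reference and the
// result variable live in the calling thread's stack frame.
long sum_on_other_thread(const std::vector<int>& values) {
    long total = 0;  // stack variable shared with the worker

    std::thread worker([&values, &total] {
        total = std::accumulate(values.begin(), values.end(), 0L);
    });

    worker.join();   // block until the worker is done touching `total`
    return total;    // the frame is released only after the join
}
```

If the join were moved to some later point and the function returned first, the worker would write into a dead stack frame, which is the situation the analyzer warns about.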
A warning rather than an error is issued because this may or may not be a legitimate operation, and there's no way for an analyzer to be certain.

Catching Signals c++

I have a boost threadpool which I use to do certain tasks. I also have a Sensor class that has the pure virtual function doWork(int total) = 0;. Whenever it is requested, my main process gets the necessary Sensor pointer and tells the threadpool to run Sensor::doWork(int total).
threadpool->schedule(boost::bind(&Sensor::doWork,this,123456));
I am dynamically loading libraries of type Sensor, thus it is out of my control if someone else has faulty coding which results in SEGFAULTS and such. So is there a way for me to (in my main process) handle any errors thrown by Sensor::doWork(int total), clean up the thread, delete that sensor object and notify the console what and where the error has occurred?
Really the only way to handle a segmentation fault here is to run Sensor::doWork in a completely separate process.
In UNIX, this involves using fork (or some other similar means), running Sensor::doWork in the child process, and then somehow shuttling the results back to the parent process.
I assume similar means are available in Windows.
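A minimal POSIX sketch of the fork approach, under stated assumptions: run_isolated and IsolatedResult are hypothetical names, the work is modeled as a callable returning an int, and only the exit status is shuttled back (real results would need a pipe or shared memory). Forking from a process that already runs other threads needs extra care, because only the calling thread exists in the child.

```cpp
#include <cassert>
#include <csignal>
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct IsolatedResult {
    bool crashed;       // true if the child was killed by a signal
    int signal_number;  // terminating signal (e.g. SIGSEGV) when crashed, else 0
    int exit_code;      // child's exit code when it exited normally, else -1
};

// Runs `work` in a child process so that a crash cannot take down the caller.
IsolatedResult run_isolated(const std::function<int()>& work) {
    IsolatedResult result{false, 0, -1};

    pid_t pid = fork();
    if (pid < 0) return result;  // fork failed; nothing was run

    if (pid == 0) {
        // Child: run the risky code, then leave without unwinding into the
        // parent's code path. Exceptions are mapped to exit code 255.
        int code = 255;
        try {
            code = work() & 0xFF;
        } catch (...) {
        }
        _exit(code);
    }

    // Parent: wait for the child and classify how it ended.
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return result;
    if (WIFSIGNALED(status)) {
        result.crashed = true;
        result.signal_number = WTERMSIG(status);
    } else if (WIFEXITED(status)) {
        result.exit_code = WEXITSTATUS(status);
    }
    return result;
}
```

The parent can then report a crash (for example, signal_number == SIGSEGV) and discard the faulty sensor without its own state having been touched.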
EDIT: I thought I'd flesh out a bit some of the things you can do.
Solution #1: you can work with processes in the same fashion as you would threads. For example, you could create a pool of processes that each sit in a loop of:
Wait for a task to be passed in over a pipe or queue or some similar object
Perform the task
Return the results over a pipe or queue or some similar object
And since you're executing the tasks in the other processes, you're protected against them crashing. The main difficulty with this solution is actually communicating between processes; maybe boost's interprocess library will help with that. I've mainly done this sort of thing in python, which has a standard multiprocessing module that handles this stuff for you.
Solution #2: You could divide your application into "safe" and "risky" portions that run in different processes. The "risky" portion executes the Sensor::doWork methods and anything else you might want to do in that process -- but only work that is acceptable to be spontaneously lost if it crashes. The "safe" portion deals with any precious information that you cannot afford to lose, and monitors the "risky" portion, performing some recovery operations when the child crashes. And, of course, whatever other work you decide you want to do in the safe part.
If you got a SIGSEGV, even if you caught it you have no guarantee about your program state so there's pretty much no way to recover.
If you're working with 3rd party libraries, and they're buggy, and the library maintainer won't fix it (and you don't have the source) then your only recourse is to run the third party library from within a totally separate binary that talks to the main binary by some means. See for example firefox and plugin-container.
You might want to register a callback function to catch SIGSEGV. In C this can be done using signal. Be aware, however, that there is not much you can do when the OS sends you a SIGSEGV (note that it isn't required to). I'd guess you don't really know what state your program is in. If, for example, the heap got corrupted, new and delete operations may fail, so even a plain, simple
std::cout << std::string("hello world") << std::endl;
statement might not work, since memory needs to be allocated from the heap.
Best, Christoph

Boost: what exactly is not threadsafe in Boost.Signals?

I read at multiple places that Boost.Signals is not threadsafe but I haven't found much more details about it. This simple quote doesn't say really that much. Most applications nowadays have threads - even if they try to be single threaded, some of their libraries may use threads (for example libsdl).
I guess the implementation doesn't have problems with other threads not accessing the slot. So it is at least threadsafe in this sense.
But what exactly works and what would not work? Would it work to use it from multiple threads as long as I don't ever access it at the same time? I.e. if I build my own mutexes around the slot?
Or am I forced to use the slot only in that thread where I created it? Or where I used it for the first time?
I don't think it's too clear either, and one of the library reviewers said here:
I also don't liked the fact that only three times the word 'thread' was named.
Boost.signals2 wants to be a 'thread safe signals' library. Therefore some more
details and especially more examples concerning on that area should be given to
the user.
One way of figuring it out is to go to the source and see what they're using _mutex / lock() to protect. Then just imagine what would happen if those calls weren't there. :)
From what I can gather, it's ensuring simple things like "if one thread is doing connects or disconnects, that won't cause a different thread which is iterating through the slots attached to those signals to crash". Kind of like how using a thread-safe version of the C runtime library assures that if two threads make valid calls to printf at the same time then there won't be a crash. (Not to say the output you'll get will make any sense—you're still responsible for the higher order semantics.)
It doesn't seem to be like Qt, in which the thread a certain slot's code gets run on is based on the target slot's "thread affinity" (which means emitting a signal can trigger slots on many different threads to run in parallel.) But I guess not supporting that is why the boost::signal "combiners" can do things like this.
One problem I see is that one thread can connect or disconnect while another thread is signalling.
You can easily wrap your signal and connect calls with mutexes. However, it is non-trivial to wrap the connections. (connect returns connections which you can use to disconnect).
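To illustrate the mutex-wrapping idea, here is a sketch that uses a hypothetical stand-in class, LockedSignal, built only on the standard library. It is not the Boost.Signals API. It assumes slots take a single int, and it handles the "connection" problem by returning a plain id and routing disconnects back through the same lock.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a signal; not the Boost.Signals API.
class LockedSignal {
public:
    using Slot = std::function<void(int)>;
    using Id = int;

    // Registers a slot and returns an id that can be passed to disconnect().
    Id connect(Slot slot) {
        std::lock_guard<std::mutex> lock(mutex_);
        slots_[next_id_] = std::move(slot);
        return next_id_++;
    }

    void disconnect(Id id) {
        std::lock_guard<std::mutex> lock(mutex_);
        slots_.erase(id);
    }

    // Copies the slots under the lock, then calls them without holding it,
    // so a slot may itself connect or disconnect without deadlocking.
    void emit(int value) {
        std::vector<Slot> snapshot;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            for (const auto& entry : slots_) snapshot.push_back(entry.second);
        }
        for (const auto& slot : snapshot) slot(value);
    }

private:
    std::mutex mutex_;
    std::map<Id, Slot> slots_;
    Id next_id_ = 0;
};
```

One consequence of the snapshot design is that a slot disconnected while an emit is in progress may still be called once more. Whether that is acceptable is part of the higher-level semantics the caller has to decide.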

Hibernating/restarting a thread

I'm looking for a way to restart a thread, either from inside that thread's context or from outside the thread, possibly from within another process. (Any of these options will work.) I am aware of the difficulty of hibernating entire processes, and I'm pretty sure that those same difficulties attend to threads. However, I'm asking anyway in the hopes that someone has some insight.
My goal is to pause, save to file, and restart a running thread from its exact context with no modification to that thread's code, or rather, modification in only a small area - i.e., I can't go writing serialization functions throughout the code. The main block of code must be unmodified, and will not have any global/system handles (file handles, sockets, mutexes, etc.) Really down-and-dirty details like CPU registers do not need to be saved; but basically the heap, stack, and program counter should be saved, and anything else required to get the thread running again logically correctly from its save point. The resulting state of the program should be no different, if it was saved or not.
This is for a debugging program for high-reliability software; the goal is to run simulations of the software with various scripts for input, and be able to pause a running simulation and then restart it again later - or get the sim to a branch point, save it, make lots of copies and then run further simulations from the common starting point. This is why the main program cannot be modified.
The main thread language is in C++, and should run on Windows and Linux, however if there is a way to only do this on one system, then that's acceptable too.
Thanks in advance.
I think what you're asking is much more complicated than you think. I am not too familiar with Windows programming but here are some of the difficulties you'll face in Linux.
A saved thread can only be restored in the process that originally spawned it; otherwise the dynamic libraries would be broken, because they are loaded at a different address each time they're loaded. Because of this, saving to disk is essentially meaningless. The only way around this would be to take complete control of dynamic linking, which is no small feat. It's possible, but pretty scary.
The suspended thread will have variables in the heap. You'd need to be able to find all heap data 'owned' by the thread, but the 'owned' state of any piece of the heap cannot be determined. In the future it may be possible with C++0x's garbage collection ABI. You can't just assume the whole heap belongs to the thread to be paused. The main thread uses the heap when creating threads, so blowing away the heap when deserializing the paused thread would break the main thread.
You need to address the issues with globals, and not just the globals created in the threads. Globals (or statics) can be, and often are, created in dynamic libraries.
There are more resources to a program than just memory. You have file handles, network sockets, database connections, etc. A file handle is just a number. Serializing its memory is completely meaningless without the context of the process the file was opened in.
All that said, I don't think the core problem is impossible, just that you should consider a different approach.
Anyway, to try to implement this, the thread to be paused needs to be in a known state. I imagine the thread to be stopped would call a library function meant to halt the process so that it could be resumed.
I think the Linux system call fork is your friend. Fork duplicates a process. Have the system run to the desired point and fork. The parent waits so that it can fork again later; the child runs one set of input.
Once the child completes, the parent can fork again, and the new child can run another set of input.
Continue ad infinitum.
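The fork loop described above can be sketched as follows, assuming POSIX. SimState, continue_simulation and run_branches are hypothetical names, and the sketch passes each branch's outcome back through the exit status, so results are limited to 0..255. A real simulation would report through a pipe or a file instead.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hypothetical simulation state built up before the branch point.
struct SimState {
    int accumulated;
};

// Hypothetical continuation of the simulation for one input script.
// The result must fit into an exit status (0..255) for this sketch.
int continue_simulation(SimState state, int input) {
    return (state.accumulated + input) % 256;
}

// Runs one child per input, each starting from the same saved state.
// Returns each child's result, or -1 where the child could not be run
// or did not exit normally.
std::vector<int> run_branches(const SimState& state, const std::vector<int>& inputs) {
    std::vector<int> results;
    for (int input : inputs) {
        pid_t pid = fork();  // the child starts with a copy of `state` as of now
        if (pid < 0) {
            results.push_back(-1);
            continue;
        }
        if (pid == 0) {
            _exit(continue_simulation(state, input));  // child: run one branch
        }
        int status = 0;
        waitpid(pid, &status, 0);  // parent: wait, then fork again for the next input
        results.push_back(WIFEXITED(status) ? WEXITSTATUS(status) : -1);
    }
    return results;
}
```

The parent never advances past the branch point, so every child sees the same starting state without any serialization code in the simulated program.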
Threads run in the context of a process. So if you want to do anything like persist a thread state to disk, you need to "hibernate" the entire process.
You will need to serialise the entire set of the process's data. And you'll need to store the current thread execution point. I think serialising the process is do-able (check out Boost.Serialization), but the thread stop point is a lot more difficult. I would put places where it can be stopped throughout the code, but as you say, you cannot modify the code.
Given that problem, you're looking at virtualising the platform the app is running on, and using its suspend functionality to pause the entire thing. You might find more information about how to do this in the virtualisation vendor's features, eg Xen.
As the whole logical address space of the program is part of the thread's context, you would have to hibernate the whole process.
If you can guarantee that the thread only uses local variables, you could save its stack. It is easy to suspend a thread with pthreads, but I don't see how you could access its stack from outside then.
The way you would have to do this is via VM snapshots; get a copy of VMware Workstation, then you can write code to automate starting/stopping/snapshotting the machine at different points. Any other approach is pretty untenable: while you might be able to freeze and thaw a process, you can't reconstruct the system state it expects (all the stuff that Caspin mentions, like file handles etc.)

c++ implementing cancel across thread pools

I have several thread pools and I want my application to handle a cancel operation.
To do this I implemented a shared operation controller object which I poll at various spots in each thread pool worker function that is called.
Is this a good model, or is there a better way to do it?
I just worry about having all of these operationController.checkState() littered throughout the code.
Yes it's a good approach. Herb Sutter has a nice article comparing it with the alternatives (which are worse).
With any kind of asynchronous cancellation you're going to have to periodically poll some sort of flag. There's a fundamental issue of having to keep things in a consistent state. If you just kill a thread in the middle of whatever it's doing, bad things will happen sooner or later.
Depending on what you are actually doing, you may be able to just ignore the result of the operation instead of cancelling it. You let the operation continue on, but just don't wait for it to complete and never check the result.
If you actually need to stop the operation, then you're going to have to poll at appropriate points, and do whatever cleanup is necessary.
It's a good way to do it.
Another possible way to do it is, if there's some other subroutine[s] which the threads call regularly anyway, to check within that subroutine and throw an exception (to be caught at the top of the thread), assuming that "cancel" may be considered exceptional and assuming that the code being executed by the thread is exception-safe.
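A sketch of the exception-based variant, assuming standard C++11. The names Cancelled, cancel_requested, checkpoint and run_job are hypothetical, and the global flag stands in for whatever shared controller the application already has. The check lives in one helper the worker calls anyway, and the exception is caught once, at the top of the worker function.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Cancelled {};  // hypothetical exception type used only for cancellation

std::atomic<bool> cancel_requested{false};

// Hypothetical helper that the worker code calls regularly anyway.
void checkpoint() {
    if (cancel_requested.load()) throw Cancelled{};
}

// Runs up to `steps` units of work. Returns true if it finished and false
// if it was cancelled part-way through.
bool run_job(long long steps, std::atomic<long long>& progress) {
    try {
        for (long long i = 0; i < steps; ++i) {
            checkpoint();  // the only place cancellation is observed
            ++progress;    // stand-in for one unit of real work
        }
        return true;
    } catch (const Cancelled&) {
        return false;      // caught at the top of the thread's work function
    }
}
```

This keeps the explicit state checks out of the main body of the work, but only pays off if everything between checkpoints releases its resources correctly when the exception unwinds.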
I wouldn't do it that way, checking a shared object.
I would most likely provide each thread object with a way to cancel execution from inside its own thread, be it an event, a threadsafe state variable or whatever.
The problem with the shared operation controller is that, from my point of view, the logic is reversed. Why are you calling it "controller" when it doesn't control anything?
To me, the operation controller should receive a cancellation order and then, in turn, select the appropriate threads and signal them to stop. That would be a correct "chain of command", if you know what I mean. The way you do it, you introduce unnatural behaviour in a thread, which doesn't "obey" orders to stop but instead checks each time whether its "superior" has "written the order somewhere". Somehow it just doesn't feel right.
In addition, what if you want only some of the threads to stop in the future? What if you want to include some advanced logic so that threads will only stop given a condition? Then you'll have to rewrite the code in each and every thread to handle that condition.
So I would provide a way for each thread to handle signals sent to it, for example by using a Command Pattern with a FIFO structure.
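One way to read the per-thread command FIFO idea, as a sketch in standard C++11. CommandWorker and the two-value Command enum are hypothetical, and the "work" is reduced to a counter. Each worker owns its own queue, so a controller can tell some workers to stop and leave others running.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

enum class Command { Work, Stop };  // hypothetical command set

// Hypothetical worker with its own FIFO of commands.
class CommandWorker {
public:
    CommandWorker() : thread_([this] { run(); }) {}
    ~CommandWorker() { stop(); }

    void send(Command command) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(command);
        }
        ready_.notify_one();
    }

    // Sends Stop, waits for the worker to drain the commands queued before
    // it, and returns how many Work commands were handled.
    int stop() {
        if (thread_.joinable()) {
            send(Command::Stop);
            thread_.join();
        }
        return work_done_;  // safe to read: the worker thread has finished
    }

private:
    void run() {
        for (;;) {
            Command command;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return !queue_.empty(); });
                command = queue_.front();
                queue_.pop();
            }
            if (command == Command::Stop) return;
            ++work_done_;  // stand-in for real work, done without the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<Command> queue_;
    int work_done_ = 0;   // written only by the worker thread
    std::thread thread_;  // declared last so the members above exist before it starts
};
```

Because the queue is first-in, first-out, a Stop only takes effect after the commands sent before it, and a long-running Work item would still need its own cancellation check to be interrupted midway.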
(By the way, I realize they're thread pool workers, not actual Thread Classes but still, I think each worker must be signaled to stop separately, not the other way around).
In similar situations I have used a non-auto-reset (manual-reset) event that all threads can look at. It is quite similar to polling, except that if your threads block at times, they can wait on the stop event as well. (Easier on Windows.)
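On platforms without a native manual-reset event, a rough equivalent can be sketched with a mutex and a condition variable. ManualResetEvent here is a hypothetical class written against standard C++11, not the Windows API. Once set, it stays set, and every waiter sees it.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical manual-reset event: once set, it stays set for all waiters.
class ManualResetEvent {
public:
    void set() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signaled_ = true;
        }
        cv_.notify_all();  // wake every thread currently waiting
    }

    // Sleeps for up to `timeout`; returns true if the event is set.
    bool wait_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return cv_.wait_for(lock, timeout, [this] { return signaled_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool signaled_ = false;
};
```

A worker that would otherwise sleep between work items can call wait_for on the stop event instead, so it wakes immediately when a stop is requested rather than at the end of its sleep.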
/L