I'm thinking about a certain kind of synchronisation primitive, but I don't know what this kind of synchronisation is called or whether something like this would even work.
So there is one variable (boolean) which basically signals whether one thread is still working on a block of memory or not. At the beginning the bool is set to false, meaning the worker thread is not working on your block of memory. Now the main thread gives the worker thread a "todo-list" describing how it should work on that block of memory. After that, it changes the state of the boolean to true, so that the worker thread knows it is now allowed to do its work. The main thread can now continue its own work and checks at certain locations whether the worker thread is done working, i.e. whether the boolean has been set to false again. If it is still true, the main thread just continues its own work and doesn't wait for the worker thread. If the boolean is false, the main thread knows the worker thread is done and starts processing the block of memory.
So the boolean just transfers the ownership of a block of memory between two threads. If one thread currently does not have ownership of that memory, it just continues with its own work and checks repeatedly whether it has ownership again. This way, neither thread waits for the other; each can continue with its own work.
What is this called and how is such a behavior implemented?
EDIT: Basically it's a mutex. But instead of waiting for the mutex to be unlocked again, it continues/skips the critical code.
It's still a mutex, just with "try" methods.
In standard C++, we're talking about std::mutex::try_lock, which attempts to lock the mutex; if it fails, it returns false and the caller moves on:
#include <mutex>

class unlocker {
    std::mutex& m_Parent;
public:
    unlocker(std::mutex& parent) : m_Parent(parent) {}
    ~unlocker() { m_Parent.unlock(); }
};

std::mutex mtx;

if (mtx.try_lock()) {
    unlocker unlock(mtx); // releases mtx on scope exit
                          // (std::lock_guard<std::mutex> guard(mtx, std::adopt_lock) would do the same)
    // success, mtx is free
} else {
    // do something else
}
In native OS code you have similar functions depending on the operating system you are on, like pthread_mutex_trylock on Unix and TryEnterCriticalSection on Windows. Needless to say, the standard mutex probably uses these functions behind the scenes.
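Equivalently, here is a small sketch using std::unique_lock with std::try_to_lock, which handles the unlocking automatically (function and variable names are illustrative):

#include <mutex>

std::mutex mtx;

void do_work_if_free()
{
    // std::try_to_lock attempts the lock without blocking;
    // owns_lock() tells us whether the attempt succeeded.
    std::unique_lock<std::mutex> lk(mtx, std::try_to_lock);
    if (lk.owns_lock()) {
        // success, mtx is ours; it is released automatically when lk goes out of scope
    } else {
        // mutex was busy, do something else
    }
}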
What will you do if the main thread runs out of work?
Suppose you keep checking and you keep reading true. Eventually you reach a point where the main thread cannot continue without the result from the worker thread. Since you have no more work to do, the only thing left is to keep checking the value of the flag over and over, wasting CPU resources that other threads could use to do useful work.
In general, this is not what you want. Instead, you would like the operating system to put your main thread to sleep and only wake it up once the worker thread has finished processing. All kinds of locks and semaphores that ship with modern operating systems work this way. Underneath there is some flag in memory that indicates who owns the lock, but there is also a bunch of logic around it that ensure the operating system won't schedule threads that have nothing to do but wait for a lock to become ready.
That being said, there are some situations where this is not what you want. If you are sufficiently sure that you won't run into the situation where one thread just spins on a lock, and you want to save the overhead that comes with the OS locks, just checking a flag like you described might be a viable option.
Note though that low-level stuff like this should be reserved for special circumstances, not be the first tool in your toolbox. It's just too easy to end up with an algorithm that is incorrect or an implementation that is not as efficient as you thought. If you decide to go down this road, be prepared to do some serious work to get it working as expected.
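For reference, here is a minimal sketch of the flag-based ownership handoff described in the question, assuming C++11 std::atomic; all names are illustrative:

#include <atomic>
#include <thread>

std::atomic<bool> worker_owns_buffer{false}; // false: the main thread owns the block

void worker_loop()
{
    for (;;) {
        if (worker_owns_buffer.load(std::memory_order_acquire)) {
            // ... process the block of memory according to the todo-list ...
            worker_owns_buffer.store(false, std::memory_order_release); // hand ownership back
        }
        // otherwise do the worker's own unrelated work (or yield)
    }
}

void main_loop_step()
{
    if (!worker_owns_buffer.load(std::memory_order_acquire)) {
        // we own the block: read the results, write a new todo-list ...
        worker_owns_buffer.store(true, std::memory_order_release); // hand ownership to the worker
    }
    // continue with the main thread's own work either way
}

The acquire/release pair is what makes the handoff safe: the release store publishes all writes to the block made before it, and the acquire load on the other side makes them visible.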
Related
I am creating multiple threads in my program. On pressing Ctrl-C, a signal handler is called. Inside the signal handler I have put exit(0) at the end. The thing is that sometimes the program terminates safely, but other times I get a runtime error stating
abort() has been called
So what would be the possible solution to avoid the error?
The usual way is to set an atomic flag (like std::atomic<bool>) which is checked by all threads (including the main thread). If set, then the sub-threads exit, and the main thread starts to join the sub-threads. Then you can exit cleanly.
If you use std::thread for the threads, that's a possible reason for the crashes you have. You must join the thread before the std::thread object is destructed.
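A minimal sketch of that approach, assuming C++11 threads and a lock-free std::atomic<bool> (names are illustrative):

#include <atomic>
#include <csignal>
#include <thread>
#include <vector>

std::atomic<bool> quit{false}; // lock-free on common platforms, so safe to set from a handler

void on_sigint(int) { quit.store(true); }

void worker()
{
    while (!quit.load()) {
        // ... do a bounded chunk of work, then re-check the flag ...
    }
}

int main()
{
    std::signal(SIGINT, on_sigint);

    std::vector<std::thread> pool;
    for (int i = 0; i < 4; ++i)
        pool.emplace_back(worker);

    for (auto& t : pool)
        t.join(); // join before the std::thread objects are destroyed

    return 0;     // clean exit, no exit(0) from the handler
}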
Others have mentioned having the signal-handler set a std::atomic<bool> and having all the other threads periodically check that value to know when to exit.
That approach works well as long as all of your other threads are periodically waking up anyway, at a reasonable frequency.
It's not entirely satisfactory if one or more of your threads is purely event-driven, however -- in an event-driven program, threads are only supposed to wake up when there is some work for them to do, which means that they might well be asleep for days or weeks at a time. If they are forced to wake up every (so many) milliseconds simply to poll an atomic-boolean-flag, that makes an otherwise extremely CPU-efficient program much less CPU-efficient, since now every thread is waking up at short regular intervals, 24/7/365. This can be particularly problematic if you are trying to conserve battery life, as it can prevent the CPU from going into power-saving mode.
An alternative approach that avoids polling would be this one (a sketch follows the list):
On startup, have your main thread create an fd-pipe or socket-pair (by calling pipe() or socketpair())
Have your main thread (or possibly some other responsible thread) include the receiving-socket in its read-ready select() fd_set (or take a similar action for poll() or whatever wait-for-IO function that thread blocks in)
When the signal-handler is executed, have it write a byte (any byte, doesn't matter what) into the sending-socket.
That will cause the main thread's select() call to immediately return, with FD_ISSET(receivingSocket) indicating true because of the received byte
At that point, your main thread knows it is time for the process to exit, so it can start directing all of its child threads to start shutting down (via whatever mechanism is convenient; atomic booleans or pipes or something else)
After telling all the child threads to start shutting down, the main thread should then call join() on each child thread, so that it can be guaranteed that all of the child threads are actually gone before main() returns. (This is necessary because otherwise there is a risk of a race condition -- e.g. the post-main() cleanup code might occasionally free a resource while a still-executing child thread was still using it, leading to a crash)
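A rough POSIX sketch of that approach (all names are illustrative; the pipe is assumed to be created with pipe() during startup):

#include <unistd.h>
#include <sys/select.h>

static int wakeup_pipe[2]; // [0] = read end, [1] = write end; created with pipe(wakeup_pipe) at startup

void on_signal(int)
{
    const char byte = 1;
    // write() is async-signal-safe; the value written doesn't matter
    (void)write(wakeup_pipe[1], &byte, 1);
}

void main_event_loop()
{
    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(wakeup_pipe[0], &readfds);
        // ... FD_SET any other sockets this thread is servicing ...

        if (select(wakeup_pipe[0] + 1, &readfds, nullptr, nullptr, nullptr) > 0) {
            if (FD_ISSET(wakeup_pipe[0], &readfds)) {
                // the signal handler wrote a byte: begin the shutdown sequence,
                // tell the child threads to stop, then join() them
                break;
            }
        }
    }
}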
The first thing you must accept is that threading is hard.
A "program using threading" is about as generic as a "program using memory", and your question is similar to "how do I not corrupt memory in a program using memory?"
The way you handle threading problem is to restrict how you use threads and the behavior of the threads.
If your threading system is a bunch of small operations composed into a data flow network, with an implicit guarantee that if an operation is too big it is broken down into smaller operations and/or does checkpoints with the system, then shutting down looks very different than if you have a thread that loads an external DLL that then runs it for somewhere from 1 second to 10 hours to infinite length.
Like most things in C++, solving your problem is going to be about ownership, control and (at a last resort) hacks.
Like data in C++, every thread should be owned. The owner of a thread should have significant control over that thread, and be able to tell it that the application is shutting down. The shut down mechanism should be robust and tested, and ideally connected to other mechanisms (like early-abort of speculative tasks).
The fact you are calling exit(0) is a bad sign. It implies your main thread of execution doesn't have a clean shutdown path. Start there; the interrupt handler should signal the main thread that shutdown should begin, and then your main thread should shut down gracefully. All stack frames should unwind, data should be cleaned up, etc.
Then the same kind of logic that permits that clean and fast shutdown should also be applied to your threaded off code.
Anyone telling you it is as simple as a condition variable/atomic boolean and polling is selling you a bill of goods. That will only work in simple cases if you are lucky, and determining if it works reliably is going to be quite hard.
In addition to Some programmer dude's answer, and related to the discussion in the comment section, you need to make the flag that controls termination of your threads an atomic type.
Consider the following case:
bool done = false;

void pending_thread()
{
    while (!done)
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    // do something that depends on working thread results
}

void worker_thread()
{
    // do something for pending thread
    done = true;
}
Here the worker thread can also be your main thread, and done is the terminating flag of your thread, but the pending thread needs to do something with the data provided by the working thread before exiting.
This example has a race condition and undefined behaviour along with it, and it's really hard to find the actual problem in the real world.
Now the corrected version using std::atomic:
#include <atomic>
#include <chrono>
#include <thread>

std::atomic<bool> done(false);

void pending_thread()
{
    while (!done.load())
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    // do something that depends on working thread results
}

void worker_thread()
{
    // do something for pending thread
    done = true;
}
You can exit the thread without being concerned about a race condition or UB.
I need to know if there is a way to "queue up" threads that wait on a condition variable so that they are awoken in the correct order...without writing a bunch of queueing code, that is.
In most systems, the following reversal of the producer/consumer model (with blocking on full mailbox) may not ensure ordering:
unique_lock lock1(mutex), lock2(mutex)
ConditionVariable cv

Code Block A: (called by multiple threads)
    lock(lock1)
    timestampOnEntry = now()
    cv.wait(lock1) // Don't worry about spurious notifies, out of scope.
    somethingRequiringMonotonicOrderOfTimestamps(timestampOnEntry)
    unlock(lock1)

Code Block B: (called by a single thread, typically within a loop)
    lock(lock2)
    somethingVeryVerySlow()
    (1) unlock(lock2)        // the ordering here is not a mistake
    (2) cv.notify_one(lock2) // prevents needless reblocking in code block A
Note that lines (1) and (2) appear in that order. This prevents a needless second block on the lock in code block A should the notified thread wake up before the lock is released by the thread in code block B.
The question is: if multiple threads are "blocked" on wait, will *notify_one* wake them up in the order in which they blocked? Probably not (as in Java). If not by default, is there a way to specify that?
This could of course be done with a bunch of queuing code, but I'd prefer to use a pre-canned BOOST methodology, regardless of how complicated the contents of the can are. Of course, should I convert *cv.notify_one(guard)* into *cv.notify_all(guard)*, I would be required to do the queueing code, regardless.
No such guarantees are given by the standard; notify_one may wake any thread that is currently waiting (§30.5.1):
void notify_one() noexcept;
Effects: If any threads are blocked waiting for *this, unblocks one of those threads.
The only way to ensure that a specific thread reacts to the event is to wake all threads and then have some additional synchronization mechanism that sends all but the correct thread back to sleep.
This is a fundamental limitation due to the requirements that the platform has to fulfill: Usually condition variables are implemented in a way that the waiting threads are put into a suspended state and will not get scheduled by the system again until a notify occurs. A scheduler implementation is not required to provide the functionality for selecting a specific thread for waking up (and many actually don't).
So this part of the logic inevitably has to be handled by user code, which in turn means you have to wake up all threads to make it work, because this is the only way to ensure that the correct thread will get woken at all.
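A minimal sketch of that idea, using a ticket number as the extra synchronization so that notify_all wakes everyone but only the oldest waiter actually proceeds (all names are illustrative):

#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv;
unsigned long next_ticket = 0;   // handed out to waiters in arrival order
unsigned long now_serving = 0;   // advanced by the notifier

void wait_in_order()
{
    std::unique_lock<std::mutex> lk(m);
    const unsigned long my_ticket = next_ticket++;
    // notify_all wakes everyone; only the thread whose ticket is being served
    // proceeds, the rest go straight back to sleep inside wait()
    cv.wait(lk, [&] { return my_ticket == now_serving; });
    // ... do the ordered work (still holding the lock, or copy what you need and unlock) ...
}

void release_next()
{
    {
        std::lock_guard<std::mutex> lk(m);
        ++now_serving;
    }
    cv.notify_all();
}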
The short answer, as you seem to have suspected, is no. Which thread (or threads) notify_one is going to rouse is not necessarily guaranteed.
That said, I'm not sure what to make of your example code. Specifically, passing a mutex to notify_one doesn't make sense to me (I am unaware of any condition variable implementations on any platform that signal/broadcast that way). I don't know your use case--perhaps you must have a lot of thread local data that prevents arranging your application state in such a way that any thread can pick up the necessary data to do the next task? My first reaction to that would be to refactor the code to care less about which particular OS thread does which work and focus more on the ordering of the work itself.
I use boost::thread to manage threads. In my program I have a pool of threads (workers) that are activated sometimes to do some job simultaneously.
Now I use boost::condition_variable: all threads are waiting inside a boost::condition_variable::wait() call on their own condition_variable objects.
Can I AVOID using mutexes in the classic scheme when I work with condition variables? I want to wake up threads, but I don't need to pass any data to them, so I don't need a mutex to be locked/unlocked during the awakening process; why should I spend CPU on this (though yes, I should remember about spurious wakeups)?
The boost::condition_variable::wait() call tries to REACQUIRE the locking object when the CV receives the notification. But I don't need this exact facility.
What is the cheapest way to awaken several threads from another thread?
If you don't reacquire the locking object, how can the threads know that they are done waiting? What will tell them that? Returning from the block tells them nothing because the blocking object is stateless. It doesn't have an "unlocked" or "not blocking" state for it to return in.
You have to pass some data to them, otherwise how will they know that before they had to wait and now they don't? A condition variable is completely stateless, so any state that you need must be maintained and passed by you.
One common pattern is to use a mutex, condition variable, and a state integer. To block, do this:
Acquire the mutex.
Copy the value of the state integer.
Block on the condition variable, releasing the mutex.
If the state integer is the same as it was when you copied it, go to step 3.
Release the mutex.
To unblock all threads, do this:
Acquire the mutex.
Increment the state integer.
Broadcast the condition variable.
Release the mutex.
Notice how step 4 of the locking algorithm tests whether the thread is done waiting? Notice how this code tracks whether or not there has been an unblock since the thread decided to block? You have to do that because condition variables don't do it themselves. (And that's why you need to reacquire the locking object.)
If you try to remove the state integer, your code will behave unpredictably. Sometimes you will block too long due to missed wakeups and sometimes you won't block long enough due to spurious wakeups. Only a state integer (or similar predicate) protected by the mutex tells the threads when to wait and when to stop waiting.
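A minimal sketch of the block/unblock steps above, using a generation counter as the state integer (names are illustrative):

#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv;
unsigned generation = 0; // the "state integer": bumped once per unblock-all

void block()
{
    std::unique_lock<std::mutex> lk(m);
    const unsigned seen = generation;                 // steps 1-2: lock, copy the state
    cv.wait(lk, [&] { return generation != seen; });  // steps 3-4: wait until it changes
}                                                     // step 5: mutex released on scope exit

void unblock_all()
{
    std::lock_guard<std::mutex> lk(m); // step 1: acquire the mutex
    ++generation;                      // step 2: increment the state integer
    cv.notify_all();                   // step 3: broadcast the condition variable
}                                      // step 4: mutex released on scope exit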
Also, I haven't seen how your code uses this, but it almost always folds into logic you're already using. Why did the threads block anyway? Is it because there's no work for them to do? And when they wakeup, are they going to figure out what to do? Well, finding out that there's no work for them to do and finding out what work they do need to do will require some lock since it's shared state, right? So there almost always is already a lock you're holding when you decide to block and need to reacquire when you're done waiting.
For controlling threads doing parallel jobs, there is a nice primitive called a barrier.
A barrier is initialized with some positive integer value N representing how many threads it holds. A barrier has only a single operation: wait. When N threads call wait, the barrier releases all of them. Additionally, one of the threads is given a special return value indicating that it is the "serial thread"; that thread will be the one to do some special job, like integrating the results of the computation from the other threads.
The limitation is that a given barrier has to know the exact number of threads. It's really suitable for parallel processing type situations.
POSIX added barriers in 2003. A web search indicates that Boost has them, too.
http://www.boost.org/doc/libs/1_33_1/doc/html/barrier.html
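A minimal sketch with boost::barrier, whose wait() returns true for exactly one of the released threads (the thread count here is illustrative):

#include <boost/thread/barrier.hpp>

boost::barrier rendezvous(4); // 4 worker threads participate

void worker()
{
    for (;;) {
        // ... do this thread's share of the parallel job ...

        // wait() blocks until all 4 threads arrive; exactly one of them
        // gets `true` back and acts as the "serial thread"
        if (rendezvous.wait()) {
            // integrate the partial results, schedule the next round, etc.
        }
    }
}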
Generally speaking, you can't.
Assuming the algorithm looks something like this:
ConditionVariable cv;

void WorkerThread()
{
    for (;;)
    {
        cv.wait();
        DoWork();
    }
}

void MainThread()
{
    for (;;)
    {
        ScheduleWork();
        cv.notify_all();
    }
}
NOTE: I intentionally omitted any reference to mutexes in this pseudo-code. For the purposes of this example, we'll suppose ConditionVariable does not require a mutex.
The first time through MainThread(), work is queued and then it notifies WorkerThread() that it should execute its work. At this point two things can happen:
WorkerThread() completes DoWork() before MainThread() can complete ScheduleWork().
MainThread() completes ScheduleWork() before WorkerThread() can complete DoWork().
In case #1, WorkerThread() comes back around to sleep on the CV, and is awoken by the next cv.notify() and all is well.
In case #2, MainThread() comes back around and notifies... nobody and continues on. Meanwhile WorkerThread() eventually comes back around in its loop and waits on the CV but it is now one or more iterations behind MainThread().
This is known as a "lost wakeup". It is similar to the notorious "spurious wakeup" in that the two threads now have different ideas about how many notify()s have taken place. If you are expecting the two threads to maintain synchrony (and usually you are), you need some sort of shared synchronization primitive to control it. This is where the mutex comes in. It helps avoid lost wakeups which, arguably, are a more serious problem than the spurious variety. Either way, the effects can be serious.
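For illustration, here is roughly how the pseudo-code above changes once a mutex and a piece of shared state are added, so that a notify which arrives before the worker waits is not lost (a sketch only; DoWork() and ScheduleWork() are assumed to exist as above):

#include <condition_variable>
#include <mutex>

void DoWork();       // assumed from the pseudo-code above
void ScheduleWork(); // assumed from the pseudo-code above

std::mutex m;
std::condition_variable cv;
unsigned pending_work = 0; // the shared state the mutex actually protects

void WorkerThread()
{
    for (;;) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return pending_work > 0; }); // re-checked under the lock, so a
        --pending_work;                               // notify that arrived early isn't lost
        lk.unlock();
        DoWork();
    }
}

void MainThread()
{
    for (;;) {
        ScheduleWork();
        {
            std::lock_guard<std::mutex> lk(m);
            ++pending_work;
        }
        cv.notify_all();
    }
}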
UPDATE: For further rationale behind this design, see this comment by one of the original POSIX authors: https://groups.google.com/d/msg/comp.programming.threads/cpJxTPu3acc/Hw3sbptsY4sJ
Spurious wakeups are two things:
Write your program carefully, and make sure it works even if you missed something.
Support efficient SMP implementations.
There may be rare cases where an "absolutely, paranoiacally correct" implementation of condition wakeup, given simultaneous wait and signal/broadcast on different processors, would require additional synchronization that would slow down ALL condition variable operations while providing no benefit in 99.99999% of all calls. Is it worth the overhead? No way!
But, really, that's an excuse because we wanted to force people to write safe code. (Yes, that's the truth.)
boost::condition_variable::notify_*() does NOT require that the caller hold the lock on the mutex. This is a nice improvement over the Java model in that it decouples the notification of threads from the holding of the lock.
Strictly speaking, this means the following pointless code SHOULD DO what you are asking:
// waiting thread
unique_lock<mutex> lock(mtx); // wait() needs a unique_lock, not a lock_guard
// Do something
cv.wait(lock);
// Do something else

// notifying thread
unique_lock<mutex> otherLock(mtx);
// do something
otherLock.unlock();
cv.notify_one(); // the mutex is not held here, and that's fine
I do not believe you need to call otherLock.lock() first.
How can I wait for a detached thread to finish in C++?
I don't care about an exit status, I just want to know whether or not the thread has finished.
I'm trying to provide a synchronous wrapper around an asynchronous thirdparty tool. The problem is a weird race condition crash involving a callback. The progression is:
I call the thirdparty, and register a callback
when the thirdparty finishes, it notifies me using the callback -- in a detached thread I have no real control over.
I want the thread from (1) to wait until (2) is called.
I want to wrap this in a mechanism that provides a blocking call. So far, I have:
#include <pthread.h>

class Wait {
public:
    void callback() {
        pthread_mutex_lock(&m_mutex);
        m_done = true;
        pthread_cond_broadcast(&m_cond);
        pthread_mutex_unlock(&m_mutex);
    }

    void wait() {
        pthread_mutex_lock(&m_mutex);
        while (!m_done) {
            pthread_cond_wait(&m_cond, &m_mutex);
        }
        pthread_mutex_unlock(&m_mutex);
    }

private:
    pthread_mutex_t m_mutex;
    pthread_cond_t m_cond;
    bool m_done;
};
// elsewhere...
Wait waiter;
thirdparty_utility(&waiter);
waiter.wait();
As far as I can tell, this should work, and it usually does, but sometimes it crashes. As far as I can determine from the corefile, my guess as to the problem is this:
When the callback sets m_done and broadcasts on the condition variable, the wait thread wakes up
The wait thread is now done here, and Wait is destroyed. All of Wait's members are destroyed, including the mutex and cond.
The callback thread tries to continue from the broadcast point, but is now using memory that's been released, which results in memory corruption.
When the callback thread tries to return (above the level of my poor callback method), the program crashes (usually with a SIGSEGV, but I've seen SIGILL a couple of times).
I've tried a lot of different mechanisms to try to fix this, but none of them solve the problem. I still see occasional crashes.
EDIT: More details:
This is part of a massively multithreaded application, so creating a static Wait isn't practical.
I ran a test, creating Wait on the heap, and deliberately leaking the memory (i.e. the Wait objects are never deallocated), and that resulted in no crashes. So I'm sure it's a problem of Wait being deallocated too soon.
I've also tried a test with a sleep(5) after the unlock in wait, and that also produced no crashes. I hate to rely on a kludge like that though.
EDIT: ThirdParty details:
I didn't think this was relevant at first, but the more I think about it, the more I think it's the real problem:
The thirdparty stuff I mentioned, and why I have no control over the thread: this is using CORBA.
So, it's possible that CORBA is holding onto a reference to my object longer than intended.
Yes, I believe that what you're describing is happening (race condition on deallocate). One quick way to fix this is to create a static instance of Wait, one that won't get destroyed. This will work as long as you don't need to have more than one waiter at the same time.
You will also permanently use that memory, it will not deallocate. But it doesn't look like that's too bad.
The main issue is that it's hard to coordinate lifetimes of your thread communication constructs between threads: you will always need at least one leftover communication construct to communicate when it is safe to destroy (at least in languages without garbage collection, like C++).
EDIT:
See comments for some ideas about refcounting with a global mutex.
To the best of my knowledge there's no portable way to directly ask a thread if it's done running (i.e. no pthread_ function). What you are doing is the right way to do it, at least as far as having a condition that you signal. If you are seeing crashes that you are sure are due to the Wait object being deallocated when the thread that creates it quits (and not some other subtle locking issue -- all too common), the fix is to make sure the Wait isn't being deallocated, by managing it from a thread other than the one that does the notification. Put it in global memory or dynamically allocate it and share it with that thread. Most simply, don't have the thread being waited on own the memory for the Wait; have the thread doing the waiting own it.
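For illustration only, one way to express "share it with that thread" is shared ownership, assuming the callback side can be made to hold its own reference (this is not the actual CORBA interface, just a sketch):

#include <boost/shared_ptr.hpp>

// Hypothetical: thirdparty_utility() would need to accept and keep a
// shared_ptr copy for the lifetime of the callback.
boost::shared_ptr<Wait> waiter(new Wait);
thirdparty_utility(waiter); // the callback side stores its own copy
waiter->wait();
// Even if this side drops its copy first, the callback's copy keeps the
// mutex and condition variable alive until the callback has returned.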
Are you initializing and destroying the mutex and condition var properly?
Wait::Wait()
{
    pthread_mutex_init(&m_mutex, NULL);
    pthread_cond_init(&m_cond, NULL);
    m_done = false;
}

Wait::~Wait()
{
    assert(m_done);
    pthread_mutex_destroy(&m_mutex);
    pthread_cond_destroy(&m_cond);
}
Make sure that you aren't prematurely destroying the Wait object -- if it gets destroyed in one thread while the other thread still needs it, you'll get a race condition that will likely result in a segfault. I'd recommend making it a global static variable that gets constructed on program initialization (before main()) and gets destroyed on program exit.
If your assumption is correct then the third-party module appears to be buggy and you need to come up with some kind of hack to make your application work.
A static Wait is not feasible. How about a Wait pool (it could even grow on demand)? Is your application using a thread pool to run?
Although there will still be a chance that the same Wait is reused while the third-party module is still using it, you can minimize that chance by properly queueing vacant Waits in your pool.
Disclaimer: I am in no way an expert in thread safety, so consider this post as a suggestion from a layman.
Is the following safe?
I am new to threading and I want to delegate a time consuming process to a separate thread in my C++ program.
Using the boost libraries I have written code something like this:
thrd = new boost::thread(boost::bind(&myclass::mymethod, this, &finished_flag));
Where finished_flag is a boolean member of my class. When the thread is finished it sets the value and the main loop of my program checks for a change in that value.
I assume that this is okay because I only ever start one thread, and that thread is the only thing that changes the value (except for when it is initialised before I start the thread)
So is this okay, or am I missing something and need to use locks and mutexes, etc.?
You never mentioned the type of finished_flag...
If it's a straight bool, then it might work, but it's certainly bad practice, for several reasons. First, some compilers will cache the reads of the finished_flag variable, since the compiler doesn't always pick up the fact that it's being written to by another thread. You can get around this by declaring the bool volatile, but that's taking us in the wrong direction. Even if reads and writes are happening as you'd expect, there's nothing to stop the OS scheduler from interleaving the two threads half way through a read / write. That might not be such a problem here where you have one read and one write op in separate threads, but it's a good idea to start as you mean to carry on.
If, on the other hand, it's a thread-safe type, like a CEvent in MFC (or the equivalent in boost), then you should be fine. This is the best approach: use thread-safe synchronization objects for inter-thread communication, even for simple flags.
Instead of using a member variable to signal that the thread is done, why not use a condition? You are already using the boost libraries, and condition is part of the thread library.
Check it out. It allows the worker thread to 'signal' that it has finished, and the main thread can check during execution if the condition has been signalled and then do whatever it needs to do with the completed work. There are examples in the link.
As a general case I would never make the assumption that a resource will only be modified by the one thread. You might know what it is for, however someone else might not - causing no end of grief as the main thread thinks that the work is done and tries to access data that is not correct! It might even delete it while the worker thread is still using it, causing the app to crash. Using a condition will help with this.
Looking at the thread documentation, you could also call thread.timed_join in the main thread. timed_join will wait a specified amount of time for the thread to 'join' (join means that the thread has finished).
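For example, a non-blocking completion check with timed_join might look roughly like this (assuming thrd is the boost::thread* from the question and that your Boost version offers the duration overload of timed_join):

#include <boost/thread/thread.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>

// Illustrative: poll for completion without blocking the main loop.
if (thrd->timed_join(boost::posix_time::milliseconds(0))) {
    // the worker has finished; safe to use its results
} else {
    // still running; carry on with the main loop and check again later
}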
I don't mean to be presumptive, but it seems like the purpose of your finished_flag variable is to pause the main thread (at some point) until the thread thrd has completed.
The easiest way to do this is to use boost::thread::join
// launch the thread...
thrd = new boost::thread(boost::bind(&myclass::mymethod, this, &finished_flag));

// ... do other things maybe ...

// wait for the thread to complete
thrd->join();
If you really want to get into the details of communication between threads via shared memory, even declaring a variable volatile won't be enough, even if the compiler does use appropriate access semantics to ensure that it won't get a stale version of data after checking the flag. The CPU can issue reads and writes out of order (x86 usually doesn't, but PPC definitely does), and there is nothing in C++9x that allows the compiler to generate code to order memory accesses appropriately.
Herb Sutter's Effective Concurrency series has an extremely in depth look at how the C++ world intersects the multicore/multiprocessor world.
Having the thread set a flag (or signal an event) before it exits is a race condition. The thread has not necessarily returned to the OS yet, and may still be executing.
For example, consider a program that loads a dynamic library (pseudocode):
lib = loadLibrary("someLibrary");
fun = getFunction("someFunction");
fun();
unloadLibrary(lib);
And let's suppose that this library uses your thread:
void someFunction() {
    volatile bool finished_flag = false;
    thrd = new boost::thread(boost::bind(&myclass::mymethod, this, &finished_flag));
    while (!finished_flag) { // ignore the polling loop, it's besides the point
        sleep();
    }
    delete thrd;
}

void myclass::mymethod() {
    // do stuff
    finished_flag = true;
}
When myclass::mymethod() sets finished_flag to true, myclass::mymethod() hasn't returned yet. At the very least, it still has to execute a "return" instruction of some sort (if not much more: destructors, exception handler management, etc.). If the thread executing myclass::mymethod() gets pre-empted before that point, someFunction() will return to the calling program, and the calling program will unload the library. When the thread executing myclass::mymethod() gets scheduled to run again, the address containing the "return" instruction is no longer valid, and the program crashes.
The solution would be for someFunction() to call thrd->join() before returning. This would ensure that the thread has returned to the OS and is no longer executing.
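A minimal sketch of that fix, keeping the (illustrative) names from the example above:

void someFunction() {
    volatile bool finished_flag = false; // no longer needed for polling; kept only for the bind
    thrd = new boost::thread(boost::bind(&myclass::mymethod, this, &finished_flag));
    thrd->join();  // blocks until myclass::mymethod() has fully returned to the OS
    delete thrd;   // only now is it safe for the caller to unload the library
}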