Boost thread seems to block when going out of scope - c++

I have a strange problem with Boost 1.54 threads which seem to block when the thread object goes out of scope.
Background: I'm working on a real-time application that uses external hardware through API calls. Some of these API calls block until execution. That's why I want to call them in separate threads in order to avoid blocking my main thread. The simplified structure looks as follows:
void some_func(){
//t2
boost::thread t(&blocking_call);
//t3
}
int main(){
//t1
some_func();
//t4
return 0;
}
Luckily, the external hardware has an onboard clock so that I was able to time the execution of my program precisely.
What I observed: t1, t2 and t3 are - as expected - incrementing only a tiny little bit, but t4 is always shortly after the execution time of the API call which is a lot later (and unfortunately even too late for me). It seems as if the thread object was calling join() when it is going out of scope, although I thought it should just get detached and finish its work.
Any hints what might be the issue?

Related

Is there a race condition in the `latch` sample in N3600?

Proposed for inclusion in C++14 (aka C++1y) are some new thread synchronization primitives: latches and barriers. The proposal is
N3600: C++ Latches and Barriers
N3666: C++ Latches and Barriers, revised
It sounds like a good idea and the samples make it look very programmer-friendly. Unfortunately, I think the sample code invokes undefined behavior. The proposal says of latch::~latch():
Destroys the latch. If the latch is destroyed while other threads are in wait(), or are invoking count_down(), the behaviour is undefined.
Note that it says "in wait()" and not "blocked in wait()", as the description of count_down() uses.
Then the following sample is provided:
An example of the second use case is shown below. We need to load data and then process it using a number of threads. Loading the data is I/O bound, whereas starting threads and creating data structures is CPU bound. By running these in parallel, throughput can be increased.
void DoWork()
{
latch start_latch(1);
vector<thread*> workers;
for (int i = 0; i < NTHREADS; ++i) {
workers.push_back(new thread([&] {
// Initialize data structures. This is CPU bound.
...
start_latch.wait();
// perform work
...
}));
}
// Load input data. This is I/O bound.
...
// Threads can now start processing
start_latch.count_down();
}
Isn't there a race condition between the threads waking and returning from wait(), and destruction of the latch when it leaves scope? Besides that, all the thread objects are leaked. If the scheduler doesn't run all worker threads before count_down returns and the start_latch object leaves scope, then I think undefined behavior will result. Presumably the fix is to iterate the vector and join() and delete all the worker threads after count_down but before returning.
Is there a problem with the sample code?
Do you agree that a proposal should show a complete correct example, even if the task is extremely simple, in order for reviewers to see what the use experience will be like?
Note: It appears possible that one or more of the worker threads haven't yet begun to wait, and will therefore call wait() on a destroyed latch.
Update: There's now a new version of the proposal, but the representative example is unchanged.
Thanks for pointing this out. Yes, I think that the sample code (which, in its defense, was intended to be concise) is broken. It should probably wait for the threads to finish.
Any implementation that allows threads to be blocked in wait() is almost certainly going to involves some kind of condition variable, and destroying the latch while a thread has not yet exited wait() is potentially undefined.
I don't know if there's time to update the paper, but I can make sure that the next version is fixed.
Alasdair

Does async(launch::async) in C++11 make thread pools obsolete for avoiding expensive thread creation?

It is loosely related to this question: Are std::thread pooled in C++11?. Though the question differs, the intention is the same:
Question 1: Does it still make sense to use your own (or 3rd-party library) thread pools to avoid expensive thread creation?
The conclusion in the other question was that you cannot rely on std::thread to be pooled (it might or it might be not). However, std::async(launch::async) seems to have a much higher chance to be pooled.
It don't think that it is forced by the standard, but IMHO I would expect that all good C++11 implementations would use thread pooling if thread creation is slow. Only on platforms where it is inexpensive to create a new thread, I would expect that they always spawn a new thread.
Question 2: This is just what I think, but I have no facts to prove it. I may very well be mistaken. Is it an educated guess?
Finally, here I have provided some sample code that first shows how I think thread creation can be expressed by async(launch::async):
Example 1:
thread t([]{ f(); });
// ...
t.join();
becomes
auto future = async(launch::async, []{ f(); });
// ...
future.wait();
Example 2: Fire and forget thread
thread([]{ f(); }).detach();
becomes
// a bit clumsy...
auto dummy = async(launch::async, []{ f(); });
// ... but I hope soon it can be simplified to
async(launch::async, []{ f(); });
Question 3: Would you prefer the async versions to the thread versions?
The rest is no longer part of the question, but only for clarification:
Why must the return value be assigned to a dummy variable?
Unfortunately, the current C++11 standard forces that you capture the return value of std::async, as otherwise the destructor is executed, which blocks until the action terminates. It is by some considered an error in the standard (e.g., by Herb Sutter).
This example from cppreference.com illustrates it nicely:
{
std::async(std::launch::async, []{ f(); });
std::async(std::launch::async, []{ g(); }); // does not run until f() completes
}
Another clarification:
I know that thread pools may have other legitimate uses but in this question I am only interested in the aspect of avoiding expensive thread creation costs.
I think there are still situations where thread pools are very useful, especially if you need more control over resources.
For example, a server might decide to handle only a fixed number of requests simultaneously to guarantee fast response times and to increase the predictability of memory usage. Thread pools should be fine, here.
Thread-local variables may also be an argument for your own thread pools, but I'm not sure whether it is relevant in practice:
Creating a new thread with std::thread starts without initialized thread-local variables. Maybe this is not what you want.
In threads spawned by async, it is somewhat unclear for me because the thread could have been reused. From my understanding, thread-local variables are not guaranteed to be resetted, but I may be mistaken.
Using your own (fixed-size) thread pools, on the other hand, gives you full control if you really need it.
Question 1:
I changed this from the original because the original was wrong. I was under the impression that Linux thread creation was very cheap and after testing I determined that the overhead of function call in a new thread vs. a normal one is enormous. The overhead for creating a thread to handle a function call is something like 10000 or more times slower than a plain function call. So, if you're issuing a lot of small function calls, a thread pool might be a good idea.
It's quite apparent that the standard C++ library that ships with g++ doesn't have thread pools. But I can definitely see a case for them. Even with the overhead of having to shove the call through some kind of inter-thread queue, it would likely be cheaper than starting up a new thread. And the standard allows this.
IMHO, the Linux kernel people should work on making thread creation cheaper than it currently is. But, the standard C++ library should also consider using pool to implement launch::async | launch::deferred.
And the OP is correct, using ::std::thread to launch a thread of course forces the creation of a new thread instead of using one from a pool. So ::std::async(::std::launch::async, ...) is preferred.
Question 2:
Yes, basically this 'implicitly' launches a thread. But really, it's still quite obvious what's happening. So I don't really think the word implicitly is a particularly good word.
I'm also not convinced that forcing you to wait for a return before destruction is necessarily an error. I don't know that you should be using the async call to create 'daemon' threads that aren't expected to return. And if they are expected to return, it's not OK to be ignoring exceptions.
Question 3:
Personally, I like thread launches to be explicit. I place a lot of value on islands where you can guarantee serial access. Otherwise you end up with mutable state that you always have to be wrapping a mutex around somewhere and remembering to use it.
I liked the work queue model a whole lot better than the 'future' model because there are 'islands of serial' lying around so you can more effectively handle mutable state.
But really, it depends on exactly what you're doing.
Performance Test
So, I tested the performance of various methods of calling things and came up with these numbers on an 8 core (AMD Ryzen 7 2700X) system running Fedora 29 compiled with clang version 7.0.1 and libc++ (not libstdc++):
Do nothing calls per second: 35365257
Empty calls per second: 35210682
New thread calls per second: 62356
Async launch calls per second: 68869
Worker thread calls per second: 970415
And native, on my MacBook Pro 15" (Intel(R) Core(TM) i7-7820HQ CPU # 2.90GHz) with Apple LLVM version 10.0.0 (clang-1000.10.44.4) under OSX 10.13.6, I get this:
Do nothing calls per second: 22078079
Empty calls per second: 21847547
New thread calls per second: 43326
Async launch calls per second: 58684
Worker thread calls per second: 2053775
For the worker thread, I started up a thread, then used a lockless queue to send requests to another thread and then wait for a "It's done" reply to be sent back.
The "Do nothing" is just to test the overhead of the test harness.
It's clear that the overhead of launching a thread is enormous. And even the worker thread with the inter-thread queue slows things down by a factor of 20 or so on Fedora 25 in a VM, and by about 8 on native OS X.
I created an OSDN chamber holding the code I used for the performance test. It can be found here: https://osdn.net/users/omnifarious/pf/launch_thread_performance/

kill boost thread after n seconds

I am looking for the best way to solve the following (c++) problem. I have a function given by some framework, which returns an object. Sometimes it takes just miliseconds, but on some occasions it takes minutes. So i want to stop the execution if it takes longer than let's say 2 seconds.
I was thinking about doing it with boost threads. Important sidenote, if the function returns faster than the 2 seconds the program should not wait.
So i was thinking about 2 threads:
1.thread: execute function a
2.thread: run timer
if(thread 2 exited bevore thread 1) kill thread 1
else do nothing
I am struggeling a bit the practical implementation. Especially,
how do i return an object from a child boost thread to the main thread?
how do i kill a thread in boost?
is my idea even a good one, is there a better way to solve the problem in c++ (with or without boost)?
As for waiting, just use thread::timed_join() inside your main thread, this will return false, if the thread didn't complete within the given time.
Killing the thread is not feasible if your third-party library is not aware of boost:threads. Also, you almost certainly don't want to 'kill' the thread without giving the function the possibility to clean up.
I'd suggest that you wait for, say, 2 seconds and then continue with some kind of error message, letting the framework function finish its work and just ignoring the result if it came too late.
As for returning a value, I'd suggest something like
struct myfunction {
MyObj returnValue;
void operator() () {
// ...
returnValue = theComputedReturnValue;
}
};
// ...
myfunction f;
boost::thread t = boost::thread(boost::ref(f));
t.join(); // or t.timed_join()...
use(f.returnValue);
// ...
I have done something similar by the past and that works (even though not ideal).
To get the return value just "share" a variable (that could be just a pointer (initially nil) to the returned value, or a full object with a state etc ...) and make your thread read/udate it. Don't forget to mutex it needed. That should be quite straight forward.
Expanding what James has said above, "kill a thread" is such a harsh term! :) But interruption is not so easy either, typically with boost threads, there needs to be an interruption point, where the running thread can be interrupted. There is a set of these interruptible functions (unfortunately they are boost specific), such as wait/sleep etc. One option you have is in the first thread, liberally scatter interruption_points(). Such that when you call interrupt() once thread 2 dies, at the next interruption_point() thread 1 will throw an exception.
Threads are in the same process space, thus you can have shared state between multiple threads as long as there is synchronized access to that shared state.
EDIT: just noticed that the OP has already looked into this... will leave the answer up anyway I guess...

Wait for a detached thread to finish in C++

How can I wait for a detached thread to finish in C++?
I don't care about an exit status, I just want to know whether or not the thread has finished.
I'm trying to provide a synchronous wrapper around an asynchronous thirdarty tool. The problem is a weird race condition crash involving a callback. The progression is:
I call the thirdparty, and register a callback
when the thirdparty finishes, it notifies me using the callback -- in a detached thread I have no real control over.
I want the thread from (1) to wait until (2) is called.
I want to wrap this in a mechanism that provides a blocking call. So far, I have:
class Wait {
public:
void callback() {
pthread_mutex_lock(&m_mutex);
m_done = true;
pthread_cond_broadcast(&m_cond);
pthread_mutex_unlock(&m_mutex);
}
void wait() {
pthread_mutex_lock(&m_mutex);
while (!m_done) {
pthread_cond_wait(&m_cond, &m_mutex);
}
pthread_mutex_unlock(&m_mutex);
}
private:
pthread_mutex_t m_mutex;
pthread_cond_t m_cond;
bool m_done;
};
// elsewhere...
Wait waiter;
thirdparty_utility(&waiter);
waiter.wait();
As far as I can tell, this should work, and it usually does, but sometimes it crashes. As far as I can determine from the corefile, my guess as to the problem is this:
When the callback broadcasts the end of m_done, the wait thread wakes up
The wait thread is now done here, and Wait is destroyed. All of Wait's members are destroyed, including the mutex and cond.
The callback thread tries to continue from the broadcast point, but is now using memory that's been released, which results in memory corruption.
When the callback thread tries to return (above the level of my poor callback method), the program crashes (usually with a SIGSEGV, but I've seen SIGILL a couple of times).
I've tried a lot of different mechanisms to try to fix this, but none of them solve the problem. I still see occasional crashes.
EDIT: More details:
This is part of a massively multithreaded application, so creating a static Wait isn't practical.
I ran a test, creating Wait on the heap, and deliberately leaking the memory (i.e. the Wait objects are never deallocated), and that resulted in no crashes. So I'm sure it's a problem of Wait being deallocated too soon.
I've also tried a test with a sleep(5) after the unlock in wait, and that also produced no crashes. I hate to rely on a kludge like that though.
EDIT: ThirdParty details:
I didn't think this was relevant at first, but the more I think about it, the more I think it's the real problem:
The thirdparty stuff I mentioned, and why I have no control over the thread: this is using CORBA.
So, it's possible that CORBA is holding onto a reference to my object longer than intended.
Yes, I believe that what you're describing is happening (race condition on deallocate). One quick way to fix this is to create a static instance of Wait, one that won't get destroyed. This will work as long as you don't need to have more than one waiter at the same time.
You will also permanently use that memory, it will not deallocate. But it doesn't look like that's too bad.
The main issue is that it's hard to coordinate lifetimes of your thread communication constructs between threads: you will always need at least one leftover communication construct to communicate when it is safe to destroy (at least in languages without garbage collection, like C++).
EDIT:
See comments for some ideas about refcounting with a global mutex.
To the best of my knowledge there's no portable way to directly ask a thread if its done running (i.e. no pthread_ function). What you are doing is the right way to do it, at least as far as having a condition that you signal. If you are seeing crashes that you are sure are due to the Wait object is being deallocated when the thread that creates it quits (and not some other subtle locking issue -- all too common), the issue is that you need to make sure the Wait isn't being deallocated, by managing from a thread other than the one that does the notification. Put it in global memory or dynamically allocate it and share it with that thread. Most simply don't have the thread being waited on own the memory for the Wait, have the thread doing the waiting own it.
Are you initializing and destroying the mutex and condition var properly?
Wait::Wait()
{
pthread_mutex_init(&m_mutex, NULL);
pthread_cond_init(&m_cond, NULL);
m_done = false;
}
Wait::~Wait()
{
assert(m_done);
pthread_mutex_destroy(&m_mutex);
pthread_cond_destroy(&m_cond);
}
Make sure that you aren't prematurely destroying the Wait object -- if it gets destroyed in one thread while the other thread still needs it, you'll get a race condition that will likely result in a segfault. I'd recommend making it a global static variable that gets constructed on program initialization (before main()) and gets destroyed on program exit.
If your assumption is correct then third party module appears to be buggy and you need to come up with some kind of hack to make your application work.
Static Wait is not feasible. How about Wait pool (it even may grow on demand)? Is you application using thread pool to run?
Although there will still be a chance that same Wait will be reused while third party module is still using it. But you can minimize such chance by properly queing vacant Waits in your pool.
Disclaimer: I am in no way an expert in thread safety, so consider this post as a suggestion from a layman.

C++ Thread question - setting a value to indicate the thread has finished

Is the following safe?
I am new to threading and I want to delegate a time consuming process to a separate thread in my C++ program.
Using the boost libraries I have written code something like this:
thrd = new boost::thread(boost::bind(&myclass::mymethod, this, &finished_flag);
Where finished_flag is a boolean member of my class. When the thread is finished it sets the value and the main loop of my program checks for a change in that value.
I assume that this is okay because I only ever start one thread, and that thread is the only thing that changes the value (except for when it is initialised before I start the thread)
So is this okay, or am I missing something, and need to use locks and mutexes, etc
You never mentioned the type of finished_flag...
If it's a straight bool, then it might work, but it's certainly bad practice, for several reasons. First, some compilers will cache the reads of the finished_flag variable, since the compiler doesn't always pick up the fact that it's being written to by another thread. You can get around this by declaring the bool volatile, but that's taking us in the wrong direction. Even if reads and writes are happening as you'd expect, there's nothing to stop the OS scheduler from interleaving the two threads half way through a read / write. That might not be such a problem here where you have one read and one write op in separate threads, but it's a good idea to start as you mean to carry on.
If, on the other hand it's a thread-safe type, like a CEvent in MFC (or equivilent in boost) then you should be fine. This is the best approach: use thread-safe synchronization objects for inter-thread communication, even for simple flags.
Instead of using a member variable to signal that the thread is done, why not use a condition? You are already are using the boost libraries, and condition is part of the thread library.
Check it out. It allows the worker thread to 'signal' that is has finished, and the main thread can check during execution if the condition has been signaled and then do whatever it needs to do with the completed work. There are examples in the link.
As a general case I would neve make the assumption that a resource will only be modified by the thread. You might know what it is for, however someone else might not - causing no ends of grief as the main thread thinks that the work is done and tries to access data that is not correct! It might even delete it while the worker thread is still using it, and causing the app to crash. Using a condition will help this.
Looking at the thread documentation, you could also call thread.timed_join in the main thread. timed_join will wait for a specified amount for the thread to 'join' (join means that the thread has finsihed)
I don't mean to be presumptive, but it seems like the purpose of your finished_flag variable is to pause the main thread (at some point) until the thread thrd has completed.
The easiest way to do this is to use boost::thread::join
// launch the thread...
thrd = new boost::thread(boost::bind(&myclass::mymethod, this, &finished_flag);
// ... do other things maybe ...
// wait for the thread to complete
thrd.join();
If you really want to get into the details of communication between threads via shared memory, even declaring a variable volatile won't be enough, even if the compiler does use appropriate access semantics to ensure that it won't get a stale version of data after checking the flag. The CPU can issue reads and writes out of order as long (x86 usually doesn't, but PPC definitely does) and there is nothing in C++9x that allows the compiler to generate code to order memory accesses appropriately.
Herb Sutter's Effective Concurrency series has an extremely in depth look at how the C++ world intersects the multicore/multiprocessor world.
Having the thread set a flag (or signal an event) before it exits is a race condition. The thread has not necessarily returned to the OS yet, and may still be executing.
For example, consider a program that loads a dynamic library (pseudocode):
lib = loadLibrary("someLibrary");
fun = getFunction("someFunction");
fun();
unloadLibrary(lib);
And let's suppose that this library uses your thread:
void someFunction() {
volatile bool finished_flag = false;
thrd = new boost::thread(boost::bind(&myclass::mymethod, this, &finished_flag);
while(!finished_flag) { // ignore the polling loop, it's besides the point
sleep();
}
delete thrd;
}
void myclass::mymethod() {
// do stuff
finished_flag = true;
}
When myclass::mymethod() sets finished_flag to true, myclass::mymethod() hasn't returned yet. At the very least, it still has to execute a "return" instruction of some sort (if not much more: destructors, exception handler management, etc.). If the thread executing myclass::mymethod() gets pre-empted before that point, someFunction() will return to the calling program, and the calling program will unload the library. When the thread executing myclass::mymethod() gets scheduled to run again, the address containing the "return" instruction is no longer valid, and the program crashes.
The solution would be for someFunction() to call thrd->join() before returning. This would ensure that the thread has returned to the OS and is no longer executing.