Does tbb::parallel_for always utilize the calling thread - c++

I have a piece of code where I use tbb::parallel_for to multithread a loop, which is called from the main thread. In that loop I need the main thread to update the UI to reflect progress. From what I have observed, tbb::parallel_for always uses the calling thread plus N worker threads. However, I wonder whether the use of the calling thread is guaranteed or just happens to be the case.
Here is the sample code:
static thread_local bool _mainThread = false; // false in all threads
_mainThread = true; // now true in main thread, but false in others
tbb::parallel_for(start, end, *this);

void Bender::processor::operator()(size_t i) const
{
    ...
    if(_mainThread) // only main thread will issue events
        ProgressUpdatedEvent(progress);
}
Thanks!

Strictly speaking, I don't think TBB guarantees what any given thread will run (the basic principles of TBB are optional parallelism and random work-stealing). Even task affinity in TBB is "soft", since it is not guaranteed that a specific worker can take an affinitized task.
Practically speaking, the way parallel_for is implemented implies that the calling thread will run at least one task before switching to something else and exiting parallel_for. Thus, at least for the simple case, it is expected to work well enough.
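If the progress display must not depend on which thread happens to execute the loop body, one option is to let every worker bump a shared atomic counter and have a separate observer read it. The following is a minimal sketch of that idea using plain std::thread instead of TBB so that it stays self-contained; run_with_progress and its parameters are made-up names, not part of any TBB API, and workers is assumed to be at least 1.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: runs body(i) for i in [0, n) on `workers` threads
// (workers >= 1) while the calling thread only observes a shared counter
// and reports progress. Returns how often on_progress was called.
template <typename Body, typename Progress>
std::size_t run_with_progress(std::size_t n, unsigned workers,
                              Body body, Progress on_progress)
{
    std::atomic<std::size_t> next{0};  // next index to hand out
    std::atomic<std::size_t> done{0};  // number of finished iterations

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&] {
            for (std::size_t i = next.fetch_add(1); i < n; i = next.fetch_add(1)) {
                body(i);
                done.fetch_add(1, std::memory_order_release);
            }
        });
    }

    // The calling thread never runs the body; it only reads the counter.
    std::size_t reports = 0;
    std::size_t seen = 0;
    while (seen < n) {
        seen = done.load(std::memory_order_acquire);
        on_progress(seen, n);
        ++reports;
        std::this_thread::yield();
    }
    for (auto& t : pool) t.join();
    return reports;
}
```

With TBB the same counter could be incremented inside the parallel_for body, with the UI reading it from a timer or from a thread that is not blocked inside parallel_for; that arrangement is an assumption about the surrounding application, not something the question states.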

Related

Trying to minimize checks of atomics on every iteration

From a multithreading perspective, is the following correct or incorrect?
I have an app which has 2 threads: the main thread, and a worker thread.
The main thread has a MainUpdate() function that gets called in a continuous loop. As part of its job, that MainUpdate() function might call a ToggleActive() method on the worker objects running on the worker thread. That ToggleActive() method is used to turn the worker objects on/off.
The flow is something like this.
// MainThread
while(true) {
    MainUpdate(...);
}
void MainUpdate(...) {
    for(auto& obj: objectsInWorkerThread) {
        if (foo())
            obj.ToggleActive(getBool());
    }
}
// Worker thread example worker ------------------------------
struct SomeWorkerObject {
    void Execute(...) {
        if(mIsActive == false) // %%%%%%% THIS!
            return;
        Update(...);
    }
    void ToggleActive(bool active) {
        mIsActiveAtom = active; // %%%%%%% THIS!
        mIsActive = mIsActiveAtom; // %%%%%%% THIS!
    }
private:
    void Update(...) {...}
    std::atomic_bool mIsActiveAtom = true;
    volatile bool mIsActive = true;
};
I'm trying to avoid checking the atomic field on every invocation of Execute(), which gets called on every iteration of the worker thread. There are many worker objects running at any one time, and thus there would be many atomic fields checks.
As you can see, I'm using the non-atomic field to check for activeness. The value of the non-atomic field gets its value from the atomic field in ToggleActive().
From my tests, this seems to be working, but I have a feeling that it is incorrect.
A volatile variable only guarantees that accesses to it are not optimized out or reordered by the compiler; it has nothing to do with multi-threaded execution. Therefore, your program does have a race condition, since ToggleActive and Execute can modify and read mIsActive at the same time.
As for performance, you can check whether your platform supports a lock-free atomic bool. If it does, checking an atomic value can be very fast. I remember seeing a benchmark somewhere that showed std::atomic<bool> to be as fast as volatile bool.
@hgminh is right, your code is not safe.
Synchronization is two way road — if you have a thread perform thread-safe write, another thread must perform thread-safe read. If you have a thread use a lock, another thread must use the same lock.
Think about inter-thread communication as message passing (incidentally, it works exactly that way in modern CPUs). If both sides don't share a messaging channel (mIsActiveAtom), the message might not be delivered properly.
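A minimal sketch of what "sharing the same channel" could look like for the worker object in the question: both sides go through one std::atomic<bool>, with a release store in ToggleActive() and an acquire load in Execute(). The updates() accessor and the counter inside Update() are placeholders added for illustration only. On common platforms such as x86, an acquire load of a lock-free atomic bool compiles to an ordinary load, so the per-iteration cost should be small, but it is worth measuring on the actual target.

```cpp
#include <atomic>
#include <cassert>

// Sketch of the worker object with a single atomic flag
// (class and method names follow the question).
struct SomeWorkerObject {
    // Called on the worker thread; returns true if Update() ran.
    bool Execute() {
        if (!mIsActive.load(std::memory_order_acquire))
            return false;
        Update();
        return true;
    }

    // Called on the main thread.
    void ToggleActive(bool active) {
        mIsActive.store(active, std::memory_order_release);
    }

    // Illustration only: how many times Update() has run.
    int updates() const { return mUpdates; }

private:
    void Update() { ++mUpdates; }        // placeholder for the real work
    std::atomic<bool> mIsActive{true};   // the only flag, shared by both threads
    int mUpdates = 0;                    // touched by the worker thread only
};
```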

Is there a race condition in the `latch` sample in N3600?

Proposed for inclusion in C++14 (aka C++1y) are some new thread synchronization primitives: latches and barriers. The proposal is
N3600: C++ Latches and Barriers
N3666: C++ Latches and Barriers, revised
It sounds like a good idea and the samples make it look very programmer-friendly. Unfortunately, I think the sample code invokes undefined behavior. The proposal says of latch::~latch():
Destroys the latch. If the latch is destroyed while other threads are in wait(), or are invoking count_down(), the behaviour is undefined.
Note that it says "in wait()" and not "blocked in wait()", as the description of count_down() uses.
Then the following sample is provided:
An example of the second use case is shown below. We need to load data and then process it using a number of threads. Loading the data is I/O bound, whereas starting threads and creating data structures is CPU bound. By running these in parallel, throughput can be increased.
void DoWork()
{
    latch start_latch(1);
    vector<thread*> workers;
    for (int i = 0; i < NTHREADS; ++i) {
        workers.push_back(new thread([&] {
            // Initialize data structures. This is CPU bound.
            ...
            start_latch.wait();
            // perform work
            ...
        }));
    }
    // Load input data. This is I/O bound.
    ...
    // Threads can now start processing
    start_latch.count_down();
}
Isn't there a race condition between the threads waking and returning from wait(), and destruction of the latch when it leaves scope? Besides that, all the thread objects are leaked. If the scheduler doesn't run all worker threads before count_down returns and the start_latch object leaves scope, then I think undefined behavior will result. Presumably the fix is to iterate the vector and join() and delete all the worker threads after count_down but before returning.
Is there a problem with the sample code?
Do you agree that a proposal should show a complete correct example, even if the task is extremely simple, in order for reviewers to see what the use experience will be like?
Note: It appears possible that one or more of the worker threads haven't yet begun to wait, and will therefore call wait() on a destroyed latch.
Update: There's now a new version of the proposal, but the representative example is unchanged.
Thanks for pointing this out. Yes, I think that the sample code (which, in its defense, was intended to be concise) is broken. It should probably wait for the threads to finish.
Any implementation that allows threads to be blocked in wait() is almost certainly going to involve some kind of condition variable, and destroying the latch while a thread has not yet exited wait() is potentially undefined.
I don't know if there's time to update the paper, but I can make sure that the next version is fixed.
Alasdair
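For illustration, here is a sketch of how the sample could be repaired along the lines discussed: the threads are held by value and joined before the latch goes out of scope, so no thread can still be inside wait() when the latch is destroyed. simple_latch is a hypothetical stand-in written with a mutex and a condition variable, not the class from the proposal, and the thread count and the returned counter are assumptions added to make the sketch checkable.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the proposed latch (not the N3600 class).
class simple_latch {
public:
    explicit simple_latch(int count) : count_(count) {}

    void count_down() {
        std::lock_guard<std::mutex> lock(m_);
        if (count_ > 0 && --count_ == 0)
            cv_.notify_all();
    }

    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ == 0; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};

// Returns the number of workers that got past the latch.
int DoWork(int nthreads)
{
    simple_latch start_latch(1);
    std::atomic<int> started{0};
    std::vector<std::thread> workers;  // held by value, so nothing leaks
    for (int i = 0; i < nthreads; ++i) {
        workers.emplace_back([&] {
            // Initialize data structures. This is CPU bound.
            start_latch.wait();
            // Perform work.
            started.fetch_add(1);
        });
    }
    // Load input data. This is I/O bound.
    // Threads can now start processing.
    start_latch.count_down();
    // Join before start_latch leaves scope: every thread has left wait().
    for (auto& t : workers) t.join();
    return started.load();
}
```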

How to control thread lifetime using C++11 atomics

Following on from this question, I'd like to know what's the recommended approach we should take to replace the very common pattern we have in legacy code.
We have plenty of places where a primary thread is spawning one or more background worker threads and periodically pumping out some work for them to do, using a suitably synchronized queue. So the general pattern for a worker thread will look like this:
There will be an event HANDLE and a bool defined somewhere (usually as member variables) -
HANDLE hDoSomething = CreateEvent(NULL, FALSE, FALSE, NULL);
volatile bool bEndThread = false;
Then the worker thread function waits for the event to be signalled before doing work, but checks for a termination request inside the main loop -
unsigned int ThreadFunc(void *pParam)
{
    // typical legacy implementation of a worker thread
    while (true)
    {
        // wait for event
        WaitForSingleObject(hDoSomething, INFINITE);
        // check for termination request
        if (bEndThread) break;
        // ... do background work ...
    }
    // normal termination
    return 0;
}
The primary thread can then give some work to the background thread like this -
// ... put some work on a synchronized queue ...
// pulse worker thread to do the work
SetEvent(hDoSomething);
And it can finally terminate the worker thread like so -
// to terminate the worker thread
bEndThread = true;
SetEvent(hDoSomething);
// wait for worker thread to die
WaitForSingleObject(hWorkerThreadHandle, dwSomeSuitableTimeOut);
In some cases, we've used two events (one for work, one for termination) and WaitForMultipleObjects instead, but the general pattern is the same.
So, looking at replacing the volatile bool with a C++11 standard equivalent, is it as simple as replacing this
volatile bool bEndThread = false;
with this?
std::atomic<bool> bEndThread{false};
I'm sure it will work, but it doesn't seem enough. Also, it doesn't affect the case where we use two events and no bool.
Note, I'm not intending to replace all this legacy stuff with the PPL and/or Concurrency Runtime equivalents because although we use these for new development, the legacy codebase is end-of-life and just needs to be compatible with the latest development tools (the original question I linked above shows where my concern arose).
Can someone give me a rough example of C++11 standard code we could use for this simple thread management pattern to rewrite our legacy code without too much refactoring?
If it ain't broken, don't fix it (especially if this is a legacy code base).
VS-style volatile will be around for a few more years. Given that MFC isn't dead, this won't be dead any time soon. A cursory Google search says you can control it with /volatile:ms.
Atomics might do the job of volatile; especially if this is just a counter, there might be little performance overhead.
Many Windows native functions have different performance characteristics from their C++11 counterparts. For example, Windows timer queues and multimedia timers have a precision that is not possible to achieve with C++11. For example, a sleep_for of 5 ms will sleep for about 15 ms (and not 5 or 6). This can be solved with a mysterious call to timeBeginPeriod. Another example is that unblocking on a condition variable can be slow to respond. Interfaces to fix these aren't exposed to C++11 on Windows.
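For readers who do want a standard-only version of the pattern in the question, here is a rough sketch: a std::condition_variable plays the role of the event, and a plain bool protected by the same mutex as the queue replaces bEndThread. The Worker class and its post()/stop() methods are hypothetical names. Unlike the legacy loop, which exits as soon as the flag is seen, this version drains queued jobs before stopping; that is a design choice of the sketch, not something the question requires.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical wrapper around one background worker thread.
class Worker {
public:
    Worker() : thread_([this] { run(); }) {}
    ~Worker() { stop(); }

    // Queue a job and wake the worker (replaces SetEvent(hDoSomething)).
    void post(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(m_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

    // Request termination and wait for the thread to finish.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(m_);
            end_ = true;  // replaces bEndThread = true
        }
        cv_.notify_one();
        if (thread_.joinable()) thread_.join();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return end_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // end_ is set and nothing is left
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool end_ = false;    // guarded by m_, so no atomic is needed
    std::thread thread_;  // declared last: the other members exist before it starts
};
```

Because the flag is only read and written under the mutex, there is no need for volatile or std::atomic here; the mutex provides the required ordering.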

How can I avoid threading + optimizer == infinite loop? [duplicate]

In a code review today, I stumbled across the following bit of code (slightly modified for posting):
while (!initialized)
{
    // The thread can start before the constructor has finished initializing the object.
    // Can lead to strange behavior.
    continue;
}
This is the first few lines of code that runs in a new thread. In another thread, once initialization is complete, it sets initialized to true.
I know that the optimizer could turn this into an infinite loop, but what's the best way to avoid that?
volatile - considered harmful
calling an isInitialized() function instead of using the variable directly - would this guarantee a memory barrier? What if the function was declared inline?
Are there other options?
Edit:
Should have mentioned this sooner, but this is portable code that needs to run on Windows, Linux, Solaris, etc. We mostly use Boost.Thread for our portable threading library.
Calling a function won't help at all; even if a function is not declared inline, its body can still be inlined (barring something extreme, like putting your isInitialized() function in another library and dynamically linking against it).
Two options that come to mind:
Declare initialized as an atomic flag (in C++0x, you can use std::atomic_flag; otherwise, you'll want to consult the documentation for your threading library for how to do this)
Use a semaphore; acquire it in the other thread and wait for it in this thread.
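A minimal sketch of the first option, assuming a C++11 compiler. It uses std::atomic<bool> rather than std::atomic_flag, because atomic_flag in C++11 only offers test_and_set() and clear() and cannot simply be read. The Service type, its config member, and start_and_read() are made-up names for illustration.

```cpp
#include <atomic>
#include <thread>

// Hypothetical object whose thread may start before setup is complete.
struct Service {
    int config = 0;                        // plain data, published via the flag
    std::atomic<bool> initialized{false};

    // Runs on the new thread: wait until the creator has finished setup.
    int thread_main() {
        while (!initialized.load(std::memory_order_acquire))
            std::this_thread::yield();     // the atomic load cannot be hoisted out of the loop
        return config;                     // sees the value written before the release store
    }
};

// Starts the thread first, finishes initialization afterwards,
// and returns what the thread observed.
int start_and_read() {
    Service s;
    int seen = -1;
    std::thread t([&] { seen = s.thread_main(); });
    s.config = 42;                                         // "initialization"
    s.initialized.store(true, std::memory_order_release);  // publish it
    t.join();
    return seen;
}
```

The release store paired with the acquire load is what makes the earlier write to config visible to the waiting thread; the optimizer is not allowed to turn the loop into an infinite one.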
@Karl's comment is the answer. Don't start processing in thread A until thread B has finished initialization. The key to doing this is sending a signal from thread B to thread A that it is up & running.
You mentioned no OS, so I will give you some Windows-ish pseudocode. Transcode to the OS/library of your choice.
First create a Windows Event object. This will be used as the signal:
Thread A:
HANDLE running = CreateEvent(0, TRUE, FALSE, 0);
Then have Thread A start Thread B, passing the event along to it:
Thread A:
DWORD thread_b_id = 0;
HANDLE thread_b = CreateThread(0, 0, ThreadBMain, (void*)running, 0, &thread_b_id);
Now in Thread A, wait until the event is signaled:
Thread A:
DWORD rc = WaitForSingleObject(running, INFINITE);
if( rc == WAIT_OBJECT_0 )
{
    // thread B is up & running now...
    // MAGIC HAPPENS
}
Thread B's startup routine does its initialization, and then signals the event:
Thread B:
DWORD WINAPI ThreadBMain(void* param)
{
    HANDLE running = (HANDLE)param;
    do_expensive_initialization();
    SetEvent(running); // this will tell Thread A that we're good to go
    return 0;
}
Synchronization primitives are the solution to this problem, not spinning in a loop... But if you must spin in a loop and can't use a semaphore, event, etc, you can safely use volatile. It's considered harmful because it hurts the optimizer. In this case that's exactly what you want to do, no?
Boost has something in the same spirit as atomic_flag: boost::once_flag, which is used with boost::call_once. It may well be what you want here.
Effectively, if you want something to be constructed the first time it is needed (e.g. lazy loading) and this can happen in multiple threads, you get boost::call_once to call your function the first time it is reached. The post-condition is that it has been initialized, so there is no need for any kind of looping or locking.
What you do need to ensure is that your initialization logic does not throw exceptions.
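The C++11 standard library offers the same facility as std::once_flag and std::call_once. The sketch below uses those rather than the Boost versions so that it compiles without Boost; the resource, the counter, and the function names are invented for illustration. With std::call_once, an initializer that throws leaves the flag unset, so a later call tries again.

```cpp
#include <mutex>
#include <thread>
#include <vector>

namespace {
std::once_flag g_init_flag;
int g_init_calls = 0;  // written only inside the once-guarded initializer
int g_resource = 0;    // the lazily initialized "resource"

void init_resource() {
    ++g_init_calls;
    g_resource = 7;
}
}  // namespace

// Safe to call from any thread; initialization runs exactly once.
int get_resource() {
    std::call_once(g_init_flag, init_resource);
    return g_resource;
}

// Calls get_resource() from several threads and reports how many
// times the initializer actually ran.
int init_calls_after_concurrent_use(int nthreads) {
    std::vector<std::thread> threads;
    for (int i = 0; i < nthreads; ++i)
        threads.emplace_back([] { get_resource(); });
    for (auto& t : threads) t.join();
    return g_init_calls;
}
```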
This is a well known problem when working with threads. Creation/Initialization of objects takes relatively little time. When the thread actually starts running though... That can take quite a long time in terms of executed code.
Everyone keeps mentioning semaphores...
You may want to look at POSIX 1003.1b semaphores. Under Linux, try man sem_init. E.g.:
http://manpages.ubuntu.com/manpages/dapper/man3/sem_init.3.html
http://www.skrenta.com/rt/man/sem_init.3.html
http://docs.oracle.com/cd/E23824_01/html/821-1465/sem-init-3c.html
These semaphores have the advantage that, once Created/Initialized, one thread can block indefinitely until signaled by another thread. More critically, that signal can occur BEFORE the waiting thread starts waiting. (A significant difference between Semaphores and Condition Variables.) Also, they can handle the situation where you receive multiple signals before waking up.

C++ Thread question - setting a value to indicate the thread has finished

Is the following safe?
I am new to threading and I want to delegate a time consuming process to a separate thread in my C++ program.
Using the boost libraries I have written code something like this:
thrd = new boost::thread(boost::bind(&myclass::mymethod, this, &finished_flag));
Where finished_flag is a boolean member of my class. When the thread is finished it sets the value and the main loop of my program checks for a change in that value.
I assume that this is okay because I only ever start one thread, and that thread is the only thing that changes the value (except for when it is initialised before I start the thread)
So is this okay, or am I missing something, and need to use locks and mutexes, etc
You never mentioned the type of finished_flag...
If it's a straight bool, then it might work, but it's certainly bad practice, for several reasons. First, some compilers will cache the reads of the finished_flag variable, since the compiler doesn't always pick up the fact that it's being written to by another thread. You can get around this by declaring the bool volatile, but that's taking us in the wrong direction. Even if reads and writes are happening as you'd expect, there's nothing to stop the OS scheduler from interleaving the two threads half way through a read / write. That might not be such a problem here where you have one read and one write op in separate threads, but it's a good idea to start as you mean to carry on.
If, on the other hand, it's a thread-safe type, like a CEvent in MFC (or the equivalent in Boost), then you should be fine. This is the best approach: use thread-safe synchronization objects for inter-thread communication, even for simple flags.
Instead of using a member variable to signal that the thread is done, why not use a condition? You are already using the Boost libraries, and condition is part of the thread library.
Check it out. It allows the worker thread to 'signal' that it has finished, and the main thread can check during execution whether the condition has been signaled and then do whatever it needs to do with the completed work. There are examples in the link.
As a general rule, I would never assume that a resource will only be modified by one thread. You might know what it is for, but someone else might not, causing no end of grief when the main thread thinks that the work is done and tries to access data that is not correct! It might even delete the data while the worker thread is still using it, crashing the app. Using a condition helps here.
Looking at the thread documentation, you could also call thread.timed_join in the main thread. timed_join will wait for a specified amount of time for the thread to 'join' (join means that the thread has finished).
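In standard C++11, a comparable "check with a timeout whether the work is done" can be sketched with std::async and std::future::wait_for. This is an analogue of the idea, not the Boost timed_join API itself; slow_job, run_and_poll, and the durations are made up for illustration.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Hypothetical time-consuming job.
int slow_job() {
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    return 123;
}

// Starts the job on another thread and polls for completion.
// `polls` receives the number of times the job was found unfinished.
int run_and_poll(int& polls) {
    std::future<int> result = std::async(std::launch::async, slow_job);
    polls = 0;
    while (result.wait_for(std::chrono::milliseconds(1)) !=
           std::future_status::ready) {
        ++polls;  // the main loop would do its other work here
    }
    return result.get();  // synchronizes with the end of the job
}
```

The future carries both the completion signal and the result, so no separate flag is needed.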
I don't mean to be presumptive, but it seems like the purpose of your finished_flag variable is to pause the main thread (at some point) until the thread thrd has completed.
The easiest way to do this is to use boost::thread::join
// launch the thread...
thrd = new boost::thread(boost::bind(&myclass::mymethod, this, &finished_flag));
// ... do other things maybe ...
// wait for the thread to complete
thrd->join();
If you really want to get into the details of communication between threads via shared memory, even declaring a variable volatile won't be enough, even if the compiler does use appropriate access semantics to ensure that it won't get a stale version of data after checking the flag. The CPU can issue reads and writes out of order (x86 usually doesn't, but PPC definitely does), and there is nothing in C++9x that allows the compiler to generate code to order memory accesses appropriately.
Herb Sutter's Effective Concurrency series has an extremely in depth look at how the C++ world intersects the multicore/multiprocessor world.
Having the thread set a flag (or signal an event) before it exits is a race condition. The thread has not necessarily returned to the OS yet, and may still be executing.
For example, consider a program that loads a dynamic library (pseudocode):
lib = loadLibrary("someLibrary");
fun = getFunction("someFunction");
fun();
unloadLibrary(lib);
And let's suppose that this library uses your thread:
void someFunction() {
    volatile bool finished_flag = false;
    thrd = new boost::thread(boost::bind(&myclass::mymethod, this, &finished_flag));
    while(!finished_flag) { // ignore the polling loop, it's beside the point
        sleep();
    }
    delete thrd;
}
void myclass::mymethod(volatile bool* finished_flag) {
    // do stuff
    *finished_flag = true;
}
When myclass::mymethod() sets finished_flag to true, myclass::mymethod() hasn't returned yet. At the very least, it still has to execute a "return" instruction of some sort (if not much more: destructors, exception handler management, etc.). If the thread executing myclass::mymethod() gets pre-empted before that point, someFunction() will return to the calling program, and the calling program will unload the library. When the thread executing myclass::mymethod() gets scheduled to run again, the address containing the "return" instruction is no longer valid, and the program crashes.
The solution would be for someFunction() to call thrd->join() before returning. This would ensure that the thread has returned to the OS and is no longer executing.
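A small sketch of that fix using std::thread and std::atomic<bool> instead of Boost and volatile, so that it is self-contained. The flag may still be polled, but the function only returns after join(). The function name follows the answer; the result variable is an assumption added to make the effect observable.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Returns the value produced by the worker thread.
int someFunction() {
    std::atomic<bool> finished{false};
    int result = 0;

    std::thread thrd([&] {
        result = 5;  // do stuff
        finished.store(true, std::memory_order_release);
        // The thread is still running at this point: it has yet to return.
    });

    // Polling the flag only tells us that the work is done ...
    while (!finished.load(std::memory_order_acquire))
        std::this_thread::sleep_for(std::chrono::milliseconds(1));

    // ... join() tells us the thread itself has finished executing, so it is
    // now safe to destroy the locals it referenced and to return.
    thrd.join();
    return result;
}
```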