Can objects in an OpenMP thread be modified from outside? - c++

I use OpenMP to parallelize calls like so:
#pragma omp parallel for
for(std::size_t iter = 0; iter < visitors.size(); ++iter)
{
    VisitorSPtr visitor_sp = visitors.at(iter);
    dataSetPtr->accept(*(visitor_sp.get()));
}
// End of
// #pragma omp parallel for
Each visitor is used in a different thread, thanks to the #pragma omp parallel for directive. Fine.
The dataSetPtr->accept() function that is called within the loop checks if the visitor has been cancelled by the user like this:
if(visitor.shouldStop())
break;
If that call returns true, the visit is not performed. Cancellation is triggered when the user clicks a button: a signal is emitted and relayed to the visitor, which sets a member boolean to record that cancellation has been requested. But the signal never reaches the visitor, so the if(visitor.shouldStop()) check is of no use, that is, it never evaluates to true even though the cancellation signal was properly emitted.
The connection is performed like this (the connection is made from the MassDataIntegrator instance, which receives the cancelling signal and should relay it to the Visitor instance):
connect(this,
        &MassDataIntegrator::cancelOperationSignal,
        visitor_sp.get(),
        &Visitor::cancelOperation,
        Qt::QueuedConnection);
My question: how can I modify objects that are in a #pragma omp parallel for loop from code that runs in another thread? I thought that would be trivial by using pointers. Evidently, I am missing some concept here. Could anybody help me sort out this misunderstanding? Thank you for your attention.
SOLVED
The connect call above did not work for some reason (that I will investigate). So I tried using a lambda which, on the face of it, accesses the Visitor instance directly, like this (I commented out the replaced code to show the difference):
connect(this,
        &MassDataIntegrator::cancelOperationSignal,
        [visitor_sp]() { visitor_sp->cancelOperation(); });
        //visitor_sp.get(),
        //&TicChromTreeNodeCombinerVisitor::cancelOperation,
        //Qt::QueuedConnection);
We can consider this issue solved.

If you access a data location from multiple threads in OpenMP, and at least one of the accesses is a write, you must protect all read and write accesses to this location with atomic directives (or other means of avoiding race conditions and ensuring memory consistency).
Simply speaking, shouldStop should be implemented along the lines of:
bool r;
#pragma omp atomic read
r = this->cancelFlag_;
return r;
and cancelOperation like:
#pragma omp atomic write
this->cancelFlag_ = true;
This both ensures that there is no race condition in the unlikely case that writing a bool needs more than one operation, and implies the appropriate memory flushes to ensure that the result of the write is visible to other threads.
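Putting the two snippets together, a minimal sketch of the visitor's flag handling could look as follows (the class and member names are taken from the question; everything else is an assumption):

class Visitor
{
public:
    bool shouldStop() const
    {
        bool r;
        // atomic read, paired with the atomic write below
        #pragma omp atomic read
        r = cancelFlag_;
        return r;
    }

    // the slot connected to MassDataIntegrator::cancelOperationSignal
    void cancelOperation()
    {
        #pragma omp atomic write
        cancelFlag_ = true;
    }

private:
    bool cancelFlag_ = false;
};

A std::atomic<bool> member would achieve the same effect (the "other means" mentioned above) without the pragmas.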

Related

When will omp thread pool get destructed

I use OpenMP in my service to parallelize a loop. Every time a request comes in, my service creates a brand new thread for it, and this thread uses OpenMP to create a thread pool. Can I ask when this thread pool will be destructed?
void foo() {
    #pragma omp parallel for schedule(dynamic, 1)
    for (int j = 0; j < 100; j++) {
        // Do something
    }
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < x; i++) {
        threads.push_back(std::thread(foo));
    }
    for (auto& thread : threads) {
        thread.join();
    }
}
In this pseudo code, I noticed that:
In the for loop, the thread count is 8 * x + 1 (8-core host: 8 OpenMP threads for each std::thread, plus 1 main thread).
After the for loop, the thread count drops back to 1, which means all OpenMP thread pools got destructed.
This can be reproduced with this simple code, but in some more complex yet similar use cases I noticed that the thread pools did not get destructed after their parent threads finished, and it is hard for me to understand why.
So can I ask when the OpenMP thread pool will get destructed?
The creation and deletion of the native threads of an OpenMP parallel region are left to the OpenMP implementation (e.g. IOMP for ICC/Clang, GOMP for GCC) and are not defined by the OpenMP specification. The specification does not restrict implementations to creating native threads at the beginning of a parallel region, nor to deleting them at the end. In fact, most implementations keep the threads alive as long as possible, because creating threads is slow (especially on many-core architectures). The specification explicitly mentions the difference between native threads and basic OpenMP threads/tasks (everything is a task in OpenMP 5). Note that OMPT can be used to track when native threads are created and deleted. I expect mainstream implementations to create threads during runtime initialization (typically when the first parallel section is encountered) and to delete them when the program ends.
The specification states:
[A native thread is] a thread defined by an underlying thread implementation
If the parallel region creates a native thread, a native-thread-begin event occurs as the first event in the context of the new thread prior to the implicit-task-begin event.
If a native thread is destroyed at the end of a parallel region, a native-thread-end event occurs in the thread as the last event prior to destruction of the thread.
Note that implementations typically destroy and recreate threads when the number of threads of a parallel region differs from the previous one. This also happens in pathological cases like nesting.
The documentation of GOMP is available here but it is not very detailed. The IOMP documentation is available here and is not much better... You can find interesting information directly in the code of the runtimes, for example in the GOMP code. Note that there are useful comments like:
We only allow the reuse of idle threads for non-nested PARALLEL regions
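To make the reuse observable, here is a small sketch (assuming a Linux-like platform where pthread_t converts to an integer) that prints the native thread ids in two consecutive parallel regions; with GOMP or IOMP the ids typically repeat, showing that the pool is kept alive between regions:

// compile with -fopenmp
#include <cstdio>
#include <omp.h>
#include <pthread.h>

int main() {
    for (int round = 0; round < 2; ++round) {
        #pragma omp parallel num_threads(4)
        {
            std::printf("round %d: omp thread %d on native thread %lu\n",
                        round, omp_get_thread_num(),
                        (unsigned long)pthread_self());
        }
    }
    return 0;
}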

Join threads created recursively

I have a function that basically fetches some data from a database, then parses this data and fetches other data on which it depends, and so on...
The function is thus recursive, and I want to use multithreading to do this.
To simplify the problem, I wrote a dummy program that expresses the "spirit" of the function:
void DummyFunction(std::vector<std::thread>& threads, int& i)
{
    ++i;
    if (i < 10)
        threads.push_back(std::thread([&]() { DummyFunction(threads, i); }));
}
int main()
{
    std::vector<std::thread> threads;
    int i = 0;
    DummyFunction(threads, i);
    // Coming here, "DummyFunction" is still running and potentially creating new threads.
    // The issue is thus that we may enter the for loop before all threads have actually been created.
    for (std::thread& thread : threads)
    {
        thread.join();
    }
}
The issue comes from the need to wait for all the threads to finish before going any further (hence the for loop joining the threads). But of course, since DummyFunction is still running, new threads can still be created, so this approach can't work...
The question is: how can I design such a thing properly (if there is a way...)? Can we actually use multithreading recursively?
If you have C++20 available, consider using the new thread type that automatically joins on destruction. It goes by the name std::jthread and will save you all the trouble of manually joining threads.
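A minimal sketch of the difference (a hypothetical example, not the question's code):

// C++20: std::jthread joins automatically in its destructor
#include <cstdio>
#include <thread>

int main()
{
    std::jthread worker([] { std::puts("work done"); });
}   // worker is joined here, before main() returns

Note that this only removes the manual join; the race on the shared vector discussed in the next answer still has to be handled separately.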
Try a thought experiment: add an else clause to your if statement:
if (i < 10)
{
    threads.push_back(std::thread([&]() { DummyFunction(threads, i); }));
}
else
{
    // do something here
}
Once you make that change, a few minutes' worth of thinking leads to the following conclusion: the "do something here" part gets executed exactly once, in one of the execution threads, after all of the execution threads have been created.
Now, the solution should be very obvious:
Add a mutex, a condition variable, and a boolean flag. You can make them global, pass them as additional parameters into DummyFunction, or, better yet, turn your threads vector into its own class containing the vector, the mutex, the condition variable, and the boolean flag, and pass that in recursively instead of just the vector.
main() locks the mutex, clears the boolean flag, and after DummyFunction() returns it waits on the condition variable until the flag is set.
The "do something here" part locks the same mutex, sets the boolean flag, signals the condition variable, and unlocks the mutex.
Once you reach this point, you will also suddenly realize one more thing: as is, you have different execution threads all attempting to push_back something into the same vector. Vectors are not thread-safe, so this is undefined behavior. Therefore, you will also need to implement a separate mutex (or reuse the existing one, this looks eminently possible to me) to also lock the access to the vector.
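A sketch of the whole scheme could look like this (the struct name and the details are mine; the answer above only prescribes the ingredients):

#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

struct ThreadPack
{
    std::vector<std::thread> threads;
    std::mutex mtx;
    std::condition_variable cv;
    bool done = false;
};

void DummyFunction(ThreadPack& pack, int& i)
{
    std::lock_guard<std::mutex> lock(pack.mtx); // protects the vector and i
    ++i;
    if (i < 10)
    {
        pack.threads.push_back(std::thread([&]() { DummyFunction(pack, i); }));
    }
    else
    {
        // the "do something here" part: runs exactly once, in the last thread
        pack.done = true;
        pack.cv.notify_one();
    }
}

int main()
{
    ThreadPack pack;
    int i = 0;
    DummyFunction(pack, i);

    // wait until the recursion has bottomed out...
    std::unique_lock<std::mutex> lock(pack.mtx);
    pack.cv.wait(lock, [&] { return pack.done; });
    lock.unlock();

    // ...then joining is safe: no thread will push_back any more
    for (std::thread& thread : pack.threads)
        thread.join();
}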

How do I make a thread wait without polling?

I have a question about multithreading in C++. I have a scenario as follows:
void ThreadedRead(int32_t thread_num, BinReader reader) {
    while (!reader.endOfData) {
        thread_buckets[thread_num].clear();
        thread_buckets[thread_num] = reader.readnextbatch();
        thread_flags[thread_num] = THREAD_WAITING;
        while (thread_flags[thread_num] != THREAD_RUNNING) {
            // wait until awakened
            if (thread_flags[thread_num] != THREAD_RUNNING) {
                // go back to sleep
            }
        }
    }
    thread_flags[thread_num] = THREAD_FINISHED;
}
No section of the above code writes to or accesses memory shared between threads. Each thread is assigned a thread_num and a unique reader object that it may use to read data.
I want the main thread to be able to notify a thread that is in the THREAD_WAITING state that its state has been changed back to THREAD_RUNNING and that it needs to do some work. I don't want it to keep polling its state.
I understand condition variables and mutexes can help me. But I'm not sure how to use them, because I don't want to acquire a lock or need one. How can the main thread blanket-notify all waiting threads that they are now free to read more data?
EDIT:
Just in case anyone needs more details
1) reader reads some files
2) thread_buckets is a vector of vectors of uint16
3) thread_flags is an int vector
They have all been resized appropriately.
I realize that you wrote that you wanted to avoid condition variables and locks. On the other hand you mentioned that this was because you were not sure about how to use them. Please consider the following example to get the job done without polling:
The trick with the condition variables is that a single condition_variable object together with a single mutex object will do the management for you, including the handling of the unique_lock objects in the worker threads. Since you tagged your question as C++, I assume you are talking about C++11 (or higher) multithreading (C pthreads should work similarly). Your code could be as follows:
// compile for C++11 or higher
#include <thread>
#include <condition_variable>
#include <mutex>
// objects visible to both master and workers:
std::condition_variable cvr;
std::mutex mtx;
void ThreadedRead(int32_t thread_num, BinReader reader) {
    while (!reader.endOfData) {
        thread_buckets[thread_num].clear();
        thread_buckets[thread_num] = reader.readnextbatch();
        std::unique_lock<std::mutex> myLock(mtx);
        // This lock will be managed by the condition variable!
        thread_flags[thread_num] = THREAD_WAITING;
        while (thread_flags[thread_num] == THREAD_WAITING) {
            cvr.wait(myLock);
            // ...must be in a loop as shown because of potential spurious wake-ups
        }
    }
    thread_flags[thread_num] = THREAD_FINISHED;
}
To (re-)activate the workers from a master thread:
{ // block...
    // step 1: usually make sure that there is no worker still preparing itself at the moment
    std::unique_lock<std::mutex> someLock(mtx);
    // (in your case this would not cover workers currently busy with reader.readnextbatch();
    // these would not be re-started this time...)

    // step 2: set all worker threads that should work now to THREAD_RUNNING
    for (...looping over the workers' flags...) {
        if (...corresponding worker should run now...) {
            flag = THREAD_RUNNING;
        }
    }

    // step 3: signal the workers to run now
    cvr.notify_all();
} // ...block, releasing someLock
Notice:
If you just want to trigger all sleeping workers, you should control them with a single flag instead of a container of flags.
If you want to trigger a single sleeping worker and it doesn't matter which one, consider the .notify_one() member function instead of .notify_all(). Note that in this case as well, a single mutex/condition_variable pair is sufficient.
The flags are better placed in atomic objects, such as a global std::atomic<int> or, for finer control, a std::vector<std::atomic<int>> (see the sketch below).
A good introduction to std::condition_variable which also inspired the suggested solution is given in: cplusplus website
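For the last point, a possible sketch (C++11; the size is a placeholder):

#include <atomic>
#include <vector>

enum { THREAD_WAITING, THREAD_RUNNING, THREAD_FINISHED };

// std::atomic is neither copyable nor movable, so the vector is sized once
// up front instead of being resized later:
std::vector<std::atomic<int>> thread_flags(8); // one slot per worker

// worker:  thread_flags[thread_num].store(THREAD_WAITING);
// master:  thread_flags[i].store(THREAD_RUNNING); then cvr.notify_all();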
It looks like there are a few issues. For one thing, you do not need the conditional inside of your loop:
while (thread_flags[thread_num] != THREAD_RUNNING);
will work by itself. As soon as that condition is false, the loop will exit.
If all you want to do is avoid checking thread_flags as fast as possible, just put a short sleep in the loop:
while (thread_flags[thread_num] != THREAD_RUNNING)
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
This will cause the thread to yield the CPU so that it can do other things while the thread waits for its state to change. This makes the overhead of polling close to negligible. You can experiment with the sleep duration to find a good value; 100 ms is probably on the long side.
Depending on what causes the thread state to change, you could have the thread poll that condition/value directly (still with a sleep in the loop) and not bother with states at all.
There are a lot of options here. If you look up reader threads you can probably find just what you want; having a separate reader thread is very common.

OpenMP: how to explicitly divide code into different threads

Let's say I have a Writer class that generates some data, and a Reader class that consumes it. I want them to run all the time under different threads. How can I do that with OpenMP?
This is what I would like to have:
class Reader
{
public:
    void run();
};

class Writer
{
public:
    void run();
};

int main()
{
    Reader reader;
    Writer writer;
    reader.run(); // starts asynchronously
    writer.run(); // starts asynchronously
    wait_until_finished();
}
I guess the first answers will point to separating each operation into a section, but sections do not guarantee that the code blocks will be given to different threads.
Can tasks do it? As far as I understood after reading about tasks, each code block is executed just once, but the assigned thread can change.
Any other solution?
I would like to know this in order to tell whether some code I have inherited, which uses pthreads and explicitly creates several threads, could be written with OpenMP. The issue is that some threads were not smartly written and contain active-waiting loops. In that situation, if two objects with active waiting are assigned to the same OpenMP thread (and hence executed sequentially), they can reach a deadlock. At least, I think that could happen with sections, but I am not sure about tasks.
Serialisation could also happen with tasks. One horrible solution would be to reimplement sections on your own, with a guarantee that each section runs in a separate thread:
#pragma omp parallel num_threads(3)
{
    switch (omp_get_thread_num())
    {
        case 0: wait_until_finished(); break;
        case 1: reader.run(); break;
        case 2: writer.run(); break;
    }
}
This code assumes that you would like wait_until_finished() to execute in parallel with reader.run() and writer.run(). This is necessary since in OpenMP only the scope of the parallel construct is where the program executes in parallel and there is no way to put things in the background, so to say.
If you're rewriting the code anyway, you might be better off moving to Threading Building Blocks (TBB; http://www.threadingbuildingblocks.org).
TBB has explicit support for pipeline-style operation (or more complicated task graphs) while maintaining cache locality and independence from the underlying number of threads.
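For illustration, a sketch of such a pipeline with oneTBB's parallel_pipeline (the exact namespace and filter-mode names vary between TBB versions; the two filters stand in for Writer::run() and Reader::run()):

#include <cstdio>
#include <tbb/parallel_pipeline.h>

int main()
{
    int next = 0;
    tbb::parallel_pipeline(
        /*max_number_of_live_tokens=*/4,
        tbb::make_filter<void, int>(tbb::filter_mode::serial_in_order,
            [&](tbb::flow_control& fc) -> int {
                if (next >= 100) { fc.stop(); return 0; }
                return next++;                        // the Writer side
            })
        &
        tbb::make_filter<int, void>(tbb::filter_mode::serial_in_order,
            [](int value) {
                std::printf("consumed %d\n", value);  // the Reader side
            }));
    return 0;
}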

Windows Threads: when should you use InterlockedExchangeAdd()?

The naming of this function makes it seem like there is some complicated stuff going on. When exactly does one know that this is the way to go instead of doing something like this:
Preparation
CRITICAL_SECTION cs;
int *p = (int *) malloc(sizeof(int)); // Allocation Site
InitializeCriticalSection(&cs); // HINT for first Write
Thread #1
{
    *p = 1; // First Write
}
Thread #2
{
    EnterCriticalSection(&cs);
    *p = 2; // Second Write
    LeaveCriticalSection(&cs);
}
I have a write that gets done in one thread:
Run()
{
    // some code
    m_bIsTerminated = TRUE;
    // some more code
}
Then, I have a read that gets done in another thread (potentially at the same time):
Terminate()
{
    // some code
    if( m_bIsTerminated )
    {
        m_dwThreadId = 0;
        m_hThread = NULL;
        m_evExit.SetEvent();
        return;
    }
    // even more code
}
What's the best solution to solve this race condition? Are critical sections the way to go or is the use of InterlockedExchangeAdd() more useful?
In your case, there's no race condition. The variable is never reset back to FALSE, is it? It's just a "please die" switch for the thread, right? Then no need for synchronization of any kind.
The InterlockedXXX family of functions makes use of the Intel CPU's atomic instructions (XADD and CMPXCHG), so they're much cheaper than a critical section. And the one you want for a thread-safe assignment is InterlockedCompareExchange(); a sketch follows below.
UPD: and mark the variable as volatile.
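A sketch of what that could look like (Win32; the Interlocked functions operate on LONG, so the flag is declared as LONG rather than BOOL):

#include <windows.h>

volatile LONG m_bIsTerminated = FALSE;

// Run() thread, instead of "m_bIsTerminated = TRUE;":
InterlockedExchange(&m_bIsTerminated, TRUE);

// Terminate() thread: a compare-exchange whose exchange and comparand are
// identical is a common idiom for an atomic read:
if (InterlockedCompareExchange(&m_bIsTerminated, TRUE, TRUE) == TRUE)
{
    // ... terminated ...
}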
InterlockedExchangeAdd is used to add a value to an integer as an atomic operation, meaning that you don't have to use a critical section. This also removes the risk of a deadlock if one of your threads throws an exception - you need to make sure that you don't hold any lock of any kind, as that would prevent other threads from acquiring it.
For your scenario you can definitely use an Interlocked...- function, but I would use an event (CreateEvent, SetEvent, WaitForSingleObject), probably because I often find myself needing to wait for more than one object (you can wait for zero seconds in your scenario).
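A sketch of that event-based variant (Win32; the zero-millisecond wait just tests the event's state, as mentioned above):

#include <windows.h>

// manual-reset event, initially non-signaled:
HANDLE hTerminated = CreateEvent(NULL, TRUE, FALSE, NULL);

// Run() thread, instead of "m_bIsTerminated = TRUE;":
SetEvent(hTerminated);

// Terminate() thread:
if (WaitForSingleObject(hTerminated, 0) == WAIT_OBJECT_0)
{
    // ... terminated ...
}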
Upd: Using volatile for the variable may work; however, it isn't recommended, see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2016.html and http://www-949.ibm.com/software/rational/cafe/blogs/ccpp-parallel-multicore/tags/c%2B%2B0x for instance.
If you want to be portable, take a look at boost::thread.
Make sure m_bIsTerminated is marked as volatile, and you should be OK. Although it seems pretty weird to me that you'd run // some more code after setting "is terminated" to true. What exactly does that variable indicate?
Your "race condition" is that your various pieces of // more code can execute in different orders. Your variable doesn't help with that. Is your goal to get them to execute in a deterministic order? If yes, you'd need a condition variable to wait on in one thread and set in another (see the sketch below). If you just don't want them executing concurrently, a critical section would be fine.