Join threads created recursively - c++

I have a function that basically fetches some data from a database, then parses this data and fetches other data on which it depends, and so on...
The function is thus recursive, and I want to use multithreading for it.
To simplify the problem, I wrote a dummy program that expresses the "spirit" of the function:
void DummyFunction(std::vector<std::thread>& threads, int& i)
{
    ++i;
    if (i < 10)
        threads.push_back(std::thread([&]() { DummyFunction(threads, i); }));
}

int main()
{
    std::vector<std::thread> threads;
    int i = 0;
    DummyFunction(threads, i);

    // Coming here, "DummyFunction" is still running and potentially creating new threads.
    // The issue is thus that we may enter the for loop before all the threads have actually been created.
    for (std::thread& thread : threads)
    {
        thread.join();
    }
}
The issue comes from the need to wait for all the threads to finish before going any further (hence the for loop that joins them). But of course, since "DummyFunction" may still be running, new threads can still be created, so this approach can't work...
The question is: how can I design such a thing properly (if there is a way...)? Can multithreading actually be used recursively?

If you have C++20 available, consider using the new thread type that automatically joins on destruction. It goes by the name std::jthread and will save you all the trouble of having to join threads manually.
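For reference, a minimal sketch of what that looks like (C++20; the lambda body is just a placeholder):

#include <thread>
#include <cstdio>

int main()
{
    // std::jthread joins itself in its destructor, so no explicit join() is needed
    std::jthread worker([] { std::puts("work done"); });
}   // worker is joined automatically here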

Try a thought experiment: add an else clause to your if statement:
if (i < 10)
{
    threads.push_back(std::thread([&]() { DummyFunction(threads, i); }));
}
else
{
    // do something here
}
Once you make that change, a few minutes' worth of thinking will lead you to the following conclusion: the "do something here" part gets executed exactly once, in one of the execution threads, after all of the execution threads get created.
Now, the solution should be very obvious:
Add a mutex, a condition variable, and a boolean flag. You can either make them global, pass them as additional parameters into DummyFunction, or, better yet, turn your threads vector into its own class containing the vector, the mutex, the condition variable, and the boolean flag, and pass that in recursively instead of just the vector.
main() locks the mutex, clears the boolean flag, and after DummyFunction() returns it waits on the condition variable until the boolean flag is set.
The "do something here" part locks the same mutex, sets the boolean flag, signals the condition variable, and unlocks the mutex.
Once you reach this point, you will also suddenly realize one more thing: as is, you have different execution threads all attempting to push_back something into the same vector. Vectors are not thread-safe, so this is undefined behavior. Therefore, you will also need a separate mutex (or reuse the existing one, which looks eminently possible to me) to lock access to the vector.
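Putting the pieces together, a minimal sketch of this design might look as follows (the ThreadGroup name is illustrative, and the counter is passed by value here so the threads don't additionally race on i):

#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

struct ThreadGroup {
    std::vector<std::thread> threads;
    std::mutex mtx;                  // protects both 'threads' and 'done'
    std::condition_variable cv;
    bool done = false;
};

void DummyFunction(ThreadGroup& group, int i)
{
    ++i;
    if (i < 10) {
        std::lock_guard<std::mutex> lock(group.mtx);   // serialize the push_back calls
        group.threads.push_back(std::thread([&group, i] { DummyFunction(group, i); }));
    } else {
        std::lock_guard<std::mutex> lock(group.mtx);   // the "do something here" part
        group.done = true;
        group.cv.notify_one();
    }
}

int main()
{
    ThreadGroup group;
    DummyFunction(group, 0);

    std::unique_lock<std::mutex> lock(group.mtx);
    group.cv.wait(lock, [&group] { return group.done; });   // wait for the last thread
    lock.unlock();

    for (std::thread& t : group.threads) {   // no push_back can happen past this point
        t.join();
    }
}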

Related

c++ block thread until condition on a thread-safe object is met

I apologise in advance if my question is a duplicate, but I was not able to find a satisfying answer to my question.
I am dealing with the following (maybe silly) issue: I am trying to synchronise two threads (A and B), and I want to block thread A until a condition is set to true by thread B.
The "special" thing is that the condition is checked on a thread-safe object (for instance, let's consider it to be a std::atomic_bool).
My naive approach was the following:
// Shared atomic object
std::atomic_bool condition{false};
// Thread A
// ... does something
while(!condition.load()) ; // Do nothing
// Condition is met, proceed with the job
// Thread B
// ... does something
condition.store(true); // Unlock Thread A
but, as far as I understand, the while loop implies an active wait, which is undesirable.
So, I thought about having a small sleep_for as the body of the while loop to reduce the frequency of the active wait, but then the issue becomes finding the right waiting time: one that does not waste time if the condition becomes true while thread A is sleeping and, at the same time, does not make the loop execute too often.
My feeling is that this is very much dependent on the time thread B spends before setting the condition to true, which may not be predictable.
Another solution I have found in other SO topics is to use a condition variable, but that would require introducing a mutex that is not really needed.
I am perhaps overthinking the problem, but I'd like to know if there are alternative "standard" solutions to follow (bearing in mind that I am limited to C++11), and what would be the best approach in general.
Many thanks in advance for the help.
Your use case is simple and there are many ways to implement it.
The first recommendation would be to make use of a condition variable, but it seems from your question that you would like to avoid that because of the mutex.
I don't have any profiling data, but a mutex isn't costly for your use case.
In a multi-threaded environment, at some point you will need techniques to protect shared access to and modification of data, and you will probably need mutexes for that.
You could go for the condition variable approach.
It is provided by the standard, and it also offers a function to notify all waiting threads, should your use case scale in the future.
Also, since you mentioned time, condition_variable comes with timed variants of its wait functions as well: it can wait_for a duration or wait_until a point in time.
As for the while loop with a sleep_for approach: blocking a thread from execution and then rescheduling it isn't that cheap if we are counting in milliseconds. The condition variable approach is better suited here than a while loop with an explicit call to sleep_for.
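For illustration, a minimal C++11 sketch of the timed variant (the 100 ms value and all names are arbitrary):

#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

std::mutex m;
std::condition_variable cv;
bool condition_met = false;   // set by thread B under the mutex

void thread_a()
{
    std::unique_lock<std::mutex> lock(m);
    // wait_for returns true if the predicate became true within the timeout,
    // false if the timeout expired first
    if (cv.wait_for(lock, std::chrono::milliseconds(100), [] { return condition_met; })) {
        // condition is met, proceed with the job
    } else {
        // timed out, do something else and/or wait again
    }
}

void thread_b()
{
    {
        std::lock_guard<std::mutex> lock(m);
        condition_met = true;
    }
    cv.notify_one();
}

int main()
{
    std::thread a(thread_a);
    std::thread b(thread_b);
    a.join();
    b.join();
}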
Sorry, condition variables are the way to go here.
The mutex is being used as a part of the condition variable, not as a traditional mutex. And barring some strange priority inversion situation, it shouldn't have much cost.
Here is a simple "farm gate". It starts shut, and can be opened. Once opened, it can never be shut again.
struct gate {
    void open_gate() {
        auto l = lock();
        gate_is_open = true;
        cv.notify_all();
    }
    void wait_on_gate() const {
        auto l = lock();
        cv.wait(l, [&]{ return gate_is_open; });
    }
private:
    // explicit types (no CTAD or return type deduction) keep this C++11-compatible
    std::unique_lock<std::mutex> lock() const { return std::unique_lock<std::mutex>(m); }
    mutable std::mutex m;
    bool gate_is_open = false;
    mutable std::condition_variable cv;   // mutable so wait_on_gate() can be const
};
which you'd use like this:
// Shared gate
gate condition;
// Thread A
// ... does something
condition.wait_on_gate(); // Do nothing
// Condition is met, proceed with the job
// Thread B
// ... does something
condition.open_gate(); // Unlock Thread A
and there we have it.
In C++20 there is std::latch: start the counter at 1, decrement it when the gate opens, and have the other thread wait on the latch.
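A minimal sketch of that C++20 alternative (the function names are illustrative):

#include <latch>
#include <thread>

std::latch gate_latch{1};     // the counter starts at 1: "gate shut"

void thread_a()
{
    gate_latch.wait();        // blocks until the counter reaches zero
    // condition is met, proceed with the job
}

void thread_b()
{
    // ... does something
    gate_latch.count_down();  // "open the gate": releases every waiter, forever
}

int main()
{
    std::jthread a(thread_a);
    std::jthread b(thread_b);
}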
How about using some sort of sentinel value that thread B sets once its condition is true, so that thread A can check it and the two threads synchronize once the condition is met?

How to find joinable threads and remove them from vector?

I'm keeping track of some threads in C++ by adding them to a vector.
I want to periodically loop through the vector, check to see if they've exited (they've become joinable), and if so, remove them from the vector.
My code looks like:
void do_stuff(){
    for(int i=0; i<30; i++){
        cout << "Doing Stuff.\n";
        sleep(10);
    }
}

void main_loop(){
    std::vector<std::thread> client_threads;
    while(1){
        if(stuff_needs_to_be_done()){
            client_threads.push_back(std::thread(&do_stuff));
        }

        // Cleanup threads.
        auto itr = std::begin(client_threads);
        while (itr != std::end(client_threads)) {
            if ((*itr).joinable()){
                itr = client_threads.erase(itr);
            }else{
                ++itr;
            }
        }
    }
}
Upon stepping through the code, I find that when it gets to my thread cleanup section, my process exits with:
terminate called without an active exception
Aborted (core dumped)
I'm not sure what this means exactly, other than I'm probably not cleaning up my threads correctly. What am I doing wrong?
Using std::async is the easiest way to implement the idea of a vector from which you dynamically remove finished tasks in the loop. This method is almost the same as #4xy advised, but a little easier.
Anyway, let's avoid the XY problem. Why do you need to remove the elements from the vector? If the only reason is to wait for the last thread to finish, you don't need to remove the elements dynamically. You may query a shared counter (implemented as std::atomic<int>) and share this variable between the threads (or async procedures).
Busy waiting is another problem: why not block on a synchronization primitive? Logically, what you need is an inverse semaphore, which is in the signaled state whenever the counter is equal to zero. There is no such primitive in the current C++ standard, but you can implement it. Recently I did that using a std::shared_ptr: I create a single shared pointer object and copy it into each of the threads. Each copy is destroyed when its thread finishes. The last one destroys the shared state and invokes the deleter, which is a custom procedure you may provide. For example, it can release a mutex that is used to block your main thread.
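A minimal sketch of that shared_ptr trick, using a std::promise as the blocking primitive instead of a mutex (all names are illustrative):

#include <future>
#include <memory>
#include <thread>
#include <vector>

int main()
{
    std::promise<void> all_done;
    std::future<void> all_done_future = all_done.get_future();

    // The deleter runs exactly once, when the last copy of the shared_ptr
    // is destroyed, i.e. when the last worker finishes.
    std::shared_ptr<void> token(nullptr, [&all_done](void*) { all_done.set_value(); });

    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i) {
        workers.push_back(std::thread([token] {
            (void)token;   // each worker holds its own copy until it returns
            // ... do the actual work here
        }));
    }
    token.reset();            // drop main's copy: only the workers keep it alive now

    all_done_future.wait();   // blocks until the last worker releases its copy

    for (std::thread& w : workers) {
        w.join();             // the thread objects still have to be joined
    }
}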
Use std::promise and std::future::wait_for with a zero timeout to indicate thread exit and check for completion without blocking (polling).
The links provide good examples that can be easily adapted for your case.
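As a rough sketch of the std::async variant (C++11), with the zero-timeout wait_for acting as the non-blocking "has it finished?" check; do_stuff and stuff_needs_to_be_done below are stand-ins for the functions from the question:

#include <chrono>
#include <future>
#include <thread>
#include <vector>

void do_stuff()                                     // stand-in for the question's worker
{
    std::this_thread::sleep_for(std::chrono::seconds(1));
}

bool stuff_needs_to_be_done() { return true; }      // stand-in

void main_loop()
{
    std::vector<std::future<void>> tasks;
    while (true) {
        if (stuff_needs_to_be_done()) {
            tasks.push_back(std::async(std::launch::async, do_stuff));
        }
        // Cleanup: erase the tasks that have already finished.
        for (auto it = tasks.begin(); it != tasks.end(); ) {
            if (it->wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
                it = tasks.erase(it);   // destroying a ready future does not block
            } else {
                ++it;
            }
        }
    }
}

int main()
{
    main_loop();
}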

Proper way to use predicate in conditional variable

I want to know if I need to reset the predicate boolean variable inside the scope of the locked mutex. Right now, I have a condition variable wait on a std::unique_lock with a lambda function as the predicate parameter; the lambda function returns a boolean flag. Does this mean I need to set the boolean flag back to false inside the scope of the lock guard?
It seems to work a lot faster if I don't reset the boolean flag, but I'm not sure if this is the safe way to approach this.
#include <thread>
#include <condition_variable>
#include <mutex>

std::condition_variable _conditional;
bool processed = false;
std::mutex _mutex;

void Worker() {
    while (true) {
        //Doing other things ..
        //
        {
            //mutating shared data
            std::unique_lock<std::mutex> _lock(_mutex);
            _conditional.wait(_lock, [] { return processed; });
            processed = false; //do I set this boolean back to false or just leave it as is?
            data++;
        }
    }
}

void Reader() {
    //just prints out the changed data
    while (true) {
        std::cout << data << std::endl;
        processed = true;
        _conditional.notify_one();
    }
}

int main(){
    std::thread t1(Worker);
    std::thread t2(Reader);
    t1.join();
    t2.join();
}
Firstly, the Reader never acquires a lock to synchronize its access to the shared data (in this case the processed boolean and the data variable, whatever it is) with the Worker. As such, the Reader can modify processed while the Worker is reading from it, and the Reader can read from data while the Worker is writing to it; these are both race conditions. They can be fixed by having the Reader also lock the mutex before modifying processed or reading from data. The rest of this answer assumes that this correction is made.
Secondly, whether or not processed should be reset back to false is dependent on what you want the application to do, so it's necessary to understand the consequences.
If it is never reset to false, then the Worker will never again wait on the condition variable (though it will continuously reacquire the mutex and check the value of processed, despite the fact that it is guaranteed to be true after the first wait terminates), and it will simply keep incrementing data. Even if you correctly synchronize access to the shared data as I mentioned, this still might not do what you want it to do. It's very possible that the Worker could acquire the mutex several times in a row before the Reader, so data can be incremented multiple times in between prints, and printed multiple times in between increments (there is no ordering guarantee between printing and incrementing in that case).
If you reset processed back to false after each wait within the Worker, then you can guarantee that data will be printed at least once in-between each increment, since it would be unable to increment data until the Reader has notified it (which requires at least one print first). However, it may still be printed multiple times in-between each increment, because there is still no mechanism forcing the Reader to wait on the Worker.
If you provide another mechanism allowing the Reader to also wait on the Worker, then you can theoretically guarantee that each print happens exactly once in-between each increment (alternating). But once you've gone this far, the entire process is run serially, so there's really no point in using multiple threads anymore.
Notice that these approaches all have entirely different semantics. The one you should choose depends on what you want your application to do.
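For concreteness, here is a sketch of the second option (the Reader also locks the mutex, and the Worker resets processed after each wait); data is declared here since it wasn't shown in the question:

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::condition_variable _conditional;
std::mutex _mutex;
bool processed = false;
int data = 0;

void Worker() {
    while (true) {
        std::unique_lock<std::mutex> _lock(_mutex);
        _conditional.wait(_lock, [] { return processed; });
        processed = false;   // re-arm the predicate for the next round
        data++;
    }
}

void Reader() {
    while (true) {
        {
            std::lock_guard<std::mutex> _lock(_mutex);   // synchronize access to shared data
            std::cout << data << std::endl;
            processed = true;
        }
        _conditional.notify_one();
    }
}

int main() {
    std::thread t1(Worker);
    std::thread t2(Reader);
    t1.join();
    t2.join();
}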

How do I make a thread wait without polling?

I have a question about multithreading in C++. I have a scenario as follows:
void ThreadedRead(int32_t thread_num, BinReader reader) {
    while (!reader.endOfData) {
        thread_buckets[thread_num].clear();
        thread_buckets[thread_num] = reader.readnextbatch();
        thread_flags[thread_num] = THREAD_WAITING;
        while (thread_flags[thread_num] != THREAD_RUNNING) {
            // wait until awakened
            if (thread_flags[thread_num] != THREAD_RUNNING) {
                //go back to sleep
            }
        }
    }
    thread_flags[thread_num] = THREAD_FINISHED;
}
No section of the above code writes to or accesses memory shared between threads. Each thread is assigned a thread_num and a unique reader object that it may use to read data.
I want the main thread to be able to notify a thread that is in the THREAD_WAITING state that its state has been changed back to THREAD_RUNNING and that it needs to do some work. I don't want it to keep polling its state.
I understand condition variables and mutexes can help me, but I'm not sure how to use them because I don't want to acquire a lock or need one. How can the main thread blanket-notify all waiting threads that they are now free to read more data?
EDIT:
Just in case anyone needs more details:
1) reader reads some files
2) thread_buckets is a vector of vectors of uint16
3) thread_flags is an int vector
They have all been resized appropriately.
I realize that you wrote that you wanted to avoid condition variables and locks. On the other hand you mentioned that this was because you were not sure about how to use them. Please consider the following example to get the job done without polling:
The trick with the condition variables is that a single condition_variable object together with a single mutex object will do the management for you, including the handling of the unique_lock objects in the worker threads. Since you tagged your question as C++, I assume you are talking about C++11 (or higher) multithreading (I guess that C pthreads may work similarly). Your code could be as follows:
// compile for C++11 or higher
#include <thread>
#include <condition_variable>
#include <mutex>

// objects visible to both master and workers:
std::condition_variable cvr;
std::mutex mtx;

void ThreadedRead(int32_t thread_num, BinReader reader) {
    while (!reader.endOfData) {
        thread_buckets[thread_num].clear();
        thread_buckets[thread_num] = reader.readnextbatch();
        std::unique_lock<std::mutex> myLock(mtx);
        // This lock will be managed by the condition variable!
        thread_flags[thread_num] = THREAD_WAITING;
        while (thread_flags[thread_num] == THREAD_WAITING) {
            cvr.wait(myLock);
            // ...must be in a loop as shown because of potential spurious wake-ups
        }
    }
    thread_flags[thread_num] = THREAD_FINISHED;
}
To (re-)activate the workers from a master thread:
{ // block...
    // step 1: usually make sure that there is no worker still preparing itself at the moment
    std::unique_lock<std::mutex> someLock(mtx);
    // (in your case this would not cover workers currently busy with reader.readnextbatch();
    // these would not be re-started this time...)

    // step 2: set all worker threads that should work now to THREAD_RUNNING
    for (...looping over the workers' flags...) {
        if (...corresponding worker should run now...) {
            flag = THREAD_RUNNING;
        }
    }

    // step 3: signal the workers to run now
    cvr.notify_all();
} // ...block, releasing someLock
Notice:
If you just want to trigger all sleeping workers, you should control them with a single flag instead of a container of flags.
If you want to trigger a single sleeping worker but it doesn't matter which one, consider the .notify_one() member function instead of .notify_all(). Note as well that in this case a single mutex/condition_variable pair is still sufficient.
The flags are better placed in atomic objects, such as a global std::atomic<int> or, for finer control, a std::vector<std::atomic<int>>.
A good introduction to std::condition_variable, which also inspired the suggested solution, is given on the cplusplus website.
It looks like there are a few issues. For one thing, you do not need the conditional inside of your loop:
while (thread_flags[thread_num] != THREAD_RUNNING);
will work by itself. As soon as that condition is false, the loop will exit.
If all you want to do is avoid checking thread_flags at full speed, just put a short sleep (or at least a yield) in the loop:

while (thread_flags[thread_num] != THREAD_RUNNING)
    std::this_thread::sleep_for(std::chrono::milliseconds(100));

This causes the thread to give up the CPU so that it can do other things while this thread waits for its state to change, and it makes the overhead of polling close to negligible. You can experiment with the sleep duration to find a good value; 100 ms is probably on the long side.
Depending on what causes the thread state to change, you could have the thread poll that condition/value directly (still with a sleep in the loop) and not bother with states at all.
There are a lot of options here. If you look up reader threads you can probably find just what you want; having a separate reader thread is very common.

Windows Threads: when should you use InterlockedExchangeAdd()?

The naming of this function makes it sound like there is some complicated stuff going on. When exactly does one know that this is the way to go instead of doing something like this:
Preparation

CRITICAL_SECTION cs;
int *p = malloc(sizeof(int));   // Allocation Site
InitializeCriticalSection(&cs); // HINT for first Write

Thread #1
{
    *p = 1; // First Write
}

Thread #2
{
    EnterCriticalSection(&cs);
    *p = 2; // Second Write
    LeaveCriticalSection(&cs);
}
I have a write that gets done in one thread:
Run()
{
    // some code
    m_bIsTerminated = TRUE;
    // some more code
}
Then, I have a read that gets done in another thread (potentially at the same time):
Terminate()
{
    // some code
    if( m_bIsTerminated )
    {
        m_dwThreadId = 0;
        m_hThread = NULL;
        m_evExit.SetEvent();
        return;
    }
    // even more code
}
What's the best solution to solve this race condition? Are critical sections the way to go or is the use of InterlockedExchangeAdd() more useful?
In your case, there's no race condition. The variable is never reset back to FALSE, is it? It's just a "please die" switch for the thread, right? Then no need for synchronization of any kind.
The InterlockedXXX family of functions makes use of the Intel CPU's atomic instructions (XADD and CMPXCHG), so they're much cheaper than a critical section. And the one you want for a thread-safe assignment is InterlockedCompareExchange().
UPD: and mark the variable as volatile.
InterlockedExchangeAdd is used to add a value to an integer as an atomic operation, meaning that you won't have to use a critical section. This also removes the risk of a deadlock if one of your threads throws an exception: you need to make sure that you don't keep holding any lock of any kind, as that would prevent other threads from acquiring it.
For your scenario you can definitely use an Interlocked... function, but I would use an event (CreateEvent, SetEvent, WaitForSingleObject), probably because I often find myself needing to wait for more than one object (you can wait with a zero timeout in your scenario); see the sketch after this answer.
Upd: Using volatile for the variable may work; however, it isn't recommended, see: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2016.html and http://www-949.ibm.com/software/rational/cafe/blogs/ccpp-parallel-multicore/tags/c%2B%2B0x for instance.
If you want to be portable, take a look at boost::thread.
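For what it's worth, a rough sketch of the event-based variant mentioned above, using raw Win32 handles rather than the question's m_evExit wrapper (the class layout is illustrative only):

#include <windows.h>

class Worker {
public:
    Worker() : m_hExitedEvent(CreateEvent(NULL, TRUE, FALSE, NULL)) {}   // manual-reset, initially non-signaled
    ~Worker() { CloseHandle(m_hExitedEvent); }

    void Run() {
        // some code
        SetEvent(m_hExitedEvent);   // takes the place of m_bIsTerminated = TRUE
        // some more code
    }

    void Terminate() {
        // a zero timeout makes this a non-blocking check, like reading the flag
        if (WaitForSingleObject(m_hExitedEvent, 0) == WAIT_OBJECT_0) {
            // ... same cleanup as in the question
            return;
        }
        // even more code
    }

private:
    HANDLE m_hExitedEvent;
};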
Make sure m_bIsTerminated is marked as volatile, and you should be ok. Although it seems pretty weird to me that you'd // some more code after setting "is terminated" to true. What exactly does that variable indicate?
Your "race condition" is that your various elements of // more code can execute in different orders. Your variable doesn't help that. Is your goal to get them to execute in a deterministic order? If yes, you'll need a condition variable to wait on in one thread and set in another. If you just don't want them executing concurrently, a critical section would be fine.