multi-thread program initialization using call_once vs atomic_flag

In C++ Concurrency in Action (2nd edition), section 3.3.1, the author uses std::call_once to avoid the double-checked locking pattern when initializing a resource in a multi-threaded program:
std::shared_ptr<some_resource> resource_ptr;
std::once_flag resource_flag;
void init_resource()
{
resource_ptr.reset(new some_resource);
}
void foo()
{
std::call_once(resource_flag,init_resource); // #1
resource_ptr->do_something();
}
The reason is explained in this [answer][1]. I used to use atomic_flag for initialization in multi-threaded programs, something like this:
std::atomic_flag init = ATOMIC_FLAG_INIT;
std::atomic<bool> initialized{false};
void Init()
{
if (init.test_and_set()) return;
DoInit();
initialized = true;
}
void Foo(){
if(!initialized) return;
DoSomething(); // use some variable initialized in DoInit()
}
Every thread calls Init() before calling Foo().
After reading the book, I wonder whether the above pattern can cause a race condition and is therefore not safe to use. Is it possible for the compiler to reorder the instructions so that initialized becomes true before DoInit() finishes?
[1]: Explain race condition in double checked locking

The race condition in your code happens when thread 1 enters DoInit and thread 2 skips it and proceeds to Foo.
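You handle it with if(!initialized) return in Foo, but this is not always possible: callers must always expect a method to silently do nothing, and you can forget to add such checks to other methods.
With std::call_once, concurrent callers block until the action has been executed, so execution only continues once initialization is complete.
Regarding reordering: atomic operations use memory_order_seq_cst by default, which includes release semantics for the store initialized = true and acquire semantics for the load in Foo. The compiler and CPU therefore cannot move the effects of DoInit() past that store, and a thread that reads true from initialized also sees everything DoInit() wrote.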
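To make the blocking behaviour of std::call_once concrete, here is a minimal sketch. The names resource, init_calls, use_resource and run_call_once_demo are invented for illustration and are not from the book or the question; the int stands in for whatever DoInit() would set up.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

std::once_flag resource_flag;
int resource = 0;                 // stands in for the data DoInit() would set up
std::atomic<int> init_calls{0};   // counts how often the initializer runs

void init_resource()
{
    ++init_calls;
    resource = 42;
}

// Returns the value seen by the calling thread after call_once returns.
int use_resource()
{
    // Concurrent callers block here until the one active initializer has finished.
    std::call_once(resource_flag, init_resource);
    return resource;              // safe to read: the initializer has completed
}

// Hypothetical driver: starts n threads and counts how many saw the initialized value.
int run_call_once_demo(int n)
{
    std::atomic<int> seen_initialized{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back([&] { if (use_resource() == 42) ++seen_initialized; });
    for (auto& t : threads) t.join();
    assert(init_calls == 1);      // the initializer ran exactly once
    return seen_initialized;
}
```

Unlike the if(!initialized) return check, no caller can skip its work here: every thread that gets past call_once is guaranteed to see the initialized value.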

Related

Does std::thread constructor completion actually synchronize with the beginning of the thread of execution?

The C++11 standard (N3337, 30.3.1.2) states about the synchronization of the std::thread constructor:
Synchronization: The completion of the invocation of the constructor synchronizes with the beginning of the invocation of the copy of f.
Reading it, I thought the constructor completes before the start of the new thread. But according to the question (std::thread construction and execution) and the current implementation in libc++/libstdc++, there seems to be no synchronization mechanism, and the new thread of execution can possibly begin before the end of the std::thread constructor.
If that is correct, what is the standard trying to say? Is this a gap between the standard and the implementations? Or do I understand the term "synchronize with" incorrectly? Even if the constructor and the new thread are running simultaneously, can the constructor completion be considered as synchronizing with the beginning of the new thread?

Reading it, I thought the constructor completes before the start of the new thread
"synchronizes with" is a term of art. When the standard mandates that two operations synchronize with each other, that carries with it certain requirements for evaluations before and after the two operations. For example, accessing a variable in the original thread before the std::thread constructor, and accessing it in the new thread do not cause a data race.
Intuitively, you can think of "synchronizes with" as meaning that the new thread can see all prior evaluations, modifications, and side effects from the initial thread.
There is no need to make sure the thread begins by the end of the constructor. That is not what this says.
The way standard libraries enforce this requirement is by relying on underlying libraries like pthreads that essentially also enforce this requirement.
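As a small illustration of that guarantee, the sketch below writes a plain, non-atomic variable before constructing the thread and reads it inside the new thread. The function and variable names are made up for this example.

```cpp
#include <cassert>
#include <thread>

// Reads a plain (non-atomic) variable in a new thread that was written before
// the std::thread constructor ran. No extra synchronization is needed for this.
int read_in_new_thread()
{
    int payload = 0;
    int observed = -1;

    payload = 17;               // sequenced before the constructor call
    std::thread t([&] {
        observed = payload;     // sees 17: constructor completion synchronizes
    });                         // with the start of this lambda
    t.join();                   // join() provides the ordering in the other direction
    assert(observed == 17);
    return observed;
}
```

Nothing here says when the lambda starts running relative to the end of the constructor; the guarantee is only about what the new thread is able to see.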

Once the constructor completes, there is nothing you can do to prevent the thread from executing. Whether it has already started, or does not start executing immediately, is irrelevant and depends on the scheduler, CPU, and load.
You are guaranteed however that all the code in the constructor has been executed when the code of the thread starts executing.
In other words it may be that the thread executes a million instructions before the next instruction after the constructor is executed (e.g. the main thread is suspended right after creating the new thread) or it may be a million instructions after the constructor are executed before the first instruction in the thread is executed (i.e. the new thread is immediately suspended).
Maybe the hardware is single-core, so there is simply no way two instructions are executed "at the same time", and all operations on native types are atomic. (This was the main issue when programs started being executed on the first truly parallel hardware: a lot of old multithreaded code worked without explicit synchronization because the hardware was always implicitly synchronizing, but it started failing in random ways when true parallelism arrived.)
Anyway I wouldn't try to read too much into the formal specification: from a purely formal point of view, I think it would be hard to say that an implementation in which a thread is never actually started is not conforming (proving black-box it's not conforming would require waiting till the end of time). The same could be said of an implementation in which a+b takes 10 billion years...

No, it does not wait for the new thread to be scheduled.
I usually use a condition variable construct to be really sure the asynchronous thread has really started. Like this:
#include <condition_variable>
#include <mutex>
#include <thread>
/// <summary>
/// wrapper for a condition variable, takes into account
/// https://www.modernescpp.com/index.php/c-core-guidelines-be-aware-of-the-traps-of-condition-variables
/// </summary>
class sync_signal final
{
public:
sync_signal() = delete;
~sync_signal() = default;
sync_signal(const sync_signal&) = delete;
sync_signal& operator=(const sync_signal&) = delete;
sync_signal(sync_signal&&) = delete;
explicit sync_signal(bool value) :
m_value{ value }
{
}
void set() noexcept
{
{
std::unique_lock<std::mutex> lock(m_value_mutex);
m_value = true;
}
m_value_changed.notify_all();
}
void wait()
{
std::unique_lock<std::mutex> lock(m_value_mutex);
auto pred = [this] { return m_value; };
// check pred first we might have missed a notify
if (pred())
{
return;
}
m_value_changed.wait(lock, pred);
}
std::mutex m_value_mutex;
std::condition_variable m_value_changed;
bool m_value;
};
int main()
{
sync_signal signal{ false };
std::thread t([&signal]()
{
signal.set(); // signal that thread has really scheduled/started
// do your async stuff here
});
// wait on main thread for async thread to have really started
signal.wait();
// wait for thread to finish to avoid race conditions at shut-down
t.join();
}
For short-duration functions I prefer using std::async, though.
Instead of always creating a whole new thread (and taking up resources), some implementations run the task on a thread from a thread pool; whether that happens is implementation-specific:
#include <future>
auto ft = std::async(std::launch::async, [&signal]()
{
signal.set(); // signal that thread has really started
// do your async stuff here
});
ft.get();

How to end a thread properly?

My main program creates a thread. This thread initializes some data, then enters a 'while' loop and runs until the main program sets the control variable to 'false'. Then the main program calls join(), which blocks the whole code endlessly.
bool m_ThreadMayRun;
void ThreadFunction();
int main(){
    thread mythread = thread(&ThreadFunction);
    //do stuff
    m_ThreadMayRun = false;
    mythread.join(); // this blocks endlessly even when I ask 'joinable' before
}
void ThreadFunction(){
    initdata();
    m_ThreadMayRun=true;
    while(m_ThreadMayRun){
        //do stuff that can be / has to be done for ever
    }
    deinitdata();
}
Am I missing something here?
What would be a proper solution to make the loop leave from the main thread?
Is it at all necessary to call join?
Thanks for help

You have a race condition: two threads write to m_ThreadMayRun. Consider what happens if the main thread first executes m_ThreadMayRun = false; and the thread you spawned then executes m_ThreadMayRun = true;. You get an infinite loop. Strictly speaking, however, that line of reasoning is irrelevant, because when you have a data race your code has undefined behavior.
Am I missing something here?
You need to synchronize access to m_ThreadMayRun, either by making it a std::atomic<bool> or by using a std::mutex, and make sure that m_ThreadMayRun = false is executed after m_ThreadMayRun = true.
PS For this situation it is better to use a std::condition_variable.
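A condition variable pays off when the worker sleeps between rounds of work, because the stop request can then wake it immediately. The following is only a sketch under that assumption; stop_mutex, stop_cv, stop_requested, iterations, worker and run_and_stop are hypothetical names, and the 10 ms and 30 ms intervals are arbitrary.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

std::mutex stop_mutex;
std::condition_variable stop_cv;
bool stop_requested = false;      // protected by stop_mutex
int iterations = 0;               // written by the worker, read after join()

void worker()
{
    std::unique_lock<std::mutex> lock(stop_mutex);
    do {
        lock.unlock();
        ++iterations;             // placeholder for one round of real work
        lock.lock();
        // Sleep until the next round, but return at once when stop is requested.
        // wait_for returns the predicate's value, so true means "stop".
    } while (!stop_cv.wait_for(lock, std::chrono::milliseconds(10),
                               [] { return stop_requested; }));
}

// Starts the worker, lets it run briefly, then stops and joins it.
int run_and_stop()
{
    std::thread t(worker);
    std::this_thread::sleep_for(std::chrono::milliseconds(30));
    {
        std::lock_guard<std::mutex> lock(stop_mutex);
        stop_requested = true;    // only the main thread ever writes the flag
    }
    stop_cv.notify_one();
    t.join();                     // safe to read iterations after this
    assert(iterations >= 1);
    return iterations;
}
```

Because the flag is initialized before the thread starts and only the main thread writes it, the ordering problem between the two stores does not arise.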

The issue is that access to bool m_ThreadMayRun; is not synchronized, and under the C++ rules the compiler may assume that a non-atomic variable is not modified concurrently by another thread. So you end up with a data race (a form of undefined behavior).
To make the intention clear, make it atomic.
std::atomic<bool> m_ThreadMayRun;
With this, every load and store of m_ThreadMayRun uses sequentially consistent ordering by default. That not only makes the flag's own value visible across threads but, through the acquire/release semantics of the load and store, also makes work done by the writing thread before the store visible to a thread that reads the new value.
There is still a small race possible, though, between m_ThreadMayRun = true in the thread and m_ThreadMayRun = false in main. Either can execute first, and if the thread's store comes last, the loop never ends. To avoid this, initialize the flag to true before starting the thread.
#include <atomic>
#include <thread>
std::atomic<bool> m_ThreadMayRun;
void ThreadFunction();
int main(){
    m_ThreadMayRun = true; // set before the thread exists, so only main ever writes it
    std::thread mythread(&ThreadFunction);
    //do stuff
    m_ThreadMayRun = false;
    mythread.join(); // returns once the loop has seen false and the thread has finished
}
void ThreadFunction(){
    initdata();
    while(m_ThreadMayRun){
        //do stuff that can be / has to be done for ever
    }
    deinitdata();
}
For more details about memory fences and acquire/release semantics, refer to the following excellent resources: the book "C++ Concurrency in Action" and Herb Sutter's atomic<> weapons talk.

Using std::condition_variable with atomic<bool>

There are several questions on SO dealing with atomic, and others that deal with std::condition_variable. But my question is whether my use below is correct.
Three threads, one ctrl thread that does preparation work before unpausing the two other threads. The ctrl thread also is able to pause the worker threads (sender/receiver) while they are in their tight send/receive loops.
The idea with using the atomic is to make the tight loops faster in case the boolean for pausing is not set.
class SomeClass
{
public:
//...
// Disregard that data is public...
std::condition_variable cv; // UDP threads will wait on this cv until allowed
// to run by ctrl thread.
std::mutex cv_m;
std::atomic<bool> pause_test_threads;
};
void do_pause_test_threads(SomeClass *someclass)
{
if (!someclass->pause_test_threads)
{
// Even though we use an atomic, mutex must be held during
// modification. See documentation of condition variable
// notify_all/wait. Mutex does not need to be held for the actual
// notify call.
std::lock_guard<std::mutex> lk(someclass->cv_m);
someclass->pause_test_threads = true;
}
}
void unpause_test_threads(SomeClass *someclass)
{
if (someclass->pause_test_threads)
{
{
// Even though we use an atomic, mutex must be held during
// modification. See documentation of condition variable
// notify_all/wait. Mutex does not need to be held for the actual
// notify call.
std::lock_guard<std::mutex> lk(someclass->cv_m);
someclass->pause_test_threads = false;
}
someclass->cv.notify_all(); // Allow send/receive threads to run.
}
}
void wait_to_start(SomeClass *someclass)
{
std::unique_lock<std::mutex> lk(someclass->cv_m); // RAII, no need for unlock.
auto not_paused = [someclass](){return someclass->pause_test_threads == false;};
someclass->cv.wait(lk, not_paused);
}
void ctrl_thread(SomeClass *someclass)
{
// Do startup work
// ...
unpause_test_threads(someclass);
for (;;)
{
// ... check for end-program etc, if so, break;
if (lost ctrl connection to other endpoint)
{
do_pause_test_threads(someclass);
}
else
{
unpause_test_threads(someclass);
}
sleep(SLEEP_INTERVAL);
}
unpause_test_threads(someclass);
}
void sender_thread(SomeClass *someclass)
{
wait_to_start(someclass);
...
for (;;)
{
// ... check for end-program etc, if so, break;
if (someclass->pause_test_threads) wait_to_start(someclass);
...
}
}
void receiver_thread(SomeClass *someclass)
{
wait_to_start(someclass);
...
for (;;)
{
// ... check for end-program etc, if so, break;
if (someclass->pause_test_threads) wait_to_start(someclass);
...
}
}

I looked through your code manipulating the condition variable and the atomic, and it seems correct and should not cause problems.
Why you should protect writes to the shared variable even if it is atomic:
There could be problems if the write to the shared variable happens between checking it in the predicate and waiting on the condition variable. Consider the following sequence:
1. The waiting thread wakes spuriously, acquires the mutex, checks the predicate, and evaluates it to false, so it must wait on the cv again.
2. The controlling thread sets the shared variable to true.
3. The controlling thread sends a notification, which nobody receives, because no thread is waiting on the condition variable.
4. The waiting thread waits on the condition variable. Since the notification was already sent, it waits until the next spurious wakeup or the next time the controlling thread sends a notification, potentially waiting indefinitely.
Reading a shared atomic variable without locking is generally safe, unless it introduces TOCTOU (time-of-check to time-of-use) problems.
In your case you are reading the shared variable to avoid unnecessary locking and then checking it again after locking (in the conditional wait call). It is a valid optimisation, called double-checked locking, and I do not see any potential problems here.
You might want to check whether atomic<bool> is lock-free. Otherwise you will have even more locks than you would have without it.
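The lock-free check can be done at compile time with the ATOMIC_BOOL_LOCK_FREE macro and at run time with is_lock_free(). A minimal sketch follows; bool_lock_free_level and pause_flag_is_lock_free are hypothetical names chosen for this example.

```cpp
#include <atomic>
#include <cassert>

// ATOMIC_BOOL_LOCK_FREE is 2 when std::atomic<bool> is always lock-free,
// 1 when it is sometimes lock-free, and 0 when it never is.
constexpr int bool_lock_free_level = ATOMIC_BOOL_LOCK_FREE;

bool pause_flag_is_lock_free()
{
    std::atomic<bool> pause_test_threads{false};
    const bool result = pause_test_threads.is_lock_free();  // run-time query
    // The run-time answer must agree with a compile-time "never".
    assert(bool_lock_free_level != 0 || !result);
    return result;
}
```

On mainstream desktop and server platforms this is expected to report lock-free, but the standard does not require it, so it is worth checking on unusual targets.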

In general, you want to treat the fact that the variable is atomic independently of how it works with a condition variable.
If all code that interacts with the condition variable follows the usual pattern of locking the mutex before query/modification, and the code interacting with the condition variable does not rely on code that does not interact with the condition variable, it will continue to be correct even if the variable it guards is atomic.
From a quick read of your pseudo-code, this appears to be correct. However, pseudo-code is often a poor substitute for real code when it comes to multi-threading.
The "optimization" of only waiting on the condition variable (and locking the mutex) when an atomic read says you might want to may or may not be an optimization. You need to profile throughput.

Atomic data doesn't need additional synchronization; it is the basis of lock-free algorithms and data structures.
void do_pause_test_threads(SomeClass *someclass)
{
if (!someclass->pause_test_threads)
{
/// your pause_test_threads might be changed here by other thread
/// so you have to acquire mutex before checking and changing
/// or use atomic methods - compare_exchange_weak/strong,
/// but not all together
std::lock_guard<std::mutex> lk(someclass->cv_m);
someclass->pause_test_threads = true;
}
}
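The compare_exchange alternative mentioned in the comment could look roughly like this. It is only a sketch of the check-then-set step; pause_flag and request_pause are hypothetical names. It does not replace the mutex that the condition variable still needs on the unpause path, as explained in the earlier answer.

```cpp
#include <atomic>
#include <cassert>

std::atomic<bool> pause_flag{false};

// Flips pause_flag from false to true in one atomic step.
// Returns true if this call performed the change, false if it was already set.
bool request_pause()
{
    bool expected = false;
    const bool changed = pause_flag.compare_exchange_strong(expected, true);
    // On failure, expected has been overwritten with the value actually found.
    assert(changed || expected);
    return changed;
}
```

Because the test and the write happen as a single operation, no other thread can change the flag between them.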

C++: Thread Safety in a Signal/Slot Library

I'm implementing a Signal/Slot framework, and got to the point that I want it to be thread-safe. I already had a lot of support from the Boost mailing-list, but since this is not really boost-related, I'll ask my pending question here.
When is a signal/slot implementation (or any framework that calls functions outside itself, specified in some way by the user) considered thread-safe? Should it be safe w.r.t. its own data, i.e. the data associated with its implementation details? Or should it also take into account the user's data, which might or might not be modified by whatever functions are passed to the framework?
This is an example given on the mailing-list (Edit: this is an example use-case --i.e. user code--. My code is behind the calls to the Emitter object):
int * somePtr = nullptr;
Emitter<Event> em; // just an object that can emit the 'Event' signal
void mainThread()
{
em.connect<Event>(someFunction);
// now, somehow, 2 threads are created which, at some point
// execute the thread1() and thread2() functions below
}
void someFunction()
{
// can somePtr change after the check but before the set?
if (somePtr)
*somePtr = 17;
}
void cleanupPtr()
{
// this looks safe, but compilers and CPUs can reorder this code:
int *tmp = somePtr;
somePtr = nullptr;
delete tmp;
}
void thread1()
{
em.emit<Event>();
}
void thread2()
{
em.disconnect<Event>(someFunction);
// now safe to cleanup (?)
cleanupPtr();
}
In the above code, it might happen that Event is emitted, causing someFunction to be executed. If somePtr is non-null, but becomes null just after the if, but before the assignment, we're in trouble. From the point of view of thread2, this is not obvious because it is disconnecting someFunction before calling cleanupPtr.
I can see why this could potentially lead to trouble, but whose responsibility is this? Should my library protect the user from using it in every irresponsible but imaginable way?

I suspect there is no clearly good answer, but clarity will come from documenting the guarantees you wish to make about concurrent access to an Emitter object.
One level of guarantee, which to me is what is implied by a promise of thread safety, is that:
Concurrent operations on the object are guaranteed to leave the object in a consistent state (at least, from the point of view of the accessing threads.)
Non-commutative operations will be performed as if they were scheduled serially in some (unknown) order.
Then the question is, what does the emit method promise semantically: passing control to the connected routine, or evaluation of the function? If the former, then your work sounds like it is already done; if the latter, then the 'as-if ordered' requirement would mean that you need to enforce some level of synchronisation.
Users of the library can work with either, provided it is clear what is being promised.
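As an illustration of the weaker of the two promises (emit only passes control to the connected routines), here is a minimal sketch. This Emitter is a hypothetical stand-in written for this example, not the asker's library, and disconnect_all is an invented simplification of disconnect.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical minimal emitter that protects only its own data.
class Emitter {
public:
    using Slot = std::function<void()>;

    void connect(Slot slot)
    {
        assert(slot);                          // refuse empty callables
        std::lock_guard<std::mutex> lock(mutex_);
        slots_.push_back(std::move(slot));
    }

    void disconnect_all()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        slots_.clear();
    }

    // The slot list stays consistent, but a slot may still be running
    // (or about to run) after disconnect_all() has returned.
    void emit()
    {
        std::vector<Slot> snapshot;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            snapshot = slots_;                 // copy under the lock
        }
        for (auto& slot : snapshot) slot();    // call without holding the lock
    }

private:
    std::mutex mutex_;
    std::vector<Slot> slots_;
};
```

With this design the library's own state is safe, but the "now safe to cleanup (?)" assumption in thread2 does not hold. Providing the stronger promise would mean making disconnect wait for running slots, which is the extra synchronisation the answer refers to.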

Firstly the simplest possibility: If you don't claim your library to be thread-safe, you don't have to bother about this.
(But even) if you do:
In your example the user would have to take care of thread-safety, since both functions could be dangerous even without using your event system (IMHO, this is a pretty good way to determine who should take care of this kind of problem). A possible way to do this in C++11 could be:
#include <mutex>
// A mutex is used to control thread access to a shared resource
std::mutex _somePtr_mutex;
int* somePtr = nullptr;
void someFunction()
{
/*
Create a 'lock_guard' to manage your mutex.
Is the mutex '_somePtr_mutex' already locked?
Yes: Wait until it's unlocked.
No: Lock it and continue execution.
*/
std::lock_guard<std::mutex> lock(_somePtr_mutex);
if(somePtr)
*somePtr = 17;
// End of scope: 'lock' gets destroyed and hence unlocks '_somePtr_mutex'
}
void cleanupPtr()
{
/*
Create a 'lock_guard' to manage your mutex.
Is the mutex '_somePtr_mutex' already locked?
Yes: Wait until it's unlocked.
No: Lock it and continue execution.
*/
std::lock_guard<std::mutex> lock(_somePtr_mutex);
int *tmp = somePtr;
somePtr = nullptr;
delete tmp;
// End of scope: 'lock' gets destroyed and hence unlocks '_somePtr_mutex'
}

The last question is easy. If you say your library is threadsafe, it should be threadsafe. It makes no sense to say it is partly threadsafe, or threadsafe only if you do not abuse it. In that case you have to explain exactly what is not threadsafe.
Now to your first question regarding someFunction:
The operation is not atomic, which means the thread can be interrupted between the if and the assignment. And that will happen, I know that :-) The other thread can erase the pointer at any time, even between two short and fast-looking statements.
Now to cleanupPtr:
I am not a compiler expert, but note that marking somePtr volatile is not enough to make the assignment visible to other threads in the order you wrote it. In standard C++, volatile only stops the compiler from caching the value in a register or optimizing the accesses away; it gives neither atomicity nor ordering between threads, and unsynchronized access to a volatile variable from two threads is still a data race.
Even with a single reader thread and a single writer thread, use std::atomic (or a mutex) rather than volatile for the variables you use to exchange information between threads.
For other situations you can use a mutex or atomics. I will give you an example with a mutex. I use C++11 for that, but it works similarly with previous versions of C++ using boost.
Using mutex:
int * somePtr = nullptr;
Emitter<Event> em; // just an object that can emit the 'Event' signal
std::recursive_mutex g_mutex;
void mainThread()
{
em.connect<Event>(someFunction);
// now, somehow, 2 threads are created which, at some point
// execute the thread1() and thread2() functions below
}
void someFunction()
{
std::lock_guard<std::recursive_mutex> lock(g_mutex);
// can somePtr change after the check but before the set?
if (somePtr)
*somePtr = 17;
}
void cleanupPtr()
{
std::lock_guard<std::recursive_mutex> lock(g_mutex);
// this looks safe, but compilers and CPUs can reorder this code:
int *tmp = somePtr;
somePtr = nullptr;
delete tmp;
}
void thread1()
{
em.emit<Event>();
}
void thread2()
{
em.disconnect<Event>(someFunction);
// now safe to cleanup (?)
cleanupPtr();
}
I only added a recursive mutex here without changing any other code of the sample, even if it's now cargo code.
The std offers several kinds of mutex; the two relevant here are std::mutex and std::recursive_mutex. A std::mutex must not be locked again by the thread that already owns it: doing so is undefined behavior and in practice usually deadlocks. That can happen if a method which needs mutex protection calls a public method which uses the same mutex. std::recursive_mutex can be re-locked by the same thread, which is why I prefer it here.
Atomics (or interlocked operations in Win32) are another way, but only to exchange values between threads or access them concurrently. Your example is missing such values, but in your case I would look a little deeper into them (std::atomic).
UPDATE
If you are the user of a library that the developer has not explicitly declared threadsafe, treat it as not threadsafe and shield every call to it with a mutex lock.
To stick with the example: if you cannot change someFunction, then you have to wrap the function like this:
void threadsafeSomeFunction()
{
std::lock_guard<std::recursive_mutex> lock(g_mutex);
someFunction();
}

Simple Thread Synchronization

I need a simple "one at a time" lock on a section of code. Consider the function func which can be run from multiple threads:
void func()
{
// locking/mutex statement goes here
operation1();
operation2();
// corresponding unlock goes here
operation3();
}
I need to make sure that operation1 and operation2 always run "together". With C# I would use a simple lock block around these two calls. What is the C++/Win32/MFC equivalent?
Presumably some sort of Mutex?

Improving on Michael's solution above for C++.
Michael's solution is perfect for C applications. But in C++ this style is discouraged because of the possibility of exceptions. If an exception is thrown in operation1 or operation2, the critical section is never left and all other threads block waiting.
// Perfect solution for C applications
void func()
{
// cs previously initialized via InitializeCriticalSection
EnterCriticalSection(&cs);
operation1();
operation2();
LeaveCriticalSection(&cs);
operation3();
}
// A better solution for C++
class Locker
{
public:
    Locker(CRITICAL_SECTION& cs): m_cs(cs)
    {
        EnterCriticalSection(&m_cs);
    }
    ~Locker()
    {
        // runs on normal exit and during stack unwinding
        LeaveCriticalSection(&m_cs);
    }
private:
    CRITICAL_SECTION& m_cs;
};
void func()
{
// cs previously initialized via InitializeCriticalSection
{
Locker lock(cs);
operation1();
operation2();
}
operation3();
}
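Since C++11 the same RAII idea is available portably as std::lock_guard over a std::mutex. The sketch below swaps in those standard types for the Win32 critical section. The bodies of operation1 and operation2, the counter together, and the helper mutex_released_after_exception are invented purely to demonstrate the unlock during unwinding, and operation3 is omitted.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

std::mutex cs;            // plays the role of the CRITICAL_SECTION
int together = 0;         // hypothetical state touched by operation1

void operation1() { ++together; }
void operation2() { throw std::runtime_error("operation2 failed"); }

void func()
{
    {
        std::lock_guard<std::mutex> lock(cs);  // locks here
        operation1();
        operation2();                          // throws in this sketch
    }                                          // unlocks here, also while unwinding
}

// Returns true if the mutex is free again after func() threw.
bool mutex_released_after_exception()
{
    try { func(); } catch (const std::runtime_error&) {}
    assert(together == 1);                     // operation1 ran before the throw
    if (!cs.try_lock()) return false;
    cs.unlock();
    return true;
}
```

With the plain EnterCriticalSection/LeaveCriticalSection version, the equivalent of try_lock from another thread would fail here, because the exception would skip the matching leave call.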

Critical sections will work (they're lighter-weight than mutexes). InitializeCriticalSection, EnterCriticalSection, LeaveCriticalSection, and DeleteCriticalSection are the functions to look for on MSDN.
void func()
{
// cs previously initialized via InitializeCriticalSection
EnterCriticalSection(&cs);
operation1();
operation2();
LeaveCriticalSection(&cs);
operation3();
}
EDIT:
Critical sections are faster than mutexes because they are primarily user-mode primitives: in the case of an uncontended acquire (usually the common case) there is no system call into the kernel, and acquiring takes on the order of dozens of cycles. A kernel switch is much more expensive (on the order of hundreds of cycles). The only time a critical section calls into the kernel is in order to block, which involves waiting on a kernel primitive (either a mutex or an event). Acquiring a mutex always involves a call into the kernel, and is thus orders of magnitude slower.
However, critical sections can only be used to synchronize resources in one process. In order to synchronize across multiple processes, a mutex is needed.

The best method would be to use a critical section with EnterCriticalSection and LeaveCriticalSection. The only tricky part is that you need to initialize the critical section first with InitializeCriticalSection. If this code is within a class, put the initialization in the constructor and make the CRITICAL_SECTION data structure a member of the class. If the code is not part of a class, you will likely need a global or something similar to ensure it is initialized once.

Using MFC:
1. Define a synchronization object (mutex or critical section).
1.1. If multiple threads belonging to different processes enter func(), use CMutex.
1.2. If multiple threads of the same process enter func(), use CCriticalSection.
2. CSingleLock can be used to ease the usage of synchronization objects.
Let's say we have defined a critical section:
CCriticalSection m_CriticalSection;
void func()
{
// locking/mutex statement goes here
CSingleLock aLock(&m_CriticalSection, TRUE);
// TRUE indicates that the lock is acquired during aLock creation.
// If FALSE is used, call aLock.Lock() to lock.
operation1();
operation2();
// corresponding unlock goes here
aLock.Unlock();
operation3();
}
EDIT: Refer to the VC++ articles on MSDN: Multithreading with C++ and MFC Classes, and Multithreading: How to Use the Synchronization Classes

You can try this:
void func()
{
// See answer by Sasha on how to create the mutex
WaitForSingleObject (mutex, INFINITE);
operation1();
operation2();
ReleaseMutex(mutex);
operation3();
}
