Is there any potential problem with double-check lock for C++?

Is there any potential problem with double-check lock for C++? - c++

Here is a simple code snippet for demonstration.
Somebody told me that the double-check lock is incorrect. Since the variable is non-volatile, the compiler is free to reorder the calls or optimize them away(For details, see codereview.stackexchange.com/a/266302/226000).
But I really saw such a code snippet is used in many projects indeed. Could somebody shed some light on this matter? I googled and talked about it with my friends, but I still can't find out the answer.
#include <iostream>
#include <mutex>
#include <fstream>
namespace DemoLogger
{
void InitFd()
{
if (!is_log_file_ready)
{
std::lock_guard<std::mutex> guard(log_mutex);
if (!is_log_file_ready)
{
log_stream.open("sdk.log", std::ofstream::out | std::ofstream::trunc);
is_log_file_ready = true;
}
}
}
extern static bool is_log_file_ready;
extern static std::mutex log_mutex;
extern static std::ofstream log_stream;
}
//cpp
namespace DemoLogger
{
bool is_log_file_ready{false};
std::mutex log_mutex;
std::ofstream log_stream;
}
UPDATE:
Thanks to all of you. There is better implementation for InitFd() indeed, but it's only a simple demo indeed, what I really want to know is that whether there is any potential problem with double-check lock or not.
For the complete code snippet, see https://codereview.stackexchange.com/questions/266282/c-logger-by-template.

The double-checked lock is incorrect because is_log_file_ready is a plain bool, and this flag can be accessed by multiple threads one of which is a writer - that is a race
The simple fix is to change the declaration:
std::atomic<bool> is_log_file_ready{false};
You can then further relax operations on is_log_file_ready:
void InitFd()
{
if (!is_log_file_ready.load(std::memory_order_acquire))
{
std::lock_guard<std::mutex> guard(log_mutex);
if (!is_log_file_ready.load(std::memory_order_relaxed))
{
log_stream.open("sdk.log", std::ofstream::out | std::ofstream::trunc);
is_log_file_ready.store(true, std::memory_order_release);
}
}
}
But in general, double-checked locking should be avoided except in low-level implementations.
As suggested by Arthur P. Golubev, C++ offers primitives to do this, such as std::call_once
Update:
Here's an example that shows one of the problems a race can cause.
#include <thread>
#include <atomic>
using namespace std::literals::chrono_literals;
int main()
{
int flag {0}; // wrong !
std::thread t{[&] { while (!flag); }};
std::this_thread::sleep_for(20ms);
flag = 1;
t.join();
}
The sleep is there to give the thread some time to initialize.
This program should return immediately, but compiled with full optimization -O3, it probably doesn't. This is caused by a valid compiler transformation, that changes the while-loop into something like this:
if (flag) return; while(1);
And if flag is (still) zero, this will run forever (changing the flag type to std::atomic<int> will solve this).
This is only one of the effects of undefined behavior, the compiler does not even have to commit the change to flag to memory.
With a race, or incorrectly set (or missing) barriers, operations can also be re-ordered causing unwanted effects, but these are less likely to occur on X86 since it is a generally more forgiving platform than weaker architectures (although re-ordering effects do exist on X86)

Somebody told me that the double-check lock is incorrect
It usually is.
IIRC double-checked locking originated in Java (whose more strongly-specified memory model made it viable).
From there it spread a plague of ill-informed and incorrect C++ code, presumably because it looks enough like Java to be vaguely plausible.
Since the variable is non-volatile
Double-checked locking cannot be made correct by using volatile for synchronization, because that's not what volatile is for.
Java is perhaps also the source of this misuse of volatile, since it means something entirely different there.
Thanks for linking to the review that suggested this, I'll go and downvote it.
But I really saw such a code snippet is used in many projects indeed. Could somebody shed some light on this matter?
As I say, it's a plague, or really I suppose a harmful meme in the original sense.
I googled and talked about it with my friends, but I still can't find out the answer.
... Is there any potential problem with double-check lock for C++?
There are nothing but problems with double-checked locking for C++. Almost nobody should ever use it. You should probably never copy code from anyone who does use it.
In preference order:
Just use a static local, which is even less effort and still guaranteed to be correct - in fact:
If multiple threads attempt to initialize the same static local variable concurrently, the initialization occurs exactly once (similar behavior can be obtained for arbitrary functions with std::call_once).
Note: usual implementations of this feature use variants of the double-checked locking pattern, which reduces runtime overhead for already-initialized local statics to a single non-atomic boolean comparison.
so you can get correct double-checked locking for free.
Use std::call_once if you need more elaborate initialization and don't want to package it into a class
Use (if you must) double-checked locking with a std::atomic_flag or std::atomic_bool flag and never volatile.

There is nothing to optimize away here (no commands to be excluded, see the details below), but there are the following:
It is possible that is_log_file is set to true before log_stream opens the file; and then another thread is possible to bypass the outer if block code and start using the stream before the std::ofstream::open has completed.
It could be solved by using std::atomic_thread_fence(std::memory_order_release); memory barrier before setting the flag to true.
Also, a compiler is forbidden to reorder accesses to volatile objects on the same thread (https://en.cppreference.com/w/cpp/language/as_if), but, as for the code specifically, the available set of operator << functions and write function of std::ofstream just is not for volatile objects - it would not be possible to write in the stream if make it volatile (and making volatile only the flag would not permit the reordering).
Note, a protection for is_log_file flag from a data race with C++ standard library means releasing std::memory_order_release or stronger memory order - the most reasonable would be std::atomic/std::atomic_bool (see LWimsey's answer for the sample of the code) - would make reordering impossible because the memory order
Formally, an execution with a data race is considered to be causing undefined behaviour - which in the double-checked lock is actual for is_log_file flag. In a conforming to the standard of the language code, the flag must be protected from a data race (the most reasonable way to do it would be using std::atomic/std::atomic_bool).
Though, in practice, if the compiler is not insane so that intentionally spoils your code (some people wrongly consider undefined behaviour to be what occurs in run-time and does not relate to compilation, but standard operates undefined behaviour to regulate compilation) under the pretext it is allowed everything if undefined behavior is caused (by the way, must be documented; see details of compiling C++ code with a data race in: https://stackoverflow.com/a/69062080/1790694
) and at the same time if it implements bool reasonably, so that consider any non-zero physical value as true (it would be reasonable since it must convert arithmetics, pointers and some others to bool so), there will never be a problem with partial setting the flag to true (it would not cause a problem when reading); so the only memory barrier std::atomic_thread_fence(std::memory_order_release); before setting the flag to true, so that reordering is prevented, would make your code work without problems.
At https://en.cppreference.com/w/cpp/language/storage_duration#Static_local_variables you can read that implementations of initialization of static local variables since C++11 (which you also should consider to use for one-time actions in general, see the note about what to consider for one-time actions in general below) usually use variants of the double-checked locking pattern, which reduces runtime overhead for already-initialized local statics to a single non-atomic boolean comparison.
This is an examples of exactly that environment-dependent safety of a non-atomic flag which I stated above. But it should be understood that these solutions are environment-dependent, and, since they are parts of implementations of the compilers themselves, but not a program using the compilers, there is no concern of conforming to the standard there.
To make your program corresponding to the standard of the language and be protected (as far as the standard is implemented) against a compiler implementation details liberty, you must protect the flag from data races, and the most reasonable then, would be using std::atomic or std::atomic_bool.
Note, even without protection of the flag from data races:
because of the mutex, it is not possible that any thread would not get updates after changing values (both the bool flag and the std::ofstream object) by some thread.
The mutex implements the memory barrier, and if we don’t have the update when checking the flag in the first condition clause, we then get it then come to the mutex, and so guaranteedly have the updated value when checking the flag in the second condition clause.
because the flag can unobservably be potentially accessed in unpredictable ways from other translation units, the compiler would not be able to avoid writes and reads to the flag under the as-if rule even if the other code of translation unit would be so senseless (such as setting the flag to true and then starting the threads so that no resets to false accessible) that it would be permitted in case the flag is not accessible from other translation units.
For one-time actions in general besides raw protection with flags and mutexes consider using:
std::call_once (https://en.cppreference.com/w/cpp/thread/call_once);
calling a function for initializing a static local variable (https://en.cppreference.com/w/cpp/language/storage_duration#Static_local_variables) if its lifetime suits since its initialization is data race safety (be careful in regards to the fact that data race safety of initialization of static local variables is present only since C++11).
All the mentioned multi-threading functionality is available since C++11 (but, since you are already using std::mutex which is available starting since it too, this is the case).
Also, you should correctly handle the cases of opening the file failure.
Also, everyone must protect your std::ofstream object from concurrent operations of writing to the stream.
Answering the additional question from the update of the question, there are no problems with properly implemented double-check lock, and the proper implementation is possible in C++.

Related

c++ double check if statement optimization with a lock

My question is not about double check locking but similar.
In the following code, is it possible for the compiler to optimize the inner if statement out?
void MyClass:foo()
{
if(x) //member data of MyClass defined as bool
{
loc.lock(); //mutex. Unlocks when out of scope
if(x) //can the compiler optimize this out?
{
fire1();//doSomthingNotAffectingX
}
}
}
X gets set via another thread in the same translation unit.
void MyClass::unset()
{
loc.lock();
x=false;
fire2();
}
The idea is to guarantee that if fire2 was called, fire1 can't be called

In my answer, I assume you are using std::mutex and not your own custom class!
Essentially, the compiler cannot optimize away the second if.
Well, if the compiler could somehow determine that x cannot be changed (legally), it might remove the check but clearly, this is not the case here. A compiler can only do optimization if the resulting program works AS-IF.
In C++ 11 and later, a mutex is a barrier. So if a variable is properly protected by a mutex, you would read the expected value. However, it you forgot to put some mutex at appropriate location, then the behavior will be undefined.
Here, as your first check is a read, you might have a problem, if x can again become true one after being set once to false.
Memory model
C++11 introduced a standardized memory model. What does it mean? And how is it going to affect C++ programming?
I highly recommend reading the book C++ Concurrency in Action, Practical Multithreading by Anthony Williams. It would help a lot understanding modern C++ multithreading.
In your code, do you need to protect fire1 and fire2 by the mutex? Calling a function can be dangerous if that function also wait for a mutex as it can cause deadlocks. Also, the lock might be effective longer than required. If you only need to ensure that fire1is not called if fire2 is called, then a std::atomic<bool> would be enough…
-- EDIT --
Finally, there is a good example in the documentation: std::mutex
-- EDIT #2 --
As pointed out in a comment, my answer is not fully valid for general cases. I do think that the above code would works correctly with a bool provide that it can only change from true to false.
So I have done a search on the web and found those articles:
Double-Checked Locking is Fixed In C++11
Double-checked locking: Why doesn’t it work and how to fix it

If the compiler doesn't know someone other than loc.lock() might change x, and it knows loc.lock() won't change x for sure, then it can assume x won't change between the ifs. In this case, it is likely to store x in a register, or omit the if altogether.
If you want to avoid such unsafe optimizations, you have to let the compiler know. On C++11, use std::atomic<bool>. On earlier versions, make it volatile bool. Do neither, and the compiler might break your code.

Volatile vs. memory fences

The code below is used to assign work to multiple threads, wake them up, and wait until they are done. The "work" in this case consists of "cleaning a volume". What exactly this operation does is irrelevant for this question -- it just helps with the context. The code is part of a huge transaction processing system.
void bf_tree_cleaner::force_all()
{
for (int i = 0; i < vol_m::MAX_VOLS; i++) {
_requested_volumes[i] = true;
}
// fence here (seq_cst)
wakeup_cleaners();
while (true) {
usleep(10000); // 10 ms
bool remains = false;
for (int vol = 0; vol < vol_m::MAX_VOLS; ++vol) {
// fence here (seq_cst)
if (_requested_volumes[vol]) {
remains = true;
break;
}
}
if (!remains) {
break;
}
}
}
A value in a boolean array _requested_volumes[i] tells whether thread i has work to do. When it is done, the worker thread sets it to false and goes back to sleep.
The problem I am having is that the compiler generates an infinite loop, where the variable remains is always true, even though all values in the array have been set to false. This only happens with -O3.
I have tried two solutions to fix that:
Declare _requested_volumes volatile
(EDIT: this solution does work actually. See edit below)
Many experts say that volatile has nothing to do with thread synchronization, and it should only be used in low-level hardware accesses. But there's a lot of dispute over this on the Internet. The way I understand it, volatile is the only way to refrain the compiler from optimizing away accesses to memory which is changed outside of the current scope, regardless of concurrent access. In that sense, volatile should do the trick, even if we disagree on best practices for concurrent programming.
Introduce memory fences
The method wakeup_cleaners() acquires a pthread_mutex_t internally in order to set a wake-up flag in the worker threads, so it should implicitly produce proper memory fences. But I'm not sure if those fences affect memory accesses in the caller method (force_all()). Therefore, I manually introduced fences in the locations specified by the comments above. This should make sure that writes performed by the worker thread in _requested_volumes are visible in the main thread.
What puzzles me is that none of these solutions works, and I have absolutely no idea why. The semantics and proper use of memory fences and volatile is confusing me right now. The problem is that the compiler is applying an undesired optimization -- hence the volatile attempt. But it could also be a problem of thread synchronization -- hence the memory fence attempt.
I could try a third solution in which a mutex protects every access to _requested_volumes, but even if that works, I would like to understand why, because as far as I understand, it's all about memory fences. Thus, it should make no difference whether it's done explicitly or implicitly via a mutex.
EDIT: My assumptions were wrong and Solution 1 actually does work. However, my question remains in order to clarify the use of volatile vs. memory fences. If volatile is such a bad thing, that should never be used in multithreaded programming, what else should I use here? Do memory fences also affect compiler optimizations? Because I see these as two orthogonal issues, and therefore orthogonal solutions: fences for visibility in multiple threads and volatile for preventing optimizations.

Many experts say that volatile has nothing to do with thread synchronization, and it should only be used in low-level hardware accesses.
Yes.
But there's a lot of dispute over this on the Internet.
Not, generally, between "the experts".
The way I understand it, volatile is the only way to refrain the compiler from optimizing away accesses to memory which is changed outside of the current scope, regardless of concurrent access.
Nope.
Non-pure, non-constexpr non-inlined function calls (getters/accessors) also necessarily have this effect. Admittedly link-time optimization confuses the issue of which functions may really get inlined.
In C, and by extension C++, volatile affects memory access optimization. Java took this keyword, and since it can't (or couldn't) do the tasks C uses volatile for in the first place, altered it to provide a memory fence.
The correct way to get the same effect in C++ is using std::atomic.
In that sense, volatile should do the trick, even if we disagree on best practices for concurrent programming.
No, it may have the desired effect, depending on how it interacts with your platform's cache hardware. This is brittle - it could change any time you upgrade a CPU, or add another one, or change your scheduler behaviour - and it certainly isn't portable.
If you're really just tracking how many workers are still working, sane methods might be a semaphore (synchronized counter), or mutex+condvar+integer count. Either are likely more efficient than busy-looping with a sleep.
If you're wedded to the busy loop, you could still reasonably have a single counter, such as std::atomic<size_t>, which is set by wakeup_cleaners and decremented as each cleaner completes. Then you can just wait for it to reach zero.
If you really want a busy loop and really prefer to scan the array each time, it should be an array of std::atomic<bool>. That way you can decide what consistency you need from each load, and it will control both the compiler optimizations and the memory hardware appropriately.

Apparently, volatile does the necessary for your example. The topic of volatile qualifier itself is too broad: you can start by searching "C++ volatile vs atomic" etc. There are a lot of articles and questions&answers on the internet, e.g. Concurrency: Atomic and volatile in C++11 memory model .
Briefly, volatile tells the compiler to disable some aggressive optimizations, particularly, to read the variable each time it is accessed (rather than storing it in a register or cache). There are compilers which do more so making volatile to act more like std::atomic: see Microsoft Specific section here. In your case disablement of an aggressive optimization is exactly what was necessary.
However, volatile doesn't define the order for the execution of the statements around it. That is why you need memory order in case you need to do something else with the data after the flags you check have been set.
For inter-thread communication it is appropriate to use std::atomic, particularly, you need to refactor _requested_volumes[vol] to be of type std::atomic<bool> or even std::atomic_flag: http://en.cppreference.com/w/cpp/atomic/atomic .
An article that discourages usage of volatile and explains that volatile can be used only in rare special cases (connected with hardware I/O): https://www.kernel.org/doc/Documentation/volatile-considered-harmful.txt

Is it ok to read a shared boolean flag without locking it when another thread may set it (at most once)?

I would like my thread to shut down more gracefully so I am trying to implement a simple signalling mechanism. I don't think I want a fully event-driven thread so I have a worker with a method to graceully stop it using a critical section Monitor (equivalent to a C# lock I believe):
DrawingThread.h
class DrawingThread {
bool stopRequested;
Runtime::Monitor CSMonitor;
CPInfo *pPInfo;
//More..
}
DrawingThread.cpp
void DrawingThread::Run() {
if (!stopRequested)
//Time consuming call#1
if (!stopRequested) {
CSMonitor.Enter();
pPInfo = new CPInfo(/**/);
//Not time consuming but pPInfo must either be null or constructed.
CSMonitor.Exit();
}
if (!stopRequested) {
pPInfo->foobar(/**/);//Time consuming and can be signalled
}
if (!stopRequested) {
//One more optional but time consuming call.
}
}
void DrawingThread::RequestStop() {
CSMonitor.Enter();
stopRequested = true;
if (pPInfo) pPInfo->RequestStop();
CSMonitor.Exit();
}
I understand (at least in Windows) Monitor/locks are the least expensive thread synchronization primitive but I am keen to avoid overuse. Should I be wrapping each read of this boolean flag? It is initialized to false and only set once to true when stop is requested (if it is requested before the task completes).
My tutors advised to protect even bool's because read/writing may not be atomic. I think this one shot flag is the exception that proves the rule?

It is never OK to read something possibly modified in a different thread without synchronization. What level of synchronization is needed depends on what you are actually reading. For primitive types, you should have a look at atomic reads, e.g. in the form of std::atomic<bool>.
The reason synchronization is always needed is that the processors will have the data possibly shared in a cache line. It has no reason to update this value to a value possibly changed in a different thread if there is no synchronization. Worse, yet, if there is no synchronization it may write the wrong value if something stored close to the value is changed and synchronized.

Boolean assignment is atomic. That's not the problem.
The problem is that a thread may not not see changes to a variable done by a different thread due to either compiler or CPU instruction reordering or data caching (i.e. the thread that reads the boolean flag may read a cached value, instead of the actual updated value).
The solution is a memory fence, which indeed is implicitly added by lock statements, but for a single variable it's overkill. Just declare it as std::atomic<bool>.

The answer, I believe, is "it depends." If you're using C++03, threading isn't defined in the Standard, and you'll have to read what your compiler and your thread library say, although this kind of thing is usually called a "benign race" and is usually OK.
If you're using C++11, benign races are undefined behavior. Even when undefined behavior doesn't make sense for the underlying data type. The problem is that compilers can assume that programs have no undefined behavior, and make optimizations based on that (see also the Part 1 and Part 2 linked from there). For instance, your compiler could decide to read the flag once and cache the value because it's undefined behavior to write to the variable in another thread without some kind of mutex or memory barrier.
Of course, it may well be that your compiler promises to not make that optimization. You'll need to look.
The easiest solution is to use std::atomic<bool> in C++11, or something like Hans Boehm's atomic_ops elsewhere.

No, you have to protect every access, since modern compilers and cpus reorder the code without your multithreading tasks in mind. The read access from different threads might work, but don't have to work.

Are mutex lock functions sufficient without volatile?

A coworker and I write software for a variety of platforms running on x86, x64, Itanium, PowerPC, and other 10 year old server CPUs.
We just had a discussion about whether mutex functions such as pthread_mutex_lock() ... pthread_mutex_unlock() are sufficient by themselves, or whether the protected variable needs to be volatile.
int foo::bar()
{
//...
//code which may or may not access _protected.
pthread_mutex_lock(m);
int ret = _protected;
pthread_mutex_unlock(m);
return ret;
}
My concern is caching. Could the compiler place a copy of _protected on the stack or in a register, and use that stale value in the assignment? If not, what prevents that from happening? Are variations of this pattern vulnerable?
I presume that the compiler doesn't actually understand that pthread_mutex_lock() is a special function, so are we just protected by sequence points?
Thanks greatly.
Update: Alright, I can see a trend with answers explaining why volatile is bad. I respect those answers, but articles on that subject are easy to find online. What I can't find online, and the reason I'm asking this question, is how I'm protected without volatile. If the above code is correct, how is it invulnerable to caching issues?

Simplest answer is volatile is not needed for multi-threading at all.
The long answer is that sequence points like critical sections are platform dependent as is whatever threading solution you're using so most of your thread safety is also platform dependent.
C++0x has a concept of threads and thread safety but the current standard does not and therefore volatile is sometimes misidentified as something to prevent reordering of operations and memory access for multi-threading programming when it was never intended and can't be reliably used that way.
The only thing volatile should be used for in C++ is to allow access to memory mapped devices, allow uses of variables between setjmp and longjmp, and to allow uses of sig_atomic_t variables in signal handlers. The keyword itself does not make a variable atomic.
Good news in C++0x we will have the STL construct std::atomic which can be used to guarantee atomic operations and thread safe constructs for variables. Until your compiler of choice supports it you may need to turn to the boost library or bust out some assembly code to create your own objects to provide atomic variables.
P.S. A lot of the confusion is caused by Java and .NET actually enforcing multi-threaded semantics with the keyword volatile C++ however follows suit with C where this is not the case.

Your threading library should include the apropriate CPU and compiler barriers on mutex lock and unlock. For GCC, a memory clobber on an asm statement acts as a compiler barrier.
Actually, there are two things that protect your code from (compiler) caching:
You are calling a non-pure external function (pthread_mutex_*()), which means that the compiler doesn't know that that function doesn't modify your global variables, so it has to reload them.
As I said, pthread_mutex_*() includes a compiler barrier, e.g: on glibc/x86 pthread_mutex_lock() ends up calling the macro lll_lock(), which has a memory clobber, forcing the compiler to reload variables.

If the above code is correct, how is it invulnerable to caching
issues?
Until C++0x, it is not. And it is not specified in C. So, it really depends on the compiler. In general, if the compiler does not guarantee that it will respect ordering constraints on memory accesses for functions or operations that involve multiple threads, you will not be able to write multithreaded safe code with that compiler. See Hans J Boehm's Threads Cannot be Implemented as a Library.
As for what abstractions your compiler should support for thread safe code, the wikipedia entry on Memory Barriers is a pretty good starting point.
(As for why people suggested volatile, some compilers treat volatile as a memory barrier for the compiler. It's definitely not standard.)

The volatile keyword is a hint to the compiler that the variable might change outside of program logic, such as a memory-mapped hardware register that could change as part of an interrupt service routine. This prevents the compiler from assuming a cached value is always correct and would normally force a memory read to retrieve the value. This usage pre-dates threading by a couple decades or so. I've seen it used with variables manipulated by signals as well, but I'm not sure that usage was correct.
Variables guarded by mutexes are guaranteed to be correct when read or written by different threads. The threading API is required to ensure that such views of variables are consistent. This access is all part of your program logic and the volatile keyword is irrelevant here.

With the exception of the simplest spin lock algorithm, mutex code is quite involved: a good optimized mutex lock/unlock code contains the kind of code even excellent programmer struggle to understand. It uses special compare and set instructions, manages not only the unlocked/locked state but also the wait queue, optionally uses system calls to go into a wait state (for lock) or wake up other threads (for unlock).
There is no way the average compiler can decode and "understand" all that complex code (again, with the exception of the simple spin lock) no matter way, so even for a compiler not aware of what a mutex is, and how it relates to synchronization, there is no way in practice a compiler could optimize anything around such code.
That's if the code was "inline", or available for analyse for the purpose of cross module optimization, or if global optimization is available.
I presume that the compiler doesn't actually understand that
pthread_mutex_lock() is a special function, so are we just protected
by sequence points?
The compiler does not know what it does, so does not try to optimize around it.
How is it "special"? It's opaque and treated as such. It is not special among opaque functions.
There is no semantic difference with an arbitrary opaque function that can access any other object.
My concern is caching. Could the compiler place a copy of _protected
on the stack or in a register, and use that stale value in the
assignment?
Yes, in code that act on objects transparently and directly, by using the variable name or pointers in a way that the compiler can follow. Not in code that might use arbitrary pointers to indirectly use variables.
So yes between calls to opaque functions. Not across.
And also for variables which can only be used in the function, by name: for local variables that don't have either their address taken or a reference bound to them (such that the compiler cannot follow all further uses). These can indeed be "cached" across arbitrary calls include lock/unlock.
If not, what prevents that from happening? Are variations of this
pattern vulnerable?
Opacity of the functions. Non inlining. Assembly code. System calls. Code complexity. Everything that make compilers bail out and think "that's complicated stuff just make calls to it".
The default position of a compiler is always the "let's execute stupidly I don't understand what is being done anyway" not "I will optimize that/let's rewrite the algorithm I know better". Most code is not optimized in complex non local way.
Now let's assume the absolute worse (from out point of view which is that the compiler should give up, that is the absolute best from the point of view of an optimizing algorithm):
the function is "inline" (= available for inlining) (or global optimization kicks in, or all functions are morally "inline");
no memory barrier is needed (as in a mono-processor time sharing system, and in a multi-processor strongly ordered system) in that synchronization primitive (lock or unlock) so it contains no such thing;
there is no special instruction (like compare and set) used (for example for a spin lock, the unlock operation is a simple write);
there is no system call to pause or wake threads (not needed in a spin lock);
then we might have a problem as the compiler could optimize around the function call. This is fixed trivially by inserting a compiler barrier such as an empty asm statement with a "clobber" for other accessible variables. That means that compiler just assumes that anything that might be accessible to a called function is "clobbered".
or whether the protected variable needs to be volatile.
You can make it volatile for the usual reason you make things volatile: to be certain to be able to access the variable in the debugger, to prevent a floating point variable from having the wrong datatype at runtime, etc.
Making it volatile would actually not even fix the issue described above as volatile is essentially a memory operation in the abstract machine that has the semantics of an I/O operation and as such is only ordered with respect to
real I/O like iostream
system calls
other volatile operations
asm memory clobbers (but then no memory side effect is reordered around those)
calls to external functions (as they might do one the above)
Volatile is not ordered with respect to non volatile memory side effects. That makes volatile practically useless (useless for practical uses) for writing thread safe code in even the most specific case where volatile would a priori help, the case where no memory fence is ever needed: when programming threading primitives on a time sharing system on a single CPU. (That may be one of the least understood aspects of either C or C++.)
So while volatile does prevent "caching", volatile doesn't even prevent compiler reordering of lock/unlock operation unless all shared variables are volatile.

Locks/synchronisation primitives make sure the data is not cached in registers/cpu cache, that means data propagates to memory. If two threads are accessing/ modifying data with in locks, it is guaranteed that data is read from memory and written to memory. We don't need volatile in this use case.
But the case where you have code with double checks, compiler can optimise the code and remove redundant code, to prevent that we need volatile.
Example: see singleton pattern example
https://en.m.wikipedia.org/wiki/Singleton_pattern#Lazy_initialization
Why do some one write this kind of code?
Ans: There is a performance benefit of not accuiring lock.
PS: This is my first post on stack overflow.

Not if the object you're locking is volatile, eg: if the value it represents depends on something foreign to the program (hardware state).
volatile should NOT be used to denote any kind of behavior that is the result of executing the program.
If it's actually volatile what I personally would do is locking the value of the pointer/address, instead of the underlying object.
eg:
volatile int i = 0;
// ... Later in a thread
// ... Code that may not access anything without a lock
std::uintptr_t ptr_to_lock = &i;
some_lock(ptr_to_lock);
// use i
release_some_lock(ptr_to_lock);
Please note that it only works if ALL the code ever using the object in a thread locks the same address. So be mindful of that when using threads with some variable that is part of an API.

Can a C/C++ compiler legally cache a variable in a register across a pthread library call?

Suppose that we have the following bit of code:
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
void guarantee(bool cond, const char *msg) {
if (!cond) {
fprintf(stderr, "%s", msg);
exit(1);
}
}
bool do_shutdown = false; // Not volatile!
pthread_cond_t shutdown_cond = PTHREAD_COND_INITIALIZER;
pthread_mutex_t shutdown_cond_mutex = PTHREAD_MUTEX_INITIALIZER;
/* Called in Thread 1. Intended behavior is to block until
trigger_shutdown() is called. */
void wait_for_shutdown_signal() {
int res;
res = pthread_mutex_lock(&shutdown_cond_mutex);
guarantee(res == 0, "Could not lock shutdown cond mutex");
while (!do_shutdown) { // while loop guards against spurious wakeups
res = pthread_cond_wait(&shutdown_cond, &shutdown_cond_mutex);
guarantee(res == 0, "Could not wait for shutdown cond");
}
res = pthread_mutex_unlock(&shutdown_cond_mutex);
guarantee(res == 0, "Could not unlock shutdown cond mutex");
}
/* Called in Thread 2. */
void trigger_shutdown() {
int res;
res = pthread_mutex_lock(&shutdown_cond_mutex);
guarantee(res == 0, "Could not lock shutdown cond mutex");
do_shutdown = true;
res = pthread_cond_signal(&shutdown_cond);
guarantee(res == 0, "Could not signal shutdown cond");
res = pthread_mutex_unlock(&shutdown_cond_mutex);
guarantee(res == 0, "Could not unlock shutdown cond mutex");
}
Can a standards-compliant C/C++ compiler ever cache the value of do_shutdown in a register across the call to pthread_cond_wait()? If not, which standards/clauses guarantee this?
The compiler could hypothetically know that pthread_cond_wait() does not modify do_shutdown. This seems rather improbable, but I know of no standard that prevents it.
In practice, do any C/C++ compilers cache the value of do_shutdown in a register across the call to pthread_cond_wait()?
Which function calls is the compiler guaranteed not to cache the value of do_shutdown across? It's clear that if the function is declared externally and the compiler cannot access its definition, it must make no assumptions about its behavior so it cannot prove that it does not access do_shutdown. If the compiler can inline the function and prove it does not access do_shutdown, then can it cache do_shutdown even in a multithreaded setting? What about a non-inlined function in the same compilation unit?

Of course the current C and C++ standards say nothing on the subject.
As far as I know, Posix still avoids formally defining a concurrency model (I may be out of date, though, in which case apply my answer only to earlier Posix versions). Therefore what it does say has to be read with a little sympathy - it does not precisely lay out the requirements in this area, but implementers are expected to "know what it means" and do something that makes threads usable.
When the standard says that mutexes "synchronize memory access", implementations must assume that this means changes made under the lock in one thread will be visible under the lock in other threads. In other words, it's necessary (although not sufficient) that synchronization operations include memory barriers of one kind or another, and necessary behaviour of a memory barrier is that it must assume globals can change.
Threads Cannot be Implemented as a Library covers some specific issues that are required for a pthreads to actually be usable, but are not explicitly stated in the Posix standard at the time of writing (2004). It becomes quite important whether your compiler-writer, or whoever defined the memory model for your implementation, agrees with Boehm what "usable" means, in terms of allowing the programmer to "reason convincingly about program correctness".
Note that Posix doesn't guarantee a coherent memory cache, so if your implementation perversely wants to cache do_something in a register in your code, then even if you marked it volatile, it might perversely choose not to dirty your CPU's local cache between the synchronizing operation and reading do_something. So if the writer thread is running on a different CPU with its own cache, you might not see the change even then.
That's (one reason) why threads cannot be implemented merely as a library. This optimization of fetching a volatile global only from local CPU cache would be valid in a single-threaded C implementation[*], but breaks multi-threaded code. Hence, the compiler needs to "know about" threads, and how they affect other language features (for an example outside pthreads: on Windows, where cache is always coherent, Microsoft spells out the additional semantics that it grants volatile in multi-threaded code). Basically, you have to assume that if your implementation has gone to the trouble of providing the pthreads functions, then it will go to the trouble of defining a workable memory model in which locks actually synchronize memory access.
If the compiler can inline the
function and prove it does not access
do_shutdown, then can it cache
do_shutdown even in a multithreaded
setting? What about a non-inlined
function in the same compilation unit?
Yes to all of this - if the object is non-volatile, and the compiler can prove that this thread doesn't modify it (either through its name or through an aliased pointer), and if no memory barriers occur, then it can reuse previous values. There can and will be other implementation-specific conditions that sometimes stop it, of course.
[*] provided that the implementation knows the global is not located at some "special" hardware address which requires that reads always go through cache to main memory in order to see the results of whatever hardware op affects that address. But to put a global at any such location, or to make its location special with DMA or whatever, requires implementation-specific magic. Absent any such magic the implementation in principle can sometimes know this.

Since do_shutdown has external linkage there's no way the compiler could know what happens to it across the call (unless it had full visibility to the functions being called). So it would have to reload the value (volatile or not - threading has no bearing on this) after the call.
As far as I know there's nothing directly said about this in the standard, except that the (single-threaded) abstract machine the standard uses to define the behavior of expressions indicates that the variable needs to be read when it's accessed in an expression. The standard permits that reading of the variable to be optimized away only if the behavior can be proven to be "as if" it were reloaded. And that can happen only if the compiler can know that the value was not modified by the function call.
Also not that the pthread library does make certain guarantees about memory barriers for various functions, including pthread_cond_wait(): Does guarding a variable with a pthread mutex guarantee it's also not cached?
Now, if do_shutdown were static (no external linkage) and you have several threads that used that static variable defined in the same module (ie., the address of the static variable was never taken to be passed to another module), That might be a different story. for example, say that you have a single function that used such a variable, and started several thread instances running for that function. In that case, a standards conforming compiler implementation might cache the value across function calls since it could assume that nothing else could modify the value (the standard's abstract machine model doesn't include threading).
So in that case, you would have to use mechanisms to ensure that the value was reloaded across the call. Note that because of hardware intricacies, the volatile keyword might not be adequate to ensure correct memory access ordering - you should rely on APIs provided by pthreads or the OS to ensure that. (as a side-note, recent versions of Microsoft's compilers do document that volatile enforce full memory barriers, but I've read opinions that indicate this isn't required by the standard).

The hand-waving answers are all wrong. Sorry to be harsh.
There is no way
The compiler could hypothetically know that pthread_cond_wait() does not modify do_shutdown.
If you believe differently, please show proof: a complete C++ program such that a compiler not designed for MT could deduce that pthread_cond_wait does not modify do_shutdown.
It's absurd, a compiler cannot possibly understand what pthread_ functions do, unless it has built-in knowledge of POSIX threads.

From my own work, I can say that yes, the compiler can cache values across pthread_mutex_lock/pthread_mutex_unlock. I spent most of a weekend tracing down a bug in a bit of code that was caused by a set of pointers assignments being cached and unavailable to the threads that needed them. As a quick test, I wrapped the assignments in a mutex lock/unlock, and the threads still did not have access to the proper pointer values. Moving the pointer assignments & associated mutex locking to a separate function did fix the problem.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js