Why does libc++ allow recursive locking of std::mutex? - c++

std::mutex is non-recursive, and violating that is UB. So in theory anything is possible (including behaving like std::recursive_mutex), but libc++ seems to work fine; this program outputs
bye
#include <iostream>
#include <mutex>
std::mutex m;
int main() {
std::scoped_lock l1(m);
std::scoped_lock l2(m);
std::cout << "bye" << std::endl;
}
Is this an intentional design decision in libc++ or just an accident (for example, they could be using the same logic for mutex and recursive_mutex)?
libstdc++ hangs.
Note: I am aware that people should not rely on UB, so this is not about best practices; I am just curious about obscure implementation details.

It does not seem to be an intentional design decision. The libc++ implementation of std::mutex is just a wrapper around the platform's default POSIX mutex. Recursively locking that mutex is also UB, so libc++ simply inherits whatever the platform's default POSIX mutex happens to do, which in this case is to allow recursive locking.

I get the opposite results: libc++ hangs and libstdc++ doesn't.
The reason is that with libstdc++, if the file is not compiled with -pthread, threading support is disabled and std::mutex::lock/unlock become no-ops. Adding -pthread makes both of them deadlock as expected.
libc++ is built with threading support by default and doesn't require the -pthread flag, so std::mutex::lock does actually acquire a lock, creating the deadlock.
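For code that really does need to re-lock on the same thread, the well-defined alternative mentioned in the question is std::recursive_mutex. The following is only a minimal sketch; the helper name nested_lock_depth is made up for illustration and does not come from either library.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical helper (not from the question): locks the same
// std::recursive_mutex `depth` times on the calling thread and
// returns how many nested locks were held at the deepest point.
int nested_lock_depth(std::recursive_mutex& m, int depth) {
    assert(depth >= 0);
    if (depth == 0) {
        return 0;
    }
    // Re-locking a recursive_mutex on the owning thread is well-defined,
    // unlike re-locking a std::mutex.
    std::lock_guard<std::recursive_mutex> guard(m);
    return 1 + nested_lock_depth(m, depth - 1);
}
```

Each lock_guard releases one level of ownership when it goes out of scope, so the mutex is fully unlocked once the outermost call returns.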

Related

Is there any potential problem with double-check lock for C++?

Here is a simple code snippet for demonstration.
Somebody told me that the double-check lock is incorrect. Since the variable is non-volatile, the compiler is free to reorder the calls or optimize them away(For details, see codereview.stackexchange.com/a/266302/226000).
But I really saw such a code snippet is used in many projects indeed. Could somebody shed some light on this matter? I googled and talked about it with my friends, but I still can't find out the answer.
#include <iostream>
#include <mutex>
#include <fstream>
namespace DemoLogger
{
// Declared before use; "extern static" is not valid C++, so plain extern is used.
extern bool is_log_file_ready;
extern std::mutex log_mutex;
extern std::ofstream log_stream;
void InitFd()
{
if (!is_log_file_ready) // unsynchronized read: this is what the question is about
{
std::lock_guard<std::mutex> guard(log_mutex);
if (!is_log_file_ready)
{
log_stream.open("sdk.log", std::ofstream::out | std::ofstream::trunc);
is_log_file_ready = true;
}
}
}
}
//cpp
namespace DemoLogger
{
bool is_log_file_ready{false};
std::mutex log_mutex;
std::ofstream log_stream;
}
UPDATE:
Thanks to all of you. There is indeed a better implementation of InitFd(), but this is only a simple demo; what I really want to know is whether there is any potential problem with double-check lock or not.
For the complete code snippet, see https://codereview.stackexchange.com/questions/266282/c-logger-by-template.
The double-checked lock is incorrect because is_log_file_ready is a plain bool, and this flag can be accessed by multiple threads, one of which is a writer. That is a data race.
The simple fix is to change the declaration:
std::atomic<bool> is_log_file_ready{false};
You can then further relax operations on is_log_file_ready:
void InitFd()
{
if (!is_log_file_ready.load(std::memory_order_acquire))
{
std::lock_guard<std::mutex> guard(log_mutex);
if (!is_log_file_ready.load(std::memory_order_relaxed))
{
log_stream.open("sdk.log", std::ofstream::out | std::ofstream::trunc);
is_log_file_ready.store(true, std::memory_order_release);
}
}
}
But in general, double-checked locking should be avoided except in low-level implementations.
As suggested by Arthur P. Golubev, C++ offers primitives to do this, such as std::call_once.
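A minimal sketch of the std::call_once variant might look like the following. The names log_init_flag and init_count are assumptions added for illustration (init_count exists only so the effect can be observed); they are not part of the original code.

```cpp
#include <cassert>
#include <fstream>
#include <mutex>

namespace DemoLogger
{
std::once_flag log_init_flag;  // hypothetical name, replaces the bool flag and the mutex
std::ofstream log_stream;
int init_count = 0;            // illustration only: counts how often the initializer ran

void InitFd()
{
    // std::call_once runs the callable exactly once, even if several
    // threads call InitFd() concurrently; later calls return immediately.
    std::call_once(log_init_flag, [] {
        log_stream.open("sdk.log", std::ofstream::out | std::ofstream::trunc);
        ++init_count;
    });
}
}
```

Calling DemoLogger::InitFd() repeatedly leaves init_count at 1. Error handling for a failed open is left out of this sketch.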
Update:
Here's an example that shows one of the problems a race can cause.
#include <atomic>
#include <chrono>
#include <thread>
using namespace std::literals::chrono_literals;
int main()
{
int flag {0}; // wrong !
std::thread t{[&] { while (!flag); }};
std::this_thread::sleep_for(20ms);
flag = 1;
t.join();
}
The sleep is there to give the thread some time to initialize.
This program should return immediately, but compiled with full optimization -O3, it probably doesn't. This is caused by a valid compiler transformation, that changes the while-loop into something like this:
if (flag) return; while(1);
And if flag is (still) zero, this will run forever (changing the flag type to std::atomic<int> will solve this).
This is only one of the effects of undefined behavior; the compiler does not even have to commit the change to flag to memory.
With a race, or with incorrectly set (or missing) barriers, operations can also be reordered, causing unwanted effects. These are less likely to occur on x86, since it is generally a more forgiving platform than weaker architectures (although reordering effects do exist on x86).
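For comparison, here is a sketch of the same experiment with the flag made atomic, as suggested above. The function name run_until_flagged is invented for this example, and the explicit acquire/release orderings are one reasonable choice rather than the only correct one.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical wrapper around the corrected example: returns true once
// the worker thread has observed the flag and has been joined.
bool run_until_flagged()
{
    std::atomic<int> flag{0};  // atomic, so the loop below must re-read it
    std::thread t{[&] {
        while (!flag.load(std::memory_order_acquire)) {
            // spin until the main thread sets the flag
        }
    }};
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    flag.store(1, std::memory_order_release);
    t.join();
    return true;
}
```

Because the load is atomic, the compiler may not hoist it out of the loop, so the worker terminates shortly after the store.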
Somebody told me that the double-check lock is incorrect
It usually is.
IIRC double-checked locking originated in Java (whose more strongly-specified memory model made it viable).
From there it spread a plague of ill-informed and incorrect C++ code, presumably because it looks enough like Java to be vaguely plausible.
Since the variable is non-volatile
Double-checked locking cannot be made correct by using volatile for synchronization, because that's not what volatile is for.
Java is perhaps also the source of this misuse of volatile, since it means something entirely different there.
Thanks for linking to the review that suggested this, I'll go and downvote it.
But I really saw such a code snippet is used in many projects indeed. Could somebody shed some light on this matter?
As I say, it's a plague, or really I suppose a harmful meme in the original sense.
I googled and talked about it with my friends, but I still can't find out the answer.
... Is there any potential problem with double-check lock for C++?
There are nothing but problems with double-checked locking for C++. Almost nobody should ever use it. You should probably never copy code from anyone who does use it.
In preference order:
1. Just use a static local, which is even less effort and still guaranteed to be correct. In fact:
"If multiple threads attempt to initialize the same static local variable concurrently, the initialization occurs exactly once (similar behavior can be obtained for arbitrary functions with std::call_once).
Note: usual implementations of this feature use variants of the double-checked locking pattern, which reduces runtime overhead for already-initialized local statics to a single non-atomic boolean comparison."
So you can get correct double-checked locking for free.
2. Use std::call_once if you need more elaborate initialization and don't want to package it into a class.
3. Use (if you must) double-checked locking with a std::atomic_flag or std::atomic_bool flag, and never volatile.
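As a rough sketch of the first option applied to the logger from the question: the accessor name LogStream is hypothetical, and the file name is simply carried over from the question's code.

```cpp
#include <cassert>
#include <fstream>

namespace DemoLogger
{
// Hypothetical accessor replacing InitFd(), the bool flag and the mutex.
// Since C++11 the initialization of a static local happens exactly once,
// even when several threads reach this line at the same time.
std::ofstream& LogStream()
{
    static std::ofstream stream("sdk.log", std::ofstream::out | std::ofstream::trunc);
    return stream;
}
}
```

Every caller gets a reference to the same stream object. Note that this only makes the initialization thread-safe; concurrent writes to the stream still need their own synchronization.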
There is nothing to optimize away here (no instructions can be elided; see the details below), but there is the following problem:
It is possible that is_log_file_ready is set to true before log_stream opens the file; another thread can then bypass the outer if block and start using the stream before std::ofstream::open has completed.
This could be solved by placing a memory barrier, std::atomic_thread_fence(std::memory_order_release);, before setting the flag to true.
Also, a compiler is forbidden to reorder accesses to volatile objects on the same thread (https://en.cppreference.com/w/cpp/language/as_if). However, for this code specifically, the operator<< overloads and the write function of std::ofstream are not available for volatile objects, so it would not be possible to write to the stream if it were made volatile (and making only the flag volatile would not prevent the reordering).
Note that protecting the is_log_file_ready flag from a data race with the C++ standard library implies std::memory_order_release or a stronger memory order. The most reasonable choice would be std::atomic/std::atomic_bool (see LWimsey's answer for sample code), which would also make the reordering impossible because of the memory order.
Formally, an execution with a data race is considered to cause undefined behaviour, which in the double-checked lock applies to the is_log_file_ready flag. In code that conforms to the language standard, the flag must be protected from a data race (the most reasonable way to do it would be using std::atomic/std::atomic_bool).
In practice, though, two conditions make the plain flag workable. First, the compiler must not be so perverse that it intentionally spoils your code on the pretext that anything is allowed once undefined behaviour is invoked (some people wrongly consider undefined behaviour to be something that occurs at run time and does not relate to compilation, but the standard uses undefined behaviour to regulate compilation; by the way, it must be documented; see details of compiling C++ code with a data race in: https://stackoverflow.com/a/69062080/1790694). Second, it must implement bool reasonably, treating any non-zero physical value as true (which would be reasonable, since it must convert arithmetic types, pointers and some others to bool that way). Then there will never be a problem with the flag being only partially set to true (it would not cause a problem when reading), so a single memory barrier, std::atomic_thread_fence(std::memory_order_release);, before setting the flag to true, which prevents the reordering, would make your code work without problems.
At https://en.cppreference.com/w/cpp/language/storage_duration#Static_local_variables you can read that implementations of initialization of static local variables since C++11 (which you also should consider to use for one-time actions in general, see the note about what to consider for one-time actions in general below) usually use variants of the double-checked locking pattern, which reduces runtime overhead for already-initialized local statics to a single non-atomic boolean comparison.
This is an example of exactly the environment-dependent safety of a non-atomic flag that I described above. But it should be understood that these solutions are environment-dependent and that, since they are part of the compiler implementations themselves rather than of a program using the compilers, conformance to the standard is not a concern there.
To make your program conform to the language standard and be protected (as far as the standard is implemented) against the liberties of compiler implementation details, you must protect the flag from data races; the most reasonable way to do that would be using std::atomic or std::atomic_bool.
Note that, even without protecting the flag from data races:
because of the mutex, no thread can fail to see the updated values (both the bool flag and the std::ofstream object) after some thread has changed them.
The mutex acts as a memory barrier: if we do not yet see the update when checking the flag in the first condition, we get it when we reach the mutex, and so are guaranteed to have the updated value when checking the flag in the second condition.
because the flag can potentially be accessed in unpredictable, unobservable ways from other translation units, the compiler cannot elide the writes and reads of the flag under the as-if rule, even if the rest of the translation unit were so senseless (such as setting the flag to true and then starting the threads, with no reachable reset to false) that eliding them would be permitted if the flag were not accessible from other translation units.
For one-time actions in general besides raw protection with flags and mutexes consider using:
std::call_once (https://en.cppreference.com/w/cpp/thread/call_once);
calling a function that initializes a static local variable (https://en.cppreference.com/w/cpp/language/storage_duration#Static_local_variables) if its lifetime suits you, since its initialization is data-race safe (be careful: data-race safety of static local variable initialization is guaranteed only since C++11).
All the multi-threading functionality mentioned here is available since C++11 (and since you are already using std::mutex, which was also introduced in C++11, that requirement is met).
Also, you should correctly handle failure to open the file.
Also, you must protect your std::ofstream object from concurrent writes to the stream.
Answering the additional question from the update of the question: there are no problems with a properly implemented double-checked lock, and a proper implementation is possible in C++.

Is atomic<T*> always lock free?

On my macOS machine, atomic<T*> is lock-free.
#include <iostream>
#include <atomic>
int main() {
std::cout << std::atomic<void*>().is_lock_free() << std::endl;
return 0;
}
output: 1
I want to know whether atomic<T*> is always lock-free.
Is there a reference that covers this?
The standard allows any atomic type (with the exception of std::atomic_flag) to be implemented with locks. Even if the platform allows lock-free atomics for some type, the standard library developers might not have implemented them.
If you need to implement something differently when locks are used, this can be checked at compile time using the ATOMIC_POINTER_LOCK_FREE macro.
No, it is not safe to assume that any particular platform's implementation of std::atomic is always lock free.
The standard specifies some marker macros, including ATOMIC_POINTER_LOCK_FREE, which indicates whether pointers are never, sometimes or always lock-free on the platform in question.
You can also get an answer from std::atomic<T *>::is_always_lock_free, for your particular T (see note 1).
Note 1: A given pointer type must be consistent, so the instance method std::atomic<T *>::is_lock_free() is redundant.
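The checks mentioned in these answers can be sketched as follows. The names pointer_lock_free_level and pointer_atomic_is_lock_free_now are invented for this example; the C++17 member is_always_lock_free is only used when the library advertises it through its feature-test macro.

```cpp
#include <atomic>
#include <cassert>

// ATOMIC_POINTER_LOCK_FREE is 0 (never), 1 (sometimes) or 2 (always lock-free).
constexpr int pointer_lock_free_level = ATOMIC_POINTER_LOCK_FREE;

// Run-time query on one particular object (hypothetical helper name).
bool pointer_atomic_is_lock_free_now()
{
    std::atomic<void*> p{nullptr};
    return p.is_lock_free();
}

#ifdef __cpp_lib_atomic_is_always_lock_free
// Compile-time query, available since C++17.
constexpr bool pointer_always_lock_free = std::atomic<void*>::is_always_lock_free;
#endif
```

Code that requires lock-freedom could add a static_assert on pointer_lock_free_level == 2 so that a platform without that guarantee fails to build instead of silently falling back to locks.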

Why does this double mutex lock not cause deadlock?

I am testing the C++11 mutex on my CentOS machine. I tried to lock the mutex twice to cause a deadlock, but when I run the program everything is fine and no deadlock occurs.
#include <thread>
#include <mutex>
#include <iostream>
std::mutex m;
int main()
{
m.lock();
m.lock();
std::cout<<"i am ok"<<std::endl;
return 0;
}
The compiler is g++ 4.8.5 in centos 3.10.0-327.36.3.el7.x86_64:
[zzhao010#localhost shareLibPlay]$ ./3.out
i am ok
Locking a std::mutex that is already locked by the same thread is undefined behavior and therefore it may work, it may fail, it may drink all your beer and throw up on the couch. No guarantees.
The behavior is undefined if you invoke lock twice as you did.
That it appears to work as you would expect is indeed one valid outcome of undefined behavior.
See here for further details.
For a deadlock, you need at least two
By definition, a deadlock involves at least two parties. This was laid down by many authors, among others Hoare in his pioneering work Communicating Sequential Processes. It is also reflected in the C++ standard's definitions (emphasis mine):
17.3.8: Deadlock: one or more threads are unable to continue execution because
each is blocked waiting for one or more of the others to satisfy some
condition
A more illustrative definition is given by Anthony Williams in C++ Concurrency in Action:
Neither thread can proceed, because each is waiting for the other to release its mutex. This scenario is called deadlock and it's the biggest problem with having to lock two or more mutexes.
By definition, therefore, you cannot create a deadlock with a single thread in a single process.
Don't misunderstand the standard
The standard says on mutexes:
30.4.1.2.1/4 [Note: A program may deadlock if the thread that owns a mutex object calls lock() on that object.]
This is a non-normative note. I think it embarrassingly contradicts the standard's own definition. From a terminology point of view, a process that locks itself is in a blocked state.
But more importantly, and beyond the issue of deadlock terminology, the word "may" allows this behavior for C++ implementations (e.g. if an implementation is not able to detect a redundant lock acquisition on a particular OS). It is not required at all: I believe that most mainstream C++ implementations will work fine, exactly as you have experienced yourself.
Want to experiment with deadlocks?
If you want to experiment with real deadlocks, or if you simply want to find out whether your C++ implementation is able to detect the resource_deadlock_would_occur error, here is a short example. It could go fine, but it has a high probability of creating a deadlock:
#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>
std::mutex m1,m2;
void foo() {
m1.lock();
std::cout<<"foo locked m1"<<std::endl;
std::this_thread::sleep_for (std::chrono::seconds(1));
m2.lock();
m1.unlock();
std::cout<<"foo locked m2 and releases m1"<<std::endl;
m2.unlock();
std::cout<<"foo is ok"<<std::endl;
}
void bar() {
m2.lock();
std::cout<<"bar locked m2"<<std::endl;
std::this_thread::sleep_for (std::chrono::seconds(1));
m1.lock();
m2.unlock();
std::cout<<"bar locked m1 and releases m2"<<std::endl;
m1.unlock();
std::cout<<"bar is ok"<<std::endl;
}
int main()
{
std::thread t1(foo);
bar();
t1.join();
std::cout << "Everything went fine"<<std::endl;
return 0;
}
This kind of deadlock is avoided by always locking the different mutexes in the same order.
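Besides a fixed locking order, the standard library offers std::lock, which acquires several mutexes using a deadlock-avoidance algorithm. This is a different technique from the ordering rule above, shown here only as a sketch; foo_safe, bar_safe and shared_counter are names made up for this example.

```cpp
#include <cassert>
#include <mutex>

std::mutex m1, m2;
int shared_counter = 0;  // hypothetical state guarded by both mutexes

void foo_safe()
{
    // std::lock acquires both mutexes without deadlocking, regardless of
    // the order in which other threads pass them.
    std::lock(m1, m2);
    std::lock_guard<std::mutex> g1(m1, std::adopt_lock);  // take over ownership
    std::lock_guard<std::mutex> g2(m2, std::adopt_lock);
    ++shared_counter;
}

void bar_safe()
{
    std::lock(m2, m1);  // opposite argument order is still safe
    std::lock_guard<std::mutex> g1(m1, std::adopt_lock);
    std::lock_guard<std::mutex> g2(m2, std::adopt_lock);
    ++shared_counter;
}
```

With C++17, std::scoped_lock guard(m1, m2); expresses the same thing in one line.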

Linking pthread disables lock-free shared_ptr implementation

The title pretty much conveys all relevant information, but here's a minimal repro:
#include <atomic>
#include <cstdio>
#include <memory>
int main() {
auto ptr = std::make_shared<int>(0);
bool is_lockless = std::atomic_is_lock_free(&ptr);
printf("shared_ptr is lockless: %d\n", is_lockless);
}
Compiling this with the following compiler options produces a lock-free shared_ptr implementation:
g++ -std=c++11 -march=native main.cpp
While this doesn't:
g++ -std=c++11 -march=native -pthread main.cpp
GCC version: 5.3.0 (on Linux, using libstdc++), tested on multiple machines that should have the necessary atomic instructions to make this work.
Is there any way to force the lock-free implementation (I'd need the lock-free version, regardless of performance)?
There are two separate things:
1. Manipulation of the reference counter in the control block (or equivalent thing) is typically implemented with lock-free atomics whenever possible. This is not what std::atomic_is_lock_free tells you.
libstdc++'s __shared_ptr is templated on the lock policy, so you can explicitly use
template<typename T>
using shared_ptr_unsynchronized = std::__shared_ptr<T, __gnu_cxx::_S_single>;
if you know what you are doing.
2. std::atomic_is_lock_free tells you whether the atomic access functions (std::atomic_{store, load, exchange, compare_exchange} etc.) on shared_ptr are lock-free. Those functions are used to concurrently access the same shared_ptr object, and typical implementations will use a mutex.
If you use shared_ptr in a threaded environment, you NEED to have locks [of some kind - they could be implemented as atomic increment and decrement, but there may be places where a "bigger" lock is required to ensure no races]. The lockless version only works when there is only one thread. If you are not using threads, don't link with -lpthread.
I'm sure there is some tricky way to convince the compiler that you are not REALLY using the threads for your shared pointers, but you are REALLY in fragile territory if you do - what happens if a shared_ptr is passed to a thread? You may be able to guarantee that NOW, but someone will probably accidentally or on purpose introduce one into something that runs in a different thread, and it all breaks.

Do any C++11 thread-safety guarantees apply to third-party thread libraries compiled/linked with C++11?

C++11 offers features like thread-safe initialization of static variables, and citing that question we'll say for instance:
Logger& g_logger() {
static Logger lg;
return lg;
}
So ostensibly (?) this is true regardless of whether a module compiled with a C++11 compiler included the thread headers, or spawned any threads in its body. You're offered the guarantee even if it were linked against another module that used C++11 threads and called the function.
But what if your "other module" that calls into this code wasn't using C++11 threads, but something like Qt's QThread. Is atomic initialization of statics then outside of the scope of C++11's ability to make such a guarantee? Or does the mere fact of a module having been compiled with C++11 and then linked against other C++11 code imply that you will get the guarantee regardless?
Does anyone know a good reference where issues like this are covered?
Does anyone know a good reference where issues like this are covered?
Sure. The C++ standard. It describes the behavior of C++ code. If your library is C++ code, it is required to follow this behavior. So yes, you get the same guarantees as if it had been your own code.
Exactly how the compiler/runtime/OS/everything else pulls it off is Not Your Problem. The C++ standard guarantees that it is taken care of.
Your example relies on the memory model, not on how threads are implemented. Whoever executes this code will execute the same instructions. If two or more cores execute this code, they will obey the memory model.
The basic implementation is equivalent to this:
std::mutex mtx;
Logger * lg = 0;
Logger& g_logger() {
std::unique_lock<std::mutex> lck(mtx);
if (lg == 0)
lg = new Logger;
return *lg;
}
This code may be optimized to use the double-checked locking pattern (DCLP) which, on a particular processor architecture (e.g., on the x86) might be much faster. Also, because the compiler generates this code, it will know not to make crazy optimizations that break the naive DCLP.
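The guarantee can be observed with a small experiment. This sketch uses std::thread purely for convenience, whereas the question asks about third-party thread libraries; the counter logger_constructions and the helper construct_from_threads are hypothetical names added for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<int> logger_constructions{0};  // illustration only

struct Logger {
    Logger() { ++logger_constructions; }   // count every construction
};

Logger& g_logger() {
    static Logger lg;  // initialized exactly once, even under contention
    return lg;
}

// Hypothetical helper: calls g_logger() from n threads and returns how
// many Logger objects were constructed in total.
int construct_from_threads(int n) {
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([] { g_logger(); });
    }
    for (auto& t : threads) {
        t.join();
    }
    return logger_constructions.load();
}
```

However many threads race to call g_logger(), the constructor runs once and the helper returns 1.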