Do global reference capturing lambdas in C++ inhibit alias optimisations?

A question came up while debugging some code at work for race conditions. Here is a reduced example:
//! Schedules a callable to be executed asynchronously
template<class F> void schedule(F &&f);
int main(void)
{
bool flag(false);
// Ignore the fact this is thread unsafe :)
schedule([&] { flag=true; });
// Can the compiler assume under strict aliasing that this check
// for flag being false can be eliminated?
if(!flag)
{
// do something
}
return 0;
}
Obviously the code fragment is thread unsafe - that bool flag needs to be a std::atomic and then the seq_cst memory ordering would force the compiler to always check the value being tested by if. This question isn't about that - it's about whether initialising a capture-all reference lambda tells the compiler that flag may have been aliased, and therefore to not constexpr elide the check for flag's value later on under optimisation?
My own personal guess is that constructing a [&flag] {...} lambda would suggest potential aliasing of flag, while a [&] {...} clobbering all auto initialised variables with being potentially aliased sounds too extreme an anti-optimisation so I'm guessing no to that. However, I would not be surprised if reference capturing lambdas don't alias clobber anything at all.
Over to you C++ language experts! And my thanks in advance.
Edit: I knew that the lack of thread safety would be seen as an answer, however that is not what I am asking. Let me reduce my example still further:
int main(void)
{
bool flag(false);
// Note that this is not invoked, just constructed.
auto foo=[&] { flag=true; };
// Can the compiler assume under strict aliasing that this check
// for flag being false can be eliminated?
if(!flag)
{
// do something
}
return 0;
}
Now can that check for flag being false be elided?
Edit: For those of you coming here in the future, my best understanding of the answers below is "yes, the check can be elided" i.e. constructing a lambda which takes a local variable by reference is not considered by the compiler's optimiser as potentially modifying that variable, and therefore the compiler's optimiser could legally elide subsequent reloads of that variable's storage. Thanks to everyone for your answers.

You can't ignore the lack of thread safety. Data races yield undefined behaviour and this code has a data race, so the answer to "Can the compiler assume under strict aliasing that this check for flag being false can be eliminated?" is "The compiler can do whatever it wants."
If you fix that and make the code thread safe with a std::atomic<bool>, the question disappears: the compiler cannot discard the check because it has to conform to the memory model requirements of atomic variables.
If instead the schedule call didn't do anything related to multithreading, the compiler has to preserve the semantics of the abstract machine. The abstract machine does check the value of the flag, but a compiler might be able to perform static analysis that proves the flag will always have a certain value (either always false or always true) and skip the check. That's allowed under the as-if rule: it is not possible to write a C++ program that can reliably tell the difference between the two possibilities (optimise or not).
So, for the second example, the answer is "The compiler can do whatever it wants, as long as the observable behaviour is the same as if it performed the check."
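As a minimal sketch of the thread-safe variant mentioned above: the question never shows how schedule is implemented, so the version below is a hypothetical stand-in that simply runs the callable on a std::thread, and run_scheduled_task is an illustrative name. With std::atomic<bool> the read before join() is well defined and the compiler cannot simply assume the flag is still false.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical stand-in for the question's schedule(): runs f on a new
// thread and returns the thread so that the caller can join it.
template <class F>
std::thread schedule(F&& f) {
    return std::thread(std::forward<F>(f));
}

// Returns the value of the flag after the scheduled task has finished.
bool run_scheduled_task() {
    std::atomic<bool> flag{false};
    std::thread worker = schedule([&] { flag.store(true); });

    // Before join() this load may observe either value, but it is a
    // well-defined atomic read, not a data race.
    bool seen_early = flag.load();
    (void)seen_early;

    // join() synchronizes with the completion of the worker thread,
    // so the store made by the lambda is visible afterwards.
    worker.join();
    return flag.load();
}
```

After the join the function always returns true; what the early load sees depends on timing.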

Related

Passing std::memory_order as constexpr differs from when it is not constexpr

This is a question regarding std::memory_order.
I have created the following two functions which are to be run on separate threads.
constexpr std::memory_order order = std::memory_order_relaxed;
void write() // writer thread
{
x.store(true, order);
y.store(true, order);
}
void read() // reader thread
{
while (!y.load(order)) {}
assert(x.load(order));
}
On an Arm host, the assert can fire, as the memory order I pass to the store and load methods is std::memory_order_relaxed. However, if I strip the "constexpr" off, the assert never fires.
It seems to me that the std::memory_order passed to the store and load methods must be known at compile time, and that if it is only determined at run time, the behaviour is undefined.
Am I correct? If that is true, is there any C++ documentation emphasizing this limitation?
constexpr doesn't matter. The assertion can (but doesn't need to) fail in either case. You are just getting lucky.
There is no undefined behavior if the order is not a compile-time constant.
One possibility (although it is impossible to tell without seeing the generated assembly) is that, without the constexpr, the compiler wasn't able to determine that the value of order is a compile-time constant and so didn't apply some optimization/transformation based on that.
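If the goal is for the assertion never to fire, relaxed ordering is not enough whether or not it is constexpr. A minimal sketch of one standard-conforming fix, pairing a release store on y with an acquire load, follows. The names write_flags, read_flags and run_once are illustrative and not from the question.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> x{false};
std::atomic<bool> y{false};

void write_flags() {  // writer thread
    x.store(true, std::memory_order_relaxed);
    // Release: everything written before this store becomes visible to a
    // thread that observes y == true through an acquire load.
    y.store(true, std::memory_order_release);
}

bool read_flags() {  // reader thread
    while (!y.load(std::memory_order_acquire)) {}
    // The acquire load above synchronizes with the release store,
    // so this relaxed load is guaranteed to see x == true.
    return x.load(std::memory_order_relaxed);
}

// Runs one writer and one reader and reports what the reader saw in x.
bool run_once() {
    x.store(false);
    y.store(false);
    bool seen = false;
    std::thread reader([&] { seen = read_flags(); });
    std::thread writer(write_flags);
    writer.join();
    reader.join();
    return seen;
}
```

With this pairing the result of run_once() is true on every conforming implementation, not only on strongly ordered hardware.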
As user17732522 says, it is perfectly legal for the memory_order to not be a constant, and the behavior is well defined. There is nowhere in the standard that says it must be a constant.
I am guessing you are using gcc as your compiler. It appears that with gcc specifically, when the memory_order argument is not a compile-time constant, it simply ignores the argument and emits an unconditional seq_cst operation. This is certainly legal; it is always safe to provide stronger ordering than you requested. They probably think that the performance difference between seq_cst and a potentially weaker order is small, and not worth the cost of a sequence of tests and branches to select different instructions at runtime.
(Hopefully it goes without saying that you should not rely on this behavior! If you only request relaxed ordering, your code had better be correct in the event you actually get relaxed ordering.)
Try on godbolt. With gcc, on all the architectures I tested, the function unknown() has identical assembly code to cst(), which is different from rel() (relaxed store).
Testing a couple other compilers:
icc: emits seq_cst unconditionally
icx: test and branch
clang: test and branch
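The godbolt code itself is not reproduced above, so the following is only a plausible reconstruction of the three functions named there (rel, cst and unknown). It shows that passing the order as a run-time value is ordinary, well-defined C++.

```cpp
#include <atomic>
#include <cassert>

// Order known at compile time: relaxed store.
void rel(std::atomic<bool>& a) {
    a.store(true, std::memory_order_relaxed);
}

// Order known at compile time: sequentially consistent store.
void cst(std::atomic<bool>& a) {
    a.store(true, std::memory_order_seq_cst);
}

// Order only known at run time. This is well defined as long as the
// caller passes an order that is valid for a store (relaxed, release
// or seq_cst). The implementation is free to use a stronger ordering
// than the one requested.
void unknown(std::atomic<bool>& a, std::memory_order order) {
    a.store(true, order);
}
```

Whether unknown() compiles to a branch on order or to an unconditional seq_cst store is a quality-of-implementation choice, as the compiler comparison above shows.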

Is there any potential problem with double-check lock for C++?

Here is a simple code snippet for demonstration.
Somebody told me that the double-check lock is incorrect. Since the variable is non-volatile, the compiler is free to reorder the calls or optimize them away (for details, see codereview.stackexchange.com/a/266302/226000).
But I have seen this kind of code used in many projects. Could somebody shed some light on this matter? I googled and talked about it with my friends, but I still can't find the answer.
#include <iostream>
#include <mutex>
#include <fstream>
namespace DemoLogger
{
    // Declared before use; "extern static" is ill-formed, so plain extern is used.
    extern bool is_log_file_ready;
    extern std::mutex log_mutex;
    extern std::ofstream log_stream;

    void InitFd()
    {
        if (!is_log_file_ready)
        {
            std::lock_guard<std::mutex> guard(log_mutex);
            if (!is_log_file_ready)
            {
                log_stream.open("sdk.log", std::ofstream::out | std::ofstream::trunc);
                is_log_file_ready = true;
            }
        }
    }
}
//cpp
namespace DemoLogger
{
bool is_log_file_ready{false};
std::mutex log_mutex;
std::ofstream log_stream;
}
UPDATE:
Thanks to all of you. There are indeed better implementations of InitFd(), but this is only a simple demo. What I really want to know is whether there is any potential problem with the double-check lock itself.
For the complete code snippet, see https://codereview.stackexchange.com/questions/266282/c-logger-by-template.
The double-checked lock is incorrect because is_log_file_ready is a plain bool, and this flag can be accessed by multiple threads, one of which is a writer. That is a data race.
The simple fix is to change the declaration:
std::atomic<bool> is_log_file_ready{false};
You can then further relax operations on is_log_file_ready:
void InitFd()
{
if (!is_log_file_ready.load(std::memory_order_acquire))
{
std::lock_guard<std::mutex> guard(log_mutex);
if (!is_log_file_ready.load(std::memory_order_relaxed))
{
log_stream.open("sdk.log", std::ofstream::out | std::ofstream::trunc);
is_log_file_ready.store(true, std::memory_order_release);
}
}
}
But in general, double-checked locking should be avoided except in low-level implementations.
As suggested by Arthur P. Golubev, C++ offers primitives to do this, such as std::call_once.
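A minimal sketch of the std::call_once approach. To keep it free of side effects, the file open is replaced by a counter; init_flag, init_count and init_from_many_threads are hypothetical names introduced only for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

namespace DemoLogger
{
    std::once_flag init_flag;
    std::atomic<int> init_count{0};  // hypothetical: counts how often the init body ran

    void InitFd()
    {
        // The callable runs exactly once, even when InitFd() is called
        // concurrently; other callers block until it has completed.
        std::call_once(init_flag, [] {
            // log_stream.open("sdk.log", ...) would go here.
            init_count.fetch_add(1);
        });
    }
}

// Calls InitFd() from n threads and returns how often the init body ran.
int init_from_many_threads(int n)
{
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back(DemoLogger::InitFd);
    for (auto& t : threads)
        t.join();
    return DemoLogger::init_count.load();
}
```

No hand-written flag or mutex is needed; the once_flag carries all the required synchronization.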
Update:
Here's an example that shows one of the problems a race can cause.
#include <atomic>
#include <chrono>
#include <thread>
using namespace std::literals::chrono_literals;
int main()
{
    int flag {0}; // wrong !
    std::thread t{[&] { while (!flag); }};
    std::this_thread::sleep_for(20ms);
    flag = 1;
    t.join();
}
The sleep is there to give the thread some time to initialize.
This program should return immediately, but compiled with full optimization -O3, it probably doesn't. This is caused by a valid compiler transformation, that changes the while-loop into something like this:
if (flag) return; while(1);
And if flag is (still) zero, this will run forever (changing the flag type to std::atomic<int> will solve this).
This is only one of the effects of undefined behavior, the compiler does not even have to commit the change to flag to memory.
With a race, or with incorrectly set (or missing) barriers, operations can also be reordered, causing unwanted effects. These are less likely to occur on x86, since it is generally a more forgiving platform than weaker architectures (although reordering effects do exist on x86).
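For comparison, a sketch of the same experiment with the flag changed to std::atomic<int>, wrapped in a function (the name wait_for_flag is made up here) so that the result can be checked. The explicit acquire/release orders are one reasonable choice; the default seq_cst would work as well.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Returns true once the spinning thread has seen the flag and exited.
bool wait_for_flag()
{
    using namespace std::chrono_literals;
    std::atomic<int> flag{0};
    bool finished = false;

    std::thread t{[&] {
        // Each iteration performs a real atomic load, so the loop
        // cannot be turned into "if (flag) return; while (1);".
        while (!flag.load(std::memory_order_acquire)) {}
        finished = true;
    }};

    std::this_thread::sleep_for(20ms);
    flag.store(1, std::memory_order_release);
    t.join();  // join() makes the write to finished visible here
    return finished;
}
```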
Somebody told me that the double-check lock is incorrect
It usually is.
IIRC double-checked locking originated in Java (whose more strongly-specified memory model made it viable).
From there it spread a plague of ill-informed and incorrect C++ code, presumably because it looks enough like Java to be vaguely plausible.
Since the variable is non-volatile
Double-checked locking cannot be made correct by using volatile for synchronization, because that's not what volatile is for.
Java is perhaps also the source of this misuse of volatile, since it means something entirely different there.
Thanks for linking to the review that suggested this, I'll go and downvote it.
But I really saw such a code snippet is used in many projects indeed. Could somebody shed some light on this matter?
As I say, it's a plague, or really I suppose a harmful meme in the original sense.
I googled and talked about it with my friends, but I still can't find out the answer.
... Is there any potential problem with double-check lock for C++?
There are nothing but problems with double-checked locking for C++. Almost nobody should ever use it. You should probably never copy code from anyone who does use it.
In preference order:
Just use a static local, which is even less effort and still guaranteed to be correct - in fact:
If multiple threads attempt to initialize the same static local variable concurrently, the initialization occurs exactly once (similar behavior can be obtained for arbitrary functions with std::call_once).
Note: usual implementations of this feature use variants of the double-checked locking pattern, which reduces runtime overhead for already-initialized local statics to a single non-atomic boolean comparison.
so you can get correct double-checked locking for free.
Use std::call_once if you need more elaborate initialization and don't want to package it into a class
Use (if you must) double-checked locking with a std::atomic_flag or std::atomic_bool flag and never volatile.
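A minimal sketch of the first option, the function-local static. Logger and get_logger are hypothetical names, and the constructor only counts how often it runs instead of opening a file.

```cpp
#include <atomic>
#include <cassert>

struct Logger {
    static std::atomic<int> constructions;  // hypothetical instrumentation
    Logger() {
        // A real logger would open its file here.
        constructions.fetch_add(1);
    }
};
std::atomic<int> Logger::constructions{0};

// Since C++11 the initialization of a static local is thread safe:
// if several threads arrive here at once, exactly one constructs the
// object and the others wait until it is ready.
Logger& get_logger() {
    static Logger instance;
    return instance;
}
```

Every caller gets the same object, and the synchronization is generated by the compiler rather than written by hand.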
There is nothing to optimize away here (no commands to be excluded, see the details below), but there are the following:
It is possible that is_log_file_ready is set to true before log_stream has opened the file; another thread could then bypass the outer if block and start using the stream before std::ofstream::open has completed.
It could be solved by using std::atomic_thread_fence(std::memory_order_release); memory barrier before setting the flag to true.
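A sketch of the fence-based ordering, under one important assumption that differs from the text above: to stay within the standard, the flag here is a std::atomic<bool> accessed with relaxed operations, and the reader uses a matching acquire fence. The names payload, ready and fence_handoff are illustrative, and the int payload stands in for the stream.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Publishes a value through a release fence and returns what the
// consumer thread observed after its acquire fence.
int fence_handoff()
{
    int payload = 0;                 // plain data, stands in for log_stream
    std::atomic<bool> ready{false};  // assumption: atomic flag, relaxed accesses
    int observed = -1;

    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed)) {}
        // Pairs with the release fence in the producer below.
        std::atomic_thread_fence(std::memory_order_acquire);
        observed = payload;
    });

    payload = 42;
    // Writes before this fence cannot be reordered after the store
    // to the flag that follows it.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);

    consumer.join();
    return observed;
}
```

A release store paired with an acquire load on the flag gives the same guarantee with less ceremony; the fences are shown only because the answer mentions them.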
Also, a compiler is forbidden to reorder accesses to volatile objects on the same thread (https://en.cppreference.com/w/cpp/language/as_if), but, as for this code specifically, the available operator << overloads and the write function of std::ofstream are simply not usable on volatile objects: it would not be possible to write to the stream if it were made volatile (and making volatile only the flag would not permit the reordering).
Note that protecting the is_log_file_ready flag from a data race with standard library facilities that provide std::memory_order_release or a stronger memory order (the most reasonable choice being std::atomic/std::atomic_bool; see LWimsey's answer for sample code) would also make this reordering impossible, because of the memory order.
Formally, an execution with a data race has undefined behaviour, and in the double-checked lock that applies to the is_log_file_ready flag. In standard-conforming code the flag must be protected from a data race (the most reasonable way to do that is to use std::atomic/std::atomic_bool).
In practice, though, suppose the compiler is not so perverse as to deliberately break your code on the pretext that anything is allowed once undefined behaviour is involved (some people wrongly consider undefined behaviour to be something that occurs at run time and has nothing to do with compilation, but the standard uses undefined behaviour to regulate compilation; such behaviour would, by the way, have to be documented; for details of compiling C++ code with a data race see https://stackoverflow.com/a/69062080/1790694). Suppose also that it implements bool reasonably, so that any non-zero physical value is treated as true (which is reasonable, since arithmetic values, pointers and some other types must convert to bool that way). Then a partially written flag can never cause a problem when it is read, and the single memory barrier std::atomic_thread_fence(std::memory_order_release); before setting the flag to true, which prevents the reordering, would make your code work without problems.
At https://en.cppreference.com/w/cpp/language/storage_duration#Static_local_variables you can read that implementations of initialization of static local variables since C++11 (which you also should consider to use for one-time actions in general, see the note about what to consider for one-time actions in general below) usually use variants of the double-checked locking pattern, which reduces runtime overhead for already-initialized local statics to a single non-atomic boolean comparison.
This is an example of exactly the environment-dependent safety of a non-atomic flag that I described above. But it should be understood that these solutions are environment-dependent, and, since they are part of the implementation of the compilers themselves rather than of a program using those compilers, conformance to the standard is not a concern there.
To make your program conform to the language standard and be protected (as far as the standard is implemented) against the liberties a compiler implementation may take, you must protect the flag from data races, and the most reasonable way to do so is to use std::atomic or std::atomic_bool.
Note, even without protection of the flag from data races:
because of the mutex, it is not possible for a thread to miss updates made by another thread (to both the bool flag and the std::ofstream object).
The mutex implements the memory barrier, so if we do not yet have the update when checking the flag in the first condition, we get it when we reach the mutex, and are therefore guaranteed to have the updated value when checking the flag in the second condition.
because the flag could be accessed in unpredictable ways from other translation units, the compiler cannot eliminate writes to and reads of the flag under the as-if rule, even if the rest of the translation unit made them look pointless (for example, setting the flag to true and then starting the threads, with no reset to false in sight); it would be permitted to do so only if the flag were not accessible from other translation units.
For one-time actions in general besides raw protection with flags and mutexes consider using:
std::call_once (https://en.cppreference.com/w/cpp/thread/call_once);
calling a function that initializes a static local variable (https://en.cppreference.com/w/cpp/language/storage_duration#Static_local_variables), if its lifetime suits, since its initialization is data-race safe (be aware that the initialization of static local variables is data-race safe only since C++11).
All the multi-threading functionality mentioned is available since C++11 (and since you are already using std::mutex, which was also introduced in C++11, that requirement is already met).
Also, you should correctly handle the case where opening the file fails.
Also, you must protect your std::ofstream object from concurrent writes to the stream.
Answering the additional question from the update of the question, there are no problems with properly implemented double-check lock, and the proper implementation is possible in C++.

C++ Are copies of variables optimised out?

Given a single core CPU embedded environment where reading and writing of variables is guaranteed to be atomic, and the following example:
struct Example
{
    bool TheFlag;
    void SetTheFlag(bool f) {
        TheFlag = f;
    }
    void UseTheFlag() {
        if (TheFlag) {
            // Do some stuff that has no effect on TheFlag
        }
        // Do some more stuff that has no effect on TheFlag
        if (TheFlag) {
            ...
        }
    }
};
It is clear that if SetTheFlag was called by chance on another thread (or interrupt) between the two uses of TheFlag in UseTheFlag, there could be unexpected behavior (or some could argue it is expected behavior in this case!).
Can the following workaround be used to guarantee behavior?
void UseTheFlag() {
auto f = TheFlag;
if (f) {
// Do some stuff that has no effect on TheFlag
}
// Do some more stuff that has no effect on TheFlag
if (f) {
...
}
}
My practical testing showed the variable f is never optimised out and copied once from TheFlag (GCC 10, ARM Cortex M4). But, I would like to know for sure is it guaranteed by the compiler that f will not be optimised out?
I know there are better design practices, critical sections, disabling interrupts etc, but this question is about the behavior of compiler optimisation in this use case.
You might consider this from the point of view of the "as-if" rule, which, loosely stated, states that any optimisations applied by the compiler must not change the original meaning of the code.
So, unless the compiler can prove that TheFlag doesn't change during the lifetime of f, it is obliged to make a local copy.
That said, I'm not sure if 'proof' extends to modifications made to TheFlag in another thread or ISR. Marking TheFlag as atomic (or volatile, for an ISR) might help there.
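A minimal sketch of the atomic variant, assuming std::atomic<bool> is usable on the target. UseTheFlag is changed to return how many of its two branches ran, purely so that the behaviour can be checked; that return value is not part of the original code.

```cpp
#include <atomic>
#include <cassert>

struct Example
{
    std::atomic<bool> TheFlag{false};

    void SetTheFlag(bool f) {
        TheFlag.store(f, std::memory_order_relaxed);
    }

    // Returns the number of branches taken: 0 or 2, never 1.
    int UseTheFlag() {
        // One atomic load; both tests below use this snapshot, so a
        // concurrent SetTheFlag() cannot make them disagree.
        const bool f = TheFlag.load(std::memory_order_relaxed);
        int branches_taken = 0;
        if (f) {
            ++branches_taken;  // stuff that has no effect on TheFlag
        }
        // more stuff that has no effect on TheFlag
        if (f) {
            ++branches_taken;
        }
        return branches_taken;
    }
};
```

Relaxed ordering is enough here only because nothing else is published through the flag; if other data depends on it, acquire/release would be the safer choice.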
The C++ standard does not say anything about what will happen in this case. It's just UB, since an object can be modified in one thread while another thread is accessing it.
You only say the platform specifies that these operations are atomic. Obviously, that isn't enough to ensure this code operates correctly. Atomicity only guarantees that two concurrent writes will leave the value as one of the two written values and that a read during one or more writes will never see a value not written. It says nothing about what happens in cases like this.
There is nothing wrong with any optimization that breaks this code. In particular, atomicity does not prevent a read operation in another thread from seeing a value written before that read operation unless something known to synchronize was used.
If the compiler sees register pressure, nothing prevents it from simply reading TheFlag twice rather than creating a local copy. If the compiler can deduce that the intervening code in this thread cannot modify TheFlag, the optimization is legal. Optimizers don't have to take into account what other threads might do unless you follow the rules and use things defined to synchronize, or only require the explicit guarantees that atomicity supplies.
You go beyond that, so all bets are off. You need more than atomicity for TheFlag, so don't use a type that is merely atomic -- it isn't enough.

Can a compiler read twice from a global variable, instead of storing a local one?

I've been trying to get re-familiarized with multi-threading recently and found this paper. One of the examples says to be careful when using code like this:
int my_counter = counter; // Read global
int (* my_func) (int);
if (my_counter > my_old_counter) {
... // Consume data
my_func = ...;
... // Do some more consumer work
}
... // Do some other work
if (my_counter > my_old_counter) {
... my_func(...) ...
}
Stating that:
If the compiler decides that it needs to spill the register
containing my_counter between the two tests, it may well decide to
avoid storing the value (it’s just a copy of counter, after all), and
to instead simply re-read the value of counter for the second
comparison involving my_counter[...]
Doing this would turn the code into:
int my_counter = counter; // Read global
int (* my_func) (int);
if (my_counter > my_old_counter) {
... // Consume data
my_func = ...;
... // Do some more consumer work
}
... // Do some other work
my_counter = counter; // Reread global!
if (my_counter > my_old_counter) {
... my_func(...) ...
}
I, however, am skeptical about this. I don't understand why the compiler is allowed to do this, since to my understanding a data race only occurs when trying to access the same memory area with any number of reads and at least a write at the same time. The author goes on to motivate that:
the core problem arises from the compiler taking advantage of the
assumption that variable values cannot asynchronously change without
an explicit assignment
It seems to me that the condition is respected in this case, as the local variable my_counter is never accessed twice and cannot be accessed by other threads. How would the compiler know that the global variable can not be set elsewhere, in another translation unit by another thread? It cannot, and in fact, I assume that the second if case would just be actually optimized away.
Is the author wrong, or am I missing something?
Unless counter is explicitly volatile, the compiler may assume that it never changes if nothing in the current scope of execution could change it. That means if there can be either no alias on a variable, or there are no function calls in between for which the compiler can't know the effects, any external modification is undefined behavior. With volatile you would be declaring external changes as possible, even if the compiler can't know how.
So that optimization is perfectly valid. In fact, even if it did actually perform the copy it still wouldn't be threadsafe as the value may change partially mid read, or might even be completely stale as cache coherency is not guaranteed without synchronisation primitives or atomics.
Well, actually on x86 you won't get an intermediate value for an integer, at least as long as it is aligned. That's one of the guarantees the architecture makes. Stale caches still apply, the value may already have been modified by another thread.
Use either a mutex or an atomic if you need this behavior.
Compilers [are allowed to] optimize presuming that anything which is "undefined behavior" simply cannot happen: that the programmer will prevent the code from being executed in such a way that would invoke the undefined behavior.
This can lead to rather silly executions where, for example, the following loop never terminates!
int vals[10];
for(int i = 0; i < 11; i++) {
vals[i] = i;
}
This is because the compiler knows that writing vals[10] would be undefined behavior, so it assumes that it cannot happen. If it cannot happen, i never reaches 10, so the condition i < 11 is always true and the loop never terminates. Not all compilers will aggressively optimize a loop like this in this way, though I do know that GCC does.
In the specific case you're working with, reading a global variable in this way can be undefined behavior iff [sic] it is possible for another thread to modify it in the interim. As a result, the compiler is assuming that cross-thread modifications never happen (because it's undefined behavior, and compilers can optimize presuming UB doesn't happen), and thus it's perfectly safe to reread the value (which it knows doesn't get modified in its own code).
The solution is to make counter atomic (std::atomic<int>), which forces the compiler to acknowledge that there might be some kind of cross-thread manipulation of the variable.
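A sketch of the paper's example with counter made atomic. The function name consume, its return values and the doubling lambda are invented for illustration; the elided "consume data" work is left out.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> counter{0};

// Returns the result of my_func, or -1 if there was nothing new.
int consume(int my_old_counter)
{
    // A single atomic load; the compiler may not replace later uses of
    // my_counter with fresh reads of counter.
    const int my_counter = counter.load(std::memory_order_acquire);
    int (*my_func)(int) = nullptr;

    if (my_counter > my_old_counter) {
        // Consume data (elided) and choose a handler.
        my_func = [](int v) { return v * 2; };
    }
    // Do some other work (elided).
    if (my_counter > my_old_counter) {
        // Both tests used the same snapshot, so my_func is set here.
        return my_func(my_counter);
    }
    return -1;
}
```

Because the second test can no longer disagree with the first, my_func is never called while still null.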

c++ double check if statement optimization with a lock

My question is not about double check locking but similar.
In the following code, is it possible for the compiler to optimize the inner if statement out?
void MyClass::foo()
{
    if(x) //member data of MyClass defined as bool
    {
        loc.lock(); //mutex. Unlocks when out of scope
        if(x) //can the compiler optimize this out?
        {
            fire1();//doSomethingNotAffectingX
        }
    }
}
X gets set via another thread in the same translation unit.
void MyClass::unset()
{
loc.lock();
x=false;
fire2();
}
The idea is to guarantee that if fire2 was called, fire1 can't be called
In my answer, I assume you are using std::mutex and not your own custom class!
Essentially, the compiler cannot optimize away the second if.
Well, if the compiler could somehow determine that x cannot be changed (legally), it might remove the check but clearly, this is not the case here. A compiler can only do optimization if the resulting program works AS-IF.
In C++11 and later, a mutex is a barrier. So if a variable is properly protected by a mutex, you will read the expected value. However, if you forget to lock the mutex at an appropriate location, the behavior is undefined.
Here, as your first check is an unprotected read, you might have a problem if x can become true again after being set to false.
Memory model
C++11 introduced a standardized memory model. What does it mean? And how is it going to affect C++ programming?
I highly recommend reading the book C++ Concurrency in Action, Practical Multithreading by Anthony Williams. It would help a lot understanding modern C++ multithreading.
In your code, do you need to protect fire1 and fire2 with the mutex? Calling a function while holding a lock can be dangerous if that function also waits for a mutex, as it can cause deadlocks. Also, the lock might be held longer than required. If you only need to ensure that fire1 is not called if fire2 is called, then a std::atomic<bool> would be enough…
-- EDIT --
Finally, there is a good example in the documentation: std::mutex
-- EDIT #2 --
As pointed out in a comment, my answer is not fully valid for general cases. I do think that the above code would work correctly with a bool, provided that it can only change from true to false.
So I have done a search on the web and found those articles:
Double-Checked Locking is Fixed In C++11
Double-checked locking: Why doesn’t it work and how to fix it
If the compiler doesn't know someone other than loc.lock() might change x, and it knows loc.lock() won't change x for sure, then it can assume x won't change between the ifs. In this case, it is likely to store x in a register, or omit the if altogether.
If you want to avoid such unsafe optimizations, you have to let the compiler know. On C++11, use std::atomic<bool>. On earlier versions, make it volatile bool. Do neither, and the compiler might break your code.
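A sketch of the C++11 variant, assuming loc is a std::mutex locked through std::lock_guard and x starts out true. The call counters are hypothetical instrumentation added so that the guarantee "fire1 is not called once fire2 has been called" can be checked.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

class MyClass {
public:
    void foo() {
        // Cheap unlocked check; atomic, so it is not a data race.
        if (x.load(std::memory_order_acquire)) {
            std::lock_guard<std::mutex> guard(loc);
            // Re-check under the lock: unset() may have run in between.
            if (x.load(std::memory_order_relaxed)) {
                fire1();
            }
        }
    }

    void unset() {
        std::lock_guard<std::mutex> guard(loc);
        x.store(false, std::memory_order_release);
        fire2();
    }

    int fire1_calls = 0;  // hypothetical counters, modified only under loc
    int fire2_calls = 0;

private:
    void fire1() { ++fire1_calls; }
    void fire2() { ++fire2_calls; }

    std::mutex loc;
    std::atomic<bool> x{true};
};
```

The inner check is what provides the guarantee; the outer one only avoids taking the lock when x is already false.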