Undefined behaviour when using a non-atomic bool - c++

What is the worst thing that could happen when using a normal bool flag for controlling when one thread stops what it is doing? The peculiarity is that the exact time at which the thread stops is not very important at all: it just plays back some media, and it might even be half a second late in reacting for all I care. It has a simple while (!restart) loop:
while (!restart) //bool restart
{
//do something
}
and the other thread changes some settings and then sets restart to true:
someSetting = newSetting;
restart = 1;
Since the playback loop runs thousands of times per second, I'm worried that using atomic bool might increase latency. I understand that this is "undefined behavior", but how does that manifest itself? If the bool is 54r*wx]% at some point, so what? Can I get runtime errors? The bool changes to a comprehensible value EVENTUALLY, doesn't it? (The code works currently, btw.) In another post, someone suggested that the flag might never change at all, since the threads have separate caches - this sounds iffy to me, surely the compiler must make sure that shared variables are changed even if a data race exists? Or is it possible that the order of execution for the controlling thread might change and someSetting might be changed after restart? Again, that just sounds creepy, why would a compiler allow that to happen?
I have considered setting a counter inside the loop and checking an atomic bool flag only every thousand times through. But I don't want to do that unless I really must.

UB doesn't mean that your code doesn't work; it just means that the behaviour of your code isn't specified by the standard. You must use std::atomic to make your code standard-compliant, and you can do that with essentially no change to the generated code by using memory_order_relaxed:
std::atomic<int> restart{0};
while (!restart.load(std::memory_order_relaxed))
{
//do something
}
and in the other thread:
someSetting = newSetting;
restart.store(1, std::memory_order_relaxed);
On common platforms such as x86, this code typically emits the same load and store instructions as yours.

Related

Once more volatile: necessary to prevent optimization?

I've been reading a lot about the 'volatile' keyword but I still don't have a definitive answer.
Consider this code:
class A
{
public:
void work()
{
working = true;
while(working)
{
processSomeJob();
}
}
void stopWorking() // Can be called from another thread
{
working = false;
}
private:
bool working;
};
As work() enters its loop the value of 'working' is true.
Now I'm guessing the compiler is allowed to optimize the while(working) to while(true) as the value of 'working' is true when starting the loop.
If this is not the case, that would mean something like this would be quite inefficient:
for(int i = 0; i < someOtherClassMember; i++)
{
doSomething();
}
...as the value of someOtherClassMember would have to be loaded each iteration.
If this is the case, I would think 'working' has to be volatile in order to prevent the compiler from optimising it.
Which of these two is the case? When googling the use of volatile I find people claiming it's only useful when working with I/O devices writing to memory directly, but I also find claims that it should be used in a scenario like mine.
Your program will get optimized into an infinite loop†.
void foo() { A{}.work(); }
gets compiled to (g++ with O2)
foo():
sub rsp, 8
.L2:
call processSomeJob()
jmp .L2
The standard defines what a hypothetical abstract machine would do with a program. Standard-compliant compilers have to compile your program to behave the same way as that machine in all observable behaviour. This is known as the as-if rule: the compiler has freedom as long as what your program does is the same, regardless of how it does it.
Normally, reading and writing a variable doesn't count as observable, which is why a compiler can elide as many reads and writes as it likes. The compiler can see that working doesn't get assigned to inside the loop and optimizes the read away. The (often misunderstood) effect of volatile is exactly to make them observable, which forces the compiler to leave the reads and writes alone‡.
But wait, you say, another thread may assign to working. This is where the leeway of undefined behaviour comes in. The compiler may do anything when there is undefined behaviour, including formatting your hard drive, and still be standard-compliant. Since there is no synchronization and working isn't atomic, any other thread writing to working is a data race, which is unconditionally undefined behaviour. Therefore, the only situation in which the infinite loop is wrong is one that already has undefined behaviour, so the compiler is free to decide that your program might as well keep on looping.
TL;DR Don't use plain bool and volatile for multi-threading. Use std::atomic<bool>.
†Not in all situations: void bar(A& a) { a.work(); } is not turned into an infinite loop by some compiler versions.
‡Actually, there is some debate around this.
Now I'm guessing the compiler is allowed to optimize the while(working) to while(true)
Potentially, yes. But only if it can prove that processSomeJob() does not modify the working variable i.e. if it can prove that the loop is infinite.
If this is not the case, that would mean something like this would be quite inefficient ... as the value of someOtherClassMember would have to be loaded each iteration
Your reasoning is sound. However, the memory location might remain in cache, and reading from CPU cache isn't necessarily significantly slow. If doSomething is complex enough to cause someOtherClassMember to be evicted from the cache, then sure we'd have to load from memory, but on the other hand doSomething might be so complex that a single memory load is insignificant in comparison.
Which of these two is the case?
Either. The optimiser will not be able to analyse all possible code paths; we cannot assume that the loop could be optimised in all cases. But if someOtherClassMember is provably not modified in any code paths, then proving it would be possible in theory, and therefore the loop can be optimised in theory.
but I also find claims that [volatile] should be used in a scenario like mine.
volatile doesn't help you here. If working is modified in another thread, then there is a data race. And data race means that the behaviour of the program is undefined.
To avoid a data race, you need synchronisation: either use a mutex or use atomic operations to share access across threads.
Volatile will make the while loop reload the working variable on every check. Practically that will often allow you to stop the working function with a call to stopWorking made from an asynchronous signal handler or another thread, but as per the standard it's not enough. The standard requires lock-free atomics or variables of type volatile sig_atomic_t for sighandler <-> regular context communication and atomics for inter-thread communication.

Is there a real-life situation where a simple pointer-to-bool as thread cancellation flag will not effectively cancel a thread?

First and foremost, I understand that formally, using a non-atomic flag to cancel a thread is very much undefined behaviour in the sense that the language does not specify if this variable will be written to before the thread exits.
At work, this was implemented a long time ago, and most calculation threads check the value of this bool throughout their work so as to gracefully cancel whatever it is they're doing. When I first saw this, my first reaction was to change all of this to use a better way (in this case, QThread::requestInterruption and QThread::interruptionRequested seemed like a viable alternative). A quick search through the code turned up about 800 occurrences of this variable/construct throughout the codebase, so I let it go.
When I approached a (senior, in terms of years of experience) colleague, he assured me that although it might indeed be wrong, he had never seen it fail to fulfill its purpose. He argued that the only case in which it would go wrong is if a (group of) thread(s) is allowed to run and another thread that actually changes this flag never gets allowed to execute until the other threads are finished. He also argued that in this case, the OS would intervene and fairly distribute runtime across all threads, resulting in perhaps a delay of the cancellation.
Now my question is: is there any real-life situation (preferably on a regular system, based upon x86/ARM, preferably C or C++) where this does indeed fail?
Note I'm not trying to win the argument, as my colleague agrees it is technically incorrect, but I would like to know if it could cause problems and under which circumstances this might occur.
The simplest way to beat this is to reduce it to a rather trivial example. The compiler will optimize out reading the flag because it is not atomic and being written to by another thread is UB; therefore the flag won't ever get actually read.
Your colleague's argument is predicated on the assumption that the compiler will actually load the flag when you de-reference the flag. But in fact it has no obligation to do so.
#include <thread>
#include <iostream>
bool cancelled = false;
bool finished = false;
void thread1() {
while(!cancelled) {
std::cout << "Not cancelled";
}
}
int main() {
std::thread t(thread1);
t.detach();
cancelled = true;
while(!finished) {}
}
To run it on Coliru, load http://coliru.stacked-crooked.com/a/5be139ee34bf0a80; you will need to make a trivial edit first, because Coliru's caching is broken for snippets that do not terminate.
Effectively, he's simply betting that the compiler's optimizer will do a poor job, which seems like a truly terrible thing to rely upon.
As long as you wait for the threads to finish before using their data, you'll be OK in practice: the memory barriers set by std::thread::join or QThread::wait will protect you.
Your worry isn't about the cancelled variable: as long as it's volatile, you're fine in practice. You should worry about reading inconsistent state from the data modified by the threads.
As can be inferred from Mine's comment, Puppy's code example does not demonstrate the problem. A few minor modifications are necessary.
Firstly, we must add finished = true; at the end of thread1 so that the program even pretends to be able to terminate.
Now, the optimizer isn't able to check every function in every translation unit to be sure that cancelled is in fact always false when entering thread1, so it cannot make the daring optimization to remove the while loop and everything after it. We can fix that by setting cancelled to false at the start of thread1.
With the previous addition, for fairness, we must also continually set cancelled to true in main, because otherwise we cannot guarantee that the single assignment in main is scheduled after the initial assignment of cancelled in thread1.
Edit: Added qualifiers, and synchronous join instead of detachment.
#include <thread>
#include <iostream>
bool cancelled = false;
bool finished = false;
void thread1() {
cancelled = false;
while(!cancelled)
;
finished = true;
}
int main() {
std::thread t(thread1);
while(!finished) {
std::cout << "trying to cancel\n";
cancelled = true;
}
t.join();
}

Is mutex mandatory to access extern variable from a different thread?

I am developing an application in Qt/C++. At some point, there are two threads: one is the UI thread and the other one is the background thread. I have to do some operation from the background thread based on the value of an extern variable of type bool. I am setting this value by clicking a button on the UI.
header.cpp
extern bool globalVar;
mainWindow.cpp
//main ui thread on button click
void setValue(bool val){
globalVar = val;
}
backgroundThread.cpp
while(1){
if(globalVar){
//do some operation
} else {
//do some other operation
}
}
Here, writing to globalVar happens only when the user clicks the button whereas reading happens continuously.
So my question is :
In a situation like the one above, is a mutex mandatory?
If read and write happens at the same time, does this cause the application to crash?
If read and write happens at same time, is globalVar going to have some value other than true or false?
Finally, does the OS provide any kind of locking mechanism to prevent the read/write operation to access a memory location at the same time by a different thread?
The loop
while(1){
if(globalVar){
//do some operation
} else {
//do some other operation
}
}
is busy waiting, which is extremely wasteful. Thus, you're probably better off with some classic synchronization that will wake the background thread (mostly) when there is something to be done. You should consider adapting this example of std::condition_variable.
Say you start with:
#include <thread>
#include <mutex>
#include <condition_variable>
std::mutex m;
std::condition_variable cv;
bool ready = false;
Your worker thread can then be something like this:
void worker_thread()
{
while(true)
{
// Wait until main() sends data
std::unique_lock<std::mutex> lk(m);
cv.wait(lk, []{return ready;});
ready = false;
lk.unlock();
// ... do the work for this notification here ...
}
}
The notifying thread should do something like this:
{
std::lock_guard<std::mutex> lk(m);
ready = true;
}
cv.notify_one();
Since it is just a single plain bool, I'd say a mutex is overkill; you should just go for an atomic instead. An atomic read or write is indivisible, so no other thread can observe it half-done, and it will be lock-free on mainstream platforms, which is always better if possible.
If it is something more complex, then by all means go for a mutex.
It won't crash from that alone, but you can get data corruption, which may crash the application.
The system will not manage that stuff for you, you do it manually, just make sure all access to the data goes through the mutex.
Edit:
Since you specify a number of times that you don't want a complex solution, you may opt for simply using a mutex instead of the bool. There is no need to protect the bool with a mutex, since you can use the mutex as a bool, and yes, you could go with an atomic, but that's what the mutex already does (plus some extra functionality in the case of recursive mutexes).
It also matters what your exact workload is, since your example doesn't make a lot of sense in practice. It would be helpful to know what those operations are.
So in your UI thread you could simply do val ? mutex.lock() : mutex.unlock(), and in your secondary thread you could use if (mutex.tryLock()) { doStuff(); mutex.unlock(); } else { doOtherStuff(); }. Now, if the operation in the secondary thread takes too long and you happen to be changing the lock in the main thread, that will block the main thread until the secondary thread unlocks. You could use tryLock(timeout) in the main thread, depending on what you prefer: lock() will block until it succeeds, while tryLock(timeout) will prevent blocking, but the lock may fail. Also, take care not to unlock from a thread other than the one you locked with, and not to unlock an already unlocked mutex.
Depending on what you are actually doing, maybe an asynchronous event driven approach would be more appropriate. Do you really need that while(1)? How frequently do you perform those operations?
In a situation like the one above, is a mutex necessary?
A mutex is one tool that will work. What you actually need are three things:
a means of ensuring an atomic update (a bool will give you this as it's mandated to be an integral type by the standard)
a means of ensuring that the effects of a write made by one thread are actually visible in the other thread. This may sound counter-intuitive, but unless you synchronise, the C++ memory model lets optimisations (software and hardware) behave as if there were no cross-thread communication, and...
a means of preventing the compiler (and CPU!!) from re-ordering the reads and writes.
The answer to the implied question is 'yes'. You will need something that does all of these things (see below).
If read and write happen at the same time, does this cause the application to crash?
not when it's a bool, but the program won't behave as you expect. In fact, because the program is now exhibiting undefined behaviour you can no longer reason about its behaviour at all.
If read and write happen at the same time, is globalVar going to have some value other than true or false?
not in this case because it's an intrinsic (atomic) type.
If different threads access (read/write) the same memory location at the same time, does the OS provide any kind of locking mechanism to prevent it?
Not unless you specify one.
Your options are:
std::atomic<bool>
std::mutex
std::atomic_signal_fence
Realistically speaking, as long as you use an integer type (not bool), make it volatile, and keep it in its own cache line by properly aligning its storage, you don't need to do anything special at all.
In a situation like the one above, is a mutex necessary?
Only if you want to keep the value of the variable synchronized with other state.
If read and write happen at the same time, does this cause the application to crash?
According to the C++ standard, it's undefined behavior. So anything can happen: e.g. your application might not crash, but its state might be subtly corrupted. In real life, though, compilers often offer some sane implementation-defined behavior and you're fine unless your platform is really weird. Anything commonplace, like 32- and 64-bit Intel, PPC and ARM, will be fine.
If read and write happen at the same time, is globalVar going to have some value other than true or false?
globalVar can only have these two values, so it makes no sense to speak of any other values unless you're talking about its binary representation. Yes, it could happen that the binary representation is incorrect and not what the compiler would expect. That's why you shouldn't use a bool but a uint8_t instead.
I wouldn't love to see such flag in a code review, but if a uint8_t flag is the simplest solution to whatever problem you're solving, I say go for it. The if (globalVar) test will treat zero as false, and anything else as true, so temporary "gibberish" is OK and won't have any odd effects in practice. According to the standard, you'll be facing undefined behavior, of course.
If different threads access (read/write) the same memory location at the same time, does the OS provide any kind of locking mechanism to prevent it?
It's not the OS's job to do that.
Speaking of practice, though: on any reasonable platform, the use of a std::atomic_bool will have no overhead over the use of a naked uint8_t, so just use that and be done.

while inside while not working properly in c++

I have a curious situation (at least for me :D ) in C++.
My code is:
static void startThread(Object* r){
while(true)
{
while(!r->commands->empty())
{
doSomething();
}
}
}
I start this function as a thread using Boost, where commands in r is a queue; I fill up this queue from another thread.
The problem is that if I fill the queue first and then start this thread, everything works fine. But if I run startThread first and only then fill up the commands queue, it does not work: doSomething() never runs.
However, if I modify startThread:
static void startThread(Object* r){
while(true)
{
std::cout << "c" << std::endl;
while(!r->commands->empty())
{
doSomething();
}
}
}
I just added cout... and it is working... Can anybody explain why it is working with cout and not without? Or anybody has idea what can be wrong?
Maybe the compiler is doing some kind of optimization? I do not think so... :(
Thanks
But if I run the startThread first and after that I fill up queue commands, it is not working... doSomething() will not run
Of course not! What did you expect? Your queue is empty, so !r->commands->empty() will be false.
I just added cout... and it is working
You got lucky. cout is comparatively slow, so your main thread had a chance to fill the queue before the inner while test was executed for the first time.
So why does the thread not see an updated version of r->commands after it has been filled by the main thread? Because nothing in your code indicates that your variable is going to change from the outside, so the compiler assumes that it doesn’t.
In fact, the compiler sees that your r’s pointee cannot change, so it can just remove the redundant checks from the inner loop. When working with multithreaded code, you explicitly need to tell C++ that variables can be changed from a different context, using atomic memory access.
When you first run the thread and then fill up the queue, not entering the inner loop is logical, since the test !r->commands->empty() is false. After you add the cout statement, it works because printing the output takes some time, and meanwhile the other thread fills up the queue, so the condition becomes true. But relying on this kind of timing is not good programming in a multithreaded environment.
There are two inter-related issues:
You are not forcing a reload of r->commands or r->commands->empty(), so your compiler, diligent as it is in search of the pinnacle of performance, cached the result. Adding some more code might make the compiler remove this optimisation if it cannot prove the caching is still valid.
You have a data race, so your program has undefined behavior. (I am assuming doSomething() removes an element and some other thread adds elements.)
1.10 Multi-threaded executions and data races, paragraph 21:
The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior. [ Note: It can be shown that programs that correctly use mutexes and memory_order_seq_cst operations to prevent all data races and use no other synchronization operations behave as if the operations executed by their constituent threads were simply interleaved, with each value computation of an object being taken from the last side effect on that object in that interleaving. This is normally referred to as “sequential consistency”. However, this applies only to data-race-free programs, and data-race-free programs cannot observe most program transformations that do not change single-threaded program semantics. In fact, most single-threaded program transformations continue to be allowed, since any program that behaves differently as a result must perform an undefined operation. —end note ]

Is it ok to read a shared boolean flag without locking it when another thread may set it (at most once)?

I would like my thread to shut down more gracefully, so I am trying to implement a simple signalling mechanism. I don't think I want a fully event-driven thread, so I have a worker with a method to gracefully stop it using a critical section Monitor (equivalent to a C# lock, I believe):
DrawingThread.h
class DrawingThread {
bool stopRequested;
Runtime::Monitor CSMonitor;
CPInfo *pPInfo;
//More..
};
DrawingThread.cpp
void DrawingThread::Run() {
if (!stopRequested) {
//Time consuming call#1
}
if (!stopRequested) {
CSMonitor.Enter();
pPInfo = new CPInfo(/**/);
//Not time consuming but pPInfo must either be null or constructed.
CSMonitor.Exit();
}
if (!stopRequested) {
pPInfo->foobar(/**/);//Time consuming and can be signalled
}
if (!stopRequested) {
//One more optional but time consuming call.
}
}
void DrawingThread::RequestStop() {
CSMonitor.Enter();
stopRequested = true;
if (pPInfo) pPInfo->RequestStop();
CSMonitor.Exit();
}
I understand (at least in Windows) Monitor/locks are the least expensive thread synchronization primitive but I am keen to avoid overuse. Should I be wrapping each read of this boolean flag? It is initialized to false and only set once to true when stop is requested (if it is requested before the task completes).
My tutors advised protecting even bools, because reading/writing them may not be atomic. I think this one-shot flag is the exception that proves the rule?
It is never OK to read something possibly modified in a different thread without synchronization. What level of synchronization is needed depends on what you are actually reading. For primitive types, you should have a look at atomic reads, e.g. in the form of std::atomic<bool>.
The reason synchronization is always needed is that the processor may hold the possibly shared data in a cache line. Without synchronization, it has no reason to update this value to one possibly changed in a different thread. Worse yet, without synchronization it may write the wrong value if something stored close to the value is changed and synchronized.
Boolean assignment is atomic. That's not the problem.
The problem is that a thread may not see changes to a variable made by a different thread, due to either compiler or CPU instruction reordering or data caching (i.e. the thread that reads the boolean flag may read a cached value instead of the actual updated value).
The solution is a memory fence, which indeed is implicitly added by lock statements, but for a single variable it's overkill. Just declare it as std::atomic<bool>.
The answer, I believe, is "it depends." If you're using C++03, threading isn't defined in the Standard, and you'll have to read what your compiler and your thread library say, although this kind of thing is usually called a "benign race" and is usually OK.
If you're using C++11, benign races are undefined behavior. Even when undefined behavior doesn't make sense for the underlying data type. The problem is that compilers can assume that programs have no undefined behavior, and make optimizations based on that (see also the Part 1 and Part 2 linked from there). For instance, your compiler could decide to read the flag once and cache the value because it's undefined behavior to write to the variable in another thread without some kind of mutex or memory barrier.
Of course, it may well be that your compiler promises to not make that optimization. You'll need to look.
The easiest solution is to use std::atomic<bool> in C++11, or something like Hans Boehm's atomic_ops elsewhere.
No, you have to protect every access, since modern compilers and CPUs reorder the code without your multithreading tasks in mind. Unsynchronized reads from different threads might work, but they don't have to.