Volatile and multithreading: is the following thread-safe? - c++

Assume there are two threads running Thread1() and Thread2() respectively. The thread 1 just sets a global flag to tell thread 2 to quit and thread 2 periodically checks if it should quit.
volatile bool is_terminate = false;
void Thread1()
{
is_terminate = true;
}
void Thread2()
{
while (!is_terminate) {
// ...
}
}
I want to ask if the above code is safe assuming that access to is_terminate is atomic. I already know many materials state that volatile can not insure thread-safety generally. But in the situation that only one atomic variable is shared, do we really need to protect the shared variable using a lock?

It is probably sort of thread-safe.
Thread safety tends to depend on context. Updating a bool is always thread safe, if you never read from it.
And if you do read from it, then the answer depends on when you read from it, and what that read signifies.
On some CPUs, but not all, writes to an object of type bool will be atomic. x86 CPUs will generally make it atomic, but others might not. If the update isn't atomic, then adding volatile won't help you.
But the next problem is reordering. The compiler (and CPU) will carry out reads/writes to volatile variables in the order specified, without any reordering. So that's good.
But it makes no guarantee about reordering one volatile memory access relative to all the non-volatile ones. So a common example is that you define some kind of flag to protect access to a resource, you make the flag volatile, and then the compiler moves the resource access up so it happens before you check the flag. It's allowed to do that, because it's not reordering the internal ordering of two volatile accesses, but merely a volatile and a non-volatile one.
Honestly, the question I'd ask is why not just do it properly?
It is possible that volatile will work in this situation, but why not save yourself the trouble, and make it clearer that it's correct? Slap a memory barrier around it instead.

It is not thread safe.
If the threads, for example, are run on CPUs with separate caches there are no language rules saying that the caches are to be synchronized when writing a volatile variable. The other thread may not see the change for a very long time, if ever.
To answer in another way:
If volatile is enough to be thread safe, why is C++0x adding an entire chapter with atomic operations?
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2047.html

First, volatile is used for disabling compile optimization in c/c++. see this for understanding volatile.
The core of atomic is word align and size of is_terminate, if size of is_terminate is less than machine native size and aligned, then R and W of it is atomic.
In your context, with or without volatile, thread2 may read old value after thread1 modified it, but thread2 can read it eventually.
If eventually-read is OK for you, then your codes are thread safety.

it's safe because one thread is only reading and one is only writing.
The threads aren't really sharing that flag, one is reading, one is writing. You can't have a race because the other thread will never write a bad result, and the reading thread will never read a bad result. simple.

No, it is not. It could be thread safe if the value access was atomic, but in C++ you can't assume that variables access is thread-safe unless you use some compiler-specific constructs or synchronization primitives.

It is still not safe. You should use synchronizaton to access is_terminate Access to the bool is not guaranteed to be an atomic operation.

I believe that this code is safe, until both the threads are not writing the bool (already you have mentioned that value access is atomic).

The big problem with assuming that the volatile keyword imposes any kind of thread safety, is that the C or C++ standards have no concept of threads in the abstract machine they describe.
The guarantees that the standard imposes on the volatile keyword, are only valid within a thread - not between multiple threads.
This leaves implementors with full liberty to do whatever they please when it comes to threads. If they chose to implement the volatile keyword to be safe across threads, then you're lucky. More often than not, that's not the case though.

This code isn't seems to be thread safe. Reason can be explained easily.
Problem lies in below code line
"is_terminate = true;"
Even if access to "is_terminate" is atomic, above statement is not atomic.
This statement includes more than 1 operations. Like load "is_terminate" and update "is_terminate".
Now gotcha is if is_terminate is loaded and not updated and thread switches to another one.
Now thread 2 expected result to be true but it won't get it.
Make sure "is_terminate = true;" is atomic. So lock it.
Hope it helps.

Related

A legitimate use case for volatile in C++?

I'm running a few threads that basically are all returning the same object as a result. Then I wait for all of them to complete, and basically read the results. To avoid needing synchronization, I figured I could just pre-allocate all the result objects in an array or vector and give the threads a pointer to each. At a high level, the code is something like this (simplified):
std::vector<Foo> results(2);
RunThread1(&results[0]);
RunThread2(&results[1]);
WaitForAll();
// Read results
cout << results[0].name << results[1].name;
Basically I'd like to know if there's anything potentially unsafe about this "code". One thing I was wondering is whether the vector should declared volatile such that the reads at the end aren't optimized and output an incorrect value.
The short answer to your question is no, the array should not be declared volatile. For two simple reasons:
Using volatile is not necessary. Every sane multithreading platform provides synchronization primitives with well-defined semantics. If you use them, you don't need volatile.
Using volatile isn't sufficient. Since volatile doesn't have defined multithread semantics on any platform you are likely to use, it alone is not enough to provide synchronization.
Most likely, whatever you do in WaitForAll will be sufficient. For example, if it uses an event, mutex, condition variable, or almost anything like that, it will have defined multithreading semantics that are sufficient to make this safe.
Update: "Just out curiosity, what would be an example of something that happens in the WaitForAll that guarantees safety of the read? Wouldn't it need to effectively tell the compiler somehow to "flush" the cache or avoid optimizations of subsequent read operations?"
Well, if you're using pthreads, then if it uses pthread_join that would be an example that guarantees safety of the read because the documentation says that anything the thread does is visible to a thread that joins it after pthread_join returns.
How it does it is an implementation detail. In practice, on modern systems, there are no caches to flush nor are there any optimizations of subsequent reads that are possible but need to be avoided.
Consider if somewhere deep inside WaitForAll, there's a call to pthread_join. Generally, you simply don't let the compiler see into the internals of pthread_join and thus the compiler has to assume that pthread_join might do anything that another thread could do. So keeping information that another thread might modify in a register across a call to pthread_join would be illegal because pthread_join itself might access or modify that data.
I was wondering is whether the vector should declared volatile such that the reads at the end aren't optimized and output an incorrect value.
No. If there was problem with lack of synchronisation, then volatile would not help.
But there is no problem with lack of synchronisation, since according to your description, you don't access the same object from multiple threads - until you've waited for the threads to complete, which is something that synchronises the threads.
There is a potential problem that if the objects are small (less than about 64 bytes; depending on CPU architecture), then the objects in the array share a cache "line", and access to them may become effectively synchronised due to write contention. This is only a problem if the threads write to the variable a lot in relation to operations that don't access the output object.
It depends on what's in WaitForAll(). If it's a proper synchronization, all is good. Mutexes, for example, or thread join, will result in the proper memory synchronization being put in.
Volatile would not help. It may prevent compiler optimizations, but would not affect anything happening at the CPU level, like caches not being updated. Use proper synchronization, like mutexes, thread join, and then the result will be valid (sequentially coherent). Don't count on silver bullet volatile. Compilers and CPUs are now complex enough that it won't be guaranteed.
Other answers will elaborate on the memory fences and other instructions that the synchronization will put in. :-)

Volatile vs. memory fences

The code below is used to assign work to multiple threads, wake them up, and wait until they are done. The "work" in this case consists of "cleaning a volume". What exactly this operation does is irrelevant for this question -- it just helps with the context. The code is part of a huge transaction processing system.
void bf_tree_cleaner::force_all()
{
for (int i = 0; i < vol_m::MAX_VOLS; i++) {
_requested_volumes[i] = true;
}
// fence here (seq_cst)
wakeup_cleaners();
while (true) {
usleep(10000); // 10 ms
bool remains = false;
for (int vol = 0; vol < vol_m::MAX_VOLS; ++vol) {
// fence here (seq_cst)
if (_requested_volumes[vol]) {
remains = true;
break;
}
}
if (!remains) {
break;
}
}
}
A value in a boolean array _requested_volumes[i] tells whether thread i has work to do. When it is done, the worker thread sets it to false and goes back to sleep.
The problem I am having is that the compiler generates an infinite loop, where the variable remains is always true, even though all values in the array have been set to false. This only happens with -O3.
I have tried two solutions to fix that:
Declare _requested_volumes volatile
(EDIT: this solution does work actually. See edit below)
Many experts say that volatile has nothing to do with thread synchronization, and it should only be used in low-level hardware accesses. But there's a lot of dispute over this on the Internet. The way I understand it, volatile is the only way to refrain the compiler from optimizing away accesses to memory which is changed outside of the current scope, regardless of concurrent access. In that sense, volatile should do the trick, even if we disagree on best practices for concurrent programming.
Introduce memory fences
The method wakeup_cleaners() acquires a pthread_mutex_t internally in order to set a wake-up flag in the worker threads, so it should implicitly produce proper memory fences. But I'm not sure if those fences affect memory accesses in the caller method (force_all()). Therefore, I manually introduced fences in the locations specified by the comments above. This should make sure that writes performed by the worker thread in _requested_volumes are visible in the main thread.
What puzzles me is that none of these solutions works, and I have absolutely no idea why. The semantics and proper use of memory fences and volatile is confusing me right now. The problem is that the compiler is applying an undesired optimization -- hence the volatile attempt. But it could also be a problem of thread synchronization -- hence the memory fence attempt.
I could try a third solution in which a mutex protects every access to _requested_volumes, but even if that works, I would like to understand why, because as far as I understand, it's all about memory fences. Thus, it should make no difference whether it's done explicitly or implicitly via a mutex.
EDIT: My assumptions were wrong and Solution 1 actually does work. However, my question remains in order to clarify the use of volatile vs. memory fences. If volatile is such a bad thing, that should never be used in multithreaded programming, what else should I use here? Do memory fences also affect compiler optimizations? Because I see these as two orthogonal issues, and therefore orthogonal solutions: fences for visibility in multiple threads and volatile for preventing optimizations.
Many experts say that volatile has nothing to do with thread synchronization, and it should only be used in low-level hardware accesses.
Yes.
But there's a lot of dispute over this on the Internet.
Not, generally, between "the experts".
The way I understand it, volatile is the only way to refrain the compiler from optimizing away accesses to memory which is changed outside of the current scope, regardless of concurrent access.
Nope.
Non-pure, non-constexpr non-inlined function calls (getters/accessors) also necessarily have this effect. Admittedly link-time optimization confuses the issue of which functions may really get inlined.
In C, and by extension C++, volatile affects memory access optimization. Java took this keyword, and since it can't (or couldn't) do the tasks C uses volatile for in the first place, altered it to provide a memory fence.
The correct way to get the same effect in C++ is using std::atomic.
In that sense, volatile should do the trick, even if we disagree on best practices for concurrent programming.
No, it may have the desired effect, depending on how it interacts with your platform's cache hardware. This is brittle - it could change any time you upgrade a CPU, or add another one, or change your scheduler behaviour - and it certainly isn't portable.
If you're really just tracking how many workers are still working, sane methods might be a semaphore (synchronized counter), or mutex+condvar+integer count. Either are likely more efficient than busy-looping with a sleep.
If you're wedded to the busy loop, you could still reasonably have a single counter, such as std::atomic<size_t>, which is set by wakeup_cleaners and decremented as each cleaner completes. Then you can just wait for it to reach zero.
If you really want a busy loop and really prefer to scan the array each time, it should be an array of std::atomic<bool>. That way you can decide what consistency you need from each load, and it will control both the compiler optimizations and the memory hardware appropriately.
Apparently, volatile does the necessary for your example. The topic of volatile qualifier itself is too broad: you can start by searching "C++ volatile vs atomic" etc. There are a lot of articles and questions&answers on the internet, e.g. Concurrency: Atomic and volatile in C++11 memory model .
Briefly, volatile tells the compiler to disable some aggressive optimizations, particularly, to read the variable each time it is accessed (rather than storing it in a register or cache). There are compilers which do more so making volatile to act more like std::atomic: see Microsoft Specific section here. In your case disablement of an aggressive optimization is exactly what was necessary.
However, volatile doesn't define the order for the execution of the statements around it. That is why you need memory order in case you need to do something else with the data after the flags you check have been set.
For inter-thread communication it is appropriate to use std::atomic, particularly, you need to refactor _requested_volumes[vol] to be of type std::atomic<bool> or even std::atomic_flag: http://en.cppreference.com/w/cpp/atomic/atomic .
An article that discourages usage of volatile and explains that volatile can be used only in rare special cases (connected with hardware I/O): https://www.kernel.org/doc/Documentation/volatile-considered-harmful.txt

Volatile C/C++ on a multithread app

I am having a question on the volatile usage. I typically try that all the variables shared across threads have the volatile keyword to ensure direct memory accesses, and of course protected with mutexes.
However, is volatile really needed if a variable is shared to ensure consistency?
I explain with an example:
Thread1: //(affects it behaviour with the variable)
mymutex.lock();
if(variable)
{
...
variable = false;
}
mymutex.unlock();
Thread2:
mymutex.lock();
variable = true;
mymutex.unlock();
In the upper example, thread2 only writes, and thread1 reads/writes. Is it possible that the variable is cached and the threads do not read the new value? Even though the mutexes are properly set? Do I need volatile in this case?
I am asking this because instead of variable I have a std::vector, which cannot be volatile. And I am not 100% sure that this approach is safe without the volatile keyword.
Thanks.
EDITED: Reformulating the question properly.
volatile in C++ is not meant for concurrency. It's about whether the compiler is allowed to optimize away reads from a variable or not. It is primarily used for things such as interfacing with hardware via memory mapping.
Unfortunately, this means that even if you do have volatile variables, the reads and writes may still access a thread-local store which is not synchronized. Also, an std::vector is not thread safe.
So, in either case, you need to synchronize, for example using a std::mutex (which you do mention). Now, if this is done, the variables which are protected by that mutexdo not need to be volatile. The mutex itself does the synchronization and protects against the type of issues you worry about.
Take a look at my answer here.
Bottom like, don't be clever are use volatile to try guarantee thread safety, use proper

Is it ok to read a shared boolean flag without locking it when another thread may set it (at most once)?

I would like my thread to shut down more gracefully so I am trying to implement a simple signalling mechanism. I don't think I want a fully event-driven thread so I have a worker with a method to graceully stop it using a critical section Monitor (equivalent to a C# lock I believe):
DrawingThread.h
class DrawingThread {
bool stopRequested;
Runtime::Monitor CSMonitor;
CPInfo *pPInfo;
//More..
}
DrawingThread.cpp
void DrawingThread::Run() {
if (!stopRequested)
//Time consuming call#1
if (!stopRequested) {
CSMonitor.Enter();
pPInfo = new CPInfo(/**/);
//Not time consuming but pPInfo must either be null or constructed.
CSMonitor.Exit();
}
if (!stopRequested) {
pPInfo->foobar(/**/);//Time consuming and can be signalled
}
if (!stopRequested) {
//One more optional but time consuming call.
}
}
void DrawingThread::RequestStop() {
CSMonitor.Enter();
stopRequested = true;
if (pPInfo) pPInfo->RequestStop();
CSMonitor.Exit();
}
I understand (at least in Windows) Monitor/locks are the least expensive thread synchronization primitive but I am keen to avoid overuse. Should I be wrapping each read of this boolean flag? It is initialized to false and only set once to true when stop is requested (if it is requested before the task completes).
My tutors advised to protect even bool's because read/writing may not be atomic. I think this one shot flag is the exception that proves the rule?
It is never OK to read something possibly modified in a different thread without synchronization. What level of synchronization is needed depends on what you are actually reading. For primitive types, you should have a look at atomic reads, e.g. in the form of std::atomic<bool>.
The reason synchronization is always needed is that the processors will have the data possibly shared in a cache line. It has no reason to update this value to a value possibly changed in a different thread if there is no synchronization. Worse, yet, if there is no synchronization it may write the wrong value if something stored close to the value is changed and synchronized.
Boolean assignment is atomic. That's not the problem.
The problem is that a thread may not not see changes to a variable done by a different thread due to either compiler or CPU instruction reordering or data caching (i.e. the thread that reads the boolean flag may read a cached value, instead of the actual updated value).
The solution is a memory fence, which indeed is implicitly added by lock statements, but for a single variable it's overkill. Just declare it as std::atomic<bool>.
The answer, I believe, is "it depends." If you're using C++03, threading isn't defined in the Standard, and you'll have to read what your compiler and your thread library say, although this kind of thing is usually called a "benign race" and is usually OK.
If you're using C++11, benign races are undefined behavior. Even when undefined behavior doesn't make sense for the underlying data type. The problem is that compilers can assume that programs have no undefined behavior, and make optimizations based on that (see also the Part 1 and Part 2 linked from there). For instance, your compiler could decide to read the flag once and cache the value because it's undefined behavior to write to the variable in another thread without some kind of mutex or memory barrier.
Of course, it may well be that your compiler promises to not make that optimization. You'll need to look.
The easiest solution is to use std::atomic<bool> in C++11, or something like Hans Boehm's atomic_ops elsewhere.
No, you have to protect every access, since modern compilers and cpus reorder the code without your multithreading tasks in mind. The read access from different threads might work, but don't have to work.

C++ Thread, shared data

I have an application where 2 threads are running... Is there any certanty that when I change a global variable from one thread, the other will notice this change?
I don't have any syncronization or Mutual exclusion system in place... but should this code work all the time (imagine a global bool named dataUpdated):
Thread 1:
while(1) {
if (dataUpdated)
updateScreen();
doSomethingElse();
}
Thread 2:
while(1) {
if (doSomething())
dataUpdated = TRUE;
}
Does a compiler like gcc optimize this code in a way that it doesn't check for the global value, only considering it value at compile time (because it nevers get changed at the same thred)?
PS: Being this for a game-like application, it really doen't matter if there will be a read while the value is being written... all that matters is that the change gets noticed by the other thread.
Yes. No. Maybe.
First, as others have mentioned you need to make dataUpdated volatile; otherwise the compiler may be free to lift reading it out of the loop (depending on whether or not it can see that doSomethingElse doesn't touch it).
Secondly, depending on your processor and ordering needs, you may need memory barriers. volatile is enough to guarentee that the other processor will see the change eventually, but not enough to guarentee that the changes will be seen in the order they were performed. Your example only has one flag, so it doesn't really show this phenomena. If you need and use memory barriers, you should no longer need volatile
Volatile considered harmful and Linux Kernel Memory Barriers are good background on the underlying issues; I don't really know of anything similar written specifically for threading. Thankfully threads don't raise these concerns nearly as often as hardware peripherals do, though the sort of case you describe (a flag indicating completion, with other data presumed to be valid if the flag is set) is exactly the sort of thing where ordering matterns...
Here is an example that uses boost condition variables:
bool _updated=false;
boost::mutex _access;
boost::condition _condition;
bool updated()
{
return _updated;
}
void thread1()
{
boost::mutex::scoped_lock lock(_access);
while (true)
{
boost::xtime xt;
boost::xtime_get(&xt, boost::TIME_UTC);
// note that the second parameter to timed_wait is a predicate function that is called - not the address of a variable to check
if (_condition.timed_wait(lock, &updated, xt))
updateScreen();
doSomethingElse();
}
}
void thread2()
{
while(true)
{
if (doSomething())
_updated=true;
}
}
Use a lock. Always always use a lock to access shared data. Marking the variable as volatile will prevent the compiler from optimizing away the memory read, but will not prevent other problems such as memory re-ordering. Without a lock there is no guarantee that the memory writes in doSomething() will be visible in the updateScreen() function.
The only other safe way is to use a memory fence, either explicitly or an implicitly using an Interlocked* function for example.
Use the volatile keyword to hint to the compiler that the value can change at any time.
volatile int myInteger;
The above will guarantee that any access to the variable will be to and from memory without any specific optimizations and as a result all threads running on the same processor will "see" changes to the variable with the same semantics as the code reads.
Chris Jester-Young pointed out that coherency concerns to such a variable value change may arise in a multi-processor systems. This is a consideration and it depends on the platform.
Actually, there are really two considerations to think about relative to platform. They are coherency and atomicity of the memory transactions.
Atomicity is actually a consideration for both single and multi-processor platforms. The issue arises because the variable is likely multi-byte in nature and the question is if one thread could see a partial update to the value or not. ie: Some bytes changed, context switch, invalid value read by interrupting thread. For a single variable that is at the natural machine word size or smaller and naturally aligned should not be a concern. Specifically, an int type should always be OK in this regard as long as it is aligned - which should be the default case for the compiler.
Relative to coherency, this is a potential concern in a multi-processor system. The question is if the system implements full cache coherency or not between processors. If implemented, this is typically done with the MESI protocol in hardware. The question didn't state platforms, but both Intel x86 platforms and PowerPC platforms are cache coherent across processors for normally mapped program data regions. Therefore this type of issue should not be a concern for ordinary data memory accesses between threads even if there are multiple processors.
The final issue relative to atomicity that arises is specific to read-modify-write atomicity. That is, how do you guarantee that if a value is read updated in value and the written, that this happen atomically, even across processors if more than one. So, for this to work without specific synchronization objects, would require that all potential threads accessing the variable are readers ONLY but expect for only one thread can ever be a writer at one time. If this is not the case, then you do need a sync object available to be able to ensure atomic actions on read-modify-write actions to the variable.
Your solution will use 100% CPU, among other problems. Google for "condition variable".
Chris Jester-Young pointed out that:
This only work under Java 1.5+'s memory model. The C++ standard does not address threading, and volatile does not guarantee memory coherency between processors. You do need a memory barrier for this
being so, the only true answer is implementing a synchronization system, right?
Use the volatile keyword to hint to the compiler that the value can change at any time.
volatile int myInteger;
No, it's not certain. If you declare the variable volatile, then the complier is supposed to generate code that always loads the variable from memory on a read.
If the scope is right ( "extern", global, etc. ) then the change will be noticed. The question is when? And in what order?
The problem is that the compiler can and frequently will re-order your logic to fill all it's concurrent pipelines as a performance optimization.
It doesn't really show in your specific example because there aren't any other instructions around your assignment, but imagine functions declared after your bool assign execute before the assignment.
Check-out Pipeline Hazard on wikipedia or search google for "compiler instruction reordering"
As others have said the volatile keyword is your friend. :-)
You'll most likely find that your code would work when you had all of the optimisation options disabled in gcc. In this case (I believe) it treats everything as volatile and as a result the variable is accessed in memory for every operation.
With any sort of optimisation turned on the compiler will attempt to use a local copy held in a register. Depending on your functions this may mean that you only see the change in variable intermittently or, at worst, never.
Using the keyword volatile indicates to the compiler that the contents of this variable can change at any time and that it should not use a locally cached copy.
With all of that said you may find better results (as alluded to by Jeff) through the use of a semaphore or condition variable.
This is a reasonable introduction to the subject.