C++ std::atomic vs. Boost atomic - c++

In my application, I have an int and a bool variable, which are accessed (multiple write/read) by multiple threads. Currently, I am using two mutexes, one for int and one for bool to protect those variables.
I heard about using atomic variables and operators to write lock-free multithreaded programs. My questions are:
1. What's the definition of atomic variables and operators?
2. What's the main difference between std::atomic and boost/atomic.hpp? Which one is more standard or popular?
3. Are these libraries platform-dependent? I am using GNU gcc 4.6 on Linux at the moment, but ideally it should be cross-platform. I heard that the definition of "atomic" actually depends on the hardware as well. Can anyone explain that as well?
4. What's the best way to share a bool variable among multiple threads? I would prefer not to use the "volatile" keyword.
Is this code thread-safe?
double double_m; // double_m is only accessed by current thread.
std::atomic<bool> atomic_bool_x;
atomic_bool_x = true && (double_m > 12.5);
int int_n; // int_n is only accessed by current thread.
std::atomic<int> atomic_int_x;
std::atomic<int> atomic_int_y;
atomic_int_y = atomic_int_x * int_n;

I'm not an expert or anything, but here's what I know:
std::atomic simply says that calling load and store (and a few other operations) concurrently is well-defined. An atomic operation is indivisible - nothing can happen 'in-between'.
I assume std::atomic is based off of boost::atomic. If you can, use std, otherwise use boost.
They are both portable, with std being completely so. However, your compiler will need to support C++11.
Likely std::atomic_bool. You should not need to use volatile.
Also, I believe load/store differ from operator=/operator T: only load/store are atomic.
Never mind. I checked the standard, and it appears that the operators are defined in terms of load/store/etc.; however, they may return different things.
Further reading:
http://en.cppreference.com/w/cpp/atomic/atomic
C++11 Standard
C++ Concurrency in Action
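To make the point about operators versus load/store concrete for the code in the question, here is a minimal sketch, assuming a C++11 compiler. The function names multiply_into_y and multiply_in_place are made up for illustration. Each individual load and store is atomic, but a statement such as atomic_int_y = atomic_int_x * int_n is two separate atomic operations, not one atomic transaction.

```cpp
#include <atomic>

std::atomic<int> atomic_int_x{3};
std::atomic<int> atomic_int_y{0};

// Equivalent to: atomic_int_y = atomic_int_x * int_n;
// The load and the store are each atomic, but the statement as a whole is
// not: another thread may change atomic_int_x between the load and the store.
void multiply_into_y(int int_n) {
    int snapshot = atomic_int_x.load();    // same as: int snapshot = atomic_int_x;
    atomic_int_y.store(snapshot * int_n);  // same as: atomic_int_y = snapshot * int_n;
}

// If a read-modify-write on ONE variable must be atomic as a whole, a
// compare-exchange loop is one way to get it (hypothetical helper).
int multiply_in_place(std::atomic<int>& value, int factor) {
    int expected = value.load();
    while (!value.compare_exchange_weak(expected, expected * factor)) {
        // On failure, expected is refreshed with the current value; retry.
    }
    return expected * factor;  // the value that was stored
}
```

Whether the two-step version is good enough depends on whether other threads may write atomic_int_x in between and whether that matters to you.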

Volatile is orthogonal to what you use to implement atomics. In C++ it tells the compiler that it is not safe to perform certain optimizations on that variable. Herb Sutter lays it out:
To safely write lock-free code that communicates between threads without using locks, prefer to use ordered atomic variables: Java/.NET volatile, C++0x atomic, and C-compatible atomic_T.
To safely communicate with special hardware or other memory that has unusual semantics, use unoptimizable variables: ISO C/C++ volatile. Remember that reads and writes of these variables are not necessarily atomic, however.
Finally, to express a variable that both has unusual semantics and has any or all of the atomicity and/or ordering guarantees needed for lock-free coding, only the ISO C++0x draft Standard provides a direct way to spell it: volatile atomic.
(from http://drdobbs.com/article/print?articleId=212701484&siteSectionName=parallel)

See std::atomic class template
std::atomic is standard since C++11, and the Boost stuff is older. But since it is standard now, I would prefer std::atomic.
?? You can use std::atomic with any C++11 compiler on any platform you want.
Without any further information...
std::atomic;

I believe std::atomic (C++11) and boost.atomic are equivalent. If std::atomic is not supported by your compiler yet, use boost::atomic.

Related

difference between standard's atomic bool and atomic flag

I wasn't aware of the std::atomic variables but was aware of the std::mutex (weird, right?) provided by the standard. However, one thing caught my eye: there are two seemingly identical (to me) atomic types provided by the standard, listed below:
std::atomic<bool>
std::atomic_flag
The std::atomic_flag contains the following explanation:
std::atomic_flag is an atomic boolean type. Unlike all specializations of std::atomic, it is guaranteed to be lock-free. Unlike std::atomic<bool>, std::atomic_flag does not provide load or store operations.
which I fail to understand. Is std::atomic<bool> not guaranteed to be lock-free? Then it's not atomic or what?
So what's the difference between the two and when should I use which?
std::atomic bool type not guaranteed to be lock-free?
Correct. std::atomic may be implemented using locks.
then it's not atomic or what?
std::atomic is atomic whether it has been implemented using locks, or without. std::atomic_flag is guaranteed to be implemented without using locks.
So what's the difference between the two
The primary difference besides the lock-free guarantee is:
std::atomic_flag does not provide load or store operations.
and when should I use which?
Usually, you will want to use std::atomic<bool> when you need an atomic boolean variable. std::atomic_flag is a low level structure that can be used to implement custom atomic structures.
std::atomic<T> guarantees that accesses to the variable will be atomic. It does not, however, say how the atomicity is achieved. It can be done using a lock-free variable, or using a lock. The actual implementation depends on your target architecture and the type T.
std::atomic_flag on the other hand is guaranteed to be implemented using a lock-free technique.
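As an illustration of the "low level structure" role of std::atomic_flag, here is a minimal sketch of a spinlock built on it, assuming C++11. The class name SpinLock and the helper functions are hypothetical. The last function shows how to ask whether std::atomic<bool> happens to be lock-free on your platform; the answer is implementation-defined.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Minimal spinlock built on std::atomic_flag (hypothetical class name).
class SpinLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock() {
        // test_and_set returns the previous value: spin until it was clear.
        while (flag.test_and_set(std::memory_order_acquire)) { }
    }
    void unlock() { flag.clear(std::memory_order_release); }
};

// Increment a plain int from several threads while holding the spinlock.
int count_with_spinlock(int threads, int iterations) {
    SpinLock lock;
    int counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}

// std::atomic<bool> offers load/store; whether it is lock-free can be
// queried at run time.
bool atomic_bool_is_lock_free() {
    std::atomic<bool> b{false};
    return b.is_lock_free();
}
```

For an ordinary shared boolean, std::atomic<bool> with load/store is still the simpler choice; the spinlock is only here to show what atomic_flag's test_and_set/clear are for.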

Do I need to use a mutex lock?

Simple edition: In a C++ program I'm using two different threads to work with an integer variable, but I'm sure one is always writing some value into it and the other one is only reading it. Do I still need to use a mutex lock when reading/writing data?
Now details: The main idea is that the first thread generates some information and saves it into an array, and the second thread reads data from that array and processes it. This array represents a queue, meaning I have two index values pointing to the first and last item in the queue. Now I'm wondering if I have to lock these two index values whenever I'm reading or writing values, or is it OK to check them without locking? Note that the generator thread is the only thread changing the index queue_back, and the processor thread has exclusive permission to change queue_front.
If it makes any difference, I'm developing for a Linux-based system and the code is compiled using gcc.
PS: In some code which uses threading, I've seen the keyword volatile around variables shared between different threads. Do I need to use that too?
No, the read and write are not atomic. You will need to synchronize them using some synchronization mechanism.
Also, you MUST mark the shared integer as volatile; otherwise the optimizer might think the variable is never updated in one of your threads.
gcc allows you to do atomic operations on ints, longs and long longs (and their unsigned counterparts).
Look up these functions:
type __sync_fetch_and_add (type *ptr, type value);
type __sync_fetch_and_sub (type *ptr, type value);
type __sync_fetch_and_or (type *ptr, type value);
type __sync_fetch_and_and (type *ptr, type value);
type __sync_fetch_and_xor (type *ptr, type value);
type __sync_fetch_and_nand (type *ptr, type value);
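A small sketch of how one of these builtins might be used, assuming GCC or Clang (the __sync_* functions are compiler builtins, not standard C++). The variable and function names are made up for illustration.

```cpp
#include <thread>

// Shared counter, updated only through the GCC/Clang __sync builtins.
int counter = 0;

void add_many(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        // Atomically adds 1 and returns the value counter held before.
        __sync_fetch_and_add(&counter, 1);
    }
}

int run_two_writers(int iterations) {
    counter = 0;
    std::thread a(add_many, iterations);
    std::thread b(add_many, iterations);
    a.join();
    b.join();
    // Adding 0 is a common idiom for an atomic read with these builtins.
    return __sync_fetch_and_add(&counter, 0);
}
```

With a plain ++counter instead of the builtin, the two threads could lose updates.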
Do I still need to use mutex lock when reading/writing data?
Yes, you will need a lock. You may be interested in a more specific implementation called a read/write lock.
You can also use atomics and/or memory barriers. Using these will require a better understanding of your target architectures. Reproducing multithreading bugs can be very difficult, and these alternatives should be considered an optimization which may not be portable.
I've seen keyword volatile around variables shared between different threads, do I need to use that too?
Yikes. No! That is not a safe or portable solution to multithreaded reads and writes in C++. Use atomics, locks, copying, immutable and pure implementations (etc...) instead.
Interpretation of volatile can vary by platform and/or compiler, and it's not specified to operate any particular way in C or C++ for the purpose of multithreaded reads and writes (there's an old false legend that it can be used reliably as an atomic read/write). I once tested the effectiveness of volatile in a multithreaded C++ program for fun (on an intel-mac with apple's gcc). I won't provide the results because it worked well enough that some people might consider using it, although they should not because 'almost' isn't good enough.
And to qualify the use of volatile: It exists in my (large, strictly written, multithreading aware) codebase for the sole purpose of interfacing with platform dependent atomics APIs. And being entirely honest: there are a few other uses from earlier days, but they can and should be removed.
Yes, you need to synchronize access to the variable, either with a mutex, a critical section, interlocked access, etc., to make sure that the reading thread does not read incomplete bytes while the writing thread is still saving them. This is especially important on multi-core/CPU systems, where the two threads can truly access the variable in parallel.
Reads and writes of properly aligned data no larger than a machine word (usually whatever int resolves to) are atomic on most major architectures. That does not mean every architecture.
This means that no, you cannot just read head and tail and expect that data is consistent. However, if for example sizeof(int) happens to be 4 and sizeof(short) happens to be 2, and if you don't care about "not mainstream" platforms, you can do some union trickery and get away without atomic operations or a mutex.
If you want your code to be portable, there is no way around proper locking or atomic compare/exchange.
About volatile: this does insert a memory barrier for Microsoft Visual C++ (as a compiler-specific sophistry), but the standard does not guarantee anything special other than that the compiler won't optimize accesses to the variable. As such, making something volatile does not really help much, and it in no way guarantees thread safety.
Yes, you need to protect the queue indexes in both the generator and the reader threads by using some kind of synchronization mechanism, like a mutex.
HTH
If you're really just sharing one single integer, then std::atomic<int> sounds like the right type to use, from the <atomic> header. (There should also be Boost or TR1 versions if you have an old compiler.) This ensures atomic reads and writes. There's no need for a volatile qualifier as far as I understand.
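For the queue described in the question (one generator thread, one processor thread, two indexes), a single-producer/single-consumer ring buffer with std::atomic indexes could look roughly like this. This is only a sketch under the assumption that exactly one thread calls push and exactly one other thread calls pop; the class name SpscQueue and its members are hypothetical, and one slot is left unused to tell "full" from "empty".

```cpp
#include <atomic>
#include <cstddef>

// Single-producer/single-consumer ring buffer (hypothetical sketch).
// Only the producer writes back_, only the consumer writes front_.
template <typename T, std::size_t N>
class SpscQueue {
    T items_[N];
    std::atomic<std::size_t> front_{0};  // next slot to read
    std::atomic<std::size_t> back_{0};   // next slot to write
public:
    bool push(const T& value) {          // producer thread only
        std::size_t back = back_.load(std::memory_order_relaxed);
        std::size_t next = (back + 1) % N;
        if (next == front_.load(std::memory_order_acquire)) return false;  // full
        items_[back] = value;
        back_.store(next, std::memory_order_release);  // publish the item
        return true;
    }
    bool pop(T& out) {                   // consumer thread only
        std::size_t front = front_.load(std::memory_order_relaxed);
        if (front == back_.load(std::memory_order_acquire)) return false;  // empty
        out = items_[front];
        front_.store((front + 1) % N, std::memory_order_release);  // free the slot
        return true;
    }
};
```

The release store on an index paired with the acquire load on the other side is what makes the item itself visible to the other thread, which is exactly the ordering guarantee that a plain or volatile int index would not give you portably.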

Are volatile reads and writes atomic on Windows+VisualC?

There are a couple of questions on this site asking whether using a volatile variable for atomic / multithreaded access is possible: See here, here, or here for example.
Now, the C(++) standard conformant answer is obviously no.
However, on Windows & Visual C++ compiler, the situation seems not so clear.
I have recently answered and cited the official MSDN docs on volatile
Microsoft Specific
Objects declared as volatile are (...)
A write to a volatile object (volatile write) has Release semantics;
a reference to a global or static object? that occurs before a write to
a volatile object in the instruction sequence will occur before that
volatile write in the compiled binary.
A read of a volatile object (volatile read) has Acquire semantics; a reference to a
global or static object? that occurs after a read of volatile memory in the
instruction
sequence will occur after that volatile read in the compiled binary.
This allows volatile objects to be used for memory locks and releases in multithreaded applications.
[emphasis mine]
Now, reading this, it would appear to me that a volatile variable will be treated by the MS compiler as std::atomic would be in the upcoming C++11 standard.
However, in a comment to my answer, user Hans Passant wrote "That MSDN article is very unfortunate, it is dead wrong. You can't implement a lock with volatile, not even with Microsoft's version. (...)"
Please note: The example given in the MSDN seems pretty fishy, as you cannot generally implement a lock without atomic exchange. (As also pointed out by Alex.) This still leaves the question with respect to the validity of the other information given in this MSDN article, especially for use cases like here and here.
Additionally, there are the docs for the Interlocked* functions, especially InterlockedExchange, which takes a volatile(!?) variable and does an atomic read+write. (Note that one question we have on SO -- When should InterlockedExchange be used? -- does not authoritatively answer whether this function is needed for a read-only or write-only atomic access.)
What's more, the volatile docs quoted above somehow allude to "global or static object", where I would have thought that "real" acquire/release semantics should apply to all values.
Back to the question
On Windows, with Visual C++ (2005 - 2010), will declaring a (32bit? int?) variable as volatile allow for atomic reads and writes to this variable -- or not?
What is especially important to me is that this should hold (or not) on Windows/VC++ independently of the processor or platform the program runs on. (That is, does it matter whether it's a WinXP/32bit or a Windows 2008R2/64bit running on Itanium 2?)
Please back up your answer with verifiable information, links, test-cases!
Yes, they are atomic on Windows/VC++ (assuming you meet alignment requirements etc., of course).
However, for a lock you would need an atomic test-and-set, or compare-and-exchange instruction or similar, not just an atomic update or read.
Otherwise there is no way to test the lock and claim it in one indivisible operation.
EDIT: As commented below, all aligned memory accesses on x86 of 32 bits or below are atomic anyway. The key point is that volatile makes the memory accesses ordered. (Thanks for pointing this out in the comments.)
As of Visual C++ 2005 volatile variables are atomic. But this only applies to this specific class of compilers and to x86/AMD64 platforms. PowerPC for example may reorder memory reads/writes and would require read/write barriers. I'm not familiar with what the semantics are for gcc-class compilers, but in any case using volatile for atomics is not very portable.
reference, see first remark "Microsoft Specific": http://msdn.microsoft.com/en-us/library/12a04hfd%28VS.80%29.aspx
A bit off-topic, but let's have a go anyway.
... there are the docs for The Interlocked* functions, especially InterlockedExchange which takes a volatile(!) variable ...
If you think about this:
void foo(int volatile*);
Does it say:
the argument must be a pointer to a volatile int, or
the argument may as well be a pointer to a volatile int?
The latter is the correct answer, since the function can be passed pointers to both volatile and non-volatile ints.
Hence, the fact that InterlockedExchangeX() has its argument volatile-qualified does not imply that it must operate on volatile integers only.
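A tiny sketch of that point, with made-up function names: a pointer-to-volatile parameter accepts the address of a plain int just as well as the address of a volatile int, in the same way a pointer-to-const parameter accepts non-const objects.

```cpp
// A pointer-to-volatile parameter accepts both volatile and plain objects.
int read_through(int volatile* p) { return *p; }

int demo_volatile_param() {
    int plain = 1;
    int volatile vol = 2;
    // Both calls compile: adding the volatile qualifier is an implicit conversion.
    return read_through(&plain) + read_through(&vol);
}
```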
The point is probably to allow stuff like
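singleton& get_instance()
{
    // volatile pointer (not pointer-to-volatile): the accesses to the
    // pointer itself are the ones that need the volatile semantics
    static singleton* volatile instance;
    static mutex instance_mutex;
    if (!instance)
    {
        raii_lock lock(instance_mutex);
        if (!instance) instance = new singleton;
    }
    return *instance;
}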
which would break if instance was written to before initialization was complete. With MSVC semantics, you are guaranteed that as soon as you see instance != 0, the object has finished being initialized (which is not the case without proper barrier semantics, even with traditional volatile semantics).
This double-checked lock (anti-)pattern is quite common actually, and broken if you don't provide barrier semantics. However, if there are guarantees that accesses to volatile variables are acquire + release barriers, then it works.
Don't rely on such custom semantics of volatile though. I suspect this has been introduced not to break existing codebases. In any case, don't write locks according to the MSDN example. It probably doesn't work (I doubt you can write a lock using just a barrier: you need atomic operations -- CAS, TAS, etc -- for that).
The only portable way to write the double-checked lock pattern is to use C++0x, which provides a suitable memory model, and explicit barriers.
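For reference, a sketch of what the double-checked lock might look like with the C++11 (then C++0x) memory model, using std::atomic and std::mutex instead of volatile. The Singleton type here is a hypothetical stand-in, and the object is intentionally never deleted in this sketch.

```cpp
#include <atomic>
#include <mutex>

struct Singleton { int value = 42; };  // hypothetical payload type

Singleton& get_instance() {
    static std::atomic<Singleton*> instance{nullptr};
    static std::mutex instance_mutex;

    // Acquire load: if we see a non-null pointer, we also see the fully
    // constructed object it points to.
    Singleton* p = instance.load(std::memory_order_acquire);
    if (!p) {
        std::lock_guard<std::mutex> lock(instance_mutex);
        p = instance.load(std::memory_order_relaxed);  // re-check under the lock
        if (!p) {
            p = new Singleton;
            // Release store: publishes the object together with the pointer.
            instance.store(p, std::memory_order_release);
        }
    }
    return *p;
}
```

Note that in C++11 the initialization of a function-local static is itself thread-safe, so for a plain singleton "static Singleton s; return s;" is usually enough; the explicit pattern above is mainly useful to show where the barriers go.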
Under x86, these operations are guaranteed to be atomic without the need for LOCK-based instructions such as Interlocked* (see Intel's developer manuals, volume 3A, section 8.1):
basic memory operations will always be carried out atomically:
• Reading or writing a byte
• Reading or writing a word aligned on a 16-bit boundary
• Reading or writing a doubleword aligned on a 32-bit boundary
The Pentium processor (and newer processors since) guarantees that the following additional memory operations will always be carried out atomically:
• Reading or writing a quadword aligned on a 64-bit boundary
• 16-bit accesses to uncached memory locations that fit within a 32-bit data bus
The P6 family processors (and newer processors since) guarantee that the following additional memory operation will always be carried out atomically:
• Unaligned 16-, 32-, and 64-bit accesses to cached memory that fit within a cache line
This means volatile will only ever serve to prevent caching and instruction reordering by the compiler (MSVC won't emit atomic operations for volatile variables; they need to be used explicitly).

Are mutex lock functions sufficient without volatile?

A coworker and I write software for a variety of platforms running on x86, x64, Itanium, PowerPC, and other 10 year old server CPUs.
We just had a discussion about whether mutex functions such as pthread_mutex_lock() ... pthread_mutex_unlock() are sufficient by themselves, or whether the protected variable needs to be volatile.
int foo::bar()
{
//...
//code which may or may not access _protected.
pthread_mutex_lock(m);
int ret = _protected;
pthread_mutex_unlock(m);
return ret;
}
My concern is caching. Could the compiler place a copy of _protected on the stack or in a register, and use that stale value in the assignment? If not, what prevents that from happening? Are variations of this pattern vulnerable?
I presume that the compiler doesn't actually understand that pthread_mutex_lock() is a special function, so are we just protected by sequence points?
Thanks greatly.
Update: Alright, I can see a trend with answers explaining why volatile is bad. I respect those answers, but articles on that subject are easy to find online. What I can't find online, and the reason I'm asking this question, is how I'm protected without volatile. If the above code is correct, how is it invulnerable to caching issues?
Simplest answer is volatile is not needed for multi-threading at all.
The long answer is that sequence points like critical sections are platform dependent as is whatever threading solution you're using so most of your thread safety is also platform dependent.
C++0x has a concept of threads and thread safety, but the current standard does not. volatile is therefore sometimes misidentified as something that prevents reordering of operations and memory accesses in multithreaded programming, which it was never intended for and cannot reliably do.
The only thing volatile should be used for in C++ is to allow access to memory mapped devices, allow uses of variables between setjmp and longjmp, and to allow uses of sig_atomic_t variables in signal handlers. The keyword itself does not make a variable atomic.
Good news: in C++0x we will have the standard library construct std::atomic, which can be used to guarantee atomic operations and thread-safe constructs for variables. Until your compiler of choice supports it, you may need to turn to the Boost library or bust out some assembly code to create your own objects to provide atomic variables.
P.S. A lot of the confusion is caused by Java and .NET actually enforcing multithreaded semantics with the keyword volatile. C++, however, follows suit with C, where this is not the case.
Your threading library should include the appropriate CPU and compiler barriers on mutex lock and unlock. For GCC, a memory clobber on an asm statement acts as a compiler barrier.
Actually, there are two things that protect your code from (compiler) caching:
You are calling a non-pure external function (pthread_mutex_*()), which means that the compiler doesn't know that that function doesn't modify your global variables, so it has to reload them.
As I said, pthread_mutex_*() includes a compiler barrier, e.g: on glibc/x86 pthread_mutex_lock() ends up calling the macro lll_lock(), which has a memory clobber, forcing the compiler to reload variables.
If the above code is correct, how is it invulnerable to caching
issues?
Until C++0x, it is not. And it is not specified in C. So, it really depends on the compiler. In general, if the compiler does not guarantee that it will respect ordering constraints on memory accesses for functions or operations that involve multiple threads, you will not be able to write multithreaded safe code with that compiler. See Hans J Boehm's Threads Cannot be Implemented as a Library.
As for what abstractions your compiler should support for thread safe code, the wikipedia entry on Memory Barriers is a pretty good starting point.
(As for why people suggested volatile, some compilers treat volatile as a memory barrier for the compiler. It's definitely not standard.)
The volatile keyword is a hint to the compiler that the variable might change outside of program logic, such as a memory-mapped hardware register that could change as part of an interrupt service routine. This prevents the compiler from assuming a cached value is always correct and would normally force a memory read to retrieve the value. This usage pre-dates threading by a couple decades or so. I've seen it used with variables manipulated by signals as well, but I'm not sure that usage was correct.
Variables guarded by mutexes are guaranteed to be correct when read or written by different threads. The threading API is required to ensure that such views of variables are consistent. This access is all part of your program logic and the volatile keyword is irrelevant here.
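A sketch of the pattern from the question without volatile, written with std::mutex rather than pthread_mutex_t so that it is self-contained (this assumes a C++11 compiler; the class and function names are made up). The protected member is a plain int, and every access happens while the lock is held.

```cpp
#include <mutex>
#include <thread>

class Foo {
    std::mutex m_;
    int protected_ = 0;  // plain int: no volatile needed
public:
    void increment() {
        std::lock_guard<std::mutex> lock(m_);
        ++protected_;
    }
    int bar() {
        std::lock_guard<std::mutex> lock(m_);
        return protected_;  // the read happens while the lock is held
    }
};

// Two threads update the protected value; the final read sees every update.
int run_writers(int iterations) {
    Foo foo;
    auto work = [&] { for (int i = 0; i < iterations; ++i) foo.increment(); };
    std::thread a(work), b(work);
    a.join();
    b.join();
    return foo.bar();
}
```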
With the exception of the simplest spin lock algorithm, mutex code is quite involved: good optimized mutex lock/unlock code contains the kind of code even excellent programmers struggle to understand. It uses special compare-and-set instructions, manages not only the unlocked/locked state but also the wait queue, and optionally uses system calls to go into a wait state (for lock) or wake up other threads (for unlock).
There is no way the average compiler can decode and "understand" all that complex code (again, with the exception of the simple spin lock) no matter what, so even for a compiler not aware of what a mutex is, and how it relates to synchronization, there is no way in practice a compiler could optimize anything around such code.
That's if the code was "inline", or available for analysis for the purpose of cross-module optimization, or if global optimization is available.
I presume that the compiler doesn't actually understand that
pthread_mutex_lock() is a special function, so are we just protected
by sequence points?
The compiler does not know what it does, so does not try to optimize around it.
How is it "special"? It's opaque and treated as such. It is not special among opaque functions.
There is no semantic difference with an arbitrary opaque function that can access any other object.
My concern is caching. Could the compiler place a copy of _protected
on the stack or in a register, and use that stale value in the
assignment?
Yes, in code that acts on objects transparently and directly, by using the variable name or pointers in a way that the compiler can follow. Not in code that might use arbitrary pointers to indirectly use variables.
So yes between calls to opaque functions. Not across.
And also for variables which can only be used in the function, by name: for local variables that don't have either their address taken or a reference bound to them (such that the compiler cannot follow all further uses). These can indeed be "cached" across arbitrary calls, including lock/unlock.
If not, what prevents that from happening? Are variations of this
pattern vulnerable?
Opacity of the functions. Non-inlining. Assembly code. System calls. Code complexity. Everything that makes compilers bail out and think "that's complicated stuff, just make calls to it".
The default position of a compiler is always "let's execute stupidly, I don't understand what is being done anyway", not "I will optimize that/let's rewrite the algorithm, I know better". Most code is not optimized in a complex non-local way.
Now let's assume the absolute worst (from our point of view, which is that the compiler should give up; that is the absolute best from the point of view of an optimizing algorithm):
the function is "inline" (= available for inlining) (or global optimization kicks in, or all functions are morally "inline");
no memory barrier is needed (as in a mono-processor time sharing system, and in a multi-processor strongly ordered system) in that synchronization primitive (lock or unlock) so it contains no such thing;
there is no special instruction (like compare and set) used (for example for a spin lock, the unlock operation is a simple write);
there is no system call to pause or wake threads (not needed in a spin lock);
then we might have a problem as the compiler could optimize around the function call. This is fixed trivially by inserting a compiler barrier such as an empty asm statement with a "clobber" for other accessible variables. That means that compiler just assumes that anything that might be accessible to a called function is "clobbered".
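A sketch of such a compiler barrier, assuming GCC or Clang inline-asm syntax. The variables and the unlock function are hypothetical and only model the worst-case scenario listed above (a trivial unlock that is a simple write, on a system where no CPU fence is needed). The barrier emits no instructions and gives no ordering guarantees at the hardware level; it only constrains the compiler.

```cpp
// Hypothetical lock word and shared data; names are illustrative only.
int lock_word = 0;
int shared_data = 0;

// GCC/Clang syntax: emits no instructions, but tells the compiler that any
// memory may have been read or written, so values held in registers must be
// written back before this point and reloaded after it.
inline void compiler_barrier() { asm volatile("" ::: "memory"); }

void unlock_after_update(int value) {
    shared_data = value;  // the compiler may not move this below the barrier
    compiler_barrier();
    lock_word = 0;        // "unlock" is a simple write in this scenario
}
```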
or whether the protected variable needs to be volatile.
You can make it volatile for the usual reason you make things volatile: to be certain to be able to access the variable in the debugger, to prevent a floating point variable from having the wrong datatype at runtime, etc.
Making it volatile would actually not even fix the issue described above as volatile is essentially a memory operation in the abstract machine that has the semantics of an I/O operation and as such is only ordered with respect to
real I/O like iostream
system calls
other volatile operations
asm memory clobbers (but then no memory side effect is reordered around those)
calls to external functions (as they might do one of the above)
Volatile is not ordered with respect to non volatile memory side effects. That makes volatile practically useless (useless for practical uses) for writing thread safe code in even the most specific case where volatile would a priori help, the case where no memory fence is ever needed: when programming threading primitives on a time sharing system on a single CPU. (That may be one of the least understood aspects of either C or C++.)
So while volatile does prevent "caching", volatile doesn't even prevent compiler reordering of lock/unlock operation unless all shared variables are volatile.
Locks/synchronisation primitives make sure the data is not cached in registers/CPU cache, which means data propagates to memory. If two threads are accessing/modifying data within locks, it is guaranteed that data is read from memory and written to memory. We don't need volatile in this use case.
But in the case where you have code with double checks, the compiler can optimise the code and remove redundant code; to prevent that we need volatile.
Example: see singleton pattern example
https://en.m.wikipedia.org/wiki/Singleton_pattern#Lazy_initialization
Why would someone write this kind of code?
Ans: There is a performance benefit in not acquiring the lock.
PS: This is my first post on stack overflow.
Not if the object you're locking is volatile, eg: if the value it represents depends on something foreign to the program (hardware state).
volatile should NOT be used to denote any kind of behavior that is the result of executing the program.
If it's actually volatile what I personally would do is locking the value of the pointer/address, instead of the underlying object.
eg:
volatile int i = 0;
// ... Later in a thread
// ... Code that may not access anything without a lock
std::uintptr_t ptr_to_lock = reinterpret_cast<std::uintptr_t>(&i); // a pointer does not convert to an integer implicitly
some_lock(ptr_to_lock);
// use i
release_some_lock(ptr_to_lock);
Please note that it only works if ALL the code ever using the object in a thread locks the same address. So be mindful of that when using threads with some variable that is part of an API.

assignment in pthreads application

I have a linux multithread application in C++.
In this application, class App has a variable Status:
class App {
...
typedef enum { asStop=0, asStart, asRestart, asWork, asClose } TAppStatus;
TAppStatus Status;
...
}
All threads often check Status by calling the GetStatus() function.
inline TAppStatus App::GetStatus(){ return Status; }
Other functions of the application can assign different values to the Status variable by calling the SetStatus() function, and they do not use mutexes.
void App::SetStatus( TAppStatus aStatus ){ Status=aStatus; }
Edit: All threads use Status in switch operator:
switch ( App::GetStatus() ){ case asStop: ... case asStart: ... };
Is the assignment in this case, an atomic operation?
Is this correct code?
Thanks.
There is no portable way to implement synchronized variables in C99 or C++03, and the pthread library does not provide one either. You can:
Use C++0x <atomic> header (or C1x <stdatomic.h>). Gcc does support it for C++ if given -std=c++0x or -std=gnu++0x option since version 4.4.
Use the Linux-specific <linux/atomic.h> (this is implementation used by kernel, but it should be usable from userland as well).
Use GCC-specific __sync_* builtin functions.
Use some other library that provides atomic operations like glib.
Use locks, but that's orders of magnitude slower compared to the fast operation itself.
Note: As Martinho pointed out, while they are called "atomic", for store and load it's not the atomic property (the operation cannot be interrupted, and a load always sees either the whole store or none of it, which is usually true of 32-bit stores and loads) but the ordering property (if you store a and then b, nobody may get the new value of b and then the old value of a) that is hard to get but necessary in this case.
This depends entirely upon the enum representation chosen. For x86, I believe, all assignments of the machine's word size (32 bits for x86, 64 bits for x64) that are aligned to that size are atomic, so a simple read or write is atomic.
Even assuming that it is the correct size and alignment, this doesn't mean that these functions are thread-safe; that depends on what the status is used for.
Edit: In addition, the compiler's optimizer may well wreak havoc if there are no uses of atomic operations or other volatile accesses.
Edit to your edit: No, that's not thread safe at all. If you converted it manually into a jump table then you might be thread-safe, I'll need to think about it for a little while.
On certain architectures this assignment might be atomic (by accident), but even if it is, this code is wrong. The compiler and hardware may perform various optimizations, which might break this "atomicity". Look at: http://video.google.com/videoplay?docid=-4714369049736584770#
Use locks or atomic http://www.stdthread.co.uk/doc/headers/atomic/atomic.html variable to fix it.
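A sketch of the atomic variant for the App class from the question, assuming C++11 <atomic> is available. The enum and accessor names follow the question; the member name status_ and the helper should_keep_running are made up for illustration.

```cpp
#include <atomic>

enum TAppStatus { asStop = 0, asStart, asRestart, asWork, asClose };

class App {
    std::atomic<TAppStatus> status_{asStop};
public:
    TAppStatus GetStatus() const { return status_.load(); }
    void SetStatus(TAppStatus s) { status_.store(s); }
};

// The switch evaluates GetStatus() once, so every case is judged against
// the same snapshot of the status, even if another thread changes it.
bool should_keep_running(const App& app) {
    switch (app.GetStatus()) {
        case asStop:
        case asClose:
            return false;
        default:
            return true;
    }
}
```

This makes each individual read and write well-defined. If a thread has to act on a status and change it in one step, it would still need compare_exchange or a lock.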