Atomic Operations in C++

I have a set of C++ functions:
void funcB() {}
void funcC() {}
void funcA()
{
funcB();
funcC();
}
Now I want to make funcA atomic, i.e. the funcB and funcC calls inside funcA should be executed atomically. Is there any way to achieve this?

One way you can accomplish this is to use the new (C++11) features std::mutex and std::lock_guard.
For each protected resource, you instantiate a single global std::mutex; each thread then locks that mutex, as it requires, by the creation of a std::lock_guard:
#include <thread>
#include <iostream>
#include <mutex>
#include <vector>
// A single mutex, shared by all threads. It is initialized
// into the "unlocked" state
std::mutex m;
void funcB() {
std::cout << "Hello ";
}
void funcC() {
std::cout << "World." << std::endl;
}
void funcA(int i) {
// The creation of lock_guard locks the mutex
// for the lifetime of the lock_guard
std::lock_guard<std::mutex> l(m);
// Now only a single thread can run this code
std::cout << i << ": ";
funcB();
funcC();
// As we exit this scope, the lock_guard is destroyed,
// the mutex is unlocked, and another thread is allowed to run
}
int main () {
std::vector<std::thread> vt;
// Create and launch a bunch of threads
for(int i =0; i < 10; i++)
vt.push_back(std::thread(funcA, i));
// Wait for all of them to complete
for(auto& t : vt)
t.join();
}
Notes:
In your example some code unrelated to funcA could invoke either funcB or funcC without honoring the lock that funcA set.
Depending upon how your program is structured, you may want to manage the lifetime of the mutex differently. As an example, it might want to be a class member of the class that includes funcA.
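A minimal sketch of the class-member variant mentioned in the notes, assuming a hypothetical class named Worker (the class, its counters, and run_demo are made up for illustration). Making funcB and funcC private means outside code cannot call them without going through funcA and its lock:

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical class: the mutex lives next to the data it protects.
class Worker {
public:
    // funcB and funcC run as one unit with respect to other funcA calls.
    void funcA() {
        std::lock_guard<std::mutex> guard(m_);
        funcB();
        funcC();
    }

    // Returns the number of completed funcA calls, or -1 if the two
    // counters were ever observed out of step.
    int completed() const {
        std::lock_guard<std::mutex> guard(m_);
        return (b_calls_ == c_calls_) ? b_calls_ : -1;
    }

private:
    // Private, so unrelated code cannot bypass the lock taken in funcA.
    void funcB() { ++b_calls_; }
    void funcC() { ++c_calls_; }

    mutable std::mutex m_;  // mutable so const members can lock it
    int b_calls_ = 0;
    int c_calls_ = 0;
};

// Launches n threads that each call funcA once.
int run_demo(int n) {
    Worker w;
    std::vector<std::thread> vt;
    for (int i = 0; i < n; ++i)
        vt.emplace_back([&w] { w.funcA(); });
    for (auto& t : vt)
        t.join();
    return w.completed();
}
```

Calling run_demo(10) starts ten threads; because every access to the counters happens under m_, no thread can see funcB's effect without funcC's.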

In general, NO. Atomic operations are very precisely defined. What you want is a semaphore or a mutex.

If you are using GCC 4.7, then you can use its new Transactional Memory feature (enabled with -fgnu-tm) to do the following:
Transactional memory is intended to make programming with threads simpler, in particular synchronizing access to data shared between several threads using transactions. As with databases, a transaction is a unit of work that either completes in its entirety or has no effect at all (i.e., transactions execute atomically). Further, transactions are isolated from each other such that each transaction sees a consistent view of memory.
Currently, transactions are only supported in C++ and C in the form of transaction statements, transaction expressions, and function transactions. In the following example, both a and b will be read and the difference will be written to c, all atomically and isolated from other transactions:
__transaction_atomic { c = a - b; }
Therefore, another thread can use the following code to concurrently update b without ever causing c to hold a negative value (and without having to use other synchronization constructs such as locks or C++11 atomics):
__transaction_atomic { if (a > b) b++; }
The precise semantics of transactions are defined in terms of the C++11/C1X memory model (see below for a link to the specification). Roughly, transactions provide synchronization guarantees that are similar to what would be guaranteed when using a single global lock as a guard for all transactions. Note that like other synchronization constructs in C/C++, transactions rely on a data-race-free program (e.g., a nontransactional write that is concurrent with a transactional read to the same memory location is a data race).
More info: http://gcc.gnu.org/wiki/TransactionalMemory

Related

Updating two atomic variables under a condition in C++

I want to update two atomic variables under an if condition and the if condition uses one of the atomic variable. I am not sure if both these atomic variables will be updated together or not.
I have a multithreaded code below. In "if(local > a1)" a1 is an atomic variable so will reading it in if condition be atomic across threads, In other words if thread t1 is at the if condition, will thread t2 wait for a1 to be updated by thread t1? Is it possible that a2 is updated by one thread and a1 is updated by another thread?
// constructing atomics
#include <iostream> // std::cout
#include <atomic> // std::atomic
#include <thread> // std::thread
#include <vector> // std::vector
std::atomic<int> a1{0};
std::atomic<int> a2{0};
void count1m (int id) {
double local = id;
double local2 = id*3;
if(local > a1) { // a1 is an atomic variable, so will reading it in the if condition be atomic across threads or not?
a1 = local;
a2 = local2;
}
};
int main ()
{
std::vector<std::thread> threads;
std::cout << "spawning 20 threads that count to 1 million...\n";
for (int i=20; i>=0; --i) {
threads.push_back(std::thread(count1m,i));
}
for (auto& th : threads) th.join();
std::cout << "a1 = " << a1 << std::endl;
}
I am not sure if both these atomic variables will be updated together or not.
Not.
Atomic means indivisible, in that writes to an atomic can't be read half-done, in an intermediate or incomplete state.
However, updates to one atomic aren't somehow batched with updates to another atomic. How could the compiler tell which updates were supposed to be batched like this?
If you have two atomic variables, you have two independent objects neither of which can individually be observed to have a part-written state. You can still read them both and see a state where another thread has updated one but not the other, even if the stores are adjacent in the code.
Possibilities are:
Just use a mutex.
You ruled this out in a comment, but I'm going to mention it for completeness and because it's by far the easiest way.
Pack both objects into a single atomic.
Note that a 128-bit object (large enough for two binary64 doubles) may have to use a mutex or similar synchronization primitive internally, if your platform doesn't have native 128-bit atomics. You can check with std::atomic<DoublePair>::is_lock_free() to find out (for a suitable struct DoublePair containing a pair of doubles).
Whether a non-lock-free atomic is acceptable under your mutex prohibition I cannot guess.
Concoct an elaborate lock-free synchronization protocol, such as:
storing the index into a circular array of DoublePair objects and atomically updating that (there are various schemes for this with multiple producers, but single producer is definitely simpler - and don't forget A-B-A protection)
using a raw futex, or a semaphore, or some other technically-not-a-mutex synchronization primitive that already exists
using atomics to write a spinlock (again not technically a mutex, but again I can't guess whether it's actually suitable for you)
The main issue is that you've said you're not allowed to use a mutex, but haven't said why. Does the code have to be lock-free? Wait-free? Does someone just really hate std::mutex but will accept any other synchronization primitive?
There are basically two ways to do this and they are different.
The first way is to create an atomic struct so that both values are updated at once. Note that this approach still has a race: another thread may change aip between the comparison of local with aip.a1 and the store, so the check and the update are not one atomic step.
struct IntPair {
int a1;
int a2;
};
std::atomic<IntPair> aip{IntPair{0,0}}; // direct-initialization: std::atomic is not copyable, so "= IntPair{0,0}" fails before C++17
void count1m (int id) {
double local = id;
double local2 = id*3;
if(local > aip.load().a1) {
aip = IntPair{int(local),int(local2)};
}
};
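One possible way to close the race noted for the atomic-struct approach is a compare-and-swap loop, so that the comparison and the store succeed or fail together. This is a sketch under assumptions, not the answerer's code: update_if_greater and run_updates are hypothetical names, and it assumes an 8-byte IntPair without padding so that the bitwise comparison done by compare_exchange_weak is meaningful.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct IntPair {
    int a1;
    int a2;
};

std::atomic<IntPair> aip{IntPair{0, 0}};

// Hypothetical helper: stores {id, id*3} only if id is greater than the
// a1 currently held, with the check and the store done as one atomic step.
void update_if_greater(int id) {
    IntPair desired{id, id * 3};
    IntPair expected = aip.load();
    // If another thread changed aip in the meantime, compare_exchange_weak
    // fails, reloads the current value into expected, and we re-check.
    while (desired.a1 > expected.a1 &&
           !aip.compare_exchange_weak(expected, desired)) {
        // retry with the refreshed expected value
    }
}

// Starts threads with ids n, n-1, ..., 0 and returns the final pair.
IntPair run_updates(int n) {
    std::vector<std::thread> threads;
    for (int i = n; i >= 0; --i)
        threads.emplace_back(update_if_greater, i);
    for (auto& th : threads)
        th.join();
    return aip.load();
}
```

Readers that call aip.load() always get a pair written by a single thread. Whether the atomic is lock-free depends on the platform.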
The second approach is to use a mutex to synchronize the entire section, like below. This will guarantee that no race condition occurs and everything is done atomically. We used a std::lock_guard for better safety rather than calling m.lock() and m.unlock() manually.
IntPair ip{0,0};
std::mutex m;
void count1m (int id) {
double local = id;
double local2 = id*3;
std::lock_guard<std::mutex> g(m);
if(local > ip.a1) {
ip = IntPair{int(local),int(local2)};
}
};

Difference between shared mutex and mutex (why do both exist in C++ 11)?

I haven't found an example online that demonstrates this vividly. I saw an example at http://en.cppreference.com/w/cpp/header/shared_mutex but it is still unclear. Can somebody help?
By use of normal mutexes, you can guarantee exclusive access to some kind of critical resource – and nothing else. Shared mutexes extend this feature by allowing two levels of access: shared and exclusive as follows:
Exclusive access prevents any other thread from acquiring the mutex, just as with the normal mutex. It does not matter if the other thread tries to acquire shared or exclusive access.
Shared access allows multiple threads to acquire the mutex, but all of them only in shared mode. Exclusive access is not granted until all of the previous shared holders have returned the mutex (typically, as long as an exclusive request is waiting, new shared ones are queued to be granted after the exclusive access).
A typical scenario is a database: It does not matter if several threads read one and the same data simultaneously. But modification of the database is critical - if some thread reads data while another one is writing it might receive inconsistent data. So all reads must have finished before writing is allowed and new reading must wait until writing has finished. After writing, further reads can occur simultaneously again.
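The two access levels can be probed directly. The following is a minimal sketch assuming a C++17 compiler (for std::shared_mutex); AccessResult and probe_while_shared_held are names made up for illustration. While one thread holds the mutex in shared mode, a second thread can also take it in shared mode, but an attempt to take it exclusively is refused:

```cpp
#include <cassert>
#include <shared_mutex>
#include <thread>

// Hypothetical result type for the probe below.
struct AccessResult {
    bool second_reader_got_in;
    bool writer_got_in;
};

AccessResult probe_while_shared_held() {
    std::shared_mutex sm;
    AccessResult r{false, true};

    sm.lock_shared();  // the calling thread is reader #1

    std::thread other([&] {
        // Reader #2: shared access is granted while reader #1 still holds it.
        sm.lock_shared();
        r.second_reader_got_in = true;
        sm.unlock_shared();

        // Writer: exclusive access cannot be granted while reader #1
        // holds the mutex, so try_lock() reports failure.
        r.writer_got_in = sm.try_lock();
        if (r.writer_got_in)
            sm.unlock();
    });
    other.join();  // join makes the writes to r visible here

    sm.unlock_shared();
    return r;
}
```

The probe runs in a separate thread because a thread that already owns the mutex must not try to lock it again.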
Edit: Sidenote:
Why do readers need a lock?
This is to prevent the writer from acquiring the lock while reading is still in progress. Additionally, it prevents new readers from acquiring the lock while it is still held exclusively.
A shared mutex has two levels of access 'shared' and 'exclusive'.
Multiple threads can acquire shared access but only one can hold 'exclusive' access (that includes there being no shared access).
The common scenario is a read/write lock. Recall that a Data Race can only occur when two threads access the same data at least one of which is a write.
The advantage of that is data may be read by many readers but when a writer needs access they must obtain exclusive access to the data.
Why have both? On the one hand, the exclusive lock constitutes a normal mutex, so arguably only the shared mutex is needed. But there may be overheads in a shared lock implementation that can be avoided by using the less featured type.
Here's an example (adapted slightly from the example here http://en.cppreference.com/w/cpp/thread/shared_mutex).
#include <chrono>
#include <iostream>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>
std::mutex cout_mutex; // Not really part of the example...
void log(const std::string& msg){
std::lock_guard guard(cout_mutex);
std::cout << msg << std::endl;
}
class ThreadSafeCounter {
public:
ThreadSafeCounter() = default;
// Multiple threads/readers can read the counter's value at the same time.
unsigned int get() const {
std::shared_lock lock(mutex_);//NB: std::shared_lock will shared_lock() the mutex.
log("get()-begin");
std::this_thread::sleep_for(std::chrono::milliseconds(500));
auto result=value_;
log("get()-end");
return result;
}
// Only one thread/writer can increment/write the counter's value.
void increment() {
std::unique_lock lock(mutex_);
value_++;
}
// Only one thread/writer can reset/write the counter's value.
void reset() {
std::unique_lock lock(mutex_);
value_ = 0;
}
private:
mutable std::shared_mutex mutex_;
unsigned int value_ = 0;
};
int main() {
ThreadSafeCounter counter;
auto increment_and_print = [&counter]() {
for (int i = 0; i < 3; i++) {
counter.increment();
auto ctr=counter.get();
{
std::lock_guard guard(cout_mutex);
std::cout << std::this_thread::get_id() << ' ' << ctr << '\n';
}
}
};
std::thread thread1(increment_and_print);
std::thread thread2(increment_and_print);
std::thread thread3(increment_and_print);
thread1.join();
thread2.join();
thread3.join();
}
Possible partial output:
get()-begin
get()-begin
get()-end
140361363867392 2
get()-end
140361372260096 2
get()-begin
get()-end
140361355474688 3
//Etc...
Notice how the two consecutive get()-begin lines show that two threads are holding the shared lock at the same time during the read.
"Shared mutexes are usually used in situations when multiple readers can access the same resource at the same time without causing data races, but only one writer can do so."
cppreference.com
This is useful when you need read/writer lock: https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock

std::lock_guard example, explanation on why it works

I've reached a point in my project that requires communication between threads on resources that very well may be written to, so synchronization is a must. However I don't really understand synchronization at anything other than the basic level.
Consider the last example in this link: http://www.bogotobogo.com/cplusplus/C11/7_C11_Thread_Sharing_Memory.php
#include <iostream>
#include <thread>
#include <list>
#include <algorithm>
#include <mutex>
using namespace std;
// a global variable
std::list<int>myList;
// a global instance of std::mutex to protect global variable
std::mutex myMutex;
void addToList(int max, int interval)
{
// the access to this function is mutually exclusive
std::lock_guard<std::mutex> guard(myMutex);
for (int i = 0; i < max; i++) {
if( (i % interval) == 0) myList.push_back(i);
}
}
void printList()
{
// the access to this function is mutually exclusive
std::lock_guard<std::mutex> guard(myMutex);
for (auto itr = myList.begin(), end_itr = myList.end(); itr != end_itr; ++itr ) {
cout << *itr << ",";
}
}
int main()
{
int max = 100;
std::thread t1(addToList, max, 1);
std::thread t2(addToList, max, 10);
std::thread t3(printList);
t1.join();
t2.join();
t3.join();
return 0;
}
The example demonstrates how three threads, two writers and one reader, access a common resource (the list).
Two global functions are used: one which is used by the two writer threads, and one being used by the reader thread. Both functions use a lock_guard to lock down the same resource, the list.
Now here is what I just can't wrap my head around: The reader uses a lock in a different scope than the two writer threads, yet still locks down the same resource. How can this work? My limited understanding of mutexes lends itself well to the writer function, there you got two threads using the exact same function. I can understand that, a check is made right as you are about to enter the protected area, and if someone else is already inside, you wait.
But when the scope is different? This would indicate that there is some sort of mechanism more powerful than the process itself, some sort of runtime environment blocking execution of the "late" thread. But I thought there were no such things in c++. So I am at a loss.
What exactly goes on under the hood here?
Let’s have a look at the relevant line:
std::lock_guard<std::mutex> guard(myMutex);
Notice that the lock_guard references the global mutex myMutex. That is, the same mutex for all three threads. What lock_guard does is essentially this:
Upon construction, it locks myMutex and keeps a reference to it.
Upon destruction (i.e. when the guard's scope is left), it unlocks myMutex.
The mutex is always the same one, it has nothing to do with the scope. The point of lock_guard is just to make locking and unlocking the mutex easier for you. For example, if you manually lock/unlock, but your function throws an exception somewhere in the middle, it will never reach the unlock statement. So, doing it the manual way you have to make sure that the mutex is always unlocked. On the other hand, the lock_guard object gets destroyed automatically whenever the function is exited – regardless how it is exited.
myMutex is global, which is what is used to protect myList. guard(myMutex) simply engages the lock and the exit from the block causes its destruction, dis-engaging the lock. guard is just a convenient way to engage and dis-engage the lock.
With that out of the way, mutex does not protect any data. It just provides a way to protect data. It is the design pattern that protects data. So if I write my own function to modify the list as below, the mutex cannot protect it.
void addToListUnsafe(int max, int interval)
{
for (int i = 0; i < max; i++) {
if( (i % interval) == 0) myList.push_back(i);
}
}
The lock only works if all pieces of code that need to access the data engage the lock before accessing and disengage after they are done. This design-pattern of engaging and dis-engaging the lock before and after every access is what protects the data (myList in your case)
Now you might wonder why you should use a mutex at all, and why not, say, a bool. You can, but you will have to make sure that the bool variable exhibits certain characteristics, including but not limited to the list below.
Its value must not be cached per thread; every thread has to see the latest write. (In C++, volatile alone does not guarantee this between threads; std::atomic does.)
Reads and writes must be atomic operations.
The lock must handle the situation where there are multiple execution pipelines (logical cores, etc.).
There are different synchronization mechanisms that provide "better locking" (across processes versus across threads, multiple processor versus, single processor, etc) at a cost of "slower performance", so you should always choose a locking mechanism which is just about enough for your situation.
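To make the "why not a bool" point concrete, here is a sketch of a flag-based lock built on std::atomic_flag, which provides the atomicity and visibility that a plain bool lacks. SpinLock and count_with_spinlock are hypothetical names, and a busy-waiting lock like this is only an illustration, not a recommendation over std::mutex:

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical spinlock: the flag plays the role of the "bool".
class SpinLock {
public:
    void lock() {
        // test_and_set atomically sets the flag and returns its old value;
        // keep spinning while another thread already had it set.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // busy-wait
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Each thread increments a plain (non-atomic) counter under the spinlock.
long count_with_spinlock(int threads, int iterations) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                // SpinLock has lock()/unlock(), so std::lock_guard works with it.
                std::lock_guard<SpinLock> guard(lock);
                ++counter;
            }
        });
    }
    for (auto& th : pool)
        th.join();
    return counter;
}
```

As with the mutex, the protection only holds because every access to counter goes through the same lock.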
Just to add onto what others here have said...
There is an idea in C++ called Resource Acquisition Is Initialization (RAII) which is this idea of binding resources to the lifetime of objects:
Resource Acquisition Is Initialization or RAII, is a C++ programming technique which binds the life cycle of a resource that must be acquired before use (allocated heap memory, thread of execution, open socket, open file, locked mutex, disk space, database connection—anything that exists in limited supply) to the lifetime of an object.
C++ RAII Info
The use of a std::lock_guard<std::mutex> class follows the RAII idea.
Why is this useful?
Consider a case where you don't use a std::lock_guard:
std::mutex m; // global mutex
void oops() {
m.lock();
doSomething();
m.unlock();
}
In this case, a global mutex is used and is locked before the call to doSomething(). Then once doSomething() is complete the mutex is unlocked.
One problem here is what happens if there is an exception? Now you run the risk of never reaching the m.unlock() line which releases the mutex to other threads.
So you need to cover the case where you run into an exception:
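std::mutex m; // global mutex
void oops() {
m.lock(); // lock outside the try block: if lock() itself throws, there is nothing to unlock
try {
doSomething();
} catch(...) {
m.unlock(); // now exception path is covered
throw; // rethrow so the caller still sees the exception
}
m.unlock(); // normal path
}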
This works but is ugly, verbose, and inconvenient.
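Now let's write our own simple lock guard.
class lock_guard {
private:
std::mutex& m;
public:
explicit lock_guard(std::mutex& m_) : m(m_) { m.lock(); } // lock on construction
~lock_guard() { m.unlock(); } // unlock on destruction
lock_guard(const lock_guard&) = delete; // a copy would unlock the mutex twice
lock_guard& operator=(const lock_guard&) = delete;
};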
When the lock_guard object is destroyed, it will ensure that the mutex is unlocked.
Now we can use this lock_guard to handle the case from before in a better/cleaner way:
std::mutex m; // global mutex
void ok() {
lock_guard lk(m); // our simple lock guard, protects against exception case
doSomething();
} // when scope is exited our lock guard object is destroyed and the mutex unlocked
This is the same idea behind std::lock_guard.
Again this approach is used with many different types of resources which you can read more about by following the link on RAII.
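The same behavior can be checked with the standard std::lock_guard. In this sketch doSomething is a hypothetical stand-in that always throws, and mutex_released_after_exception is a made-up helper; if the guard had not released the mutex during stack unwinding, the later m.lock() could not succeed:

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

std::mutex m;  // global mutex

// Hypothetical stand-in for real work: always fails.
void doSomething() {
    throw std::runtime_error("doSomething failed");
}

void ok() {
    std::lock_guard<std::mutex> lk(m);  // locks m
    doSomething();                      // throws; lk is destroyed during unwinding
}

// Returns true if the exception reached the caller. Reaching the return
// statement at all shows that the mutex was released on the exception path.
bool mutex_released_after_exception() {
    bool caught = false;
    try {
        ok();
    } catch (const std::runtime_error&) {
        caught = true;
    }
    m.lock();    // would not succeed if ok() had left m locked
    m.unlock();
    return caught;
}
```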
This is precisely what a lock does. When a thread takes the lock, regardless of where in the code it does so, it must wait its turn if another thread holds the lock. When a thread releases a lock, regardless of where in the code it does so, another thread may acquire that lock.
Locks protect data, not code. They do it by ensuring all code that accesses the protected data does so while it holds the lock, excluding other threads from any code that might access that same data.

Can the effect of relaxed memory order extend beyond the life of the performing thread?

Let's say inside a C++11 program, we have a main thread named A that launches an asynchronous thread named B. Inside thread B, we perform an atomic store on an atomic variable with std::memory_order_relaxed memory order. Then thread A joins with thread B. Then thread A launches another thread named C that performs an atomic load operation with std::memory_order_relaxed memory order. Is it possible that the content loaded by thread C is different from the content written by thread B? In other words, does relaxed memory consistency here extend even beyond the life of a thread?
To try this, I wrote a simple program and ran it with many tries. The program does not report a mismatch. I'm thinking since thread A imposes an order in launch of threads, mismatch cannot happen. However, I'm not sure of it.
#include <atomic>
#include <iostream>
#include <future>
int main() {
static const int nTests = 100000;
std::atomic<int> myAtomic( 0 );
auto storeFunc = [&]( int inNum ){
myAtomic.store( inNum, std::memory_order_relaxed );
};
auto loadFunc = [&]() {
return myAtomic.load( std::memory_order_relaxed );
};
for( int ttt = 1; ttt <= nTests; ++ttt ) {
auto writingThread = std::async( std::launch::async, storeFunc, ttt );
writingThread.get();
auto readingThread = std::async( std::launch::async, loadFunc );
auto readVal = readingThread.get();
if( readVal != ttt ) {
std::cout << "mismatch!\t" << ttt << "\t!=\t" << readVal << "\n";
return 1;
}
}
std::cout << "done.\n";
return 0;
}
Before portable threading platforms generally offered you the ability to specify memory visibility or place explicit memory barriers, portable synchronization was accomplished exclusively with explicit synchronization (things like mutexes) and implicit synchronization.
Generally, before a thread is created, some data structures are set up that the thread will access when it starts up. To avoid having to use a mutex just to implement this common pattern, thread creation was defined as an implicitly synchronizing event. It's equally common to join a thread and then look at some results it computed. Again, to avoid having to use a mutex just to implement this common pattern, joining a thread is defined as an implicitly synchronizing event.
Since thread creation and joining are defined as synchronizing operations, joining a thread necessarily happens after that thread terminates. Thus you will see anything that necessarily happened before the thread terminated. The same is true of code that changes some variables and then creates a thread: the new thread necessarily sees all the changes that happened before it was created. Synchronization on thread creation or termination is just like synchronization on a mutex. Synchronizing operations create these kinds of ordering relationships that ensure memory visibility.
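A small sketch of that implicit synchronization, using std::thread and deliberately plain (non-atomic) variables; compute_in_thread is a hypothetical name. The write to input before the thread starts is visible inside the thread, and the write to result inside the thread is visible after join(), with no mutex or atomic involved:

```cpp
#include <cassert>
#include <thread>

int compute_in_thread() {
    int input = 21;   // written before the thread is created
    int result = 0;   // plain int: no atomic, no mutex

    // Thread creation synchronizes with the start of the thread function,
    // so the worker sees input == 21.
    std::thread worker([&] { result = input * 2; });

    // Completion of the thread synchronizes with join() returning,
    // so the write to result is visible from here on.
    worker.join();
    return result;
}
```

The same reasoning covers the relaxed atomic store in the question: the ordering comes from the join and the subsequent thread creation, not from the memory order of the atomic operations.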
As SergeyA mentioned, you should definitely never try to prove something in the multithreaded world by testing. Certainly if a test fails, that proves you can't rely on the thing you tested. But even if a test succeeds every way you can think of to test it, that doesn't mean it won't fail on some platform, CPU, or library that you didn't test. You can never prove something like this is reliable by that kind of testing.
If you want to test something like this, there are model checkers you can use to explore all possible executions (subject to some esoteric limitations) for a test case.
See http://plrg.eecs.uci.edu/c11modelchecker.html

Boost Mutex Scoped Lock

I was reading through a Boost Mutex tutorial on drdobbs.com, and found this piece of code:
#include <boost/thread/thread.hpp>
#include <boost/thread/mutex.hpp>
#include <boost/bind.hpp>
#include <iostream>
boost::mutex io_mutex;
void count(int id)
{
for (int i = 0; i < 10; ++i)
{
boost::mutex::scoped_lock
lock(io_mutex);
std::cout << id << ": " <<
i << std::endl;
}
}
int main(int argc, char* argv[])
{
boost::thread thrd1(
boost::bind(&count, 1));
boost::thread thrd2(
boost::bind(&count, 2));
thrd1.join();
thrd2.join();
return 0;
}
Now I understand the point of a Mutex is to prevent two threads from accessing the same resource at the same time, but I don't see the correlation between io_mutex and std::cout. Does this code just lock everything within the scope until the scope is finished?
Now I understand the point of a Mutex is to prevent two threads from accessing the same resource at the same time, but I don't see the correlation between io_mutex and std::cout.
std::cout is a global object, so you can see that as a shared resource. If you access it concurrently from several threads, those accesses must be synchronized somehow, to avoid data races and undefined behavior.
Perhaps it will be easier for you to notice that concurrent access occurs by considering that:
std::cout << x
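Is roughly equivalent to (for overloads that are free functions; for built-in types such as int it is the member call std::cout.operator<<(x)):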
::operator << (std::cout, x)
Which means you are calling a function that operates on the std::cout object, and you are doing so from different threads at the same time. std::cout must be protected somehow. But that's not the only reason why the scoped_lock is there (keep reading).
Does this code just lock everything within the scope until the scope is finished?
Yes, it locks io_mutex until the lock object itself goes out of scope (being a typical RAII wrapper), which happens at the end of each iteration of your for loop.
Why is it needed? Well, although in C++11 individual insertions into cout are guaranteed to be thread-safe, subsequent, separate insertions may be interleaved when several threads are outputting something.
Keep in mind that each insertion through operator << is a separate function call, as if you were doing:
std::cout << id;
std::cout << ": ";
std::cout << i;
std::cout << std::endl;
The fact that operator << returns the stream object allows you to chain the above function calls in a single expression (as you have done in your program), but the fact that you are having several separate function calls still holds.
Now looking at the above snippet, it is more evident that the purpose of this scoped lock is to make sure that each message of the form:
<id> ": " <index> <endl>
Gets printed without its parts being interleaved with parts from other messages.
Also, in C++03 (where insertions into cout are not guaranteed to be thread-safe), the lock will protect the cout object itself from being accessed concurrently.
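The effect of the lock can be checked by writing to a stream whose contents can be inspected afterwards. This sketch swaps std::cout for a shared std::ostringstream (an assumption made purely so the output can be examined) and uses std::thread and std::lock_guard in place of the Boost types; count_intact_lines is a hypothetical helper:

```cpp
#include <cassert>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>

std::mutex io_mutex;
std::ostringstream sink;  // stands in for std::cout so the result can be inspected

void count(int id) {
    for (int i = 0; i < 10; ++i) {
        // The lock covers all four insertions, so a message cannot be split.
        // Unlike std::cout, an ostringstream is not safe for concurrent use,
        // so here the lock also protects the stream object itself.
        std::lock_guard<std::mutex> lock(io_mutex);
        sink << id << ": " << i << '\n';
    }
}

// Runs two writers and returns how many lines have the intact form "<id>: <digit>".
int count_intact_lines() {
    std::thread thrd1(count, 1);
    std::thread thrd2(count, 2);
    thrd1.join();
    thrd2.join();

    std::istringstream in(sink.str());
    std::string line;
    int intact = 0;
    while (std::getline(in, line)) {
        bool ok = line.size() == 4 &&
                  (line[0] == '1' || line[0] == '2') &&
                  line[1] == ':' && line[2] == ' ' &&
                  line[3] >= '0' && line[3] <= '9';
        if (ok)
            ++intact;
    }
    return intact;
}
```

The order in which the two threads' lines appear is not fixed; only the integrity of each line is.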
A mutex has nothing to do with anything else in the program
(except a condition variable), at least at a higher level.
A mutex has two effects: it controls program flow, and prevents
multiple threads from executing the same block of code
simultaneously. It also ensures memory synchronization. The
important issue here is that mutexes aren't associated with
resources, and don't prevent two threads from accessing the same
resource at the same time. A mutex defines a critical section
of code, which can only be entered by one thread at a time. If
all of the use of a particular resource is done in critical
sections controlled by the same mutex, then the resource is
effectively protected by the mutex. But the relationship is
established by the coder, by ensuring that all use does take
place in the critical sections.