I am reading C++ Concurrency in Action.
It says that when you use std::execution::par, you can use a mutex per internal element, like below.
#include <algorithm>
#include <execution>
#include <mutex>
#include <vector>

class X {
    mutable std::mutex m;
    int data;
public:
    X() : data(0) {}
    int get_value() const {
        std::lock_guard guard(m);
        return data;
    }
    void increment() {
        std::lock_guard guard(m);
        ++data;
    }
};

void increment_all(std::vector<X>& v) {
    std::for_each(std::execution::par, v.begin(), v.end(),
                  [](X& x) { x.increment(); });
}
But it says that when you use std::execution::par_unseq, you have to replace this per-element mutex with a whole-container mutex, like below:
#include <algorithm>
#include <execution>
#include <mutex>
#include <vector>

class Y {
    int data;
public:
    Y() : data(0) {}
    int get_value() const { return data; }
    void increment() { ++data; }
};

class ProtectedY {
    std::mutex m;
    std::vector<Y> v;
public:
    void lock() { m.lock(); }
    void unlock() { m.unlock(); }
    std::vector<Y>& get_vec() { return v; }
};

void increment_all(ProtectedY& data) {
    std::lock_guard<ProtectedY> guard(data);
    auto& v = data.get_vec();
    std::for_each(std::execution::par_unseq, v.begin(), v.end(),
                  [](Y& y) { y.increment(); });
}
But even if you use the second version, the y.increment() calls inside the parallel algorithm still look like they have a data race, because there is no lock among the parallel algorithm's threads.
How can this second version with std::execution::par_unseq be thread-safe?
It is only thread-safe because you do not access shared data in the parallel algorithm.
The only thing being executed in parallel are the calls to y.increment(). These can happen in any order, on any thread and be arbitrarily interleaved with each other, even within a single thread. But y.increment() only accesses private data of y, and each y is distinct from all the other vector elements. So there is no opportunity for data races here, because there is no "overlap" between the individual elements.
A different example would be if the increment function also accesses some global state that is being shared between all the different elements of the vector. In that case, there is now a potential for a data race, so access to that shared global state needs to be synchronized. But because of the specific requirements of the parallel unsequenced policy, you can't just use a mutex for synchronizing here.
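For instance (a minimal sketch of one commonly suggested approach, not from the book or the original answer), shared state such as a global counter can be maintained with a relaxed std::atomic, since relaxed atomic operations do not establish synchronizes-with relationships and are therefore generally considered safe under the parallel-unsequenced policy:
#include <algorithm>
#include <atomic>
#include <execution>
#include <vector>

// Hypothetical shared state: counts the total number of increments
// performed across all elements.
std::atomic<long> total_increments{0};

void increment_and_count(std::vector<int>& v)
{
    std::for_each(std::execution::par_unseq, v.begin(), v.end(),
                  [](int& n) {
                      ++n;
                      // Relaxed ordering: we only need an eventual total,
                      // not ordering relative to other memory accesses.
                      total_increments.fetch_add(1, std::memory_order_relaxed);
                  });
}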
Note that a mutex used in the context of parallel algorithms may protect against different hazards. One use is to synchronize among the different threads executing the for_each; this works for the parallel execution policy, but not for parallel-unsequenced. That is not the use case in your examples: no data is shared between elements, so no such synchronization is needed. Instead, in your examples the mutex only synchronizes the invocation of the for_each against any other threads that might be running as part of a larger application; there is no synchronization within the for_each itself. This is a valid use case for both parallel and parallel-unsequenced, but in the latter case it cannot be achieved with per-element mutexes.
I have the following example:
#include <chrono>
#include <functional>
#include <mutex>
#include <set>
#include <thread>

template <typename T>
class container
{
public:
    std::mutex _lock;
    std::set<T> _elements;

    void add(T element)
    {
        _elements.insert(element);
    }

    void remove(T element)
    {
        _elements.erase(element);
    }
};

void exchange(container<int>& cont1, container<int>& cont2, int value)
{
    cont1._lock.lock();
    std::this_thread::sleep_for(std::chrono::seconds(1));
    cont2._lock.lock();

    cont1.remove(value);
    cont2.add(value);

    cont1._lock.unlock();
    cont2._lock.unlock();
}

int main()
{
    container<int> cont1, cont2;

    cont1.add(1);
    cont2.add(2);

    std::thread t1(exchange, std::ref(cont1), std::ref(cont2), 1);
    std::thread t2(exchange, std::ref(cont2), std::ref(cont1), 2);

    t1.join();
    t2.join();

    return 0;
}
In this case I'm experiencing a deadlock. But when I use std::lock_guard instead of manually locking and unlocking the mutexes, I have no deadlock. Why?
void exchange(container<int>& cont1, container<int>& cont2, int value)
{
    std::lock_guard<std::mutex>(cont1._lock);
    std::this_thread::sleep_for(std::chrono::seconds(1));
    std::lock_guard<std::mutex>(cont2._lock);

    cont1.remove(value);
    cont2.add(value);
}
Your two code snippets are not comparable. The second snippet locks and immediately unlocks each mutex as the temporary lock_guard object is destroyed at the semicolon:
std::lock_guard<std::mutex>(cont1._lock); // temporary object
The correct way to use lock guards is to make scoped variables of them:
{
    std::lock_guard<std::mutex> lock(my_mutex);
    // critical section here
} // end of critical section, "lock" is destroyed, calling mutex.unlock()
(Note that there is another common error that's similar but different:
std::mutex mu;
// ...
std::lock_guard<std::mutex>(mu);
This declares a variable named mu (just like int(n); declares an int named n). However, this code is ill-formed because std::lock_guard does not have a default constructor. It would compile with, say, std::unique_lock, which does have one, and then it would not end up locking anything.)
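For illustration (my sketch, not from the original answer), here is what that unique_lock variant actually does:
#include <mutex>

std::mutex mu;

void g()
{
    std::unique_lock<std::mutex>(mu); // compiles: declares a default-constructed
                                      // unique_lock named "mu", shadowing the
                                      // global mutex; nothing is ever locked
}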
Now to address the real problem: how do you lock multiple mutexes together safely? Agreeing on a single lock order may not be feasible across an entire codebase, or even a future user's codebase, and even locally it is easy to get wrong, as your example shows. In such cases, use the std::lock algorithm:
#include <mutex>

std::mutex mu1;
std::mutex mu2;

void f()
{
    std::lock(mu1, mu2); // locks both mutexes without risk of deadlock
    // order below does not matter
    std::lock_guard<std::mutex> lock1(mu1, std::adopt_lock);
    std::lock_guard<std::mutex> lock2(mu2, std::adopt_lock);
}
In C++17 there is a new variadic lock guard template called scoped_lock:
void f_17()
{
    std::scoped_lock lock(mu1, mu2);
    // ...
}
The constructor of scoped_lock uses the same algorithm as std::lock, so the two can be used compatibly.
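For completeness (a sketch, not part of the original answer, reusing the mu1 and mu2 from the snippets above), std::lock also works directly on deferred std::unique_locks, which keeps the RAII objects movable and manually unlockable:
void f_deferred()
{
    std::unique_lock<std::mutex> lock1(mu1, std::defer_lock);
    std::unique_lock<std::mutex> lock2(mu2, std::defer_lock);
    std::lock(lock1, lock2); // locks both without risk of deadlock
}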
While Kerrek SB's answer is entirely valid, I thought I'd throw an alternative hat into the ring: std::lock, or any try-and-retreat deadlock-avoidance strategy, should be seen as a last resort from a performance perspective.
How about:
#include <chrono>
#include <functional> // includes the std::less<T> template
#include <mutex>
#include <thread>

static const std::less<void*> l; // comparison object. See note.

void exchange(container<int>& cont1, container<int>& cont2, int value)
{
    if (&cont1 == &cont2) {
        return; // aliasing protection
    }

    std::unique_lock<std::mutex> lock1(cont1._lock, std::defer_lock);
    std::unique_lock<std::mutex> lock2(cont2._lock, std::defer_lock);

    if (l(&cont1, &cont2)) { // in effect a portable &cont1 < &cont2
        lock1.lock();
        std::this_thread::sleep_for(std::chrono::seconds(1));
        lock2.lock();
    } else {
        lock2.lock();
        std::this_thread::sleep_for(std::chrono::seconds(1));
        lock1.lock();
    }

    cont1.remove(value);
    cont2.add(value);
}
This code uses the memory address of the objects to determine an arbitrary but consistent lock order. This approach can (of course) be generalized.
Note also that in reusable code the aliasing protection is necessary, because the version where cont1 is cont2 would otherwise try to lock the same mutex twice. std::mutex cannot be assumed to be a recursive lock, and normally isn't.
NB: The use of std::less<void*> ensures compliance, as it guarantees a consistent total ordering of addresses. Technically, (&cont1 < &cont2) on unrelated objects is unspecified behavior. Thanks Kerrek SB!
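To make the generalization concrete (a hypothetical helper, not part of the original answer; the name lock_in_address_order is mine), the address-ordering idea can be factored out:
#include <cassert>
#include <functional>
#include <mutex>

// Locks two distinct mutexes in a consistent address-based order.
inline void lock_in_address_order(std::mutex& a, std::mutex& b)
{
    assert(&a != &b); // the caller must handle aliasing separately
    if (std::less<void*>{}(&a, &b)) {
        a.lock();
        b.lock();
    } else {
        b.lock();
        a.lock();
    }
}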
I am reading Item 16 in Scott Meyers's Effective Modern C++.
In the later part of the item, he says:
For a single variable or memory location requiring synchronization, use of a std::atomic is adequate, but once you get to two or more variables or memory locations that require manipulation as a unit, you should reach for a mutex.
But I still don't see why it is adequate in the case of a single variable or memory location. Take the polynomial example in this item:
#include <atomic>
#include <vector>

class Polynomial {
public:
    using RootsType = std::vector<double>;

    RootsType roots() const
    {
        if (!rootsAreValid) { // if cache not valid
            .... // **very expensive computation**, computing roots,
                 // store them in rootVals
            rootsAreValid = true;
        }
        return rootVals;
    }

private:
    mutable std::atomic<bool> rootsAreValid{ false };
    mutable RootsType rootVals{};
};
My question is:
If thread 1 is in the middle of the heavy computation of rootVals, before rootsAreValid gets assigned true, and thread 2 also calls roots() and evaluates rootsAreValid to false, then thread 2 will also step into the heavy computation of rootVals. So for this case, how is an atomic bool adequate? I still think a std::lock_guard<std::mutex> is needed to protect the entry to the rootVals computation.
In your example there are two variables being synchronized: rootVals and rootsAreValid. That particular item is referring to the case where only the atomic value requires synchronization. For example:
#include <atomic>

class foo
{
public:
    void work()
    {
        ++times_called;
        /* multiple threads call this to do work */
    }

private:
    // Counts the number of times work() was called
    std::atomic<int> times_called{0};
};
times_called is the only variable in this case.
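Conversely (a sketch with hypothetical names, to illustrate the second half of Meyers's advice), once two values must change together, separate atomics are no longer enough and a mutex is called for:
#include <mutex>

class stats
{
public:
    void record(double value)
    {
        std::lock_guard<std::mutex> guard(m);
        // sum and count must change as a unit; two separate
        // std::atomic members could be observed mid-update.
        sum += value;
        ++count;
    }

private:
    std::mutex m;
    double sum = 0.0;
    long count = 0;
};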
I suggest avoiding the unnecessary heavy computation by using the following code:
#include <atomic>
#include <mutex>
#include <vector>

class Polynomial {
public:
    using RootsType = std::vector<double>;

    RootsType roots() const
    {
        if (!rootsAreValid) { // acquiring a mutex is usually not cheap, so we check the state without locking
            std::lock_guard<std::mutex> lock_guard(sync);
            if (!rootsAreValid) // the state could have changed, because the mutex was not owned at the first check
            {
                .... // **very expensive computation**, computing roots,
                     // store them in rootVals
                rootsAreValid = true;
            }
        }
        return rootVals;
    }

private:
    mutable std::mutex sync;
    mutable std::atomic<bool> rootsAreValid{ false };
    mutable RootsType rootVals{};
};
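As a side note (a sketch, not from the item; the member name computed is mine), the standard library also offers std::call_once for exactly this compute-exactly-once pattern, avoiding hand-rolled double-checked locking:
#include <mutex>
#include <vector>

class Polynomial {
public:
    using RootsType = std::vector<double>;

    RootsType roots() const
    {
        // The callable runs exactly once, even under concurrent calls;
        // later callers block until it has finished.
        std::call_once(computed, [this] {
            // ... very expensive computation, storing into rootVals ...
        });
        return rootVals;
    }

private:
    mutable std::once_flag computed;
    mutable RootsType rootVals{};
};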
I have a question regarding thread safety and mutexes. I have two functions that may not be executed at the same time because this could cause problems:
#include <mutex>

std::mutex mutex;

void A() {
    std::lock_guard<std::mutex> lock(mutex);
    // do something (shouldn't be done while function B is executing)
}

T B() {
    std::lock_guard<std::mutex> lock(mutex);
    // do something (shouldn't be done while function A is executing)
    return something;
}
Now the thing is that functions A and B should not be executed at the same time; that's why I use the mutex. However, it is perfectly fine if function B is called simultaneously from multiple threads, yet this is also prevented by the mutex (and I don't want that). Is there a way to ensure that A and B are not executed at the same time, while still letting function B be executed multiple times in parallel?
If C++14 is an option, you could use a shared mutex (sometimes called "reader-writer" mutex). Basically, inside function A() you would acquire a unique (exclusive, "writer") lock, while inside function B() you would acquire a shared (non-exclusive, "reader") lock.
As long as a shared lock exists, the mutex cannot be acquired exclusively by other threads (but it can still be acquired non-exclusively); as long as an exclusive lock exists, the mutex cannot be acquired by any other thread at all.
The result is that you can have several threads concurrently executing function B(), while the execution of function A() prevents concurrent executions of both A() and B() by other threads:
#include <shared_mutex>

std::shared_timed_mutex mutex;

void A() {
    std::unique_lock<std::shared_timed_mutex> lock(mutex);
    // do something (shouldn't be done while function B is executing)
}

T B() {
    std::shared_lock<std::shared_timed_mutex> lock(mutex);
    // do something (shouldn't be done while function A is executing)
    return something;
}
Notice that some synchronization overhead will always be present, even for concurrent executions of B(), and whether this eventually gives you better performance than plain mutexes depends heavily on what is going on inside and outside those functions; always measure before committing to a more complicated solution.
Boost.Thread also provides an implementation of shared_mutex.
You have an option in C++14: use std::shared_timed_mutex. A would use lock(), B would use lock_shared().
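A minimal sketch of that suggestion (my code, using the question's placeholders T and something):
#include <shared_mutex>

std::shared_timed_mutex m;

void A() {
    m.lock();          // exclusive: blocks all other A() and B() calls
    // ... do the A work ...
    m.unlock();
}

T B() {
    m.lock_shared();   // shared: other B() calls may proceed concurrently
    // ... do the B work ...
    m.unlock_shared();
    return something;
}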
This is quite possibly full of bugs, but since you have no C++14 you could create a lock-counting wrapper around std::mutex and use that:
#include <mutex>

// Lock-counting class
class SharedLock
{
public:
    SharedLock(std::mutex& m) : count(0), shared(m) {}

    friend class Lock;

    // RAII lock
    class Lock
    {
    public:
        Lock(SharedLock& l) : lock(l) { lock.lock(); }
        ~Lock() { lock.unlock(); }
    private:
        SharedLock& lock;
    };

private:
    void lock()
    {
        std::lock_guard<std::mutex> guard(internal);
        if (count == 0)
        {
            shared.lock(); // first shared user takes the underlying mutex
        }
        ++count;
    }

    void unlock()
    {
        std::lock_guard<std::mutex> guard(internal);
        --count;
        if (count == 0)
        {
            shared.unlock(); // last shared user releases it
        }
    }

    int count;
    std::mutex& shared;
    std::mutex internal;
};

std::mutex shared_mutex;

void A()
{
    std::lock_guard<std::mutex> lock(shared_mutex);
    // ...
}

void B()
{
    static SharedLock shared_lock(shared_mutex);
    SharedLock::Lock mylock(shared_lock);
    // ...
}
... unless you want to dive into Boost, of course.
Using MS Visual C++ 2012.
A class has a member of type std::atomic_flag
#include <atomic>

class A {
public:
    ...
    std::atomic_flag lockFlag;

    A() { std::atomic_flag_clear(&lockFlag); }
};
There is an object of type A
A object;
which can be accessed by two (Boost) threads:
void thr1(A* objPtr) { ... }
void thr2(A* objPtr) { ... }
The idea is to make a thread wait while the object is being accessed by the other thread.
The question is: is it possible to construct such a mechanism with an atomic_flag object? Needless to say, for the moment I want something more lightweight than a boost::mutex.
By the way, the process involved in one of the threads is a very long database query that fetches many rows, and I only need to suspend it in a certain zone of code where the collision occurs (when processing each row); I can't wait for the entire thread to finish with join().
I've tried something like this in each thread:
void thr1(A* objPtr) {
    ...
    while (std::atomic_flag_test_and_set_explicit(&objPtr->lockFlag, std::memory_order_acquire)) {
        boost::this_thread::sleep(boost::posix_time::millisec(100));
    }

    ... /* zone to protect */

    std::atomic_flag_clear_explicit(&objPtr->lockFlag, std::memory_order_release);

    ... /* the process continues */
}
But with no success, because the second thread hangs. In fact, I don't completely understand the mechanism involved in the atomic_flag_test_and_set_explicit function, nor whether it returns immediately or can block until the flag can be locked.
It is also a mystery to me how to build a locking mechanism out of a function that always sets the value and returns the previous one, with no option to merely read the current setting.
Any suggestions are welcome.
"By the way, the process involved in one of the threads is a very long database query that fetches many rows, and I only need to suspend it in a certain zone of code where the collision occurs (when processing each row); I can't wait for the entire thread to finish with join()."
Such a zone is known as the critical section. The simplest way to work with a critical section is to lock by mutual exclusion.
The mutex solution suggested is indeed the way to go, unless you can prove that this is a hotspot and lock contention is a performance problem. Lock-free programming using just atomics and intrinsics is enormously complex and cannot be recommended at this level.
Here's a simple example showing how you could do this (live on http://liveworkspace.org/code/6af945eda5132a5221db823fa6bde49a):
#include <iostream>
#include <thread>
#include <mutex>

struct A
{
    std::mutex mux;
    int x;

    A() : x(0) {}
};

void threadf(A* data)
{
    for (int i = 0; i < 10; ++i)
    {
        std::lock_guard<std::mutex> lock(data->mux);
        data->x++;
    }
}

int main(int argc, const char *argv[])
{
    A instance;
    auto t1 = std::thread(threadf, &instance);
    auto t2 = std::thread(threadf, &instance);
    t1.join();
    t2.join();

    std::cout << instance.x << std::endl;
    return 0;
}
It looks like you're trying to write a spinlock. Yes, you can do that with std::atomic_flag, but you are better off using std::mutex instead. Don't use atomics unless you really know what you're doing.
To actually answer the question asked: Yes, you can use std::atomic_flag to create a thread locking object called a spinlock.
#include <atomic>

class atomic_lock
{
public:
    atomic_lock() = default;

    void lock()
    {
        // Spin until the lock is acquired.
        while (lock_.test_and_set(std::memory_order_acquire)) { }
    }

    void unlock()
    {
        lock_.clear(std::memory_order_release);
    }

private:
    std::atomic_flag lock_ = ATOMIC_FLAG_INIT;
};
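Since atomic_lock provides lock() and unlock(), it satisfies BasicLockable and composes with the standard RAII guards. A usage sketch (the names spin, shared_counter, and worker are mine):
#include <mutex> // for std::lock_guard

atomic_lock spin;       // the spinlock defined above
int shared_counter = 0; // data protected by the spinlock

void worker()
{
    std::lock_guard<atomic_lock> guard(spin); // spins until acquired
    ++shared_counter;                         // critical section
}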
So, having this simple class
#include <boost/thread.hpp>

class mySafeData
{
public:
    mySafeData() : myData(0), stateCounter(0)
    {
    }

    void Set(int i)
    {
        boost::mutex::scoped_lock lock(myMutex);
        myData = i;             // set the data
        ++stateCounter;         // some int to track state changes
        myCondvar.notify_all(); // notify all readers
    }

    void Get(int& i)
    {
        boost::mutex::scoped_lock lock(myMutex);
        // copy the current state
        int cState = stateCounter;
        // waits for a notification and change of state
        while (stateCounter == cState)
            myCondvar.wait(lock);
        i = myData; // copy out the data
    }

private:
    int myData;
    int stateCounter;
    boost::mutex myMutex;
    boost::condition_variable myCondvar;
};
and an array of threads in infinite loops, each calling one of these functions:
Get()
Set()
Get()
Get()
Get()
will they always call the functions in the same order, and only once per cycle? (By cycle I mean: will all Boost threads run in the same order each time, so that each thread would Get() exactly once after each Set()?)
No. You can never make any assumptions about the order in which the threads will be served. This has nothing to do with Boost; it is a basic fact of multithreaded programming.
The threads should acquire the lock in the same order that they reach the scoped_lock constructor (I think). But there's no guarantee that they will reach that point in any fixed order!
So in general: don't rely on it.
No, the mutex only prevents two threads from accessing the variable at the same time. It does not affect the thread scheduling order or execution time, which can for all intents and purposes be assumed to be random.
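As a quick illustration (my sketch, not from the answers above), run the following a few times; the mutex keeps each output line intact, but the order in which the threads print will typically vary from run to run:
#include <iostream>
#include <mutex>
#include <thread>

std::mutex io_mutex;

void announce(int id)
{
    std::lock_guard<std::mutex> lock(io_mutex); // serializes the printing,
    std::cout << "thread " << id << "\n";       // but does not fix the order
}

int main()
{
    std::thread a(announce, 1), b(announce, 2), c(announce, 3);
    a.join();
    b.join();
    c.join();
    return 0;
}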