Consider the following simplified program modelling a real scenario where different users can make concurrent requests to the same resource:
#include <thread>
#include <memory>
#include <mutex>
#include <iostream>
using namespace std;

struct T {
    void op() { /* some stuff */ }
    ~T() noexcept { /* some stuff */ }
};

std::shared_ptr<T> t = std::make_shared<T>(); // t must own a T, or w below observes nothing
std::mutex mtx;
std::weak_ptr<T> w{t};

enum action { destroy, op };

void request(action a) {
    if (a == action::destroy) {
        lock_guard<mutex> lk{mtx};
        t.reset();
        std::cout << "*t certainly destroyed\n";
    } else if (a == action::op) {
        lock_guard<mutex> lk{mtx};
        if (auto l = w.lock()) {
            l->op();
        }
    }
}

int main() {
    // At some point in time and different points in the program,
    // two different users make two different concurrent requests
    std::thread th1{request, destroy};
    std::thread th2{request, op};
    // ....
    th2.join();
    th1.join();
}
I am not asking whether the program is formally correct - I think it is - but I have never seen this approach for guaranteeing the synchronous destruction of a resource shared via smart pointers. I personally think it is fine and has a valid use.
However, I am wondering whether others think the same and, if so, whether there are more elegant alternatives, apart from the classic synchronization with unique_locks and condition variables, and from introducing modifications (e.g. atomic flags) to T.
It would be ideal if I could even get rid of the mtx somehow.
Yes, it's fine. The reference counting in the shared_ptr is atomic and the locked copy stays in scope for the duration of the op, so the object can't be destroyed during the op.
In this case the mutex is not actually protecting the lifetime of T, but sequencing calls to op() and destruction. If you don't mind multiple concurrent calls to op(), or the destruction time being indeterminate (i.e. after the last running op() has completed) then you can do away with it, since std::shared_ptr<>::reset() and std::weak_ptr<>::lock() are both thread-safe.
However, I would advise caution as the author clearly meant for calls to op() to be serialised.
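If those relaxed requirements are acceptable, a minimal mutex-free sketch of request() could look like the following. It relies only on the thread safety of shared_ptr::reset() and weak_ptr::lock(), and assumes only one thread ever modifies t itself (operations on the shared control block are atomic, but concurrent writes to the same shared_ptr object are not):

void request(action a) {
    if (a == action::destroy) {
        t.reset(); // thread-safe: atomically drops this owner's reference
    } else if (a == action::op) {
        if (auto l = w.lock()) { // thread-safe: atomically acquires shared ownership
            l->op();             // l keeps the object alive for the duration of the call
        }
    }
}

Note that ~T() may then run on either thread: whichever one happens to drop the last reference.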
https://en.cppreference.com/w/cpp/memory/shared_ptr/use_count states:
In multithreaded environment, the value returned by use_count is approximate (typical implementations use a memory_order_relaxed load)
But does this mean that use_count() is totally useless in a multi-threaded environment?
Consider the following example, where the Circular class implements a circular buffer of std::shared_ptr<int>.
One method is supplied to users - get(), which checks whether the reference count of the next element in the std::array<std::shared_ptr<int>, 2> is greater than 1 (which we don't want, since it means that it's being held by a user who previously called get()).
If it's <= 1, a copy of the std::shared_ptr<int> is returned to the user.
In this case, the users are two threads which do nothing at all except love to call get() on the circular buffer - that's their purpose in life.
What happens in practice when I execute the program is that it runs for a few cycles (tested by adding a counter to the circular buffer class), after which it throws an exception, complaining that the reference count for the next element is > 1.
Is this a result of the statement that the value returned by use_count() is approximate in a multi-threaded environment?
Is it possible to adjust the underlying mechanism to make it, uh, deterministic and behave as I would have liked it to behave?
If my thinking is correct, the use_count() (or rather the real number of owners) of the next element should never, ever rise above 1 inside Circular::get(): there are only two consumers, and every time a thread calls get(), it has already released its old (copied) std::shared_ptr<int>, which in turn means that the remaining std::shared_ptr<int> residing in Circular::ints_ should have a reference count of exactly 1.
#include <mutex>
#include <array>
#include <memory>
#include <stdexcept> // for std::logic_error
#include <string>    // for std::string, std::to_string
#include <thread>

class Circular {
public:
    Circular() {
        for (auto& i : ints_) { i = std::make_shared<int>(0); }
    }

    std::shared_ptr<int> get() {
        std::lock_guard<std::mutex> lock_guard(guard_);
        index_ = index_ % 2; // Re-set the index pointer.
        if (ints_.at(index_).use_count() > 1) {
            // This shouldn't happen - right? (but it does)
            std::string excp = std::string("OOPSIE: ") + std::to_string(index_)
                             + " " + std::to_string(ints_.at(index_).use_count());
            throw std::logic_error(excp);
        }
        return ints_.at(index_++);
    }

private:
    std::mutex guard_;
    unsigned int index_{0};
    std::array<std::shared_ptr<int>, 2> ints_;
};

Circular circ;

void func() {
    do {
        auto scoped_shared_int_pointer{circ.get()};
    } while (1);
}

int main() {
    std::thread t1(func), t2(func);
    t1.join();
    t2.join();
}
While use_count is fraught with problems, the core issue right now is outside of that logic.
Assume thread t1 takes the shared_ptr at index 0, and then t2 runs its loop twice before t1 finishes its first loop iteration. t2 will obtain the shared_ptr at index 1, release it, and then attempt to acquire the shared_ptr at index 0, and will hit your failure condition, since t1 is just running behind.
Now, that said, in a broader context this is not particularly safe: if a user creates a weak_ptr, it's entirely possible for the use_count to go from 1 to 2 without passing through this function. In this simple example, it would work to have get() loop through the index array until it finds the free shared pointer, as sketched below.
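A minimal sketch of that scanning variant (valid only under this example's assumptions: no weak_ptrs exist, and every new copy of an element is made while guard_ is held):

std::shared_ptr<int> get() {
    std::lock_guard<std::mutex> lock(guard_);
    for (auto& p : ints_) {
        if (p.use_count() == 1) { // only the buffer itself holds this element
            return p;
        }
    }
    // With two consumers and two slots a free element should always exist,
    // but a real implementation could wait and retry here instead.
    throw std::logic_error("OOPSIE: no free element");
}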
use_count is for debugging only and shouldn't be used. If you want to know when nobody else has a reference to a pointer any more just let the shared pointer die and use a custom deleter to detect that and do whatever you need to do with the now unused pointer.
This is an example of how you might implement this in your code:
#include <mutex>
#include <array>
#include <memory>
#include <stdexcept> // for std::logic_error
#include <thread>
#include <vector>
#include <iostream>

class Circular {
public:
    Circular() {
        size_t index = 0;
        for (auto& i : ints_) {
            i = 0;
            unused_.push_back(index++);
        }
    }

    std::shared_ptr<int> get() {
        std::lock_guard<std::mutex> lock_guard(guard_);
        if (unused_.empty()) {
            throw std::logic_error("OOPSIE: none left");
        }
        size_t index = unused_.back();
        unused_.pop_back();
        // Hand out a non-owning pointer into ints_; the custom deleter
        // returns the index to the free list instead of deleting anything.
        return std::shared_ptr<int>(&ints_[index], [this, index](int*) {
            std::lock_guard<std::mutex> lock_guard(guard_);
            unused_.push_back(index);
        });
    }

private:
    std::mutex guard_;
    std::vector<size_t> unused_;
    std::array<int, 2> ints_;
};

Circular circ;

void func() {
    do {
        auto scoped_shared_int_pointer{ circ.get() };
    } while (1);
}

int main() {
    std::thread t1(func), t2(func);
    t1.join();
    t2.join();
}
A list of unused indexes is kept; when the shared pointer is destroyed, the custom deleter returns its index to the list of unused indexes, ready to be used in the next call to get().
There's a new experimental feature (from the Transactional Memory TS; it was once considered for C++20 but did not make it in), the "synchronized block". The block provides a global lock on a section of code. The following is an example from cppreference.
#include <iostream>
#include <vector>
#include <thread>

int f()
{
    static int i = 0;
    synchronized {
        std::cout << i << " -> ";
        ++i;
        std::cout << i << '\n';
        return i;
    }
}

int main()
{
    std::vector<std::thread> v(10);
    for (auto& t : v)
        t = std::thread([]{ for (int n = 0; n < 10; ++n) f(); });
    for (auto& t : v)
        t.join();
}
I feel it's superfluous. Is there any difference between the synchronized block from above and this one:
std::mutex m;

int f()
{
    static int i = 0;
    std::lock_guard<std::mutex> lg(m);
    std::cout << i << " -> ";
    ++i;
    std::cout << i << '\n';
    return i;
}
The only advantage I can see is that I'm saved the trouble of declaring a global lock. Are there more advantages to using a synchronized block? When should it be preferred?
On the face of it, the synchronized keyword is functionally similar to std::mutex, but by introducing a new keyword and associated semantics (such as the block enclosing the synchronized region) it makes it much easier to optimize these regions for transactional memory.
In particular, std::mutex and friends are in principle more or less opaque to the compiler, while synchronized has explicit semantics. The compiler can't be sure what the standard library std::mutex does and would have a hard time transforming it to use TM. A C++ compiler is expected to keep working correctly when the standard library implementation of std::mutex changes, and so can't make many assumptions about its behavior.
In addition, without the explicit scope that the synchronized block requires, it is hard for the compiler to reason about the extent of the critical region. This seems easy in simple cases, such as a single scoped lock_guard, but there are plenty of complex cases, such as when the lock escapes the function, at which point the compiler never really knows where it could be unlocked.
Locks do not compose well in general. Consider:
//
// includes and using, omitted to simplify the example
//

void move_money_from(Cash amount, BankAccount &a, BankAccount &b) {
    //
    // suppose a mutex m within BankAccount, exposed as public
    // for the sake of simplicity
    //
    lock_guard<mutex> lckA { a.m };
    lock_guard<mutex> lckB { b.m };
    // oversimplified transaction, obviously
    if (a.withdraw(amount))
        b.deposit(amount);
}

int main() {
    BankAccount acc0{/* ... */};
    BankAccount acc1{/* ... */};
    thread th0 { [&] {
        // ...
        move_money_from(Cash{ 10'000 }, acc0, acc1);
        // ...
    } };
    thread th1 { [&] {
        // ...
        move_money_from(Cash{ 5'000 }, acc1, acc0);
        // ...
    } };
    // ...
    th0.join();
    th1.join();
}
In this case, the fact that th0, by moving money from acc0 to acc1, tries to take acc0.m first and acc1.m second, whereas th1, by moving money from acc1 to acc0, tries to take acc1.m first and acc0.m second, could make them deadlock.
This example is oversimplified and could be solved by using std::lock() or the C++17 variadic std::scoped_lock (sketched below), but think of the general case where one is using third-party software, not knowing where locks are being taken or freed. In real-life situations, synchronization through locks gets tricky really fast.
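For reference, a deadlock-free sketch of move_money_from() using the C++17 std::scoped_lock (in C++11, std::lock() followed by two adopting lock_guards achieves the same):

void move_money_from(Cash amount, BankAccount &a, BankAccount &b) {
    // scoped_lock acquires both mutexes through a deadlock-avoidance
    // algorithm, regardless of the order the accounts are passed in
    std::scoped_lock lck{ a.m, b.m };
    // oversimplified transaction, obviously
    if (a.withdraw(amount))
        b.deposit(amount);
}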
The transactional memory features aim to offer synchronization that composes
better than locks; it's an optimization feature of sorts, depending on context, but it's also a safety feature. Rewriting move_money_from() as follows:
void move_money_from(Cash amount, BankAccount &a, BankAccount &b) {
    synchronized {
        // oversimplified transaction, obviously
        if (a.withdraw(amount))
            b.deposit(amount);
    }
}
... one gets the benefits of the transaction being done as a whole or not at
all, without burdening BankAccount with a mutex and without risking deadlocks due to conflicting requests from user code.
std::thread is not easily inheritable by classes, does not join automatically on destruction, etc.
There are lots of pitfalls: for example, you need something like std::atomic_bool for stopping, and you cannot simply share this when using a std::thread member variable to execute a member method.
Is there any good practice for implementing QThread-like classes using std::thread?
I'm aiming at an inheritable Thread class that enables start(), detach(), and stop() functionality.
For example, I wrote one like the following:
#include <atomic>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <vector>

struct Thread {
    Thread(int id) {
        std::atomic_init(&(this->id), id);
        std::atomic_init(&(this->m_stop), false);
    }

    Thread(Thread &&rhs) :
        id(),
        m_stop(),
        m_thread(std::move(rhs.m_thread))
    {
        std::atomic_init(&(this->id), rhs.id.load());
        rhs.id.store(-1);
        std::atomic_init(&(this->m_stop), rhs.m_stop.load());
    }

    virtual ~Thread() {
        this->stop();
    }

    void start() {
        this->m_thread = std::move(std::thread(&Thread::work, this));
    }

    void stop() {
        this->m_stop.store(true);
        if (this->m_thread.joinable()) {
            this->m_thread.join();
        }
    }

    virtual void work() {
        while (!(this->m_stop)) {
            std::chrono::milliseconds ts(5000);
            std::this_thread::sleep_for(ts);
        }
    }

    std::atomic_int id;
    std::atomic_bool m_stop;
    std::thread m_thread;
};

int main() {
    srand(42);
    while (true) {
        std::vector<Thread> v;
        for (int i = 0; i < 10; ++i) {
            auto t = Thread(i);
            v.push_back(std::move(t));
            printf("Start %d\n", i);
            v[i].start();
        }
        printf("Start fin!\n");
        int time_sleep = rand() % 2000 + 1000;
        std::chrono::milliseconds ts(time_sleep);
        std::this_thread::sleep_for(ts);
        for (int i = 0; i < 10; ++i) {
            printf("Stop %d\n", i);
            v[i].stop();
            printf("Pop %d\n", i);
            v.pop_back();
        }
        printf("Stop fin!\n");
    }
    return 0;
}
But I'm having a hard time getting it right: just after Stop 0 it deadlocks, or sometimes it core dumps.
std::thread is intended as the lowest building block for implementing a threading primitive. As such, it does not offer as rich an interface as QThread, but together with the synchronization primitives from the standard library, it allows you to implement more complex behavior like what's offered by QThread very easily.
You correctly noted that inheriting from std::thread is a bad idea (the absence of a virtual destructor is a dead giveaway) and I would argue that having a polymorphic Thread type is not the smartest design in the first place, but you can easily encapsulate a std::thread as a member of any class (polymorphic or not) if you want to.
Joining on destruction is really just a policy. A class encapsulating a std::thread as a member can simply call join in its destructor, effectively implementing self-joining on destruction. Concurrency is of no concern here, as object destruction is always executed non-concurrently, by definition. If you want to share ownership of a thread between multiple (possibly concurrently invoked) execution paths, std::shared_ptr will handle that for you. But even here, the destructor is always executed non-concurrently by the last remaining thread that gives up its shared_ptr.
Stuff like QThread::isInterruptionRequested can be implemented with a single flag, access to which of course must be synchronized by the class (either with a mutex or by using an atomic flag).
Changing a thread's priority is not specified by the standard, as not all platforms envisioned by the standard would allow this, but you can use native_handle to implement it yourself with platform-specific code, as sketched below.
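For instance, on implementations where native_handle() returns a pthread_t, a sketch might look like this (pthread_setschedparam and SCHED_FIFO are POSIX-specific; error handling omitted):

#include <pthread.h>
#include <thread>

void set_fifo_priority(std::thread& t, int priority) {
    sched_param sch{};
    sch.sched_priority = priority; // the valid range is platform-defined
    // native_handle() exposes the underlying pthread_t on POSIX systems
    pthread_setschedparam(t.native_handle(), SCHED_FIFO, &sch);
}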
And so on. All the pieces are there; you just have to assemble them according to your needs.
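As a rough illustration (a sketch of the policies described above, not production code), a self-joining worker with an interruption flag could be assembled like this:

#include <atomic>
#include <thread>

class Worker {
public:
    Worker() : m_thread([this] { run(); }) {}

    Worker(const Worker&) = delete;            // the running lambda captured this,
    Worker& operator=(const Worker&) = delete; // so forbid copying and moving

    ~Worker() {                 // join-on-destruction policy
        m_stop.store(true);     // request interruption...
        if (m_thread.joinable())
            m_thread.join();    // ...and wait for the thread to finish
    }

private:
    void run() {
        while (!m_stop.load()) {
            // do one unit of work, checking m_stop between units
        }
    }

    std::atomic<bool> m_stop{false};
    std::thread m_thread; // declared last: starts only after m_stop is initialized
};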
Using MS Visual C++ 2012.
A class has a member of type std::atomic_flag:
class A {
public:
    ...
    std::atomic_flag lockFlag;

    A() { std::atomic_flag_clear(&lockFlag); }
};
There is an object of type A:
A object;
which can be accessed by two (Boost) threads:
void thr1(A* objPtr) { ... }
void thr2(A* objPtr) { ... }
The idea is to make a thread wait while the object is being accessed by the other thread.
The question is: is it possible to construct such a mechanism with an atomic_flag object? For the moment, I want something more lightweight than a boost::mutex.
By the way, the process involved in one of the threads is a very long query to a dBase that fetches many rows, and I only need to suspend it in a certain zone of code where the collision occurs (when processing each row); I can't wait for the entire thread to finish with join().
I've tried something like the following in each thread:

void thr1(A* objPtr) {
    ...
    while (std::atomic_flag_test_and_set_explicit(&objPtr->lockFlag, std::memory_order_acquire)) {
        boost::this_thread::sleep(boost::posix_time::millisec(100));
    }
    ... /* Zone to protect */
    std::atomic_flag_clear_explicit(&objPtr->lockFlag, std::memory_order_release);
    ... /* the process continues */
}
But with no success: the second thread hangs. In fact, I don't completely understand the mechanism involved in the atomic_flag_test_and_set_explicit function, nor whether it returns immediately or can block until the flag can be locked.
It is also a mystery to me how to build a locking mechanism from a function that always sets the value and returns the previous one, with no option to only read the current setting.
Any suggestions are welcome.
By the way, the process involved in one of the threads is a very long query to a dBase that fetches many rows, and I only need to suspend it in a certain zone of code where the collision occurs (when processing each row); I can't wait for the entire thread to finish with join().
Such a zone is known as the critical section. The simplest way to work with a critical section is to lock by mutual exclusion.
The mutex solution suggested is indeed the way to go, unless you can prove that this is a hotspot and the lock contention is a performance problem. Lock-free programming using just atomics and intrinsics is enormously complex and cannot be recommended at this level.
Here's a simple example showing how you could do this (live on http://liveworkspace.org/code/6af945eda5132a5221db823fa6bde49a):
#include <iostream>
#include <thread>
#include <mutex>

struct A
{
    std::mutex mux;
    int x;
    A() : x(0) {}
};

void threadf(A* data)
{
    for (int i = 0; i < 10; ++i)
    {
        std::lock_guard<std::mutex> lock(data->mux);
        data->x++;
    }
}

int main(int argc, const char *argv[])
{
    A instance;
    auto t1 = std::thread(threadf, &instance);
    auto t2 = std::thread(threadf, &instance);
    t1.join();
    t2.join();
    std::cout << instance.x << std::endl;
    return 0;
}
It looks like you're trying to write a spinlock. Yes, you can do that with std::atomic_flag, but you are better off using std::mutex instead. Don't use atomics unless you really know what you're doing.
To actually answer the question asked: yes, you can use std::atomic_flag to create a locking object known as a spinlock.
#include <atomic>

class atomic_lock
{
public:
    void lock()
    {
        // Spin until the lock is acquired.
        while ( lock_.test_and_set( std::memory_order_acquire ) ) { }
    }

    void unlock()
    {
        lock_.clear( std::memory_order_release );
    }

private:
    // ATOMIC_FLAG_INIT guarantees the flag starts in the clear state;
    // using the macro anywhere other than an initializer like this is not portable.
    std::atomic_flag lock_ = ATOMIC_FLAG_INIT;
};
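Since atomic_lock provides lock() and unlock() (it meets the BasicLockable requirements), it can be used with std::lock_guard just like a mutex:

#include <mutex>

atomic_lock spin;

void critical_work() {
    std::lock_guard<atomic_lock> lk(spin); // spins until acquired, releases on scope exit
    // ... protected section ...
}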
I have a set of C++ functions:
void funcB() {}
void funcC() {}

void funcA()
{
    funcB();
    funcC();
}

Now I want to make funcA atomic, i.e. the funcB and funcC calls inside funcA should be executed atomically. Is there any way to achieve this?
One way you can accomplish this is to use the new (C++11) features std::mutex and std::lock_guard.
For each protected resource, you instantiate a single global std::mutex; each thread then locks that mutex, as it requires, by the creation of a std::lock_guard:
#include <thread>
#include <iostream>
#include <mutex>
#include <vector>

// A single mutex, shared by all threads. It is initialized
// into the "unlocked" state
std::mutex m;

void funcB() {
    std::cout << "Hello ";
}

void funcC() {
    std::cout << "World." << std::endl;
}

void funcA(int i) {
    // The creation of the lock_guard locks the mutex
    // for the lifetime of the lock_guard
    std::lock_guard<std::mutex> l(m);
    // Now only a single thread can run this code
    std::cout << i << ": ";
    funcB();
    funcC();
    // As we exit this scope, the lock_guard is destroyed,
    // the mutex is unlocked, and another thread is allowed to run
}

int main() {
    std::vector<std::thread> vt;
    // Create and launch a bunch of threads
    for (int i = 0; i < 10; i++)
        vt.push_back(std::thread(funcA, i));
    // Wait for all of them to complete
    for (auto& t : vt)
        t.join();
}
Notes:
In your example some code unrelated to funcA could invoke either funcB or funcC without honoring the lock that funcA set.
Depending upon how your program is structured, you may want to manage the lifetime of the mutex differently. As an example, it might want to be a class member of the class that includes funcA, as in the sketch below.
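A minimal sketch of that variant (Widget is a hypothetical name, not from the original question):

#include <mutex>

class Widget {
public:
    void funcA() {
        std::lock_guard<std::mutex> l(m_); // serializes funcA per Widget instance
        funcB();
        funcC();
    }

private:
    void funcB() { /* ... */ }
    void funcC() { /* ... */ }
    std::mutex m_; // one lock per object instead of one global mutex
};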
In general, NO. Atomic operations are very precisely defined. What you want is a semaphore or a mutex.
If you are using GCC 4.7, you can use the new Transactional Memory feature to do the following:
Transactional memory is intended to make programming with threads simpler, in particular synchronizing access to data shared between several threads using transactions. As with databases, a transaction is a unit of work that either completes in its entirety or has no effect at all (i.e., transactions execute atomically). Further, transactions are isolated from each other such that each transaction sees a consistent view of memory.
Currently, transactions are only supported in C++ and C in the form of transaction statements, transaction expressions, and function transactions. In the following example, both a and b will be read and the difference will be written to c, all atomically and isolated from other transactions:
__transaction_atomic { c = a - b; }
Therefore, another thread can use the following code to concurrently update b without ever causing c to hold a negative value (and without having to use other synchronization constructs such as locks or C++11 atomics):
__transaction_atomic { if (a > b) b++; }
The precise semantics of transactions are defined in terms of the C++11/C1X memory model (see below for a link to the specification). Roughly, transactions provide synchronization guarantees that are similar to what would be guaranteed when using a single global lock as a guard for all transactions. Note that like other synchronization constructs in C/C++, transactions rely on a data-race-free program (e.g., a nontransactional write that is concurrent with a transactional read to the same memory location is a data race).
More info: http://gcc.gnu.org/wiki/TransactionalMemory