C++ thread-safe class not working as expected

I am trying to implement a thread-safe class. I have put a lock_guard in the setter and getter for each member variable.
#include <iostream>
#include <omp.h>
#include <mutex>
#include <vector>
#include <map>
#include <string>

class B {
    std::mutex m_mutex;
    std::vector<int> m_vec;
    std::map<int, std::string> m_map;
public:
    void insertInVec(int x) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_vec.push_back(x);
    }
    void insertInMap(int x, const std::string& s) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_map[x] = s;
    }
    const std::string& getValAtKey(int k) {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_map[k];
    }
    int getValAtIdx(int i) {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_vec[i];
    }
};

int main() {
    B b;
    #pragma omp parallel for num_threads(4)
    for (int i = 0; i < 100000; i++) {
        b.insertInVec(i);
        b.insertInMap(i, std::to_string(i));
    }
    std::cout << b.getValAtKey(20) << std::endl;
    std::cout << b.getValAtIdx(20) << std::endl;
    return 0;
}
When I run this code, the output from the map is correct but the output from the vector is garbage. I get the output
20
50008
Of course, the second value changes on each run.
1. What is wrong with this code? (I also have to consider the scenario where there are multiple instances of class B used from multiple threads.)
2. Do I need a separate mutex for each member variable? For example:
std::mutex vec_mutex;
std::mutex map_mutex;

I don't understand why you think that the output is garbage. Your loop is executed by 4 threads, so a possible split of the work could be:
Thread 1: 0 <= i < 25000
Thread 2: 25000 <= i < 50000
Thread 3: 50000 <= i < 75000
Thread 4: 75000 <= i < 100000
Each thread does a push_back of i to the vector. Suppose thread 1 starts and writes 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, then thread 2 writes 25000, 25001, and then thread 3 writes 50000, 50001, 50002, 50003, 50004, 50005, 50006, 50007, 50008. You then end up with the value 50008 at index 20. Of course other interleavings are also possible, so you might also see values such as 25003 or 75004.

The output you see is fine, only your expectations are off.
You add elements into the vector via:
void insertInVec(int x) {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_vec.push_back(x);
}
and then retrieve them via:
int getValAtIdx(int i) {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_vec[i];
}
Because the loop is executed in parallel, there is no guarantee that the values are inserted in the order you expect. Whichever thread grabs the mutex first inserts its values first. If you want the values inserted at specific positions, you need to resize the vector up front and then use something along the lines of:
void setInVecAtIndex(int x, size_t index) {
    std::lock_guard<std::mutex> lock(m_mutex); // maybe not needed, see PS
    m_vec[index] = x;
}
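For illustration, a minimal sketch of that approach; resizeVec() is a hypothetical helper (not in the original class) used so the vector already has its final size before the parallel loop starts:
void resizeVec(size_t n) {               // hypothetical helper on class B
    std::lock_guard<std::mutex> lock(m_mutex);
    m_vec.resize(n);
}

void setInVecAtIndex(int x, size_t index) {
    // no lock needed once the vector is fully sized and every thread
    // writes to a distinct index (see the PS at the end of this answer)
    m_vec[index] = x;
}

// usage in main():
//   b.resizeVec(100000);
//   #pragma omp parallel for num_threads(4)
//   for (int i = 0; i < 100000; i++) {
//       b.setInVecAtIndex(i, i);         // value i always lands at index i
//   }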
So this isn't the problem with your code. However, I see two problems:
getValAtKey returns a reference to the value in the map. It is a const reference, but that does not prevent somebody else from modifying the value via a call to insertInMap. Returning a reference here defeats the purpose of the lock: using that reference is not thread-safe. To make it thread-safe you need to return a copy.
You forgot to protect the compiler-generated member functions. For an overview, see "What are all the member-functions created by compiler for a class? Does that happen all the time?". The compiler-generated copy operations will not use your getters and setters, hence they are not thread-safe by default. You should either define them yourself or delete them (see also the rule of 3/5).
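Putting both fixes together, here is a minimal sketch of how B could look (only the changed parts are shown; the other members stay as in the question):
class B {
    std::mutex m_mutex;
    std::vector<int> m_vec;
    std::map<int, std::string> m_map;
public:
    B() = default;
    B(const B&) = delete;             // the compiler-generated copy would bypass the lock,
    B& operator=(const B&) = delete;  // so delete it (or implement it with proper locking)

    std::string getValAtKey(int k) {  // return by value: the caller gets its own copy
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_map[k];
    }

    // insertInVec, insertInMap and getValAtIdx unchanged
};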
PS: Accessing different elements in a vector from different threads needs no synchronization. As long as you do not resize the vector and only access different elements you do not need the mutex if you can ensure that no two threads access the same index.

Woes with std::shared_ptr<T>::use_count()

https://en.cppreference.com/w/cpp/memory/shared_ptr/use_count states:
In multithreaded environment, the value returned by use_count is approximate (typical implementations use a memory_order_relaxed load)
But does this mean that use_count() is totally useless in a multi-threaded environment?
Consider the following example, where the Circular class implements a circular buffer of std::shared_ptr<int>.
One method is supplied to users - get(), which checks whether the reference count of the next element in the std::array<std::shared_ptr<int>> is greater than 1 (which we don't want, since it means it is still being held by a user who previously called get()).
If it's <= 1, a copy of the std::shared_ptr<int> is returned to the user.
In this case, the users are two threads which do nothing at all except love to call get() on the circular buffer - that's their purpose in life.
What happens in practice when I execute the program is that it runs for a few cycles (tested by adding a counter to the circular buffer class), after which it throws the exception, complaining that the reference counter for the next element is > 1.
Is this a result of the statement that the value returned by use_count() is approximate in a multi-threaded environment?
Is it possible to adjust the underlying mechanism to make it, uh, deterministic and behave as I would have liked it to behave?
If my thinking is correct, use_count() (or rather the real number of owners) of the next element should never EVER increase above 1 inside Circular's get(): there are only two consumers, and every time a thread calls get() it has already released its old (copied) std::shared_ptr<int>, which in turn means that the remaining std::shared_ptr<int> residing in Circular::ints_ should have a reference count of only 1.
#include <mutex>
#include <array>
#include <memory>
#include <exception>
#include <string>
#include <thread>

class Circular {
public:
    Circular() {
        for (auto& i : ints_) { i = std::make_shared<int>(0); }
    }

    std::shared_ptr<int> get() {
        std::lock_guard<std::mutex> lock_guard(guard_);
        index_ = index_ % 2; // Re-set the index pointer.
        if (ints_.at(index_).use_count() > 1) {
            // This shouldn't happen - right? (but it does)
            std::string excp = std::string("OOPSIE: ") + std::to_string(index_) + " " + std::to_string(ints_.at(index_).use_count());
            throw std::logic_error(excp);
        }
        return ints_.at(index_++);
    }

private:
    std::mutex guard_;
    unsigned int index_{0};
    std::array<std::shared_ptr<int>, 2> ints_;
};

Circular circ;

void func() {
    do {
        auto scoped_shared_int_pointer{circ.get()};
    } while (1);
}

int main() {
    std::thread t1(func), t2(func);
    t1.join(); t2.join();
}
While use_count is fraught with problems, the core issue right now is outside of that logic.
Assume thread t1 takes the shared_ptr at index 0, and then t2 runs its loop twice before t1 finishes its first loop iteration. t2 will obtain the shared_ptr at index 1, release it, and then attempt to acquire the shared_ptr at index 0, and will hit your failure condition, since t1 is just running behind.
Now, that said, in a broader context it's not particularly safe: if a user creates a weak_ptr and later locks it, the use_count can go from 1 to 2 without ever passing through this function. In this simple example, it would work to have get() loop through the index array until it finds the free shared pointer.
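For illustration, a sketch of that scanning variant of get(), still protected by the single mutex and still based on use_count, so it only works because in this example no caller copies its pointer or creates weak_ptrs:
std::shared_ptr<int> get() {
    std::lock_guard<std::mutex> lock_guard(guard_);
    for (auto& sp : ints_) {
        if (sp.use_count() == 1) {  // only the copy inside Circular exists
            return sp;
        }
    }
    // with two elements and two consumers this should not be reached;
    // a caller could also wait and retry here instead of throwing
    throw std::logic_error("OOPSIE: no free element");
}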
use_count is for debugging only and shouldn't be used. If you want to know when nobody else has a reference to a pointer any more just let the shared pointer die and use a custom deleter to detect that and do whatever you need to do with the now unused pointer.
This is an example of how you might implement this in your code:
#include <mutex>
#include <array>
#include <memory>
#include <exception>
#include <thread>
#include <vector>
#include <iostream>

class Circular {
public:
    Circular() {
        size_t index = 0;
        for (auto& i : ints_)
        {
            i = 0;
            unused_.push_back(index++);
        }
    }

    std::shared_ptr<int> get() {
        std::lock_guard<std::mutex> lock_guard(guard_);
        if (unused_.empty())
        {
            throw std::logic_error("OOPSIE: none left");
        }
        size_t index = unused_.back();
        unused_.pop_back();
        return std::shared_ptr<int>(&ints_[index], [this, index](int*) {
            std::lock_guard<std::mutex> lock_guard(guard_);
            unused_.push_back(index);
        });
    }

private:
    std::mutex guard_;
    std::vector<size_t> unused_;
    std::array<int, 2> ints_;
};

Circular circ;

void func() {
    do {
        auto scoped_shared_int_pointer{ circ.get() };
    } while (1);
}

int main() {
    std::thread t1(func), t2(func);
    t1.join(); t2.join();
}
A list of unused indexes is kept; when the shared pointer is destroyed, the custom deleter returns its index to the list of unused indexes, ready to be handed out by the next call to get.

Two questions on std::condition_variables

I have been trying to figure out std::condition_variables and I am particularly confused by wait() and whether to use notify_all or notify_one.
First, I've written some code and attached it below. Here's a short explanation: Collection is a class that holds onto a bunch of Counter objects. These Counter objects have a Counter::increment() method, which needs to be called on all the objects, over and over again. To speed everything up, Collection also maintains a thread pool to distribute the work over, and sends out all the work with its Collection::increment_all() method.
These threads don't need to communicate with each other, and there are usually many more Counter objects than there are threads. It's fine if one thread processes more Counters than the others, as long as all the work gets done. Adding work to the queue is easy and only needs to be done in the "main" thread. As far as I can see, the only bad thing that can happen is if other methods (e.g. Collection::printCounts) are allowed to be called on the counters in the middle of the work being done.
#include <iostream>
#include <thread>
#include <vector>
#include <mutex>
#include <condition_variable>
#include <queue>

class Counter{
private:
    int m_count;
public:
    Counter() : m_count(0) {}
    void increment() {
        m_count ++;
    }
    int getCount() const { return m_count; }
};

class Collection{
public:
    Collection(unsigned num_threads, unsigned num_counters)
        : m_shutdown(false)
    {
        // start workers
        for(size_t i = 0; i < num_threads; ++i){
            m_threads.push_back(std::thread(&Collection::work, this));
        }
        // instantiate counters
        for(size_t j = 0; j < num_counters; ++j){
            m_counters.emplace_back();
        }
    }
    ~Collection()
    {
        m_shutdown = true;
        for(auto& t : m_threads){
            if(t.joinable()){
                t.join();
            }
        }
    }
    void printCounts() {
        // wait for work to be done
        std::unique_lock<std::mutex> lk(m_mtx);
        m_work_complete.wait(lk); // q2: do I need a while loop?
        // print all current counters
        for(const auto& cntr : m_counters){
            std::cout << cntr.getCount() << ", ";
        }
        std::cout << "\n";
    }
    void increment_all()
    {
        std::unique_lock<std::mutex> lock(m_mtx);
        m_work_complete.wait(lock);
        for(size_t i = 0; i < m_counters.size(); ++i){
            m_which_counters_have_work.push(i);
        }
    }
private:
    void work()
    {
        while(!m_shutdown){
            bool action = false;
            unsigned which_counter;
            {
                std::unique_lock<std::mutex> lock(m_mtx);
                if(m_which_counters_have_work.size()){
                    which_counter = m_which_counters_have_work.front();
                    m_which_counters_have_work.pop();
                    action = true;
                }else{
                    m_work_complete.notify_one(); // q1: notify_all?
                }
            }
            if(action){
                m_counters[which_counter].increment();
            }
        }
    }

    std::vector<Counter> m_counters;
    std::vector<std::thread> m_threads;
    std::condition_variable m_work_complete;
    std::mutex m_mtx;
    std::queue<unsigned> m_which_counters_have_work;
    bool m_shutdown;
};

int main() {
    int num_threads = std::thread::hardware_concurrency()-1;
    int num_counters = 10;

    Collection myCollection(num_threads, num_counters);
    myCollection.printCounts();
    myCollection.increment_all();
    myCollection.printCounts();
    myCollection.increment_all();
    myCollection.printCounts();

    return 0;
}
I compile this on Ubuntu 18.04 with g++ -std=c++17 -pthread thread_pool.cpp -o tp && ./tp. I think the code accomplishes all of those objectives, but a few questions remain:
I am using m_work_complete.wait(lk) to make sure the work is finished before I start printing all the new counts. Why do I sometimes see this written inside a while loop, or with a second argument as a lambda predicate function? These docs mention spurious wake ups. If a spurious wake up occurs, does that mean printCounts could prematurely print? If so, I don't want that. I just want to ensure the work queue is empty before I start using the numbers that should be there.
I am using m_work_complete.notify_all instead of m_work_complete.notify_one. I've read this thread, and I don't think it matters--only the main thread is going to be blocked by this. Is it faster to use notify_one just so the other threads don't have to worry about it?
std::condition_variable is not really a condition variable; it's more of a synchronization tool for reaching a certain condition. What that condition is, is up to the programmer, and it should still be checked after each condition_variable wake-up, since the wake-up can be spurious, i.e. happen "too early", before the desired condition is reached.
On POSIX systems, condition_variable::wait() delegates to pthread_cond_wait, which is susceptible to spurious wake-up (see "Condition Wait Semantics" in the Rationale section). On Linux, pthread_cond_wait is in turn implemented via a futex, which is again susceptible to spurious wake-up.
So yes, you still need a flag (protected by the same mutex) or some other way to check that the work is actually complete. A convenient way to do this is to wrap the check in a predicate and pass it to the wait() function, which will loop for you until the predicate is satisfied.
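For illustration, a minimal sketch of the predicate form applied to printCounts() from the question, using "the work queue is empty" as the condition. Note this is an approximation: an empty queue only means all work has been handed out, not that the workers have finished it, so a dedicated "all work done" flag set by the workers (protected by the same mutex) would be more accurate:
void printCounts() {
    std::unique_lock<std::mutex> lk(m_mtx);
    // wait() with a predicate loops internally, so a spurious wake-up just
    // re-checks the condition and goes back to waiting if it isn't met yet
    m_work_complete.wait(lk, [this] { return m_which_counters_have_work.empty(); });
    for (const auto& cntr : m_counters) {
        std::cout << cntr.getCount() << ", ";
    }
    std::cout << "\n";
}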
notify_all unblocks all threads waiting on the condition variable; notify_one unblocks just one (or, to be precise, at least one). If there is more than one waiting thread, the threads are equivalent (i.e. either one can handle the condition fully), and the condition is sufficient to let just one thread continue (as when submitting a single work unit to a thread pool), then notify_one is more efficient, since it won't unblock other threads unnecessarily, only for them to find no work to do and go back to waiting. If you only ever have one waiter, there is no difference between notify_one and notify_all.
It's pretty simple. Use notify_one() when:
There is no reason why more than one thread needs to know about the event (e.g., use notify_one() to announce the availability of an item that a worker thread will "consume", thereby making the item unavailable to the other workers), AND
There is no wrong thread that could be awakened (e.g., you're probably safe if all of the threads are wait()ing on the same line of the same exact function).
Use notify_all() in all other cases.

Elegant assert that function is not called from several threads

I have a function that must not be called from more than one thread at the same time. Can you suggest some elegant assert for this?
You can use a thin RAII wrapper around std::atomic<>:
namespace {
    std::atomic<int> access_counter;

    struct access_checker {
        access_checker() { check = ++access_counter; }
        access_checker( const access_checker & ) = delete;
        ~access_checker() { --access_counter; }

        int check;
    };
}

void foobar()
{
    access_checker checker;
    // assert that checker.check == 1 and react accordingly
    ...
}
This is a simplified, single-use version to show the idea; it can be extended to cover multiple functions if necessary.
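A possible filled-in version of that sketch; the assert message and the <atomic>/<cassert> includes are additions for illustration:
#include <atomic>
#include <cassert>

namespace {
    std::atomic<int> access_counter{0};

    struct access_checker {
        access_checker() { check = ++access_counter; }
        access_checker(const access_checker&) = delete;
        ~access_checker() { --access_counter; }

        int check;
    };
}

void foobar()
{
    access_checker checker;
    assert(checker.check == 1 && "foobar() entered by more than one thread at a time");
    // ... actual work ...
}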
Sounds like you need a mutex. Assuming you are using std::thread, you can look at the coding example at the following link, which uses std::mutex specifically: http://www.cplusplus.com/reference/mutex/mutex/
// mutex example
#include <iostream>       // std::cout
#include <thread>         // std::thread
#include <mutex>          // std::mutex

std::mutex mtx;           // mutex for critical section

void print_block (int n, char c) {
    // critical section (exclusive access to std::cout signaled by locking mtx):
    mtx.lock();
    for (int i=0; i<n; ++i) { std::cout << c; }
    std::cout << '\n';
    mtx.unlock();
}

int main ()
{
    std::thread th1 (print_block,50,'*');
    std::thread th2 (print_block,50,'$');

    th1.join();
    th2.join();

    return 0;
}
In the above code, print_block locks mtx, does what it needs to do, and then unlocks mtx. If print_block is called from two different threads, one thread locks mtx first and the other thread blocks in mtx.lock(), forced to wait until the first thread calls mtx.unlock(). This means only one thread can execute the code between mtx.lock() and mtx.unlock() (exclusive) at the same time.
This assumes that by "at the same time" you mean at literally the same time. If instead you only want one particular thread to be able to call the function, look into std::this_thread::get_id, which gives you the id of the current thread. An assert could then be as simple as storing the owning thread in owning_thread_id and calling assert(owning_thread_id == std::this_thread::get_id()).
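For illustration, one way that could look, assuming "owning" means the first thread that ever calls the function (the function name is made up):
#include <cassert>
#include <thread>

void only_call_me_from_one_thread()
{
    // the first caller becomes the owner; initialization of a local static is thread-safe in C++11
    static const std::thread::id owning_thread_id = std::this_thread::get_id();
    assert(owning_thread_id == std::this_thread::get_id());
    // ... actual work ...
}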

Thread synchronization between data pointed by vectors of std::shared_ptr

I'm pretty new to concurrent programming and I have a specific issue for which I could not find a solution by browsing the internet.
Basically I have this situation (schematic pseudocode):
void fun1(std::vector<std::shared_ptr<SmallObj>>& v) {
    for(int i=0; i<v.size(); i++)
        .. read and write on *v[i] ..
}

void fun2(std::vector<std::shared_ptr<SmallObj>>& w) {
    for(int i=0; i<w.size(); i++)
        .. just read on *w[i] ..
}

int main() {
    std::vector<std::shared_ptr<SmallObj>> tot;
    for(int iter=0; iter<iterMax; iter++) {
        for(int nObj=0; nObj<nObjMax; nObj++)
            .. create a SmallObj in the heap and store a shared_ptr in tot ..

        std::vector<std::shared_ptr<SmallObj>> v, w;
        .. copy elements of "tot" in v and w ..

        fun1(v);
        fun2(w);
    }
    return 0;
}
What I want to do is run fun1 and fun2 concurrently on two spawned threads, but I need to regulate access to the SmallObjs with some locking mechanism. How can I do it? In the literature I can only find examples of using mutexes to lock access to a specific object or a portion of code, but not to the same underlying objects reachable through different containers (in this case v and w).
Thank you very much, and sorry for my ignorance on the matter.
I need to regulate the access to the SmallObjs using some locking mechanism. How can I do it?
Use getters and setters for your data members. Use a std::mutex (or a std::recursive_mutex depending on whether recursive locking is needed) data member to guard the accesses, then always lock with a lock guard.
Example (also see the comments in the code):
class SmallObject{
public:
    int getID() const{
        std::lock_guard<std::mutex> lck(m_mutex);
        return ....;
    }
    void setID(int id){
        std::lock_guard<std::mutex> lck(m_mutex);
        ....;
    }
    MyType calculate() const{
        std::lock_guard<std::mutex> lck(m_mutex);
        //HERE is a GOTCHA if `m_mutex` is a `std::mutex`
        int k = this->getID(); //Ooopsie... Deadlock
        //To do the above, change the declaration of `m_mutex` from
        //std::mutex to std::recursive_mutex
    }
private:
    ..some data
    mutable std::mutex m_mutex;
};
The simplest solution is to hold an std::mutex for the whole vector:
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

void fun1(std::vector<std::shared_ptr<SmallObj>>& v, std::mutex& mtx) {
    for(int i=0; i<v.size(); i++) {
        //Anything you can do before read/write of *v[i]...
        {
            std::lock_guard<std::mutex> guard(mtx);
            //read-write *v[i]
        }
        //Anything you can do after read/write of *v[i]...
    }
}

void fun2(std::vector<std::shared_ptr<SmallObj>>& w, std::mutex& mtx) {
    for(int i=0; i<w.size(); i++) {
        //Anything that can happen before reading *w[i]
        {
            std::lock_guard<std::mutex> guard(mtx);
            //read *w[i]
        }
        //Anything that can happen after reading *w[i]
    }
}

int main() {
    std::mutex mtx;
    std::vector<std::shared_ptr<SmallObj>> tot;
    for(int iter=0; iter<iterMax; iter++) {
        for(int nObj=0; nObj<nObjMax; nObj++)
            .. create a SmallObj in the heap and store a shared_ptr in tot ..

        std::vector<std::shared_ptr<SmallObj>> v, w;
        .. copy elements of "tot" in v and w ..

        std::thread t1([&v,&mtx] { fun1(v,mtx); });
        std::thread t2([&w,&mtx] { fun2(w,mtx); });
        t1.join();
        t2.join();
    }
    return 0;
}
However you will only realistically get any parallelism on the bits done in the before/after blocks in the loop of fun1() and fun2().
You could further increase parallelism by introducing more locks.
For example you can maybe get away with only 2 mutexes which control odd and even elements:
void fun1(std::vector<std::shared_ptr<SmallObj>>& v, std::mutex& mtx0, std::mutex& mtx1) {
    for(size_t i{0}; i<v.size(); ++i) {
        {
            std::lock_guard<std::mutex> guard(i%2==0 ? mtx0 : mtx1);
            //read-write *v[i]
        }
    }
}
With a similar format for fun2().
You might be able to reduce contention by working from opposite ends of the vectors, or by using try_lock and moving on to subsequent elements, 'coming back' to a locked element when it becomes available.
That can matter most when one function's iterations take much longer than the other's and there is some advantage in getting the results from the 'faster' one before the other finishes.
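For illustration, a rough sketch of the try_lock idea applied to the two-mutex version above: std::unique_lock with std::try_to_lock skips an element whose mutex is currently held, and a second pass comes back to the postponed elements:
void fun1(std::vector<std::shared_ptr<SmallObj>>& v, std::mutex& mtx0, std::mutex& mtx1) {
    std::vector<size_t> postponed;
    for (size_t i = 0; i < v.size(); ++i) {
        std::unique_lock<std::mutex> guard(i%2==0 ? mtx0 : mtx1, std::try_to_lock);
        if (!guard.owns_lock()) {     // the other thread holds this lock right now
            postponed.push_back(i);   // come back to it later
            continue;
        }
        //read-write *v[i]
    }
    for (size_t i : postponed) {      // second pass: now wait for the remaining elements
        std::lock_guard<std::mutex> guard(i%2==0 ? mtx0 : mtx1);
        //read-write *v[i]
    }
}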
Alternatives:
It's obviously possible to add an std::mutex to each object.
Whether that works / is necessary will depend on what is actually done in the functions fun1 and fun2 as well as how those mutexes are managed.
If it's necessary to hold a lock before either of the loops starts, there may in fact be no benefit from parallelism, because one of fun1() or fun2() will essentially wait for the other to finish and the two will in effect run in series.
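For illustration, a minimal sketch of the per-object alternative; SmallObj's data member is hypothetical, the only real addition is the mutex inside each object:
struct SmallObj {
    std::mutex mtx;   // protects the data below
    int data{0};      // hypothetical payload
};

void fun1(std::vector<std::shared_ptr<SmallObj>>& v) {
    for (auto& p : v) {
        std::lock_guard<std::mutex> guard(p->mtx);
        ++p->data;                    // read and write on *v[i]
    }
}

void fun2(std::vector<std::shared_ptr<SmallObj>>& w) {
    for (auto& p : w) {
        std::lock_guard<std::mutex> guard(p->mtx);
        int x = p->data;              // just read on *w[i]
        (void)x;
    }
}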

Am I using this deque in a thread safe manner?

I'm trying to understand multithreading in C++. In the following bit of code, will the deque tempData declared in retrieve() always have every element processed once and only once, or could there be multiple copies of tempData across multiple threads with stale data, causing some elements to be processed multiple times? I'm not sure whether passing by reference actually ensures there is only one copy in this case.
static mutex m;

void AudioAnalyzer::analysisThread(deque<shared_ptr<AudioAnalysis>>& aq)
{
    while (true)
    {
        m.lock();
        if (aq.empty())
        {
            m.unlock();
            break;
        }
        auto aa = aq.front();
        aq.pop_front();
        m.unlock();

        if (false) //testing
        {
            retrieveFromDb(aa);
        }
        else
        {
            analyzeAudio(aa);
        }
    }
}

void AudioAnalyzer::retrieve()
{
    deque<shared_ptr<AudioAnalysis>> tempData(data);
    vector<future<void>> futures;
    for (int i = 0; i < NUM_THREADS; ++i)
    {
        futures.push_back(async(bind(&AudioAnalyzer::analysisThread, this, _1), ref(tempData)));
    }
    for (auto& f : futures)
    {
        f.get();
    }
}
Looks OK to me.
Threads share memory, and since the reference to tempData effectively arrives in each thread as a pointer, every thread sees exactly the same pointer value and the same single copy of tempData. (You can check that if you like with a bit of global code or some logging.)
The mutex then ensures single-threaded access, at least within the analysis threads.
One problem: somewhere there must be a push onto the deque, and that may need to be protected by the mutex as well. (Obviously the push_back onto the futures vector is just local.)
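For illustration, a sketch of what such a locked push could look like; enqueue() is a hypothetical helper (not in the original code) for adding work while analysis threads may already be running, guarded by the same static mutex m:
void AudioAnalyzer::enqueue(deque<shared_ptr<AudioAnalysis>>& aq, shared_ptr<AudioAnalysis> item)
{
    // take the same mutex that analysisThread() holds around front()/pop_front()
    lock_guard<mutex> guard(m);
    aq.push_back(std::move(item));
}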