Thread safety in std::map of std::shared_ptr - c++

I know there are a lot of similar questions with answers around, but since I still don't understand this particular case, I decided to pose a question.
What I have is a map of shared_ptrs to a dynamically allocated array (MyVector). What I want is limited concurrent access without the need to lock. I know that the map per se is not thread safe, but I always thought what I'm doing here should be ok, which is:
I fill the map in a single-threaded environment like this:
typedef shared_ptr<MyVector<float>> MyVectorPtr;
for (int i = 0; i < numElements; i++)
{
content[i] = MyVectorPtr(new MyVector<float>(numRows));
}
After the initialization, I have one thread that reads from the elements and one that replaces what the shared_ptrs point to.
Thread 1:
for(auto i=content.begin();i!=content.end();i++)
{
MyVectorPtr p(i->second);
if (p)
{
memory_use+=sizeof(int) + sizeof(float) * p->number;
}
}
Thread 2:
for (auto itr=content.begin();content.end()!=itr;++itr)
{
itr->second.reset(new MyVector<float>(numRows));
}
After a while I get either a segfault or a double free in one of the two threads. That is not entirely surprising, but I still don't understand why.
The reasons I thought this would work are:
I don't add or remove any items of the map in the multi-threaded environment, so the iterators should always point to something valid.
I thought concurrently changing a single element of the map is fine as long as the operation is atomic.
I thought the operations I do on the shared_ptr (increment ref count, decrement ref count in Thread 1, reset in Thread 2) are atomic (see SO Question).
Obviously, either one or more of my assumptions are wrong, or I'm not doing what I think I am. I suspect that reset is actually not thread safe; would std::atomic_exchange help?
Can someone enlighten me? Thanks a lot!
If someone wants to try out, here is the full code example:
#include <stdio.h>
#include <iostream>
#include <string>
#include <map>
#include <memory> // shared_ptr
#include <unistd.h>
#include <pthread.h>
using namespace std;
template<class T>
class MyVector
{
public:
MyVector(int length)
: number(length)
, array(new T[length])
{
}
~MyVector()
{
if (array != NULL)
{
delete[] array;
}
array = NULL;
}
int number;
private:
T* array;
};
typedef shared_ptr<MyVector<float>> MyVectorPtr;
static map<int,MyVectorPtr> content;
const int numRows = 1000;
const int numElements = 10;
//pthread_mutex_t write_lock;
double get_cache_size_in_megabyte()
{
double memory_use=0;
//BlockingLockGuard guard(write_lock);
for(auto i=content.begin();i!=content.end();i++)
{
MyVectorPtr p(i->second);
if (p)
{
memory_use+=sizeof(int) + sizeof(float) * p->number;
}
}
return memory_use/(1024.0*1024.0);
}
void* write_content(void*)
{
while(true)
{
//BlockingLockGuard guard(write_lock);
for (auto itr=content.begin();content.end()!=itr;++itr)
{
itr->second.reset(new MyVector<float>(numRows));
cout << "one new written" <<endl;
}
}
return NULL;
}
void* loop_size_checker(void*)
{
while (true)
{
cout << get_cache_size_in_megabyte() << endl;
}
return NULL;
}
int main(int argc, const char* argv[])
{
for (int i = 0; i < numElements; i++)
{
content[i] = MyVectorPtr(new MyVector<float>(numRows));
}
pthread_attr_t attr;
pthread_attr_init(&attr) ;
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
pthread_t *grid_proc3 = new pthread_t;
pthread_create(grid_proc3, &attr, &loop_size_checker,NULL);
pthread_t *grid_proc = new pthread_t;
pthread_create(grid_proc, &attr, &write_content,(void*)NULL);
// to keep alive and avoid content being deleted
sleep(10000);
}

I thought concurrently changing a single element of the map is fine as long as the operation is atomic.
Changing an element in a map is not atomic unless the element has an atomic type such as std::atomic.
I thought the operations I do on the shared_ptr (increment ref count, decrement ref count in Thread 1, reset in Thread 2) are atomic.
That is correct. Unfortunately you are also changing the underlying pointer. That pointer is not atomic. Since it is not atomic you need synchronization.
One thing you can do though is use the atomic free functions that are introduced with std::shared_ptr. This will let you avoid having to use a mutex.
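As a rough sketch of that suggestion, the reader can take a snapshot with std::atomic_load and the writer can publish with std::atomic_store. The alias IntPtr and the helpers snapshot and publish are made up for illustration, with int standing in for MyVector<float>, and the sketch assumes that no keys are inserted or erased while threads run. Note that these free functions are deprecated since C++20 in favour of std::atomic<std::shared_ptr<T>>.

```cpp
#include <cassert>
#include <map>
#include <memory>

using IntPtr = std::shared_ptr<int>;  // hypothetical stand-in for MyVectorPtr

// Reader side: atomically copy the shared_ptr stored under `key`.
IntPtr snapshot(const std::map<int, IntPtr>& content, int key) {
    auto it = content.find(key);
    assert(it != content.end() && "keys are fixed after initialisation");
    return std::atomic_load(&it->second);
}

// Writer side: atomically replace the shared_ptr stored under `key`.
void publish(std::map<int, IntPtr>& content, int key, int value) {
    auto it = content.find(key);
    assert(it != content.end() && "keys are fixed after initialisation");
    std::atomic_store(&it->second, std::make_shared<int>(value));
}
```

Every access to a given slot has to go through these functions; mixing them with a plain copy or reset() of the same shared_ptr instance is still a data race.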

Let's expand MyVectorPtr p(i->second);, which is running on thread 1:
The constructor called for this is the copy constructor:
shared_ptr( const shared_ptr& r ) noexcept;
In typical implementations this boils down to copying the stored object pointer and the control-block pointer, followed by an atomic increment of the reference count.
It may very well happen that thread 2 resets the shared_ptr in the map, and thereby deletes the old object, while thread 1 is still copying it into p. The underlying pointer stored inside shared_ptr is not atomic.
Thus, your usage of std::shared_ptr is not thread safe. It is thread safe only as long as no thread modifies a shared_ptr instance that another thread is reading.

TL;DR;
Modifying a std::map from several threads isn't thread safe, whereas creating additional std::shared_ptr references to the same object is.
You should protect read/write access to your map with an appropriate synchronization mechanism, such as a std::mutex.
Also if the state of an instance referenced by the std::shared_ptr should change, it needs to be protected against data races if it's accessed from concurrent threads.
BTW, the MyVector you are showing is a way too naive implementation.
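The mutex-based advice above could look roughly like the following sketch. GuardedMap and its members are hypothetical names, and int stands in for the real payload. Readers copy the shared_ptr while holding the lock and then work on their private copy; the writer allocates before taking the lock and lets the old object die after releasing it.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>

class GuardedMap {  // hypothetical wrapper, not part of the question's code
public:
    // Replace the object stored under `key` (inserting the key if needed).
    void replace(int key, int value) {
        auto fresh = std::make_shared<int>(value);  // allocate outside the lock
        std::lock_guard<std::mutex> lock(mutex_);
        slots_[key].swap(fresh);
        // `lock` is destroyed before `fresh`, so the old object is freed unlocked.
    }

    // Return a copy of the pointer stored under `key`, or an empty pointer.
    std::shared_ptr<int> get(int key) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = slots_.find(key);
        if (it == slots_.end()) return nullptr;
        return it->second;
    }

private:
    mutable std::mutex mutex_;
    std::map<int, std::shared_ptr<int>> slots_;
};
```

A pointer obtained from get() stays valid even if another thread calls replace() afterwards, because the copy keeps the old object alive.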

Related

Woes with std::shared_ptr<T>.use_counter()

https://en.cppreference.com/w/cpp/memory/shared_ptr/use_count states:
In multithreaded environment, the value returned by use_count is approximate (typical implementations use a memory_order_relaxed load)
But does this mean that use_count() is totally useless in a multi-threaded environment?
Consider the following example, where the Circular class implements a circular buffer of std::shared_ptr<int>.
One method is supplied to users - get(), which checks whether the reference count of the next element in the std::array<std::shared_ptr<int>> is greater than 1 (which we don't want, since it means that it's being held by a user which previously called get()).
If it's <= 1, a copy of the std::shared_ptr<int> is returned to the user.
In this case, the users are two threads which do nothing at all except love to call get() on the circular buffer - that's their purpose in life.
What happens in practice when I execute the program is that it runs for a few cycles (tested by adding a counter to the circular buffer class), after which it throws the exception, complaining that the reference counter for the next element is > 1.
Is this a result of the statement that the value returned by use_count() is approximate in a multi-threaded environment?
Is it possible to adjust the underlying mechanism to make it, uh, deterministic and behave as I would have liked it to behave?
If my thinking is correct - use_count() (or rather the real number of users) of the next element should never EVER increase above 1 when inside the get() function of Circular, since there are only two consumers, and every time a thread calls get(), it's already released its old (copied) std::shared_ptr<int> (which in turn means that the remaining std::shared_ptr<int> residing in Circular::ints_ should have a reference count of only 1).
#include <mutex>
#include <array>
#include <memory>
#include <stdexcept> // std::logic_error
#include <string> // std::string, std::to_string
#include <thread>
class Circular {
public:
Circular() {
for (auto& i : ints_) { i = std::make_shared<int>(0); }
}
std::shared_ptr<int> get() {
std::lock_guard<std::mutex> lock_guard(guard_);
index_ = index_ % 2; // Re-set the index pointer.
if (ints_.at(index_).use_count() > 1) {
// This shouldn't happen - right? (but it does)
std::string excp = std::string("OOPSIE: ") + std::to_string(index_) + " " + std::to_string(ints_.at(index_).use_count());
throw std::logic_error(excp);
}
return ints_.at(index_++);
}
private:
std::mutex guard_;
unsigned int index_{0};
std::array<std::shared_ptr<int>, 2> ints_;
};
Circular circ;
void func() {
do {
auto scoped_shared_int_pointer{circ.get()};
}while(1);
}
int main() {
std::thread t1(func), t2(func);
t1.join(); t2.join();
}
While use_count is fraught with problems, the core issue right now is outside of that logic.
Assume thread t1 takes the shared_ptr at index 0, and then t2 runs its loop twice before t1 finishes its first loop iteration. t2 will obtain the shared_ptr at index 1, release it, and then attempt to acquire the shared_ptr at index 0, and will hit your failure condition, since t1 is just running behind.
Now, that said, in a broader context it's not particularly safe: if a user holds a weak_ptr and calls lock() on it, the use_count can go from 1 to 2 without passing through this function. In this simple example, it would work to have get() loop through the array until it finds a free shared pointer.
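A minimal sketch of that "loop until a free slot is found" idea follows. ScanningPool and next_ are hypothetical names, and returning an empty pointer when every slot is taken is an assumption made here instead of throwing. It still relies on use_count(), so the weak_ptr caveat applies, and because users release their copies outside the mutex, a slot may briefly look busy after it has been released.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>

class ScanningPool {  // hypothetical variant of Circular
public:
    ScanningPool() {
        for (auto& slot : slots_) slot = std::make_shared<int>(0);
    }

    // Returns a slot that only the pool holds, or an empty pointer if all are busy.
    std::shared_ptr<int> get() {
        std::lock_guard<std::mutex> lock(guard_);
        for (std::size_t step = 0; step < slots_.size(); ++step) {
            std::size_t index = (next_ + step) % slots_.size();
            if (slots_[index].use_count() == 1) {  // only the pool owns it
                next_ = (index + 1) % slots_.size();
                return slots_[index];
            }
        }
        return nullptr;
    }

private:
    std::mutex guard_;
    std::size_t next_{0};
    std::array<std::shared_ptr<int>, 2> slots_;
};
```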
use_count is for debugging only and shouldn't be used. If you want to know when nobody else has a reference to a pointer any more just let the shared pointer die and use a custom deleter to detect that and do whatever you need to do with the now unused pointer.
This is an example of how you might implement this in your code:
#include <mutex>
#include <array>
#include <memory>
#include <stdexcept> // std::logic_error
#include <thread>
#include <vector>
#include <iostream>
class Circular {
public:
Circular() {
size_t index = 0;
for (auto& i : ints_)
{
i = 0;
unused_.push_back(index++);
}
}
std::shared_ptr<int> get() {
std::lock_guard<std::mutex> lock_guard(guard_);
if (unused_.empty())
{
throw std::logic_error("OOPSIE: none left");
}
size_t index = unused_.back();
unused_.pop_back();
return std::shared_ptr<int>(&ints_[index], [this, index](int*) {
std::lock_guard<std::mutex> lock_guard(guard_);
unused_.push_back(index);
});
}
private:
std::mutex guard_;
std::vector<size_t> unused_;
std::array<int, 2> ints_;
};
Circular circ;
void func() {
do {
auto scoped_shared_int_pointer{ circ.get() };
} while (1);
}
int main() {
std::thread t1(func), t2(func);
t1.join(); t2.join();
}
A list of unused indexes is kept. When the shared pointer is destroyed, the custom deleter returns the index to that list, ready to be used in the next call to get.

About shared_mutex and shared_ptr across multiple threads

I implemented code in which multiple instances running on different threads read other instances' data using a reader-writer lock and shared_ptr. It seems fine, but I am not 100% sure about that, and I have some questions about their usage.
Detail
I have multiple instances of a class called Chunk and each instance does some calculations in a dedicated thread. A chunk needs to read neighbour chunks' data as well as its own data, but it doesn't write neighbours' data, so a reader-writer lock is used. Also, neighbours can be set at runtime. For example, I might want to set a different neighbour chunk at runtime, sometimes just nullptr. It is possible to delete a chunk at runtime, too. Raw pointers can be used, but I thought shared_ptr and weak_ptr are better for this, in order to keep track of the lifetime: own data in a shared_ptr and neighbours' data in a weak_ptr.
I provided a simpler version of my code below. ChunkData has data and a mutex for it. I use InitData for data initialization, and the DoWork function is called in a dedicated thread after that. Other functions can be called from the main thread.
This seems to work, but I am not so confident. Especially, about use of shared_ptr across multiple threads.
What happens if a thread calls shared_ptr's reset() (in ctor and InitData) and other uses it with weak_ptr's lock (in DoWork)? Does this need a lock dataMutex or chunkMutex?
How about copy(in SetNeighbour)? Do I need locks for this as well?
I think other parts are ok, but please let me know if you find anything dangerous. Appreciate that.
By the way, I considered storing a shared_ptr of Chunk instead of ChunkData, but decided against it because internal code, which I don't manage, has a GC system that can delete a pointer to Chunk when I don't expect it.
class Chunk
{
public:
class ChunkData
{
public:
shared_mutex dataMutex; // mutex to read/write data
int* data;
int size;
ChunkData() : data(nullptr), size(0) { }
~ChunkData()
{
if (data)
{
delete[] data;
data = nullptr;
}
}
};
private:
mutex chunkMutex; // mutex to read/write member variables
shared_ptr<ChunkData> chunkData;
weak_ptr<ChunkData> neighbourChunkData;
string result;
string name; // printed in InitData
public:
Chunk(string _name)
: chunkData(make_shared<ChunkData>())
, name(_name)
{
}
~Chunk()
{
EndProcess();
unique_lock lock(chunkMutex); // is this needed?
chunkData.reset();
}
void InitData(int size)
{
ChunkData* NewData = new ChunkData();
NewData->size = size;
NewData->data = new int[size];
{
unique_lock lock(chunkMutex); // is this needed?
chunkData.reset(NewData);
cout << "init chunk " << name << endl;
}
}
// This is executed in other thread. e.g. thread t(&Chunk::DoWork, this);
void DoWork()
{
lock_guard lock(chunkMutex); // we modify some members such as result(string) reading chunk data, so need this.
if (chunkData)
{
shared_lock readLock(chunkData->dataMutex);
if (chunkData->data)
{
// read chunkData->data[i] and modify some members such as result(string)
for (int i = 0; i < chunkData->size; ++i)
{
// Is this fine, or should I write data result outside of readLock scope?
result += to_string(chunkData->data[i]) + " ";
}
}
}
// does this work?
if (shared_ptr<ChunkData> neighbour = neighbourChunkData.lock())
{
shared_lock readLock(neighbour->dataMutex);
if (neighbour->data)
{
// read neighbour->data[i] and modify some members as above
}
}
}
shared_ptr<ChunkData> GetChunkData()
{
unique_lock lock(chunkMutex);
return chunkData;
}
void SetNeighbour(Chunk* neighbourChunk)
{
if (neighbourChunk)
{
// safe?
shared_ptr<ChunkData> newNeighbourData = neighbourChunk->GetChunkData();
unique_lock lock(chunkMutex); // lock for chunk properties
{
shared_lock readLock(newNeighbourData->dataMutex); // not sure if this is needed.
neighbourChunkData = newNeighbourData;
}
}
}
int GetDataAt(int index)
{
shared_lock readLock(chunkData->dataMutex);
if (chunkData->data && 0 <= index && index < chunkData->size)
{
return chunkData->data[index];
}
return 0;
}
void SetDataAt(int index, int element)
{
unique_lock writeLock(chunkData->dataMutex);
if (chunkData->data && 0 <= index && index < chunkData->size)
{
chunkData->data[index] = element;
}
}
};
Edit 1
I added more detail for DoWork function. Chunk data is read and chunk's member variables are edited in the function.
After Homer512's answer, I came up with other questions.
A) In the DoWork function I write a member variable inside a read lock. Should I only read data in a read lock scope, and if I need to modify other data based on the data read, do I have to do that outside of the read lock? For example, copy the whole array to a local variable in a read lock, and modify other members outside of the read lock using the local.
B) I followed Homer512 and modified GetDataAt/SetDataAt as below. I read/write lock chunkData->dataMutex before unlocking chunkMutex. I also do this in the DoWork function. Should I instead take the locks separately? For example, make a local shared_ptr variable and assign chunkData to it under a chunkMutex lock, unlock it, then lastly read/write lock that local variable's dataMutex and read/write data.
int GetDataAt(int index)
{
lock_guard chunkLock(chunkMutex);
shared_lock readLock(chunkData->dataMutex);
if (chunkData->data && 0 <= index && index < chunkData->size)
{
return chunkData->data[index];
}
return 0;
}
void SetDataAt(int index, int element)
{
lock_guard chunkLock(chunkMutex);
unique_lock writeLock(chunkData->dataMutex);
if (chunkData->data && 0 <= index && index < chunkData->size)
{
chunkData->data[index] = element;
}
}
I have several remarks:
~ChunkData: You could change your data member from int* to unique_ptr<int[]> to get the same result without an explicit destructor. Your code is correct though, just less convenient.
~Chunk: I don't think you need a lock or call the reset method. By the time the destructor runs, by definition, no one should have a reference to the Chunk object. So the lock can never be contested. And reset is unnecessary because the shared_ptr destructor will handle that.
InitData: Yes, the lock is needed because InitData can race with DoWork. You could avoid this by moving InitData to the constructor but I assume there are reasons for this division. You could also change the shared_ptr to std::atomic<std::shared_ptr<ChunkData> > to avoid the lock.
It is more efficient to write InitData like this:
void InitData(int size)
{
std::shared_ptr<ChunkData> NewData = std::make_shared<ChunkData>();
NewData->size = size;
NewData->data = new int[size]; // or std::make_unique<int[]>(size)
{
std::lock_guard<std::mutex> lock(chunkMutex);
chunkData.swap(NewData);
}
// deletes old chunkData outside locked region if it was initialized before
}
make_shared avoids an additional memory allocation for the reference counter. This also moves all allocations and deallocations out of the critical section.
DoWork: Your comment says "read chunkData->data[i] and modify some members". You only take a shared_lock but say that you modify members. Well, which is it, reading or writing? Or do you mean to say that you modify Chunk but not ChunkData, with Chunk being protected by its own mutex?
SetNeighbour: You need to lock both your own chunkMutex and the neighbour's. You should not lock both at the same time, to avoid the dining philosophers problem (though std::lock solves this).
void SetNeighbour(Chunk* neighbourChunk)
{
if(! neighbourChunk)
return;
std::shared_ptr<ChunkData> newNeighbourData;
{
std::lock_guard<std::mutex> lock(neighbourChunk->chunkMutex);
newNeighbourData = neighbourChunk->chunkData;
}
std::lock_guard<std::mutex> lock(this->chunkMutex);
this->neighbourChunkData = newNeighbourData;
}
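For cases where both mutexes really must be held at once, the std::lock remark above can be sketched as follows. Node and swapValues are hypothetical stand-ins, not part of the Chunk code. std::lock acquires both mutexes with a deadlock-avoidance algorithm, and the lock_guard objects adopt them so they are released on scope exit.

```cpp
#include <cassert>
#include <mutex>
#include <utility>

struct Node {  // hypothetical stand-in for Chunk
    std::mutex mutex;
    int value = 0;
};

// Swap the values of two distinct nodes while holding both mutexes.
void swapValues(Node& a, Node& b) {
    assert(&a != &b);             // locking one std::mutex twice is undefined
    std::lock(a.mutex, b.mutex);  // acquires both without risking deadlock
    std::lock_guard<std::mutex> lockA(a.mutex, std::adopt_lock);
    std::lock_guard<std::mutex> lockB(b.mutex, std::adopt_lock);
    std::swap(a.value, b.value);
}
```

Two threads calling swapValues(x, y) and swapValues(y, x) concurrently cannot deadlock, which is the property that plain nested lock_guards lack.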
GetDataAt and SetDataAt: You need to lock chunkMutex. Otherwise you might race with InitData. There is no need to use std::lock because the order of locks is never swapped around.
EDIT 1:
DoWork: The line if (shared_ptr<ChunkData> neighbour = neighbourChunkData.lock()) keeps the neighbour alive only for the duration of the if statement. Move the variable declaration out of the if when you need the reference beyond it.
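To make the lifetime concrete: a shared_ptr declared in an if condition owns the object for the duration of that if statement and releases it afterwards. The helper ownersWhileLocked below is a hypothetical illustration; it is meant for single-threaded inspection only, since use_count() is approximate under concurrency.

```cpp
#include <cassert>
#include <memory>

// Returns the owner count seen inside the if body, or 0 if the target is gone.
long ownersWhileLocked(const std::weak_ptr<int>& weak) {
    if (std::shared_ptr<int> strong = weak.lock()) {
        // `strong` is an extra owner until the end of the if statement.
        return strong.use_count();
    }
    return 0;
}
```

With one outside owner the function reports two owners inside the if body, and the outside owner is the only one again once the call returns.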
EDIT: Alternative design proposal
What I'm bothered with is that your DoWork may be unable to proceed if InitData is still running or waiting to run. How do you want to deal with this? I suggest you make it possible to wait until the work can be done. Something like this:
class Chunk
{
std::mutex chunkMutex;
std::shared_ptr<ChunkData> chunkData;
std::weak_ptr<ChunkData> neighbourChunkData;
std::condition_variable chunkSet;
void waitForChunk(std::unique_lock<std::mutex>& lock)
{
while(! chunkData)
chunkSet.wait(lock);
}
public:
// modified version of my code above
void InitData(int size)
{
std::shared_ptr<ChunkData> NewData = std::make_shared<ChunkData>();
NewData->size = size;
NewData->data = new int[size]; // or std::make_unique<int[]>(size)
{
std::lock_guard<std::mutex> lock(chunkMutex);
chunkData.swap(NewData);
}
chunkSet.notify_all();
}
void DoWork()
{
std::unique_lock<std::mutex> ownLock(chunkMutex);
waitForChunk(ownLock); // blocks until other thread finishes InitData
{
shared_lock readLock(chunkData->dataMutex);
...
}
shared_ptr<ChunkData> neighbour = neighbourChunkData.lock();
if(! neighbour)
return;
shared_lock readLock(neighbour->dataMutex);
...
}
void SetNeighbour(Chunk* neighbourChunk)
{
if(! neighbourChunk)
return;
shared_ptr<ChunkData> newNeighbourData;
{
std::unique_lock<std::mutex> lock(neighbourChunk->chunkMutex);
neighbourChunk->waitForChunk(lock); // wait until neighbor has finished InitData
newNeighbourData = neighbourChunk->chunkData;
}
std::lock_guard<std::mutex> ownLock(this->chunkMutex);
this->neighbourChunkData = std::move(newNeighbourData);
}
};
The downside to this is that you could deadlock if InitData is never called or if it failed with an exception. There are ways around this, like using an std::shared_future which knows that it is valid (set when InitData is scheduled) and whether it failed (records exception of associated promise or packaged_task).
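A minimal sketch of the std::shared_future idea, under the assumption that initialization either publishes the data once or reports failure once. ChunkSlot, DataPtr, publish, fail and wait are hypothetical names, and a std::vector<int> stands in for ChunkData.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <future>
#include <memory>
#include <stdexcept>
#include <vector>

using DataPtr = std::shared_ptr<const std::vector<int>>;  // stand-in for ChunkData

class ChunkSlot {  // hypothetical
public:
    ChunkSlot() : ready_(promise_.get_future().share()) {}

    // Called once by the initialising thread on success.
    void publish(std::size_t size) {
        promise_.set_value(std::make_shared<std::vector<int>>(size, 0));
    }

    // Called once instead of publish() when initialisation fails.
    void fail(const char* reason) {
        promise_.set_exception(
            std::make_exception_ptr(std::runtime_error(reason)));
    }

    // Blocks until publish() or fail(); rethrows the stored exception on failure.
    DataPtr wait() const { return ready_.get(); }

private:
    std::promise<DataPtr> promise_;      // declared first: initialised first
    std::shared_future<DataPtr> ready_;
};
```

Waiters no longer hang forever when initialization fails, because the stored exception wakes them. A slot whose publish() or fail() is never called would still block, so a timed wait (wait_for on the future) may be worth considering.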

Read-write thread-safe smart pointer in C++, x86-64

I am developing a lock-free data structure and the following problem arises.
I have writer thread that creates objects on heap and wraps them in smart pointer with reference counter. I also have a lot of reader threads, that work with these objects. Code can look like this:
SmartPtr ptr;
class Reader : public Thread {
virtual void Run() {
for (;;) {
SmartPtr local(ptr);
// do smth
}
}
};
class Writer : public Thread {
virtual void Run() {
for (;;) {
SmartPtr newPtr(new Object);
ptr = newPtr;
}
}
};
int main() {
Pool* pool = SystemThreadPool();
pool->Run(new Reader());
pool->Run(new Writer());
for (;;) {} // wait for crash :(
}
When I create a thread-local copy of ptr, at least two steps happen:
Read an address.
Increment reference counter.
I can't do these two operations atomically and thus sometimes my readers work with deleted object.
The question is - what kind of smart pointer should I use to make read-write access from several threads with correct memory management possible? Solution should exist, since Java programmers don't even care about such a problem, simply relying on that all objects are references and are deleted only when nobody uses them.
For PowerPC I found http://drdobbs.com/184401888, looks nice, but uses Load-Linked and Store-Conditional instructions, that we don't have in x86.
As far as I understand, boost pointers provide such functionality only using locks. I need a lock-free solution.
boost::shared_ptr has atomic_store, which uses a "lock-free" spinlock that should be fast enough for 99% of possible cases.
boost::shared_ptr<Object> ptr;
class Reader : public Thread {
virtual void Run() {
for (;;) {
boost::shared_ptr<Object> local(boost::atomic_load(&ptr));
// do smth
}
}
};
class Writer : public Thread {
virtual void Run() {
for (;;) {
boost::shared_ptr<Object> newPtr(new Object);
boost::atomic_store(&ptr, newPtr);
}
}
};
int main() {
Pool* pool = SystemThreadPool();
pool->Run(new Reader());
pool->Run(new Writer());
for (;;) {}
}
EDIT:
In response to comment below, the implementation is in "boost/shared_ptr.hpp"...
template<class T> void atomic_store( shared_ptr<T> * p, shared_ptr<T> r )
{
boost::detail::spinlock_pool<2>::scoped_lock lock( p );
p->swap( r );
}
template<class T> shared_ptr<T> atomic_exchange( shared_ptr<T> * p, shared_ptr<T> r )
{
boost::detail::spinlock & sp = boost::detail::spinlock_pool<2>::spinlock_for( p );
sp.lock();
p->swap( r );
sp.unlock();
return r; // return std::move( r )
}
With some jiggery-pokery you should be able to accomplish this using InterlockedCompareExchange128. Store the reference count and pointer in a 2 element __int64 array. If reference count is in array[0] and pointer in array[1] the atomic update would look like this:
while(true)
{
__int64 comparand[2];
comparand[0] = refCount;
comparand[1] = pointer;
if(1 == InterlockedCompareExchange128(
array,
pointer,
refCount + 1,
comparand))
{
// Pointer is ready for use. Exit the while loop.
break;
}
}
If an InterlockedCompareExchange128 intrinsic function isn't available for your compiler then you may use the underlying CMPXCHG16B instruction instead, if you don't mind mucking around in assembly language.
The solution proposed by RobH doesn't work. It has the same problem as the original question: when accessing the reference count object, it might already have been deleted.
The only way I see of solving the problem without a global lock (as in boost::atomic_store) or conditional read/write instructions is to somehow delay the destruction of the object (or the shared reference count object if such thing is used). So zennehoy has a good idea but his method is too unsafe.
The way I might do it is by keeping copies of all the pointers in the writer thread so that the writer can control the destruction of the objects:
class Writer : public Thread {
virtual void Run() {
list<SmartPtr> ptrs; //list that holds all the old ptr values
for (;;) {
SmartPtr newPtr(new Object);
if(ptr)
ptrs.push_back(ptr); //push previous pointer into the list
ptr = newPtr;
//Periodically go through the list and destroy objects that are not
//referenced by other threads
for(auto it=ptrs.begin(); it!=ptrs.end(); )
if(it->refCount()==1)
it = ptrs.erase(it);
else
++it;
}
}
};
However there are still requirements for the smart pointer class. This doesn't work with shared_ptr as the reads and writes are not atomic. It almost works with boost::intrusive_ptr. The assignment on intrusive_ptr is implemented like this (pseudocode):
//create temporary from rhs
tmp.ptr = rhs.ptr;
if(tmp.ptr)
intrusive_ptr_add_ref(tmp.ptr);
//swap(tmp,lhs)
T* x = lhs.ptr;
lhs.ptr = tmp.ptr;
tmp.ptr = x;
//destroy temporary
if(tmp.ptr)
intrusive_ptr_release(tmp.ptr);
As far as I understand the only thing missing here is a compiler level memory fence before lhs.ptr = tmp.ptr;. With that added, both reading rhs and writing lhs would be thread-safe under strict conditions: 1) x86 or x64 architecture 2) atomic reference counting 3) rhs refcount must not go to zero during the assignment (guaranteed by the Writer code above) 4) only one thread writing to lhs (using CAS you could have several writers).
Anyway, you could create your own smart pointer class based on intrusive_ptr with necessary changes. Definitely easier than re-implementing shared_ptr. And besides, if you want performance, intrusive is the way to go.
The reason this works much more easily in Java is garbage collection. In C++, you have to manually ensure that a value is not just starting to be used by a different thread when you want to delete it.
A solution I've used in a similar situation is to simply delay the deletion of the value. I create a separate thread that iterates through a list of things to be deleted. When I want to delete something, I add it to this list with a timestamp. The deleting thread waits until some fixed time after this timestamp before actually deleting the value. You just have to make sure that the delay is large enough to guarantee that any temporary use of the value has completed.
100 milliseconds would have been enough in my case, I chose a few seconds to be safe.
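The delayed-deletion scheme described above might be sketched like this. DeferredDeleter, retire and collect are hypothetical names, and holding the values as std::shared_ptr<void> is an assumption of the sketch rather than what the answer's code did. The `now` parameters default to the real clock and exist so the behaviour can be exercised without sleeping.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>
#include <utility>

class DeferredDeleter {  // hypothetical
public:
    using Clock = std::chrono::steady_clock;

    explicit DeferredDeleter(Clock::duration delay) : delay_(delay) {}

    // Hand over an object; it is kept alive until at least `now + delay`.
    void retire(std::shared_ptr<void> object,
                Clock::time_point now = Clock::now()) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back({now + delay_, std::move(object)});
    }

    // Called periodically by the deleting thread; returns how many entries
    // were released. Assumes retire() is called with non-decreasing times.
    std::size_t collect(Clock::time_point now = Clock::now()) {
        std::deque<Entry> expired;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            while (!pending_.empty() && pending_.front().due <= now) {
                expired.push_back(std::move(pending_.front()));
                pending_.pop_front();
            }
        }
        return expired.size();  // `expired` is destroyed outside the lock
    }

private:
    struct Entry {
        Clock::time_point due;
        std::shared_ptr<void> object;
    };
    Clock::duration delay_;
    std::mutex mutex_;
    std::deque<Entry> pending_;
};
```

As in the answer, correctness rests entirely on the delay being longer than any temporary use of a retired value; nothing in the sketch verifies that.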

What are the problems with this producer/consumer implementation?

So I'm looking at using a simple producer/consumer queue in C++. I'll end up using boost for threading but this example is just using pthreads. I'll also end up using a far more OO approach, but I think that would obscure the details I'm interested in at the moment.
Anyway the particular issues I'm worried about are
Since this code is using push_back and pop_front of std::deque - it's probably doing allocation and deallocation of the underlying data in different threads - I believe this is bad (undefined behaviour) - what's the easiest way to avoid this?
Nothing is marked volatile. But the important bits are mutex protected. Do I need to mark anything as volatile and if so what? - I don't think I do as I believe the mutex contains appropriate memory barriers etc., but I'm unsure.
Are there any other glaring issues?
Anyway, here's the code:
#include <pthread.h>
#include <deque>
#include <iostream>
struct Data
{
std::deque<int> * q;
pthread_mutex_t * mutex;
};
void* producer( void* arg )
{
std::deque<int> &q = *(static_cast<Data*>(arg)->q);
pthread_mutex_t * m = (static_cast<Data*>(arg)->mutex);
for(unsigned int i=0; i<100; ++i)
{
pthread_mutex_lock( m );
q.push_back( i );
std::cout<<"Producing "<<i<<std::endl;
pthread_mutex_unlock( m );
}
return NULL;
}
void* consumer( void * arg )
{
std::deque<int> &q = *(static_cast<Data*>(arg)->q);
pthread_mutex_t * m = (static_cast<Data*>(arg)->mutex);
for(unsigned int i=0; i<100; ++i)
{
pthread_mutex_lock( m );
int v = q.front();
q.pop_front();
std::cout<<"Consuming "<<v<<std::endl;
pthread_mutex_unlock( m );
}
return NULL;
}
int main()
{
Data d;
std::deque<int> q;
d.q = &q;
pthread_mutex_t mutex;
pthread_mutex_init( &mutex, NULL );
d.mutex = & mutex;
pthread_t producer_thread;
pthread_t consumer_thread;
pthread_create( &producer_thread, NULL, producer, &d );
pthread_create( &consumer_thread, NULL, consumer, &d );
pthread_join( producer_thread, NULL );
pthread_join( consumer_thread, NULL );
}
EDIT:
I did end up throwing away this implementation; I'm now using a modified version of the code from here by Anthony Williams. My modified version can be found here. This modified version uses a more sensible condition-variable-based approach.
Since this code is using push_back and pop_front of std::deque - it's probably doing allocation and deallocation of the underlying data in different threads - I believe this is bad (undefined behaviour) - what's the easiest way to avoid this?
As long as only one thread can modify the container at a time, this is okay.
Nothing is marked volatile. But the important bits are mutex protected. Do I need to mark anything as volatile and if so what? - I don't think I do as I believe the mutex contains appropriate memory barriers etc., but I'm unsure.
So long as you correctly control access to the container using a mutex, it does not need to be volatile (this is dependent upon your threads library, but it wouldn't be a very good mutex if it didn't provide a correct memory barrier).
It is perfectly valid to allocate memory in one thread and free it in another if both threads are in the same process.
Using a mutex to protect access to the deque should provide the necessary memory ordering between the threads.
EDIT: The only other thing to think about is the nature of the producer and consumer. Your synthesized example lacks some of the subtleties involved with a real implementation. For example, how will you synchronize the producer with the consumer if they are not operating at the exact same rate? You might want to consider using something like a pipe or an OS queue instead of a deque so that the consumer can block on read if there is no data ready to process.
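One way to let the consumer block while no data is ready, staying within the process, is a condition variable. The following is a minimal sketch with hypothetical names (BlockingQueue, tryPop), using std::mutex and std::condition_variable rather than the pthreads calls in the question. It also avoids calling front() on an empty deque, which is undefined behaviour.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>

class BlockingQueue {  // hypothetical
public:
    void push(int value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(value);
        }
        notEmpty_.notify_one();  // wake one waiting consumer
    }

    // Blocks until an item is available, then removes and returns it.
    int pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [this] { return !items_.empty(); });
        int value = items_.front();
        items_.pop_front();
        return value;
    }

    // Non-blocking variant: returns false if the queue is empty.
    bool tryPop(int& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = items_.front();
        items_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable notEmpty_;
    std::deque<int> items_;
};
```

The predicate passed to wait() guards against spurious wakeups, so pop() only proceeds when the deque really holds an element.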

Need some advice to make the code multithreaded

I received code that was not written for a multi-threaded app, and now I have to modify it to support multi-threading.
I have a Singleton class (MyCenterSigltonClass) that is based on the instructions in:
http://en.wikipedia.org/wiki/Singleton_pattern
I made it thread-safe.
Now I see that the class contains 10-12 members, some with getter/setter methods.
Some members are declared static and are pointers to classes, like:
static Class_A* f_static_member_a;
static Class_B* f_static_member_b;
For these members, I defined a mutex (like mutex_a) INSIDE the class (Class_A) rather than directly in MyCenterSigltonClass, because they have a one-to-one association with MyCenterSigltonClass. I think I have the option to define the mutex for f_static_member_a either in MyCenterSigltonClass or in Class_A.
1) Am I right?
Also, my Singleton class(MyCenterSigltonClass) contains some other members like
Class_C f_classC;
For these kinds of member variables, should I define a mutex for each of them in MyCenterSigltonClass to make them thread-safe? What would be a good way to handle these cases?
I'd appreciate any suggestions.
-Nima
Whether the members are static or not doesn't really matter. How you protect the member variables really depends on how they are accessed from public methods.
You should think about a mutex as a lock that protects some resource from concurrent read/write access. You don't need to think about protecting the internal class objects necessarily, but the resources within them. You also need to consider the scope of the locks you'll be using, especially if the code wasn't originally designed to be multithreaded. Let me give a few simple examples.
#include <cstring> // memcpy
class A
{
private:
int mValuesCount;
int* mValues;
public:
A(int count, int* values)
{
mValuesCount = count;
mValues = (count > 0) ? new int[count] : NULL;
if (mValues)
{
memcpy(mValues, values, count * sizeof(int));
}
}
int getValues(int count, int* values) const
{
if (mValues && values)
{
memcpy(values, mValues, ((count < mValuesCount) ? count : mValuesCount) * sizeof(int)); // size in bytes
}
return mValuesCount;
}
};
class B
{
private:
A* mA;
public:
B()
{
int values[5] = { 1, 2, 3, 4, 5 };
mA = new A(5, values);
}
const A* getA() const { return mA; }
};
In this code, there's no need to protect mA because there's no chance of conflicting access across multiple threads. None of the threads can modify the state of mA, so all concurrent access just reads from mA. However, if we modify class A:
class A
{
private:
int mValuesCount;
int* mValues;
public:
A(int count, int* values)
{
mValuesCount = 0;
mValues = NULL;
setValues(count, values);
}
int getValues(int count, int* values) const
{
if (mValues && values)
{
memcpy(values, mValues, ((count < mValuesCount) ? count : mValuesCount) * sizeof(int));
}
return mValuesCount;
}
void setValues(int count, int* values)
{
delete [] mValues;
mValuesCount = count;
mValues = (count > 0) ? new int[count] : NULL;
if (mValues)
{
memcpy(mValues, values, count * sizeof(int));
}
}
};
We can now have multiple threads calling B::getA() and one thread can read from mA while another thread writes to mA. Consider the following thread interaction:
Thread A: a->getValues(maxCount, values);
Thread B: a->setValues(newCount, newValues);
It's possible that Thread B will delete mValues while Thread A is in the middle of copying it. In this case, you would need a mutex within class A to protect access to mValues and mValuesCount:
int getValues(int count, int* values) const
{
// TODO: Lock mutex.
if (mValues && values)
{
memcpy(values, mValues, ((count < mValuesCount) ? count : mValuesCount) * sizeof(int));
}
int returnCount = mValuesCount;
// TODO: Unlock mutex.
return returnCount;
}
void setValues(int count, int* values)
{
// TODO: Lock mutex.
delete [] mValues;
mValuesCount = count;
mValues = (count > 0) ? new int[count] : NULL;
if (mValues)
{
memcpy(mValues, values, count * sizeof(int));
}
// TODO: Unlock mutex.
}
This will prevent concurrent read/write on mValues and mValuesCount. Depending on the locking mechanisms available in your environment, you may be able to use a read-only locking mechanism in getValues() to prevent multiple threads from blocking on concurrent read access.
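As a rough sketch of how the TODO comments above could be filled in with standard C++11 facilities: std::lock_guard acquires a std::mutex in its constructor and releases it when the scope exits, including on an early return or an exception. The class name LockedA and the switch from a raw int* to std::vector<int> are my own assumptions to keep the example self-contained; they are not part of the original code. If reads greatly outnumber writes, std::shared_mutex (C++17) with std::shared_lock in the getter is one way to get the read-only locking mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical variant of class A. std::vector replaces the raw int* buffer
// (an assumption made to keep the sketch short); the locking is the point.
class LockedA
{
public:
    LockedA(int count, const int* values) { setValues(count, values); }

    // Copies at most `count` ints into `values` and returns the stored count.
    int getValues(int count, int* values) const
    {
        std::lock_guard<std::mutex> guard(mMutex); // released at scope exit
        if (values && count > 0)
        {
            const std::size_t n =
                std::min(static_cast<std::size_t>(count), mValues.size());
            std::copy(mValues.begin(), mValues.begin() + n, values);
        }
        return static_cast<int>(mValues.size());
    }

    void setValues(int count, const int* values)
    {
        std::lock_guard<std::mutex> guard(mMutex);
        if (values && count > 0)
            mValues.assign(values, values + count);
        else
            mValues.clear();
    }

private:
    mutable std::mutex mMutex; // mutable so the const getter can lock it
    std::vector<int> mValues;
};

// Small single-threaded usage check.
inline void lockedADemo()
{
    const int initial[3] = {1, 2, 3};
    LockedA a(3, initial);
    int out[2] = {0, 0};
    assert(a.getValues(2, out) == 3); // reports the full stored count
    assert(out[0] == 1 && out[1] == 2);
}
```

Because both methods take the same mutex, a reader can never observe the buffer halfway through a call to setValues().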
However, you'll also need to understand the scope of the locking you need to implement if class A is more complex:
class A
{
private:
int mValuesCount;
int* mValues;
public:
A(int count, int* values)
{
mValuesCount = 0;
mValues = NULL;
setValues(count, values);
}
int getValueCount() const { return mValuesCount; }
int getValues(int count, int* values) const
{
if (mValues && values)
{
memcpy(values, mValues, ((count < mValuesCount) ? count : mValuesCount) * sizeof(int));
}
return mValuesCount;
}
void setValues(int count, int* values)
{
delete [] mValues;
mValuesCount = count;
mValues = (count > 0) ? new int[count] : NULL;
if (mValues)
{
memcpy(mValues, values, count * sizeof(int));
}
}
};
In this case, you could have the following thread interaction:
Thread A: int maxCount = a->getValueCount();
Thread A: // allocate memory for "maxCount" int values
Thread B: a->setValues(newCount, newValues);
Thread A: a->getValues(maxCount, values);
Thread A has been written as though calls to getValueCount() and getValues() will be an uninterrupted operation, but Thread B has potentially changed the count in the middle of Thread A's operations. Depending on whether the new count is larger or smaller than the original count, it may take a while before you discover this problem. In this case, class A would need to be redesigned or it would need to provide some kind of transaction support so the thread using class A could block/unblock other threads:
Thread A: a->lockValues();
Thread A: int maxCount = a->getValueCount();
Thread A: // allocate memory for "maxCount" int values
Thread B: a->setValues(newCount, newValues); // Blocks until Thread A calls unlockValues()
Thread A: a->getValues(maxCount, values);
Thread A: a->unlockValues();
Thread B: // Completes call to setValues()
Since the code wasn't initially designed to be multithreaded, it's very likely you'll run into these kinds of issues where one method call uses information from an earlier call, but there was never a concern for the state of the object changing between those calls.
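One possible shape for the "redesign" option, as opposed to exposing lockValues()/unlockValues(), is to drop the separate count query and return the count and the values together in a single locked call. This is only a sketch under my own assumptions: the name SnapshotA and the use of std::vector<int> are hypothetical and not taken from the original class.

```cpp
#include <cassert>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical redesign: no separate getValueCount(). One call returns a
// consistent copy of the values, and its size() is the matching count.
class SnapshotA
{
public:
    explicit SnapshotA(std::vector<int> values) : mValues(std::move(values)) {}

    std::vector<int> getValues() const
    {
        std::lock_guard<std::mutex> guard(mMutex);
        return mValues; // the copy is made while the lock is still held
    }

    void setValues(std::vector<int> values)
    {
        std::lock_guard<std::mutex> guard(mMutex);
        mValues = std::move(values);
    }

private:
    mutable std::mutex mMutex;
    std::vector<int> mValues;
};

// Small single-threaded usage check.
inline void snapshotDemo()
{
    SnapshotA a(std::vector<int>{1, 2, 3});
    const std::vector<int> seen = a.getValues();
    a.setValues(std::vector<int>{7}); // a later write does not affect `seen`
    assert(seen.size() == 3);
    assert(a.getValues().size() == 1);
}
```

The caller pays for a copy, but there is no longer a window between "ask for the count" and "ask for the values" in which another thread can change the object.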
Now, begin to imagine what could happen if there are complex state dependencies among the objects within your singleton and multiple threads can modify the state of those internal objects. It can all become very, very messy with a large number of threads and debugging can become very difficult.
So as you try to make your singleton thread-safe, you need to look at several layers of object interactions. Some good questions to ask:
Do any of the methods on the singleton reveal internal state that may change between method calls (as in the last example I mention)?
Are any of the internal objects revealed to clients of the singleton?
If so, do any of the methods on those internal objects reveal internal state that may change between method calls?
If internal objects are revealed, do they share any resources or state dependencies?
You may not need any locking if you're just reading state from internal objects (first example). You may need to provide simple locking to prevent concurrent read/write access (second example). You may need to redesign the classes or provide clients with the ability to lock object state (third example). Or you may need to implement more complex locking where internal objects share state information across threads (e.g. a lock on a resource in class Foo requires a lock on a resource in class Bar, but locking that resource in class Bar doesn't necessarily require a lock on a resource in class Foo).
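For the Foo/Bar case at the end of the paragraph above, one common way to avoid deadlock when an operation needs both locks is to acquire them together with std::lock (C++11), which uses a deadlock-avoidance algorithm, and then hand them to guards with std::adopt_lock. The Foo and Bar structs, their fields, and the function names here are made up purely for illustration.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical resources, each guarded by its own mutex.
struct Bar { std::mutex mutex; int stock = 10; };
struct Foo { std::mutex mutex; int reserved = 0; };

// Touches both objects, so it takes both locks in one step.
inline bool reserve(Foo& foo, Bar& bar, int n)
{
    std::lock(foo.mutex, bar.mutex); // locks both without risking deadlock
    std::lock_guard<std::mutex> fooGuard(foo.mutex, std::adopt_lock);
    std::lock_guard<std::mutex> barGuard(bar.mutex, std::adopt_lock);
    if (bar.stock < n)
        return false;
    bar.stock -= n;
    foo.reserved += n;
    return true;
}

// Touches only Bar, so Bar's lock alone is enough.
inline void restock(Bar& bar, int n)
{
    std::lock_guard<std::mutex> guard(bar.mutex);
    bar.stock += n;
}

// Small single-threaded usage check.
inline void lockOrderDemo()
{
    Foo foo;
    Bar bar;
    assert(reserve(foo, bar, 4));
    restock(bar, 1);
    assert(bar.stock == 7 && foo.reserved == 4);
}
```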
Implementing thread-safe code can become a complex task depending on how all your objects interact. It can be much more complicated than the examples I've given here. Just be sure you clearly understand how your classes are used and how they interact (and be prepared to spend some time tracking down difficult to reproduce bugs).
If this is the first time you're doing threading, consider not accessing the singleton from the background thread. You can get it right, but you probably won't get it right the first time.
Realize that if your singleton exposes pointers to other objects, these should be made thread safe as well.
You don't have to define a mutex for each member. For example, you could instead use a single mutex to synchronize access to each member, e.g.:
class foo
{
public:
...
void some_op()
{
// acquire "lock_" and release using RAII; the guard must be a named object
Lock guard(lock_);
a_++;
}
void set_b(bar * b)
{
// acquire "lock_" and release using RAII; the guard must be a named object
Lock guard(lock_);
b_ = b;
}
private:
int a_;
bar * b_;
mutex lock_;
};
Of course a "one lock" solution may be not suitable in your case. That's up to you to decide. Regardless, simply introducing locks doesn't make the code thread-safe. You have to use them in the right place in the right way to avoid race conditions, deadlocks, etc. There are lots of concurrency issues you could run in to.
Furthermore you don't always need mutexes, or other threading mechanisms like TSS, to make code thread-safe. For example, the following function "func" is thread-safe:
class Foo;
void func (Foo & f)
{
f.some_op(); // Foo::some_op() of course needs to be thread-safe.
}
// Thread 1
Foo a;
func(a);
// Thread 2
Foo b;
func(b);
func is thread-safe here because each thread works on its own Foo and no state is shared, although the operations it invokes would still need to be thread-safe if a Foo were shared between threads. The point is you don't always need to pepper your code with mutexes and other threading mechanisms to make the code thread-safe. Sometimes restructuring the code is sufficient.
There's a lot of literature on multithreaded programming. It's definitely not easy to get right so take your time in understanding the nuances, and take advantage of existing frameworks like Boost.Thread to mitigate some of the inherent and accidental complexities that exist in the lower-level multithreading APIs.
I'd really recommend the Interlocked... methods to increment, decrement, and compare-and-swap values in code that needs to be multithread-aware. I don't have first-hand C++ experience, but a quick search for http://www.bing.com/search?q=c%2B%2B+interlocked reveals lots of confirming advice. If you need performance, these will likely be faster than locking.
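The Interlocked... functions belong to Win32 and .NET; in standard C++11 the closest portable counterpart is std::atomic. A minimal sketch, with function names of my own choosing: fetch_add plays the role of an interlocked increment, and compare_exchange_weak is a compare-and-swap.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Several threads bump one counter without a mutex.
inline int countWithThreads(int threads, int perThread)
{
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
    {
        workers.emplace_back([&counter, perThread] {
            for (int i = 0; i < perThread; ++i)
                counter.fetch_add(1); // atomic read-modify-write
        });
    }
    for (auto& w : workers)
        w.join();
    return counter.load();
}

// Compare-and-swap loop: raise `target` to `candidate` if it is larger.
inline void storeMax(std::atomic<int>& target, int candidate)
{
    int current = target.load();
    while (candidate > current &&
           !target.compare_exchange_weak(current, candidate))
    {
        // On failure `current` holds the latest value, so the loop re-checks.
    }
}

// Small usage check.
inline void atomicDemo()
{
    assert(countWithThreads(4, 1000) == 4000);
    std::atomic<int> best{5};
    storeMax(best, 9);
    assert(best.load() == 9);
}
```

Atomics only cover single variables. As soon as two members have to change together (like a pointer and its count), a lock or a redesign is still needed.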
As stated by @Void, a mutex alone is not always the solution to a concurrency problem:
Regardless, simply introducing locks doesn't make the code
thread-safe. You have to use them in the right place in the right way
to avoid race conditions, deadlocks, etc. There are lots of
concurrency issues you could run in to.
I want to add another example:
class MyClass
{
mutex m_mutex;
AnotherClass m_anotherClass;
void setObject(AnotherClass& anotherClass)
{
m_mutex.lock();
m_anotherClass = anotherClass;
m_mutex.unlock();
}
AnotherClass getObject()
{
AnotherClass anotherClass;
m_mutex.lock();
anotherClass = m_anotherClass;
m_mutex.unlock();
return anotherClass;
}
};
In this case the getObject() method is always safe because it is protected by the mutex and returns a copy of the object to the caller, which may be a different class running on a different thread. This means you are working on a copy that might be stale (in the meantime another thread might have changed m_anotherClass by calling setObject()).
Now what if you turn m_anotherClass into a pointer instead of an object variable?
class MyClass
{
mutex m_mutex;
AnotherClass *m_anotherClass;
void setObject(AnotherClass *anotherClass)
{
m_mutex.lock();
m_anotherClass = anotherClass;
m_mutex.unlock();
}
AnotherClass * getObject()
{
AnotherClass *anotherClass;
m_mutex.lock();
anotherClass = m_anotherClass;
m_mutex.unlock();
return anotherClass;
}
};
This is an example where a mutex is not enough to solve all the problems.
With pointers you only get a copy of the pointer; the pointed-to object is the same one in both the caller and the method. So even if the pointer was valid at the time getObject() was called, you have no guarantee that the pointed-to object will still exist while you are working with it. This is simply because you don't control the object's lifetime. That's why you should use object variables as much as possible and avoid raw pointers (if you can).
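If a pointer really is needed, one way to get the lifetime back under control is to store and return a std::shared_ptr instead of a raw pointer, so the object stays alive for as long as any caller still holds a reference. This is a sketch with hypothetical names (Payload, Holder), not the classes from the example above.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical pointed-to type.
struct Payload { int value = 0; };

class Holder
{
public:
    void setObject(std::shared_ptr<Payload> obj)
    {
        std::lock_guard<std::mutex> guard(m_mutex);
        m_payload = std::move(obj); // the previous object is released here
    }

    // Returns a counted reference: the caller keeps the object alive.
    std::shared_ptr<Payload> getObject() const
    {
        std::lock_guard<std::mutex> guard(m_mutex);
        return m_payload;
    }

private:
    mutable std::mutex m_mutex;
    std::shared_ptr<Payload> m_payload;
};

// Small single-threaded usage check.
inline void holderDemo()
{
    Holder h;
    auto first = std::make_shared<Payload>();
    first->value = 1;
    h.setObject(std::move(first));
    std::shared_ptr<Payload> held = h.getObject();
    h.setObject(std::make_shared<Payload>()); // replace while `held` is in use
    assert(held && held->value == 1);         // old object is still alive
}
```

The mutex here protects the pointer swap, and the reference count protects the lifetime. Neither protects the contents of Payload, so concurrent writes to the same Payload would still need their own synchronization.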