Critical Sections and return values in C++ - c++

In attempting to create a thread-safe container class from scratch, I've run into the problem of returning values from access methods. For example in Windows:
myNode getSomeData( )
{
EnterCriticalSection(& myCritSec);
myNode retobj;
// fill retobj with data from structure
LeaveCriticalSection(& myCritSec);
return retobj;
}
Now I suppose that this type of method is not at all thread-safe because after the code releases the critical section another thread is able to come along and immediately overwrite retobj before the first thread returns. So what is an elegant way to return retobj to the caller in a thread-safe manner?

No, it's thread-safe because each thread has its own stack, and that's where retobj is.
However, it's certainly not exception-safe. Wrapping the critical section in an RAII-style object would help with that. Something like...
class CriticalLock : boost::noncopyable {
    CRITICAL_SECTION &section;
public:
    explicit CriticalLock(CRITICAL_SECTION &cs) : section(cs)
    {
        // The Windows API takes a pointer to the CRITICAL_SECTION.
        EnterCriticalSection(&section);
    }
    ~CriticalLock()
    {
        LeaveCriticalSection(&section);
    }
};
Usage:
myNode getSomeData( )
{
CriticalLock lock(myCritSec); // automatically released.
...
}

This is C++, and retobj has automatic storage duration, so it's stored on the stack.
Every thread has its own stack, so another thread cannot clobber the value of retobj before it is returned.
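The same point can be sketched portably with std::mutex and std::lock_guard in place of CRITICAL_SECTION (a swapped-in technique, not the Windows API from the question). MyNode, NodeStore and snapshot_is_independent are hypothetical names used only for illustration. The copy is made while the lock is held, and the returned object belongs to the caller alone.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical node type standing in for the question's myNode.
struct MyNode {
    int id;
    std::string label;
};

// Hypothetical container; std::mutex plays the role of the critical section.
class NodeStore {
public:
    void set(const MyNode& n) {
        std::lock_guard<std::mutex> lock(mutex_);
        node_ = n;
    }

    // The return value is copy-constructed before `lock` is destroyed,
    // so the copy happens under the lock, even if an exception is thrown.
    MyNode get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return node_;
    }

private:
    mutable std::mutex mutex_;
    MyNode node_{0, ""};
};

// Single-threaded check that a returned copy is unaffected by later writes.
inline bool snapshot_is_independent() {
    NodeStore store;
    store.set({1, "first"});
    MyNode snapshot = store.get();
    store.set({2, "second"});
    assert(store.get().id == 2);
    return snapshot.id == 1 && snapshot.label == "first";
}
```

The guard object gives the exception safety mentioned above: the lock is released on every exit path from get().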

Related

Synchronizing method calls on shared object from multiple threads

I am thinking about how to implement a class that will contain private data that will eventually be modified by multiple threads through method calls. For synchronization (using the Windows API), I am planning on using a CRITICAL_SECTION object since all the threads will spawn from the same process.
Given the following design, I have a few questions.
template <typename T> class Shareable
{
private:
const LPCRITICAL_SECTION sync; //Can be read and used by multiple threads
T *data;
public:
Shareable(LPCRITICAL_SECTION cs, unsigned elems) : sync{cs}, data{new T[elems]} { }
~Shareable() { delete[] data; }
void sharedModify(unsigned index, T &datum) //<-- Can this be validly called
//by multiple threads with synchronization being implicit?
{
EnterCriticalSection(sync);
/*
The critical section of code involving reads & writes to 'data'
*/
LeaveCriticalSection(sync);
}
};
// Somewhere else ...
DWORD WINAPI ThreadProc(LPVOID lpParameter)
{
Shareable<ActualType> *ptr = static_cast<Shareable<ActualType>*>(lpParameter);
ActualType copyable = /* initialization */;
ptr->sharedModify(validIndex, copyable); //<-- OK, synchronized?
return 0;
}
The way I see it, the API calls will be conducted in the context of the current thread. That is, I assume this is the same as if I had acquired the critical section object from the pointer and called the API from within ThreadProc(). However, I am worried that if the object is created and placed in the main/initial thread, there will be something funky about the API calls.
When sharedModify() is called on the same object concurrently, from multiple threads, will the synchronization be implicit, in the way I described it above?
Should I instead get a pointer to the critical section object and use that instead?
Is there some other synchronization mechanism that is better suited to this scenario?
When sharedModify() is called on the same object concurrently, from multiple threads, will the synchronization be implicit, in the way I described it above?
It's not implicit, it's explicit. There's only one CRITICAL_SECTION and only one thread can hold it at a time.
Should I instead get a pointer to the critical section object and use that instead?
No. There's no reason to use a pointer here.
Is there some other synchronization mechanism that is better suited to this scenario?
It's hard to say without seeing more code, but this is definitely the "default" solution. It's like a singly-linked list -- you learn it first, it always works, but it's not always the best choice.
When sharedModify() is called on the same object concurrently, from multiple threads, will the synchronization be implicit, in the way I described it above?
Implicit from the caller's perspective, yes.
Should I instead get a pointer to the critical section object and use that instead?
No. In fact, I would suggest giving the Shareable object ownership of its own critical section instead of accepting one from the outside (and embracing RAII concepts to write safer code), e.g.:
template <typename T>
class Shareable
{
private:
CRITICAL_SECTION sync;
std::vector<T> data;
struct SyncLocker
{
CRITICAL_SECTION &sync;
SyncLocker(CRITICAL_SECTION &cs) : sync(cs) { EnterCriticalSection(&sync); }
~SyncLocker() { LeaveCriticalSection(&sync); }
};
public:
Shareable(unsigned elems) : data(elems)
{
InitializeCriticalSection(&sync);
}
Shareable(const Shareable&) = delete;
Shareable(Shareable&&) = delete;
~Shareable()
{
{
SyncLocker lock(sync);
data.clear();
}
DeleteCriticalSection(&sync);
}
void sharedModify(unsigned index, const T &datum)
{
SyncLocker lock(sync);
data[index] = datum;
}
Shareable& operator=(const Shareable&) = delete;
Shareable& operator=(Shareable&&) = delete;
};
Is there some other synchronization mechanism that is better suited to this scenario?
That depends. Will multiple threads be accessing the same index at the same time? If not, then there is not really a need for the critical section at all. One thread can safely access one index while another thread accesses a different index.
If multiple threads need to access the same index at the same time, a critical section might still not be the best choice. Locking the entire array might be a big bottleneck if you only need to lock portions of the array at a time. Things like the Interlocked API, or Slim Read/Write locks, might make more sense. It really depends on your thread designs and what you are actually trying to protect.
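As a rough, portable sketch of that distinction: below, each thread writes only its own element of a vector (no lock needed, provided the vector is not resized meanwhile), while a counter touched by every thread is a std::atomic, used here as a stand-in for the Interlocked API. The function name run_workers is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Each worker writes only slots[i]; all workers increment the same counter.
// Returns the final counter value.
inline int run_workers(unsigned worker_count, std::vector<int>& slots) {
    assert(slots.size() >= worker_count);
    std::atomic<int> shared_total{0};
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < worker_count; ++i) {
        workers.emplace_back([i, &slots, &shared_total] {
            slots[i] = static_cast<int>(i) * 10;  // distinct element per thread
            shared_total.fetch_add(1);            // same object from every thread
        });
    }
    for (auto& t : workers) t.join();
    return shared_total.load();
}
```

Note that this relies on the elements being distinct objects, so it does not carry over to std::vector<bool>, whose elements are packed.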

Is this inter-thread object sharing strategy sound?

I'm trying to come up with a fast way of solving the following problem:
I have a thread which produces data, and several threads which consume it. I don't need to queue produced data, because data is produced much more slowly than it is consumed (and even if this failed to be the case occasionally, it wouldn't be a problem if a data point were skipped occasionally). So, basically, I have an object that encapsulates the "most recent state", which only the producer thread is allowed to update.
My strategy is as follows (please let me know if I'm completely off my rocker):
I've created three classes for this example: Thing (the actual state object), SharedObject<Thing> (an object that can be local to each thread, and gives that thread access to the underlying Thing), and SharedObjectManager<Thing>, which wraps up a shared_ptr along with a mutex.
The instance of the SharedObjectManager (SOM) is a global variable.
When the producer starts, it instantiates a Thing, and tells the global SOM about it. It then makes a copy, and does all of its updating work on that copy. When it is ready to commit its changes to the Thing, it passes the new Thing to the global SOM, which locks its mutex, updates the shared pointer it keeps, and then releases the lock.
Meanwhile, the consumer threads all instantiate SharedObject<Thing>. These objects each keep a pointer to the global SOM, as well as a cached copy of the shared_ptr kept by the SOM. Each keeps this cached copy until update() is explicitly called.
I believe this is getting hard to follow, so here's some code:
#include <mutex>
#include <iostream>
#include <memory>
class Thing
{
private:
int _some_member = 10;
public:
int some_member() const { return _some_member; }
void some_member(int val) {_some_member = val; }
};
// one global instance
template<typename T>
class SharedObjectManager
{
private:
std::shared_ptr<T> objPtr;
std::mutex objLock;
public:
std::shared_ptr<T> get_sptr()
{
std::lock_guard<std::mutex> lck(objLock);
return objPtr;
}
void commit_new_object(std::shared_ptr<T> new_object)
{
std::lock_guard<std::mutex> lck (objLock);
objPtr = new_object;
}
};
// one instance per consumer thread.
template<typename T>
class SharedObject
{
private:
SharedObjectManager<T> * som;
std::shared_ptr<T> cache;
public:
SharedObject(SharedObjectManager<T> * backend) : som(backend)
{update();}
void update()
{
cache = som->get_sptr();
}
T & operator *()
{
return *cache;
}
T * operator->()
{
return cache.get();
}
};
// no actual threads in this test, just a quick sanity check.
SharedObjectManager<Thing> glbSOM;
int main(void)
{
glbSOM.commit_new_object(std::make_shared<Thing>());
SharedObject<Thing> myobj(&glbSOM);
std::cout<<myobj->some_member()<<std::endl;
// prints "10".
}
The idea for use by the producer thread is:
// initialization - on startup
auto firstStateObj = std::make_shared<Thing>();
glbSOM.commit_new_object(firstStateObj);
// main loop
while (1)
{
// invoke copy constructor to copy the current live Thing object
auto nextState = std::make_shared<Thing>(*(glbSOM.get_sptr()));
// do stuff to nextState, gradually filling out its new value
// based on incoming data from other sources, etc.
...
// commit the changes to the shared memory location
glbSOM.commit_new_object(nextState);
}
The use by consumers would be:
SharedObject<Thing> thing(&glbSOM);
while(1)
{
// think about the data contained in thing, and act accordingly...
doStuffWith(thing->some_member());
// re-cache the thing
thing.update();
}
Thanks!
That is way overengineered. Instead, I'd suggest doing the following:
Create a pointer Thing* theThing together with a mutex protecting it. Make it either global or shared by some other means. Initialize it to nullptr.
In your producer: use two local objects of type Thing, thingOne and thingTwo (remember, thingOne is no better than thingTwo, but one is called thingOne for a reason, but this is a thing thing. Watch out for cats.). Start by populating thingOne. When done, lock the mutex, copy the address of thingOne to theThing, unlock the mutex. Start populating thingTwo. When done, see above. Repeat until killed.
In every listener: (make sure the pointer is not nullptr). Lock the mutex. Make a copy of the object pointed to by theThing. Unlock the mutex. Work with your copy. Burn after reading. Repeat until killed.
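A minimal sketch of the two-buffer scheme described above, assuming a simplified Thing with a public member. Published, publish, snapshot and demo_two_buffers are hypothetical names. In this sketch the nullptr check is done while holding the mutex.

```cpp
#include <cassert>
#include <mutex>

struct Thing { int some_member = 10; };  // simplified from the question

// Hypothetical holder for the shared pointer and the mutex protecting it.
struct Published {
    std::mutex lock;
    const Thing* current = nullptr;
};

// Producer side: publish the address of a fully populated buffer.
inline void publish(Published& p, const Thing* ready) {
    std::lock_guard<std::mutex> guard(p.lock);
    p.current = ready;
}

// Listener side: copy the published object while holding the lock.
// Returns false if nothing has been published yet.
inline bool snapshot(Published& p, Thing& out) {
    std::lock_guard<std::mutex> guard(p.lock);
    if (p.current == nullptr) return false;
    out = *p.current;
    return true;
}

// Single-threaded walk through the producer's alternation between buffers.
inline int demo_two_buffers() {
    Published shared;
    Thing thingOne, thingTwo, copy;
    bool had_data = snapshot(shared, copy);
    assert(!had_data);                 // nothing published yet
    (void)had_data;
    thingOne.some_member = 1;
    publish(shared, &thingOne);
    thingTwo.some_member = 2;          // listeners can only see thingOne here
    publish(shared, &thingTwo);
    snapshot(shared, copy);
    return copy.some_member;
}
```

The producer only ever writes to the buffer that is not currently published, and listeners copy under the mutex, so a listener never reads a buffer while it is being populated.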

How to release heap memory of thread local storage

I have a structure used for thread local storage like this:
namespace {
typedef boost::unordered_map< std::string, std::vector<xxx> > YYY;
boost::thread_specific_ptr<YYY> cache;
void initCache() {
//The first time called by the current thread.
if (!cache.get()){
cache.reset(new YYY());
}
}
void clearCache() {
if (cache.get()){
cache.reset();
}
}
}
And a class whose object could have been created by the main thread:
class A {
public:
void f() {
initCache();
//and for example:
insertIntoCache();
}
~A(){
clearCache();// <-- Does/Can this do anything good ??
}
};
Multiple threads can access object(s) of A stored, for example, in a global container. Each of these threads needs to call A::f() from time to time. So each creates its own copy of cache on the heap once, and finally they join when they are done with all their jobs.
So the question is: who is going to clean up the threads' memory, and how?
Thank you
There's no reason to call clearCache().
Once the thread exits or the thread_specific_ptr goes out of scope, the cleanup function will be invoked. If you don't pass a cleanup function to the thread_specific_ptr's constructor, it will just use delete.
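The cleanup-at-thread-exit behaviour can be illustrated without Boost by using the C++11 thread_local keyword, which is a different mechanism from boost::thread_specific_ptr but has the same lifetime idea. In this sketch, Cache, thread_cache, g_caches_destroyed and run_two_threads are hypothetical names, and the counter exists only to make the destruction observable.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

std::atomic<int> g_caches_destroyed{0};  // hypothetical counter, for illustration only

struct Cache {
    std::unordered_map<std::string, std::vector<int>> entries;
    ~Cache() { ++g_caches_destroyed; }   // runs when the owning thread exits
};

// Created lazily on first use in each thread; no explicit clear call is needed.
inline Cache& thread_cache() {
    thread_local Cache cache;
    return cache;
}

// Starts two threads that each touch their own cache, then joins them.
inline int run_two_threads() {
    auto work = [] {
        thread_cache().entries["key"].push_back(1);
        assert(thread_cache().entries.size() == 1);  // each thread sees only its own map
    };
    std::thread a(work), b(work);
    a.join();
    b.join();
    return g_caches_destroyed.load();
}
```

Each worker thread gets its own Cache, and each one is destroyed when its thread finishes, without any call comparable to clearCache().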

how to insert vector only once in multiple thread

I have below code snippet.
std::vector<int> g_vec;
void func()
{
//I add double check to avoid thread need lock every time.
if(g_vec.empty())
{
//lock
if(g_vec.empty())
{
//insert items into g_vec
}
//unlock
}
...
}
func will be called by multiple threads, and I want the items to be inserted into g_vec only once, which is a bit similar to a singleton instance. Regarding singleton instances, I found that there is a DCLP (double-checked locking pattern) issue.
Question:
1. Is my code snippet above thread-safe? Does it have the DCLP issue?
2. If it is not thread-safe, how should I modify it?
Your code has a data race.
The first check outside the lock is not synchronized with the insertion inside the lock. That means, you may end up with one thread reading the vector (through .empty()) while another thread is writing the vector (through .insert()), which is by definition a data race and leads to undefined behavior.
A solution for exactly this kind of problem is given by the standard in the form of call_once.
#include <mutex>
#include <vector>
std::vector<int> g_vec;
std::once_flag g_flag;
void func()
{
    // g_vec is a global, so the lambda uses it directly; globals cannot be captured.
    std::call_once(g_flag, [](){ g_vec.insert( ... ); });
}
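For a version that can be compiled and run as-is, here is a small sketch of the call_once approach with concrete values filled in. The contents {1, 2, 3} and the names g_items, g_items_once, g_fill_runs, fill_once_then_read and run_call_once_demo are made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

std::vector<int> g_items;
std::once_flag g_items_once;
std::atomic<int> g_fill_runs{0};  // hypothetical counter, for the demo only

inline void fill_once_then_read() {
    // Exactly one caller runs the lambda; the others wait until it has finished.
    std::call_once(g_items_once, [] {
        g_items = {1, 2, 3};
        ++g_fill_runs;
    });
    assert(g_items.size() == 3);  // every thread sees the completed vector
}

// Calls fill_once_then_read from several threads; returns how often the fill ran.
inline int run_call_once_demo(int thread_count) {
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i) threads.emplace_back(fill_once_then_read);
    for (auto& t : threads) t.join();
    return g_fill_runs.load();
}
```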
In your example, a second thread entering func could find a non-empty but only half-initialized vector, which is something you won't want anyway. You should use a flag and set it once the initialization is complete. A standard facility such as std::atomic<bool> is better; a simple static int often does the job in practice, although reading it outside the lock is still formally a data race.
std::vector<int> g_vec;
void func()
{
//I add double check to avoid thread need lock every time.
static int called = 0;
if(!called)
{
lock();
if(!called)
{
//insert items into g_vec
called = 1;
}
unlock();
}
...
}
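One way to make the flag-based double check well-defined is to make the flag a std::atomic<bool> and pair an acquire load with a release store. This is a sketch under that assumption, not the answer's original code. The names g_values, g_ready, g_init_mutex, g_init_runs, ensure_initialized and run_concurrent_init are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

std::vector<int> g_values;           // filled exactly once
std::atomic<bool> g_ready{false};    // set only after g_values is complete
std::mutex g_init_mutex;
std::atomic<int> g_init_runs{0};     // hypothetical counter, for the demo only

inline void ensure_initialized() {
    if (!g_ready.load(std::memory_order_acquire)) {          // fast path, no lock
        std::lock_guard<std::mutex> lock(g_init_mutex);
        if (!g_ready.load(std::memory_order_relaxed)) {      // re-check under the lock
            g_values.assign({1, 2, 3});
            ++g_init_runs;
            g_ready.store(true, std::memory_order_release);  // publish the vector
        }
    }
}

// Calls ensure_initialized from several threads; returns how often the fill ran.
inline int run_concurrent_init(int thread_count) {
    std::vector<std::thread> threads;
    for (int i = 0; i < thread_count; ++i) {
        threads.emplace_back([] {
            ensure_initialized();
            assert(g_values.size() == 3);  // readers see the finished vector
        });
    }
    for (auto& t : threads) t.join();
    return g_init_runs.load();
}
```

The flag is set only after the vector is fully populated, so a thread that sees the flag as true through the acquire load also sees the completed vector.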

Read-write thread-safe smart pointer in C++, x86-64

I am developing a lock-free data structure and the following problem arises.
I have a writer thread that creates objects on the heap and wraps them in a smart pointer with a reference counter. I also have a lot of reader threads that work with these objects. The code can look like this:
SmartPtr ptr;
class Reader : public Thread {
virtual void Run() {
for (;;) {
SmartPtr local(ptr);
// do smth
}
}
};
class Writer : public Thread {
virtual void Run() {
for (;;) {
SmartPtr newPtr(new Object);
ptr = newPtr;
}
}
};
int main() {
Pool* pool = SystemThreadPool();
pool->Run(new Reader());
pool->Run(new Writer());
for (;;) {} // wait for crash :(
}
When I create a thread-local copy of ptr, it involves at least:
Read an address.
Increment reference counter.
I can't do these two operations atomically, and thus sometimes my readers work with a deleted object.
The question is: what kind of smart pointer should I use to make read-write access from several threads with correct memory management possible? A solution should exist, since Java programmers don't even care about such a problem, simply relying on the fact that all objects are references and are deleted only when nobody uses them.
For PowerPC I found http://drdobbs.com/184401888, which looks nice but uses Load-Linked and Store-Conditional instructions, which we don't have on x86.
As far as I understand, boost pointers provide such functionality only using locks. I need a lock-free solution.
boost::shared_ptr has atomic_store, which uses a "lock-free" spinlock that should be fast enough for 99% of possible cases.
boost::shared_ptr<Object> ptr;
class Reader : public Thread {
virtual void Run() {
for (;;) {
boost::shared_ptr<Object> local(boost::atomic_load(&ptr));
// do smth
}
}
};
class Writer : public Thread {
virtual void Run() {
for (;;) {
boost::shared_ptr<Object> newPtr(new Object);
boost::atomic_store(&ptr, newPtr);
}
}
};
int main() {
Pool* pool = SystemThreadPool();
pool->Run(new Reader());
pool->Run(new Writer());
for (;;) {}
}
EDIT:
In response to comment below, the implementation is in "boost/shared_ptr.hpp"...
template<class T> void atomic_store( shared_ptr<T> * p, shared_ptr<T> r )
{
boost::detail::spinlock_pool<2>::scoped_lock lock( p );
p->swap( r );
}
template<class T> shared_ptr<T> atomic_exchange( shared_ptr<T> * p, shared_ptr<T> r )
{
boost::detail::spinlock & sp = boost::detail::spinlock_pool<2>::spinlock_for( p );
sp.lock();
p->swap( r );
sp.unlock();
return r; // return std::move( r )
}
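The standard library has free functions of the same shape, std::atomic_load and std::atomic_store for std::shared_ptr, declared in <memory> since C++11 (deprecated in C++20 in favour of std::atomic<std::shared_ptr<T>>). Like the Boost versions they may use locks internally. A minimal sketch, with Object, g_current and run_reader_writer as hypothetical names:

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>

struct Object { int value; };

std::shared_ptr<Object> g_current = std::make_shared<Object>(Object{0});

// One writer publishes increasing values while one reader takes snapshots.
// Returns the value that is published after both threads have finished.
inline int run_reader_writer(int iterations) {
    std::thread writer([iterations] {
        for (int i = 1; i <= iterations; ++i) {
            std::atomic_store(&g_current, std::make_shared<Object>(Object{i}));
        }
    });
    std::thread reader([iterations] {
        int last = 0;
        for (int i = 0; i < iterations; ++i) {
            // The local copy holds a reference, so the object stays alive
            // even if the writer replaces g_current right afterwards.
            std::shared_ptr<Object> local = std::atomic_load(&g_current);
            assert(local && local->value >= last);
            last = local->value;
        }
    });
    writer.join();
    reader.join();
    return std::atomic_load(&g_current)->value;
}
```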
With some jiggery-pokery you should be able to accomplish this using InterlockedCompareExchange128. Store the reference count and pointer in a 2 element __int64 array. If reference count is in array[0] and pointer in array[1] the atomic update would look like this:
while(true)
{
__int64 comparand[2];
comparand[0] = refCount;
comparand[1] = pointer;
if(1 == InterlockedCompareExchange128(
array,
pointer,
refCount + 1,
comparand))
{
// Pointer is ready for use. Exit the while loop.
break;
}
}
If an InterlockedCompareExchange128 intrinsic function isn't available for your compiler then you may use the underlying CMPXCHG16B instruction instead, if you don't mind mucking around in assembly language.
The solution proposed by RobH doesn't work. It has the same problem as the original question: when accessing the reference count object, it might already have been deleted.
The only way I see of solving the problem without a global lock (as in boost::atomic_store) or conditional read/write instructions is to somehow delay the destruction of the object (or the shared reference count object if such thing is used). So zennehoy has a good idea but his method is too unsafe.
The way I might do it is by keeping copies of all the pointers in the writer thread so that the writer can control the destruction of the objects:
class Writer : public Thread {
virtual void Run() {
list<SmartPtr> ptrs; //list that holds all the old ptr values
for (;;) {
SmartPtr newPtr(new Object);
if(ptr)
ptrs.push_back(ptr); //push previous pointer into the list
ptr = newPtr;
//Periodically go through the list and destroy objects that are not
//referenced by other threads
for(auto it=ptrs.begin(); it!=ptrs.end(); )
if(it->refCount()==1)
it = ptrs.erase(it);
else
++it;
}
}
};
However there are still requirements for the smart pointer class. This doesn't work with shared_ptr as the reads and writes are not atomic. It almost works with boost::intrusive_ptr. The assignment on intrusive_ptr is implemented like this (pseudocode):
//create temporary from rhs
tmp.ptr = rhs.ptr;
if(tmp.ptr)
intrusive_ptr_add_ref(tmp.ptr);
//swap(tmp,lhs)
T* x = lhs.ptr;
lhs.ptr = tmp.ptr;
tmp.ptr = x;
//destroy temporary
if(tmp.ptr)
intrusive_ptr_release(tmp.ptr);
As far as I understand the only thing missing here is a compiler level memory fence before lhs.ptr = tmp.ptr;. With that added, both reading rhs and writing lhs would be thread-safe under strict conditions: 1) x86 or x64 architecture 2) atomic reference counting 3) rhs refcount must not go to zero during the assignment (guaranteed by the Writer code above) 4) only one thread writing to lhs (using CAS you could have several writers).
Anyway, you could create your own smart pointer class based on intrusive_ptr with necessary changes. Definitely easier than re-implementing shared_ptr. And besides, if you want performance, intrusive is the way to go.
The reason this works much more easily in Java is garbage collection. In C++, you have to manually ensure that a value is not just starting to be used by a different thread when you want to delete it.
A solution I've used in a similar situation is to simply delay the deletion of the value. I create a separate thread that iterates through a list of things to be deleted. When I want to delete something, I add it to this list with a timestamp. The deleting thread waits until some fixed time after this timestamp before actually deleting the value. You just have to make sure that the delay is large enough to guarantee that any temporary use of the value has completed.
100 milliseconds would have been enough in my case, I chose a few seconds to be safe.
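The delayed-deletion idea can be sketched as a small queue of retired objects with timestamps. DeferredDeleter and its members are hypothetical names, and this is not the answerer's actual code. The sketch is not itself thread-safe: it assumes a single deleting thread owns it and that objects are retired in time order. The current time is passed in explicitly so the behaviour is easy to check.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>
#include <memory>
#include <utility>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: keeps retired objects alive until a grace period has passed.
template <typename T>
class DeferredDeleter {
public:
    explicit DeferredDeleter(Clock::duration grace) : grace_(grace) {}

    // Take ownership of an object that is no longer published to readers.
    void retire(std::unique_ptr<T> obj, Clock::time_point now) {
        assert(obj != nullptr);
        retired_.emplace_back(now, std::move(obj));
    }

    // Destroy everything retired at least `grace_` ago; returns how many were freed.
    std::size_t collect(Clock::time_point now) {
        std::size_t freed = 0;
        while (!retired_.empty() && now - retired_.front().first >= grace_) {
            retired_.pop_front();  // the unique_ptr destructor deletes the object
            ++freed;
        }
        return freed;
    }

    std::size_t pending() const { return retired_.size(); }

private:
    Clock::duration grace_;
    std::deque<std::pair<Clock::time_point, std::unique_ptr<T>>> retired_;
};
```

As the answer notes, this is only safe if the grace period is longer than any reader can hold on to a value.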