How to use `boost::thread_specific_ptr` with `for_each` - c++

In the code below, Bar is supposed to model a thread-unsafe object that is moderately expensive to create. Foo contains a Bar and is multi-threaded, so it uses a thread_specific_ptr<Bar> to make a per-thread Bar that can be re-used across multiple calls to loop for the same Foo (therefore amortizing the cost of creating a Bar for each thread). Foo always creates a Bar with the same num, so the sanity check is supposed to always pass, yet it fails.
The reason for this is (I think) explained in the requirement for the thread_specific_ptr destructor:
All the thread specific instances associated to this thread_specific_ptr (except maybe the one associated to this thread) must be null.
So the problem is caused by a combination of three things:
Bar objects created in worker threads are not cleaned up when Foo's thread_specific_ptr is cleaned up, and are therefore persisted across iterations of the loop in main (essentially, a memory leak)
The C++ runtime is re-using threads in for_each between iterations of the loop in main
The C++ runtime is re-allocating each Foo in the main loop to the same memory address
The way that thread_specific_ptrs are indexed (by the thread_specific_ptr's memory address and the thread ID) results in old Bars being accidentally reused. I understand the problem; what I don't understand is what to do about it. Note the remark from the docs:
The requirement is due to the fact that in order to delete all these instances, the implementation should be forced to maintain a list of all the threads having an associated specific ptr, which is against the goal of thread specific data.
I'd like to avoid this complexity as well.
How can I use for_each for simple thread management, but also avoid the memory leak? Solution requirements:
It should only create one Bar per thread per Foo (i.e., don't create a new Bar inside the for_each)
Assume Bar is not thread-safe.
If possible, use for_each to make the parallel loop as simple as possible
The loop should actually run in parallel (i.e., no mutex around a single Bar)
Bar objects created by loop should be available for use until the Foo object that created them is destructed, at which point all Bar objects should also be destructed.
The following code compiles and should exit with return code 1 with high probability on a machine with sufficient cores.
#include <boost/thread/tss.hpp>
#include <algorithm>
#include <cstdlib>
#include <execution>
#include <iostream>
#include <numeric>
#include <vector>
using namespace std;
class Bar {
public:
// models a thread-unsafe object
explicit Bar(int i) : num(i) { }
int num;
};
class Foo {
public:
explicit Foo(int i) : num(i) { }
void loop() {
vector<int> idxs(32);
iota(begin(idxs), end(idxs), 0);
for_each(__pstl::execution::par, begin(idxs), end(idxs), [&](int) {
if (ptr.get() == nullptr) {
// no `Bar` exists for this thread yet, so create one
Bar *tmp = new Bar(num);
ptr.reset(tmp);
}
// Get the thread-local Bar
Bar &b = *ptr;
// Sanity check: we ALWAYS create a `Bar` with the same num as `Foo`;
// see the `if` block above.
// Therefore, this condition shouldn't ever be true (but it is!)
if (b.num != num) {
cout << "NOT THREAD SAFE: Foo index is " << num << ", but Bar index is " << b.num << endl;
exit(1);
}
});
}
boost::thread_specific_ptr<Bar> ptr;
int num;
};
int main() {
for(int i = 0; i < 100; i++) {
Foo f(i);
f.loop();
}
return 0;
}

According to the documentation
~thread_specific_ptr();
Requires:
All the thread specific instances associated to this thread_specific_ptr
(except maybe the one associated to this thread) must be null.
This means you are not allowed to destroy Foo until all of its Bar have been destroyed. This is a problem because execution_policy::par does not have to operate on a fresh thread pool, nor does it have to terminate the threads once the for_each() is done.
This gives us enough to answer the question as asked: You can only use thread_specific_ptr alongside execution::par to share data between various iterations on the same thread if:
The thread_specific_ptr is never destroyed. This is required because there is no way to know whether a given iteration of the for_each will be the last one for its assigned thread, and that thread might never get scheduled again.
You are comfortable leaking one instance of the pointed object per thread until the end of the program.
What's going on in your code
We are already in Undefined Behavior land, but the behavior you are seeing can still be explained a bit further. Considering that:
Boost.Thread uses the address of the thread_specific_ptr instance as key of the thread specific pointers. This avoids to create/destroy a key which will need a lock to protect from race conditions. This has a little performance liability, as the access must be done using an associative container.
... and that all 100 instances of Foo will most likely be at the same place in memory, you end up seeing instances of Bar from the previous Foo when the worker threads are recycled, which causes your (inaccurate, see below) check to fire.
Solution: What I think you should do
I would suggest you just drop thread_specific_ptr altogether and manually manage the pool of per-thread/per-Foo Bar instances with an associative container. This makes managing the lifetime of the Bar objects a lot more straightforward:
class per_thread_bar_pool {
std::map<std::thread::id, Bar> bars_;
// alternatively:
// std::map<std::thread::id, std::unique_ptr<Bar>> bars_;
std::mutex mtx_;
public:
Bar& get(int num) {
auto tid = std::this_thread::get_id();
std::unique_lock l{mtx_};
auto found = bars_.find(tid);
if(found == bars_.end()) {
l.unlock(); // Let other threads access the map while `Bar` is being built.
Bar new_bar(num);
// auto new_bar = std::make_unique<Bar>(num);
l.lock();
assert(bars_.find(tid) == bars_.end());
found = bars_.emplace(tid, std::move(new_bar)).first;
}
return found->second;
// return *found->second;
}
};
void loop() {
per_thread_bar_pool bars;
vector<int> idxs(32);
iota(begin(idxs), end(idxs), 0);
for_each(__pstl::execution::par, begin(idxs), end(idxs), [&](int) {
Bar& current_bar = bars.get(num);
// ...
});
}
thread_specific_ptr already uses std::map<> under the hood (it maintains one per thread). So introducing one here is not that big of a deal.
We do introduce a mutex, but it only comes into play for a simple lookup/insertion into a map, and since constructing Bar is supposed to be so expensive, it will most likely have very little impact. It also has the benefit that multiple instances of Foo do not interact with each other anymore, so you avoid surprising bugs that could occur if you ever end up calling Foo::loop() from multiple threads.
N.B.: if (b.num != num) { is not a valid test since all instances of Bar from a given Foo share the same num. That should only cause false-negatives though.
Solution: Making your code work (almost)
All this being said, if you are absolutely gung-ho about using thread_specific_ptr and execution::par at the same time, you'll have to do the following:
void loop() {
static boost::thread_specific_ptr<Bar> ptr; // lives till the end of the program
vector<int> idxs(32);
iota(begin(idxs), end(idxs), 0);
for_each(__pstl::execution::par, begin(idxs), end(idxs), [&](int) {
if (ptr.get() == nullptr || ptr->num != num) {
// no `Bar` exists for this thread yet, or it's from a previous run
Bar *tmp = new Bar(num);
ptr.reset(tmp);
}
// Get the thread-local Bar
Bar &b = *ptr;
});
}
However, this will leak up to one Bar per thread, as cleanup only ever happens when we try to reuse a Bar from a previous run. There is no way around this.

Related

C++ Destruction Dependencies

I have a C++ class that needs to track resource usage, and when the last instance referencing the specific resource is destructed, the resource must be released. Due to the nature of how this resource can be acquired by multiple different objects (and not just through a copy / move constructor), I've had to implement my own reference tracking.
This reference tracking works great with very little performance hit, but when I started adding threading, I had to add a critical section to guard accesses to the reference counting structure. The critical section is a static member of the class that uses it. This works fine until the process begins exiting, and it's time for everything's destructor to be called.
What's happening is that the critical section's destructor (which calls DeleteCriticalSection) is being called before the last destructor of my objects. The result is that I'm stuck with either possible race conditions on my reference counter or a crash from trying to enter an invalid critical section.
These objects have a clear dependency on this critical section, and I don't see a great way to prevent it from being destroyed until the last object is gone. My thinking is to change the critical section from a static member to a std::shared_ptr<CriticalSection> belonging to each instance of the class, but that seems like it'd have an unnecessary performance hit.
Is there some other way to outline this dependency? Or is there a better way to do what I'm trying to do without a need for this dependency in the first place?
EDIT: To be clear, I tried using std::shared_ptr to handle reference tracking. Unfortunately, that doesn't work. Here's a trivialized example of how it causes issues.
Object Get(){
Object o1{ GetResourceIdentifier(3) };
Object o2{ o1 }; // Copy constructor
Object o3{ GetResourceIdentifier(3) };
return o3;
}
void main(){
auto test{ Get() };
test.DoStuff();
}
Necessarily, when an object is instantiated, it will just open the same resource if it's already open. So o1, o2, and o3 will all refer to the same underlying resource in this example. But with std::shared_ptr, when Get returns, the shared pointer that o1 and o2 have will think that there are no references left and release the resource. Unfortunately, since o3 refers to the same resource, its resource also gets freed here, meaning the call to DoStuff will go awry.
If you'd like to see the actual code (file is rather large), the source is here and the header is here
If I understand your use case correctly, you can use shared_ptr, but the trick is to store weak_ptrs; otherwise your resource will not be freed until the end of the program:
struct Resource
{
int id;
Resource(int id):id{id}{};
~Resource() { std::cout << "~" << id << std::endl; }
auto foo() { std::cout << "foo" << id << std::endl; }
};
std::shared_ptr<Resource> get_resource(int resource_id)
{
static std::mutex mutex{};
static std::unordered_map<int, std::weak_ptr<Resource>> resource_map{};
std::scoped_lock lock{mutex};
auto& weak = resource_map[resource_id];
if (!weak.lock())
{
auto shared = std::make_shared<Resource>(resource_id);
weak = shared;
return shared;
}
return weak.lock();
}
This works as follows. When you request a resource i:
if there isn't any resource i with active owners: it creates a new resource, returns a new shared_ptr and stores a weak_ptr to it
if there is a resource i with active owners: it returns a new shared_ptr to it
A resource i:
will be created on the first get_resource(i) call
will be deleted once there are no more owners to it (get_resource doesn't hold owners)
once deleted a new resource will be recreated on the next get_resource(i) call
there will never be more than one resource i at the same time
This seems to work, but be advised I have done only summary testing:
auto test()
{
auto r0 = get_resource(1);
auto r1 = get_resource(24);
auto r2 = r1;
auto r3 = get_resource(24);
return r3;
}
int main()
{
auto r = test();
r->foo();
}
Outputs
~1
foo24
~24

Is this inter-thread object sharing strategy sound?

I'm trying to come up with a fast way of solving the following problem:
I have a thread which produces data, and several threads which consume it. I don't need to queue produced data, because data is produced much more slowly than it is consumed (and even if this failed to be the case occasionally, it wouldn't be a problem if a data point were skipped occasionally). So, basically, I have an object that encapsulates the "most recent state", which only the producer thread is allowed to update.
My strategy is as follows (please let me know if I'm completely off my rocker):
I've created three classes for this example: Thing (the actual state object), SharedObject<Thing> (an object that can be local to each thread, and gives that thread access to the underlying Thing), and SharedObjectManager<Thing>, which wraps up a shared_ptr along with a mutex.
The instance of the SharedObjectManager (SOM) is a global variable.
When the producer starts, it instantiates a Thing, and tells the global SOM about it. It then makes a copy, and does all of its updating work on that copy. When it is ready to commit its changes to the Thing, it passes the new Thing to the global SOM, which locks its mutex, updates the shared pointer it keeps, and then releases the lock.
Meanwhile, the consumer threads all instantiate SharedObject<Thing>. These objects each keep a pointer to the global SOM, as well as a cached copy of the shared_ptr kept by the SOM. Each object keeps this cached copy until update() is explicitly called.
I believe this is getting hard to follow, so here's some code:
#include <mutex>
#include <iostream>
#include <memory>
class Thing
{
private:
int _some_member = 10;
public:
int some_member() const { return _some_member; }
void some_member(int val) {_some_member = val; }
};
// one global instance
template<typename T>
class SharedObjectManager
{
private:
std::shared_ptr<T> objPtr;
std::mutex objLock;
public:
std::shared_ptr<T> get_sptr()
{
std::lock_guard<std::mutex> lck(objLock);
return objPtr;
}
void commit_new_object(std::shared_ptr<T> new_object)
{
std::lock_guard<std::mutex> lck (objLock);
objPtr = new_object;
}
};
// one instance per consumer thread.
template<typename T>
class SharedObject
{
private:
SharedObjectManager<T> * som;
std::shared_ptr<T> cache;
public:
SharedObject(SharedObjectManager<T> * backend) : som(backend)
{update();}
void update()
{
cache = som->get_sptr();
}
T & operator *()
{
return *cache;
}
T * operator->()
{
return cache.get();
}
};
// no actual threads in this test, just a quick sanity check.
SharedObjectManager<Thing> glbSOM;
int main(void)
{
glbSOM.commit_new_object(std::make_shared<Thing>());
SharedObject<Thing> myobj(&glbSOM);
std::cout<<myobj->some_member()<<std::endl;
// prints "10".
}
The idea for use by the producer thread is:
// initialization - on startup
auto firstStateObj = std::make_shared<Thing>();
glbSOM.commit_new_object(firstStateObj);
// main loop
while (1)
{
// invoke copy constructor to copy the current live Thing object
auto nextState = std::make_shared<Thing>(*(glbSOM.get_sptr()));
// do stuff to nextState, gradually filling out its new value
// based on incoming data from other sources, etc.
...
// commit the changes to the shared memory location
glbSOM.commit_new_object(nextState);
}
The use by consumers would be:
SharedObject<Thing> thing(&glbSOM);
while(1)
{
// think about the data contained in thing, and act accordingly...
doStuffWith(thing->some_member());
// re-cache the thing
thing.update();
}
Thanks!
That is way overengineered. Instead, I'd suggest doing the following:
Create a pointer, Thing* theThing, together with a mutex that protects it. It can be either a global one, or shared by some other means. Initialize it to nullptr.
In your producer: use two local objects of Thing type, Thing thingOne and Thing thingTwo (remember, thingOne is no better than thingTwo, but one is called thingOne for a reason, but this is a thing thing. Watch out for cats.). Start by populating thingOne. When done, lock the mutex, copy the address of thingOne to theThing, unlock the mutex. Start populating thingTwo. When done, see above. Repeat until killed.
In every listener: lock the mutex. Make sure the pointer is not nullptr, then make a copy of the object pointed to by theThing. Unlock the mutex. Work with your copy. Burn after reading. Repeat until killed.
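The steps above can be sketched roughly as follows. The names LatestThing, publish, copy_latest and produce_rounds are hypothetical, and Thing is reduced to a single public member for brevity.

```cpp
#include <mutex>

struct Thing {
    int some_member = 10;
};

// Holds a pointer to the most recently published Thing, guarded by a mutex.
class LatestThing {
public:
    // Producer side: publish a fully populated Thing.
    void publish(const Thing* t) {
        std::lock_guard<std::mutex> lock(mtx_);
        current_ = t;
    }

    // Consumer side: copy the published Thing into `out`.
    // Returns false if nothing has been published yet.
    bool copy_latest(Thing& out) const {
        std::lock_guard<std::mutex> lock(mtx_);
        if (current_ == nullptr) return false;
        out = *current_;
        return true;
    }

private:
    mutable std::mutex mtx_;
    const Thing* current_ = nullptr;
};

// Producer: fill the buffer that is NOT currently published, publish it,
// then switch to the other buffer for the next round.
void produce_rounds(LatestThing& shared, Thing (&buffers)[2], int rounds) {
    for (int i = 0; i < rounds; ++i) {
        Thing& next = buffers[i % 2];
        next.some_member = i;  // stand-in for the real update work
        shared.publish(&next);
    }
}
```

Consumers only ever read the published buffer, and only while holding the mutex, so the producer can fill the other buffer without locking. Both buffers must outlive every consumer.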

Non blocking way of adding a work item to array or list

Edit:
I now have finished my queue (overcoming the problem described below, and more). For those interested it can be found here. I'd be happy to hear any remarks:). Please note the queue isn't just a work item queue, but rather a template container which of course could be instantiated with work items.
Original:
After watching Herb Sutter's talk on concurrency in C++11 and 14 I got all excited about non blocking concurrency.
However, I've not yet been able to find a solution for what I considered a basic problem. So if this is already on here, please be gentle with me.
My problem is quite simple. I'm creating a very simple threadpool. In order to do this I've got some worker threads running inside the workPool class. And I keep a list of workItems.
How do I add a work item in a lock-free way?
The non-lock-free way of doing this would of course be to create a mutex: lock it when you add an item, and lock it again to read the list once the current work item is done.
I do not know how to do this in a lock-free way, however.
Below is a rough idea of what I'm creating. I've written this code for this question, and it's neither complete nor error-free :)
#include <thread>
#include <deque>
#include <vector>
class workPool
{
public:
workPool(int workerCount) :
running(1)
{
for (int i = workerCount; i > 0; --i)
workers.push_back(std::thread(&workPool::doWork, this));
}
~workPool()
{
running = 0;
}
private:
bool running;
std::vector< std::thread > workers;
std::deque< std::function<void()> > workItems;
void doWork()
{
while (running)
{
(*workItems.begin())();
workItems.erase(workItems.begin());
if (!workItems.size())
//here the thread should be paused till a new item is added
}
}
void addWorkitem()
{
//This is my confusion. How should I do this?
}
};
I have seen Herb's talks recently and I believe his lock-free linked list should do fine. The only problem is that atomic< shared_ptr<T> > is not yet implemented. I've used the atomic_* function calls as also explained by Herb in his talk.
In the example, I've simplified a task to an int, but it could be anything you want.
The function atomic_compare_exchange_weak takes three arguments: the item to compare, the expected value and the desired value. It returns true or false to indicate success or failure. On failure, the expected value will be changed to the value that was found instead.
#include <memory>
#include <atomic>
using std::shared_ptr; // the code below names shared_ptr unqualified
// Untested code.
struct WorkItem { // Simple linked list implementation.
int work;
shared_ptr<WorkItem> next; // remember to use as atomic
};
class WorkList {
shared_ptr<WorkItem> head; // remember to use as atomic
public:
// Used by producers to add work to the list. This implementation adds
// new items to the front (stack), but it can easily be changed to a queue.
void push_work(int work) {
shared_ptr<WorkItem> p(new WorkItem()); // The new item we want to add.
p->work = work;
p->next = head;
// Do we get to change head to p?
while (!atomic_compare_exchange_weak(&head, &p->next, p)) {
// Nope, someone got there first, try again with the new p->next,
// and remember: p->next is automatically changed to the new value of head.
}
// Yup, great! Everything's done then.
}
// Used by consumers to claim items to process.
int pop_work() {
auto p = atomic_load(&head); // The item we want to process.
int work = (p ? p->work : -1);
// Do we get to change head to p->next?
while (p && !atomic_compare_exchange_weak(&head, &p, p->next)) {
// Nope, someone got there first, try again with the new p,
// and remember: p is automatically changed to the new value of head.
work = (p ? p->work : -1); // Make sure to update work as well!
}
// Yup, great! Everything's done then, return the new task.
return work; // Returns -1 if list is empty.
}
};
Edit: The reason for using shared_ptr in combination with atomic_* functions is explained in the talk. In a nutshell: popping an item from the linked list might delete it from underneath someone traversing the list, or a different node might get allocated on the same memory address (The ABA Problem). Using shared_ptr will ensure any old readers will hold a valid reference to the original item.
As Herb explained, this makes the pop-function trivial to implement.
Lock free in this kind of context where you have a shared resource (a work queue) is often going to be replaced by atomics and a CAS loop if you really dig deep.
The basic idea is rather simple to get a lock-free concurrent stack (edit: though perhaps a bit deceptively tricky as I made a goof in my first post -- all the more reason to appreciate a good lib). I chose a stack for simplicity but it doesn't take much more to use a queue instead.
Writing to the stack:
Create a new work item.
Loop Repeatedly:
Store the top pointer to the stack.
Set the work item's next pointer to the top of the stack.
Atomic: Compare and swap the top pointer with the pointer to the work item.
If this succeeds and returns the top pointer we stored, break out
of the loop.
Popping from the stack:
Loop:
Fetch top pointer.
If top pointer is not null:
Atomic: CAS top pointer with next pointer.
If successful, break.
Else:
(Optional) Sleep/Yield to avoid burning cycles.
Process the item pointed to by the previous top pointer.
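A minimal sketch of the push loop above using std::atomic follows. WorkStack, Node and pop_all are hypothetical names. For popping, this sketch deliberately swaps in a simpler variant: it detaches the whole list with one atomic exchange instead of doing a CAS per item, which sidesteps the ABA and memory-reclamation problems that a raw-pointer per-item pop would have to solve.

```cpp
#include <atomic>
#include <vector>

struct Node {
    int work;
    Node* next;
};

class WorkStack {
public:
    ~WorkStack() {
        // Free whatever is still queued.
        for (Node* n = top_.load(); n != nullptr;) {
            Node* next = n->next;
            delete n;
            n = next;
        }
    }

    // Push: point the new node at the current top, then CAS it in.
    // On failure, compare_exchange_weak reloads n->next with the new top.
    void push(int work) {
        Node* n = new Node{work, top_.load(std::memory_order_relaxed)};
        while (!top_.compare_exchange_weak(n->next, n,
                                           std::memory_order_release,
                                           std::memory_order_relaxed)) {
            // Another thread changed the top first; retry.
        }
    }

    // Detach every queued item at once; newest item comes first.
    std::vector<int> pop_all() {
        Node* n = top_.exchange(nullptr, std::memory_order_acquire);
        std::vector<int> items;
        while (n != nullptr) {
            items.push_back(n->work);
            Node* next = n->next;
            delete n;  // safe: the detached list is private to this thread
            n = next;
        }
        return items;
    }

private:
    std::atomic<Node*> top_{nullptr};
};
```

A worker would call pop_all(), process the returned items, and sleep or yield when the result is empty.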
Now if you get really elaborate, you can stick in other work for the thread to do when a push or pop fails, e.g.
I do not know how to do this in C++11 (or later); however, here is a solution for how to do it with C++98 and boost (v1.50):
This is obviously not a very useful example, it's only for demonstrative purposes:
#include <boost/scoped_ptr.hpp>
#include <boost/function.hpp>
#include <boost/bind.hpp>
#include <boost/asio/io_service.hpp>
#include <boost/thread.hpp>
class WorkHandler
{
public:
WorkHandler();
~WorkHandler();
typedef boost::function<void(void)> Work; // the type of work we can handle
void AddWork(Work w) { pThreadProcessing->post(w); }
private:
void ProcessWork();
boost::scoped_ptr<boost::asio::io_service> pThreadProcessing;
bool runThread; // Make sure this is atomic. Declared before `thread` so it is initialized before the thread starts.
boost::thread thread;
};
WorkHandler::WorkHandler()
: pThreadProcessing(new boost::asio::io_service), // create our io service
runThread(true), // set the flag before the thread starts reading it
thread(&WorkHandler::ProcessWork, this) // create our thread
{
}
WorkHandler::~WorkHandler()
{
runThread = false; // stop running the thread
thread.join(); // wait for the thread to finish
}
void WorkHandler::ProcessWork()
{
while (runThread) // while the thread is running
{
pThreadProcessing->run(); // process work
pThreadProcessing->reset(); // prepare for more work
}
}
int CalculateSomething(int a, int b)
{
return a + b;
}
int main()
{
WorkHandler wh; // create a work handler
// give it some work to do
wh.AddWork(boost::bind(&CalculateSomething, 4, 5));
wh.AddWork(boost::bind(&CalculateSomething, 10, 100));
wh.AddWork(boost::bind(&CalculateSomething, 35, -1));
Sleep(2000); // ONLY for demonstration! This just allows the thread a chance to work before we destroy it.
return 0;
}
boost::asio::io_service is thread-safe, so you can post work to it without needing mutexes.
NB: Although I haven't made the bool runThread atomic, for thread-safety it should be (I just don't have atomic in my c++)

Add and remove from a list in runtime

I have a simulation program. In the main class of the simulation I am "creating + adding" and "removing + destroying" Agents.
The problem is that once in a while (once every 3-4 times I run the program) the program crashes because I am apparently calling a function of an invalid agent in the main loop. The program works just fine most of the time. There are normally thousands of agents in the list.
I don't know how it is possible that I have invalid Agents in my loop.
It is very difficult to debug the code because I receive the memory exception inside the "Agent::Step function" (which is too late because I cannot understand how was the invalid Agent in the list and got called).
When I look into the Agent reference inside the Agent::Step function (exception point) no data in the agent makes sense, not even the initialized data. So it is definitely invalid.
void World::step()
{
AddDemand();
// run over all the agents and check whether they have remaining actions
// Call their step function if they have, otherwise remove them from space and memory
list<Agent*>::iterator it = agents_.begin();
while (it != agents_.end())
{
if (!(*it)->AllIntentionsFinished())
{
(*it)->step();
it++;
}
else
{
(*it)->removeYourselfFromSpace(); //removes its reference from the space
delete (*it);
agents_.erase(it++);
}
}
}
void World::AddDemand()
{
int demand = demandIdentifier_.getDemandLevel(stepCounter_);
for (int i = 0; i < demand; i++)
{
Agent* tmp = new Agent(*this);
agents_.push_back(tmp);
}
}
Agent:
bool Agent::AllIntentionsFinished()
{
return this->allIntentionsFinished_; //bool flag will be true if all work is done
}
1- Is it possible that VStudio 2012 optimization of Loops (i.e. running in multi-thread if possible) creates the problem?
2- Any suggestions on debugging the code?
If you're running the code multi-threaded, then you'll need to add code to protect things like adding items to and removing items from the list. You can create a wrapper that adds thread safety for a container fairly easily -- have a mutex that you lock any time you do a potentially modifying operation on the underlying container.
template <class Container>
class thread_safe {
Container c;
std::mutex m;
public:
void push_back(typename Container::value_type const &t) {
std::lock_guard<std::mutex> l(m);
c.push_back(t);
}
// ...
};
A few other points:
You can almost certainly clean your code up quite a bit by having the list hold Agents directly, instead of a pointer to an Agent that you have to allocate dynamically.
Your Agent::removeYourselfFromSpace looks/sounds a lot like something that should be handled by Agent's destructor.
You can almost certainly do quite a bit more to clean up the code by using some standard algorithms.
For example, it looks to me like your step could be written something like this:
agents.remove_if([](Agent const &a) { return a.AllIntentionsFinished(); });
std::for_each(agents.begin(), agents.end(),
[](Agent &a) { a.step(); });
...or, you might prefer to continue using an explicit loop, but use something like:
for (Agent & a : agents)
a.step();
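Put together, the suggestions above might look like the following sketch. It assumes the list holds Agent objects by value and that AllIntentionsFinished is made const. The Agent shown here is a hypothetical stand-in with a simple step counter, and step_world is a hypothetical free function standing in for World::step.

```cpp
#include <list>

// Hypothetical stand-in for the real Agent: finishes after a fixed number of steps.
class Agent {
public:
    explicit Agent(int steps) : remaining_(steps) {}
    bool AllIntentionsFinished() const { return remaining_ == 0; }
    void step() { --remaining_; }

private:
    int remaining_;
};

// One simulation step: drop finished agents, then step the rest.
// Storing Agent by value means erasing an element also destroys it,
// so there is no separate delete to get out of sync with the list.
void step_world(std::list<Agent>& agents) {
    agents.remove_if([](Agent const& a) { return a.AllIntentionsFinished(); });
    for (Agent& a : agents) {
        a.step();
    }
}
```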
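One line that often draws suspicion is this:
agents_.erase(it++);
For a std::list this is actually valid, since the iterator is advanced before the old element is erased; `it = agents_.erase(it);` is the more common spelling and also works for other containers.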
See Add and remove from a list in runtime
I don't see any thread-safe components in the code you showed, so if you are running multiple threads and sharing data between them, then absolutely you could have a threading issue. For instance, you do this:
(*it)->removeYourselfFromSpace(); //removes its reference from the space
delete (*it);
agents_.erase(it++);
This is the worst possible order for an unlocked list. You should: remove from the list, destruct object, delete object, in that order.
But if you are not specifically creating threads which share lists/agents, then threading is probably not your problem.

Need some advice to make the code multithreaded

I received code that was not written for a multi-threaded app, and now I have to modify it to support multi-threading.
I have a Singleton class (MyCenterSigltonClass), which I made thread-safe following the instructions at:
http://en.wikipedia.org/wiki/Singleton_pattern
The class contains 10-12 members, some with getter/setter methods.
Some members are declared as static and are class pointer like:
static Class_A* f_static_member_a;
static Class_B* f_static_member_b;
For these members, I defined a mutex (like mutex_a) inside the pointed-to class (Class_A) rather than directly in MyCenterSigltonClass, because each has a one-to-one association with MyCenterSigltonClass. I think I have the option of defining the mutex for f_static_member_a in either MyCenterSigltonClass or Class_A.
1) Am I right?
Also, my Singleton class(MyCenterSigltonClass) contains some other members like
Class_C f_classC;
For these kinds of member variables, should I define a mutex for each of them in MyCenterSigltonClass to make them thread-safe? What would be a good way to handle these cases?
I'd appreciate any suggestions.
-Nima
Whether the members are static or not doesn't really matter. How you protect the member variables really depends on how they are accessed from public methods.
You should think about a mutex as a lock that protects some resource from concurrent read/write access. You don't need to think about protecting the internal class objects necessarily, but the resources within them. You also need to consider the scope of the locks you'll be using, especially if the code wasn't originally designed to be multithreaded. Let me give a few simple examples.
class A
{
private:
int mValuesCount;
int* mValues;
public:
A(int count, int* values)
{
mValuesCount = count;
mValues = (count > 0) ? new int[count] : NULL;
if (mValues)
{
memcpy(mValues, values, count * sizeof(int));
}
}
int getValues(int count, int* values) const
{
if (mValues && values)
{
memcpy(values, mValues, ((count < mValuesCount) ? count : mValuesCount) * sizeof(int));
}
return mValuesCount;
}
};
class B
{
private:
A* mA;
public:
B()
{
int values[5] = { 1, 2, 3, 4, 5 };
mA = new A(5, values);
}
const A* getA() const { return mA; }
};
In this code, there's no need to protect mA because there's no chance of conflicting access across multiple threads. None of the threads can modify the state of mA, so all concurrent access just reads from mA. However, if we modify class A:
class A
{
private:
int mValuesCount;
int* mValues;
public:
A(int count, int* values)
{
mValuesCount = 0;
mValues = NULL;
setValues(count, values);
}
int getValues(int count, int* values) const
{
if (mValues && values)
{
memcpy(values, mValues, ((count < mValuesCount) ? count : mValuesCount) * sizeof(int));
}
return mValuesCount;
}
void setValues(int count, int* values)
{
delete [] mValues;
mValuesCount = count;
mValues = (count > 0) ? new int[count] : NULL;
if (mValues)
{
memcpy(mValues, values, count * sizeof(int));
}
}
};
We can now have multiple threads calling B::getA() and one thread can read from mA while another thread writes to mA. Consider the following thread interaction:
Thread A: a->getValues(maxCount, values);
Thread B: a->setValues(newCount, newValues);
It's possible that Thread B will delete mValues while Thread A is in the middle of copying it. In this case, you would need a mutex within class A to protect access to mValues and mValuesCount:
int getValues(int count, int* values) const
{
// TODO: Lock mutex.
if (mValues && values)
{
memcpy(values, mValues, ((count < mValuesCount) ? count : mValuesCount) * sizeof(int));
}
int returnCount = mValuesCount;
// TODO: Unlock mutex.
return returnCount;
}
void setValues(int count, int* values)
{
// TODO: Lock mutex.
delete [] mValues;
mValuesCount = count;
mValues = (count > 0) ? new int[count] : NULL;
if (mValues)
{
memcpy(mValues, values, count * sizeof(int));
}
// TODO: Unlock mutex.
}
This will prevent concurrent read/write on mValues and mValuesCount. Depending on the locking mechanisms available in your environment, you may be able to use a read-only locking mechanism in getValues() to prevent multiple threads from blocking on concurrent read access.
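As one way to fill in the TODO markers above, here is a minimal sketch using std::mutex and std::lock_guard from C++11. The member name mMutex is an assumption, as are the added destructor and deleted copy operations. A reader/writer lock such as std::shared_mutex (C++17) could replace the plain mutex to let several readers proceed at once.

```cpp
#include <algorithm>
#include <cstring>
#include <mutex>

class A {
public:
    A(int count, const int* values) { setValues(count, values); }
    ~A() { delete[] mValues; }
    A(const A&) = delete;
    A& operator=(const A&) = delete;

    // Copies up to `count` values into `values`; returns the stored count.
    int getValues(int count, int* values) const {
        std::lock_guard<std::mutex> lock(mMutex);  // released on return
        if (mValues && values && count > 0) {
            std::memcpy(values, mValues,
                        std::min(count, mValuesCount) * sizeof(int));
        }
        return mValuesCount;
    }

    // Replaces the stored values with a copy of the given array.
    void setValues(int count, const int* values) {
        std::lock_guard<std::mutex> lock(mMutex);
        delete[] mValues;
        mValuesCount = (count > 0 && values) ? count : 0;
        mValues = mValuesCount ? new int[mValuesCount] : nullptr;
        if (mValues) {
            std::memcpy(mValues, values, mValuesCount * sizeof(int));
        }
    }

private:
    mutable std::mutex mMutex;  // mutable so const getters can lock it
    int mValuesCount = 0;
    int* mValues = nullptr;
};
```

Using lock_guard rather than explicit lock/unlock calls means the mutex is released on every return path, including when an exception is thrown.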
However, you'll also need to understand the scope of the locking you need to implement if class A is more complex:
class A
{
private:
int mValuesCount;
int* mValues;
public:
A(int count, int* values)
{
mValuesCount = 0;
mValues = NULL;
setValues(count, values);
}
int getValueCount() const { return mValuesCount; }
int getValues(int count, int* values) const
{
if (mValues && values)
{
memcpy(values, mValues, ((count < mValuesCount) ? count : mValuesCount) * sizeof(int));
}
return mValuesCount;
}
void setValues(int count, int* values)
{
delete [] mValues;
mValuesCount = count;
mValues = (count > 0) ? new int[count] : NULL;
if (mValues)
{
memcpy(mValues, values, count * sizeof(int));
}
}
};
In this case, you could have the following thread interaction:
Thread A: int maxCount = a->getValueCount();
Thread A: // allocate memory for "maxCount" int values
Thread B: a->setValues(newCount, newValues);
Thread A: a->getValues(maxCount, values);
Thread A has been written as though calls to getValueCount() and getValues() will be an uninterrupted operation, but Thread B has potentially changed the count in the middle of Thread A's operations. Depending on whether the new count is larger or smaller than the original count, it may take a while before you discover this problem. In this case, class A would need to be redesigned or it would need to provide some kind of transaction support so the thread using class A could block/unblock other threads:
Thread A: a->lockValues();
Thread A: int maxCount = a->getValueCount();
Thread A: // allocate memory for "maxCount" int values
Thread B: a->setValues(newCount, newValues); // Blocks until Thread A calls unlockValues()
Thread A: a->getValues(maxCount, values);
Thread A: a->unlockValues();
Thread B: // Completes call to setValues()
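One way to read the "redesign" option is to stop exposing the count and the values through separate calls. The sketch below is only an assumption about how that might look: the class name SnapshotValues and its snapshot() method are hypothetical, and it hands out a copy instead of offering lockValues()/unlockValues().

```cpp
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical redesign: callers never ask for the count separately, so
// there is no window between "get count" and "get values" for another
// thread to slip into.
class SnapshotValues
{
public:
    void setValues(std::vector<int> values)
    {
        std::lock_guard<std::mutex> lock(mMutex);
        mValues = std::move(values);
    }

    // Returns the count (as size()) and the contents together, both read
    // under the same lock.
    std::vector<int> snapshot() const
    {
        std::lock_guard<std::mutex> lock(mMutex);
        return mValues;
    }

private:
    mutable std::mutex mMutex;
    std::vector<int> mValues;
};
```

Thread A would call snapshot() once and work on its private copy. The trade-off is an extra copy per read, and the copy can be out of date by the time it is used.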
Since the code wasn't initially designed to be multithreaded, it's very likely you'll run into these kinds of issues where one method call uses information from an earlier call, but there was never a concern for the state of the object changing between those calls.
Now, begin to imagine what could happen if there are complex state dependencies among the objects within your singleton and multiple threads can modify the state of those internal objects. It can all become very, very messy with a large number of threads and debugging can become very difficult.
So as you try to make your singleton thread-safe, you need to look at several layers of object interactions. Some good questions to ask:
Do any of the methods on the singleton reveal internal state that may change between method calls (as in the last example I mention)?
Are any of the internal objects revealed to clients of the singleton?
If so, do any of the methods on those internal objects reveal internal state that may change between method calls?
If internal objects are revealed, do they share any resources or state dependencies?
You may not need any locking if you're just reading state from internal objects (first example). You may need to provide simple locking to prevent concurrent read/write access (second example). You may need to redesign the classes or provide clients with the ability to lock object state (third example). Or you may need to implement more complex locking where internal objects share state information across threads (e.g. a lock on a resource in class Foo requires a lock on a resource in class Bar, but locking that resource in class Bar doesn't necessarily require a lock on a resource in class Foo).
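For the last case, where one operation needs locks on two objects, a common hazard is two threads taking the same pair of locks in opposite orders. A minimal C++11 sketch, assuming hypothetical FooResource and BarResource types that each own a mutex:

```cpp
#include <mutex>

// Hypothetical resources, each guarded by its own mutex.
struct BarResource
{
    std::mutex mutex;
    int value = 0;
};

struct FooResource
{
    std::mutex mutex;
    int value = 0;
};

// Needs both resources. std::lock acquires the two mutexes with a
// deadlock-avoidance algorithm, so the argument order does not matter.
void moveFromBarToFoo(FooResource& foo, BarResource& bar, int amount)
{
    std::lock(foo.mutex, bar.mutex);
    // adopt_lock: the guards take ownership of the already-locked mutexes
    // and release them when the function returns.
    std::lock_guard<std::mutex> fooLock(foo.mutex, std::adopt_lock);
    std::lock_guard<std::mutex> barLock(bar.mutex, std::adopt_lock);
    bar.value -= amount;
    foo.value += amount;
}

// Needs only the Bar resource, so it takes only that lock.
void addToBar(BarResource& bar, int amount)
{
    std::lock_guard<std::mutex> lock(bar.mutex);
    bar.value += amount;
}
```

In C++17, std::scoped_lock can replace the std::lock plus adopt_lock pattern with a single declaration.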
Implementing thread-safe code can become a complex task depending on how all your objects interact. It can be much more complicated than the examples I've given here. Just be sure you clearly understand how your classes are used and how they interact (and be prepared to spend some time tracking down difficult to reproduce bugs).
If this is the first time you're doing threading, consider not accessing the singleton from the background thread. You can get it right, but you probably won't get it right the first time.
Realize that if your singleton exposes pointers to other objects, these should be made thread safe as well.
You don't have to define a mutex for each member. For example, you could instead use a single mutex to synchronize access to each member, e.g.:
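class foo
{
public:
    ...
    void some_op()
    {
        // acquire "lock_" and release using RAII ...
        // The guard needs a name: "Lock(lock_);" would declare a new
        // variable called lock_ instead of locking the member.
        Lock guard(lock_);
        a_++;
    }
    void set_b(bar * b)
    {
        // acquire "lock_" and release using RAII ...
        Lock guard(lock_);
        b_ = b;
    }
private:
    int a_;
    bar * b_;
    mutex lock_;
};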
Of course a "one lock" solution may not be suitable in your case. That's up to you to decide. Regardless, simply introducing locks doesn't make the code thread-safe. You have to use them in the right place in the right way to avoid race conditions, deadlocks, etc. There are lots of concurrency issues you could run into.
Furthermore you don't always need mutexes, or other threading mechanisms like TSS, to make code thread-safe. For example, the following function "func" is thread-safe:
class Foo;
void func (Foo & f)
{
    f.some_op(); // Foo::some_op() of course needs to be thread-safe.
}
// Thread 1
Foo a;
func(a); // func takes a reference, so pass the object itself
// Thread 2
Foo b;
func(b);
While the func function above is thread-safe the operations it invokes may not be thread-safe. The point is you don't always need to pepper your code with mutexes and other threading mechanisms to make the code thread safe. Sometimes restructuring the code is sufficient.
There's a lot of literature on multithreaded programming. It's definitely not easy to get right so take your time in understanding the nuances, and take advantage of existing frameworks like Boost.Thread to mitigate some of the inherent and accidental complexities that exist in the lower-level multithreading APIs.
I'd really recommend the Interlocked... functions (increment, decrement, and compare-and-swap) for values that are shared between threads. I don't have first-hand C++ experience, but a quick search for http://www.bing.com/search?q=c%2B%2B+interlocked reveals lots of confirming advice. If you need performance, these will likely be faster than locking.
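The Interlocked functions are a Windows API. In portable C++11 the closest equivalent I'm aware of is std::atomic, so here is a small sketch on that assumption. The function names nextId and raiseMax are made up for the example.

```cpp
#include <atomic>

// Atomically increments the counter and returns the new value, similar in
// spirit to an interlocked increment.
int nextId(std::atomic<int>& counter)
{
    return counter.fetch_add(1) + 1; // fetch_add returns the previous value
}

// Raises maxValue to candidate if candidate is larger, using a
// compare-and-swap loop. Returns true if this call stored the candidate.
bool raiseMax(std::atomic<int>& maxValue, int candidate)
{
    int current = maxValue.load();
    while (candidate > current)
    {
        // On failure, compare_exchange_weak reloads `current` with the value
        // another thread stored, and the loop re-checks the condition.
        if (maxValue.compare_exchange_weak(current, candidate))
        {
            return true;
        }
    }
    return false;
}
```

Both functions can be called from several threads on the same std::atomic<int> without a mutex. This only works for single values, though. As soon as two members have to change together you are back to needing a lock.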
As stated by @Void, a mutex alone is not always the solution to a concurrency problem:
Regardless, simply introducing locks doesn't make the code
thread-safe. You have to use them in the right place in the right way
to avoid race conditions, deadlocks, etc. There are lots of
concurrency issues you could run in to.
I want to add another example:
class MyClass
{
    mutex m_mutex;
    AnotherClass m_anotherClass;
public:
    void setObject(const AnotherClass& anotherClass)
    {
        m_mutex.lock();
        m_anotherClass = anotherClass;
        m_mutex.unlock();
    }
    AnotherClass getObject()
    {
        AnotherClass anotherClass;
        m_mutex.lock();
        anotherClass = m_anotherClass; // copy made while the mutex is held
        m_mutex.unlock();
        return anotherClass;
    }
};
In this case getObject() is always safe: the copy is made while the mutex is held, and the caller, which may be a different class running on a different thread, receives its own copy of the object. The only caveat is that the copy may be stale, because another thread might have changed m_anotherClass by calling setObject() in the meantime. Now what if you turn m_anotherClass into a pointer instead of an object variable?
class MyClass
{
    mutex m_mutex;
    AnotherClass *m_anotherClass;
public:
    void setObject(AnotherClass *anotherClass)
    {
        m_mutex.lock();
        m_anotherClass = anotherClass;
        m_mutex.unlock();
    }
    AnotherClass * getObject()
    {
        AnotherClass *anotherClass;
        m_mutex.lock();
        anotherClass = m_anotherClass; // only the pointer is copied
        m_mutex.unlock();
        return anotherClass;
    }
};
This is an example where a mutex is not enough to solve all the problems.
With pointers you copy only the pointer; the pointed-to object is the same one in both the caller and the method. So even if the pointer was valid when getObject() was called, you have no guarantee that the pointed-to object will still exist while you are working with it, simply because you don't control the object's lifetime. That's why you should use object variables as much as possible and avoid pointers (if you can).
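If a pointer really is needed, one possible way around the lifetime problem is shared ownership. This is only a sketch under my own assumptions: SharedHolder is a hypothetical variant of MyClass, and AnotherClass is reduced to a single int so the example compiles.

```cpp
#include <memory>
#include <mutex>
#include <utility>

// Placeholder for the real AnotherClass.
struct AnotherClass
{
    int value = 0;
};

// Hypothetical variant of MyClass that stores a std::shared_ptr, so the
// pointed-to object stays alive for as long as any caller still holds it.
class SharedHolder
{
public:
    void setObject(std::shared_ptr<const AnotherClass> object)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_anotherClass = std::move(object);
    }

    std::shared_ptr<const AnotherClass> getObject() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_anotherClass; // copying the shared_ptr extends the lifetime
    }

private:
    mutable std::mutex m_mutex;
    std::shared_ptr<const AnotherClass> m_anotherClass;
};
```

The pointer-to-const is deliberate: the mutex protects only the shared_ptr member, not the object behind it, so handing out a mutable object would bring the data race back.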