Boost::Mutex in class not thread-safe - c++

I'm learning concurrent programming and what I want to do is have a class where each object is responsible for running its own boost::thread. I'm in a little over my head with this code because it uses A LOT of functionality that I'm not that comfortable with (dynamically allocated memory, function pointers, concurrency, etc.). For nearly every line of code I had to check some references to get it right.
(Yes, all allocated memory is accounted for in the real code!)
I'm having trouble with the mutex. I declared it static, and all the instances seem to share the same one (as they should). The code is STILL not thread-safe.
The mutex should stop the threads (right?) from progressing any further if another thread has locked it. Because the lock is scoped (kind of a neat feature) and sits inside the if statement, it should lock the other threads out, no? Still, I get console output that clearly suggests the code is not thread-safe.
Also, I'm not sure I'm using the static variable right. I tried different ways of referring to it (Seller::ticketSaleMutex), but the only thing that worked was "this->ticketSaleMutex", which seems very shady and seems to defeat the purpose of it being static.
Seller.h:
class Seller
{
public:
//Some variables
private:
//Other variables
static boost::mutex ticketSaleMutex; //Mutex declaration
};
Seller.cpp:
boost::mutex Seller::ticketSaleMutex; //Mutex definition
void Seller::StartTicketSale()
{
ticketSale = new boost::thread(boost::bind(&Seller::SellTickets, this));
}
void Seller::SellTickets()
{
while (*totalSoldTickets < totalNumTickets)
{
if ([Some time tick])
{
boost::mutex::scoped_lock(this->ticketSaleMutex);
(*totalSoldTickets)++;
std::cout << "Seller " << ID << " sold ticket " << *totalSoldTickets << std::endl;
}
}
}
main.cpp:
int main(int argc, char**argv)
{
std::vector<Seller*> seller;
const int numSellers = 10;
int numTickets = 40;
int *soldTickets = new int;
*soldTickets = 0;
for (int i = 0; i < numSellers; i++)
{
seller.push_back(new Seller(i, numTickets, soldTickets));
seller[i]->StartTicketSale();
}
}

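This creates an unnamed temporary lock that is destroyed at the end of the statement, so the mutex is released immediately:
boost::mutex::scoped_lock(this->ticketSaleMutex);
resulting in no synchronization. You need to declare a named variable so that the lock lives until the end of the enclosing scope:
boost::mutex::scoped_lock local_lock(this->ticketSaleMutex);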
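To make the difference concrete, here is a minimal sketch of the named-lock version. It uses std::mutex and std::lock_guard in place of the Boost types, and sellTickets and runSale are hypothetical stand-ins for the Seller code in the question, so treat it as an illustration rather than a drop-in fix:

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

std::mutex ticketSaleMutex;  // one mutex shared by all sellers, like the static member
int totalSoldTickets = 0;    // shared counter, only touched while the mutex is held

// Hypothetical stand-in for Seller::SellTickets: sells `n` tickets.
void sellTickets(int n) {
    for (int i = 0; i < n; ++i) {
        // Named guard: it stays alive until the closing brace of the loop body,
        // so the increment below happens while the mutex is held.
        std::lock_guard<std::mutex> lock(ticketSaleMutex);
        ++totalSoldTickets;
    }
}

// Hypothetical driver: starts `sellers` threads and returns the final count.
int runSale(int sellers, int ticketsEach) {
    totalSoldTickets = 0;
    std::vector<std::thread> threads;
    for (int s = 0; s < sellers; ++s)
        threads.emplace_back(sellTickets, ticketsEach);
    for (auto& t : threads)
        t.join();
    return totalSoldTickets;
}
```

With the named guard, runSale(10, 1000) should always come back as exactly 10000. Note that the while condition in the question still reads *totalSoldTickets outside the lock, so that check would need the same protection.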

Related

Willingly introducing a race condition on a 64-bit pointer value

I have an application object which can receive messages from multiple services running in multiple threads. The message gets dispatched internally by an instance of a dispatcher object in the threads of the services. The application can at any time change the current dispatcher. Dispatchers never get destroyed. The services never outlive the application.
Here's an example code
#include <iostream>
#include <thread>
#include <atomic>
#include <cstdlib>
#include <functional>
using namespace std;
using Msg = int;
struct Dispatcher
{
virtual ~Dispatcher() = default;
virtual void dispatchMessage(Msg msg) = 0;
};
struct DispatcherA : Dispatcher
{
void dispatchMessage(Msg msg)
{
cout << "Thread-safe dispatch of " << msg << " by A" << endl;
}
};
struct DispatcherB : Dispatcher
{
void dispatchMessage(Msg msg)
{
cout << "Thread-safe dispatch of " << msg << " by B" << endl;
}
};
struct Application
{
Application() : curDispatcher(&a) {}
void sendMessage(Msg msg)
{
// race here as this is called (and dereferenced) from many threads
// and can be changed by the main thread
curDispatcher->dispatchMessage(msg);
}
void changeDispatcher()
{
// race here, as this is changed while it can be dereferenced by many threads
if (rand() % 2) curDispatcher = &a;
else curDispatcher = &b;
}
atomic_bool running = true;
Dispatcher* curDispatcher; // race on this
DispatcherA a;
DispatcherB b;
};
void service(Application& app, int i) {
while (app.running) app.sendMessage(i++);
}
int main()
{
Application app;
std::thread t1(std::bind(service, std::ref(app), 1));
std::thread t2(std::bind(service, std::ref(app), 20));
for (int i = 0; i < 10000; ++i)
{
app.changeDispatcher();
}
app.running = false;
t1.join();
t2.join();
return 0;
}
I am aware that there is a race condition here. The curDispatcher pointer gets accessed by many threads and it can be changed at the same time by the main thread. It can be fixed by making the pointer atomic and explicitly loading it on every sendMessage call.
I don't want to pay the price of the atomic loads.
Can something bad happen because of this?
Here's what I can think of:
The value of curDispatcher can get cached by a service and it can always call the same one, even if the app has changed the value. I'm ok with that. If I stop being ok with that, I can make it volatile. Newly created services should be ok, anyway.
If this ever runs on a 32-bit CPU which emulates 64-bit, the writes and reads of the pointer will not be instruction-level atomic and it might lead to invalid pointer values and crashes: I am making sure that this only runs on 64-bit CPUs.
Destroying dispatchers isn't safe. As I said: I'm never destroying dispatchers.
???
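For reference, the fix mentioned in the question (an atomic pointer that is loaded on every call) could look roughly like the sketch below. AtomicApplication, useA and useB are hypothetical names, and dispatchMessage returns an int here only so the result is observable. Keep in mind that, as far as the C++ standard is concerned, the unsynchronized version is a data race and therefore undefined behavior regardless of the CPU, whereas an acquire load of an aligned pointer typically compiles to an ordinary load instruction on mainstream 64-bit targets such as x86-64:

```cpp
#include <atomic>
#include <cassert>

struct Dispatcher {
    virtual ~Dispatcher() = default;
    virtual int dispatchMessage(int msg) = 0;  // returns a value only for demonstration
};

struct DispatcherA : Dispatcher {
    int dispatchMessage(int msg) override { return msg + 1000; }
};

struct DispatcherB : Dispatcher {
    int dispatchMessage(int msg) override { return msg + 2000; }
};

// Hypothetical variant of the Application class from the question.
struct AtomicApplication {
    AtomicApplication() : curDispatcher(&a) {}

    int sendMessage(int msg) {
        // Load the pointer once, then use the local copy for the call.
        Dispatcher* d = curDispatcher.load(std::memory_order_acquire);
        return d->dispatchMessage(msg);
    }

    // Publish a new dispatcher; services pick it up on their next load.
    void useA() { curDispatcher.store(&a, std::memory_order_release); }
    void useB() { curDispatcher.store(&b, std::memory_order_release); }

    DispatcherA a;
    DispatcherB b;
    std::atomic<Dispatcher*> curDispatcher;
};
```

Calling useB() from the main thread while services call sendMessage() is then well defined: each call sees either the old or the new dispatcher, never a torn pointer.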

What advantage does the new feature, "synchronized" block, in C++ provide?

There's a new experimental feature (from the Transactional Memory TS, which has not been merged into the standard as of C++20), which is the "synchronized block". The block provides a global lock on a section of code. The following is an example from cppreference.
#include <iostream>
#include <vector>
#include <thread>
int f()
{
static int i = 0;
synchronized {
std::cout << i << " -> ";
++i;
std::cout << i << '\n';
return i;
}
}
int main()
{
std::vector<std::thread> v(10);
for(auto& t: v)
t = std::thread([]{ for(int n = 0; n < 10; ++n) f(); });
for(auto& t: v)
t.join();
}
I feel it's superfluous. Is there any difference between the synchronized block above and this one:
std::mutex m;
int f()
{
static int i = 0;
std::lock_guard<std::mutex> lg(m);
std::cout << i << " -> ";
++i;
std::cout << i << '\n';
return i;
}
The only advantage I find here is that I'm saved the trouble of having a global lock. Are there more advantages to using a synchronized block? When should it be preferred?
On the face of it, the synchronized keyword is functionally similar to std::mutex, but by introducing a new keyword and associated semantics (such as the block enclosing the synchronized region) it makes it much easier to optimize these regions for transactional memory.
In particular, std::mutex and friends are in principle more or less opaque to the compiler, while synchronized has explicit semantics. The compiler can't be sure what the standard library std::mutex does and would have a hard time transforming it to use TM. A C++ compiler would be expected to work correctly when the standard library implementation of std::mutex is changed, and so can't make many assumptions about the behavior.
In addition, without an explicit scope provided by the block that is required for synchronized, it is hard for the compiler to reason about the extent of the block - it seems easy in simple cases such as a single scoped lock_guard, but there are plenty of complex cases such as if the lock escapes the function at which point the compiler never really knows where it could be unlocked.
Locks do not compose well in general. Consider:
//
// includes and using, omitted to simplify the example
//
void move_money_from(Cash amount, BankAccount &a, BankAccount &b) {
//
// suppose a mutex m within BankAccount, exposed as public
// for the sake of simplicity
//
lock_guard<mutex> lckA { a.m };
lock_guard<mutex> lckB { b.m };
// oversimplified transaction, obviously
if (a.withdraw(amount))
b.deposit(amount);
}
int main() {
BankAccount acc0{/* ... */};
BankAccount acc1{/* ... */};
thread th0 { [&] {
// ...
move_money_from(Cash{ 10'000 }, acc0, acc1);
// ...
} };
thread th1 { [&] {
// ...
move_money_from(Cash{ 5'000 }, acc1, acc0);
// ...
} };
// ...
th0.join();
th1.join();
}
In this case, th0, by moving money from acc0 to acc1, tries to take acc0.m first and acc1.m second, whereas th1, by moving money from acc1 to acc0, tries to take acc1.m first and acc0.m second. This could make them deadlock.
This example is oversimplified, and could be solved by using std::lock() or C++17's variadic std::scoped_lock, but think of the general case where one is using third-party software, not knowing where locks are being taken or freed. In real-life situations, synchronization through locks gets tricky really fast.
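For the simple case, the std::lock() approach mentioned above could be sketched as follows. BankAccount here is a hypothetical minimal type with plain long amounts instead of Cash, and run_transfers is just a small driver for the example:

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical minimal account; the mutex is public for the sake of simplicity.
struct BankAccount {
    std::mutex m;
    long balance;
    explicit BankAccount(long initial) : balance(initial) {}
    bool withdraw(long amount) {
        if (balance < amount) return false;
        balance -= amount;
        return true;
    }
    void deposit(long amount) { balance += amount; }
};

void move_money_from(long amount, BankAccount& a, BankAccount& b) {
    // std::lock acquires both mutexes with a deadlock-avoidance algorithm,
    // so the order in which callers pass the accounts no longer matters.
    std::lock(a.m, b.m);
    // adopt_lock: the guards take over the already-held locks and release them.
    std::lock_guard<std::mutex> lckA(a.m, std::adopt_lock);
    std::lock_guard<std::mutex> lckB(b.m, std::adopt_lock);
    if (a.withdraw(amount))
        b.deposit(amount);
}

// Two threads transfer in opposite directions; returns the final balances.
std::pair<long, long> run_transfers() {
    BankAccount acc0{100000};
    BankAccount acc1{100000};
    std::thread th0([&] { for (int i = 0; i < 1000; ++i) move_money_from(10, acc0, acc1); });
    std::thread th1([&] { for (int i = 0; i < 1000; ++i) move_money_from(5, acc1, acc0); });
    th0.join();
    th1.join();
    return {acc0.balance, acc1.balance};
}
```

In C++17 the three locking lines can be replaced by a single std::scoped_lock lock(a.m, b.m);. Either way, this only works because move_money_from can see both mutexes, and passing the same account twice would lock one mutex twice, which is undefined behavior for std::mutex.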
The transactional memory features aim to offer synchronization that composes better than locks; it's an optimization feature of sorts, depending on context, but it's also a safety feature. Rewriting move_money_from() as follows:
void move_money_from(Cash amount, BankAccount &a, BankAccount &b) {
synchronized {
// oversimplified transaction, obviously
if (a.withdraw(amount))
b.deposit(amount);
}
}
... one gets the benefits of the transaction being done as a whole or not at all, without burdening BankAccount with a mutex and without risking deadlocks due to conflicting requests from user code.

Using boost::mutex as a private member of class

I have a class that contains a boost::mutex as a private member. It becomes locked when you call one of its public functions and unlocks when the function exits. This is to provide synchronous access to the object's internals.
class StringDeque
{
boost::mutex mtx;
std::deque<string> string_deque;
public:
StringDeque() { }
void addToDeque(const string& str_to_add)
{
boost::lock_guard<boost::mutex> guard(mtx);
string_deque.push_back(str_to_add); // std::deque has push_back, not push
}
string popFromDeque()
{
boost::lock_guard<boost::mutex> guard(mtx);
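if (string_deque.empty()) return string(); // nothing queued yet; front() on an empty deque is undefined
string popped_string = string_deque.front();
string_deque.pop_front(); // std::deque has pop_front, not pop
return popped_string;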
}
};
This class isn't meant to be particularly useful but I am just playing around with mutexes and threads.
I have a main() that also has another function defined that pops strings from the class and prints them in a thread. It will repeat this 10 times and then return from the function. Once again, this is purely for testing purposes. It looks like this:
void printTheStrings(StringDeque& str_deque)
{
int i = 0;
while(i < 10)
{
string popped_string = str_deque.popFromDeque();
if(popped_string.empty())
{
sleep(1);
continue;
}
cout << popped_string << endl;
++i;
}
}
int main()
{
StringDeque str_deque;
boost::thread the_thread(printTheStrings, str_deque);
str_deque.addToDeque("Say your prayers");
str_deque.addToDeque("Little One");
str_deque.addToDeque("And Don't forget My Son");
str_deque.addToDeque("To include everyone");
str_deque.addToDeque("I tuck you in");
str_deque.addToDeque("Warm within");
str_deque.addToDeque("Keep you free from sin");
str_deque.addToDeque("Until the sandman he comes");
str_deque.addToDeque("Sleep with one eye open");
str_deque.addToDeque("Gripping your pillow tight");
the_thread.join();
}
The error I keep getting is that boost::mutex is noncopyable. The printTheStrings() function takes a reference so I am a little confused as to why this is trying to copy the object.
I have read up a bit on this and one solution I keep reading is to make the boost::mutex a static private member of the object. However, this defeats the purpose of my mutex since I want it to be on an object-by-object basis rather than a class variable.
Is this just bad use of mutexes? Should I just be rethinking this entire application?
EDIT:
I just discovered condition_variable which should serve my purpose a lot better to have the thread wait until there is something actually in the deque before waking up to pop from the deque and print it. All the examples that I see define these mutexes and condition_variable objects at a global scope. This seems very... not object-oriented in my opinion. Even the examples straight from Boost themselves show that it is done in this way. Is this really how other people use these objects?
You are correct that printTheStrings takes the StringDeque by reference. Your problem is that boost::thread copies its arguments by value. To have it pass the object by reference, you need to wrap the argument in boost::ref:
boost::thread the_thread(printTheStrings, boost::ref(str_deque));
As an aside, from C++11 onwards, threads are part of the standard library. You should probably use std::thread (together with std::ref) instead.
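Regarding the EDIT: nothing forces the mutex and condition variable to be globals; they can be ordinary per-object members. Below is a minimal sketch using the standard-library equivalents (std::thread, std::mutex, std::condition_variable, std::ref). The member names, collectStrings and runDemo are made up for the example:

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

class StringDeque {
    std::mutex mtx;                     // per-object, not static
    std::condition_variable not_empty;  // signalled whenever a string is added
    std::deque<std::string> items;
public:
    void add(const std::string& s) {
        {
            std::lock_guard<std::mutex> guard(mtx);
            items.push_back(s);
        }
        not_empty.notify_one();  // wake one waiting consumer
    }

    // Blocks until a string is available, then removes and returns it.
    std::string pop() {
        std::unique_lock<std::mutex> lock(mtx);
        not_empty.wait(lock, [this] { return !items.empty(); });
        std::string s = items.front();
        items.pop_front();
        return s;
    }
};

// Consumer: pops `count` strings and stores them in `out`.
void collectStrings(StringDeque& dq, std::vector<std::string>& out, int count) {
    for (int i = 0; i < count; ++i)
        out.push_back(dq.pop());
}

std::vector<std::string> runDemo() {
    StringDeque dq;
    std::vector<std::string> received;
    // std::ref passes references; without it std::thread would try to copy
    // dq, which fails because the mutex member is noncopyable.
    std::thread consumer(collectStrings, std::ref(dq), std::ref(received), 3);
    dq.add("Say your prayers");
    dq.add("Little One");
    dq.add("Sleep with one eye open");
    consumer.join();
    return received;
}
```

Because wait() is given a predicate, the consumer sleeps instead of polling with sleep(1), and spurious wakeups are handled automatically.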

Thread safety in std::map of std::shared_ptr

I know there are a lot of similar questions with answers around, but since I still don't understand this particular case, I decided to pose a question.
What I have is a map of shared_ptrs to a dynamically allocated array (MyVector). What I want is limited concurrent access without the need to lock. I know that the map per se is not thread safe, but I always thought what I'm doing here should be ok, which is:
I fill the map in a single threaded environment like that:
typedef shared_ptr<MyVector<float>> MyVectorPtr;
for (int i = 0; i < numElements; i++)
{
content[i] = MyVectorPtr(new MyVector<float>(numRows));
}
After the initialization, I have one thread that reads from the elements and one that replaces what the shared_ptrs point to.
Thread 1:
for(auto i=content.begin();i!=content.end();i++)
{
MyVectorPtr p(i->second);
if (p)
{
memory_use+=sizeof(int) + sizeof(float) * p->number;
}
}
Thread 2:
for (auto itr=content.begin();content.end()!=itr;++itr)
{
itr->second.reset(new MyVector<float>(numRows));
}
After a while I get either a seg fault or a double free in one of the two threads. Somehow that is not really surprising, but I still don't really get it.
The reasons why I thought this would work, are:
I don't add or remove any items of the map in the multi-threaded environment, so the iterators should always point to something valid.
I thought concurrently changing a single element of the map is fine as long as the operation is atomic.
I thought the operations I do on the shared_ptr (increment ref count, decrement ref count in Thread 1, reset in Thread 2) are atomic (per a related SO question).
Obviously, one or more of my assumptions are wrong, or I'm not doing what I think I am. I think that reset is actually not thread safe; would std::atomic_exchange help?
Can someone enlighten me? Thanks a lot!
If someone wants to try out, here is the full code example:
#include <stdio.h>
#include <iostream>
#include <string>
#include <map>
#include <memory>
#include <unistd.h>
#include <pthread.h>
using namespace std;
template<class T>
class MyVector
{
public:
MyVector(int length)
: number(length)
, array(new T[length])
{
}
~MyVector()
{
if (array != NULL)
{
delete[] array;
}
array = NULL;
}
int number;
private:
T* array;
};
typedef shared_ptr<MyVector<float>> MyVectorPtr;
static map<int,MyVectorPtr> content;
const int numRows = 1000;
const int numElements = 10;
//pthread_mutex_t write_lock;
double get_cache_size_in_megabyte()
{
double memory_use=0;
//BlockingLockGuard guard(write_lock);
for(auto i=content.begin();i!=content.end();i++)
{
MyVectorPtr p(i->second);
if (p)
{
memory_use+=sizeof(int) + sizeof(float) * p->number;
}
}
return memory_use/(1024.0*1024.0);
}
void* write_content(void*)
{
while(true)
{
//BlockingLockGuard guard(write_lock);
for (auto itr=content.begin();content.end()!=itr;++itr)
{
itr->second.reset(new MyVector<float>(numRows));
cout << "one new written" <<endl;
}
}
return NULL;
}
void* loop_size_checker(void*)
{
while (true)
{
cout << get_cache_size_in_megabyte() << endl;
}
return NULL;
}
int main(int argc, const char* argv[])
{
for (int i = 0; i < numElements; i++)
{
content[i] = MyVectorPtr(new MyVector<float>(numRows));
}
pthread_attr_t attr;
pthread_attr_init(&attr) ;
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
pthread_t *grid_proc3 = new pthread_t;
pthread_create(grid_proc3, &attr, &loop_size_checker,NULL);
pthread_t *grid_proc = new pthread_t;
pthread_create(grid_proc, &attr, &write_content,(void*)NULL);
// to keep alive and avoid content being deleted
sleep(10000);
}
"I thought concurrently changing a single element of the map is fine as long as the operation is atomic."
Changing an element in a map is not atomic unless the element has an atomic type like std::atomic.
"I thought the operations I do on the shared_ptr (increment ref count, decrement ref count in Thread 1, reset in Thread 2) are atomic."
That is correct for the reference count. Unfortunately, you are also changing the underlying pointer. That pointer is not atomic, so you need synchronization.
One thing you can do, though, is use the atomic free functions that are provided for std::shared_ptr (std::atomic_load, std::atomic_store, std::atomic_exchange). This will let you avoid having to use a mutex.
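A rough sketch of what that could look like is below. It uses std::vector<float> in place of MyVector, and totalElements, replaceAll and runAtomicDemo are hypothetical names. Every access to the shared_ptr objects stored in the map has to go through these functions for the guarantee to hold. These overloads are deprecated since C++20 in favor of std::atomic<std::shared_ptr<T>>, so newer compilers may warn about them:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <thread>
#include <vector>

using VecPtr = std::shared_ptr<std::vector<float>>;

// Filled before the threads start; no keys are added or removed afterwards.
std::map<int, VecPtr> content;

// Reader: takes an atomic snapshot of each element before using it.
std::size_t totalElements() {
    std::size_t total = 0;
    for (auto& entry : content) {
        VecPtr p = std::atomic_load(&entry.second);  // safe copy of the shared_ptr
        if (p)
            total += p->size();
    }
    return total;
}

// Writer: atomically swaps in a freshly allocated vector for each key.
void replaceAll(std::size_t numRows) {
    for (auto& entry : content)
        std::atomic_store(&entry.second, std::make_shared<std::vector<float>>(numRows));
}

// Runs reader and writer concurrently and returns the last total observed.
std::size_t runAtomicDemo() {
    content.clear();
    for (int i = 0; i < 10; ++i)
        content[i] = std::make_shared<std::vector<float>>(1000);
    std::thread writer([] { for (int r = 0; r < 1000; ++r) replaceAll(1000); });
    std::size_t last = 0;
    for (int r = 0; r < 1000; ++r)
        last = totalElements();
    writer.join();
    return last;
}
```

Since every replacement has the same size, the reader should always see 10 * 1000 elements, whichever version of each vector it happens to catch.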
Let's expand MyVectorPtr p(i->second);, which is running on thread 1.
The constructor called for this is the copy constructor:
shared_ptr( const shared_ptr& r ) noexcept;
which probably boils down to copying the stored pointer and the control-block pointer, plus incrementing the reference count.
It may very well happen that thread 2 replaces (and thereby deletes) the pointee while thread 1 is still copying the pointer into p. The underlying pointer stored inside shared_ptr is not atomic.
Thus, your usage of std::shared_ptr is not thread safe. It is thread safe only as long as you do not update or modify the underlying pointer.
TL;DR;
Modifying the elements of a std::map isn't thread safe, whereas creating additional references through separate std::shared_ptr copies is.
You should protect accessing your map regarding read/write operations using an appropriate synchronization mechanism, like e.g. a std::mutex.
Also if the state of an instance referenced by the std::shared_ptr should change, it needs to be protected against data races if it's accessed from concurrent threads.
BTW, the MyVector you are showing is far too naive an implementation: it owns a raw array but defines neither a copy constructor nor a copy assignment operator.
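As an illustration of the mutex approach, here is a minimal sketch that wraps the map in a class so that readers and writers cannot bypass the lock. Cache, sizeInBytes, rewriteAll and runCacheDemo are hypothetical names, and std::vector<float> stands in for MyVector:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

class Cache {
    mutable std::mutex mtx;  // guards every access to `content`
    std::map<int, std::shared_ptr<std::vector<float>>> content;
public:
    Cache(int numElements, std::size_t numRows) {
        for (int i = 0; i < numElements; ++i)
            content[i] = std::make_shared<std::vector<float>>(numRows);
    }

    // Reader: holds the lock for the whole traversal.
    std::size_t sizeInBytes() const {
        std::lock_guard<std::mutex> guard(mtx);
        std::size_t total = 0;
        for (const auto& entry : content)
            if (entry.second)
                total += sizeof(float) * entry.second->size();
        return total;
    }

    // Writer: replaces every element while holding the same lock.
    void rewriteAll(std::size_t numRows) {
        std::lock_guard<std::mutex> guard(mtx);
        for (auto& entry : content)
            entry.second = std::make_shared<std::vector<float>>(numRows);
    }
};

// Runs a writer thread against a reading main thread; returns the last size read.
std::size_t runCacheDemo() {
    Cache cache(10, 1000);
    std::thread writer([&] { for (int r = 0; r < 200; ++r) cache.rewriteAll(1000); });
    std::size_t last = 0;
    for (int r = 0; r < 200; ++r)
        last = cache.sizeInBytes();
    writer.join();
    return last;
}
```

The lock is coarse (one mutex for the whole map), which is the simplest correct option; finer-grained schemes only make sense if profiling shows contention.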

Don't understand this block of code (it runs with no condition)

I'm learning C++ and haven't really seen this in any of the books I've read. I wanted to read and comment code so I can learn better, and I came across an odd section of code that runs but does not have a condition. From what I've read (and my experience with other languages), you need an if, while, for, or something similar for blocks.
I'm looking at the TBB threads package, so I'm not sure if it's related to launching threads or is C++-specific (if you don't recognize this as something common in C++, then it's probably TBB-specific).
I think I understand what the code inside actually does, but I'm not sure how it's being triggered or run. Any ideas?
Here's the section:
{
//this is the graph part of the code
Graph g;
g.create_random_dag(nodes);
std::vector<Cell*> root_set;
g.get_root_set(root_set);
root_set_size = root_set.size();
for( unsigned int trial=0; trial<traversals; ++trial ) {
ParallelPreorderTraversal(root_set);
}
}
P.S. If it helps, here's the entire file (the above code is in the middle of main()).
#include <cstdlib>
#include "tbb/task_scheduler_init.h"
#include "tbb/tick_count.h"
#include "../../common/utility/utility.h"
#include <iostream>
#include <vector>
#include "Graph.h"
// some forward declarations
class Cell;
void ParallelPreorderTraversal( const std::vector<Cell*>& root_set );
//------------------------------------------------------------------------
// Test driver
//------------------------------------------------------------------------
utility::thread_number_range threads(tbb::task_scheduler_init::default_num_threads);
static unsigned nodes = 1000;
static unsigned traversals = 500;
static bool SilentFlag = false;
//! Parse the command line.
static void ParseCommandLine( int argc, const char* argv[] ) {
utility::parse_cli_arguments(
argc,argv,
utility::cli_argument_pack()
//"-h" option for displaying help is present implicitly
.positional_arg(threads,"n-of-threads","number of threads to use; a range of the form low[:high], where low and optional high are non-negative integers or 'auto' for the TBB default.")
.positional_arg(nodes,"n-of-nodes","number of nodes in the graph.")
.positional_arg(traversals,"n-of-traversals","number of times to evaluate the graph. Reduce it (e.g. to 100) to shorten example run time\n")
.arg(SilentFlag,"silent","no output except elapsed time ")
);
}
int main( int argc, const char* argv[] ) {
try {
tbb::tick_count main_start = tbb::tick_count::now(); //tbb counter start
ParseCommandLine(argc,argv);
// Start scheduler with given number of threads.
std::cout << threads << std::endl;
for( int p=threads.first; p<=threads.last; ++p ) {
tbb::tick_count t0 = tbb::tick_count::now(); //timer
tbb::task_scheduler_init init(4); //creates P number of threads
srand(2); //seeds the random number generator with the fixed value 2
size_t root_set_size = 0;
{
//this is the graph part of the code
Graph g;
g.create_random_dag(nodes);
std::vector<Cell*> root_set;
g.get_root_set(root_set);
root_set_size = root_set.size();
for( unsigned int trial=0; trial<traversals; ++trial ) {
ParallelPreorderTraversal(root_set);
}
}
tbb::tick_count::interval_t interval = tbb::tick_count::now()-t0; //counter done
if (!SilentFlag){ //output the results
std::cout
<<interval.seconds()<<" seconds using "<<p<<" threads ("<<root_set_size<<" nodes in root_set)\n";
}
}
utility::report_elapsed_time((tbb::tick_count::now()-main_start).seconds());
return 0;
}catch(std::exception& e){
std::cerr
<< "unexpected error occurred. \n"
<< "error description: "<<e.what()<<std::endl;
return -1;
}
}
No, you don't need an if or while statement to introduce a new level of scope. Basically, the { symbol opens a new scope level and } ends it. The usual scoping rules apply: variables defined within the new block are not visible outside of it, destructors of objects declared in the block run at the end of the block, and a variable with the same name as one in an enclosing scope shadows the outer one.
A common use case is in switch statements. For example,
switch (a)
{
case 1:
{
int i;
}
case 2:
{
int i; //note reuse of variable with the same name as in case 1
}
}
Without the { } in the case statements the compiler will complain about multiply defined identifiers.
The pair of { and } creates a local scope. At the end of the scope the compiler automatically invokes the destructors (where one exists) of all automatic (stack) variables that were declared within that scope.
In your case, destructors for g and root_set will be called at the end of the scope.
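A small sketch makes the timing visible. Tracer is a hypothetical helper that records when it is constructed and destroyed; the two objects are named after g and root_set from the question only for illustration:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

std::vector<std::string> events;  // log of what happened, in order

// Hypothetical helper that logs its own construction and destruction.
struct Tracer {
    std::string name;
    explicit Tracer(std::string n) : name(std::move(n)) {
        events.push_back("construct " + name);
    }
    ~Tracer() { events.push_back("destroy " + name); }
};

std::vector<std::string> runScopes() {
    events.clear();
    {   // bare block: no if/while/for needed
        Tracer g("g");
        Tracer root_set("root_set");
        events.push_back("inside block");
    }   // destructors run here, in reverse order of construction
    events.push_back("after block");
    return events;
}
```

The recorded order is: construct g, construct root_set, inside block, destroy root_set, destroy g, after block.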
One very common use I can think of is to obtain a mutex lock when working with threads. Let's say you have a class named Lock that accepts a mutex object and acquires a lock on it. Then you can surround a critical section of code that needs to be protected from concurrent access as follows:
{
Lock lock( mutex ); // the Lock constructor will acquire a lock on mutex
// do stuff
} // Here the Lock destructor runs and releases the lock on mutex, allowing
// other threads to acquire a lock
The advantage of doing the above is that even if the code within the { ... } block throws an exception, the compiler still invokes Lock's destructor ensuring that the mutex lock is released.
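Here is a minimal sketch of such a Lock class, written on top of std::mutex. The releases counter and runCriticalSections exist only to make the behavior observable; in real code std::lock_guard already does this job:

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

std::mutex mtx;
int releases = 0;  // counts how often ~Lock has run (for demonstration only)

// Minimal RAII lock: acquires in the constructor, releases in the destructor.
class Lock {
    std::mutex& m_;
public:
    explicit Lock(std::mutex& m) : m_(m) { m_.lock(); }
    ~Lock() {
        ++releases;
        m_.unlock();
    }
    Lock(const Lock&) = delete;
    Lock& operator=(const Lock&) = delete;
};

int runCriticalSections() {
    releases = 0;
    try {
        {
            Lock lock(mtx);
            throw std::runtime_error("failure inside the critical section");
        }
    } catch (const std::runtime_error&) {
        // By the time we get here, stack unwinding has already run ~Lock.
    }
    {
        Lock lock(mtx);  // would block forever if the first lock had leaked
    }
    return releases;
}
```

runCriticalSections() returns 2: one release from the block that threw and one from the block that completed normally.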
If you are referring to the fact that the block of code has an extra set of braces, that is not uncommon in C++ programming when dealing with short-lived objects on the stack, in this case the Graph and std::vector<Cell*> objects. A pair of curly braces creates a new scope. They do not have to be attached to any control statement. So in this case, a temporary scope is being used to ensure the Graph and vector objects get freed as soon as they go out of scope. If the extra braces were not present, the objects would not get freed until the end of the current iteration of the outer for loop.
You can create extra blocks like that. They're used to impose an additional level of scope. In your example, g won't exist before or after that block.