Thread move in C++11 - c++

I ran into a problem while testing examples from C++ Concurrency in Action.
/***
Scoped_thread
Help explain the move semantics of Scoped_guard
#c++ Concurrency in Action
***/
#include <thread>
#include <iostream>
#include <stdexcept>
using namespace std;
class Scoped_thread
{
std::thread t;
public:
Scoped_thread(std::thread _t):
t(std::move(_t))
{
cout << "Success?" << endl;
if (!t.joinable())
throw std::logic_error("No thread");
}
~Scoped_thread()
{
t.join();
}
Scoped_thread(Scoped_thread const&) = delete;
Scoped_thread& operator=(Scoped_thread const&) = delete;
};
struct func
{
int& i;
func(int& i):i(i) {}
void operator()()
{
for (unsigned j = 0; j < 1000000; j++)
{
cout << j << endl;
}
}
};
int main()
{
int some_local_state = 1;
func myfunc(some_local_state);
Scoped_thread t2(std::thread(myfunc));
for (unsigned j = 0; j < 1000; j++)
{
cout << "Main thread " << j << endl;
}
}
When I run it, only the "Main thread" lines are printed. It looks like the Scoped_thread constructor never runs. Does that indicate a problem with how I use thread move semantics?
My working environment is Ubuntu 16.04, and the compile command is 'g++ -std=c++11 -Wall -pthread file.cpp'.

Scoped_thread t2(std::thread(myfunc));
Here we have a slightly unconventional case of the most vexing parse. The thing is: the following function forward declarations are equivalent:
void f(int arg);
void f(int (arg));
Therefore, Scoped_thread t2(std::thread(myfunc)); gets parsed as a forward declaration of function t2 that returns Scoped_thread and takes std::thread myfunc as an argument.
One solution is to use brace initialization, which cannot be parsed as a function declaration:
Scoped_thread t2{std::thread(myfunc)};
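To see the fix in action, here is a minimal sketch. Scoped_thread is reduced to its essentials, and the helper ran_with_braces and the atomic flag are hypothetical additions for illustration, not part of the book's example.

```cpp
#include <atomic>
#include <stdexcept>
#include <thread>
#include <utility>

// Reduced version of the question's class: owns a thread, joins it on destruction.
class Scoped_thread {
    std::thread t;
public:
    explicit Scoped_thread(std::thread t_) : t(std::move(t_)) {
        if (!t.joinable())
            throw std::logic_error("No thread");
    }
    ~Scoped_thread() { t.join(); }
    Scoped_thread(Scoped_thread const&) = delete;
    Scoped_thread& operator=(Scoped_thread const&) = delete;
};

// Callable that records that it was invoked.
struct func {
    std::atomic<bool>& flag;
    explicit func(std::atomic<bool>& f) : flag(f) {}
    void operator()() const { flag = true; }
};

// Hypothetical helper: returns true if the thread body actually ran.
bool ran_with_braces() {
    std::atomic<bool> ran{false};
    func myfunc(ran);
    {
        // Scoped_thread t2(std::thread(myfunc));   // declares a FUNCTION named t2
        Scoped_thread t2{std::thread(myfunc)};       // braces: constructs an object
        // Scoped_thread t3((std::thread(myfunc)));  // extra parentheses work too
    }   // the destructor joins the thread here
    return ran.load();
}
```

Calling ran_with_braces() from main should return true, because the brace-initialized object really starts and joins a thread.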

Related

Do I need synchronization in the example below?

As per cppreference document:
All member functions (including copy constructor and copy assignment)
can be called by multiple threads on different instances of shared_ptr
without additional synchronization even if these instances are copies
and share ownership of the same object.
As I understand it from cppreference, no synchronization is needed here, because I am calling different member functions through different instances of shared_ptr that point to the same object.
Please correct me if I have misunderstood, and give a small example to make it clear.
#include <iostream>
#include <memory>
#include <thread>
using namespace std;
class Demo
{
public:
int Value;
Demo():Value(10){}
void fun1()
{
for(int i=0; i<300000; i++)
{
Value = Value + i;
std::cout << "Value1 :" << Value << std::endl;
}
}
void fun2()
{
for(int i=0; i<300000; i++)
{
Value = Value + i;
std::cout << "Value2 :" << Value << std::endl;
}
}
void fun3()
{
for(int i=0; i<300000; i++)
{
Value = Value + i;
std::cout << "Value3 :" << Value << std::endl;
}
}
};
int main()
{
std::shared_ptr<Demo> ptr1(new Demo);
std::thread t1(&Demo::fun1, ptr1);
std::shared_ptr<Demo> ptr2(ptr1);
std::thread t2(&Demo::fun2, ptr2);
std::shared_ptr<Demo> ptr3(ptr2);
std::thread t3(&Demo::fun3, ptr3);
t1.join();
t2.join();
t3.join();
}
Output:
The output is interleaved (unsynchronized), as shown below:
Value3 :70993659Value2 :
71000412
Value1 :71006910Value2 :
70993659Value1 :71013664
All member functions (including copy constructor and copy assignment) can be called by multiple threads on different instances of shared_ptr without additional synchronization even if these instances are copies and share ownership of the same object.
shared_ptr's own member functions can be called without synchronization. However, you still need to synchronize calls to member functions of the pointed-to type, i.e. Demo.
fun1, fun2 and fun3 are members of Demo, not members of shared_ptr, so you still need a lock to protect them.
Yes, you need synchronization for both the shared value (value_ in the example below) and std::cout.
Besides that, each loop adds 0 + 1 + ... + 299,999 = 44,999,850,000 (about 45 billion) to the value, and with 3 threads that is about 135 billion. To avoid the undefined behavior of signed integer overflow past INT_MAX (2,147,483,647), it is advisable to switch to a wider unsigned type. uint32_t is still too small for this total, so use uint64_t for the value; uint32_t is enough for the loop counter.
Example:
#include <iostream>
#include <memory>
#include <thread>
#include <mutex>
#include <cstdint>
class Demo
{
public:
uint64_t value_;
Demo() : value_(10) {}
void fun1()
{
for (uint32_t i = 0; i < 300000; i++)
{
std::lock_guard<std::mutex> lock(mutex_);
value_ += i;
std::cout << "Value1 :" << value_ << std::endl;
}
}
void fun2()
{
for (uint32_t i = 0; i < 300000; i++)
{
std::lock_guard<std::mutex> lock(mutex_);
value_ += i;
std::cout << "Value2 :" << value_ << std::endl;
}
}
void fun3()
{
for (uint32_t i = 0; i < 300000; i++)
{
std::lock_guard<std::mutex> lock(mutex_);
value_ += i;
std::cout << "Value3 :" << value_ << std::endl;
}
}
protected:
std::mutex mutex_{};
};
int main()
{
std::shared_ptr<Demo> ptr1(new Demo);
std::thread t1(&Demo::fun1, ptr1);
std::shared_ptr<Demo> ptr2(ptr1);
std::thread t2(&Demo::fun2, ptr2);
std::shared_ptr<Demo> ptr3(ptr2);
std::thread t3(&Demo::fun3, ptr3);
t1.join();
t2.join();
t3.join();
}
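The overflow arithmetic can be checked at compile time. This is a minimal sketch; the helper names sum_below and final_value are hypothetical and only mirror the loops in the example (each loop adds 0 through 299,999 to an initial value of 10).

```cpp
#include <cstdint>

// Sum of 0 + 1 + ... + (n - 1): what one loop of n iterations adds.
constexpr std::uint64_t sum_below(std::uint64_t n) {
    return n * (n - 1) / 2;
}

// Final value after `threads` loops each add sum_below(n) to `initial`.
constexpr std::uint64_t final_value(std::uint64_t initial,
                                    std::uint64_t threads,
                                    std::uint64_t n) {
    return initial + threads * sum_below(n);
}

// One loop already exceeds INT_MAX (2,147,483,647) and UINT32_MAX (4,294,967,295).
static_assert(sum_below(300000) == 44999850000ULL, "one loop: about 45 billion");
static_assert(final_value(10, 3, 300000) == 134999550010ULL,
              "three loops: about 135 billion");
```

Both totals fit comfortably in a 64-bit unsigned integer, which is why uint64_t is the safe choice for the shared value.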

c++: Program crashes with parameter (by value) passed to a lambda

I simplified the code, so pardon my style.
I was wondering what happens to an object that is constructed by a constructor that actually allocates memory, and passed to a lambda by value, when this lambda itself, is being a callback by another thread.
It didn't surprise me to see the program crashes when the destructor is called. This was test#1.
test#2: I removed the "new" and the "delete[]" from c'tor and d'tor of A, and now - it worked fine.
test#3:
I brought the "new" and the "delete[]" back as before, but now I changed every place with "A objA" (by value) into "A& objA", and now, it didn't crash as well.
I can rationalize this by waving my hands, but I'd like to understand what really happened here, and, for that matter, what would happen if an object that is passed into a lambda through a capture also ceases to exist.
One last question: is there a good practice to follow (or something to avoid) in such cases?
#include <iostream>
#include <thread>
#include <future>
#include <chrono>
#include <functional>
using namespace std::chrono_literals;
class A {
public:
A() : x(1) { ptr = new char[1024]; }
~A() { delete[](ptr); }
int getX() { return x; }
private:
int x = 0;
char* ptr = nullptr;
};
std::function<void(A objA)> myCb;
int myThread()
{
static int counter = 0;
auto a = new A;
while (true) {
std::this_thread::sleep_for(2s);
if (myCb)
myCb(*a);
else
std::cout << "myCb is still NULL: counter = " << counter << std::endl;
if (counter++ == 5)
break;
}
return 0;
}
void registerCallback(std::function<void(A obj)> cb)
{
myCb = cb;
}
int main()
{
std::thread t1(myThread);
std::this_thread::sleep_for(6s);
int val = 5;
registerCallback([&val](A objA) {
std::cout << "here lambda is called with " << objA.getX() << " and " << val << std::endl;
});
val = 6;
std::this_thread::sleep_for(1s);
val = 7;
std::this_thread::sleep_for(1s);
val = 8;
std::this_thread::sleep_for(1s);
t1.join();
}
class A is violating the Rule of 3/5/0, as it does not implement a copy-constructor and/or move-constructor, or a copy-assignment and/or move-assignment operator.
So, when an instance of A is passed around by value, a shallow copy is made that shares the same char* pointer to a single char[] array in memory, and thus the code MAY crash (ie, undefined behavior) when trying to delete[] that same array multiple times.
What you need is a deep copy instead, so that each instance of A allocates its own char[] array, eg:
class A
{
public:
A() : x(1), ptr(new char[1024])
{
std::fill(ptr, ptr + 1024, '\0');
}
A(const A &src) : x(src.x), ptr(new char[1024])
{
std::copy(src.ptr, src.ptr + 1024, ptr);
}
A(A &&src)
: x(src.x), ptr(src.ptr)
{
src.ptr = nullptr;
}
~A()
{
delete[] ptr;
}
A& operator=(A rhs)
{
std::swap(x, rhs.x);
std::swap(ptr, rhs.ptr);
return *this;
}
int getX() const { return x; }
private:
int x;
char* ptr;
};
A simpler way to implement this is to use std::vector instead of new[], since vector is already compliant with the Rule of 3/5/0, and so compiler-generated constructors, destructor, and assignment operators for A will suffice to make copies/moves of the vector for you, eg:
#include <vector>
class A
{
public:
A() : vec(1024, '\0') {}
int getX() const { return x; }
private:
int x = 1;
std::vector<char> vec;
};
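To convince yourself that the vector-based class really copies deeply, here is a small sketch. The get/set accessors and the helper functions modify_copy and copies_are_independent are hypothetical additions for the demonstration.

```cpp
#include <cstddef>
#include <vector>

class A {
public:
    A() : vec(1024, '\0') {}
    char get(std::size_t i) const { return vec[i]; }
    void set(std::size_t i, char c) { vec[i] = c; }
private:
    std::vector<char> vec;   // copy, move and destruction are handled by vector
};

// Takes its argument by value, like the callback in the question.
char modify_copy(A objA) {
    objA.set(0, 'x');        // changes only the callee's private copy
    return objA.get(0);
}   // objA's vector frees its own buffer here; the caller's buffer is untouched

bool copies_are_independent() {
    A original;
    char seen = modify_copy(original);   // a deep copy is made for the call
    return seen == 'x' && original.get(0) == '\0';
}
```

Because each copy owns a separate buffer, destroying the by-value parameter cannot free memory that the original still uses.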
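You should use std::unique_ptr, which frees the array for you and makes A non-copyable, so the callback has to take it by reference. (Deleting through a void* is undefined behavior.)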
#include <iostream>
#include <thread>
#include <future>
#include <chrono>
#include <functional>
#include <memory>
using namespace std::chrono_literals;
class A {
public:
A() : x(1)
{
ptr = std::make_unique<char[]>(1024);
}
~A()
{
}
int getX() { return x; }
private:
int x = 0;
std::unique_ptr<char[]> ptr = nullptr;
};
std::function<void(A& objA)> myCb;
int myThread()
{
static int counter = 0;
auto a = new A;
while (true) {
std::this_thread::sleep_for(2s);
if (myCb)
myCb(*a);
else
std::cout << "myCb is still NULL: counter = " << counter << std::endl;
if (counter++ == 5)
break;
}
return 0;
}
void registerCallback(std::function<void(A& obj)> cb)
{
myCb = cb;
}
int mymain()
{
std::thread t1(myThread);
std::this_thread::sleep_for(6s);
int val = 5;
registerCallback([&val](A& objA) {
std::cout << "here lambda is called with " << objA.getX() << " and " << val << std::endl;
});
val = 6;
std::this_thread::sleep_for(1s);
val = 7;
std::this_thread::sleep_for(1s);
val = 8;
std::this_thread::sleep_for(1s);
t1.join();
return 0;
}
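On the side question about captured objects that cease to exist: a reference capture dangles once the captured object is destroyed, while a value capture stores a copy inside the closure. A minimal sketch, with a hypothetical make_callback helper:

```cpp
#include <functional>
#include <memory>
#include <string>

// Returns a callback that outlives the local variables of this function.
std::function<std::string()> make_callback() {
    std::string local = "captured";
    auto shared = std::make_shared<std::string>("shared");
    // [&local] would dangle: `local` is destroyed when make_callback returns,
    // and calling the lambda afterwards would be undefined behavior.
    // Capturing by value copies `local` into the closure; capturing the
    // shared_ptr by value keeps its pointee alive as long as the closure lives.
    return [local, shared]() { return local + "+" + *shared; };
}
```

The same reasoning applies to [&val] in the code above: it is only safe there because val lives in main until after t1.join().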

C++ Boost multithreading strange behavior

I am following a Boost multithreading tutorial. The code is as follows:
#define _CRT_SECURE_NO_WARNINGS
#include <iostream>
#include <string>
#include <boost/array.hpp>
#include <boost/bind.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/asio.hpp>
#include <boost/lexical_cast.hpp>
#include <boost/thread.hpp>
using std::cout;
using std::cin;
using std::endl;
class CallableClass
{
private:
// Number of iterations
int m_iterations;
public:
// Default constructor
CallableClass()
{
m_iterations = 10;
}
// Constructor with number of iterations
CallableClass(int iterations)
{
m_iterations = iterations;
}
// Copy constructor
CallableClass(const CallableClass& source)
{
m_iterations = source.m_iterations;
}
// Destructor
~CallableClass()
{
cout << "Callable class exiting." << endl;
}
// Assignment operator
CallableClass& operator = (const CallableClass& source)
{
m_iterations = source.m_iterations;
return *this;
}
// Static function called by thread
static void StaticFunction()
{
for (int i = 0; i < 10; i++) // Hard-coded upper limit
{
cout << i << " - Do something in parallel (Static function)." << endl;
boost::this_thread::yield(); // 'yield' discussed in section 18.6
}
}
// Operator() called by the thread
void operator () ()
{
for (int i = 0; i < m_iterations; i++)
{
cout << i << " - Do something in parallel (operator() )." << endl;
boost::this_thread::yield(); // 'yield' discussed in section 18.6
}
}
};
int main()
{
boost::thread t(&CallableClass::StaticFunction);
for (int i = 0; i < 10; i++)
cout << i << " - Do something in main method." << endl;
return 0;
}
However, if I change main() to this:
int main()
{
// Using a callable object as thread function
int numberIterations = 20;
CallableClass c(numberIterations);
boost::thread t(c);
for (int i = 0; i < 10; i++)
cout << i << " - Do something in main method." << endl;
return 0;
}
The class destructor is called before operator() is executed. I don't quite understand this behavior. Shouldn't the object stop executing when the destructor is called? Also, why does the operator have two sets of brackets? How do I know when the second thread (not main()) stops and safely exits? Thanks.
No, boost::thread doesn't stop executing when it is destructed. std::thread doesn't allow you to do this and will call std::terminate if you destruct a joinable thread without joining or detaching it. The destructor messages you see come from copies of c: the thread constructor copies the callable object into its own storage, and any temporary copies made along the way are destroyed right away, while the thread runs its own copy.
You need to add t.join() at the end of main().
void operator () () is the () operator with no parameters: the first () is the name of the operator, the second () is the parameter list. For example, a callable class whose operator takes a parameter would look like:
struct s
{
void operator ()( int p ) {}
};
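Putting both points together, here is a minimal sketch that uses std::thread as a stand-in for boost::thread (an assumption made to keep the example free of external dependencies). The Accumulate class and run_and_join helper are hypothetical.

```cpp
#include <thread>

// Callable class whose operator() takes one parameter.
struct Accumulate {
    int* out;
    void operator()(int iterations) const {
        int sum = 0;
        for (int i = 0; i < iterations; ++i)
            sum += i;
        *out = sum;   // publish the result; visible to the joiner after join()
    }
};

int run_and_join(int iterations) {
    int result = 0;
    Accumulate c{&result};
    std::thread t(c, iterations);   // the thread stores and runs its own copy of c
    t.join();                       // block until operator() has returned
    return result;
}
```

After join() returns, the second thread has finished, so it is safe to read the result and to let main exit.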

Multithreaded program blocks when compiled with optimizations

I am developing a project in which I have to model (arbitrary) computations that happen in pipeline.
The pipeline is made of stages, each stage takes the input from the previous stage (except the first, who directly receives tasks from the pipeline object), makes a computation and sends the result to the next stage. Each stage is implemented with a separate thread of execution.
The pipeline should have a basic load balancing capability: if (after a while) it recognizes that the sum of the execution times of two consecutive stages is smaller than the execution time of the slowest stage, it "collapses" those two stages, that is it makes both of them run sequentially, using a single thread.
There are three classes in the project: classes Pipeline and Stage are obvious, while class TSOHeap (Thread-Safe Ordered heap) is the buffer used in input by each stage. It has a maximum size and the capability to give highest priority to special messages indicating that a Stage has to be collapsed.
My question is: why does the code run smoothly (or at least not block) when I compile without optimizations, while the program blocks when I compile with optimizations (-O2, -O3)? If I run the program under the debugger it blocks only occasionally; if I run it normally from a terminal it blocks almost always.
The strange thing is that a thread blocks on a line in which there is a simple print. Before I added that print (for debugging purpose), the program blocked on the previous line, which is the guard of a while loop.
I guess the problem is related to synchronization among threads, but I don't know how to discover the faulty part. The only constant is that the program blocks after the method collapse_next_stage() has been invoked, that is after a thread has been stopped.
Any suggestion would be appreciated, even general procedures to discover bugs like these.
Here is the code needed to run an example:
Class "TSOHeap.hpp":
#include <mutex>
#include <queue>
#include <vector>
#include <atomic>
#include <climits>
using namespace std;
template<typename T>
struct Comparator{
bool operator()(pair<T,int> p1, pair<T,int> p2){
return p1.second > p2.second;
}
};
//Thread-Safe Ordered Heap
template<typename T>
struct TSOHeap
{
TSOHeap(int _max=10):size{0},max{_max}{};
~TSOHeap(){}
void push(T* item, int id){
while(size==max);
{
lock_guard<mutex> lock(heap_mutex);
heap.push(pair<T*,int>(item, id));
size++;
}
}
pair<T*,int> pop(){
while(size==0);
{
lock_guard<mutex> lock(heap_mutex);
pair<T*,int> p = heap.top();
heap.pop();
size--;
return p;
}
}
priority_queue<pair<T*,int>, vector<pair<T*,int>>,Comparator<T*>> heap;
atomic<int> size;
int max;
mutex heap_mutex;
};
Class "Stage.hpp":
#include "TSOHeap.hpp"
#include <iostream>
#include <thread>
#include <vector>
#include <chrono>
#include <mutex>
using namespace std;;
struct IStage{
virtual void run() = 0;
virtual void wait_end() = 0;
virtual void stage_func() = 0;
virtual double get_exec_time() = 0;
virtual void reset_exec_time()=0;
virtual void add_next(IStage&)=0;
virtual IStage* get_next() = 0;
virtual void* get_input_ptr() = 0;
virtual void set_input(void*) = 0;
virtual void collapse() = 0;
virtual bool is_collapsed() = 0;
virtual void collapse_next_stage() = 0;
virtual int num_collapsed() = 0;
~IStage(){};
};
template <typename Tin, typename Tf, typename Tout>
struct Stage : IStage{
Stage(Tf function, int ind):fun{function}, input_ptr{new(TSOHeap<Tin>)},_end{false},
next{nullptr}, collapsed{0}, i{ind}, exec_time{0.0},count{0},collapsing{false},c{0}{};
~Stage(){delete input_ptr;}
void stage_func(){
Tin * input = input_ptr->pop().first;
if (input!=nullptr){
auto start = chrono::system_clock::now();
Tout out = fun(*input);
auto end = chrono::system_clock::now();
chrono::duration<double> diff = end-start;
set_exec_time(diff.count());
if (next!=nullptr)
next->set_input(new Tout(out));
}
else
_end = true;
}
void run_thread(){
while(!_end){
cout << "t " << i << ", r " << ++c << endl; // BLOCKS HERE
while(collapsing); //waiting that next stage finishes the remaining tasks
stage_func();
if(collapsed==1 && !_end)
next->stage_func();
}
if(collapsed!=-1){
IStage * nptr = next;
if(nptr!=nullptr && nptr->is_collapsed())
nptr = nptr->get_next();
if(nptr!=nullptr)
nptr->set_input(nullptr);
}
else{
while((input_ptr->size)>0)
stage_func();
}
}
void run()
{
thread _t(&Stage::run_thread, this);
t = move(_t);
return;
}
void wait_end()
{
t.join();
}
void set_input(void * iptr)
{
input_ptr->push(static_cast<Tin*>(iptr), ++count);
}
void* get_input_ptr()
{
return input_ptr;
}
void add_next(IStage &n)
{
next = &n;
output_ptr = static_cast<TSOHeap<Tout>*>(n.get_input_ptr());
}
void collapse()
{
collapsed=-1;
input_ptr->push(nullptr, INT_MIN);
// First condition is to avoid deadlock, in case this thread finished the execution in the meanwhile
while(!_end && (input_ptr->size) > 0);
}
bool is_collapsed()
{
return collapsed==-1;
}
void collapse_next_stage()
{
collapsing = true;
next->collapse();
collapsed++;
collapsing = false;
cout << "Stage # " << i << " has collapsed the successive Stage" << endl;
}
IStage* get_next()
{
return next;
}
double get_exec_time()
{
return exec_time;
}
void reset_exec_time()
{
set_exec_time(0.0);
}
void set_exec_time(double value)
{
lock_guard<mutex> lock(et_mutex);
exec_time = value;
}
int num_collapsed()
{
return collapsed;
}
Tf fun;
TSOHeap<Tin> * input_ptr;
bool _end;
IStage * next;
int collapsed;
int const i;
double exec_time;
int count;
mutex et_mutex;
bool collapsing;
int c;
TSOHeap<Tout> * output_ptr;
thread t;
};
Class "Pipe.hpp":
#include "Stage.hpp"
#include <list>
#include <thread>
#include <algorithm>
using namespace std;;
template <typename Tin, typename Tout>
struct Pipe{
Pipe(list<IStage*>li, int n_samples=10):slowest{-1},end{false},num_samples{n_samples}
{
for(auto& s:li)
add_node(s);
}
void add_node(IStage* sptr)
{
if(!nodes.empty())
nodes.back()->add_next(*sptr);
nodes.push_back(sptr);
}
void set_input(void * in_ptr)
{
nodes.front()->set_input(in_ptr);
}
int num_nodes()
{
return nodes.size();
}
void run()
{
for(auto &x: nodes)
x->run();
}
void run(list<Tin>&& input)
{
thread t(&Pipe::run_manager, this, ref(input));
while(!end)
monitor_times();
t.join();
}
void run_manager(list<Tin>& input)
{
run();
for(auto& x:input)
set_input(&x);
set_input(nullptr);
end=true;
for(auto& s : nodes)
s->wait_end();
}
void monitor_times()
{ // initialization phase
vector<int> count;
vector<double> avg;
vector<priority_queue<pair<double,int>, vector<pair<double,int>>,Comparator<double>>> measures;
for(auto& x : nodes){
count.push_back(0);
avg.push_back(0);
measures.push_back(priority_queue<pair<double,int>,
vector<pair<double,int>>,Comparator<double>>());
}
while(!end){
// monitoring phase
for(int i=0; i<nodes.size(); i++){
if(nodes[i]->get_exec_time()!=0){
pair<double,int> measure = pair<double,int>(nodes[i]->get_exec_time(),++count[i]);
nodes[i]->reset_exec_time();
measures[i].push(measure);
if(count[i]<=num_samples){
avg[i] = (avg[i]*(count[i]-1) + measure.first) / count[i];
}
else
{
double old = measures[i].top().first;
// the ordering of the heap guarantees that I drop the oldest measure
measures[i].pop();
avg[i] = (avg[i] * num_samples - old + measure.first) / num_samples;
}
}
}
// updating phase
if(is_steady_state(count)){
int slowest = get_slowest_stage(avg);
for(int i=0; i<nodes.size()-1; i++){
if(avg[i]+avg[i+1]<avg[slowest]){
if(nodes[i]->num_collapsed()==0 && nodes[i+1]->num_collapsed()==0){
nodes[i]->collapse_next_stage();
break;
}
}
}
}
}
}
bool is_steady_state(vector<int>& count){
for(auto& c: count){
if(c < num_samples) return false;
}
return true;
}
int get_slowest_stage(vector<double>& avg){
double max = 0.0;
int index = -1;
for(int i=0; i<avg.size(); i++){
if(avg[i]>max){
max=avg[i];
index = i;
}
}
return index;
}
int slowest;
bool end;
int num_samples;
vector<IStage*> nodes;
};
Class "main.cpp":
#include<iostream>
#include<functional>
#include <chrono>
#include<cmath>
#include "Pipe.hpp"
using namespace std;;
auto f = [](int x){
int c = 0;
for(int i=0; i<300; i++)
c=sin(i);
return x;
};
auto fast = [] (int x) {return x;};
auto fast_init = [](int x){
if(x < 5)
return x;
int c=0;
for(int i=0; i<300; i++)
c=sin(i);
return x;
};
auto print = [] (int x) {
cout << "Result: " << x << " " << endl;
return x;
};
int main(int argc, char* argv[])
{
auto print_usage_msg = [&](){
cout << "Usage: " << argv[0] << " <func_type> \n" <<
"<func_type> = \n"
" 0 to have 2 consecutive stages running the fast function\n"
" 1 to have 2 consecutive stages running the fast function "
"but after a short time reaching steady state " << endl;
};
if(argc!=2){
print_usage_msg();
return 1;
}
int fun_code = atoi(argv[1]);
if (fun_code!=0 && fun_code!=1){
print_usage_msg();
return 1;
}
Stage<int,function<int(int)>,int> s1{f,1};
Stage<int,function<int(int)>,int> s2{f,2};
Stage<int,function<int(int)>,int> s3{f,3};
Stage<int,function<int(int)>,int> s4{f,4};
Stage<int,function<int(int)>,int> s5{f,5};
Stage<int,function<int(int)>,int> s6{f,6};
Stage<int,function<int(int)>,int> s7{f,7};
Stage<int,function<int(int)>,int> sp{print,8};
if(fun_code==0){
s2.fun = fast;
s3.fun = fast;
}
else{
s2.fun = fast_init;
s3.fun = fast_init;
}
Pipe<int,int> p ({&s1, &s2, &s3, &s4, &s5, &s6, &s7, &sp});
cout << "Pipe length: " << p.num_nodes() << endl;
list<int> li {};
for(int i=0; i<100; i++)
li.push_back(i);
p.run(move(li));
return 0;
}
Compile with:
g++ main.cpp -std=c++11 -pthread -O3 -o gpipe -g
Run with :
./gpipe 1
Thanks for any help!
Imagine the following code for a single-threaded program:
void func()
{
bool a = true;
while(a)
{
// busy wait...
}
}
Will this function ever return? Obviously not. If you were a compiler, how would you write optimized code for this?
1: NOP
2: GOTO 1
This is exactly what you're doing with this bit of code. Twice.
while(!_end){ // here #1
cout << "t " << i << ", r " << ++c << endl;
while(collapsing) // here #2
; // for the love of God, move your semicolon here or use braces
stage_func();
if(collapsed==1 && !_end)
next->stage_func();
}
Your compiler has absolutely no obligation to realize that you're doing multi-threading programming. (It's your job to tell it)
The compiler needs to know not to perform these optimizations on _end, collapsing and collapsed. DO NOT USE volatile. Why? volatile will keep the compiler from optimizing accesses to a variable away, but... heh heh... it gives you no atomicity and no ordering guarantees between threads, so unsynchronized access from different threads is still a data race. Compilers and CPUs may also re-order your instructions, which can cause similar problems.
Memory fences (aka memory barriers) can be used to constrain how memory operations are re-ordered around them. std::atomic_thread_fence is such a fence for both the compiler and the hardware (std::atomic_signal_fence only restrains the compiler), but fences are easy to get wrong.
By far the simplest, most-pragmatic, and easiest-to-prove-correct thing to do is just to switch all your inter-thread communicating variables to std::atomic<> types, which incorporate memory barriers. So
std::atomic<bool> _end;
std::atomic<bool> collapsing;
std::atomic<int> collapsed;
As a general rule, any data that is shared between threads should be protected by a mutex OR be an std::atomic<> if race conditions are not an issue (as you are doing with the simple signaling). You can break this rule if you really know what you're doing and really know the architecture, compiler, and standard implementation really well, but that's a tall order even for an expert.
By the way, a mutex's lock and unlock operation both incorporate a memory barrier, in case you were worried about that. So when you get a pointer from the TSOHeap, that's fine (assuming your TSOHeap implementation is correct...I didn't look at it).
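As a minimal sketch of the atomic-flag approach (the function name spin_until_stopped and the flags are hypothetical, not taken from the pipeline code), a worker that busy-waits on a std::atomic<bool> is guaranteed to see the store from another thread, even at -O3:

```cpp
#include <atomic>
#include <thread>

// Returns how many loop iterations the worker ran before it saw the stop flag.
long spin_until_stopped() {
    std::atomic<bool> started{false};
    std::atomic<bool> stop{false};
    long iterations = 0;   // touched only by the worker until join()

    std::thread worker([&] {
        started = true;
        while (!stop.load()) {   // atomic load: re-read on every iteration
            ++iterations;
            std::this_thread::yield();
        }
    });

    while (!started.load())      // wait until the worker is running
        std::this_thread::yield();
    stop = true;                 // atomic store: becomes visible to the worker
    worker.join();               // join() synchronizes, so reading iterations is safe
    return iterations;
}
```

With a plain bool instead of std::atomic<bool>, the compiler would be allowed to hoist the load out of the loop, which is the hang described in the question.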
You have race conditions in TSOHeap when using size. While size is atomic, it is a part of larger state that is not atomic, so that changes in size are not synchronized with changes to the rest of the state.
Make size non-atomic and access it only when the mutex is locked. Add condition variables to notify threads waiting in push and pop.
Alternatively, remove size entirely. Example:
template<typename T>
struct TSOHeap
{
TSOHeap(size_t _max=10): max{_max}{}
void push(T* item, int id){
unique_lock<mutex> lock(heap_mutex);
while(heap.size() == max)
cnd_pop.wait(lock);
heap.push(pair<T*,int>(item, id));
cnd_push.notify_one();
}
pair<T*,int> pop() {
pair<T*,int> result = {};
{
unique_lock<mutex> lock(heap_mutex);
while(heap.empty())
cnd_push.wait(lock);
bool notify = heap.size() == max;
result = heap.top();
heap.pop();
if(notify)
cnd_pop.notify_one();
}
return result;
}
mutex heap_mutex;
condition_variable cnd_push, cnd_pop;
priority_queue<pair<T*,int>, vector<pair<T*,int>>,Comparator<T*>> heap;
size_t const max;
};
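The while loops around wait() can also be written with the predicate overload of std::condition_variable::wait, which re-checks the condition after every wakeup. A minimal sketch, assuming a plain FIFO of int instead of the priority heap; BoundedQueue and transfer_sum are hypothetical names:

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical minimal bounded FIFO, not the answer's priority heap.
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t max) : max_(max) {}
    void push(int v) {
        std::unique_lock<std::mutex> lock(m_);
        not_full_.wait(lock, [this] { return q_.size() < max_; });
        q_.push(v);
        not_empty_.notify_one();
    }
    int pop() {
        std::unique_lock<std::mutex> lock(m_);
        not_empty_.wait(lock, [this] { return !q_.empty(); });
        int v = q_.front();
        q_.pop();
        not_full_.notify_one();
        return v;
    }
private:
    std::mutex m_;
    std::condition_variable not_full_, not_empty_;
    std::queue<int> q_;
    std::size_t const max_;
};

// One producer pushes 0..n-1 through a queue of capacity 2; the caller consumes.
long transfer_sum(int n) {
    BoundedQueue q(2);
    std::thread producer([&] { for (int i = 0; i < n; ++i) q.push(i); });
    long sum = 0;
    for (int i = 0; i < n; ++i) sum += q.pop();
    producer.join();
    return sum;
}
```

Neither thread spins here: a blocked producer or consumer sleeps inside wait() until the other side notifies it.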

std::thread throwing "resource dead lock would occur"

I have a list of objects; each object has member variables which are calculated by an "update" function. I want to update the objects in parallel, that is, I want to create a thread for each object to execute its update function.
Is this a reasonable thing to do? Any reasons why this may not be a good idea?
Below is a program which attempts to do what I described. It is a complete program, so you should be able to run it (I'm using VS2015). The goal is to update each object in parallel. The problem is that once the update function completes, the thread throws a "resource dead lock would occur" exception and aborts.
Where am I going wrong?
#include <iostream>
#include <thread>
#include <vector>
#include <algorithm>
#include <thread>
#include <mutex>
#include <chrono>
class Object
{
public:
Object(int sleepTime, unsigned int id)
: m_pSleepTime(sleepTime), m_pId(id), m_pValue(0) {}
void update()
{
if (!isLocked()) // if an object is not locked
{
// create a thread to perform it's update
m_pThread.reset(new std::thread(&Object::_update, this));
}
}
unsigned int getId()
{
return m_pId;
}
unsigned int getValue()
{
return m_pValue;
}
bool isLocked()
{
bool mutexStatus = m_pMutex.try_lock();
if (mutexStatus) // if mutex is locked successfully (meaning it was unlocked)
{
m_pMutex.unlock();
return false;
}
else // if mutex is locked
{
return true;
}
}
private:
// private update function which actually does work
void _update()
{
m_pMutex.lock();
{
std::cout << "thread " << m_pId << " sleeping for " << m_pSleepTime << std::endl;
std::chrono::milliseconds duration(m_pSleepTime);
std::this_thread::sleep_for(duration);
m_pValue = m_pId * 10;
}
m_pMutex.unlock();
try
{
m_pThread->join();
}
catch (const std::exception& e)
{
std::cout << e.what() << std::endl; // throws "resource dead lock would occur"
}
}
unsigned int m_pSleepTime;
unsigned int m_pId;
unsigned int m_pValue;
std::mutex m_pMutex;
std::shared_ptr<std::thread> m_pThread; // store reference to thread so it doesn't go out of scope when update() returns
};
typedef std::shared_ptr<Object> ObjectPtr;
class ObjectManager
{
public:
ObjectManager()
: m_pNumObjects(0){}
void updateObjects()
{
for (int i = 0; i < m_pNumObjects; ++i)
{
m_pObjects[i]->update();
}
}
void removeObjectByIndex(int index)
{
m_pObjects.erase(m_pObjects.begin() + index);
}
void addObject(ObjectPtr objPtr)
{
m_pObjects.push_back(objPtr);
m_pNumObjects++;
}
ObjectPtr getObjectByIndex(unsigned int index)
{
return m_pObjects[index];
}
private:
std::vector<ObjectPtr> m_pObjects;
int m_pNumObjects;
};
void main()
{
int numObjects = 2;
// Generate sleep time for each object
std::vector<int> objectSleepTimes;
objectSleepTimes.reserve(numObjects);
for (int i = 0; i < numObjects; ++i)
objectSleepTimes.push_back(rand());
ObjectManager mgr;
// Create some objects
for (int i = 0; i < numObjects; ++i)
mgr.addObject(std::make_shared<Object>(objectSleepTimes[i], i));
// Print expected object completion order
// Sort from smallest to largest
std::sort(objectSleepTimes.begin(), objectSleepTimes.end());
for (int i = 0; i < numObjects; ++i)
std::cout << objectSleepTimes[i] << ", ";
std::cout << std::endl;
// Update objects
mgr.updateObjects();
int numCompleted = 0; // number of objects which finished updating
while (numCompleted != numObjects)
{
for (int i = 0; i < numObjects; ++i)
{
auto objectRef = mgr.getObjectByIndex(i);
if (!objectRef->isLocked()) // if object is not locked, it is finished updating
{
std::cout << "Object " << objectRef->getId() << " completed. Value = " << objectRef->getValue() << std::endl;
mgr.removeObjectByIndex(i);
numCompleted++;
}
}
}
system("pause");
}
Looks like you've got a thread that is trying to join itself.
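The self-join can be reproduced in isolation. This is a sketch under the assumption that the implementation detects the self-join, as the standard's error condition for join() describes (resource_deadlock_would_occur); the function name and flags are hypothetical.

```cpp
#include <atomic>
#include <system_error>
#include <thread>

// Returns true if join() called from inside the thread itself reports a deadlock.
bool self_join_reports_deadlock() {
    std::thread worker;
    std::atomic<bool> handle_ready{false};
    std::atomic<bool> attempt_done{false};
    bool saw_deadlock = false;   // written by the worker, read after join()

    worker = std::thread([&] {
        while (!handle_ready.load())   // wait until `worker` has been assigned
            std::this_thread::yield();
        try {
            worker.join();             // the thread tries to join itself
        } catch (const std::system_error& e) {
            saw_deadlock = (e.code() == std::errc::resource_deadlock_would_occur);
        }
        attempt_done = true;
    });
    handle_ready = true;
    while (!attempt_done.load())       // do not call join() concurrently
        std::this_thread::yield();
    worker.join();                     // the correct join: from another thread
    return saw_deadlock;
}
```

This mirrors _update() calling m_pThread->join() in the question: the call runs on the very thread it is trying to wait for.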
While trying to understand your solution I simplified it a lot, and I came to the conclusion that you are using std::thread::join() in the wrong way.
std::thread lets you wait for its completion without spinning. In your example you wait for thread completion in a busy loop (spin wait), which consumes a lot of CPU time.
You should call std::thread::join() from another thread to wait for the thread to complete. The mutex in Object is not necessary in your example. Moreover, you are missing a mutex to serialize access to std::cout; without one, output from different threads can interleave. I hope the example below helps.
#include <iostream>
#include <thread>
#include <vector>
#include <algorithm>
#include <memory>
#include <mutex>
#include <chrono>
#include <cassert>
#include <cstdlib>
#include <ctime>
// serialize access to cout so that lines from different threads do not interleave
std::recursive_mutex cout_mutex;
class Object {
public:
Object(int sleepTime, unsigned int id)
: _sleepTime(sleepTime), _id(id), _value(0) {}
void runUpdate() {
if (!_thread.joinable())
_thread = std::thread(&Object::_update, this);
}
void waitForResult() {
_thread.join();
}
unsigned int getId() const { return _id; }
unsigned int getValue() const { return _value; }
private:
void _update() {
{
{
std::lock_guard<std::recursive_mutex> lock(cout_mutex);
std::cout << "thread " << _id << " sleeping for " << _sleepTime << std::endl;
}
std::this_thread::sleep_for(std::chrono::seconds(_sleepTime));
_value = _id * 10;
}
std::lock_guard<std::recursive_mutex> lock(cout_mutex);
std::cout << "Object " << getId() << " completed. Value = " << getValue() << std::endl;
}
unsigned int _sleepTime;
unsigned int _id;
unsigned int _value;
std::thread _thread;
};
class ObjectManager : public std::vector<std::shared_ptr<Object>> {
public:
void runUpdate() {
for (auto it = this->begin(); it != this->end(); ++it)
(*it)->runUpdate();
}
void waitForAll() {
auto it = this->begin();
while (it != this->end()) {
(*it)->waitForResult();
it = this->erase(it);
}
}
};
int main(int argc, char* argv[]) {
enum {
TEST_OBJECTS_NUM = 2,
};
srand(static_cast<unsigned int>(time(nullptr)));
ObjectManager mgr;
// Generate sleep time for each object
std::vector<int> objectSleepTimes;
objectSleepTimes.reserve(TEST_OBJECTS_NUM);
for (int i = 0; i < TEST_OBJECTS_NUM; ++i)
objectSleepTimes.push_back(rand() % 10 + 1); // 1..10 seconds; avoids int overflow in rand() * 9
// Create some objects
for (int i = 0; i < TEST_OBJECTS_NUM; ++i)
mgr.push_back(std::make_shared<Object>(objectSleepTimes[i], i));
assert(mgr.size() == TEST_OBJECTS_NUM);
// Print expected object completion order
// Sort from smallest to largest
std::sort(objectSleepTimes.begin(), objectSleepTimes.end());
for (size_t i = 0; i < mgr.size(); ++i)
std::cout << objectSleepTimes[i] << ", ";
std::cout << std::endl;
// Update objects
mgr.runUpdate();
mgr.waitForAll();
//system("pause"); // use Ctrl+F5 to run the app instead. That's more reliable in case of sudden app exit.
}
About is it a reasonable thing to do...
A better approach is to create an object update queue. Objects that need to be updated are added to this queue, which can be fulfilled by a group of threads instead of one thread per object.
The benefits are:
No 1-to-1 correspondence between thread and objects. Creating a thread is a heavy operation, probably more expensive than most update code for a single object.
Supports thousands of objects: with your solution you would need to create thousands of threads, which you will find exceeds your OS capacity.
Can support additional features like declaring dependencies between objects or updating a group of related objects as one operation.
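The update-queue idea can be sketched as follows. This is a minimal illustration under stated assumptions, not a production thread pool: UpdateQueue, submit and update_all are hypothetical names, the pool size of 2 is arbitrary, and the "update" is the same id * 10 computation as in the question.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool that drains a queue of update tasks.
class UpdateQueue {
public:
    explicit UpdateQueue(unsigned workers) {
        for (unsigned i = 0; i < workers; ++i)
            threads_.emplace_back([this] { work(); });
    }
    ~UpdateQueue() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_all();
        for (auto& t : threads_) t.join();   // workers finish queued tasks first
    }
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }
private:
    void work() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
                if (tasks_.empty()) return;   // done_ is set and nothing is left
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();   // run the update outside the lock
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool done_ = false;
    std::vector<std::thread> threads_;
};

// Updates `count` objects with two worker threads instead of `count` threads.
std::vector<unsigned> update_all(unsigned count) {
    std::vector<unsigned> values(count, 0);
    {
        UpdateQueue pool(2);
        for (unsigned id = 0; id < count; ++id)
            pool.submit([&values, id] { values[id] = id * 10; });
    }   // the destructor drains the remaining tasks and joins the workers
    return values;
}
```

Each task writes a distinct element, so the updates themselves need no locking; only the queue is protected by the mutex.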