Concurrently push()ing to a shared queue with pthread? - c++

I am practicing pthread.
In my original program, it pushes to the shared queue an instance of a class called request, but I first at least want to make sure that I am pushing something to a shared queue.
The code is very simple, but it produces a lot of errors whose cause I could not figure out.
I guess it's probably the syntax, but whatever I tried it did not work.
Do you see why it is not working?
Following is the code I have been trying.
extern "C" {
#include<pthread.h>
#include<unistd.h>
}
#include<queue>
#include<iostream>
#include<string>
using namespace std;
class request {
public:
string req;
request(string s) : req(s) {}
};
int n;
queue<request> q;
pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
void * putToQueue(string);
int main ( void ) {
pthread_t t1, t2;
request* ff = new request("First");
request* trd = new request("Third");
int result1 = pthread_create(&t1, NULL, &putToQueue, reinterpret_cast<void*>(&ff));
if (result1 != 0) cout << "error 1" << endl;
int result2 = pthread_create(&t2, NULL, &putToQueue, reinterpret_cast<void*>(&trd));
if (result2 != 0) cout << "error 2" << endl;
pthread_join(t1, NULL);
pthread_join(t2, NULL);
for(int i=0; i<q.size(); ++i) {
cout << q.front().req << " is in queue" << endl;
q.pop();
--n;
}
return 0;
}
void * putToQueue(void* elem) {
pthread_mutex_lock(&mut);
q.push(reinterpret_cast<request>(elem));
++n;
cout << n << " items are in the queue." << endl;
pthread_mutex_unlock(&mut);
return 0;
}

The code below comments on everything that had to be changed. I would write up a detailed description of why they had to change, but I hope the code speaks for itself. It still isn't bullet-proof. There are plenty of things that could be done differently or better (exception handling for failed new, etc) but at least it compiles, runs, and doesn't leak memory.
#include <queue>
#include <iostream>
#include <string>
#include <pthread.h>
#include <unistd.h>
using namespace std;
// MINOR: param should be a const-ref
class request {
public:
string req;
request(const string& s) : req(s) {}
};
int n;
queue<request> q;
pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
// FIXED: made prototype a proper pthread-proc signature
void * putToQueue(void*);
int main ( void )
{
pthread_t t1, t2;
// FIXED: made thread param the actual dynamic allocation address
int result1 = pthread_create(&t1, NULL, &putToQueue, new request("First"));
if (result1 != 0) cout << "error 1" << endl;
// FIXED: made thread param the actual dynamic allocation address
int result2 = pthread_create(&t2, NULL, &putToQueue, new request("Third"));
if (result2 != 0) cout << "error 2" << endl;
pthread_join(t1, NULL);
pthread_join(t2, NULL);
// FIXED: was skipping elements because the queue size was shrinking
// with each pop in the for-loop body.
while (!q.empty())
{
cout << q.front().req << " WAS in queue" << endl;
q.pop();
}
return 0;
}
// FIXED: pretty much a near-total-rewrite
void* putToQueue(void* elem)
{
request *req = static_cast<request*>(elem);
if (pthread_mutex_lock(&mut) == 0)
{
q.push(*req);
cout << ++n << " items are in the queue." << endl;
pthread_mutex_unlock(&mut);
}
delete req; // FIXED: squelched memory leak
return 0;
}
Output (yours may vary)
1 items are in the queue.
2 items are in the queue.
Third WAS in queue
First WAS in queue

As noted in the comment, I'd advise skipping direct use of pthreads, and use the C++11 threading primitives instead. I'd start with a simple protected queue class:
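#include <deque>
#include <iostream>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
template <class T, template<class, class> class Container=std::deque>
class p_q {
// no typename here: Container<T, ...> is not a qualified dependent name
typedef Container<T, std::allocator<T>> container;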
typedef typename container::iterator iterator;
container data;
std::mutex m;
public:
void push(T a) {
std::lock_guard<std::mutex> l(m);
data.emplace_back(a);
}
iterator begin() { return data.begin(); }
iterator end() { return data.end(); }
// omitting front() and pop() for now, because they're not used in this code
};
Using this, the main-stream of the code stays nearly as simple and clean as single-threaded code, something like this:
int main() {
p_q<std::string> q;
auto pusher = [&q](std::string const& a) { q.push(a); };
std::thread t1{ pusher, "First" };
std::thread t2{ pusher, "Second" };
t1.join();
t2.join();
for (auto s : q)
std::cout << s << "\n";
}
As it stands right now, this is a multiple-producer, single-consumer queue. Further, it depends on the fact that the producers are no longer running when the consuming is happening. That's true in this case, but wouldn't/won't always be. When it's not the case, you'll need a (marginally) more complex queue that does locking as it reads/pops from the queue, not just when writing to it.
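A minimal sketch of what such a "marginally more complex" queue could look like. The names `locked_queue`, `try_pop` and `demo_locked_queue` are illustrative assumptions, not part of the answer above. Every access, reads included, takes the same mutex, so the consumer can run while the producers are still pushing.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical name: a queue whose push and pop both lock the same mutex.
template <class T>
class locked_queue {
    std::deque<T> data;
    std::mutex m;
public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(m);
        data.push_back(std::move(value));
    }
    // Returns false instead of blocking when the queue is empty.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lock(m);
        if (data.empty()) return false;
        out = std::move(data.front());
        data.pop_front();
        return true;
    }
};

// Two producers push while the calling thread consumes concurrently.
// Returns how many items the consumer received.
inline std::size_t demo_locked_queue(int per_producer) {
    locked_queue<std::string> q;
    auto producer = [&q, per_producer](const std::string& tag) {
        for (int i = 0; i < per_producer; ++i) q.push(tag + std::to_string(i));
    };
    std::thread t1(producer, "first-");
    std::thread t2(producer, "second-");

    const std::size_t expected = 2 * static_cast<std::size_t>(per_producer);
    std::size_t received = 0;
    std::string item;
    while (received < expected) {
        if (q.try_pop(item)) ++received;
        else std::this_thread::yield();  // nothing queued yet; let the producers run
    }
    t1.join();
    t2.join();
    return received;
}
```

Calling `demo_locked_queue(100)` should return 200 once both producers have finished. Polling with `try_pop` keeps the sketch short; a blocking pop built on `std::condition_variable` would avoid the busy loop.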

Related

boost::interprocess how to implement a simple thread safe job queue for worker processes

I'm attempting to create a basic system for taking jobs from a queue between processes with boost interprocess communications on Windows. When a worker process is free, it will take a job from the shared queue area.
The code is loosely copied from examples in the documentation.
I have a child process that attempts to take on jobs from a queue stored in shared memory as Jobs. The issue is that it crashes as soon as the child attempts to read the front of the queue in SafeQueue::next() at elem = q.front(); (commented below). The child process will terminate when the queue is empty (when it returns -999).
I feel like I'm doing something horribly wrong. I'm new to Boost IPC and would appreciate any pointers or advice on how to achieve this simple worker queue system.
#include <boost/interprocess/windows_shared_memory.hpp>
#include <boost/interprocess/managed_windows_shared_memory.hpp>
#include <boost/interprocess/smart_ptr/shared_ptr.hpp>
#include <boost/interprocess/shared_memory_object.hpp>
#include <string>
#include <thread>
#include <iostream>
#include <mutex>
#include <queue>
#include <vector>
#include <cstdlib>
using namespace boost::interprocess;
class SafeQueue {
std::queue<int> q;
std::mutex m;
public:
SafeQueue() {}
void push(int elem) {
m.lock();
q.push(elem);
m.unlock();
}
void push(std::vector<int> elem) {
m.lock();
for (int e : elem) {
q.push(e);
}
m.unlock();
}
int next() {
int elem = -999;
m.lock();
if (!q.empty()) {
elem = q.front(); //crashes here
q.pop();
}
m.unlock();
return elem;
}
};
class Jobs
{
public:
SafeQueue queue;
};
typedef managed_shared_ptr<Jobs, managed_windows_shared_memory>::type my_shared_ptr;
int main(int argc, char* argv[])
{
if (argc == 1) { //Parent process
std::cout << "starting as parent" << std::endl;
managed_windows_shared_memory segment(create_only, "MySharedMemory", 4096);
my_shared_ptr sh_ptr = make_managed_shared_ptr(segment.construct<Jobs>("object to share")(), segment);
sh_ptr->queue.push({1, 2, 3});
std::string command = "\"" + std::string(argv[0]) + "\"";
command += " child ";
std::thread t([](const std::string& command) {
std::system(command.c_str());
}, command);
while (true) {
}
}
else {
std::cout << "starting as child" << std::endl;
//Open already created shared memory object.
managed_windows_shared_memory shm(open_only, "MySharedMemory");
Jobs* shared_job_list = shm.find<Jobs>("object to share").first;
std::vector<int> taken;
while (true) {
int result;
if ((result = shared_job_list->queue.next()) != -999) {
taken.push_back(result);
std::cout << "took job " << result << std::endl;
continue;
}
break;
}
std::string out = "taken jobs: ";
for (int res : taken) {
out += ", " + std::to_string(res);
}
std::cout << out << std::endl;
return 0;
}
return 0;
}
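The internal data of the shared Jobs object must be pointer-free to work across multiple processes, and it is not, because it contains a std::queue. The queue allocates its elements on the heap of the process that created it, so the shared segment only holds pointers into the parent's address space. Those pointers are meaningless in the child process, which is why reading the front of the queue crashes there.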
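To illustrate what "pointer-free" means here, the following is a rough sketch of a fixed-capacity job queue that stores everything inline. The name `InlineJobQueue` and its members are assumptions made for this example. Synchronization is deliberately left out: in a real cross-process setup the structure would still need a process-shared lock (Boost.Interprocess provides `interprocess_mutex` for that purpose), since `std::mutex` is not meant to be shared between processes.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical fixed-capacity queue of job ids. All state lives inside the
// object itself, so its bytes mean the same thing in every process that maps
// the memory. No locking is done here.
template <std::size_t Capacity>
struct InlineJobQueue {
    int slots[Capacity];
    std::size_t head;   // index of the oldest job
    std::size_t count;  // number of queued jobs

    bool push(int job) {
        if (count == Capacity) return false;  // full
        slots[(head + count) % Capacity] = job;
        ++count;
        return true;
    }
    bool pop(int& job) {
        if (count == 0) return false;         // empty
        job = slots[head];
        head = (head + 1) % Capacity;
        --count;
        return true;
    }
};

// The type contains no pointers and can be copied byte for byte.
static_assert(std::is_trivially_copyable<InlineJobQueue<8>>::value,
              "queue must be trivially copyable to live in shared memory");
```

An object created with `InlineJobQueue<8> q{};` starts out empty because value-initialization zeroes `head` and `count`. Boost.Interprocess also ships its own containers and allocators built on offset pointers for cases where a fixed capacity is too restrictive.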

Multithreaded program blocks when compiled with optimizations

I am developing a project in which I have to model (arbitrary) computations that happen in pipeline.
The pipeline is made of stages. Each stage takes its input from the previous stage (except the first, which receives tasks directly from the pipeline object), performs a computation and sends the result to the next stage. Each stage is implemented with a separate thread of execution.
The pipeline should have a basic load balancing capability: if (after a while) it recognizes that the sum of the execution times of two consecutive stages is smaller than the execution time of the slowest stage, it "collapses" those two stages, that is it makes both of them run sequentially, using a single thread.
There are three classes in the project: classes Pipeline and Stage are obvious, while class TSOHeap (Thread-Safe Ordered heap) is the buffer used in input by each stage. It has a maximum size and the capability to give highest priority to special messages indicating that a Stage has to be collapsed.
My question is: why does the code run smoothly (or at least not block) when I compile without optimizations, while the program blocks when I compile with optimizations (-O2, -O3)? If I run the program under the debugger it blocks only occasionally; if I run it "normally" from a terminal it blocks almost always.
The strange thing is that a thread blocks on a line in which there is a simple print. Before I added that print (for debugging purpose), the program blocked on the previous line, which is the guard of a while loop.
I guess the problem is related to synchronization among threads, but I don't know how to discover the faulty part. The only constant is that the program blocks after the method collapse_next_stage() has been invoked, that is after a thread has been stopped.
Any suggestion would be appreciated, even general procedures to discover bugs like these.
Here is the code needed to run an example:
Class "TSOHeap.hpp":
#include <mutex>
#include <queue>
#include <vector>
#include <atomic>
#include <climits>
using namespace std;
template<typename T>
struct Comparator{
bool operator()(pair<T,int> p1, pair<T,int> p2){
return p1.second > p2.second;
}
};
//Thread-Safe Ordered Heap
template<typename T>
struct TSOHeap
{
TSOHeap(int _max=10):size{0},max{_max}{};
~TSOHeap(){}
void push(T* item, int id){
while(size==max);
{
lock_guard<mutex> lock(heap_mutex);
heap.push(pair<T*,int>(item, id));
size++;
}
}
pair<T*,int> pop(){
while(size==0);
{
lock_guard<mutex> lock(heap_mutex);
pair<T*,int> p = heap.top();
heap.pop();
size--;
return p;
}
}
priority_queue<pair<T*,int>, vector<pair<T*,int>>,Comparator<T*>> heap;
atomic<int> size;
int max;
mutex heap_mutex;
};
Class "Stage.hpp":
#include "TSOHeap.hpp"
#include <iostream>
#include <thread>
#include <vector>
#include <chrono>
#include <mutex>
using namespace std;;
struct IStage{
virtual void run() = 0;
virtual void wait_end() = 0;
virtual void stage_func() = 0;
virtual double get_exec_time() = 0;
virtual void reset_exec_time()=0;
virtual void add_next(IStage&)=0;
virtual IStage* get_next() = 0;
virtual void* get_input_ptr() = 0;
virtual void set_input(void*) = 0;
virtual void collapse() = 0;
virtual bool is_collapsed() = 0;
virtual void collapse_next_stage() = 0;
virtual int num_collapsed() = 0;
~IStage(){};
};
template <typename Tin, typename Tf, typename Tout>
struct Stage : IStage{
Stage(Tf function, int ind):fun{function}, input_ptr{new(TSOHeap<Tin>)},_end{false},
next{nullptr}, collapsed{0}, i{ind}, exec_time{0.0},count{0},collapsing{false},c{0}{};
~Stage(){delete input_ptr;}
void stage_func(){
Tin * input = input_ptr->pop().first;
if (input!=nullptr){
auto start = chrono::system_clock::now();
Tout out = fun(*input);
auto end = chrono::system_clock::now();
chrono::duration<double> diff = end-start;
set_exec_time(diff.count());
if (next!=nullptr)
next->set_input(new Tout(out));
}
else
_end = true;
}
void run_thread(){
while(!_end){
cout << "t " << i << ", r " << ++c << endl; // BLOCKS HERE
while(collapsing); //waiting that next stage finishes the remaining tasks
stage_func();
if(collapsed==1 && !_end)
next->stage_func();
}
if(collapsed!=-1){
IStage * nptr = next;
if(nptr!=nullptr && nptr->is_collapsed())
nptr = nptr->get_next();
if(nptr!=nullptr)
nptr->set_input(nullptr);
}
else{
while((input_ptr->size)>0)
stage_func();
}
}
void run()
{
thread _t(&Stage::run_thread, this);
t = move(_t);
return;
}
void wait_end()
{
t.join();
}
void set_input(void * iptr)
{
input_ptr->push(static_cast<Tin*>(iptr), ++count);
}
void* get_input_ptr()
{
return input_ptr;
}
void add_next(IStage &n)
{
next = &n;
output_ptr = static_cast<TSOHeap<Tout>*>(n.get_input_ptr());
}
void collapse()
{
collapsed=-1;
input_ptr->push(nullptr, INT_MIN);
// First condition is to avoid deadlock, in case this thread finished the execution in the meanwhile
while(!_end && (input_ptr->size) > 0);
}
bool is_collapsed()
{
return collapsed==-1;
}
void collapse_next_stage()
{
collapsing = true;
next->collapse();
collapsed++;
collapsing = false;
cout << "Stage # " << i << " has collapsed the successive Stage" << endl;
}
IStage* get_next()
{
return next;
}
double get_exec_time()
{
return exec_time;
}
void reset_exec_time()
{
set_exec_time(0.0);
}
void set_exec_time(double value)
{
lock_guard<mutex> lock(et_mutex);
exec_time = value;
}
int num_collapsed()
{
return collapsed;
}
Tf fun;
TSOHeap<Tin> * input_ptr;
bool _end;
IStage * next;
int collapsed;
int const i;
double exec_time;
int count;
mutex et_mutex;
bool collapsing;
int c;
TSOHeap<Tout> * output_ptr;
thread t;
};
Class "Pipe.hpp":
#include "Stage.hpp"
#include <list>
#include <thread>
#include <algorithm>
using namespace std;;
template <typename Tin, typename Tout>
struct Pipe{
Pipe(list<IStage*>li, int n_samples=10):slowest{-1},end{false},num_samples{n_samples}
{
for(auto& s:li)
add_node(s);
}
void add_node(IStage* sptr)
{
if(!nodes.empty())
nodes.back()->add_next(*sptr);
nodes.push_back(sptr);
}
void set_input(void * in_ptr)
{
nodes.front()->set_input(in_ptr);
}
int num_nodes()
{
return nodes.size();
}
void run()
{
for(auto &x: nodes)
x->run();
}
void run(list<Tin>&& input)
{
thread t(&Pipe::run_manager, this, ref(input));
while(!end)
monitor_times();
t.join();
}
void run_manager(list<Tin>& input)
{
run();
for(auto& x:input)
set_input(&x);
set_input(nullptr);
end=true;
for(auto& s : nodes)
s->wait_end();
}
void monitor_times()
{ // initialization phase
vector<int> count;
vector<double> avg;
vector<priority_queue<pair<double,int>, vector<pair<double,int>>,Comparator<double>>> measures;
for(auto& x : nodes){
count.push_back(0);
avg.push_back(0);
measures.push_back(priority_queue<pair<double,int>,
vector<pair<double,int>>,Comparator<double>>());
}
while(!end){
// monitoring phase
for(int i=0; i<nodes.size(); i++){
if(nodes[i]->get_exec_time()!=0){
pair<double,int> measure = pair<double,int>(nodes[i]->get_exec_time(),++count[i]);
nodes[i]->reset_exec_time();
measures[i].push(measure);
if(count[i]<=num_samples){
avg[i] = (avg[i]*(count[i]-1) + measure.first) / count[i];
}
else
{
double old = measures[i].top().first;
// the ordering of the heap guarantees that I drop the oldest measure
measures[i].pop();
avg[i] = (avg[i] * num_samples - old + measure.first) / num_samples;
}
}
}
// updating phase
if(is_steady_state(count)){
int slowest = get_slowest_stage(avg);
for(int i=0; i<nodes.size()-1; i++){
if(avg[i]+avg[i+1]<avg[slowest]){
if(nodes[i]->num_collapsed()==0 && nodes[i+1]->num_collapsed()==0){
nodes[i]->collapse_next_stage();
break;
}
}
}
}
}
}
bool is_steady_state(vector<int>& count){
for(auto& c: count){
if(c < num_samples) return false;
}
return true;
}
int get_slowest_stage(vector<double>& avg){
double max = 0.0;
int index = -1;
for(int i=0; i<avg.size(); i++){
if(avg[i]>max){
max=avg[i];
index = i;
}
}
return index;
}
int slowest;
bool end;
int num_samples;
vector<IStage*> nodes;
};
Class "main.cpp":
#include<iostream>
#include<functional>
#include <chrono>
#include<cmath>
#include "Pipe.hpp"
using namespace std;;
auto f = [](int x){
int c = 0;
for(int i=0; i<300; i++)
c=sin(i);
return x;
};
auto fast = [] (int x) {return x;};
auto fast_init = [](int x){
if(x < 5)
return x;
int c=0;
for(int i=0; i<300; i++)
c=sin(i);
return x;
};
auto print = [] (int x) {
cout << "Result: " << x << " " << endl;
return x;
};
int main(int argc, char* argv[])
{
auto print_usage_msg = [&](){
cout << "Usage: " << argv[0] << " <func_type> \n" <<
"<func_type> = \n"
" 0 to have 2 consecutive stages running the fast function\n"
" 1 to have 2 consecutive stages running the fast function "
"but after a short time reaching steady state " << endl;
};
if(argc!=2){
print_usage_msg();
return 1;
}
int fun_code = atoi(argv[1]);
if (fun_code!=0 && fun_code!=1){
print_usage_msg();
return 1;
}
Stage<int,function<int(int)>,int> s1{f,1};
Stage<int,function<int(int)>,int> s2{f,2};
Stage<int,function<int(int)>,int> s3{f,3};
Stage<int,function<int(int)>,int> s4{f,4};
Stage<int,function<int(int)>,int> s5{f,5};
Stage<int,function<int(int)>,int> s6{f,6};
Stage<int,function<int(int)>,int> s7{f,7};
Stage<int,function<int(int)>,int> sp{print,8};
if(fun_code==0){
s2.fun = fast;
s3.fun = fast;
}
else{
s2.fun = fast_init;
s3.fun = fast_init;
}
Pipe<int,int> p ({&s1, &s2, &s3, &s4, &s5, &s6, &s7, &sp});
cout << "Pipe length: " << p.num_nodes() << endl;
list<int> li {};
for(int i=0; i<100; i++)
li.push_back(i);
p.run(move(li));
return 0;
}
Compile with:
g++ main.cpp -std=c++11 -pthread -O3 -o gpipe -g
Run with :
./gpipe 1
Thanks for any help!
Imagine the following code for a single-threaded program:
void func()
{
bool a = true;
while(a)
{
// busy wait...
}
}
Will this function ever return? Obviously not. If you were a compiler, how would you write optimized code for this?
1: NOP
2: GOTO 1
This is exactly what you're doing with this bit of code. Twice.
while(!_end){ // here #1
cout << "t " << i << ", r " << ++c << endl;
while(collapsing) // here #2
; // for the love of God, move your semicolon here or use braces
stage_func();
if(collapsed==1 && !_end)
next->stage_func();
}
Your compiler has absolutely no obligation to realize that you're doing multi-threading programming. (It's your job to tell it)
The compiler needs to know not to perform optimizations on _end, collapsing and collapsed. DO NOT USE volatile. Why? volatile will keep the compiler from optimizing away accesses to a variable, but... heh heh... it gives you neither atomicity nor any ordering guarantee between threads, so unsynchronized access from different threads is still a data race (undefined behavior as of C++11). Compilers and CPUs will also re-order your instructions, which can cause similar problems.
Memory fences (aka memory barriers) can be used to instruct the CPU to do things like push out pending writes or re-update its cached value for reading. They also give guidelines for command re-ordering. AFAIK the std::atomic_thread_fence will prevent compiler reordering but I've read conflicting things about this...
By far the simplest, most-pragmatic, and easiest-to-prove-correct thing to do is just to switch all your inter-thread communicating variables to std::atomic<> types, which incorporate memory barriers. So
std::atomic<bool> _end;
std::atomic<bool> collapsing;
std::atomic<int> collapsed;
As a general rule, any data that is shared between threads should be protected by a mutex OR be an std::atomic<> if race conditions are not an issue (as you are doing with the simple signaling). You can break this rule if you really know what you're doing and really know the architecture, compiler, and standard implementation really well, but that's a tall order even for an expert.
By the way, a mutex's lock and unlock operation both incorporate a memory barrier, in case you were worried about that. So when you get a pointer from the TSOHeap, that's fine (assuming your TSOHeap implementation is correct...I didn't look at it).
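The std::atomic signaling recommended in this answer can be sketched as follows. This is a stand-alone illustration, not a patch for the Stage class: `keep_running`, `iterations` and `demo_atomic_flag` are names invented for the example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// A worker loops until another thread clears an atomic flag.
// Because the flag is std::atomic, the compiler must re-read it on every
// pass and cannot turn the loop into an unconditional jump.
inline int demo_atomic_flag() {
    std::atomic<bool> keep_running{true};
    std::atomic<int> iterations{0};

    std::thread worker([&] {
        while (keep_running.load()) {
            ++iterations;
            std::this_thread::yield();
        }
    });

    // Wait until the worker has made at least one pass, then signal it to stop.
    while (iterations.load() == 0) std::this_thread::yield();
    keep_running.store(false);
    worker.join();
    return iterations.load();
}
```

With a plain `bool` in place of `std::atomic<bool>`, the same loop has a data race, and an optimizing build is allowed to hoist the read out of the loop, which matches the hang described in the question.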
You have race conditions in TSOHeap when using size. While size is atomic, it is a part of larger state that is not atomic, so that changes in size are not synchronized with changes to the rest of the state.
Make size non-atomic and access it only when the mutex is locked. Add condition variables to notify threads waiting in push and pop.
Alternatively, remove size entirely. Example:
#include <condition_variable> // in addition to the headers TSOHeap.hpp already includes
template<typename T>
struct TSOHeap
{
TSOHeap(size_t _max=10): max{_max}{}
void push(T* item, int id){
unique_lock<mutex> lock(heap_mutex);
while(heap.size() == max)
cnd_pop.wait(lock);
heap.push(pair<T*,int>(item, id));
cnd_push.notify_one();
}
pair<T*,int> pop() {
pair<T*,int> result = {};
{
unique_lock<mutex> lock(heap_mutex);
while(heap.empty())
cnd_push.wait(lock);
bool notify = heap.size() == max;
result = heap.top();
heap.pop();
if(notify)
cnd_pop.notify_one();
}
return result;
}
mutex heap_mutex;
condition_variable cnd_push, cnd_pop;
priority_queue<pair<T*,int>, vector<pair<T*,int>>,Comparator<T*>> heap;
size_t const max;
};

std::thread throwing "resource dead lock would occur"

I have a list of objects. Each object has member variables which are calculated by an "update" function. I want to update the objects in parallel, that is, I want to create a thread for each object to execute its update function.
Is this a reasonable thing to do? Any reasons why this may not be a good idea?
Below is a program which attempts to do what I described. It is a complete program, so you should be able to run it (I'm using VS2015). The goal is to update each object in parallel. The problem is that once the update function completes, the thread throws a "resource dead lock would occur" exception and aborts.
Where am I going wrong?
#include <iostream>
#include <thread>
#include <vector>
#include <algorithm>
#include <thread>
#include <mutex>
#include <chrono>
class Object
{
public:
Object(int sleepTime, unsigned int id)
: m_pSleepTime(sleepTime), m_pId(id), m_pValue(0) {}
void update()
{
if (!isLocked()) // if an object is not locked
{
// create a thread to perform its update
m_pThread.reset(new std::thread(&Object::_update, this));
}
}
unsigned int getId()
{
return m_pId;
}
unsigned int getValue()
{
return m_pValue;
}
bool isLocked()
{
bool mutexStatus = m_pMutex.try_lock();
if (mutexStatus) // if mutex is locked successfully (meaning it was unlocked)
{
m_pMutex.unlock();
return false;
}
else // if mutex is locked
{
return true;
}
}
private:
// private update function which actually does work
void _update()
{
m_pMutex.lock();
{
std::cout << "thread " << m_pId << " sleeping for " << m_pSleepTime << std::endl;
std::chrono::milliseconds duration(m_pSleepTime);
std::this_thread::sleep_for(duration);
m_pValue = m_pId * 10;
}
m_pMutex.unlock();
try
{
m_pThread->join();
}
catch (const std::exception& e)
{
std::cout << e.what() << std::endl; // throws "resource dead lock would occur"
}
}
unsigned int m_pSleepTime;
unsigned int m_pId;
unsigned int m_pValue;
std::mutex m_pMutex;
std::shared_ptr<std::thread> m_pThread; // store reference to thread so it doesn't go out of scope when update() returns
};
typedef std::shared_ptr<Object> ObjectPtr;
class ObjectManager
{
public:
ObjectManager()
: m_pNumObjects(0){}
void updateObjects()
{
for (int i = 0; i < m_pNumObjects; ++i)
{
m_pObjects[i]->update();
}
}
void removeObjectByIndex(int index)
{
m_pObjects.erase(m_pObjects.begin() + index);
}
void addObject(ObjectPtr objPtr)
{
m_pObjects.push_back(objPtr);
m_pNumObjects++;
}
ObjectPtr getObjectByIndex(unsigned int index)
{
return m_pObjects[index];
}
private:
std::vector<ObjectPtr> m_pObjects;
int m_pNumObjects;
};
void main()
{
int numObjects = 2;
// Generate sleep time for each object
std::vector<int> objectSleepTimes;
objectSleepTimes.reserve(numObjects);
for (int i = 0; i < numObjects; ++i)
objectSleepTimes.push_back(rand());
ObjectManager mgr;
// Create some objects
for (int i = 0; i < numObjects; ++i)
mgr.addObject(std::make_shared<Object>(objectSleepTimes[i], i));
// Print expected object completion order
// Sort from smallest to largest
std::sort(objectSleepTimes.begin(), objectSleepTimes.end());
for (int i = 0; i < numObjects; ++i)
std::cout << objectSleepTimes[i] << ", ";
std::cout << std::endl;
// Update objects
mgr.updateObjects();
int numCompleted = 0; // number of objects which finished updating
while (numCompleted != numObjects)
{
for (int i = 0; i < numObjects; ++i)
{
auto objectRef = mgr.getObjectByIndex(i);
if (!objectRef->isLocked()) // if object is not locked, it is finished updating
{
std::cout << "Object " << objectRef->getId() << " completed. Value = " << objectRef->getValue() << std::endl;
mgr.removeObjectByIndex(i);
numCompleted++;
}
}
}
system("pause");
}
Looks like you've got a thread that is trying to join itself.
While trying to understand your solution I simplified it a lot, and I came to the conclusion that you are using std::thread::join() the wrong way: _update() runs on the very thread that m_pThread refers to, so the thread ends up joining itself.
std::thread lets you wait for its completion without spinning. In your example you instead wait for completion in an infinite loop (a spin wait), which consumes a lot of CPU time.
You should call std::thread::join() from another thread to wait for the thread to complete. The mutex in Object is not necessary in your example. Moreover, you are missing a mutex to synchronize access to std::cout: without one, output from different threads can interleave. I hope the example below helps.
#include <iostream>
#include <thread>
#include <vector>
#include <algorithm>
#include <memory>
#include <mutex>
#include <chrono>
#include <cassert>
#include <cstdlib>
#include <ctime>
// serializes writes to cout so that lines from different threads do not interleave
std::recursive_mutex cout_mutex;
class Object {
public:
Object(int sleepTime, unsigned int id)
: _sleepTime(sleepTime), _id(id), _value(0) {}
void runUpdate() {
if (!_thread.joinable())
_thread = std::thread(&Object::_update, this);
}
void waitForResult() {
_thread.join();
}
unsigned int getId() const { return _id; }
unsigned int getValue() const { return _value; }
private:
void _update() {
{
{
std::lock_guard<std::recursive_mutex> lock(cout_mutex);
std::cout << "thread " << _id << " sleeping for " << _sleepTime << std::endl;
}
std::this_thread::sleep_for(std::chrono::seconds(_sleepTime));
_value = _id * 10;
}
std::lock_guard<std::recursive_mutex> lock(cout_mutex);
std::cout << "Object " << getId() << " completed. Value = " << getValue() << std::endl;
}
unsigned int _sleepTime;
unsigned int _id;
unsigned int _value;
std::thread _thread;
};
class ObjectManager : public std::vector<std::shared_ptr<Object>> {
public:
void runUpdate() {
for (auto it = this->begin(); it != this->end(); ++it)
(*it)->runUpdate();
}
void waitForAll() {
auto it = this->begin();
while (it != this->end()) {
(*it)->waitForResult();
it = this->erase(it);
}
}
};
int main(int argc, char* argv[]) {
enum {
TEST_OBJECTS_NUM = 2,
};
srand(static_cast<unsigned int>(time(nullptr)));
ObjectManager mgr;
// Generate sleep time for each object
std::vector<int> objectSleepTimes;
objectSleepTimes.reserve(TEST_OBJECTS_NUM);
for (int i = 0; i < TEST_OBJECTS_NUM; ++i)
objectSleepTimes.push_back(rand() % 10 + 1); // 1..10 seconds; avoids overflow in rand() * 9 when RAND_MAX is large
// Create some objects
for (int i = 0; i < TEST_OBJECTS_NUM; ++i)
mgr.push_back(std::make_shared<Object>(objectSleepTimes[i], i));
assert(mgr.size() == TEST_OBJECTS_NUM);
// Print expected object completion order
// Sort from smallest to largest
std::sort(objectSleepTimes.begin(), objectSleepTimes.end());
for (size_t i = 0; i < mgr.size(); ++i)
std::cout << objectSleepTimes[i] << ", ";
std::cout << std::endl;
// Update objects
mgr.runUpdate();
mgr.waitForAll();
//system("pause"); // use Ctrl+F5 to run the app instead. That's more reliable in case of sudden app exit.
}
As for whether this is a reasonable thing to do:
A better approach is to create an object update queue. Objects that need to be updated are added to this queue, which is serviced by a group of threads instead of one thread per object.
The benefits are:
1) No 1-to-1 correspondence between threads and objects. Creating a thread is a heavy operation, probably more expensive than most update code for a single object.
2) Supports thousands of objects: with your solution you would need to create thousands of threads, which may exceed what your OS can handle.
3) Can support additional features like declaring dependencies between objects or updating a group of related objects as one operation.
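The update-queue idea can be sketched roughly as below. `Item` is a hypothetical stand-in for the question's Object class, and `update_all` is a name chosen for this example. A fixed number of worker threads drain one mutex-protected queue, so the number of threads does not grow with the number of objects.

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct Item {          // hypothetical stand-in for the question's Object
    unsigned id;
    unsigned value;
};

// Updates every item using num_workers threads that share one work queue.
inline void update_all(std::vector<Item>& items, unsigned num_workers) {
    std::queue<Item*> pending;
    for (auto& item : items) pending.push(&item);

    std::mutex m;  // protects 'pending'
    auto work = [&] {
        for (;;) {
            Item* item = nullptr;
            {
                std::lock_guard<std::mutex> lock(m);
                if (pending.empty()) return;  // no work left, worker exits
                item = pending.front();
                pending.pop();
            }
            item->value = item->id * 10;      // the update itself runs outside the lock
        }
    };

    std::vector<std::thread> workers;
    for (unsigned w = 0; w < num_workers; ++w) workers.emplace_back(work);
    for (auto& t : workers) t.join();         // joined from the owning thread
}
```

Here the queue is filled before the workers start, which keeps the sketch free of condition variables. A long-lived pool that accepts new work while running would need a blocking pop and a shutdown signal.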

Producer and consumer functions for test thread-safe stack examples of C++ concurrency in action book

I've started learning concurrency (C++11) by reading the book C++ Concurrency in Action. How can I test a thread-safe stack class (the example is taken from listing 3.5 of the book)? I would like to have different implementations of producer/consumer functions that let me exercise all of its member functions.
#include <exception>
#include <memory>
#include <mutex>
#include <stack>
struct empty_stack: std::exception
{
const char* what() const throw() { return "empty stack"; } // defined inline so the example links
};
template<typename T>
class threadsafe_stack
{
private:
std::stack<T> data;
mutable std::mutex m;
public:
threadsafe_stack() {}
threadsafe_stack(const threadsafe_stack& other)
{
std::lock_guard<std::mutex> lock(other.m);
data=other.data;
}
threadsafe_stack& operator = (const threadsafe_stack&) = delete;
void push(T new_value)
{
std::lock_guard<std::mutex> lock(m);
data.push(new_value);
}
std::shared_ptr<T> pop()
{
std::lock_guard<std::mutex> lock(m);
if(data.empty()) throw empty_stack();
std::shared_ptr<T> const res(std::make_shared<T>(data.top()));
data.pop();
return res;
}
void pop(T& value)
{
std::lock_guard<std::mutex> lock(m);
if (data.empty()) throw empty_stack();
value = data.top();
data.pop();
}
bool empty() const
{
std::lock_guard<std::mutex> lock(m);
return data.empty();
}
};
int main()
{
//test class
return 0;
}
You simply need to:
1) Create a stack in your main function.
2) Start a thread that fills the stack (pass a pointer to the stack object as a parameter to the thread and have the thread run a for loop that calls push repeatedly).
3) While this thread runs, empty the stack from another loop in your main program.
You can also declare the stack as a global variable if you simply want to do a quick test and don't know how to pass objects to the thread upon creation.
If you need clean exit, add an atomic (edited, I first recommended volatile) bool passed to the thread to tell it you're done and ask it to stop its loop. Then use join to wait for the thread to exit.
A minimal testdriver for your structure could look like this:
#include <cstddef>
#include <future>
#include <iostream>
struct Msg {
size_t a;size_t b;size_t c;size_t d;
};
bool isCorrupted(const Msg& m) {
return !(m.a == m.b && m.b == m.c && m.c == m.d);
}
int main()
{
threadsafe_stack<Msg> stack;
auto prod = std::async(std::launch::async, [&]() {
for (size_t i = 0; i < 1000000; ++i){
Msg m = { i, i, i, i };
stack.push(m);
//std::this_thread::sleep_for(std::chrono::microseconds(1));
if (i % 1000 == 0) {
std::cout << "stack.push called " << i << " times " << std::endl;
}
}
});
auto cons = std::async(std::launch::async, [&]() {
for (size_t i = 0; i < 1000000; ++i){
try {
Msg m;
stack.pop(m);
if (isCorrupted(m)) {
std::cout << i << " ERROR: MESSAGE WAS CORRUPTED:" << m.a << "-" << m.b << "-" << m.c << "-" << m.d << std::endl;
}
if (i % 1000 == 0) {
std::cout << "stack.pop called " << i << " times " << std::endl;
}
}
catch (const empty_stack&) { // catch by const reference to avoid copying the exception
std::cout << i << " Stack was empty!" << std::endl;
}
}
});
prod.wait();
cons.wait();
return 0;
}
Note that this doesn't test all the different functions, nor all possible race conditions, so you'd have to extend it.
Two recommendations regarding your class design:
1) I wouldn't throw an exception when the stack is empty, as this is a very common case in an asynchronous scenario. Rather make the consumer thread wait (see condition variables for this) or return a false or nullptr respectively.
2) Use std::unique_ptr instead of std::shared_ptr<T> in your pop() function as it is more efficient and you don't share anything here anyway.
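Both recommendations can be sketched together as follows. This is an illustrative variant, not the book's listing: `waiting_stack`, `try_pop`, `wait_and_pop` and `demo_waiting_stack` are names assumed for the example. `try_pop` reports an empty stack through its return value instead of throwing, and `wait_and_pop` blocks on a condition variable and hands back a `std::unique_ptr`.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <stack>
#include <thread>
#include <utility>

template <typename T>
class waiting_stack {   // hypothetical name
    std::stack<T> data;
    std::mutex m;
    std::condition_variable not_empty;
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(m);
            data.push(std::move(value));
        }
        not_empty.notify_one();  // wake one waiting consumer
    }
    // Non-blocking: returns false when the stack is empty.
    bool try_pop(T& value) {
        std::lock_guard<std::mutex> lock(m);
        if (data.empty()) return false;
        value = std::move(data.top());
        data.pop();
        return true;
    }
    // Blocking: sleeps until an element is available.
    std::unique_ptr<T> wait_and_pop() {
        std::unique_lock<std::mutex> lock(m);
        not_empty.wait(lock, [this] { return !data.empty(); });
        std::unique_ptr<T> result(new T(std::move(data.top())));
        data.pop();
        return result;
    }
};

// One producer pushes 1..n while the caller pops n values and sums them.
inline long demo_waiting_stack(int n) {
    waiting_stack<int> s;
    std::thread producer([&s, n] {
        for (int i = 1; i <= n; ++i) s.push(i);
    });
    long sum = 0;
    for (int i = 0; i < n; ++i) sum += *s.wait_and_pop();
    producer.join();
    return sum;
}
```

Because the consumer pops exactly as many values as the producer pushes, the sum is n(n+1)/2 regardless of how the two threads interleave. `new` is used instead of `std::make_unique` so the sketch stays within C++11.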

Boost shared memory and synchronized queue issue/crash in consumer process

I'm trying to consume a synchronized queue from a child process in C++. I'm using this synchronized queue: http://www.internetmosquito.com/2011/04/making-thread-safe-queue-in-c-i.html
I modified the queue to be serializable with Boost and replaced its boost::mutex io_mutex_ with an interprocess mutex (thanks @sehe), boost::interprocess::interprocess_mutex io_mutex_.
For locking, I changed every line that has boost::mutex::scoped_lock lock(io_mutex_); to scoped_lock<interprocess_mutex> lock(io_mutex_);
template<class T>
class SynchronizedQueue
{
friend class boost::serialization::access;
template<class Archive>
void serialize(Archive & ar, const unsigned int version)
{
ar & sQueue;
ar & io_mutex_;
ar & waitCondition;
}
... // queue implementation (see http://www.internetmosquito.com/2011/04/making-thread-safe-queue-in-c-i.html)
}
In my Test app, I'm creating the synchronized queue and storing in it 100 instances of this class:
class gps_position
{
friend class boost::serialization::access;
template<class Archive>
void serialize(Archive & ar, const unsigned int version)
{
ar & degrees;
ar & minutes;
ar & seconds;
}
public:
int degrees;
int minutes;
float seconds;
gps_position() {};
gps_position(int d, int m, float s) :
degrees(d), minutes(m), seconds(s)
{}
};
Common definitions between Consumer and producer:
char *SHARED_MEMORY_NAME = "MySharedMemory";
char *SHARED_QUEUE_NAME = "MyQueue";
typedef SynchronizedQueue<gps_position> MySynchronisedQueue;
Producer process code:
// Remove shared memory if it was created before
shared_memory_object::remove(SHARED_MEMORY_NAME);
// Create a new segment with given name and size
managed_shared_memory mysegment(create_only,SHARED_MEMORY_NAME, 65536);
MySynchronisedQueue *myQueue = mysegment.construct<MySynchronisedQueue>(SHARED_QUEUE_NAME)();
//Insert data in the queue
for(int i = 0; i < 100; ++i) {
gps_position position(i, 2, 3);
myQueue->push(position);
}
// Start 1 process (for testing for now)
STARTUPINFO info1={sizeof(info1)};
PROCESS_INFORMATION processInfo1;
ZeroMemory(&info1, sizeof(info1));
info1.cb = sizeof info1 ; //Only compulsory field
ZeroMemory(&processInfo1, sizeof(processInfo1));
// Launch child process
LPTSTR szCmdline = _tcsdup(TEXT("ClientTest.exe"));
CreateProcess(NULL, szCmdline, NULL, NULL, TRUE, 0, NULL, NULL, &info1, &processInfo1);
// Wait a little bit ( 5 seconds) for the started client process to load
WaitForSingleObject(processInfo1.hProcess, 5000);
/* THIS TESTING CODE WORK HERE AT PARENT PROCESS BUT NOT IN CLIENT PROCESS
// Open the managed segment memory
managed_shared_memory openedSegment(open_only, SHARED_MEMORY_NAME);
//Find the synchronized queue using it's name
MySynchronisedQueue *openedQueue = openedSegment.find<MySynchronisedQueue>(SHARED_QUEUE_NAME).first;
gps_position position;
while (true) {
if (myQueue->pop(position)) {
std::cout << "Degrees= " << position.degrees << " Minutes= " << position.minutes << " Seconds= " << position.seconds;
std::cout << "\n";
}
else
break;
}*/
// Wait until the queue is empty: has been processed by client(s)
while(myQueue->sizeOfQueue() > 0) continue;
// Close process and thread handles.
CloseHandle( processInfo1.hThread );
My consumer code is as follows:
//Open the managed segment memory
managed_shared_memory segment(open_only, SHARED_MEMORY_NAME);
//Find the vector using it's name
MySynchronisedQueue *myQueue = segment.find<MySynchronisedQueue>(SHARED_QUEUE_NAME).first;
gps_position position;
// Pop each position until the queue become empty and output its values
while (true)
{
if (myQueue->pop(position)) { // CRASH HERE
std::cout << "Degrees= " << position.degrees << " Minutes= " << position.minutes << " Seconds= " << position.seconds;
std::cout << "\n";
}
else
break;
}
When I run the parent process (producer), which creates the queue and launches the child (consumer) process, the child crashes when trying to pop from the queue.
What am I doing wrong here? Any ideas? Thanks for any insight. This is my first app using Boost and shared memory.
My goal is to be able to consume this queue from multiple processes. In the example above I create only one child process, to make sure it works before creating more. The idea is that the queue is filled with items in advance and multiple processes then pop items from it without clashing with each other.
To the updated code:
you should be using interprocess_mutex if you're gonna share the queue; This implies a host of dependent changes.
your queue should be using a shared-memory allocator if you're gonna share the queue
the conditions should be raised under the mutex for reliable behaviour on all platforms
you failed to lock inside toString(). Even though you copy the collection, that's not nearly enough because the container may get modified during that copy.
The queue design doesn't make much sense. What is the use of a "thread safe" function that returns empty()? The queue could be no longer empty, or have just become empty, before you process the return value. These are called race conditions and they lead to really hard to track bugs.
What has Boost Serialization got to do with anything? It seems just there to muddle the picture, because it's not required and not being used.
Likewise for Boost Any. Why is any used in toString()? Due to the design of the queue, the typeid is always gps_position anyway.
Likewise for boost::lexical_cast<> (why are you doing string concatenation if you already have the stringstream anyways?)
Why are empty(), toString(), sizeOfQueue() not const?
I highly recommend to use boost::interprocess::message_queue. This seems to be what you actually wanted to use.
Here's a modified version that puts the container in shared memory and it works:
#include <boost/interprocess/allocators/allocator.hpp>
#include <boost/interprocess/containers/deque.hpp>
#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/sync/interprocess_condition.hpp>
#include <boost/interprocess/sync/interprocess_mutex.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>
#include <boost/thread/lock_guard.hpp>
#include <sstream>
#include <string>
#include <typeinfo>
namespace bip = boost::interprocess;
template <class T> class SynchronizedQueue {
public:
typedef bip::allocator<T, bip::managed_shared_memory::segment_manager> allocator_type;
private:
bip::deque<T, allocator_type> sQueue;
mutable bip::interprocess_mutex io_mutex_;
mutable bip::interprocess_condition waitCondition;
public:
SynchronizedQueue(allocator_type alloc) : sQueue(alloc) {}
void push(T element) {
boost::lock_guard<bip::interprocess_mutex> lock(io_mutex_);
sQueue.push_back(element);
waitCondition.notify_one();
}
bool empty() const {
boost::lock_guard<bip::interprocess_mutex> lock(io_mutex_);
return sQueue.empty();
}
bool pop(T &element) {
boost::lock_guard<bip::interprocess_mutex> lock(io_mutex_);
if (sQueue.empty()) {
return false;
}
element = sQueue.front();
sQueue.pop_front();
return true;
}
unsigned int sizeOfQueue() const {
// try to lock the mutex
boost::lock_guard<bip::interprocess_mutex> lock(io_mutex_);
return sQueue.size();
}
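void waitAndPop(T &element) {
// interprocess_condition::wait() needs a lock it can release and reacquire,
// so use bip::scoped_lock (<boost/interprocess/sync/scoped_lock.hpp>) instead of lock_guard
bip::scoped_lock<bip::interprocess_mutex> lock(io_mutex_);
while (sQueue.empty()) {
waitCondition.wait(lock);
}
element = sQueue.front();
sQueue.pop_front(); // deque has pop_front(), not pop()
}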
std::string toString() const {
bip::deque<T> copy;
// make a copy of the class queue, to reduce time locked
{
boost::lock_guard<bip::interprocess_mutex> lock(io_mutex_);
copy.insert(copy.end(), sQueue.begin(), sQueue.end());
}
if (copy.empty()) {
return "Queue is empty";
} else {
std::stringstream os;
int counter = 0;
os << "Elements in the Synchronized queue are as follows:" << std::endl;
os << "**************************************************" << std::endl;
while (!copy.empty()) {
T object = copy.front();
copy.pop_front();
// increment the counter so each element reports its own position
os << "Element at position " << counter++ << " is: [" << typeid(object).name() << "]\n";
}
return os.str();
}
}
};
struct gps_position {
int degrees;
int minutes;
float seconds;
gps_position(int d=0, int m=0, float s=0) : degrees(d), minutes(m), seconds(s) {}
};
static char const *SHARED_MEMORY_NAME = "MySharedMemory";
static char const *SHARED_QUEUE_NAME = "MyQueue";
typedef SynchronizedQueue<gps_position> MySynchronisedQueue;
#include <boost/interprocess/shared_memory_object.hpp>
#include <iostream>
void consumer()
{
bip::managed_shared_memory openedSegment(bip::open_only, SHARED_MEMORY_NAME);
MySynchronisedQueue *openedQueue = openedSegment.find<MySynchronisedQueue>(SHARED_QUEUE_NAME).first;
gps_position position;
while (openedQueue->pop(position)) {
std::cout << "Degrees= " << position.degrees << " Minutes= " << position.minutes << " Seconds= " << position.seconds;
std::cout << "\n";
}
}
void producer() {
bip::shared_memory_object::remove(SHARED_MEMORY_NAME);
bip::managed_shared_memory mysegment(bip::create_only,SHARED_MEMORY_NAME, 65536);
MySynchronisedQueue::allocator_type alloc(mysegment.get_segment_manager());
MySynchronisedQueue *myQueue = mysegment.construct<MySynchronisedQueue>(SHARED_QUEUE_NAME)(alloc);
for(int i = 0; i < 100; ++i)
myQueue->push(gps_position(i, 2, 3));
// Wait until the queue is empty: has been processed by client(s)
while(myQueue->sizeOfQueue() > 0)
continue;
}
int main() {
producer();
// or enable the consumer code for client:
// consumer();
}