Why does my multithreaded job queue crash? - c++

I'm trying to run a multi-threaded job queue in c++. As an example I have tested the following program:
#include <thread>
#include <mutex>
#include <list>
#include <vector>
class Job
{
public:
void Run(void)
{
}
};
class Queue
{
private:
std::recursive_mutex mtxJobs;
std::list<Job *> mJobs;
public:
Job *Take(void)
{
std::scoped_lock(mtxJobs);
if (mJobs.size() > 0)
{
Job *pJob(mJobs.front());
mJobs.pop_front();
return pJob;
}
else
return NULL;
}
void Add(Job *pJob)
{
std::scoped_lock(mtxJobs);
mJobs.push_back(pJob);
}
size_t Size(void)
{
std::scoped_lock(mtxJobs);
return mJobs.size();
}
};
void Work(Queue &q)
{
Job *pJob;
while ((pJob = q.Take()) != NULL)
{
pJob->Run();
delete pJob;
}
}
int main()
{
size_t i;
Queue q;
for (i = 0; i < 1000; i++)
q.Add(new Job);
std::vector<std::thread> threads(4);
for (i = 0; i < 4; i++)
threads[i] = std::thread(Work, std::ref(q));
for (i = 0; i < 4; i++)
threads[i].join();
return 0;
}
When I run it like this:
g++ -std=c++17 -lpthread test.cpp -o test && ./test
it crashes with a SEGFAULT. Does anyone have any idea why?
GDB indicates that the crash always occurs when the list 'mJobs' is accessed. However, the locks should prevent concurrent modification right?
Can anyone help me?

You are accessing your queue without synchronization:
std::scoped_lock(mtxJobs);
This declares a local variable named mtxJobs, of type std::scoped_lock constructed with no arguments, which hides your mtxJobs mutex member. According to the reference, a scoped_lock created without arguments does nothing.
You need to write:
std::scoped_lock lock(mtxJobs);
Now your mutex is locked in the constructor of the scoped_lock object and released when lock goes out of scope.
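As a minimal sketch of the fix, assuming C++17, here is a reduced queue where every method names its lock object. The int payload, the bool-returning Take, and the drain helper are simplifications made up for this illustration, not the asker's code:

```cpp
#include <cstddef>
#include <list>
#include <mutex>
#include <thread>
#include <vector>

class Queue {
    std::mutex mtxJobs;
    std::list<int> mJobs;  // int stands in for Job* (assumption, keeps the sketch short)
public:
    void Add(int job) {
        std::scoped_lock lock(mtxJobs);  // named object: the mutex stays locked until the end of scope
        mJobs.push_back(job);
    }
    bool Take(int& job) {
        std::scoped_lock lock(mtxJobs);
        if (mJobs.empty())
            return false;
        job = mJobs.front();
        mJobs.pop_front();
        return true;
    }
};

// Hypothetical helper: fills the queue, drains it from several threads,
// and returns how many jobs were taken in total.
std::size_t drain(int jobCount, int threadCount) {
    Queue q;
    for (int i = 0; i < jobCount; ++i)
        q.Add(i);
    std::vector<std::size_t> taken(threadCount, 0);  // one slot per thread, no sharing
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t) {
        threads.emplace_back([&q, &taken, t] {
            int job;
            while (q.Take(job))
                ++taken[t];
        });
    }
    for (auto& th : threads)
        th.join();
    std::size_t total = 0;
    for (std::size_t n : taken)
        total += n;
    return total;
}
```

Calling drain(1000, 4) mirrors the original test: every job is taken exactly once because each list access now happens under the lock.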

Related

Can I avoid using mutex lock by implementing function as a class object method

Background: I have a list of files in a location and a moveFile() function that will be used to move these files. My goal is to move all those files in parallel, so I implemented multiple threads.
To avoid conflicts, I initially considered taking a mutex lock before moveFile(). However, that prevents the threads from running in parallel.
Here's how it's been implemented:
std::mutex mtx;
enum class status
{ SUCCESS, FAILED };
status moveFile()
{
// function implementation
return status::SUCCESS; // placeholder result
}
void fileOperator()
{
// This prevents parallel operation
mtx.lock();
moveFile();
mtx.unlock();
}
int main ()
{
const int threadSize = 3; // generic size; const so the array bound is a constant expression
std::thread fileProc[threadSize];
// starting new threads
for (int index = 0; index < threadSize; index++)
{
fileProc[index] = std::thread (&fileOperator);
}
//joining all the threads
for (int i=0; i <threadSize; i++)
{
fileProc[i].join();
}
return 0;
}
Suggestion: I was wondering whether removing the mutex lock, implementing moveFile() in a class, and calling it as an object method would be a better way to implement the parallel operation.
I'm not really sure what the problem here is; most probably it's located in the moveFile function, but something like this should work:
#include <future>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>
std::mutex mtx;
enum class status { SUCCESS, FAILED };
status moveFile() {
std::cout << "Moving file" << std::endl;
return status::SUCCESS;
}
void fileOperator() {
std::lock_guard<std::mutex> lock(mtx);
moveFile();
}
int main(int argc, char** argv) {
std::vector<std::thread> threads;
int threadSize = 3;
for (int index = 0; index < threadSize; ++index) {
threads.push_back(std::thread(&fileOperator));
}
for (auto& th : threads) {
th.join();
}
return 0;
}
Could you also please post the contents of the moveFile to be able to help you with that? Thanks.
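If each thread moves a different file and moveFile() touches no shared state, the mutex may not be needed at all, whether or not the function lives in a class. The sketch below illustrates that idea under those assumptions. The moveFile(path) signature, its placeholder body, and the moveAll helper are hypothetical, since the real moveFile was not posted:

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

enum class status { SUCCESS, FAILED };

// Hypothetical stand-in for the real moveFile(): it only inspects its own
// argument, so concurrent calls on different paths share no state.
status moveFile(const std::string& path) {
    return path.empty() ? status::FAILED : status::SUCCESS;
}

// Starts one thread per file. Each thread writes only to its own slot of
// `results`, so no mutex is required.
std::vector<status> moveAll(const std::vector<std::string>& files) {
    std::vector<status> results(files.size(), status::FAILED);
    std::vector<std::thread> threads;
    threads.reserve(files.size());
    for (std::size_t i = 0; i < files.size(); ++i) {
        threads.emplace_back([&files, &results, i] {
            results[i] = moveFile(files[i]);
        });
    }
    for (auto& th : threads)
        th.join();
    return results;
}
```

If the real moveFile() does share something (a log, a counter, a common destination index), only that shared part needs to be guarded, not the whole call.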

Thread pool on a queue in C++

I've been trying to solve a problem concurrently, which fits the thread pool pattern very nicely. Here I will try to provide a minimal representative example:
Say we have a pseudo-program like this:
Q : collection<int>
while (!Q.empty()) {
for each q in Q {
// perform some computation
}
// assign a new value to Q
Q = something_completely_new();
}
I'm trying to implement that in a parallel way, with n-1 workers and one main thread. The workers will perform the computation in the inner loop by grabbing elements from Q.
I tried to solve this using two condition variables: work, on which the master thread notifies the workers that Q has been assigned to, and work_done, on which the workers notify the master that the entire computation might be done.
Here's my C++ code:
#include <iostream>
#include <mutex>
#include <condition_variable>
#include <queue>
#include <thread>
#include <vector>
using namespace std;
std::queue<int> Q;
std::mutex mut;
std::condition_variable work;
std::condition_variable work_done;
void run_thread() {
for (;;) {
std::unique_lock<std::mutex> lock(mut);
work.wait(lock, [&] { return Q.size() > 0; });
// there is work to be done - pretend we're working on something
int x = Q.front(); Q.pop();
std::cout << "Working on " << x << std::endl;
work_done.notify_one();
}
}
int main() {
// your code goes here
std::vector<std::thread *> workers(3);
for (size_t i = 0; i < 3; i++) {
workers[i] = new std::thread{
[&] { run_thread(); }
};
}
for (int i = 4; i > 0; --i) {
std::unique_lock<std::mutex> lock(mut);
Q = std::queue<int>();
for (int k = 0; k < i; k++) {
Q.push(k);
}
work.notify_all();
work_done.wait(lock, [&] { return Q.size() == 0; });
}
for (size_t i = 0; i < 3; i++) {
delete workers[i];
}
return 0;
}
Unfortunately, after compiling it on OS X with g++ -std=c++11 -Wall -o main main.cpp I get the following output:
Working on 0
Working on 1
Working on 2
Working on 3
Working on 0
Working on 1
Working on 2
Working on 0
Working on 1
Working on 0
libc++abi.dylib: terminating
Abort trap: 6
After some googling, it looks like a segmentation fault. It probably has to do with me misusing condition variables. I would appreciate some insight, both architectural (how to approach this type of problem) and specific (what exactly I'm doing wrong here).
I appreciate the help
Your application was killed by std::terminate.
The body of your thread function is an infinite loop, so when these lines are executed
for (size_t i = 0; i < 3; i++) {
delete workers[i];
}
you try to delete threads that are still running (each thread is in a joinable state). When the destructor of a joinable thread is called, the following happens (from http://www.cplusplus.com/reference/thread/thread/~thread/):
If the thread is joinable when destroyed, terminate() is called.
So, if you do not want terminate to be called, call the detach() method after creating each thread:
for (size_t i = 0; i < 3; i++) {
workers[i] = new std::thread{
[&] { run_thread(); }
};
workers[i]->detach(); // <---
}
Just because the queue is empty doesn't mean the work is done. Before deleting the threads, set a flag, wake the workers, and join them:
finished = true;
work.notify_all();
for (size_t i = 0; i < 3; i++) {
workers[i]->join(); // wait for threads to finish
delete workers[i];
}
The worker loop, in turn, needs some way to terminate:
for (;!finished;) {
std::unique_lock<std::mutex> lock(mut);
work.wait(lock, [&] { return Q.size() > 0 || finished; });
if (finished)
return;
// ... process the front of Q as before ...
}
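Putting the pieces together, a complete version might look like the sketch below. It is one possible arrangement, not the only one: the in_flight and processed counters and the run_rounds wrapper are additions made for this illustration, and the threads are held by value in a std::vector<std::thread> instead of through raw pointers.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

std::queue<int> Q;
std::mutex mut;
std::condition_variable work;
std::condition_variable work_done;
bool finished = false;  // guarded by mut
int in_flight = 0;      // items popped but not yet completed, guarded by mut
int processed = 0;      // total items completed, guarded by mut

void run_thread() {
    std::unique_lock<std::mutex> lock(mut);
    for (;;) {
        work.wait(lock, [] { return !Q.empty() || finished; });
        if (Q.empty())
            return;  // finished was set and nothing is left to do
        int x = Q.front();
        Q.pop();
        ++in_flight;
        lock.unlock();
        (void)x;  // pretend we're working on x, outside the lock
        lock.lock();
        --in_flight;
        ++processed;
        if (Q.empty() && in_flight == 0)
            work_done.notify_one();  // this round is really complete
    }
}

// Runs `rounds` refills of Q (sizes rounds, rounds-1, ..., 1) with
// `n_workers` threads and returns the number of processed items.
int run_rounds(int rounds, int n_workers) {
    {
        std::lock_guard<std::mutex> lock(mut);
        finished = false;
        processed = 0;
    }
    std::vector<std::thread> workers;
    for (int i = 0; i < n_workers; ++i)
        workers.emplace_back(run_thread);
    for (int i = rounds; i > 0; --i) {
        std::unique_lock<std::mutex> lock(mut);
        for (int k = 0; k < i; ++k)
            Q.push(k);
        work.notify_all();
        work_done.wait(lock, [] { return Q.empty() && in_flight == 0; });
    }
    {
        std::lock_guard<std::mutex> lock(mut);
        finished = true;
    }
    work.notify_all();  // wake the workers so they can see the flag
    for (auto& t : workers)
        t.join();       // every thread is joined before its destructor runs
    return processed;
}
```

With the values from the question, run_rounds(4, 3) processes 4 + 3 + 2 + 1 items and returns normally, because no joinable thread is ever destroyed.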

Static vs dynamic memory allocation for thread

I've written a sample program to show my problem. I don't understand why firstVersion() works properly while secondVersion() gives me the error "terminate called without an active exception" followed by "Aborted". Thanks for your answers!
Here's the code :)
#include <thread>
#include <iostream>
#include <chrono>
using namespace std;
const int threadCount = 100;
int N = 1;
void f() {
N++;
}
void firstVersion() {
thread * t[threadCount];
for(int i = 0; i < threadCount; i++) {
thread * ti = new thread{f};
t[i] = ti;
}
for(int i = 0; i < threadCount; i++) {
t[i]->join();
delete t[i];
}
}
void secondVersion() {
thread * t[threadCount];
for(int i = 0; i < threadCount; i++) {
thread ti{f};
t[i] = &ti;
}
for(int i = 0; i < threadCount; i++)
t[i]->join();
}
int main() {
//firstVersion();
secondVersion();
return 0;
}
The second version fails because the lifetime of thread ends at the end of your for loop before you call join().
void secondVersion() {
thread * t[threadCount];
for(int i = 0; i < threadCount; i++) {
thread ti{f}; // local object of thread
t[i] = &ti;
} // the object dies without a join()
Your example can be simplified as:
void SomeFunc() {}
int main()
{
std::thread* tp;
//{
std::thread t{SomeFunc};
tp= &t;
//} // if the closing brace is present, object t calls destructor here!
tp->join();
}
If you look into your standard library implementation, you will find code like the following:
~thread()
{
if (joinable())
std::terminate();
}
That simply results in a call to std::terminate.
So the example code has two mistakes:
1) It stores a pointer to an object that dies before the pointer is used; this is called a dangling pointer.
2) Because the thread object is destroyed before join() is called, its destructor calls std::terminate.
A std::thread needs to be joined or detached before its destructor runs.
Since you called neither detach() nor join(), the std::thread destructor called std::terminate (which by default calls std::abort).
in the first example, you first joined the thread before actually calling its destructor (via delete):
t[i]->join();
delete t[i];
Luckily for you, the crash prevented something much worse: dangling pointers. At the end of each iteration of
for(int i = 0; i < threadCount; i++) {
thread ti{f};
t[i] = &ti;
}
ti is dead, and you keep a pointer to an object that no longer exists. This violates a basic rule of C++: never return or keep a pointer or reference to a local variable outside its scope.
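One way to avoid both the manual new/delete and the dangling pointers is to let a std::vector<std::thread> own the thread objects, so they stay alive until they are joined. The sketch below assumes that approach. The runThreads name and the std::atomic counter are choices made for this illustration; the atomic replaces the global N, whose unsynchronized N++ would otherwise be a data race.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Starts threadCount threads that each increment a counter once, joins
// them all, and returns the final counter value.
int runThreads(int threadCount) {
    std::atomic<int> counter{0};
    std::vector<std::thread> threads;  // the vector owns the thread objects
    threads.reserve(threadCount);
    for (int i = 0; i < threadCount; ++i) {
        // The thread object is constructed inside the vector, so it
        // outlives this loop iteration.
        threads.emplace_back([&counter] { ++counter; });
    }
    for (auto& t : threads)
        t.join();  // every thread is joined before the vector is destroyed
    return counter.load();
}
```

Here runThreads(100) corresponds to the threadCount used in the question.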

How to apply a concurrent solution to a Producer-Consumer like situation

I have an XML file with a sequence of nodes. Each node represents an element that I need to parse and add to a sorted list (the order must be the same as that of the nodes in the file).
At the moment I am using a sequential solution:
struct Graphic
{
bool parse()
{
// parsing...
return parse_outcome;
}
};
vector<unique_ptr<Graphic>> graphics;
void producer()
{
for (size_t i = 0; i < N_GRAPHICS; i++)
{
auto g = new Graphic();
if (g->parse())
graphics.emplace_back(g);
else
delete g;
}
}
So the graphic is added to my data structure only if it can be parsed properly. (It is actually an instance of a class derived from Graphic, such as Line or Rectangle, which is why I use new.)
Since I only care about the order in which these graphics are added to my list, I thought of calling the parse method asynchronously: the producer reads each node from the file and adds the graphic to the data structure, while the consumers parse each graphic as soon as it is ready to be parsed.
Now I have several consumer threads (created in the main) and my code looks like the following:
queue<pair<Graphic*, size_t>> q;
mutex m;
atomic<size_t> n_elements;
void producer()
{
for (size_t i = 0; i < N_GRAPHICS; i++)
{
auto g = new Graphic();
graphics.emplace_back(g);
q.emplace(make_pair(g, i));
}
n_elements = graphics.size();
}
void consumer()
{
pair<Graphic*, size_t> item;
while (true)
{
{
std::unique_lock<std::mutex> lk(m);
if (n_elements == 0)
return;
n_elements--;
item = q.front();
q.pop();
}
if (!item.first->parse())
{
// here I should remove the item from the vector
assert(graphics[item.second].get() == item.first);
delete item.first;
graphics[item.second] = nullptr;
}
}
}
I run the producer first of all in my main, so that when the first consumer starts the queue is already completely full.
int main()
{
producer();
vector<thread> threads;
for (auto i = 0; i < N_THREADS; i++)
threads.emplace_back(consumer);
for (auto& t : threads)
t.join();
return 0;
}
The concurrent version seems to be at least twice as fast as the original one.
The full code has been uploaded here.
Now I am wondering:
Are there any (synchronization) errors in my code?
Is there a way to achieve the same result faster (or better)?
Also, I noticed that on my computer I get the best result (in terms of elapsed time) if I set the number of threads to 8. More (or fewer) threads give me worse results. Why?
Are there any (synchronization) errors in my code?
There are no synchronization errors, but the memory management could be better, since your code leaks if parse() throws an exception.
Is there a way to achieve the same result faster (or better)?
Probably. You could use a simple thread pool implementation and a lambda that calls parse() for you.
The code below illustrates this approach. I use the thread pool implementation here
#include <iostream>
#include <stdexcept>
#include <vector>
#include <memory>
#include <chrono>
#include <utility>
#include <cassert>
#include <future>
#include <ThreadPool.h>
using namespace std;
using namespace std::chrono;
#define N_GRAPHICS (1000*1000*1)
#define N_THREADS 8
struct Graphic;
using GPtr = std::unique_ptr<Graphic>;
static vector<GPtr> graphics;
struct Graphic
{
Graphic()
: status(false)
{
}
bool parse()
{
// waste time
try
{
throw runtime_error("");
}
catch (const runtime_error&)
{
}
status = true;
//return false;
return true;
}
bool status;
};
int main()
{
auto start = system_clock::now();
auto producer_unit = []()-> GPtr {
std::unique_ptr<Graphic> g(new Graphic);
if(!g->parse()){
g.reset(); // if g don't parse, return nullptr
}
return g;
};
using ResultPool = std::vector<std::future<GPtr>>;
ResultPool results;
// ThreadPool pool(thread::hardware_concurrency());
ThreadPool pool(N_THREADS);
for(int i = 0; i <N_GRAPHICS; ++i){
// Running async task
results.emplace_back(pool.enqueue(producer_unit));
}
for(auto &t : results){
auto value = t.get();
if(value){
graphics.emplace_back(std::move(value));
}
}
auto duration = duration_cast<milliseconds>(system_clock::now() - start);
cout << "Elapsed: " << duration.count() << endl;
for (size_t i = 0; i < graphics.size(); i++)
{
if (!graphics[i]->status)
{
cerr << "Assertion failed! (" << i << ")" << endl;
break;
}
}
cin.get();
return 0;
}
It is a bit faster (1s) on my machine, more readable, and removes the need for shared data (synchronization is evil; avoid it or hide it in a reliable and efficient way).
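For readers without the third-party ThreadPool header, the same ordering idea can be sketched with std::async swapped in for the pool. This is an illustration under assumptions, not a drop-in replacement: std::launch::async typically starts one thread per task, so it only suits a modest number of graphics. The id field, the parse rule, and the parseAll helper are hypothetical.

```cpp
#include <cstddef>
#include <future>
#include <memory>
#include <utility>
#include <vector>

struct Graphic {
    explicit Graphic(std::size_t id) : id(id) {}
    // Hypothetical parse rule for the sketch: every third graphic fails.
    bool parse() { return id % 3 != 0; }
    std::size_t id;
};

using GPtr = std::unique_ptr<Graphic>;

// Parses n graphics concurrently and returns the successfully parsed ones
// in their original order.
std::vector<GPtr> parseAll(std::size_t n) {
    std::vector<std::future<GPtr>> results;
    results.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        results.push_back(std::async(std::launch::async, [i] {
            auto g = std::make_unique<Graphic>(i);
            if (!g->parse())
                g.reset();  // parse failed: hand back nullptr
            return g;
        }));
    }
    std::vector<GPtr> graphics;
    for (auto& r : results) {  // futures are consumed in submission order
        if (auto g = r.get())
            graphics.push_back(std::move(g));
    }
    return graphics;
}
```

Because each Graphic is owned by a unique_ptr from the moment it is created, nothing leaks if parse() throws; the exception is rethrown from get() instead.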

How to iterate through boost thread specific pointers

I have a multi-threaded application. Each thread initializes a struct in its own thread-local storage, and elements are added to the vectors inside these structs. At the end of the program, I would like to iterate through these thread-local storages and add all the results together. How can I iterate through the thread-specific pointers so that I can combine the results from all the threads?
Thanks in advance.
boost::thread_specific_ptr<testStruct> tss;
size_t x = 10;
void callable(string str, int x) {
if(!tss.get()){
tss.reset(new testStruct);
(*tss).xInt.resize(x, 0);
}
// Assign some values to the vector elements after doing some calculations
}
Example:
#include <iostream>
#include <vector>
#include <boost/thread/mutex.hpp>
#include <boost/thread/tss.hpp>
#include <boost/thread.hpp>
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#define NR_THREAD 4
#define SAMPLE_SIZE 500
using namespace std;
static bool busy = false;
struct testStruct{
vector<int> intVector;
};
boost::asio::io_service ioService;
boost::thread_specific_ptr<testStruct> tsp;
boost::condition_variable cond;
boost::mutex mut;
void callable(int x) {
if(!tsp.get()){
tsp.reset(new testStruct);
}
(*tsp).intVector.push_back(x);
if (x + 1 == SAMPLE_SIZE){
busy = true;
cond.notify_all();
}
}
int main() {
boost::thread_group threads;
size_t (boost::asio::io_service::*run)() = &boost::asio::io_service::run;
boost::asio::io_service::work work(ioService);
for (short int i = 0; i < NR_THREAD; ++i) {
threads.create_thread(boost::bind(run, &ioService));
}
size_t iterations = 10;
for (int i = 0; i < iterations; i++) {
busy = false;
for (short int j = 0; j < SAMPLE_SIZE; ++j) {
ioService.post(boost::bind(callable, j));
}
// all threads need to finish the job for the next iteration
boost::unique_lock<boost::mutex> lock(mut);
while (!busy) {
cond.wait(lock);
}
cout << "Iteration: " << i << endl;
}
vector<int> sum(SAMPLE_SIZE, 0); // sum up all the values from thread local storages
work.~work();
threads.join_all();
return 0;
}
So, after I have given some thought to this issue, I have come up with the following solution:
void accumulateTLS(size_t idxThread){
if (idxThread == nr_threads) // Suspend all the threads till all of them are called and waiting here
{
busy = true;
}
boost::unique_lock<boost::mutex> lock(mut);
while (!busy)
{
cond.wait(lock);
}
// Accumulate the variables using thread specific pointer
cond.notify_one();
}
With boost io_service, the callable function can be changed after the threads are initialized. So, after all the calculations are done, I post jobs (as many as there are threads) to the io_service again, this time with accumulateTLS(idxThread) as the callable. The N jobs are sent to the N threads, and the accumulation is done inside the accumulateTLS method.
P.S. Instead of calling work.~work() explicitly, the work object should be held in a smart pointer and released with reset().
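As far as I know, boost::thread_specific_ptr offers no way to enumerate the values held by other threads, so an alternative is to give each thread its own slot in a container that the main thread owns. The sketch below shows that idea with standard C++ threads swapped in for the io_service setup; the sumPerThreadResults helper and its parameters are made up for this illustration.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

struct testStruct {
    std::vector<int> intVector;
};

// Each thread fills only its own slot, so no locking is needed while the
// threads run. After join(), the main thread can iterate over all slots.
long long sumPerThreadResults(std::size_t nThreads, int samplesPerThread) {
    std::vector<testStruct> slots(nThreads);  // one slot per thread, sized up front
    std::vector<std::thread> threads;
    threads.reserve(nThreads);
    for (std::size_t t = 0; t < nThreads; ++t) {
        threads.emplace_back([&slots, t, samplesPerThread] {
            for (int x = 0; x < samplesPerThread; ++x)
                slots[t].intVector.push_back(x);
        });
    }
    for (auto& th : threads)
        th.join();  // all writes are complete and visible after join()
    long long total = 0;
    for (const auto& s : slots)
        for (int v : s.intVector)
            total += v;
    return total;
}
```

The same layout should carry over to a thread pool as long as each worker knows its own index, which removes the need to synchronize the accumulation step inside the workers.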