Using the Boost thread pool executor's submit method - C++

I am trying to learn the Boost thread pool, but it does not behave as I expect. I hope to find help here.
Something strange happens in the following code when I:
create an executor ea,
use async to start it and let it finish the submitted functions,
submit more functions to this first executor ea without starting it,
create another executor ea3 of the same type (basic_thread_pool) with a different number of threads (1),
and start the second executor ea3 with async.
The tasks of the first executor ea are executed first, and only then those of the second executor ea3,
although I only pass ea3 as the argument to async.
Why does this happen, and how do I prevent the execution of ea?
Does the unnamed scope have an effect?
#define BOOST_THREAD_VERSION 4
#define BOOST_THREAD_PROVIDES_EXECUTORS
#define BOOST_THREAD_USES_LOG_THREAD_ID
#define BOOST_THREAD_QUEUE_DEPRECATE_OLD
#include <boost/lexical_cast.hpp> // for lexical_cast
#include <boost/thread.hpp>
#include <boost/thread/caller_context.hpp> // for BOOST_CONTEXTOF, caller_context_t, operator<<
#include <boost/thread/executors/basic_thread_pool.hpp> // for basic_thread_pool
#include <boost/thread/executors/executor.hpp> // for executor
#include <boost/thread/executors/executor_adaptor.hpp> // for executor_adaptor
#include <iostream> // for operator<<, cout, endl, ..
#include <chrono> // for operator "" ms, operator "" s
#include <thread> // for sleep_for
#include <string>
#include <vector>
static boost::mutex stdout_mutex;
template <typename A, typename B = std::string> static inline void trace(A const &a, B const& b = "") {
boost::lock_guard<boost::mutex> lock(stdout_mutex);
std::cout << a << b << std::endl;
}
static inline void trace(boost::caller_context_t const &ctx) { trace(boost::lexical_cast<std::string>(ctx)); }
namespace {
using namespace std::chrono_literals;
using std::this_thread::sleep_for;
void p1() { trace(BOOST_CONTEXTOF); sleep_for(20ms); }
void p2() { trace(BOOST_CONTEXTOF); sleep_for(1s); }
int f1() { trace(BOOST_CONTEXTOF); sleep_for(100ms); return 1; }
}
void submit_some(boost::executor &tp) {
for (int i = 0; i < 3; ++i) { tp.submit(p2); }
for (int i = 0; i < 3; ++i) { tp.submit(p1); }
}
int test_executor_adaptor() {
trace(BOOST_CONTEXTOF);
try {
boost::executor_adaptor<boost::basic_thread_pool> ea(4);
trace(BOOST_CONTEXTOF);
submit_some(ea);
{
boost::future<int> t1 = boost::async(ea, &f1);
trace(BOOST_CONTEXTOF);
trace(" t1= ", t1.get()); // problem unknown on running showing thread result before running code
std::vector<boost::future<int> > vt1;
for (int i = 0; i < 4; i++) {
vt1.push_back(boost::async(ea, &f1)); // here async starts all closures already submitted to ea
// then submit f1 to all threads in ea to work
// asynchronusly so end result will be 7 already submitted
// and 4 more f1 and return futures to only the last 4 f1
// which is submitted by async
}
for (auto &e : vt1) {
auto e_value = e.get();
trace("vt1 e_value = ", e_value);
}
}
submit_some(ea);
{
boost::executor_adaptor<boost::basic_thread_pool> ea3(1);
boost::future<int> t1 = boost::async(ea3, &f1);
std::vector<boost::future<int> > vt1;
for (int i = 0; i < 4; i++) {
vt1.push_back(
boost::async(ea3, &f1)); // here async starts all closures already submitted to ea
// then submit f1 to all threads in ea to work
// asynchronusly so end result will be 7 already submitted
// and 4 more f1 and return futures to only the last 4 f1
// which is submitted by async
}
for (auto &e : vt1) {
trace("vt1 e_value = ", e.get());
}
}
}
catch (std::exception &ex) { trace("ERROR= ", ex.what()); return 1; }
catch (...) { trace("UNKNOWN EXCEPTION"); return 2; }
trace(BOOST_CONTEXTOF);
return 0;
}
int main() {
trace(BOOST_CONTEXTOF);
return test_executor_adaptor();
}
I think that basic_thread_pool uses an event loop and that this event loop is a singleton,
so when I define another basic_thread_pool it just uses the same event loop. In turn, getting the futures runs all tasks on the event loop, including those of other thread pools.
This is a guess, but I can't find proof of it in the implementation code.
If this guess is right, how can I create a separate event loop for each basic_thread_pool?
I use MSVC 14.
Thanks.

Related

Let main thread wait async threads complete

I'm new to C++ and don't know how to make the main thread wait until all async threads are done. I referred to this, but it makes void consume() not parallel.
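#include <cstdio>   // printf
#include <future>
#include <iostream>
#include <mutex>    // std::mutex, std::lock_guard
#include <thread>   // std::this_thread::get_id
#include <vector>
#include <unistd.h> // sleep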
using namespace std;
class Myclass {
private:
std::vector<int> resources;
std::vector<int> res;
std::mutex resMutex;
std::vector<std::future<void>> m_futures;
public:
Myclass() {
for (int i = 0; i < 10; i++) resources.push_back(i); // add task
res.reserve(resources.size());
}
void consume() {
for (int i = 0; i < resources.size(); i++) {
m_futures.push_back(std::async(std::launch::async, &Myclass::work, this, resources[i]));
// m_futures.back().wait();
}
}
void work(int x) {
sleep(1); // Simulation time-consuming
std::lock_guard<std::mutex> lock(resMutex);
res.push_back(x);
// std::thread::id cannot be passed to printf's %d; stream it instead.
std::cout << x << " be added.---done by " << std::this_thread::get_id() << ".\n";
}
std::vector<int> &getRes() { return res;}
};
int main() {
Myclass obj;
obj.consume();
auto res = obj.getRes();
cout << "Done. res.size = " << res.size() << endl;
for (int i : res) cout << i << " ";
cout <<"main thread over\n";
}
The main thread finishes while res is still empty. I want obj.getRes() to be executed only after all results have been added to res.
Done. res.size = 0
main thread over
4 be added.---done by 6.
9 be added.---done by 11...
You had the right idea with the commented-out line, m_futures.back().wait(); you just have it in the wrong place.
As you note, launching a std::async and then waiting for its result right away forces the entire thing to execute in series and makes the async pointless.
Instead you want two functions: one, like your consume(), that launches all the async tasks, and another that loops over the futures and calls wait (or get, whichever suits your needs) on them. Then call the second one from main.
This lets them all run in parallel, while still making main wait for the final result.
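A minimal sketch of that split, assuming a simplified class (the names Worker, wait_all, and results are illustrative and not from the question): consume() only launches the tasks, and a separate wait_all() blocks until every future is ready.

```cpp
#include <future>
#include <mutex>
#include <vector>

// Illustrative stand-in for the asker's class; names are hypothetical.
class Worker {
public:
    // Launch one asynchronous task per input without waiting on any of them.
    void consume(const std::vector<int>& inputs) {
        for (int x : inputs) {
            futures_.push_back(
                std::async(std::launch::async, &Worker::work, this, x));
        }
    }

    // Block until every launched task has finished.
    void wait_all() {
        for (auto& f : futures_) f.get();  // get() also rethrows task exceptions
        futures_.clear();
    }

    // Only meaningful after wait_all(); the order of results is not deterministic.
    const std::vector<int>& results() const { return results_; }

private:
    void work(int x) {
        std::lock_guard<std::mutex> lock(mutex_);
        results_.push_back(x);
    }

    std::vector<int> results_;
    std::mutex mutex_;
    // Declared last so the futures are destroyed (and waited on) first.
    std::vector<std::future<void>> futures_;
};
```

In main this would be used as obj.consume(...); obj.wait_all(); and only then obj.results().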
In addition to @Frodyne's answer:
the tasks launched by consume() run in parallel, and the main thread waits until all of them have finished their work:
void set_wait(void)
{
for (int i = 0; i < resources.size(); i++) {
m_futures[i].wait();
}
}
Then call it at the end of consume():
void consume() {
for (int i = 0; i < resources.size(); i++) {
m_futures.push_back(std::async(std::launch::async, &Myclass::work, this, resources[i]));
// Calling wait() here makes no sense
}
set_wait(); // Waits for all threads do work
}
I created a new function for convenience.
You can call std::future::wait after you have added all the tasks to m_futures. For example:
void consume() {
for (int i = 0; i < resources.size(); i++) {
m_futures.push_back(std::async(std::launch::async, &Myclass::work, this, resources[i]));
//m_futures.back().wait();
}
for(auto& f: m_futures) f.wait();
}

Potential race condition for a simple multiple threading program

I'm writing a simple program that consists of three threads. Each thread is passed a Foo object, and no matter which thread calls which function, the output of the program should always be "firstsecondthird". I use semaphores, and I'm writing test code for my implementation. Sometimes my test case passes, but sometimes it fails:
input: [1,2,3] = firstsecond
Assertion failed: (false), function test, file /home/foo/printInOrder.cc, line 100.
Abort trap: 6
My program looks like below:
#include "cpputility.h"
#include <cassert> // assert
#include <cstdio>  // printf
#include <functional>
#include <iostream>
#include <semaphore.h>
#include <sstream>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>
using namespace std;
void printFirst()
{
cout << "first" << std::flush;
}
void printSecond()
{
cout << "second" << std::flush;
}
void printThird()
{
cout << "third" << std::flush;
}
class Foo
{
protected:
sem_t firstJobDone;
sem_t secondJobDone;
public:
Foo()
{
sem_init(&firstJobDone, 0, 0);
sem_init(&secondJobDone, 0, 0);
}
void first(function<void()> printFirst)
{
printFirst();
sem_post(&firstJobDone);
}
void second(function<void()> printSecond)
{
sem_wait(&firstJobDone);
printSecond();
sem_post(&secondJobDone);
}
void third(function<void()> printThird)
{
sem_wait(&secondJobDone);
printThird();
}
};
void test()
{
unordered_map<int, pair<void (Foo::*)(function<void()>), function<void()>>> m({
{1, {&Foo::first, printFirst}},
{2, {&Foo::second, printSecond}},
{3, {&Foo::third, printThird}},
});
struct testCase
{
vector<int> input;
string expected;
};
vector<testCase> test_cases = {
{{1, 2, 3}, "firstsecondthird"},
{{1, 3, 2}, "firstsecondthird"},
};
for (auto &&test_case : test_cases)
{
std::stringstream buffer;
std::streambuf *old = std::cout.rdbuf(buffer.rdbuf());
Foo foo;
vector<thread> threads;
for (int i = 0; i < 3; ++i)
{
threads.emplace_back(m[i+1].first, foo, m[i+1].second);
}
for (auto &&th : threads)
{
th.join();
}
auto got = buffer.str();
if (got != test_case.expected)
{
printf("input: %s = %s\n",
CPPUtility::oneDVectorStr<int>(test_case.input).c_str(),
got.c_str());
assert(false);
}
std::cout.rdbuf(old);
}
}
int main()
{
for(int i = 0; i < 10; ++i) {
// Test repeatedly to detect any potential race condition
test();
}
}
oneDVectorStr is a helper function I wrote in a file called cpputility.h to print a 1D vector. Here is its implementation, so that the code above can be compiled:
template <typename T>
std::string oneDVectorStr(const std::vector<T>& vec) {
std::string cand = "[";
for(int i = 0; i < vec.size(); ++i) {
cand += std::to_string(vec[i]);
i != vec.size() - 1 ? cand += "," : cand += "";
}
cand += "]";
return cand;
}
I've stared at this code for quite a while but couldn't locate any race condition. Any suggestion is welcome. Thanks in advance.
I took a further look at the code and realized that there is a subtle bug in my test code: I pass the Foo object (foo) by copy when creating each new thread. However, I want all threads to share the same Foo object. Thus, I wrap foo in std::ref:
threads.emplace_back(m[i + 1].first, ref(foo), m[i + 1].second);
In addition, I print the values of the firstJobDone and secondJobDone semaphores using sem_getvalue(), as follows:
Foo()
{
sem_init(&firstJobDone, 0, 0);
sem_init(&secondJobDone, 0, 0);
int value;
sem_getvalue(&firstJobDone, &value);
printf("The initial value of the firstJobDone is %d\n", value);
sem_getvalue(&secondJobDone, &value);
printf("The initial value of the secondJobDone is %d\n", value);
}
And, quite shockingly, I get:
The initial value of the firstJobDone is 32766
The initial value of the secondJobDone is 32766
input: [1,2,3] = third
Assertion failed: (false), function test, file /home/foo/printInOrder.cc, line 101.
Abort trap: 6
Neither semaphore is properly initialized to 0 with LLVM on the Mac that I'm using, although my implementation relies on both semaphores starting at 0. I don't understand why, but since sem_init is marked as deprecated by LLVM, I assume its behavior is not guaranteed to be correct. Thus, per the comments on my question, I changed my implementation to use a condition variable and a mutex, and everything works fine.
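For reference, a minimal sketch of what a condition-variable version of Foo could look like. This is not the asker's final code, which was not posted; the helpers wait_for_stage, advance, and run_reversed are hypothetical names. As far as I know, macOS does not implement unnamed POSIX semaphores, so checking the return value of sem_init would likely have shown the failure directly.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <string>
#include <thread>

class Foo {
public:
    void first(const std::function<void()>& printFirst) {
        printFirst();
        advance(1);
    }
    void second(const std::function<void()>& printSecond) {
        wait_for_stage(1);
        printSecond();
        advance(2);
    }
    void third(const std::function<void()>& printThird) {
        wait_for_stage(2);
        printThird();
    }

private:
    // Block until at least `stage` steps have completed.
    void wait_for_stage(int stage) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [&] { return done_ >= stage; });
    }
    // Record that `stage` has completed and wake all waiters.
    void advance(int stage) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = stage;
        }
        cv_.notify_all();
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    int done_ = 0;  // number of completed steps
};

// Starts the three calls in reverse order and returns what they wrote.
std::string run_reversed() {
    Foo foo;
    std::string out;
    // Passing &foo shares one Foo across the threads instead of copying it.
    std::thread t3(&Foo::third, &foo, [&] { out += "third"; });
    std::thread t2(&Foo::second, &foo, [&] { out += "second"; });
    std::thread t1(&Foo::first, &foo, [&] { out += "first"; });
    t1.join();
    t2.join();
    t3.join();
    return out;
}
```

Because Foo now holds a std::mutex it is no longer copyable, so the accidental pass-by-copy from the original test would fail to compile instead of misbehaving at run time.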

Processing an array of objects with multithreading - invalid use of void expression error

I need to run some number of threads to process an array of objects.
So I've written this piece of code :
unsigned int object_counter = 0;
while(object_counter != (obj_max - left))
{
thread genThread[thread_num];//create thread objects
///launch threads
int thread_index = 0;
for (; thread_index<thread_num; thread_index++)
{
genThread[thread_index] = thread(object[object_counter].gen_maps());//launch a thread
object_counter++;
if(object_counter == (obj_max - left))
{
break;
}
}
///finish threads
for (; thread_index>0; thread_index--)
{
genThread[thread_index].join();
}
}
Basically, there is an array of objects (number of objects = obj_max - left).
Each object has a member function called gen_maps() (returning void) that generates a terrain.
What I want to do is run the gen_maps() functions of all objects using multithreading.
The maximum number of threads is stored in the thread_num variable.
But when I try to compile this code, I get an error:
error: invalid use of void expression
genThread[thread_index] = thread(object[object_counter].gen_maps(), thread_index);//launch a thread
^
How can I fix this issue?
A more extendable way to manage an arbitrarily large number of jobs with a smaller number of threads is to use a thread pool.
Here's a naive implementation (for better efficiency there would be 2 condition variables to manage control and state reporting) which allows the initiator to add an arbitrary number of jobs or threads and wait for all jobs to be complete.
#include <thread>
#include <condition_variable>
#include <mutex>
#include <vector>
#include <functional>
#include <deque>
#include <cassert>
#include <ciso646>
#include <iostream>
struct work_pool
{
std::mutex control_mutex;
std::condition_variable control_cv;
std::deque<std::function<void()>> jobs;
bool terminating = false;
std::size_t running = 0;
std::vector<std::thread> threads;
work_pool(std::size_t n = std::thread::hardware_concurrency())
{
add_threads(n);
}
work_pool(const work_pool&) = delete;
work_pool& operator=(const work_pool&) = delete;
~work_pool()
{
wait();
shutdown();
}
void add_threads(std::size_t n)
{
while (n--)
{
threads.emplace_back([this]{
run_jobs();
});
}
}
void run_jobs()
{
while (1)
{
auto lock = std::unique_lock(control_mutex);
control_cv.wait(lock, [this] {
return terminating or not jobs.empty();
});
if (terminating) return;
++running;
auto job = std::move(jobs.front());
jobs.pop_front();
lock.unlock();
job();
lock.lock();
--running;
lock.unlock();
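// Workers and wait() share this condition variable, so wake everyone;
// notify_one could wake an idle worker instead of the thread in wait().
control_cv.notify_all();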
}
}
void shutdown()
{
auto lock = std::unique_lock(control_mutex);
terminating = true;
lock.unlock();
control_cv.notify_all();
for (auto&& t : threads) {
if (t.joinable()) {
t.join();
}
}
threads.clear();
}
void wait()
{
auto lock = std::unique_lock(control_mutex);
control_cv.wait(lock, [this] {
return jobs.empty() and not running;
});
}
template<class F>
void add_work(F&& f)
{
auto lock = std::unique_lock(control_mutex);
assert(not terminating);
jobs.emplace_back(std::forward<F>(f));
lock.unlock();
control_cv.notify_all();
}
};
// dummy function for exposition
void generate_map() {}
int main()
{
work_pool pool;
for(int i = 0 ; i < 100000 ; ++i)
pool.add_work(generate_map);
pool.wait();
// maps are now all generated
std::cout << "done" << std::endl;
}
With object[object_counter].gen_maps() you call the function gen_maps and pass its return value to the thread constructor as the thread function. gen_maps is declared to return void, which leads to the error you get.
You need to pass a pointer to the member function, and then pass the object it should be called on as an argument to the thread. Pass the object's address (or wrap it in std::ref) so that the thread works on the array element itself rather than on a copy:
thread(&SomeClass::gen_maps, &object[object_counter])
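To put that call in context, here is a small sketch of the batching loop from the question rewritten around it. Terrain and generate_all are hypothetical stand-ins for the asker's class and loop, and the batch is kept in a std::vector so that exactly the threads that were started get joined.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the asker's class.
struct Terrain {
    bool generated = false;
    void gen_maps() { generated = true; }
};

// Runs gen_maps() on every object, with at most max_threads running at a time.
void generate_all(std::vector<Terrain>& objects, std::size_t max_threads) {
    if (max_threads == 0) max_threads = 1;
    for (std::size_t begin = 0; begin < objects.size(); begin += max_threads) {
        const std::size_t end = std::min(begin + max_threads, objects.size());
        std::vector<std::thread> batch;
        for (std::size_t i = begin; i < end; ++i) {
            // Member function pointer plus a pointer to the object; passing
            // the object by value would run gen_maps() on a copy.
            batch.emplace_back(&Terrain::gen_maps, &objects[i]);
        }
        // Join exactly the threads that were launched in this batch.
        for (auto& t : batch) t.join();
    }
}
```

Each batch waits for its slowest thread before the next one starts, which is the behavior of the original loop; a thread pool avoids that idle time.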

How to apply a concurrent solution to a Producer-Consumer like situation

I have an XML file with a sequence of nodes. Each node represents an element that I need to parse and add to an ordered list (the order must match the order of the nodes in the file).
At the moment I am using a sequential solution:
struct Graphic
{
bool parse()
{
// parsing...
return parse_outcome;
}
};
vector<unique_ptr<Graphic>> graphics;
void producer()
{
for (size_t i = 0; i < N_GRAPHICS; i++)
{
auto g = new Graphic();
if (g->parse())
graphics.emplace_back(g);
else
delete g;
}
}
So a graphic is added to my data structure only if it can be parsed properly. (It is actually an instance of a class derived from Graphic, such as a Line or a Rectangle, which is why I use new.)
Since I only care about the order in which these graphics are added to my list, I thought of calling the parse method asynchronously: the producer reads each node from the file and adds the graphic to the data structure, while the consumers parse each graphic as soon as it is ready to be parsed.
Now I have several consumer threads (created in main), and my code looks like the following:
queue<pair<Graphic*, size_t>> q;
mutex m;
atomic<size_t> n_elements;
void producer()
{
for (size_t i = 0; i < N_GRAPHICS; i++)
{
auto g = new Graphic();
graphics.emplace_back(g);
q.emplace(make_pair(g, i));
}
n_elements = graphics.size();
}
void consumer()
{
pair<Graphic*, size_t> item;
while (true)
{
{
std::unique_lock<std::mutex> lk(m);
if (n_elements == 0)
return;
n_elements--;
item = q.front();
q.pop();
}
if (!item.first->parse())
{
// here I should remove the item from the vector
assert(graphics[item.second].get() == item.first);
// Resetting the unique_ptr destroys the Graphic; an explicit delete
// beforehand would free the same object twice.
graphics[item.second].reset();
}
}
}
I run the producer first in main, so that the queue is already completely full when the first consumer starts.
int main()
{
producer();
vector<thread> threads;
for (auto i = 0; i < N_THREADS; i++)
threads.emplace_back(consumer);
for (auto& t : threads)
t.join();
return 0;
}
The concurrent version seems to be at least twice as fast as the original one.
The full code has been uploaded here.
Now I am wondering:
Are there any (synchronization) errors in my code?
Is there a way to achieve the same result faster (or better)?
Also, I noticed that on my computer I get the best result (in terms of elapsed time) if I set the number of threads to 8. More (or fewer) threads give me worse results. Why?
There aren't any synchronization errors, but I think your memory management could be better, since your code will leak if parse() throws an exception.
Is there a way to achieve the same result faster (or better)?
Probably. You could use a simple thread pool implementation and a lambda that does the parse() for you.
The code below illustrates this approach. I use the thread pool implementation here.
#include <future> // std::future
#include <iostream>
#include <stdexcept>
#include <vector>
#include <memory>
#include <chrono>
#include <utility>
#include <cassert>
#include <ThreadPool.h>
using namespace std;
using namespace std::chrono;
#define N_GRAPHICS (1000*1000*1)
#define N_THREADS 8
struct Graphic;
using GPtr = std::unique_ptr<Graphic>;
static vector<GPtr> graphics;
struct Graphic
{
Graphic()
: status(false)
{
}
bool parse()
{
// waste time
try
{
throw runtime_error("");
}
catch (const runtime_error&)
{
}
status = true;
//return false;
return true;
}
bool status;
};
int main()
{
auto start = system_clock::now();
auto producer_unit = []()-> GPtr {
std::unique_ptr<Graphic> g(new Graphic);
if(!g->parse()){
g.reset(); // if g fails to parse, return nullptr
}
return g;
};
using ResultPool = std::vector<std::future<GPtr>>;
ResultPool results;
// ThreadPool pool(thread::hardware_concurrency());
ThreadPool pool(N_THREADS);
for(int i = 0; i <N_GRAPHICS; ++i){
// Running async task
results.emplace_back(pool.enqueue(producer_unit));
}
for(auto &t : results){
auto value = t.get();
if(value){
graphics.emplace_back(std::move(value));
}
}
auto duration = duration_cast<milliseconds>(system_clock::now() - start);
cout << "Elapsed: " << duration.count() << endl;
for (size_t i = 0; i < graphics.size(); i++)
{
if (!graphics[i]->status)
{
cerr << "Assertion failed! (" << i << ")" << endl;
break;
}
}
cin.get();
return 0;
}
It is a bit faster (1s) on my machine, more readable, and removes the need for shared data (synchronization is evil; avoid it or hide it in a reliable and efficient way).
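If pulling in a third-party thread pool is not an option, roughly the same idea can be sketched with std::async alone. This is an alternative under my own assumptions, not the code from the answer: Graphic here is a toy stand-in whose parse() succeeds for even ids only, and parse_all is a hypothetical helper. Each task parses one contiguous chunk into its own local vector, and the chunks are concatenated in order, so file order is preserved without any shared state.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for the real class: parsing succeeds for even ids only.
struct Graphic {
    explicit Graphic(std::size_t id) : id(id) {}
    bool parse() { return id % 2 == 0; }
    std::size_t id;
};

using GPtr = std::unique_ptr<Graphic>;

// Parses items [0, count) in up to `tasks` contiguous chunks, keeping order.
std::vector<GPtr> parse_all(std::size_t count, std::size_t tasks) {
    if (tasks == 0) tasks = 1;
    const std::size_t chunk = (count + tasks - 1) / tasks;  // ceiling division
    std::vector<std::future<std::vector<GPtr>>> parts;
    for (std::size_t begin = 0; begin < count; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, count);
        parts.push_back(std::async(std::launch::async, [begin, end] {
            std::vector<GPtr> local;  // owned by this task only
            for (std::size_t i = begin; i < end; ++i) {
                GPtr g(new Graphic(i));
                if (g->parse()) local.push_back(std::move(g));
            }
            return local;
        }));
    }
    // Collect the chunks in launch order, which is file order.
    std::vector<GPtr> graphics;
    for (auto& part : parts) {
        for (auto& g : part.get()) graphics.push_back(std::move(g));
    }
    return graphics;
}
```

Since every Graphic is held by a unique_ptr from the moment it is created, nothing leaks if parse() throws; the exception surfaces from part.get() instead.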

Using a C++11 condition variable in VS2012

I can't get code working reliably in a simple VS2012 console application consisting of a producer and a consumer that use a C++11 condition variable. I am aiming to produce a small, reliable program (to use as the basis for a more complex program) that uses the 3-argument wait_for method, or perhaps the wait_until method, based on code I have gathered from these websites:
condition_variable:
wait_for,
wait_until
I'd like to use the 3-argument wait_for with a predicate as below, except that it will need to use a class member variable to be most useful to me later. I get "Access violation writing location 0x__" or "An invalid parameter was passed to a service or function" errors after only about a minute of running.
Would steady_clock and the 2-argument wait_until be sufficient to replace the 3-argument wait_for? I've tried this too, without success.
Can someone show how to get the code below to run indefinitely with no bugs or weird behavior, even when the wall-clock time changes because of daylight saving time or Internet time synchronization?
A link to reliable sample code could be just as helpful.
// ConditionVariable.cpp : Defines the entry point for the console application.
//
#include "stdafx.h"
#include <condition_variable>
#include <mutex>
#include <thread>
#include <iostream>
#include <queue>
#include <chrono>
#include <atomic>
#define TEST1
std::atomic<int> qcount(0);
int _tmain(int argc, _TCHAR* argv[])
{
std::queue<int> produced_nums;
std::mutex m;
std::condition_variable cond_var;
bool notified = false;
unsigned int count = 0;
std::thread producer([&]() {
int i = 0;
while (1) {
std::this_thread::sleep_for(std::chrono::microseconds(1500));
std::unique_lock<std::mutex> lock(m);
produced_nums.push(i);
notified = true;
qcount = produced_nums.size();
cond_var.notify_one();
i++;
}
cond_var.notify_one();
});
std::thread consumer([&]() {
std::unique_lock<std::mutex> lock(m);
while (1) {
#ifdef TEST1
// Version 1
if (cond_var.wait_for(
lock,
std::chrono::microseconds(1000),
[&]()->bool { return qcount != 0; }))
{
if ((count++ % 1000) == 0)
std::cout << "consuming " << produced_nums.front () << '\n';
produced_nums.pop();
qcount = produced_nums.size();
notified = false;
}
#else
// Version 2
std::chrono::steady_clock::time_point timeout1 =
std::chrono::steady_clock::now() +
//std::chrono::system_clock::now() +
std::chrono::milliseconds(1);
while (qcount == 0)//(!notified)
{
if (cond_var.wait_until(lock, timeout1) == std::cv_status::timeout)
break;
}
if (qcount > 0)
{
if ((count++ % 1000) == 0)
std::cout << "consuming " << produced_nums.front() << '\n';
produced_nums.pop();
qcount = produced_nums.size();
notified = false;
}
#endif
}
});
while (1);
return 0;
}
Visual Studio Desktop Express had 1 important update which it installed and Windows Update has no other important updates. I'm using Windows 7 32-bit.
Sadly, this is actually a bug in VS2012's implementation of condition_variable, and the fix will not be patched in. You'll have to upgrade to VS2013 when it's released.
See:
http://connect.microsoft.com/VisualStudio/feedback/details/762560
First of all, when using condition variables I personally prefer wrapper classes such as an AutoResetEvent, modeled on the one in C#:
struct AutoResetEvent
{
typedef std::unique_lock<std::mutex> Lock;
AutoResetEvent(bool state = false) :
state(state)
{ }
void Set()
{
auto lock = AcquireLock();
state = true;
variable.notify_one();
}
void Reset()
{
auto lock = AcquireLock();
state = false;
}
void Wait(Lock& lock)
{
variable.wait(lock, [this] () { return this->state; });
state = false;
}
void Wait()
{
auto lock = AcquireLock();
Wait(lock);
}
Lock AcquireLock()
{
return Lock(mutex);
}
private:
bool state;
std::condition_variable variable;
std::mutex mutex;
};
This may not behave exactly like the C# type, and may not be as efficient as it could be, but it gets things done for me.
Second, when I need to implement a producer/consumer idiom, I try to use a concurrent queue implementation (e.g. the TBB queue) or write one myself. You should also consider doing things properly by using the Active Object pattern. But for a simple solution we can use this:
template<typename T>
struct ProductionQueue
{
ProductionQueue()
{ }
void Enqueue(const T& value)
{
{
auto lock = event.AcquireLock();
q.push(value);
}
event.Set();
}
std::size_t GetCount()
{
auto lock = event.AcquireLock();
return q.size();
}
T Dequeue()
{
auto lock = event.AcquireLock();
event.Wait(lock);
T value = q.front();
q.pop();
return value;
}
private:
AutoResetEvent event;
std::queue<T> q;
};
This class has some exception-safety issues and lacks const-ness on its methods but, like I said, for a simple solution this should fit.
As a result, your modified code looks like this:
int main(int argc, char* argv[])
{
ProductionQueue<int> produced_nums;
unsigned int count = 0;
std::thread producer([&]() {
int i = 0;
while (1) {
std::this_thread::sleep_for(std::chrono::microseconds(1500));
produced_nums.Enqueue(i);
qcount = produced_nums.GetCount();
i++;
}
});
std::thread consumer([&]() {
while (1) {
int item = produced_nums.Dequeue();
{
if ((count++ % 1000) == 0)
std::cout << "consuming " << item << '\n';
qcount = produced_nums.GetCount();
}
}
});
producer.join();
consumer.join();
return 0;
}
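One caveat with the ProductionQueue above, as far as I can tell: because the event is a single auto-reset flag, two Enqueue calls that arrive before any Dequeue set the flag only once, so the second item can sit in the queue until the next Enqueue. A minimal sketch that waits on the queue's non-emptiness directly avoids this. It also shows the 3-argument wait_for reading a class member, which the question asked about. The names BlockingQueue, enqueue, dequeue, and try_dequeue_for are my own, and this does not work around the VS2012 library bug mentioned above.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

template <typename T>
class BlockingQueue {
public:
    void enqueue(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(value));
        }
        not_empty_.notify_one();
    }

    // Blocks until an item is available.
    T dequeue() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !queue_.empty(); });
        return pop_locked();
    }

    // Waits up to `timeout`; returns false if the queue is still empty.
    template <typename Rep, typename Period>
    bool try_dequeue_for(std::chrono::duration<Rep, Period> timeout, T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        // 3-argument wait_for: the predicate reads a member under the lock.
        if (!not_empty_.wait_for(lock, timeout,
                                 [this] { return !queue_.empty(); })) {
            return false;
        }
        out = pop_locked();
        return true;
    }

private:
    // Caller must hold mutex_ and the queue must be non-empty.
    T pop_locked() {
        T value = std::move(queue_.front());
        queue_.pop();
        return value;
    }

    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::queue<T> queue_;
};
```

The predicate is the queue state itself, so a notification that arrives before the consumer starts waiting is never lost: the consumer checks the queue first and only sleeps if it is empty.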