I am trying to simulate a dice-roll program; more specifically, I want to count the number of rolls needed for n dice to all show 6 at the same time.
So this is a textbook example of the fork-join model: simulate the rolls simultaneously, then check the result after each iteration.
#include <algorithm>
#include <iostream>
#include <vector>
#include <thread>
#include <random>
#include <array>
#include <barrier>
auto random_int(int min, int max)
{
static thread_local auto engine = std::default_random_engine{ std::random_device{}() };
auto dist = std::uniform_int_distribution<>(min, max);
return dist(engine);
}
int main()
{
constexpr auto n = 5;
auto done = false;
auto dice = std::array<int, n>{};
auto threads = std::vector<std::thread>{};
auto n_turns = 0;
auto check_result = [&]
{
++n_turns;
auto is_six = [](int i) {return 6 == i; };
done = std::all_of(std::begin(dice), std::end(dice), is_six);
};
auto bar = std::barrier{ n,check_result };
for (int i = 0; i < n; ++i)
{
threads.emplace_back([&, i]{
while (!done)
{
dice[i] = random_int(1, 6);
bar.arrive_and_wait();
}});
}
for (auto&& t : threads)
{
t.join();
}
std::cout << n_turns << std::endl;
}
And I am getting the following error:
error C2338: N4861 [thread.barrier.class]/5:
is_nothrow_invocable_v<CompletionFunction&> shall be true
1>C:\Users\eduar\source\repos\C++20\C++20\main.cpp(114): message : see reference to class template instantiation 'std::barrier<main::<lambda_1>>' being compiled
Can someone please hint what am I doing wrong and how to fix this?
The issue is spelled out in the error message, which helpfully cites the exact part of the standard that imposes this requirement, [thread.barrier.class]/5:
CompletionFunction shall meet the Cpp17MoveConstructible (Table 28) and Cpp17Destructible (Table 32) requirements.
is_nothrow_invocable_v<CompletionFunction&> shall be true.
You're currently missing the last part: your lambda isn't nothrow-invocable. That's an easy fix, though:
auto check_result = [&]() noexcept
// ^~~~~~~~
{
++n_turns;
auto is_six = [](int i) {return i == 6; };
done = std::all_of(std::begin(dice), std::end(dice), is_six);
};
I also took the opportunity to flip your Yoda conditional, because there is no reason to write Yoda conditionals.
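If you want to check a callable before handing it to std::barrier, you can test the same trait the standard names. This is only a sketch: is_nothrow_completion, plain and marked are made-up names, and the helper covers just the is_nothrow_invocable_v part of the requirement, not the move-constructible and destructible parts.

```cpp
#include <type_traits>

// Hypothetical helper (not part of the standard library): mirrors the
// is_nothrow_invocable_v<CompletionFunction&> requirement quoted above.
template <class F>
constexpr bool is_nothrow_completion = std::is_nothrow_invocable_v<F&>;

// Two otherwise identical lambdas; only the second is declared noexcept.
inline auto plain = [] {};
inline auto marked = []() noexcept {};

// The trait looks at the exception specification, not at the body, so an
// empty lambda without noexcept still fails the check.
static_assert(!is_nothrow_completion<decltype(plain)>, "plain lambda is rejected");
static_assert(is_nothrow_completion<decltype(marked)>, "noexcept lambda is accepted");
```

Keep in mind that noexcept is a promise: if the completion function does throw, std::terminate is called.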
I want to use std::for_each to iterate over vector indexes in range [a, b) in parallel, calculate the value of the Weierstrass function and write it to the std::vector:
std::vector<std::array<float, 2>> values(1000);
auto range = /** equivalent of Python range(0, values.size()) **/;
std::for_each(std::execution::par, range.begin(), range.end(), [&](auto &&i) {
values[i][0] = static_cast<float>(i) / resolution;
values[i][1] = weierstrass(a, b, static_cast<float>(i) / resolution);
});
// a, b, and resolution are some constants defined before
// weierstrass() is the Weierstrass function
I have found some solutions on the internet, but all of them require either including a third-party library or writing my own range class. Is there a standard solution for this?
You can use std::views::iota(), which works similarly (though not identically) to Python's range(), together with std::ranges::for_each(). Both are available in C++20.
#include <algorithm>
#include <ranges>
#include <iostream>
int main() {
std::ranges::for_each(std::views::iota(1, 10), [](int i) {
std::cout << i << ' ';
});
}
Output:
1 2 3 4 5 6 7 8 9
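As noted by @Afshin, std::ranges::for_each() in the code above doesn't support std::execution::par for multi-threaded execution.
To overcome this, you can use iota with the regular std::for_each() as follows (whether a standard library accepts iota's iterators in a parallel algorithm can vary by implementation, so test this on your toolchain):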
#include <algorithm>
#include <ranges>
#include <iostream>
#include <execution>
int main() {
auto range = std::views::iota(1, 10);
std::for_each(std::execution::par, range.begin(), range.end(),
[](int i) {
std::cout << i << ' ';
});
}
Output (the order may vary, since the elements are processed in parallel):
1 2 3 4 5 6 7 8 9
I decided to implement a Range class plus iterator from scratch, modeled on how Python's range() works.
As in Python, you can use it in three ways: Range(stop), Range(start, stop), Range(start, stop, step). All three accept negative values.
To test the correctness of the implementation I filled two unordered sets, one containing all generated values and another containing all thread ids that were used (to show that it actually ran on multiple CPU cores).
Although I marked my iterator as random access, it is still missing some operations such as -= and --; these are left for future improvement. It has enough for use with std::for_each(), though.
If I made mistakes in the implementation, please comment on my answer with an explanation.
#include <limits>
#include <execution>
#include <algorithm>
#include <iostream>
#include <iterator>
#include <thread>
#include <unordered_set>
#include <string>
#include <sstream>
#include <mutex>
class Range {
public:
Range(ptrdiff_t start_stop, ptrdiff_t stop =
std::numeric_limits<ptrdiff_t>::max(), ptrdiff_t step = 1)
: step_(step) {
if (stop == std::numeric_limits<ptrdiff_t>::max()) {
start_ = 0;
stop_ = start_stop;
} else {
start_ = start_stop;
stop_ = stop;
}
if (step_ >= 0)
stop_ = std::max(start_, stop_);
else
stop_ = std::min(start_, stop_);
if (step_ >= 0)
stop_ = start_ + (stop_ - start_ + step_ - 1) / step_ * step_;
else
// step_ is negative here, so round the distance up using the positive stride -step_
stop_ = start_ - (start_ - stop_ - step_ - 1) / (-step_) * (-step_);
}
class RangeIter {
public:
using iterator_category = std::random_access_iterator_tag;
using value_type = ptrdiff_t;
using difference_type = ptrdiff_t;
using pointer = ptrdiff_t const *;
using reference = ptrdiff_t const &;
RangeIter() {}
RangeIter(ptrdiff_t start, ptrdiff_t stop, ptrdiff_t step)
: cur_(start), stop_(stop), step_(step) {}
RangeIter & operator += (ptrdiff_t steps) {
cur_ += step_ * steps;
if (step_ >= 0)
cur_ = std::min(cur_, stop_);
else
cur_ = std::max(cur_, stop_);
return *this;
}
RangeIter operator + (ptrdiff_t steps) const {
auto it = *this;
it += steps;
return it;
}
ptrdiff_t operator [] (ptrdiff_t steps) const {
auto it = *this;
it += steps;
return *it;
}
ptrdiff_t operator - (RangeIter const & other) const {
return (cur_ - other.cur_) / step_;
}
RangeIter & operator ++ () {
*this += 1;
return *this;
}
ptrdiff_t const & operator * () const {
return cur_;
}
bool operator == (RangeIter const & other) const {
return cur_ == other.cur_;
}
bool operator != (RangeIter const & other) const {
return !(*this == other);
}
ptrdiff_t cur_ = 0, stop_ = 0, step_ = 0;
};
auto begin() const { return RangeIter(start_, stop_, step_); }
auto end() const { return RangeIter(stop_, stop_, step_); }
private:
ptrdiff_t start_ = 0, stop_ = 0, step_ = 0;
};
int main() {
ptrdiff_t start = 1, stop = 1000000, step = 2;
std::mutex mutex;
std::unordered_set<std::string> threads;
std::unordered_set<ptrdiff_t> values;
auto range = Range(start, stop, step);
std::for_each(std::execution::par, range.begin(), range.end(),
[&](ptrdiff_t i) {
std::unique_lock<std::mutex> lock(mutex);
std::ostringstream ss;
ss << std::this_thread::get_id();
threads.insert(ss.str());
values.insert(i);
});
std::cout << "Threads:" << std::endl;
for (auto const & s: threads)
std::cout << s << std::endl;
{
bool correct = true;
size_t cnt = 0;
for (ptrdiff_t i = start; i < stop; i += step) {
++cnt;
if (!values.count(i)) {
correct = false;
std::cout << "No value: " << i << std::endl;
break;
}
}
if (values.size() != cnt)
std::cout << "Expected amount of values: " << cnt
<< ", actual " << values.size() << std::endl;
std::cout << "Correct values: " << std::boolalpha
<< (correct && (values.size() == cnt)) << std::endl;
}
}
Output:
Threads:
1628
9628
5408
2136
2168
8636
2880
6492
1100
Correct values: true
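The stop-rounding arithmetic in the constructor is easiest to check in isolation. The following is a minimal sketch and not part of the class above: range_length and normalized_stop are hypothetical helper names, and a non-zero step is assumed. normalized_stop computes the sentinel that end() needs to hold so that an iterator advancing by step lands on it exactly.

```cpp
#include <cassert>
#include <cstddef>

// Number of values a Python-style range(start, stop, step) produces.
// Assumes step != 0 (Python raises ValueError in that case).
inline std::ptrdiff_t range_length(std::ptrdiff_t start, std::ptrdiff_t stop,
                                   std::ptrdiff_t step) {
    assert(step != 0);
    if (step > 0) {
        // Ceiling division of the distance by the stride.
        return stop > start ? (stop - start + step - 1) / step : 0;
    }
    const std::ptrdiff_t stride = -step;  // positive magnitude of a negative step
    return start > stop ? (start - stop + stride - 1) / stride : 0;
}

// First value past the end that is reachable from start in whole steps.
inline std::ptrdiff_t normalized_stop(std::ptrdiff_t start, std::ptrdiff_t stop,
                                      std::ptrdiff_t step) {
    return start + range_length(start, stop, step) * step;
}
```

For example, range_length(10, 0, -3) is 4 (the values 10, 7, 4, 1), and normalized_stop(10, 0, -3) is -2, one step past the last value.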
If the problem is creating a range similar to Python's range(), you can look at https://en.cppreference.com/w/cpp/iterator/iterator and adapt its example (note that std::iterator has been deprecated since C++17):
#include <iostream>
#include <algorithm>
template<long FROM, long TO>
class Range {
public:
// member typedefs provided through inheriting from std::iterator
class iterator: public std::iterator<
std::input_iterator_tag, // iterator_category
long, // value_type
long, // difference_type
const long*, // pointer
long // reference
>{
long num = FROM;
public:
explicit iterator(long _num = 0) : num(_num) {}
iterator& operator++() {num = TO >= FROM ? num + 1: num - 1; return *this;}
iterator operator++(int) {iterator retval = *this; ++(*this); return retval;}
bool operator==(iterator other) const {return num == other.num;}
bool operator!=(iterator other) const {return !(*this == other);}
reference operator*() const {return num;}
};
iterator begin() {return iterator(FROM);}
iterator end() {return iterator(TO >= FROM? TO+1 : TO-1);}
};
int main() {
// std::find requires an input iterator
auto range = Range<15, 25>();
auto itr = std::find(range.begin(), range.end(), 18);
std::cout << *itr << '\n'; // 18
// Range::iterator also satisfies range-based for requirements
for(long l : Range<3, 5>()) {
std::cout << l << ' '; // 3 4 5
}
std::cout << '\n';
}
Just as an alternative, you could make each work package carry the necessary information by adding the index you need.
Example:
std::vector<std::pair<size_t, std::array<float, 2>>> values(1000);
for(size_t i = 0; i < values.size(); ++i) values[i].first = i;
std::for_each(std::execution::par, values.begin(), values.end(),
[resolution](auto& p) {
p.second[0] = static_cast<float>(p.first) / resolution;
p.second[1] = weierstrass(a, b, static_cast<float>(p.first) / resolution);
});
Not using indexing on values inside the threaded part like above may prevent false sharing and improve performance. You could also make each work package aligned to prevent false sharing to see if that has an effect on performance.
#include <new>
struct alignas(std::hardware_destructive_interference_size) workpackage {
size_t index;
std::array<float, 2> arr;
};
std::vector<workpackage> values(1000);
for(size_t i = 0; i < values.size(); ++i) values[i].index = i;
std::for_each(std::execution::par, values.begin(), values.end(),
[resolution](auto& wp) {
wp.arr[0] = static_cast<float>(wp.index) / resolution;
wp.arr[1] = weierstrass(a, b, static_cast<float>(wp.index) / resolution);
});
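To make the alignment idea concrete, the sketch below pads each element to a fixed boundary and checks the layout at compile time. It assumes a 64-byte cache line through the hypothetical constant kAssumedCacheLine, because std::hardware_destructive_interference_size is not provided by every standard library; PaddedPackage is likewise just an illustrative name.

```cpp
#include <array>
#include <cstddef>

// Assumption: 64 bytes is a common cache-line size, not a guaranteed one.
constexpr std::size_t kAssumedCacheLine = 64;

struct alignas(kAssumedCacheLine) PaddedPackage {
    std::size_t index;
    std::array<float, 2> arr;
};

// alignas also rounds sizeof up to a multiple of the alignment, so adjacent
// elements of an array or vector never share one (assumed) cache line.
static_assert(alignof(PaddedPackage) == kAssumedCacheLine, "unexpected alignment");
static_assert(sizeof(PaddedPackage) % kAssumedCacheLine == 0, "unexpected padding");
```

The price is memory: every element now occupies a whole cache line instead of a few bytes, so measure before deciding that the padding pays off.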
You can write your code in another way and drop any need for range at all like this:
std::vector<std::array<float, 2>> values(1000);
std::for_each(std::execution::par, values.begin(), values.end(), [&](std::array<float, 2>& val) {
auto i = std::distance(&values[0], &val);
val[0] = static_cast<float>(i) / resolution;
val[1] = weierstrass(a, b, static_cast<float>(i) / resolution);
});
I should say that this code is valid if and only if you are using std::for_each, because it is stated that:
Unlike the rest of the parallel algorithms, std::for_each is not allowed to make copies of the elements in the sequence even if they are trivially copyable.
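Here is a self-contained sketch of the address-based index trick. The execution policy is left out so that it builds without a parallel backend, and fill_by_index and the squaring that stands in for weierstrass() are placeholders of mine, not the asker's code.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the asker's setup: writes i / resolution and a
// placeholder function of it into element i.
inline void fill_by_index(std::vector<std::array<float, 2>>& values, float resolution) {
    assert(resolution != 0.0f);
    const std::array<float, 2>* first = values.data();
    // A std::execution policy could be passed as the first argument here.
    std::for_each(values.begin(), values.end(), [&](std::array<float, 2>& val) {
        // val refers to the element itself (no copy), so its offset from the
        // first element is its index.
        const std::ptrdiff_t i = &val - first;
        const float x = static_cast<float>(i) / resolution;
        val[0] = x;
        val[1] = x * x;  // placeholder for weierstrass(a, b, x)
    });
}
```

This only works because the lambda takes the element by reference; with a by-value parameter, &val would point at a copy and the subtraction would be meaningless.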
I am porting a Java program to C++. In Java, I have a piece of code that shuffles two arrays simultaneously: it returns the indices of the shuffled array, which can then be used to rearrange another array (of the same length) accordingly. In C++, I shuffle the vectors with the following algorithm:
#include <vector>
#include <algorithm>
#include <random>
#include <iostream>
using namespace std;
int main(void) {
vector<int> A, B;
for (int n=0; n<10; n++) {
A.push_back(n);
B.push_back(n);
}
std::random_device rd;
std::mt19937 gen(rd()); // seed the engine; otherwise rd is unused and every run gives the same sequence
std::uniform_int_distribution<> rnd(0, A.size()-1);
for (int n=0; n<A.size(); n++) {
int m = rnd(gen);
std::swap(A[n], A[m]);
std::swap(B[n], B[m]);
}
for (auto it: A) cout << it << " ";
cout << endl;
for (auto it: B) cout << it << " ";
cout << endl;
return 0;
}
It works. But I wonder if there is any STL algorithm that can simultaneously shuffle two or more containers.
edit: Armin deleted his answer, so mine seems out of context. Let's expand
tl;dr No, there is no function in the STL that applies the same shuffle to multiple containers. The STL is pretty big, but it cannot do everything; there are just too many specific requirements for algorithms, and what you're asking for is not that common.
However, nothing keeps you from writing your own algorithms. For instance, you can copy the algo shown on std::shuffle on cppreference.com (third version) and modify it.
#include <iterator>
#include <random>
#include <utility>
template<class URBG, class RandomIt1, class RandomIt2>
static void SameShuffleToMany(URBG&& g, RandomIt1 first1, RandomIt1 last1, RandomIt2 first2) {
using diff_t = typename std::iterator_traits<RandomIt1>::difference_type;
using distr_t = std::uniform_int_distribution<diff_t>;
using param_t = typename distr_t::param_type;
distr_t D;
diff_t n = last1 - first1;
for (diff_t i = n-1; i > 0; --i) {
diff_t j = D(g, param_t(0, i));
std::swap(first1[i], first1[j]);
std::swap(first2[i], first2[j]);
}
}
Heck, you could even use a variadic template parameter pack to handle more than two containers in parallel.
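A variadic version could look roughly like this. It is a sketch under assumptions, not a tested library function: same_shuffle_all is a name I made up, it needs C++17 for the fold expression, and every extra range is assumed to hold at least as many elements as the first one.

```cpp
#include <cassert>
#include <iterator>
#include <random>
#include <utility>

// Hypothetical variadic variant: applies one Fisher-Yates permutation to
// [first, last) and to every range that starts at one of others...
template <class URBG, class RandomIt, class... OtherIts>
void same_shuffle_all(URBG&& g, RandomIt first, RandomIt last, OtherIts... others) {
    using diff_t = typename std::iterator_traits<RandomIt>::difference_type;
    using distr_t = std::uniform_int_distribution<diff_t>;
    using param_t = typename distr_t::param_type;
    assert(last - first >= 0);
    distr_t dist;
    for (diff_t i = (last - first) - 1; i > 0; --i) {
        const diff_t j = dist(g, param_t(0, i));
        std::swap(first[i], first[j]);
        // C++17 fold expression: repeat the same swap in every other range.
        (std::swap(others[i], others[j]), ...);
    }
}
```

A call would then read same_shuffle_all(gen, begin(A), end(A), begin(B), begin(C)); with no extra iterators it behaves like a plain shuffle of the first range.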
However, there are multiple solutions to this problem, as suggested in the comments, and each has its own pros and cons. I ran a speed benchmark below. Note that the solutions by Armin and Jamit require additional memory to store the new indices and the targets for the reordered output.
Old answer
I wrote some benchmark code for comparison:
#include <benchmark/benchmark.h>
#include <vector>
#include <algorithm>
#include <random>
#include <iostream>
#include <iterator>
#include <numeric>
static constexpr auto N = 1000;
static std::vector<int> CreateVector() {
std::vector<int> out;
out.reserve(N);
std::generate_n(back_inserter(out), N,
[gen = std::mt19937(std::random_device{}())] () mutable { return gen();}
);
return out;
}
static void BM_Original(benchmark::State& state) {
auto a = CreateVector();
auto b = a; // elementwise copy
for (auto _ : state) {
std::random_device rd;
std::mt19937 gen(std::random_device{}());
std::uniform_int_distribution<> rnd(0, a.size()-1);
for (int n = 0; n < a.size(); ++n) {
int m = rnd(gen);
std::swap(a[n], a[m]);
std::swap(b[n], b[m]);
}
}
if (!std::equal(cbegin(a), cend(a), cbegin(b)))
std::cout << "Vectors are not equal!\n";
}
// Register the function as a benchmark
BENCHMARK(BM_Original);
static void BM_Jamit(benchmark::State& state) {
auto a = CreateVector();
auto b = a; // elementwise copy
for (auto _ : state) {
std::vector<int> idx(N);
std::iota(begin(idx), end(idx), 0);
std::mt19937 g(std::random_device{}());
std::shuffle(begin(idx), end(idx), g);
std::vector<int> aout; aout.reserve(N);
std::transform(cbegin(idx), cend(idx), back_inserter(aout), [&](int i) { return a[i]; });
a = std::move(aout);
std::vector<int> bout; bout.reserve(N);
std::transform(cbegin(idx), cend(idx), back_inserter(bout), [&](int i) { return b[i]; });
b = std::move(bout);
}
if (!std::equal(cbegin(a), cend(a), cbegin(b)))
std::cout << "Vectors are not equal!\n";
}
// Register the function as a benchmark
BENCHMARK(BM_Jamit);
static void BM_Armin(benchmark::State& state) {
auto a = CreateVector();
auto b = a; // elementwise copy
for (auto _ : state) {
const unsigned int seedValue = std::random_device()();
std::mt19937 uniformRandomBitGenerator{};
uniformRandomBitGenerator.seed(seedValue);
std::shuffle(begin(a), end(a), uniformRandomBitGenerator);
uniformRandomBitGenerator.seed(seedValue);
std::shuffle(begin(b), end(b), uniformRandomBitGenerator);
}
if (!std::equal(cbegin(a), cend(a), cbegin(b)))
std::cout << "Vectors are not equal!\n";
}
// Register the function as a benchmark
BENCHMARK(BM_Armin);
#include <iterator>
template<class URBG, class RandomIt1, class RandomIt2>
static void SameShuffleToMany(URBG&& g, RandomIt1 first1, RandomIt1 last1, RandomIt2 first2) {
using diff_t = typename std::iterator_traits<RandomIt1>::difference_type;
using distr_t = std::uniform_int_distribution<diff_t>;
using param_t = typename distr_t::param_type;
distr_t D;
diff_t n = last1 - first1;
for (diff_t i = n-1; i > 0; --i) {
diff_t j = D(g, param_t(0, i));
std::swap(first1[i], first1[j]);
std::swap(first2[i], first2[j]);
}
}
static void BM_Mine(benchmark::State& state) {
auto a = CreateVector();
auto b = a; // elementwise copy
for (auto _ : state) {
std::mt19937 g{std::random_device{}()};
SameShuffleToMany(g, begin(a), end(a), begin(b));
}
if (!std::equal(cbegin(a), cend(a), cbegin(b)))
std::cout << "Vectors are not equal!\n";
}
// Register the function as a benchmark
BENCHMARK(BM_Mine);
BENCHMARK_MAIN();
The resulting CPU times are:
Original: 108600 ns
Jamit: 76000 ns
Armin: 122000 ns
Mine: 102000 ns
Link to QuickBench. Let me know if this works for you; I don't think link sharing there works as well as it does on Godbolt.
I was wondering if it is possible to create custom functions like for, for_each, while, etc.
There's nothing I want to do that the existing loops can't do. I am just curious to learn how they work, in case I ever need to create my own.
For example, suppose one wants to create another version of for that takes only one parameter.
In this example, I want to create a for that takes only one parameter, an integer.
Instead of writing
for (int i = 0; i < 50; ++i)
I would create a for version like this
for_(50)
and they would act the same. How would I do something like that?
I have posted this question in another forum.
In addition to the proposals in other answers, you could create a function like the one below, but it is, at the very end, very similar to using the standard std::for_each.
#include <iostream>
#include <functional>
template<typename C, typename F>
void for_(C begin_, C end_, F&& f) { // [begin_, end_)
for (C i = begin_; i < end_; ++i) {
f(i);
}
}
template<typename C, typename F>
void for_(C count, F&& f) { // special case for [0, count)
for_(C{0}, count, f); // C{0} keeps both bounds the same type for deduction
}
void mul2(int x) {
std::cout << x*2 << " ";
}
int main() {
for_(10, [](int i) { std::cout << i << "\n"; });
for_(2, 10, mul2);
}
An ugly and unsafe solution is to use a macro:
#include <iostream>
#define REPEAT(i,N) for(int (i) = 0; (i) < (N); ++(i))
int main()
{
REPEAT(i,10) std::cout << i << std::endl;
return 0;
}
You can't extend the C++ syntax for new loops.
You could use a macro, but this is pretty ugly, and generally best avoided. Another way to get something similar is by passing a functor as a parameter, greatly helped by the introduction of lambda expressions to C++. You can find some examples of such in the <algorithm> header.
For example:
#include <algorithm>
#include <vector>
int main()
{
std::vector<int> numbers = { 1, 4, 5, 7, 10 };
int even_count = 0;
for (auto x : numbers)
{
if (x % 2 == 0)
{
++even_count;
}
}
auto even_count2 = std::count_if(numbers.begin(), numbers.end(), [](int x) { return x % 2 == 0; });
}
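Although the syntax itself cannot be extended, a range-based for over a small counting type gets fairly close to the for_(50) from the question. The following is a minimal sketch; upto is a name I made up for illustration, and it only supports counting from 0 up to a non-negative limit.

```cpp
#include <cassert>

// Hypothetical helper: lets a range-based for visit 0, 1, ..., count - 1.
class upto {
public:
    explicit upto(int count) : count_(count) { assert(count >= 0); }

    // Just enough iterator for a range-based for: *, ++ and !=.
    class iterator {
    public:
        explicit iterator(int value) : value_(value) {}
        int operator*() const { return value_; }
        iterator& operator++() { ++value_; return *this; }
        bool operator!=(const iterator& other) const { return value_ != other.value_; }
    private:
        int value_;
    };

    iterator begin() const { return iterator(0); }
    iterator end() const { return iterator(count_); }

private:
    int count_;
};
```

With this in place, for (int i : upto(50)) { ... } visits the same values as for (int i = 0; i < 50; ++i).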
You could use a lambda function and pass in a function object as a parameter to be performed for every iteration of the loop.
#include <iostream>
#include <functional>
int main()
{
auto for_ = [](int start, int size, std::function<void (int i)> fn)
{
int end = start + size;
for (int i = start; i < end; ++i)
{
fn(i);
}
};
for_(0, 10, [](int i) { std::cout << i << std::endl; });
for_(0, 10, [](int i) { std::cout << i*2 << std::endl; });
}
It seems like you are reinventing the wheel here a bit. You could just use std::for_each.
However, you could have custom lambda functions that do different things and just implement the operation within the lambda itself without taking in a function object for the operation.
I have an std::vector of std::function<void()> like this:
std::map<Event, std::vector<std::function<void()>>> observers_;
calling each function like this:
for (const auto& obs : observers_.at(event)) obs();
I want to turn this into a parallel for loop. Since I am using C++14 and don't have access to the parallel execution policies (std::execution::par) of C++17, I found a little library that lets me create a ThreadPool.
How do I turn for (const auto& obs : observers_.at(event)) obs(); into a version that calls each function in observers_ in parallel? I can't seem to get the syntax correct. I tried, but this doesn't work.
std::vector<std::function<void()>> vec = observers_.at(event);
ThreadPool::ParallelFor(0, vec.size(), [&](int i)
{
vec.at(i);
});
The example program that uses the library below:
#include <iostream>
#include <mutex>
#include "ThreadPool.hpp"
////////////////////////////////////////////////////////////////////////////////
int main()
{
std::mutex critical;
ThreadPool::ParallelFor(0, 16, [&] (int i)
{
std::lock_guard<std::mutex> lock(critical);
std::cout << i << std::endl;
});
return 0;
}
The ThreadPool library.
#ifndef THREADPOOL_HPP_INCLUDED
#define THREADPOOL_HPP_INCLUDED
////////////////////////////////////////////////////////////////////////////////
#include <thread>
#include <vector>
#include <cmath>
////////////////////////////////////////////////////////////////////////////////
class ThreadPool {
public:
template<typename Index, typename Callable>
static void ParallelFor(Index start, Index end, Callable func) {
// Estimate number of threads in the pool
const static unsigned nb_threads_hint = std::thread::hardware_concurrency();
const static unsigned nb_threads = (nb_threads_hint == 0u ? 8u : nb_threads_hint);
// Size of a slice for the range functions
Index n = end - start + 1;
Index slice = (Index) std::round(n / static_cast<double> (nb_threads));
slice = std::max(slice, Index(1));
// [Helper] Inner loop
auto launchRange = [&func] (int k1, int k2) {
for (Index k = k1; k < k2; k++) {
func(k);
}
};
// Create pool and launch jobs
std::vector<std::thread> pool;
pool.reserve(nb_threads);
Index i1 = start;
Index i2 = std::min(start + slice, end);
for (unsigned i = 0; i + 1 < nb_threads && i1 < end; ++i) {
pool.emplace_back(launchRange, i1, i2);
i1 = i2;
i2 = std::min(i2 + slice, end);
}
if (i1 < end) {
pool.emplace_back(launchRange, i1, end);
}
// Wait for jobs to finish
for (std::thread &t : pool) {
if (t.joinable()) {
t.join();
}
}
}
// Serial version for easy comparison
template<typename Index, typename Callable>
static void SequentialFor(Index start, Index end, Callable func) {
for (Index i = start; i < end; i++) {
func(i);
}
}
};
#endif // THREADPOOL_HPP_INCLUDED
It seems that you should simply change:
vec.at(i); // Only returns a reference to the element at index i
into:
vec.at(i)(); // The second () calls the function
--- OR ---
vec[i](); // Same
Hint: What does this do?
vec.at(i);
What do you want it to do?
Unrelatedly, you're using at() when you mean [].
This works:
ThreadPool::ParallelFor(0, (int)vec.size(), [&] (int i)
{
vec[i]();
});
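The difference is easy to see without any thread pool. This sequential sketch (invoke_each is a hypothetical helper, not part of the ThreadPool library) only shows the call syntax: naming an element does nothing, and the trailing () is what runs it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: calls every stored observer in order and returns
// how many were invoked.
inline std::size_t invoke_each(const std::vector<std::function<void()>>& observers) {
    std::size_t called = 0;
    for (std::size_t i = 0; i < observers.size(); ++i) {
        assert(observers[i]);  // an empty std::function would throw when called
        // Writing just `observers[i];` here would name the element and discard it.
        observers[i]();        // the trailing () invokes the stored callable
        ++called;
    }
    return called;
}
```

The body of the loop is exactly what goes inside the lambda passed to ThreadPool::ParallelFor.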
Let's say we have a function odd which is a bool(int) function. I'd like to execute this function in parallel, but with a different argument (a different number) each time.
bool odd(int i) { return (((i&1)==1)?true:false); }
Here's the code I'm trying to use (which works but has a wart).
std::size_t num = 256;
std::vector<bool> results(num);
std::vector<std::function<bool(int)>> funcs(num);
std::vector<std::packaged_task<bool(int)>> tasks(num);
std::vector<std::future<bool>> futures(num);
std::vector<std::thread> threads(num);
for (std::size_t i = 0; i < num; i++) {
results[i] = false;
funcs[i] = std::bind(odd, static_cast<int>(i));
tasks[i] = std::packaged_task<bool(int)>(funcs[i]);
futures[i] = tasks[i].get_future();
threads[i] = std::thread(std::move(tasks[i]),0); // args ignored
}
for (std::size_t i = 0; i < num; i++) {
results[i] = futures[i].get();
threads[i].join();
}
for (std::size_t i = 0; i < num; i++) {
printf("odd(%zu)=%s\n", i, (results[i]?"true":"false"));
}
I'd like to get rid of the arguments to the thread creation, as they are dependent on the argument types of the function bool(int). I'd like to make a function template of this code and be able to make a massive parallel function executor.
template <typename _returnType, typename ..._argTypes>
void exec_and_collect(std::vector<_returnType>& results,
std::vector<std::function<_returnType(_argTypes...)>> funcs) {
std::size_t numTasks = (funcs.size() > results.size() ? results.size() : funcs.size());
std::vector<std::packaged_task<_returnType(_argTypes...)>> tasks(numTasks);
std::vector<std::future<_returnType>> futures(numTasks);
std::vector<std::thread> threads(numTasks);
for (std::size_t h = 0; h < numTasks; h++) {
tasks[h] = std::packaged_task<_returnType(_argTypes...)>(funcs[h]);
futures[h] = tasks[h].get_future();
threads[h] = std::thread(std::move(tasks[h]), 0); // zero is a wart
}
// threads are now running, collect results
for (std::size_t h = 0; h < numTasks; h++) {
results[h] = futures[h].get();
threads[h].join();
}
}
Then called like this:
std::size_t num = 8;
std::vector<bool> results(num);
std::vector<std::function<bool(int)>> funcs(num);
for (std::size_t i = 0; i < num; i++) {
funcs[i] = std::bind(odd, static_cast<int>(i));
}
exec_and_collect<bool,int>(results, funcs);
I'd like to remove the zero in the std::thread(std::move(task), 0); line, since it's completely ignored by the thread. If I remove it completely, the compiler can't find the arguments to pass to the thread constructor and compilation fails.
You could simply avoid micromanaging in the generic code: accept any task of the form returntype() and let the caller handle the binding of arguments:
#include <thread>
#include <future>
#include <iostream>
#include <vector>
#include <algorithm>
#include <functional>
bool odd(int i) { return (((i&1)==1)?true:false); }
template <typename _returnType>
void exec_and_collect(std::vector<_returnType>& results,
std::vector<std::function<_returnType()>> funcs
) {
std::size_t numTasks = std::min(funcs.size(), results.size());
std::vector<std::packaged_task<_returnType()>> tasks(numTasks);
std::vector<std::future<_returnType>> futures(numTasks);
std::vector<std::thread> threads(numTasks);
for (std::size_t h = 0; h < numTasks; h++) {
tasks[h] = std::packaged_task<_returnType()>(funcs[h]);
futures[h] = tasks[h].get_future();
threads[h] = std::thread(std::move(tasks[h]));
}
// threads are now running, collect results
for (std::size_t h = 0; h < numTasks; h++) {
results[h] = futures[h].get();
threads[h].join();
}
}
int main() {
std::size_t num = 8;
std::vector<bool> results(num);
std::vector<std::function<bool()>> funcs(num);
for (std::size_t i = 0; i < num; i++) {
funcs[i] = std::bind(odd, static_cast<int>(i));
}
exec_and_collect<bool>(results, funcs);
}
Note this is a quick job, I've seen quite a few things that are overly specific here still.
In particular all the temporary collections are just paper weight (you even move each tasks[h] out of the vector even before moving to the next task, so why keep a vector of dead bits?)
There's no scheduling at all; you just create new threads willy nilly. That's not gonna scale (also, you want pluggable pooling models; see the Executor specifications and Boost Async's implementation of these)
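To illustrate what "let the caller handle the binding" can look like without std::bind, here is a small sketch. bind_task and bind_task_example are hypothetical names of mine; the helper copies the function and its arguments into a lambda, so the generic code only ever sees a callable that takes no arguments.

```cpp
#include <cassert>
#include <functional>

inline bool odd(int i) { return (i & 1) == 1; }

// Hypothetical helper: packs a function and its arguments into a nullary
// task, so the executor never needs to know the argument types.
template <class F, class... Args>
auto bind_task(F f, Args... args) -> std::function<decltype(f(args...))()> {
    return [f, args...] { return f(args...); };
}

// Usage sketch: the int argument is fixed at the call site.
inline void bind_task_example() {
    std::function<bool()> is_three_odd = bind_task(odd, 3);
    assert(is_three_odd());
}
```

A vector of such tasks can be passed straight to an exec_and_collect that expects std::function<bool()> elements.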
UPDATE
A somewhat more cleaned up version that demonstrates what unneeded dependencies can be shed:
no temporary vectors of packaged tasks/threads
no assumption/requirement to have std::function<> wrapped tasks (this removes dynamic allocations and virtual dispatch internally in the implementation)
no requirement that the results must be in a vector (in fact, you can collect them anywhere you want using a custom output iterator)
move-awareness (this is arguably a "complicated" part of the code: since there is no std::move_transform, it goes the extra mile using std::make_move_iterator)
#include <thread>
#include <future>
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <utility>
#include <boost/range.hpp>
bool odd(int i) { return (((i&1)==1)?true:false); }
template <typename Range, typename OutIt>
void exec_and_collect(OutIt results, Range&& tasks) {
using namespace std;
using T = typename boost::range_value<Range>::type;
using R = decltype(declval<T>()());
auto tb = std::make_move_iterator(boost::begin(tasks)),
te = std::make_move_iterator(boost::end(tasks));
vector<future<R>> futures;
transform(
tb, te,
back_inserter(futures), [](auto&& t) {
std::packaged_task<R()> task(std::forward<decltype(t)>(t));
auto future = task.get_future();
thread(std::move(task)).detach();
return future;
});
// threads are now running, collect results
transform(begin(futures), end(futures), results, [](auto& fut) { return fut.get(); });
}
#include <boost/range/irange.hpp>
#include <boost/range/adaptors.hpp>
using namespace boost::adaptors;
int main() {
std::vector<bool> results;
exec_and_collect(
std::back_inserter(results),
boost::irange(0, 8) | transformed([](int i) { return [i] { return odd(i); }; })
);
std::copy(results.begin(), results.end(), std::ostream_iterator<bool>(std::cout << std::boolalpha, "; "));
}
Output
false; true; false; true; false; true; false; true;
Note that you could indeed write
exec_and_collect(
std::ostream_iterator<bool>(std::cout << std::boolalpha, "; "),
boost::irange(0, 8) | transformed([](int i) { return [i] { return odd(i); }; })
);
and do without any results container :)