I'm currently taking my first steps in writing multi-threaded code in C++ and I'm getting an error message that I don't understand at all.
Consider the two functions:
void add_vector(std::vector<int> & vec, int & store){
store = std::accumulate(vec.begin(), vec.end(), 0);
}
int parallel_matrix_sum(std::vector<std::vector<int>> const & mat){
std::vector<std::thread> threads;
std::vector<int> tmp(l);
for (int k =0; k<l; k++){
threads.push_back(std::thread(add_vector, std::cref(mat[k]), std::ref(tmp[k])));
}
for(auto && thread: threads){
thread.join();
}
return std::accumulate(tmp.begin(), tmp.end(), 0);
}
The error I get is
error: attempt to use a deleted function
.
.
.
in instantiation of function template specialization 'std::thread::thread<void (&)(std::vector<int> &, int &), std::reference_wrapper<const std::vector<int>>, std::reference_wrapper<int>, void>' requested here
threads.push_back(std::thread(add_vector, std::cref(mat[k]), std::ref(tmp[k])));
^
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/v1/type_traits:1916:5: note: '~__nat' has been explicitly marked deleted here
~__nat() = delete;
Upon googling this issue, all the information points to the fact that std::thread needs to copy its arguments, which cannot be done for references. The suggested solution seems to be to wrap the arguments in a std::ref call. I am already doing this and it is still not working. Does anybody have any suggestions?
I did some playing. I got this to work:
#include <iostream>
#include <thread>
#include <vector>
void add_vector(const int & a, const int & b, int & c) {
c = a + b;
}
int main() {
std::vector<std::thread> threads;
int a = 5;
int b = 10;
int c = 0;
std::cout << "start threads.\n";
threads.push_back(std::thread(add_vector, a, b, std::ref(c) ));
std::cout << "Join.\n";
for (std::thread &thread: threads) {
thread.join();
}
std::cout << "Done. c == " << c << "\n";
}
This is with g++ using --std=c++17.
I updated this with info from Innocent Bystander. Passing by reference can work if you wrap in std::ref() as shown.
I'm going to try a fresh example with vectors.
#include <iostream>
#include <thread>
#include <vector>
void add_vector(const std::vector<int> & vec, int &result) {
result = 0;
for (const int &value: vec) {
result += value;
}
}
int main() {
std::vector<std::thread> threads;
std::vector<int> values;
std::vector<int> results;
values.push_back(10);
values.push_back(15);
results.push_back(0);
std::cout << "start threads.\n";
threads.push_back(std::thread(add_vector, values, std::ref(results[0]) ));
std::cout << "Join.\n";
for (std::thread &thread: threads) {
thread.join();
}
std::cout << "Done. c == " << results[0] << "\n";
}
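The version above with vectors works. I didn't wrap the first argument, so the vector is copied into the thread; I did wrap the final one.
The end answer: take a look at the very first parameter of add_vector in your question. It isn't declared const, but std::cref(mat[k]) produces a std::reference_wrapper<const std::vector<int>>, which cannot bind to a non-const std::vector<int> &. Declaring the parameter as const std::vector<int> & fixes the error.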
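To make the fix concrete, here is a minimal sketch of the question's two functions with the first parameter made const. The variable l from the question is not defined in the snippet, so this sketch assumes it was meant to be the number of rows and uses mat.size() instead.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

// The row is taken by const reference so that std::cref(row) can bind to it.
void add_vector(const std::vector<int>& vec, int& store) {
    store = std::accumulate(vec.begin(), vec.end(), 0);
}

int parallel_matrix_sum(const std::vector<std::vector<int>>& mat) {
    // Assumption: one partial sum per row (the question's `l` is taken to be mat.size()).
    std::vector<int> tmp(mat.size());  // sized up front, so references to elements stay valid
    std::vector<std::thread> threads;
    threads.reserve(mat.size());
    for (std::size_t k = 0; k < mat.size(); ++k) {
        threads.emplace_back(add_vector, std::cref(mat[k]), std::ref(tmp[k]));
    }
    for (auto& thread : threads) {
        thread.join();
    }
    return std::accumulate(tmp.begin(), tmp.end(), 0);
}
```

Each thread writes to its own element of tmp, so no locking is needed for the partial sums.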
Is it possible for "Main end" to be displayed before all of the result.get() calls have returned in the code snippet below (under any scenario)?
Or will "Main end" always be the last line to appear?
#include <iostream>
#include <vector>
#include <future>
#include <chrono>
#include <thread> // std::this_thread::sleep_for
using namespace std::chrono;
std::vector<std::future<int>> doParallelProcessing()
{
std::vector<std::future<int>> v;
for (int i = 0; i < 10; i++)
{
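auto ret = std::async(std::launch::async, [i]() { // capture i by value: a reference to the loop variable could be read after it changes or goes out of scope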
std::this_thread::sleep_for(seconds(i + 5));
return 5;
});
v.push_back(std::move(ret));
}
return v;
}
int main() {
std::vector<std::future<int>> results;
results = doParallelProcessing();
for (std::future<int>& result : results)
{
result.get();
}
std::cout << "Main end\n";
return 0;
}
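For what it's worth, std::future::get() blocks until the associated result is ready, so the loop over the futures cannot finish before every task has returned. The sketch below illustrates this under some assumptions of mine: run_and_log is a hypothetical helper, the delays are shortened to milliseconds, and the tasks record their completion in a mutex-protected log instead of printing.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: returns the order in which the events were recorded.
std::vector<std::string> run_and_log(int task_count) {
    assert(task_count >= 0);
    std::vector<std::string> log;
    std::mutex log_mutex;
    std::vector<std::future<int>> results;
    for (int i = 0; i < task_count; ++i) {
        results.push_back(std::async(std::launch::async, [i, task_count, &log, &log_mutex] {
            // Later tasks sleep for less time, so they tend to finish first.
            std::this_thread::sleep_for(std::chrono::milliseconds(10 * (task_count - i)));
            std::lock_guard<std::mutex> lock(log_mutex);
            log.push_back("task " + std::to_string(i));
            return 5;
        }));
    }
    for (auto& result : results) {
        result.get();  // blocks until this particular task has finished
    }
    log.push_back("main end");  // every task has completed by now
    return log;
}
```

Whatever order the tasks finish in, the final entry of the returned log is "main end".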
The cppreference.com page for std::atomic_flag lists two spinlock examples, one for code prior to C++20 and one for C++20. The page was last modified on 21 July 2020, at 12:58.
Prior to C++20:
#include <thread>
#include <vector>
#include <iostream>
#include <atomic>
std::atomic_flag lock = ATOMIC_FLAG_INIT;
void f(int n)
{
for (int cnt = 0; cnt < 100; ++cnt) {
while (lock.test_and_set(std::memory_order_acquire)) // acquire lock
; // spin
std::cout << "Output from thread " << n << '\n';
lock.clear(std::memory_order_release); // release lock
}
}
int main()
{
std::vector<std::thread> v;
for (int n = 0; n < 10; ++n) {
v.emplace_back(f, n);
}
for (auto& t : v) {
t.join();
}
}
C++20:
#include <thread>
#include <vector>
#include <iostream>
#include <atomic>
std::atomic_flag lock = ATOMIC_FLAG_INIT;
void f(int n)
{
for (int cnt = 0; cnt < 100; ++cnt) {
while (lock.test_and_set(std::memory_order_acquire)) // acquire lock
while (lock.test(std::memory_order_relaxed)) // test lock
; // spin
std::cout << "Output from thread " << n << '\n';
lock.clear(std::memory_order_release); // release lock
}
}
int main()
{
std::vector<std::thread> v;
for (int n = 0; n < 10; ++n) {
v.emplace_back(f, n);
}
for (auto& t : v) {
t.join();
}
}
Two things bother me in the C++20 example:
(1) ATOMIC_FLAG_INIT is deprecated in C++20, and the default constructor should initialize the flag to false for us.
(2) The "optimization" of introducing while (lock.test(std::memory_order_relaxed)) after the flag has been set to true does not make sense to me. Shouldn't while (lock.test(std::memory_order_relaxed)) always return immediately in that case? Why is it an optimization over the pre-C++20 example, then?
Edit:
C++20 introduced test() for std::atomic_flag, which simply checks whether the flag is true by doing an atomic load. It is placed in the inner loop, which is entered only when test_and_set() has failed to acquire the lock, so that the thread first spins on test() before going back to test_and_set() a second time.
See the comment:
that edit came from stackoverflow.com/q/62318642/2945027 – Cubbi
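The pattern described in the edit is often called test-and-test-and-set. The usual rationale, as far as I understand it, is that spinning on a plain load causes less cache-line contention than repeatedly attempting a read-modify-write. Below is a minimal sketch of the same idea; note that it swaps std::atomic_flag for std::atomic<bool> so that it also compiles before C++20, and that SpinLock and count_with_spinlock are hypothetical names of mine.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

class SpinLock {
public:
    void lock() {
        // exchange() returns the previous value: true means another thread holds the lock.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // Inner loop: wait with read-only loads until the lock looks free,
            // then go back and try to take it again.
            while (locked_.load(std::memory_order_relaxed)) {
            }
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Hypothetical demo: several threads increment a plain counter under the lock.
long count_with_spinlock(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter;
}
```

The inner loop is only reached when the outer exchange found the lock already taken, so it does not return immediately; it waits until the holder releases the lock.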
I tried to use multiple threads to insert into a boost::bimap. Some variables are shared between the threads; I need to pass them by reference, and some of them are modified by each thread as it runs. However, I get the error:
Segmentation fault (core dumped)
My code is below. I have tried to avoid concurrent access to the variables by using std::lock_guard<std::mutex> lock(mtx), but I have not been able to make it work.
parallel_index.cpp
#include <iostream>
#include <string>
#include <algorithm>
#include <thread>
#include <mutex>
#include <boost/bimap.hpp>
#include <boost/bimap/unordered_set_of.hpp>
#include <boost/bimap/unordered_multiset_of.hpp>
#include "parallel_index.h"
namespace bimaps = boost::bimaps;
typedef boost::bimap<bimaps::unordered_set_of<uint64_t>,
bimaps::unordered_multiset_of<std::string> > bimap_reference;
typedef bimap_reference::value_type position;
bimap_reference reference_index_vector;
size_t total_threads = std::thread::hardware_concurrency();
std::string sequence_content = "ABCDDBACDDDCBBAAACBDAADCBDAAADCBDADADACBDDCBBBCDCBCDAADCBBCDAAADCBDA";
uint64_t sequence_length = sequence_content.length();
int split = 5;
uint64_t erase_length = 0;
unsigned int seq_itr = 0;
std::mutex mtx; // to protect against concurrent access
int main(){
thread_test::create_index index;
std::thread threads[total_threads];
std::cout << total_threads << " threads launched" << std::endl;
for(unsigned int i = 0; i < total_threads; i++){
threads[i] = std::thread(&thread_test::create_index::reference_index_hash, index,
std::ref(sequence_length), std::ref(split), std::ref(sequence_content), std::ref(erase_length));
}
for(unsigned int i = 0; i < total_threads; i++){
threads[i].join();
}
}
/*
* Creating index
*/
void thread_test::create_index::reference_index_hash(uint64_t &sequence_length, int &split,
std::string &sequence_content, uint64_t &erase_length ){
for (; seq_itr < sequence_length; ++seq_itr ){
std::lock_guard<std::mutex> lock(mtx);
std::string splitstr = sequence_content.substr(erase_length, split);
reference_index_vector.insert(position(seq_itr, splitstr));
seq_itr += split-1;
erase_length += split;
if(erase_length > 10000){
sequence_content.erase(0,erase_length);
erase_length = 0;
}
}
for( bimap_reference::const_iterator iter = reference_index_vector.begin(), iend = reference_index_vector.end();
iter != iend; ++iter ) {
std::cout << iter->left << " <--> "<< iter->right <<std::endl;
}
}
parallel_index.h
#ifndef PARALLEL_INDEX_H_
#define PARALLEL_INDEX_H_
#include<iostream>
#include <algorithm>
#include <utility>
#include <boost/bimap.hpp>
#include <boost/bimap/unordered_set_of.hpp>
#include <boost/bimap/unordered_multiset_of.hpp>
//typedef boost::unordered_map<int, std::pair<int, unsigned long int>& > reference_map;
namespace bimaps = boost::bimaps;
typedef boost::bimap<bimaps::unordered_set_of<uint64_t>,
bimaps::unordered_multiset_of<std::string > > bimap_reference;
typedef bimap_reference::value_type position;
extern bimap_reference reference_index_vector;
namespace thread_test{
class create_index{
public:
void reference_index_hash(uint64_t &sequence_length, int &split,
std::string &sequence_content, uint64_t &erase_length);
};
}
#endif /* PARALLEL_INDEX_H_ */
-------------------------------EDIT---------------------------------
I tried to divide the string content into as many partitions as there are threads, so that each thread has its own part available locally. But nothing seems to work. Sometimes it finishes the first thread and stops thereafter with a Segmentation fault (core dumped).
parallel_index.cpp
#include <iostream>
#include <string>
#include <algorithm>
#include <thread>
#include <mutex>
#include <boost/bimap.hpp>
#include <boost/bimap/unordered_set_of.hpp>
#include <boost/bimap/unordered_multiset_of.hpp>
#include "parallel_index.h"
namespace bimaps = boost::bimaps;
typedef boost::bimap<bimaps::unordered_set_of<uint64_t>,
bimaps::unordered_multiset_of<std::string> > bimap_reference;
typedef bimap_reference::value_type position;
bimap_reference reference_index_vector;
//create threads
size_t total_threads = std::thread::hardware_concurrency();
std::string sequence_content = "ABCDDBACDDDCBBAAACBDAADCBDAAADCBDADADACBDDCBBBCDCBCDAADCBBCDAAADCBDADDCCCAAABBBAAACDCA";
uint64_t sequence_length = sequence_content.length();
int split = 5;
// split the sequence_content equal to the number of threads, and assign each partition to each thread.
uint64_t each_partition_len = sequence_content.length()/total_threads- (sequence_content.length()/total_threads)%split ;
uint64_t last_partition_len = sequence_content.length()/total_threads +
(((sequence_content.length()/total_threads)%split)*(total_threads-1)) + sequence_content.length()%total_threads;
std::mutex mtx; // to protect against concurrent access
int main(){
thread_test::create_index index;
std::thread threads[total_threads];
std::cout << total_threads << " threads launched" << std::endl;
for(unsigned int i = 0; i < total_threads; i++){
if(i < total_threads-1)
threads[i] = std::thread(&thread_test::create_index::reference_index_hash, index,
std::ref(each_partition_len), std::ref(split), std::ref(sequence_content), i);
else
threads[i] = std::thread(&thread_test::create_index::reference_index_hash, index,
std::ref(last_partition_len), std::ref(split), std::ref(sequence_content), i);
//std::lock_guard<std::mutex> lck(mtx);
std::cout << "launched thread " << i << " with id " << threads[i].get_id() << std::endl;
}
for( bimap_reference::const_iterator iter = reference_index_vector.begin(), iend = reference_index_vector.end();
iter != iend; ++iter ) {
std::cout << iter->left << " <--> "<< iter->right <<std::endl;
}
for( unsigned int i = 0; i < total_threads; ++i){
if(threads[i].joinable()){
std::cout << "trying to join thread " << i << std:: endl;
threads[i].join();
std::cout << "thread joined " << i << std:: endl;
}
}
for( bimap_reference::const_iterator iter = reference_index_vector.begin(), iend = reference_index_vector.end();
iter != iend; ++iter ) {
std::cout << iter->left << " <--> "<< iter->right <<std::endl;
}
}
/*
* Creating index
*/
void thread_test::create_index::reference_index_hash(uint64_t &sequence_length, int &split,
std::string &sequence_content, int i ){
uint64_t seq_strt = 0;
// set seq_strt
if(i == 0)
seq_strt = sequence_length * i;
else
seq_strt = sequence_length * i + 1;
for (uint64_t seq_itr = seq_strt; seq_itr <= sequence_length; ++seq_itr ){
std::string splitstr = sequence_content.substr(seq_itr, split);
mtx.lock();
//std::lock_guard<std::mutex> lock(mtx);
reference_index_vector.insert(position(seq_itr, splitstr));
mtx.unlock();
seq_itr += split-1;
}
}
parallel_index.h
#ifndef PARALLEL_INDEX_H_
#define PARALLEL_INDEX_H_
#include<iostream>
#include <algorithm>
#include <utility>
#include <boost/bimap.hpp>
#include <boost/bimap/unordered_set_of.hpp>
#include <boost/bimap/unordered_multiset_of.hpp>
namespace bimaps = boost::bimaps;
typedef boost::bimap<bimaps::unordered_set_of<uint64_t>,
bimaps::unordered_multiset_of<std::string > > bimap_reference;
typedef bimap_reference::value_type position;
extern bimap_reference reference_index_vector;
namespace thread_test{
class create_index{
public:
void reference_index_hash(uint64_t &sequence_length, int &split,
std::string &sequence_content, int i);
};
}
#endif /* PARALLEL_INDEX_H_ */
I suspect the culprit for the segmentation fault is simply static linking of the libraries. It is not caused by seq_itr being incremented past the actual sequence length, because your for loop will never be entered if seq_itr is greater than the actual sequence length. Try removing the -static flag; that should get rid of the segmentation fault, although it does not guarantee the correctness of the rest of the code. More details about segmentation faults with threads can be found here
All the threads will try to acquire the lock for the critical section. To keep the bimap intact, you need a condition variable so that the threads execute in order. This is justified because you are using seq_itr as a local variable inside reference_index_hash() and it needs to be incremented in the proper sequence.
One problem in your original code is that unsigned int seq_itr is accessed without synchronization from multiple threads. Besides yielding invalid results it might lead to seq_itr being incremented to a value bigger than the actual sequence length, and the following accesses might lead to a crash.
The new code addresses this by just passing indexes, which should be OK as long as those indexes are non-overlapping and correctly calculated. I can't follow the logic completely, but in case your seq_strt calculation is off the program might also crash due to an invalid index. Should be easy to verify in a debugger or with some index assertions.
However there is an issue in the second code example with printing the map directly after threads are started with
for( bimap_reference::const_iterator iter = reference_index_vector.begin(), iend = reference_index_vector.end();
iter != iend; ++iter ) {
std::cout << iter->left << " <--> "<< iter->right <<std::endl;
}
This will not yield correct results, since the map is concurrently accessed by all worker threads. Access after the join()s is safe.
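As a rough sketch of the non-overlapping index idea, the following gives each worker its own range of chunks and reads the result only after every join(). To keep it self-contained, it uses a std::unordered_map as a stand-in for the boost::bimap; build_index and its parameters are hypothetical names, not taken from the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: maps each chunk's start position to the chunk text.
std::unordered_map<std::size_t, std::string>
build_index(const std::string& content, std::size_t split, std::size_t num_threads) {
    assert(split > 0 && num_threads > 0);
    std::unordered_map<std::size_t, std::string> index;  // stand-in for the bimap
    std::mutex mtx;  // protects index

    // Partition by whole chunks so that no two threads share a position.
    const std::size_t total_chunks = (content.size() + split - 1) / split;
    const std::size_t chunks_per_thread = (total_chunks + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t first = t * chunks_per_thread;
        const std::size_t last = std::min(total_chunks, first + chunks_per_thread);
        workers.emplace_back([&, first, last] {
            for (std::size_t c = first; c < last; ++c) {
                const std::size_t pos = c * split;  // always less than content.size()
                std::string piece = content.substr(pos, split);
                std::lock_guard<std::mutex> lock(mtx);
                index.emplace(pos, std::move(piece));
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return index;  // safe to read: all workers have finished
}
```

The loop counter is local to each thread, so only the insert itself needs the mutex, and the string is never modified while the workers run.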
I'm getting my feet wet with Intel TBB and am trying to figure out why I cannot populate a vector passed in by reference to a TBB Task when I also pass in a function by reference.
Here is the code:
// tbbTesting.cpp : Defines the entry point for the console application.
#include "stdafx.h"
#include "tbb/task.h"
#include <functional>
#include <iostream>
#include <random>
#define NUM_POINTS 10
void myFunc(std::vector<double>& numbers)
{
std::mt19937_64 gen;
std::uniform_real_distribution<double> dis(0.0, 1000.0);
for (size_t i = 0; i < NUM_POINTS; i++)
{
auto val = dis(gen);
std::cout << val << std::endl; //proper values generated
numbers.push_back(val); //why is this failing?
}
std::cout << std::endl;
for (auto i : numbers)
{
std::cout << numbers[i] << std::endl; //garbage values
}
}
class TASK_generateRandomNumbers : public tbb::task
{
public:
TASK_generateRandomNumbers(std::function<void(std::vector<double>&)>& fnc,
std::vector<double>& nums) : _fnc(fnc), _numbers(nums) {}
~TASK_generateRandomNumbers() {};
tbb::task* execute()
{
_fnc(_numbers);
return nullptr;
}
private:
std::function<void(std::vector<double>&)>& _fnc;
std::vector<double>& _numbers;
};
class Manager
{
public:
Manager() { _numbers.reserve(NUM_POINTS); }
~Manager() {}
void GenerateNumbers()
{
_fnc = std::bind(&myFunc, _numbers);
TASK_generateRandomNumbers* t = new(tbb::task::allocate_root())
TASK_generateRandomNumbers(_fnc, _numbers);
tbb::task::spawn_root_and_wait(*t);
}
auto GetNumbers() const { return _numbers; }
private:
std::function<void(std::vector<double>&)> _fnc;
std::vector<double> _numbers;
};
int main()
{
Manager mgr;
mgr.GenerateNumbers();
auto numbers = mgr.GetNumbers(); //returns empty
}
When the execute method performs the operation, I can get values when passing the vector by reference.
When the execute method has to call a function, I get garbage data printed to the console (push_back failing?) and I get an empty container on return.
Can anyone see what I'm missing? Thanks.
I have found a couple of bugs that have nothing to do with tbb.
1) Your myFunc uses the range-based for loop incorrectly. The loop variable is not an index; it is each value in the vector in turn. Your code converts each double to an integer and uses that as an index into the vector, which is why you are getting garbage.
2) When you use std::bind to create a functor, the arguments are copied by value. If you want to pass in a reference then you need to use std::ref to wrap the argument.
If you are using C++11 then you might want to consider using a lambda rather than bind.
I've written a small program using your myFunc in different ways: with and without std::ref, and also a lambda example. You should see that it generates the same numbers 3 times, but when it tries to print out v1 it won't contain anything because the generated values were placed in a copy.
#include <vector>
#include <random>
#include <iostream>
#include <functional>
constexpr size_t NUM_POINTS = 10;
void myFunc(std::vector<double>& numbers)
{
std::mt19937_64 gen;
std::uniform_real_distribution<double> dis(0.0, 1000.0);
for (size_t i = 0; i < NUM_POINTS; i++)
{
auto val = dis(gen);
std::cout << val << std::endl; //proper values generated
numbers.push_back(val); //why is this failing? it's not
}
std::cout << std::endl;
}
void printNumbers(std::vector<double>const& numbers)
{
for (auto number : numbers)
{
std::cout << number << std::endl;
}
std::cout << std::endl;
}
int main()
{
std::cout << "generating v1" << std::endl;
std::vector<double> v1;
auto f1 = std::bind(&myFunc, v1);
f1();
printNumbers(v1);
std::cout << "generating v2" << std::endl;
std::vector<double> v2;
auto f2= std::bind(&myFunc, std::ref(v2));
f2();
printNumbers(v2);
std::cout << "generating v3" << std::endl;
std::vector<double> v3;
auto f3 = [&v3]() { myFunc(v3); }; //using a lambda
f3();
printNumbers(v3);
return 0;
}
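One more detail that may explain the empty container in the question: a std::bind result silently ignores call-time arguments that no placeholder refers to, so _fnc(_numbers) in the task still operates on the copy stored at bind time. A minimal sketch of the difference follows; Filler, append_three, bind_by_copy and bind_with_placeholder are hypothetical names used only for illustration.

```cpp
#include <functional>
#include <vector>

using Filler = std::function<void(std::vector<double>&)>;

void append_three(std::vector<double>& numbers) {
    numbers.push_back(1.0);
    numbers.push_back(2.0);
    numbers.push_back(3.0);
}

// Stores a copy of target at bind time; the vector passed when the
// resulting Filler is called is ignored.
Filler bind_by_copy(std::vector<double>& target) {
    return std::bind(&append_three, target);
}

// The placeholder forwards the call-time argument, so the caller's
// vector is the one that gets modified.
Filler bind_with_placeholder() {
    return std::bind(&append_three, std::placeholders::_1);
}
```

With the first helper, calling the returned function on a vector leaves that vector untouched; with the second, the vector receives the three values. Assigning myFunc directly to the std::function, or using a lambda, would behave like the second helper.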