I wrote a simple C++ program to find the minimum value of a vector (see below). It compiles with both VC++ and g++, but the g++ build hits a segmentation fault. I cannot tell whether my code contains undefined behavior or g++ has a bug. Can someone identify any mistake in my code?
The segfault arises at thread::join().
Some debugging info:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) where
#0 0x0000000000000000 in ?? ()
#1 0x00000000004688f7 in std::thread::join() ()
#2 0x0000000000000000 in ?? ()
(gdb) thread
[Current thread is 1 (Thread 0x7c6880 (LWP 24015))]
Here is the code:
#include <iostream>
#include <random>
#include <thread>
#include <vector>
#include <algorithm>
#include <functional> // bind, ref
#include <limits> // numeric_limits
using namespace std;
void find_min(vector<double>& x, double& min_val, int& min_id)
{
min_id = distance(x.begin(), min_element(x.begin(), x.end()));
min_val = x[min_id];
}
void find_part_min(vector<double>& x, vector<int>& min_ids, vector<double>& min_vals, int id)
{
int start_id = (x.size()*id) / min_vals.size();
int end_id = (x.size()*(id + 1)) / min_vals.size();
for (int i = start_id; i < end_id; ++i)
{
if (x[i] < min_vals[id])
{
min_ids[id] = i;
min_vals[id] = x[i];
}
}
}
int main()
{
// define variables
int Nthreads = 16;
vector<double> x(256 * 256);
int min_id = 0;
double min_val = 0;
// fill up vector with random content
mt19937 gen(0);
uniform_real_distribution<> dis(0, 1);
generate(x.begin(), x.end(), bind(dis,gen));
// find min serial
find_min(x, min_val, min_id);
cout << min_id << "\t" << min_val << endl;
// initialize variables for parallel computing
vector<double> min_vals(Nthreads, numeric_limits<double>::infinity());
vector<int> min_ids(Nthreads, -1);
vector<thread> myThreads;
for (int id = 0; id < Nthreads; ++id) // define each thread
{
thread myThread(find_part_min, ref(x), ref(min_ids), ref(min_vals), id);
myThreads.push_back(move(myThread));
}
for (int id = 0; id < Nthreads; ++id)
myThreads[id].join(); // part-calculations are finished
// merging the results together
min_val = numeric_limits<double>::infinity();
min_id = -1;
for (int i = 0; i < Nthreads; ++i)
{
if (min_vals[i] < min_val)
{
min_val = min_vals[i];
min_id = min_ids[i];
}
}
cout << min_id << "\t" << min_val << endl;
return 0;
}
I looked into the Makefile: -static was used without -whole-archive, which causes problems under g++ according to https://gcc.gnu.org/ml/gcc-help/2010-05/msg00029.html
It says that if libstdc++ is configured without __thread support and -lpthread is linked statically, this can happen due to a libstdc++ bug.
You should pass -pthread to every compilation phase with GCC (g++), compiling as well as linking, rather than just linking against -lpthread.
That flag does more than link the library: it also defines the macros needed for using POSIX threads.
Related
I have some issues with the C++ thread library. I adapted my old optimization code to a new problem, and now an uncaught exception is thrown at runtime. Some threads are joined successfully, but there is always at least one that throws. The only thing I changed in the code was the data representation: the data used to be held in 2D arrays, and now I am using 1D vectors. Here is the exact runtime error I receive:
libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: thread::join failed: Invalid argument
Below is a minimal reproducible example that throws exactly the same exception as the complete code.
Here is the main function:
#include <iostream>
#include <thread>
#include <vector>
void reforming(const int, const std::vector<int>&, const std::vector<double>&, const std::vector<double>&);
int main() {
int i, k;
int spec_max = 10;
int seg_max = 4;
size_t pop_size = spec_max * seg_max;
std::vector<int> seg(pop_size);
std::vector<double> por(pop_size);
std::vector<double> por_s(pop_size);
for(i = 0; i < pop_size; i++){
if(i % 2 == 0){
seg[i] = 1;
por[i] = 0.5;
por_s[i] = 0.0015;
} else {
seg[i] = 0;
por[i] = 0.7;
por_s[i] = 0.002;
}
}
std::vector<std::thread> Ref(spec_max);
for(k = 0; k < spec_max; k++){
Ref.emplace_back(reforming, k, seg, por, por_s);
}
for(auto &X : Ref){
X.join();
}
return 0;
}
and the "reforming" function:
#include <iostream>
#include <vector>
void reforming(const int m, const std::vector<int>& cat_check, const std::vector<double>& por,
const std::vector<double>& por_s){
std::cout << m << " Hello from the thread\n";
}
I am using the CLion software on MacOS Catalina and currently have no other OS available to test the code.
The following code:
std::vector<std::thread> Ref(spec_max);
for(k = 0; k < spec_max; k++){
Ref.emplace_back(reforming, k, seg, por, por_s);
}
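creates a vector of 2 * spec_max std::thread objects: the constructor call inserts spec_max default-constructed elements, and emplace_back then appends spec_max more. A default-constructed std::thread does not represent a thread of execution, so it is not joinable, and calling join() on it throws std::system_error.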
A fix:
std::vector<std::thread> Ref;
Ref.reserve(spec_max);
for(k = 0; k < spec_max; k++){
Ref.emplace_back(reforming, k, seg, por, por_s);
}
I passed a structure to pthread_create. One member of the structure is a vector named data, and each thread calls push_back on its data in a loop. When the loop count is small, the code runs correctly. When the loop count is large, I get the following error message:
munmap_chunk(): invalid pointer
munmap_chunk(): invalid pointer
Aborted (core dumped)
With m<100 it works; with m<1000 it shows the error.
// compile using: g++ parallel_2.C -o oo -lpthread
#include <iostream>
#include <cstdlib>
#include <vector>
#include <thread>
using namespace std;
const unsigned NUM_THREADS = std::thread::hardware_concurrency();
//
struct INPUT
{
int start;
int end;
vector<int> data;
};
//
void *Loop(void *param)
{
INPUT *in = (INPUT*)param;
int start = in->start;
int end = in->end;
cout<<" start: "<<start<<" end: "<<end<<endl;
//for(int m=0; m<100000000; m++)
for(int i = start;i < end;i++)
for(int m=0; m<1000; m++) {
in->data.push_back(i);
}
//pthread_exit(NULL);
}
//
int main ()
{
pthread_t threads[NUM_THREADS];
INPUT input[NUM_THREADS];
for( int i=0; i < NUM_THREADS; i++ ){
cout << "main() : creating thread, " << i << endl;
input[i].start = i*5;
input[i].end = input[i].start + 5;
int rc = pthread_create(&threads[i], NULL,
Loop, (void *)&input[i]);
if (rc){
cout << "Error:unable to create thread," << rc << endl;
exit(-1);
}
}
for(int i = 0; i<NUM_THREADS; i++)
cout<<"!! size of "<<i<<": "<<input[0].data.size()<<endl;
pthread_exit(NULL);
}
In this specific example, main() assumes that the threads are done and reads the structures they modify. You have to join a thread (pthread_join()) before accessing the structure it is modifying.
for(int i = 0; i<NUM_THREADS; i++)
{
pthread_join(threads[i], NULL);
cout<<"!! size of "<<i<<": "<<input[i].data.size()<<endl; // input[i], not input[0]
}
This way, you are certain it is done, and not modifying the structure any more.
The problem did not show up with very few iterations because the threads had probably (but nothing is certain) ended their task before your last loop in main() tried to access their structures.
By the way, you should consider using std::thread.
(https://en.cppreference.com/w/cpp/thread/thread/thread)
I have an integer variable that contains the number of threads to execute. Let's call it myThreadVar. I want to run myThreadVar threads, and cannot think of any way to do it without a ton of if statements. Is there any way I can create myThreadVar threads, no matter what myThreadVar is?
I was thinking:
for (int i = 0; i < myThreadVar; ++i) { std::thread t_i(myFunc); }, but that obviously won't work.
Thanks in advance!
Make an array or vector of threads, put the threads in, and then if you want to wait for them to finish have a second loop go over your collection and join them all:
std::vector<std::thread> myThreads;
myThreads.reserve(myThreadVar);
for (int i = 0; i < myThreadVar; ++i)
{
    myThreads.push_back(std::thread(myFunc));
}
// second loop: wait for all of them to finish
for (auto& t : myThreads)
{
    t.join();
}
While other answers use vector::push_back(), I prefer vector::emplace_back(), which is possibly more efficient. Also use vector::reserve().
#include <thread>
#include <vector>
void func() {}
int main() {
int num = 3;
std::vector<std::thread> vec;
vec.reserve(num);
for (auto i = 0; i < num; ++i) {
vec.emplace_back(func);
}
for (auto& t : vec) t.join();
}
The best solution is not to wait for each thread to finish before starting the next one; you want all of them running in parallel.
In that case you can store all the thread instances in a vector and join them afterwards.
Take a look at my example.
#include <thread>
#include <vector>
void myFunc() {
/* Some code */
}
int main()
{
int myThreadVar = 50;
std::vector<std::thread> threadsToJoin;
threadsToJoin.resize(myThreadVar);
for (int i = 0; i < myThreadVar; ++i) {
threadsToJoin[i] = std::thread(myFunc);
}
for (int i = 0; i < threadsToJoin.size(); i++) {
threadsToJoin[i].join();
}
}
#include <iostream>
#include <thread>
void myFunc(int n) {
std::cout << "myFunc " << n << std::endl;
}
int main(int argc, char *argv[]) {
int myThreadVar = 5;
for (int i = 0; i < myThreadVar; ++i) {
std::cout << "Launching " << i << std::endl;
std::thread t_i(myFunc,i);
t_i.detach();
}
}
g++ -std=c++11 -o 35106568 35106568.cpp
./35106568
Launching 0
myFunc 0
Launching 1
myFunc 1
Launching 2
myFunc 2
Launching 3
myFunc 3
Launching 4
myFunc 4
You need to store the threads so that you can join them later.
std::thread t[myThreadVar];
for (int i = 0; i < myThreadVar; ++i) { t[i] = std::thread(myFunc); } // Start all threads
for (int i = 0; i < myThreadVar; ++i) { t[i].join(); } // Wait for all threads to finish
I think this is valid syntax, but I'm more used to C, so I am unsure whether I initialized the array correctly. Note that an array whose size is a runtime value is a compiler extension in C++; std::vector is the portable choice.
I tried to write this code
float* theArray; // the array to find the minimum value
int index, i;
float thisValue, min;
index = 0;
min = theArray[0];
#pragma omp parallel for reduction(min:min_dist)
for (i=1; i<size; i++) {
thisValue = theArray[i];
if (thisValue < min)
{ /* find the min and its array index */
min = thisValue;
index = i;
}
}
return(index);
However, this does not output correct answers. The min seems to be OK, but the index gets corrupted by the threads.
I also tried some approaches found on the Internet and here (using parallel for on the outer loop and critical for the final comparison), but this causes a slowdown rather than a speedup.
What should I do to make both the min value and its index correct? Thanks!
I don't know of an elegant way to do a minimum reduction and save an index. I do this by finding the local minimum and index for each thread and then the global minimum and index in a critical section.
index = 0;
min = theArray[0];
#pragma omp parallel
{
int index_local = index;
float min_local = min;
#pragma omp for nowait
for (i = 1; i < size; i++) {
if (theArray[i] < min_local) {
min_local = theArray[i];
index_local = i;
}
}
#pragma omp critical
{
if (min_local < min) {
min = min_local;
index = index_local;
}
}
}
With OpenMP 4.0 it's possible to use user-defined reductions. A user-defined minimum reduction can be defined like this
struct Compare { float val; size_t index; };
#pragma omp declare reduction(minimum : struct Compare : omp_out = omp_in.val < omp_out.val ? omp_in : omp_out)
Then the reduction can be done like this
struct Compare min;
min.val = theArray[0];
min.index = 0;
#pragma omp parallel for reduction(minimum:min)
for(int i = 1; i<size; i++) {
if(theArray[i]<min.val) {
min.val = theArray[i];
min.index = i;
}
}
That works for C and C++. User-defined reductions have other advantages besides simplified code. There are multiple algorithms for doing reductions. For example, the merging can be done in O(number of threads) or O(log(number of threads)). The first solution I gave does this in O(number of threads); using a user-defined reduction, however, lets OpenMP choose the algorithm.
Basic Idea
This can be accomplished without any parallelization-breaking critical or atomic sections by creating a custom reduction. Basically, define an object that stores both the index and the value, and then create a function that compares two of these objects by the value only, not the index.
Details
An object to store an index and value together:
typedef std::pair<unsigned int, float> IndexValuePair;
You can access the index by accessing the first property and the value by accessing the second property, i.e.,
IndexValuePair obj(0, 2.345);
unsigned int ix = obj.first; // 0
float val = obj.second; // 2.345
Define a function that returns the smaller of two IndexValuePair objects:
IndexValuePair myMin(IndexValuePair a, IndexValuePair b){
return a.second < b.second ? a : b;
}
Then, construct a custom reduction following the guidelines in the OpenMP documentation:
#pragma omp declare reduction \
(minPair:IndexValuePair:omp_out=myMin(omp_out, omp_in)) \
initializer(omp_priv = IndexValuePair(0, 1000))
In this case, I've chosen to initialize the index to 0 and the value to 1000. The value should be initialized to some number larger than the largest value you expect to sort.
Functional Example
Finally, combine all these pieces with the parallel for loop!
// Compile with g++ -std=c++11 -fopenmp demo.cpp
#include <cstdlib> // EXIT_SUCCESS
#include <iostream>
#include <utility>
#include <vector>
typedef std::pair<unsigned int, float> IndexValuePair;
IndexValuePair myMin(IndexValuePair a, IndexValuePair b){
return a.second < b.second ? a : b;
}
int main(){
std::vector<float> vals {10, 4, 6, 2, 8, 0, -1, 2, 3, 4, 4, 8};
unsigned int i;
IndexValuePair minValueIndex(0, 1000);
#pragma omp declare reduction \
(minPair:IndexValuePair:omp_out=myMin(omp_out, omp_in)) \
initializer(omp_priv = IndexValuePair(0, 1000))
#pragma omp parallel for reduction(minPair:minValueIndex)
for(i = 0; i < vals.size(); i++){
if(vals[i] < minValueIndex.second){
minValueIndex.first = i;
minValueIndex.second = vals[i];
}
}
std::cout << "minimum value = " << minValueIndex.second << std::endl; // Should be -1
std::cout << "index = " << minValueIndex.first << std::endl; // Should be 6
return EXIT_SUCCESS;
}
Because you're not only trying to find the minimal value (reduction(min:___)) but also to retain its index, the check has to be protected, for example with critical. This can significantly slow down the loop (as reported). In general, make sure that there is enough work so that the threading overhead does not dominate. An alternative is to have each thread find its own minimum and index, save them to a per-thread variable, and have the master thread do a final check on those, as in the following program.
#include <iostream>
#include <vector>
#include <ctime>
#include <limits> // numeric_limits
#include <random>
#include <omp.h>
using std::cout;
using std::vector;
void initializeVector(vector<double>& v)
{
std::mt19937 generator(time(NULL));
std::uniform_real_distribution<double> dis(0.0, 1.0);
v.resize(100000000);
for(int i = 0; i < v.size(); i++)
{
v[i] = dis(generator);
}
}
int main()
{
vector<double> vec;
initializeVector(vec);
float minVal = vec[0];
int minInd = 0;
int startTime = clock();
for(int i = 1; i < vec.size(); i++)
{
if(vec[i] < minVal)
{
minVal = vec[i];
minInd = i;
}
}
int elapsedTime1 = clock() - startTime;
// Change the number of threads accordingly
vector<float> threadRes(4, std::numeric_limits<float>::max());
vector<int> threadInd(4);
startTime = clock();
#pragma omp parallel for
for(int i = 0; i < vec.size(); i++)
{
{
if(vec[i] < threadRes[omp_get_thread_num()])
{
threadRes[omp_get_thread_num()] = vec[i];
threadInd[omp_get_thread_num()] = i;
}
}
}
float minVal2 = threadRes[0];
int minInd2 = threadInd[0];
for(int i = 1; i < threadRes.size(); i++)
{
if(threadRes[i] < minVal2)
{
minVal2 = threadRes[i];
minInd2 = threadInd[i];
}
}
int elapsedTime2 = clock() - startTime;
cout << "Min " << minVal << " at " << minInd << " took " << elapsedTime1 << std::endl;
cout << "Min " << minVal2 << " at " << minInd2 << " took " << elapsedTime2 << std::endl;
}
Please note that with optimizations on and nothing else to be done in the loop, the serial version seems to remain king. With optimizations turned off, OMP gains the upper hand.
P.S. you wrote reduction(min:min_dist) and then proceeded to use min instead of min_dist.
Actually, we can use the omp critical directive so that only one thread at a time runs the code inside the critical region. That way the index value won't be overwritten by other threads while it is being updated.
About the omp critical directive:
The omp critical directive identifies a section of code that must be executed by a single thread at a time.
This code solves your issue:
#include <stdio.h>
#include <omp.h>
int main() {
    int arr[10] = {11, 42, 53, 64, 55, 46, 47, 68, 59, 510};
    int size = 10;
    int index = 0;
    float min = arr[0];
    #pragma omp parallel for
    for (int i = 1; i < size; i++) {
        /* declared inside the loop, so each thread has its own copy */
        float thisValue = arr[i];
        #pragma omp critical
        {
            if (thisValue < min)
            { /* find the min and its array index */
                min = thisValue;
                index = i;
            }
        }
    }
    /* min is a float, so print it with %f rather than %d */
    printf("min:%f index:%d\n", min, index);
    return 0;
}
I'm writing a program that uses the GSL's random number generator, and I am getting a segmentation fault when I try to pass an instance of the random number generator to a function. Here is my source code:
int main(void)
{
gsl_rng *r;
int deck[52];
int count = 0;
r = gsl_rng_alloc(gsl_rng_mt19937);
gsl_rng_set(r, time(NULL));
// Initialize a custom deck
// code omitted...
// Perform trials
for (int j = 0; j < NUMTRIALS; j++) {
shuffle_two(r, deck);
if (deck[NUMCARDS-1] + deck[NUMCARDS-2] == 11)
count++;
}
// Report result
cout << fixed << setprecision(6) << count/static_cast<double>(NUMTRIALS);
cout << endl;
gsl_rng_free(r);
}
void shuffle_two(gsl_rng* r, int deck[])
{
double u;
int bottom, random;
int temp_card;
for (int i = 0; i < 2; i++) {
u = gsl_rng_uniform(r);
//code for shuffling goes here
}
}
Evidently the value of r is changing while the algorithm is running. When I do a backtrace, I get r as sometimes null, sometimes 0xa. I'm not sure why. I think it might have something to do with the const pointer argument to the gsl_rng_uniform function, as documented here.
Here is the output of the debugger:
Program received signal SIGSEGV, Segmentation fault.
gsl_rng_uniform (r=0x0) at ../gsl/gsl_rng.h:167
167 ../gsl/gsl_rng.h: No such file or directory.
in ../gsl/gsl_rng.h
(gdb) backtrace
#0 gsl_rng_uniform (r=0x0) at ../gsl/gsl_rng.h:167
#1 0x0000000000400d97 in shuffle_two (r=0x0, deck=0x7fffffffdfd0)
at blackjack.cpp:55
#2 0x0000000000400cad in main () at blackjack.cpp:33
(gdb)