Segmentation fault during multithreaded quicksort in C++

#include <iostream>
#include <algorithm>
#include <future>
#include <iterator>
using namespace std;
void qsort(int *beg, int *end)
{
if (end - beg <= 1)
return;
int lhs = *beg;
int *mid = partition(beg + 1, end,
[&](int arg)
{
return arg < lhs;
}
);
swap(*beg, *(mid - 1));
qsort(beg, mid);
qsort(mid, end);
}
std::future<void> qsortMulti(int *beg, int *end) // SEG FAULT
{
if (end - beg <= 1)
return future<void>();
int lhs = *beg;
int *mid = partition(beg + 1, end,
[&](int arg)
{
return arg < lhs;
}
);
swap(*beg, *(mid - 1));
//spawn new thread for one side of the recursion
auto future = async(launch::async, qsortMulti, beg, mid);
//other side of the recursion is done in the current thread
qsortMulti(mid, end);
future.wait();
inplace_merge(beg, mid, end);
}
void printArray(int *arr, size_t sz)
{
for (size_t i = 0; i != sz; i++)
cout << arr[i] << ' ';
cout << endl;
}
int main()
{
int ia[] = {5,3,6,8,4,6,2,5,2,9,7,8,4,2,6,8};
int ia2[] = {5,3,6,8,4,6,2,5,2,9,7,8,4,2,6,8};
size_t iaSize = 16;
size_t ia2Size = 16;
qsort(ia, ia + iaSize);
printArray(ia, iaSize);
qsortMulti(ia2, ia2 + ia2Size);
printArray(ia2, ia2Size);
}
As the code above shows, I am simply trying to implement the same qsort function with multiple threads. Other questions and answers on Stack Overflow about related issues led me to this version of the code, which leaves me with one simple question:
What is causing the multithreaded version to segfault?
To be clear: I don't need anyone to build a solution for me. I'd much rather get a pointer to where the segmentation fault comes from, as I don't see it. Thanks in advance!

In order to make std::async return an object of type std::future<T>, the function you pass to it merely has to return T. Example:
int compute() { return 42; }
std::future<int> result = std::async(&compute);
In your case that means that qsortMulti is supposed to have the signature
void qsortMulti(int* beg, int* end);
and nothing has to be returned from it. In the code you provided, qsortMulti itself returns std::future<void>, so std::async returns an object of type std::future<std::future<void>>, which is probably not what you intended. Furthermore, your function only returns something when the range holds at most one element (the if at the top). On every other code path (e.g. when control reaches the end of the function) it returns nothing at all. Flowing off the end of a non-void function is undefined behavior: the caller ends up accessing an uninitialized object, which may well be the reason for the segfault.
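A minimal sketch of what the void version could look like, under a few assumptions of my own: the depth parameter is hypothetical (it only caps how many recursion levels spawn a thread), the recursion skips the pivot once it is in place, and the inplace_merge call is dropped because the two halves of a partitioned range are already in order relative to each other.

```cpp
#include <algorithm>
#include <future>
#include <utility>

// Sorts [beg, end). 'depth' is a hypothetical knob, not part of the
// original code: only the first 'depth' recursion levels spawn a thread.
void qsortMulti(int* beg, int* end, int depth = 3)
{
    if (end - beg <= 1)
        return;

    const int pivot = *beg;
    // Elements smaller than the pivot go to the front of [beg + 1, end).
    int* mid = std::partition(beg + 1, end,
                              [pivot](int value) { return value < pivot; });
    // Move the pivot to its final position, just before 'mid'.
    std::swap(*beg, *(mid - 1));

    if (depth > 0) {
        // One side runs in a new thread, the other in the current thread.
        auto left = std::async(std::launch::async,
                               qsortMulti, beg, mid - 1, depth - 1);
        qsortMulti(mid, end, depth - 1);
        left.get(); // wait for the other half; rethrows if it failed
    } else {
        qsortMulti(beg, mid - 1, 0);
        qsortMulti(mid, end, 0);
    }
}
```

Called as qsortMulti(ia2, ia2 + ia2Size), this returns nothing, so std::async hands back a plain std::future<void>.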

Related

Vector processing issues in multi threading

I'm implementing data processing with multiple threads.
I want to process the data in class DataProcess and merge the results in class DataStorage.
My problem is that an exception sometimes occurs when data is added to the vector.
My guess is that the class instances end up at different addresses.
Is it a problem to create a new data-handling class for each thread and process the data separately?
Here is my code.
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <thread>
#include <vector>
#include <mutex>
using namespace::std;
static std::mutex m;
class DataStorage
{
private :
std::vector<long long> vecData;
public:
DataStorage()
{
}
~DataStorage()
{
}
void SetDataVectorSize(int size)
{
vecData.clear();
vecData.resize(size);
}
void DataInsertLoop(void* Data, int start, int end)
{
m.lock();
std::vector<long long> const * _v1 = static_cast<std::vector<long long> const *>(Data);
long long num = 0;
for (int idx = start; idx < _v1->size(); ++idx)
{
vecData[idx] = _v1->at(idx);
}
m.unlock();
}
};
class DataProcess
{
private:
int m_index;
long long m_startIndex;
long long m_endIndex;
int m_coreNum;
long long num;
DataStorage* m_mainStorage;
std::vector<long long> m_vecData;
public :
DataProcess(int pindex, long long startindex, long long endindex)
: m_index(pindex), m_startIndex(startindex), m_endIndex(endindex),
m_coreNum(0),m_mainStorage(NULL), num(0)
{
m_vecData.clear();
}
~DataProcess()
{
}
void SetMainAdrr(DataStorage* const mainstorage)
{
m_mainStorage = mainstorage;
}
void SetCoreInCPU(int num)
{
m_coreNum = num;
}
void DataRun()
{
for (long long idx = m_startIndex; idx < m_endIndex; ++idx)
{
num += rand();
m_vecData.push_back(num); //<- exception error position
}
m_mainStorage->DataInsertLoop(&m_vecData, m_startIndex, m_endIndex);
}
};
int main()
{
//auto beginTime = std::chrono::high_resolution_clock::now();
clock_t beginTime, endTime;
DataStorage* main = new DataStorage();
beginTime = clock();
long long totalcount = 200000000;
long long halfdata = totalcount / 2;
std::thread t1,t2;
for (int t = 0; t < 2; ++t)
{
DataProcess* clsDP = new DataProcess(1, 0, halfdata);
clsDP->SetCoreInCPU(2);
clsDP->SetMainAdrr(main);
if (t == 0)
{
t1 = std::thread([&]() {clsDP->DataRun(); });
}
else
{
t2 = std::thread([&]() {clsDP->DataRun(); });
}
}
t1.join(); t2.join();
endTime = clock();
double resultTime = (double)(endTime - beginTime);
std::cout << "Multi Thread " << resultTime / 1000 << " sec" << std::endl;
printf("--------------------\n");
int value = getchar();
}
Interestingly, if none of your threads accesses portions of vecData accessed by another thread, DataStorage::DataInsertLoop should not need to be synchronized at all. That should make processing much faster. That is, after all bugs are fixed... This also means you should not need a mutex at all.
There are other issues with your code... The most easily spotted is a memory leak.
In main:
DataStorage* main = new DataStorage(); // you call new, but never call delete...
// that's a memory leak. Avoid calling
// new directly.
//
// Also: 'main' is kind of a reserved
// name, don't use it except for the
// program entry point.
// How about this, instead ?
DataStorage dataSrc; // DataSrc has a very small footprint (a few pointers).
// ...
std::thread t1,t2; // why not use an array ?
// as in:
std::vector<std::thread> thrds;
// ...
// You forgot to set the size of your data set before starting, by calling:
dataSrc.SetDataVectorSize(200000000);
for (int t = 0; t < 2; ++t)
{
// ...
// Calling new again, and not delete... Use a smart pointer type instead of:
// DataProcess* clsDP = new DataProcess(1, 0, halfdata);
// Also, fix the start and end indices (NOTE: the code below works for t < 2, but
// probably not for t < 3)
auto clsDP = std::make_unique<DataProcess>(t, t * halfdata, (t + 1) * halfdata);
// You need to keep these objects alive while the threads run,
// either by storing them in an array, or by passing them to
// the threads. As in, for example:
thrds.emplace_back([dp = std::move(clsDP)]() { dp->DataRun(); });
}
//...
std::for_each(thrds.begin(), thrds.end(), [](auto& t) { t.join(); });
//...
More...
You create a mutex on your very first line of executable code. That's good... somewhat...
static std::mutex m; // a one letter name is a terrible choice for a variable with
// file scope.
Apart from the name, it's not in the right scope... If you want to use a mutex to protect DataStorage::vecData, this mutex should be declared in the same scope as DataStorage::vecData.
One last thing. Have you considered using iterators (or plain pointers) as arguments to DataProcess::DataProcess()? This would simplify the code quite a bit, and it would very likely run faster.
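To illustrate the point about disjoint ranges, here is a rough sketch in which each thread writes straight into its own slice of one shared vector, so no mutex is involved. The names fillSlice and processInParallel are hypothetical, and the value written (i % 7 + 1) is only a deterministic stand-in for rand(), which is not guaranteed to be thread safe. It assumes workers is at least 1.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical worker: writes a running sum into out[first, last).
// No other thread touches these elements, so no lock is needed.
void fillSlice(std::vector<long long>& out, std::size_t first, std::size_t last)
{
    long long sum = 0;
    for (std::size_t i = first; i < last; ++i) {
        sum += static_cast<long long>(i % 7 + 1); // stand-in for rand()
        out[i] = sum;
    }
}

// Hypothetical driver: splits [0, total) into one slice per worker.
std::vector<long long> processInParallel(std::size_t total, unsigned workers)
{
    std::vector<long long> data(total); // sized before any thread starts
    std::vector<std::thread> threads;
    const std::size_t chunk = total / workers;

    for (unsigned t = 0; t < workers; ++t) {
        const std::size_t first = t * chunk;
        // The last worker also takes the remainder of the division.
        const std::size_t last = (t + 1 == workers) ? total : first + chunk;
        threads.emplace_back(fillSlice, std::ref(data), first, last);
    }
    for (auto& th : threads)
        th.join();
    return data;
}
```

The vector is never resized while the threads run, which is what makes the unsynchronized element writes safe here.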

How do I pass a template function to a thread within the same .cpp file?

I have an assignment to implement a parallel version of the longest common subsequence algorithm (just calculating the LCS length). The program must use threads in order to complete the task as quickly as possible (at least, faster than a sequential implementation). Ideally, it should also utilize TLS in the threads. We had a similar assignment that implemented a template within a .hpp file and I want to use the same template, but it does not seem like I can use a .hpp file in this assignment. My problem lies in passing the template function to my threads. Below is the code:
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <iostream>
#include <unistd.h>
#include <chrono>
#include <thread>
#include <functional>
#ifdef __cplusplus
extern "C" {
#endif
void generateLCS(char* X, int m, char* Y, int n);
void checkLCS(char* X, int m, char* Y, int n, int result);
#ifdef __cplusplus
}
#endif
class ParFor {
public:
template<typename TLS>
void parfor(size_t beg, size_t endm, size_t endn, size_t increment,
std::function<void(TLS&)> before,
std::function<void(int, int, TLS&)> f,
std::function<void(TLS&)> after
) {
TLS tls;
before(tls);
for (size_t a=beg; a<endm; a+= increment) {
for (size_t b=beg; b<endn; b+= increment) {
f(a, b, tls);
}
}
after(tls);
}
};
int main (int argc, char* argv[]) {
if (argc < 4) { std::cerr<<"usage: "<<argv[0]<<" <m> <n> <nbthreads>"<<std::endl;
return -1;
}
int m = atoi(argv[1]);
int n = atoi(argv[2]);
int nbthreads = atoi(argv[3]);
// get string data
char *X = new char[m];
char *Y = new char[n];
generateLCS(X, m, Y, n);
int result = 0; // length of common subsequence
std::vector<std::thread> threads (nbthreads);
int mSubset = m / nbthreads;
int nSubset = n / nbthreads;
ParFor pf;
std::chrono::time_point<std::chrono::system_clock> start = std::chrono::system_clock::now();
for(int j = 0; j < nbthreads; j++) {
threads.push_back(std::thread(&ParFor::parfor, &pf, 0, mSubset, nSubset, 1,
[&](int& tls) -> void{
tls = result;
},
[&](int a, int b, int& tls) -> void{
if (X[a] == Y[b])
tls++;
},
[&](int tls) -> void{
result += tls;
}));
}
for(auto& t: threads)
t.join();
std::chrono::time_point<std::chrono::system_clock> end = std::chrono::system_clock::now();
std::chrono::duration<double> elapsed_seconds = end-start;
if(m < 10 || n < 10)
result = 0;
checkLCS(X, m, Y, n, result);
std::cerr<<elapsed_seconds.count()<<std::endl;
delete[] X;
delete[] Y;
return 0;
}
The "class ParFor" and everything from "int result" down to the end is what I added, the rest is pre-written code from the TA. If more clarification is needed, please let me know. Thank you.
You are right, you cannot pass a function template as if it was a function. They are different things, just like a cookie cutter is not a cookie.
You have two main problems in your code.
First, since ParFor::parfor is a template, you can only take a member function pointer to it if you provide template parameters that match what your lambdas use for TLS, so change it to this (for example):
&ParFor::parfor<int>
Your second problem is trying to pass a lambda to a function template, and expect deduction to say "this lambda is convertible to a std::function, so it's a match, deduce from that". For deduction, it needs to match the type exactly, and a lambda is not a std::function. If you pass in std::function objects, then it can deduce the template parameters.
So change the loop creating threads to wrap the lambdas before passing them, and it will compile (the std::function{...} form below relies on C++17 class template argument deduction). There's too much going on in your code for me to want to dig into the details beyond this, so if there are other bugs, those are still yours. :)
for(int j = 0; j < nbthreads; j++) {
threads.push_back(std::thread(
&ParFor::parfor<int>, &pf, 0, mSubset, nSubset, 1,
std::function{[&](int& tls) -> void{
tls = result;
}},
std::function{[&](int a, int b, int& tls) -> void {
if (X[a] == Y[b])
tls++;
}},
std::function{[&](int tls) -> void{
result += tls;
}}));
}
Also, the trailing return type that marks your lambdas as returning void is redundant. You can simply remove the -> void without changing anything.
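For reference, a stripped-down sketch of the same idea: a member function template handed to std::thread with its template argument spelled out. This is not the assignment's LCS algorithm. countMatches is a hypothetical helper that just counts positions where two strings agree, the parfor signature is simplified to a single index range, and the mutex around the shared total is my own addition to avoid a data race. It assumes nbthreads is at least 1.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

struct ParFor {
    template <typename TLS>
    void parfor(std::size_t beg, std::size_t end,
                std::function<void(TLS&)> before,
                std::function<void(std::size_t, TLS&)> body,
                std::function<void(TLS&)> after)
    {
        TLS tls;          // one private accumulator per call (per thread)
        before(tls);
        for (std::size_t i = beg; i < end; ++i)
            body(i, tls);
        after(tls);
    }
};

// Hypothetical example: count positions where x and y hold the same char.
int countMatches(const std::string& x, const std::string& y, unsigned nbthreads)
{
    const std::size_t n = std::min(x.size(), y.size());
    const std::size_t chunk = n / nbthreads;
    int total = 0;
    std::mutex totalMutex; // guards 'total' when threads merge their counts
    ParFor pf;
    std::vector<std::thread> threads; // start empty, then add real threads

    for (unsigned t = 0; t < nbthreads; ++t) {
        const std::size_t first = t * chunk;
        const std::size_t last = (t + 1 == nbthreads) ? n : first + chunk;
        threads.emplace_back(
            &ParFor::parfor<int>, &pf, first, last,
            std::function<void(int&)>([](int& tls) { tls = 0; }),
            std::function<void(std::size_t, int&)>(
                [&x, &y](std::size_t i, int& tls) { if (x[i] == y[i]) ++tls; }),
            std::function<void(int&)>([&total, &totalMutex](int& tls) {
                std::lock_guard<std::mutex> lock(totalMutex);
                total += tls;
            }));
    }
    for (auto& th : threads)
        th.join();
    return total;
}
```

Note that the thread vector starts out empty. Constructing it with nbthreads elements and then calling push_back would leave nbthreads default-constructed threads at the front, and calling join() on those throws.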

When creating a thread how do I run the threads main method?

The main method of my thread is:
void thrMain(const std::vector<long>& list, std::vector<int>& result,
const int startInd, const int endInd) {
for (int i = startInd; (i < endInd); i++) {
result[i] = countFactors(list[i]);
}
}
I create a list of threads each time using another method:
std::vector<int> getFactorCount(const std::vector<long>& numList, const int thrCount) {
// First allocate the return vector
const int listSize = numList.size();
const int count = (listSize / thrCount) + 1;
std::vector<std::thread> thrList; // List of threads
const std::vector<long> interFac(thrCount); // Intermediate factors
// Store factorial counts
std::vector<int> factCounts(numList.size());
for (int start = 0, thr = 0; (thr < thrCount); thr++, start += count) {
int end = std::max(listSize, (start + count));
thrList.push_back(std::thread(thrMain, std::ref(numList),
std::ref(interFac[thr]), start, end));
}
for (auto& t : thrList) {
t.join();
}
// Return the result back
return factCounts;
}
The main problem I am having is that std::ref(interFac[thr]) triggers compile errors inside the <thread> header. I have tried removing the pass by reference, and that does not fix the problem.
I don't know what interFac is for, but it looks like this:
thrList.push_back(std::thread(thrMain, std::ref(numList),
std::ref(interFac[thr]), start, end));
should be this:
thrList.push_back(std::thread(thrMain, std::ref(numList),
std::ref(factCounts), start, end));
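Then it compiles: thrMain expects a std::vector<int>& as its second parameter, whereas interFac[thr] is a single const long.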
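Putting it together, a sketch of how the whole function might look. Two things here are my assumptions rather than part of the question: countFactors is a naive stand-in, since its real definition is not shown, and the slice end is clamped with std::min instead of std::max so that no thread reads past the end of the list. It also assumes thrCount is at least 1.

```cpp
#include <algorithm>
#include <functional>
#include <thread>
#include <vector>

// Stand-in for the question's countFactors (assumed: number of divisors).
int countFactors(long n)
{
    int count = 0;
    for (long d = 1; d <= n; ++d)
        if (n % d == 0)
            ++count;
    return count;
}

void thrMain(const std::vector<long>& list, std::vector<int>& result,
             const int startInd, const int endInd)
{
    for (int i = startInd; i < endInd; i++)
        result[i] = countFactors(list[i]);
}

std::vector<int> getFactorCount(const std::vector<long>& numList,
                                const int thrCount)
{
    const int listSize = static_cast<int>(numList.size());
    const int count = (listSize / thrCount) + 1; // elements per thread
    std::vector<std::thread> thrList;
    std::vector<int> factCounts(numList.size());

    for (int start = 0, thr = 0; thr < thrCount; thr++, start += count) {
        // Clamp to the list size; an empty slice simply does no work.
        const int end = std::min(listSize, start + count);
        thrList.push_back(std::thread(thrMain, std::ref(numList),
                                      std::ref(factCounts), start, end));
    }
    for (auto& t : thrList)
        t.join();
    return factCounts;
}
```

Each thread writes to a different range of factCounts, so the shared vector needs no locking.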

recursive binary search in c++ using a bool function

I have a school assignment that requires me to create a recursive binary search function. I'm not allowed to change the function signature.
My experience with pointers isn't the best, and I think my problem lies there.
I get a stack overflow, but I don't really understand why.
bool contains(const int* pBegin, const int* pEnd, int x)
{
int length = pEnd - pBegin;//gives me the length of the array
const int* pMid = pBegin + (length / 2);
if(length == 1)
{
if(*pMid != x)
return false;
return true;
}
else if(x < *pMid)
return contains(pBegin, pMid-1, x);
else
return contains(pMid, pEnd, x);
}
void main(){
setlocale(LC_ALL, "swedish");
int arr[10];
for(int i = 0; i < 10; ++i)
arr[i] = i;
bool find = contains(&arr[0], &arr[10], 3);//arr[10] points to the index after the array!
cout <<"found = "<< find << endl;
system("pause");
}
Can somebody please explain to me what I'm doing wrong, and how I could do it in a better way?
Stack overflow is due to too deep recursion.
It's unlikely your array is large enough for that to be the real problem, so what you have is unbounded recursion: contains() keeps calling itself and never reaches a base case.
Look at how this is possible, and add assertions.
Your code assumes
pEnd > pBegin
Your code doesn't handle the possibility that this is false.
#include <assert.h>
bool contains( ... )
{
assert(pEnd > pBegin);
...
Now it will abort if this assumption is incorrect.
There are two possibilities for (pEnd > pBegin) being false, namely "<" or "==".
What does your code do in these two cases?
Spoiler below..
Length can be zero and isn't handled.
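One way to handle it, sketched with the same signature: treat [pBegin, pEnd) as a half-open range, return false as soon as it is empty, and always leave pMid out of the next call so the range shrinks on every step. This is only one possible shape for the fix.

```cpp
// Recursive binary search over the sorted half-open range [pBegin, pEnd).
bool contains(const int* pBegin, const int* pEnd, int x)
{
    if (pBegin >= pEnd)              // empty range: nothing left to search
        return false;

    const int* pMid = pBegin + (pEnd - pBegin) / 2;
    if (*pMid == x)
        return true;
    if (x < *pMid)
        return contains(pBegin, pMid, x);   // left half, pMid excluded
    return contains(pMid + 1, pEnd, x);     // right half, pMid excluded
}
```

With the array from the question, contains(&arr[0], &arr[0] + 10, 3) finds the value, and a value outside 0..9 ends in an empty range instead of recursing forever.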

making this function thread safe while maintaining recursivity

How do I make this function thread safe while maintaining the recursive nature of the code?
int foo(char *p)
{
static int i = 0;
if (*p == '\0') return i;
i++;
return foo(p+1);
}
#include <iostream>
using namespace std;
int foo(char* p, int start)
{
if (*p == 0) return start;
return foo(p+1, start+1);
}
int main()
{
char test[] = "HI THERE";
cout << foo(test, 0);
return 0;
}
In C++11, you could use thread_local:
int foo(char *p)
{
thread_local int i = 0;
if (*p == '\0') return i;
i++;
return foo(p+1);
}
I hope the function is just an example: i = 0 is executed only once in your version and once per thread in mine, so a second call on the same thread continues from the previous count instead of starting at zero.
Older compilers sometimes support static __thread as a pre-C++11-alternative.
int foo(char *p, int i = 0)
{
if(*p == '\0')
return i;
return foo(p+1, i+1);
}
Recursion is nice and all, but it can be less efficient than a loop if stack frames are created. It's the easiest way to cause a stack overflow. I would recommend getting rid of it. The following is simpler and likely faster:
#include <cstring>
int foo(char *p)
{
return static_cast<int>(std::strlen(p)); // strlen returns size_t
}
Or better yet, just call strlen directly and get rid of foo.
Note that this is pretty unsafe. What if a '\0' doesn't come? You'll just read on into who knows what...
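As a small illustration of why the accumulator-parameter versions are thread safe, here is a sketch that runs two counts concurrently. The names countChars and countBoth are hypothetical. All state lives in the arguments, so each thread works on its own call stack and nothing is shared between them.

```cpp
#include <cstddef>
#include <thread>

// Recursive length count: the running total travels in 'count',
// so there is no static or global state to protect.
std::size_t countChars(const char* p, std::size_t count = 0)
{
    return (*p == '\0') ? count : countChars(p + 1, count + 1);
}

// Hypothetical driver: counts two strings at the same time.
void countBoth(const char* a, const char* b,
               std::size_t& lenA, std::size_t& lenB)
{
    std::thread ta([&] { lenA = countChars(a); });
    std::thread tb([&] { lenB = countChars(b); });
    ta.join();
    tb.join();
}
```

Both inputs still have to be null-terminated, as noted above.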