C++: Multithreaded merge sort with N threads

I'm trying to write a merge sort that uses 2 threads.
I divide the array into 2 pieces and sort each half with the usual merge sort. After that I just merge the two sorted parts.
The usual merge sort works correctly, and if I apply it to each part without threads, it works correctly too.
I ran a lot of tests on randomly generated short arrays. As many as 2,000 runs in a row can pass, but sometimes my multithreaded sort doesn't work properly.
After sorting each half but before merging them, I check them. Sometimes the set of numbers in a part of the array turns out to be different from the original set of numbers in that part before sorting; the numbers just appear from nowhere.
There must be some problem with the threads, because there is no such problem without them.
As you can see, I made a buffer with length = array.size() and I pass a reference to it to the functions. This buffer is used when merging two sorted arrays.
Each buffer element is initialized with 0.
I'm sure that there is no shared data, because every function uses a separate part of the buffer. The correct behavior of the usual merge sort supports that.
Please help me understand what is wrong with this way of using threads; I'm absolutely confused.
P.S. My code is supposed to sort in N threads, not just 2, which is why I create an array of threads. But it doesn't work even with 2.
Multithread function:
void merge_sort_multithread(std::vector<int>& arr, std::vector<int>& buffer, unsigned int threads_count)
{
int length = arr.size();
std::vector<std::thread> threads;
// dividing array into nearly equal parts
std::vector<int> thread_from; // array with indexes of part`s start
std::vector<int> thread_length; // array with part`s length
make_parts(thread_from, thread_length, threads_count, length);
// start threads
for (int i = 0; i < threads_count; ++i)
{
threads.push_back(std::thread(merge_sort, std::ref(arr), std::ref(buffer),
thread_length[i], thread_from[i]));
}
// waiting for end of sorting
for (int i = 0; i < threads_count; ++i)
threads[i].join();
// ------- here I check each part and find mistakes, so next function is not important ----
merge_sorted_after_multithreading(arr, buffer, thread_from, thread_length, threads_count, 0);
}
Usual merge sort:
void merge_sort(std::vector<int>& arr, std::vector<int>& buffer, size_t length, int from)
{
if (length == 1)
{
return;
}
int length_left = length / 2;
int length_right = length - length_left;
// sorting each part
merge_sort(arr, buffer, length_left, from);
merge_sort(arr, buffer, length_right, from + length_left);
// merging sorted parts
merge_arrays(arr, buffer, length_left, length - length_left, from, from + length_left);
}
Merging two sorted arrays with buffer:
void merge_arrays(std::vector<int>& arr, std::vector<int>& buffer, size_t length_left, size_t length_right, int start_left, int start_right)
{
int idx_left, idx_right, idx_buffer;
idx_left = idx_right = idx_buffer = 0;
while ((idx_left < length_left) && (idx_right < length_right))
{
if (arr[start_left + idx_left] < arr[start_right + idx_right])
{
do {
buffer[idx_buffer] = arr[start_left + idx_left];
++idx_buffer;
++idx_left;
} while ((idx_left < length_left) && (arr[start_left + idx_left] < arr[start_right + idx_right]));
}
else
{
do {
buffer[idx_buffer] = arr[start_right + idx_right];
++idx_buffer;
++idx_right;
} while ((idx_right < length_right) && (arr[start_right + idx_right] < arr[start_left + idx_left]));
}
}
if (idx_left == length_left)
{
for (; idx_right < length_right; ++idx_right)
{
buffer[idx_buffer] = arr[start_right + idx_right];
++idx_buffer;
}
}
else
{
for (; idx_left < length_left; ++idx_left)
{
buffer[idx_buffer] = arr[start_left + idx_left];
++idx_buffer;
}
}
// copying result to original array
for (int i = 0; i < idx_buffer; ++i)
{
arr[start_left + i] = buffer[i];
}
}
Dividing array into separated parts:
void make_parts(std::vector<int>& thread_from, std::vector<int>& thread_length, unsigned int threads_count, size_t length)
{
int dlength = (length / threads_count);
int odd_length = length % threads_count;
int offset = 0;
for (int i = 0; i < threads_count; ++i)
{
if (odd_length > 0)
{
thread_length.push_back(dlength + 1);
--odd_length;
}
else
thread_length.push_back(dlength);
thread_from.push_back(offset);
offset += thread_length[i];
}
}
P.P.S. Each function except multithread sort was tested and works correctly
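Reading the code above, one likely cause is the scratch buffer: merge_arrays always writes to buffer starting at index 0 (idx_buffer starts at 0), so two threads merging different parts of arr still use the same buffer elements and can overwrite each other's data. The single-threaded sort is unaffected because only one merge runs at a time. Below is a minimal sketch of one possible fix, in which each merge only touches the buffer indices that match its own range of arr. The names merge_range, merge_sort_range and sort_with_two_threads are made up for this illustration, and the [from, to) range convention differs from the (length, from) parameters in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Merge arr[from, mid) and arr[mid, to), using buffer[from, to) as scratch.
// Only indices in [from, to) of either vector are read or written.
void merge_range(std::vector<int>& arr, std::vector<int>& buffer,
                 std::size_t from, std::size_t mid, std::size_t to)
{
    assert(from <= mid && mid <= to && to <= arr.size() && to <= buffer.size());
    std::size_t left = from, right = mid, out = from;
    while (left < mid && right < to)
        buffer[out++] = (arr[right] < arr[left]) ? arr[right++] : arr[left++];
    while (left < mid) buffer[out++] = arr[left++];
    while (right < to) buffer[out++] = arr[right++];
    for (std::size_t i = from; i < to; ++i) arr[i] = buffer[i];
}

// Ordinary recursive merge sort on arr[from, to).
void merge_sort_range(std::vector<int>& arr, std::vector<int>& buffer,
                      std::size_t from, std::size_t to)
{
    if (to - from < 2) return; // ranges of length 0 or 1 are already sorted
    const std::size_t mid = from + (to - from) / 2;
    merge_sort_range(arr, buffer, from, mid);
    merge_sort_range(arr, buffer, mid, to);
    merge_range(arr, buffer, from, mid, to);
}

// Two threads sort disjoint halves; their scratch ranges are disjoint too.
void sort_with_two_threads(std::vector<int>& arr)
{
    std::vector<int> buffer(arr.size());
    const std::size_t mid = arr.size() / 2;
    std::thread lower(merge_sort_range, std::ref(arr), std::ref(buffer),
                      std::size_t{0}, mid);
    std::thread upper(merge_sort_range, std::ref(arr), std::ref(buffer),
                      mid, arr.size());
    lower.join();
    upper.join();
    merge_range(arr, buffer, 0, mid, arr.size()); // final merge on one thread
}
```

For N threads the same idea should carry over if thread i is given the range [thread_from[i], thread_from[i] + thread_length[i]).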

Related

How to compare values of two vectors

Does anybody have code showing how to compare the values of two arrays?
I have two vectors and I am looking for the largest value that appears in both lists.
Here is the code:
void fractionInLowestTerm(int fNumerator, int fDenominator)
{
//let's get the dividers of fNumerator and fDenominator
std::vector<int> dividerOfNumerator;
std::vector<int> dividerOfDenominator;
for (int i = 1; i <= fNumerator; i++) {
if (fNumerator % i == 0) {
dividerOfNumerator.push_back(i);
}
}
for (int j = 1; j <= fDenominator; j++) {
if (fDenominator % j == 0) {
dividerOfDenominator.push_back(j);
}
}
// let's get the greatest common divider of a and b;
int pgcd = 1;
// I do not know how to compare the values of dividers to get the greatest common value on a and b there is the code I started writing to get that
for (int m = 0; m <= dividerOfNumerator.size() && m <= dividerOfDenominator.size(); m++) {
}
}

If I understand the problem correctly, you want to compare the elements in two arrays for each index and save the greater one into a third array. In this case, just use your favourite max function for each index. For example:
void compare(int* array1, int* array2, int* array3, int size)
{
for (int member = 0; member < size; ++member) {
array3[member] = std::max(array1[member], array2[member]);
}
}
Or, if you want to compare the arrays and record in a third array which of the two has the larger value at each index, you can use the following code:
void compare(int* array1, int* array2, int* array3, int size)
{
for (int member = 0; member < size; ++member) {
if (array1[member] > array2[member]) {
array3[member] = 1;
}
else if (array1[member] < array2[member]) {
array3[member] = 2;
}
else if (array1[member] == array2[member]) {
array3[member] = 0;
}
}
}

Since the vectors containing the divisors are already sorted, you can use the std::set_intersection algorithm like this:
std::vector<int> commonDivisors;
std::set_intersection(dividerOfNumerator.begin(), dividerOfNumerator.end(),
dividerOfDenominator.begin(), dividerOfDenominator.end(),
std::back_inserter(commonDivisors));
int pgcd = commonDivisors.back(); // guaranteed to be non-empty since 1 is always a divisor
Here's a demo.

Hello. As you can see from the function name, I wanted to write a function that reduces a fraction to its lowest terms. I wanted to go through the gcd, but it seemed to me that it would consume too much memory, so here is what I've done instead, in case it helps any member of the forum.
void fractionInLowestTerm(int& fNumerator, int& fDenominator){
// the arguments are taken by reference so that the caller sees the reduced fraction
// divide out every common divisor greater than 1
for (int i = 2; i <= fNumerator and i <= fDenominator; i++) {
if (fNumerator%i == 0 and fDenominator%i == 0) {
fNumerator /= i;
fDenominator /= i;
i = 1; // restart the search; the loop increment brings i back to 2
}
}
}
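For comparison, here is a sketch that goes through the gcd after all. It assumes a C++17 compiler, where std::gcd is available in <numeric>; it is computed with a handful of integer operations and needs no vectors of divisors, so memory use should not be a concern. The function name reduced and the choice to return a std::pair are assumptions made for this example.

```cpp
#include <cassert>
#include <numeric>  // std::gcd (C++17)
#include <utility>

// Hypothetical helper: returns {numerator, denominator} in lowest terms.
// Assumes denominator != 0.
std::pair<int, int> reduced(int numerator, int denominator)
{
    assert(denominator != 0);
    // std::gcd works on absolute values; it is at least 1 here because
    // the denominator is nonzero.
    const int g = std::gcd(numerator, denominator);
    return {numerator / g, denominator / g};
}
```

A call such as reduced(12, 18) yields the pair {2, 3}; the sign of each argument is preserved.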

C++ - Efficiently computing a vector-matrix product

I need to compute a product vector-matrix as efficiently as possible. Specifically, given a vector s and a matrix A, I need to compute s * A. I have a class Vector which wraps a std::vector and a class Matrix which also wraps a std::vector (for efficiency).
The naive approach (the one that I am using at the moment) is to have something like
Vector<T> timesMatrix(Matrix<T>& matrix)
{
Vector<unsigned int> result(matrix.columns());
// constructor that does a resize on the underlying std::vector
for(unsigned int i = 0 ; i < vector.size() ; ++i)
{
for(unsigned int j = 0 ; j < matrix.columns() ; ++j)
{
result[j] += (vector[i] * matrix.getElementAt(i, j));
// getElementAt accesses the appropriate entry
// of the underlying std::vector
}
}
return result;
}
It works fine and takes nearly 12000 microseconds. Note that the vector s has 499 elements, while A is 499 x 15500.
The next step was trying to parallelize the computation: if I have N threads then I can give each thread a part of the vector s and the "corresponding" rows of the matrix A. Each thread will compute a Vector with 15500 entries (one per column of A) and the final result will be their entry-wise sum.
First of all, in the class Matrix I added a method to extract some rows from a Matrix and build a smaller one:
Matrix<T> extractSomeRows(unsigned int start, unsigned int end)
{
unsigned int rowsToExtract = end - start + 1;
std::vector<T> tmp;
tmp.reserve(rowsToExtract * numColumns);
for(unsigned int i = start * numColumns ; i < (end+1) * numColumns ; ++i)
{
tmp.push_back(matrix[i]);
}
return Matrix<T>(rowsToExtract, numColumns, tmp);
}
Then I defined a thread routine
void timesMatrixThreadRoutine
(Matrix<T>& matrix, unsigned int start, unsigned int end, Vector<T>& newRow)
{
// newRow is supposed to contain the partial result
// computed by a thread
newRow.resize(matrix.columns());
for(unsigned int i = start ; i < end + 1 ; ++i)
{
for(unsigned int j = 0 ; j < matrix.columns() ; ++j)
{
newRow[j] += vector[i] * matrix.getElementAt(i - start, j);
}
}
}
And finally I modified the code of the timesMatrix method that I showed above:
Vector<T> timesMatrix(Matrix<T>& matrix)
{
static const unsigned int NUM_THREADS = 4;
unsigned int matRows = matrix.rows();
unsigned int matColumns = matrix.columns();
unsigned int rowsEachThread = vector.size()/NUM_THREADS;
std::thread threads[NUM_THREADS];
Vector<T> tmp[NUM_THREADS];
unsigned int start, end;
// all but the last thread
for(unsigned int i = 0 ; i < NUM_THREADS - 1 ; ++i)
{
start = i*rowsEachThread;
end = (i+1)*rowsEachThread - 1;
threads[i] = std::thread(&Vector<T>::timesMatrixThreadRoutine, this,
matrix.extractSomeRows(start, end), start, end, std::ref(tmp[i]));
}
// last thread
start = (NUM_THREADS-1)*rowsEachThread;
end = matRows - 1;
threads[NUM_THREADS - 1] = std::thread(&Vector<T>::timesMatrixThreadRoutine, this,
matrix.extractSomeRows(start, end), start, end, std::ref(tmp[NUM_THREADS-1]));
for(unsigned int i = 0 ; i < NUM_THREADS ; ++i)
{
threads[i].join();
}
Vector<unsigned int> result(matColumns);
for(unsigned int i = 0 ; i < NUM_THREADS ; ++i)
{
result = result + tmp[i]; // the operator+ is overloaded
}
return result;
}
It still works but now it takes nearly 30000 microseconds, which is about 2.5 times as much as before.
Am I doing something wrong? Do you think there is a better approach?
EDIT - using a "lightweight" VirtualMatrix
Following Ilya Ovodov's suggestion, I defined a class VirtualMatrix that wraps a T* matrixData, which is initialized in the constructor as
VirtualMatrix(Matrix<T>& m)
{
numRows = m.rows();
numColumns = m.columns();
matrixData = m.pointerToData();
// pointerToData() returns underlyingVector.data();
}
Then there is a method to retrieve a specific entry of the matrix:
inline T getElementAt(unsigned int row, unsigned int column)
{
return *(matrixData + row*numColumns + column);
}
Now the execution time is better (approximately 8000 microseconds) but maybe there are some improvements to be made. In particular the thread routine is now
void timesMatrixThreadRoutine
(VirtualMatrix<T>& matrix, unsigned int startRow, unsigned int endRow, Vector<T>& newRow)
{
unsigned int matColumns = matrix.columns();
newRow.resize(matColumns);
for(unsigned int i = startRow ; i < endRow + 1 ; ++i)
{
for(unsigned int j = 0 ; j < matColumns ; ++j)
{
newRow[j] += (vector[i] * matrix.getElementAt(i, j));
}
}
}
and the really slow part is the one with the nested for loops. If I remove it, the result is obviously wrong but is "computed" in less than 500 microseconds. This is to say that passing the arguments now takes almost no time and the heavy part is really the computation.
In your opinion, is there any way to make it even faster?

You actually make a partial copy of the matrix for each thread in extractSomeRows. That takes a lot of time.
Redesign it so that "some rows" becomes a virtual matrix pointing at data located in the original matrix.
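A minimal sketch of what such a "virtual" block of rows could look like, assuming the matrix is stored row-major in a single std::vector as the question describes. RowBlockView, viewRows and accumulate are hypothetical names for this illustration, not part of the asker's Matrix or Vector classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view of a block of consecutive rows of a
// row-major matrix. It stores a pointer into the original data; no copy.
template <class T>
struct RowBlockView {
    const T* data;           // points at element (firstRow, 0) of the original
    std::size_t numRows;
    std::size_t numColumns;
    const T& at(std::size_t row, std::size_t column) const {
        return data[row * numColumns + column];
    }
};

// Builds a view of rows [firstRow, firstRow + numRows) of rowMajor.
template <class T>
RowBlockView<T> viewRows(const std::vector<T>& rowMajor, std::size_t numColumns,
                         std::size_t firstRow, std::size_t numRows)
{
    assert((firstRow + numRows) * numColumns <= rowMajor.size());
    return {rowMajor.data() + firstRow * numColumns, numRows, numColumns};
}

// Adds s[0..numRows) times the block into partial (one entry per column).
// s points at the vector entries that correspond to the rows of the block.
template <class T>
void accumulate(const T* s, const RowBlockView<T>& block, std::vector<T>& partial)
{
    assert(partial.size() == block.numColumns);
    for (std::size_t i = 0; i < block.numRows; ++i)
        for (std::size_t j = 0; j < block.numColumns; ++j)
            partial[j] += s[i] * block.at(i, j);
}
```

Creating a view costs only a pointer and two sizes, so handing one to each thread avoids the per-thread copy made by extractSomeRows. The view is valid only as long as the original vector is alive and not resized.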

Use vectorized instructions for your architecture by making it more explicit that you want to multiply four elements at a time, i.e. SSE2 or later on x86-64 and possibly NEON on ARM.
C++ compilers can often unroll the loop into vectorized code if you explicitly make an operation happen on contiguous elements:
Simple and fast matrix-vector multiplication in C / C++
There is also the option of using libraries specifically made for matrix multiplication. For larger matrices, it may be more efficient to use special implementations based on the Fast Fourier Transform, alternate algorithms like Strassen's Algorithm, etc. In fact, your best bet would be to use a C library like this, and then wrap it in an interface that looks similar to a C++ vector.
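As a rough illustration of the "contiguous elements" point, the sketch below keeps the inner loop as a plain multiply-add over two contiguous arrays, with the scalar s[i] and the row pointer hoisted out of it. It assumes a row-major matrix stored in one std::vector; vectorTimesMatrix is a hypothetical free function rather than a member of the asker's classes. Whether the compiler actually vectorizes the loop depends on the compiler and its optimization flags, so this is worth measuring rather than assuming.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes s * A for a row-major matrix A with s.size() rows and
// numColumns columns, stored contiguously in a.
template <class T>
std::vector<T> vectorTimesMatrix(const std::vector<T>& s,
                                 const std::vector<T>& a,
                                 std::size_t numColumns)
{
    assert(a.size() == s.size() * numColumns);
    std::vector<T> result(numColumns, T{});
    T* out = result.data();
    for (std::size_t i = 0; i < s.size(); ++i) {
        const T si = s[i];                          // loop-invariant scalar
        const T* row = a.data() + i * numColumns;   // start of row i
        // Inner loop walks two contiguous arrays with unit stride.
        for (std::size_t j = 0; j < numColumns; ++j)
            out[j] += si * row[j];
    }
    return result;
}
```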

Bubble Sort Using Slides instead of swaps

Currently I'm being asked to design four sorting algorithms (insertion, shell, selection, and bubble), and I have 3 of the 4 working perfectly; the only one that isn't functioning correctly is the Bubble Sort. I'm well aware of how the normal bubble sort works, using a temp variable to swap two elements. The tricky part is that this version must use array element [0] as the temp instead of a separate variable, slide the smaller values down toward the front of the list, and at the end of each pass assign the temp, which holds the greatest value, to the last index.
I've been playing around with this for a while and even tried to look up references but sadly I cannot find anything. I'm hoping that someone else has done this prior and can offer some helpful tips. This is sort of a last resort as I've been modifying and running through the passes with pen and paper to try and find my fatal error. Anyways, my code is as follows...
void BubbleSort(int TheArray[], int size)
{
for (int i = 1; i < size + 1; i++)
{
TheArray[0] = TheArray[i];
for (int j = i + 1; j < size; j++)
{
if (TheArray[j] > TheArray[0])
TheArray[0] = TheArray[j];
else
{
TheArray[j - 1] = TheArray[j];
}
}
TheArray[size- 1] = TheArray[0];
}
}
Thanks for any feedback whatsoever; it's much appreciated.

If I understand the problem statement, I think you're looking for something along these lines :
void BubbleSort(int theArray[], int size)
{
for (int i = 1; i < size + 1; i++)
{
theArray[0] = theArray[1];
for (int j = 1; j <= size + 1 - i; j++)
{
if (theArray[j] > theArray[0])
{
theArray[j-1] = theArray[0];
theArray[0] = theArray[j];
}
else
{
theArray[j - 1] = theArray[j];
}
}
theArray[size-i+1] = theArray[0];
}
}
The piece that your code was missing, I think, was that once you find a new maximum, you have to put the old one back in the array before placing the new maximum in the theArray[0] storage location (see theArray[j-1] = theArray[0] after the compare). Additionally, the inner loop wants to run one less each time since the last element will be the current max value, so you don't want to revisit those array elements. (See for(int j = 1 ; j <= size + 1 - i ; j++))
For completeness, here's the main driver I used to (lightly) test this :
#include <iostream>

int main()
{
int theArray[] = { 0, 5, 7, 3, 2, 8, 4, 6 }; // element [0] is the temp slot, not data
int size = 7;
BubbleSort(theArray, size);
for (int i = 1; i < size + 1; i++)
std::cout << theArray[i] << std::endl;
return 0;
}

How can I create an array with Fibonacci numbers up to a certain integer n?

So for an assignment I've been asked to create a function that generates an array of Fibonacci numbers; the user then provides an array of random numbers. My function must check whether the user's array contains any Fibonacci numbers: if it does, the function returns true, otherwise it returns false. I have already been able to create the array of Fib numbers and check it against the array that the user enters; however, it is limited since my Fib array has a max size of 100.
bool hasFibNum (int arr[], int size){
int fibarray[100];
fibarray[0] = 0;
fibarray[1] = 1;
bool result = false;
for (int i = 2; i < 100; i++)
{
fibarray[i] = fibarray[i-1] + fibarray[i-2];
}
for (int i = 0; i < size; i++)
{
for(int j = 0; j < 100; j++){
if (fibarray[j] == arr[i])
result = true;
}
}
return result;
}
So basically how can I make it so that I don't have to use int fibarray[100] and can instead generate fib numbers up to a certain point. That point being the maximum number in the user's array.
So for example if the user enters the array {4,2,1,8,21}, I need to generate a fibarray up to the number 21 {1,1,2,3,5,8,13,21}. If the user enters the array {1,4,10} I would need to generate a fibarray with {1,1,2,3,5,8,13}
Quite new to programming so any help would be appreciated! Sorry if my code is terrible.

It is possible that I still don't understand your question, but if I do, then I would achieve what you want like this:
bool hasFibNum (int arr[], int size){
if (size == 0) return false;
int maxValue = arr[0];
for (int i = 1; i < size; i++)
{
if (arr[i] > maxValue) maxValue = arr[i];
}
int first = 0;
int second = 1;
while (first <= maxValue) // <= so that a Fibonacci number equal to maxValue is still checked
{
for (int i = 0; i < size; i++)
{
if (arr[i] == first) return true;
if (arr[i] == second) return true;
}
first = first + second;
second = second + first;
}
return false;
}

Here is a function that returns a dynamic array with all of the Fibonacci numbers up to and including max (assuming max > 0)
std::vector<size_t> make_fibs( size_t max ) {
std::vector<size_t> retval = {1,1};
while( retval.back() < max ) {
retval.push_back( retval.back()+*(retval.end()-2) );
}
return retval;
}
I prepopulate it with 2 elements rather than keeping track of the last 2 separately.
Note that under some definitions, 0 and -1 are Fibonacci numbers. If you are using that, start the array off with {-1, 0, 1} (which isn't their order, it is actually -1, 1, 0, 1, but by keeping them in ascending order we can binary_search below). If you do so, change the type to an int not a size_t.
Next, a sketch of an implementation for has_fibs:
template<class T, size_t N>
bool has_fibs( T(&array)[N] ) {
// bring `begin` and `end` into view, one of the good uses of `using`:
using std::begin; using std::end;
// guaranteed array is nonempty, so max_element will have a max, so * is safe.
T m = *std::max_element( begin(array), end(array) );
if (m < 0) m = 0; // deal with the possibility the `array` is all negative
// use `auto` to not repeat a type, and `const` because we aren't going to alter it:
const auto fibs = make_fibs(m);
// d-d-d-ouble `std` algorithm:
return std::find_if( begin(array), end(array), [&fibs]( T v )->bool {
return std::binary_search( begin(fibs), end(fibs), v );
}) != end(array);
}
Here I create a template function that takes your (fixed-size) array by reference. This has the advantage that range-based for loops will work on it.
Next, I use a std algorithm max_element to find the max element.
Finally, I use two std algorithms, find_if and binary_search, plus a lambda to glue them together, to find any intersections between the two containers.
I'm liberally using C++11 features and lots of abstraction here. If you don't understand a function, I encourage you to rewrite the parts you don't understand rather than copying blindly.
This code has runtime O(n lg lg n), which is probably overkill. (Fibs grow exponentially. Building them takes lg n time, searching them takes lg lg n time, and we search them n times.)

Efficient circular list

I want a simple yet efficient circular buffer/queue. If I use std::vector, I have to do this:
if ( v.size() >= limit ) {
std::vector<int>::iterator it = v.begin();
v.insert( it, data );
v.erase( it+1 );
}
Is there any simpler solution?

You want to maintain the size of the buffer, overwriting older items. Just overwrite the old ones as time goes on. If you want to deal with the case where nItems < limit, then you would need to deal with that, this is just a simple example of using modulo to insert into a fixed size buffer.
std::vector<int> data(10);
for (int i = 0 ; i < 100 ; ++i)
{
data[i%10] = i;
}
for (std::vector<int>::const_iterator it = data.begin() ; it !=data.end(); ++it)
{
std::cout << *it << std::endl;
}
That method of insertion will keep the last 10 elements in the buffer.
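The case the answer leaves open, where fewer than limit items have been stored so far, can be handled by tracking a start position and a count. Below is a minimal sketch under that assumption; the RingBuffer class and its member names are invented for this example and are not a standard library type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity ring buffer of ints. Once full, each push
// overwrites the oldest element.
class RingBuffer {
public:
    explicit RingBuffer(std::size_t capacity) : data_(capacity) {
        assert(capacity > 0);
    }

    void push(int value) {
        // When full, (head_ + count_) % capacity is the oldest slot.
        data_[(head_ + count_) % data_.size()] = value;
        if (count_ < data_.size())
            ++count_;                               // still filling up
        else
            head_ = (head_ + 1) % data_.size();     // oldest was overwritten
    }

    std::size_t size() const { return count_; }

    // Index 0 is the oldest stored element, size() - 1 the newest.
    int operator[](std::size_t i) const {
        assert(i < count_);
        return data_[(head_ + i) % data_.size()];
    }

private:
    std::vector<int> data_;
    std::size_t head_ = 0;   // position of the oldest element
    std::size_t count_ = 0;  // number of valid elements
};
```

Both push and element access are constant time, and nothing is shifted or reallocated after construction.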

A std::list might be an easier alternative to building a list than std::vector. There's also std::queue.
It's also funny that you're using a vector to implement a circular queue but ask a question on how to implement a circular list. Why not use a map?

In C++11, for a fixed-size alternative, you should be using std::array:
const unsigned int BUFFER_SIZE = 10;
std::array<int, BUFFER_SIZE> buffer; // The buffer size is fixed at compile time.
for (int i = 0; i < 100; ++i) {
buffer[i % BUFFER_SIZE] = i;
}

Try std::deque. The interface is like using a std::vector but insert and removal at beginning and end are more efficient.
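A short sketch of how that might look for the "keep only the last limit items" use case from the question. push_bounded is a hypothetical helper name; std::deque::push_back and pop_front are the standard calls doing the work.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Appends value and drops the oldest elements so that at most `limit`
// elements remain. Both deque operations are constant time.
void push_bounded(std::deque<int>& window, int value, std::size_t limit)
{
    assert(limit > 0);
    window.push_back(value);
    while (window.size() > limit)
        window.pop_front();
}
```

Unlike the modulo approach, the elements stay in insertion order, so iterating from begin() to end() visits them oldest first.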

You can use your vectors as usual, and then create a get_element(index) function to make it feel circular. It's pretty fast and straight-forward, since it's just integer manipulation.
template<typename T>
T get_element(const std::vector<T>& vec, int index) { // by const reference, so the vector is not copied on every call
int vector_size = vec.size();
int vector_max = vector_size - 1;
int vector_min = 0;
int index_diff = 0;
int refined_index = 0;
// index_diff is the amount of index-out-of-range. Positive means index was
// bigger than the vector size, negative means index was smaller than 0
if (index > vector_max) {
index_diff = index - vector_max;
} else if (index < vector_min) {
index_diff = index;
} else {
index_diff = 0;
}
// Make the indexing feel circular
// index mod 16 yields a number from 0 to 15
if (index_diff > 0) {
refined_index = index % vector_size;
} else if (index_diff < 0) {
int temp_index = index % vector_size;
if (temp_index != 0) {
refined_index = vector_size - std::abs(temp_index);
// if the negative mod equals to 0, we can't have 16 - 0 = 16 index,
// so we set it to 0 manually
} else {
refined_index = 0;
}
} else {
refined_index = index;
}
return vec[refined_index];
}
Then use it like:
int result = get_element<int>(myvec, 256);
Note that any index smaller than 0 starts rotating from the last element of your vector, which is of course intended.
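The same wrap-around can also be written more compactly with a double modulo. This is a sketch that assumes a non-empty vector whose size fits in an int; wrap_index and circular_at are names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Maps any int index onto [0, size). Negative indices count back from
// the end, so -1 refers to the last element.
std::size_t wrap_index(int index, std::size_t size)
{
    assert(size > 0);
    const int n = static_cast<int>(size);
    // index % n lies in (-n, n); adding n and reducing again gives [0, n).
    return static_cast<std::size_t>(((index % n) + n) % n);
}

// Circular element access built on wrap_index.
template <typename T>
const T& circular_at(const std::vector<T>& vec, int index)
{
    return vec[wrap_index(index, vec.size())];
}
```

For a vector of 16 elements, index 256 maps to 0 and index -1 maps to 15, matching the behavior described above.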
