Openmp - for's inside for's - c++

I'm trying to parallelize a "for" loop with OpenMP.
However, the parallel code and the non-parallel code give different results. I believe this is related to the sum variable being defined outside of the loop, but I don't know how to solve the problem.
What I want is to parallelize the first "for" loop.
Edit: 1
Here is the simplest example I could find.
//g++ -o test2 test2.cpp -fopenmp
//
//
#include <cmath>
#include <iostream>
using namespace std;
double f(double i, double j)
{
return i + j;
}
int main()
{
const int size = 256;
double sum = 0;
//will use openmp
#pragma omp parallel for
for(int i = 0; i < size; i = i + 1)
{
for(int j = 0; j < size; j=j+1)
{
if(i != j)
{
sum = sum + f(i,j);
}
}
}
cout << "sum = " << sum << endl;
//not using openmp
sum = 0;
for(int i = 0; i < size; i = i + 1)
{
for(int j = 0; j < size; j=j+1)
{
if(i != j)
{
sum = sum + f(i,j);
}
}
}
cout << "sum = " << sum << endl;
}

Your problem is that several threads access sum at the same time. For example, when the first thread reaches
sum=sum+f(i,j);
it reads sum, does the calculation, and writes the result back to sum. If another thread arrives at that line in the meantime, it reads the old value of sum and later writes its own result, overwriting the first thread's result.
A solution would be to set
double increment=f(i,j);
#pragma omp critical
sum+=increment;
Also note that the results of your original code are not predictable and can change from run to run.

Thank you for your answer, it finally works.
The following is a working version using Christoph's solution.
//g++ -o test2 test2.cpp -fopenmp
#include <cmath>
#include <iostream>
using namespace std;
double f(double i, double j)
{
return i + j;
}
int main()
{
const int size = 256;
double sum = 0;
//will use openmp
#pragma omp parallel for
for(int i = 0; i < size; i = i + 1)
{
for(int j = 0; j < size; j=j+1)
{
if(i != j)
{
double increment = f(i,j);
#pragma omp critical
sum = sum + increment;
}
}
}
cout << "sum = " << sum << endl;
//not using openmp
sum = 0;
for(int i = 0; i < size; i = i + 1)
{
for(int j = 0; j < size; j=j+1)
{
if(i != j)
{
sum = sum + f(i,j);
}
}
}
cout << "sum = " << sum << endl;
}

Related

How do I dereference multidimensional vector pointer?

I'm stuck trying to get the value that A[i][j] refers to, in the line double a = A[i][j];. How do I do this correctly? Could someone please explain?
// g++ jacobi.cpp -O0 -o jacobi && ./jacobi
#include <iostream>
#include <iomanip>
#include <vector>
#include <climits>
using namespace std;
void print_matrix(vector<vector<double>>& m) {
for (int i = 0; i < m.size(); i++) {
for (int j = 0; j < m[0].size(); j++) {
cout << setw(5) << fixed << setprecision(2) << m[i][j] << " ";
}
cout << endl;
}
cout << "==================================" << endl;
}
// calculate average temperature based on average of adjacent cells
double avg_temp_at(vector<vector<double>>& matrix, int i, int j) {
return (
matrix[i][j] +
(j-1 >= 0 ? matrix[i][j-1] : 0) +
(i-1 >= 0 ? matrix[i-1][j] : 0) +
(j+1 < matrix[0].size() ? matrix[i][j+1] : 0) +
(i+1 < matrix.size() ? matrix[i+1][j] : 0)
) / 5;
}
// sequential Jacobi algorithm
vector<vector<double>> jacobi_relaxation(vector<vector<double>>& matrix, int& threshold) {
vector<vector<double>> B (matrix.size(), vector<double>(matrix[0].size(), 0));
vector<vector<double>>* A = &matrix;
double max_delta = INT_MAX;
while (max_delta > threshold) {
max_delta = 0;
for (int i = 0; i < matrix.size(); i++) {
for (int j = 0; j < matrix[0].size(); j++) {
B[i][j] = avg_temp_at(*A, i, j);
double a = A[i][j];
double delta = abs(B[i][j] - a);
max_delta = max(max_delta, delta);
}
}
print_matrix(B);
A = &B;
}
return *A;
}
int main() {
int threshold = 1;
int n = 6;
vector<vector<double>> matrix (n, vector<double>(n, 0));
matrix[1][2] = 100;
matrix[2][2] = 100;
matrix[3][2] = 100;
print_matrix(matrix);
vector<vector<double>> x = jacobi_relaxation(matrix, threshold);
}
I tried your code and it gave me an error on this line:
double a = A[i][j];
Change that line into this:
double a = (*A)[i][j];
and it will work.
Explanation:
It's basically the same trick as in the line B[i][j] = avg_temp_at(*A, i, j);. A is a pointer to a vector, so to access the data it points to you must dereference it with *. Without the parentheses, A[i] is pointer arithmetic on A itself rather than an index into the vector.
Hope it helps.

Why am I getting a Seg Fault when creating threads?

I am confused as to why I am getting a segmentation fault when creating and firing off threads here. It happens in the t[j] = thread(getMax, A); line and I am very confused as to why this is happening. threadMax[] is the max of each thread. getMax() returns the maximum value of an array.
#include <iostream>
#include <stdlib.h>
#include <sys/time.h>
#include <thread>
#define size 10
#define numThreads 10
using namespace std;
int threadMax[numThreads] = {0};
int num =0;
void getMax(double *A){
num += 1;
double max = A[0];
double min = A[0];
for (int i =0; i<size; i++){
if(A[i] > max){
max = A[i];
}
}
threadMax[num] = max;
}
int main(){
int max =0;
double S,E;
double *A = new double[size];
srand(time(NULL));
thread t[numThreads];
//Assign random values to array
for(int i = 0; i<size; i++){
A[i] = (double(rand()%100));
}
//create Threads
for(int j =0; j <numThreads; j++){
cout << A[j] << " " << j << "\n";
t[j] = thread(getMax, A);
}
//join threads
for(int i =0; i< numThreads; i++){
t[i].join();
}
//Find Max from all threads
for(int i =0; i < numThreads; i++){
if(threadMax[i] > max){
max = threadMax[i];
}
}
cout <<max;
delete [] A;
return 0;
}
The behavior of this code is undefined:
void getMax(double *A){
num += 1;
double max = A[0];
double min = A[0];
for (int i =0; i<size; i++){
if(A[i] > max){
max = A[i];
}
}
threadMax[num] = max;
}
The num += 1 can allow multiple threads to attempt to modify num at the same time. Worse, when num is read in the threadMax[num] = max;, threads may see values of num modified by other threads while they were running.
You need to assign each thread a number in some safe way.
Here are three ways it can fail:
Two threads do num += 1; at exactly the same time and as a result, num only increments once.
Every thread does num += 1; before any thread does threadMax[num] = max;. All threads overwrite the same entry in the array. (Which, actually, is out of bounds!)
The code crashes because its behavior is undefined.
As others have stated, your num variable is not protected from race conditions inside of getMax(), which can lead to it being corrupted, thus causing getMax() to access the threadMax[] array out of bounds.
You can avoid that by getting rid of the num variable altogether and passing the array index as an input parameter to std::thread instead.
Try something more like this:
#include <iostream>
#include <vector>
#include <array>
#include <thread>
#include <algorithm>
#include <cstdlib>
#include <ctime>
using namespace std;
// Named arraySize rather than size: with "using namespace std;", a global
// called size can clash with std::size when compiling as C++17 or later.
const size_t arraySize = 10;
const size_t numThreads = 10;
double threadMax[numThreads] = {};
void getMax(int idx, double *A){
threadMax[idx] = *max_element(A, A + arraySize);
}
int main(){
srand(time(nullptr));
vector<double> A(arraySize);
array<thread, numThreads> t;
//Assign random values to array
generate_n(A.begin(), arraySize, [](){ return double(rand() % 100); });
/* or:
for(double &d : A){
d = double(rand() % 100);
}
*/
//create Threads
for(int j = 0; j < numThreads; ++j){
cout << A[j] << " " << j << "\n";
t[j] = thread(getMax, j, A.data());
}
//join threads
for(thread &thd : t){
thd.join();
}
//Find Max from all threads
// threadMax is a raw array, so use std::begin/std::end instead of member functions
double max = *max_element(begin(threadMax), end(threadMax));
cout << max;
return 0;
}

OpenMP implementation increasingly slow with thread count increase

I have been trying to learn to use OpenMP. However, my code seems to run more quickly in serial than in parallel.
Indeed, the more threads I use, the longer the computation takes.
To illustrate this I ran an experiment. I am trying to do the following operation:
long int C[num], D[num];
for (i=0; i<num; i++) C[i] = i;
for (i=0; i<num; i++){
for (j=0; j<N; j++) {
D[i] = pm(C[i]);
}
}
where the function pm is simply
int pm(int val) {
val++;
val--;
return val;
}
I implemented the inner loop in parallel and compared the run times as a function of the number of iterations on the inner loop (N) and the number of threads used. The code for the experiment is below.
#include <stdio.h>
#include <iostream>
#include <time.h>
#include "omp.h"
#include <fstream>
#include <cstdlib>
#include <cmath>
static long num = 1000;
using namespace std;
int pm(int val) {
val++;
val--;
return val;
}
int main() {
int i, j, k, l;
int iter = 8;
int iterT = 4;
long inum[iter];
for (i=0; i<iter; i++) inum[i] = pow(10, i);
double serial[iter][iterT], parallel[iter][iterT];
ofstream outdata;
outdata.open("output.dat");
if (!outdata) {
std::cerr << "Could not open file." << std::endl;
exit(1);
}
// Experiment Start
for (l=1; l<iterT+1; l++) {
for (k=0; k<iter; k++) {
clock_t start = clock();
long int A[num], B[num];
omp_set_num_threads(l);
for (i=0; i<num; i++) A[i] = i;
for (i=0; i<num; i++){
#pragma omp parallel for schedule(static)
for (j=0; j<inum[k]; j++) {
B[i] = pm(A[i]);
}
}
clock_t finish = clock();
parallel[k][l-1] = (double) (finish - start) /\
CLOCKS_PER_SEC * 1000.0;
start = clock();
long int C[num], D[num];
for (i=0; i<num; i++) C[i] = i;
for (i=0; i<num; i++){
for (j=0; j<inum[k]; j++) {
D[i] = pm(C[i]);
}
}
finish = clock();
serial[k][l-1] = (double) (finish - start) /\
CLOCKS_PER_SEC * 1000.0;
}
}
// Experiment End
for (j=0; j<iterT; j++) {
for (i=0; i<iter; i++) {
outdata << inum[i] << " " << j + 1 << " " << serial[i][j]\
<< " " << parallel[i][j]<< std::endl;
}
}
outdata.close();
return 0;
}
The link below is a plot of log(T) against log(N) for each thread count.
A plot of the run times for varying number of threads and magnitude of computational task.
(I just noticed that the legend labels for serial and parallel are the wrong way around).
As you can see, using more than one thread increases the time greatly. The time taken grows linearly with the number of threads.
Can anyone tell me what's going on?
Thanks!
Freakish above was correct: the pm() function does nothing, so the compiler was likely optimizing the work away and the timings were not measuring real computation.
It also turns out that the rand() function does not play well within OpenMP for loops, since it relies on hidden shared state.
After adding a call to sqrt(i) (i being the loop index), I achieved the expected speedup in my code.

OpenMP code is aborted

I'm trying to perform matrix multiplication using openMP as follows and I compile it using GCC : g++ -std=gnu++11 -g -Wall -fopenmp -o parallel_not_opt parallel_not_opt.cpp
But when I try to run it by using parallel_not_opt.exe, it aborts giving the typical Windows error parallel_not_opt.exe has stopped working...
Am I missing something?
#include "includes/stdafx.h"
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <vector>
# include <omp.h>
#include <chrono>
#include <fstream>
#include <algorithm>
#include <immintrin.h>
#include <cfloat>
#include <limits>
#include <math.h>
using namespace std::chrono;
using namespace std;
//populate matrix with random values.
double** generateMatrix(int n){
double max = DBL_MAX;
double min = DBL_MIN;
double** matA = new double*[n];
for (int i = 0; i < n; i++) {
matA[i] = new double[n];
for (int j = 0; j < n; j++) {
double randVal = (double)rand() / RAND_MAX;
matA[i][j] = min + randVal * (max - min);
}
}
return matA;
}
//generate matrix for final result.
double** generateMatrixFinal(int n){
double** matA = new double*[n];
for (int i = 0; i < n; i++) {
matA[i] = new double[n];
for (int j = 0; j < n; j++) {
matA[i][j] = 0;
}
}
return matA;
}
//matrix multiplication - parallel
double matrixMultiplicationParallel(double** A, double** B, double** C, int n){
int i, j, k;
clock_t begin_time = clock();
# pragma omp parallel shared ( A,B,C,n ) // private ( i, j, k )
{
# pragma omp for
for (i = 0; i < n; i++) {
// cout<< i << ", " ;
for (j = 0; j < n; j++) {
for (k = 0; k < n; k++) {
C[i][j] += A[i][k] * B[k][j];
}
}
}
}
double t = float(clock() - begin_time);
return t;
}
int _tmain(int argc, _TCHAR* argv[])
{
ofstream out("output.txt", ios::out | ios::app);
out << "--------------STARTED--------------" << "\n";
int start = 200, stop = 2000, step = 200;
for (int n = start; n <= stop; n += step)
{
srand(time(NULL));
cout << "\nn: " << n << "\n";
double t1 = 0;
int my_size = n;
double **A = generateMatrix(my_size);
double **B = generateMatrix(my_size);
double **C = generateMatrixFinal(my_size);
double single_sample_time = matrixMultiplicationParallel(A, B, C, n);
t1 += single_sample_time;
for (int i = 0; i < n; i++) {
delete[] A[i];
delete[] B[i];
delete[] C[i];
}
delete[] A;
delete[] B;
delete[] C;
}
out << "-----------FINISHED-----------------" << "\n";
out.close();
return 0;
}
The private ( i, j, k ) declaration is not optional. Add it back, otherwise the inner loop variables j and k are shared, which completely messes up the inner loops.
It is better to declare variables as locally as possible. That makes reasoning about OpenMP code much easier:
clock_t begin_time = clock();
# pragma omp parallel
{
# pragma omp for
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
for (int k = 0; k < n; k++) {
C[i][j] += A[i][k] * B[k][j];
}
}
}
}
return float(clock() - begin_time);
In that case, A, B, and C are shared by default because they come from outside the parallel region, and j and k are private because they are declared inside it. The loop variable of a parallel for is always implicitly private.

New set of values for testcases using srand() in c++

I am trying to create some test cases for my 'minimum dot product' problem. I want 10 test cases, each generating a different set of values for both vectors a and b.
The problem is that, even though I call srand( time( NULL ) ) and get a new input every time I compile and run the code, that same input is used for all 10 test cases.
#include <algorithm>
#include <iostream>
#include <vector>
#include <cstdlib>
#include <ctime>
using std::vector;
void sort_asc(vector<int> &manav, int sizes)
{
int temp = 0;
for (int i = 0; i<sizes; i++)
{
for (int j = i + 1; j<sizes; j++)
{
if (manav[i] > manav[j])
{
temp = manav[i];
manav[i] = manav[j];
manav[j] = temp;
}
}
}
std::cout << "b in asc order : ";
for (int i = 0; i<sizes; i++)
{
std::cout << manav[i] << " ";
}
std::cout << std::endl;
}
void sort_desc(vector<int> &manav, int sizes)
{
int temp = 0;
for (int i = 0; i<sizes; i++)
{
for (int j = i + 1; j<sizes; j++)
{
if (manav[i] < manav[j])
{
temp = manav[i];
manav[i] = manav[j];
manav[j] = temp;
}
}
}
std::cout << "a in desc : ";
for (int i = 0; i<sizes; i++)
{
std::cout << manav[i] << " ";
}
std::cout << std::endl;
}
long long min_dot_product(vector<int> a, vector<int> b, int sizes) {
long long result = 0;
sort_desc(a, sizes);
sort_asc(b, sizes);
for (size_t i = 0; i < sizes; i++) {
result += a[i] * b[i];
}
return result;
}
int main() {
srand(time(NULL));
/*
std::cin >> n;
vector<int> a(n), b(n);
for (size_t i = 0; i < n; i++) {
std::cin >> a[i];
}
for (size_t i = 0; i < n; i++) {
std::cin >> b[i];
}
*/
//================================================================ TESTING =========================================================================
int z = 0;
int n = (rand() % 10) + 1; // generating the size of the vectors [1-10]
std::cout << "n = " << n << "\n";
vector<int> a;
vector<int> b;
while (z != 10) {
for (int i = 0; i < n; ++i)
{
int p = (rand() % 10) - 5;
a.push_back(p); // input values [-5,4] in 'a'
}
std::cout << "Unsorted Vector a = ";
for (int i = 0; i<n; i++)
{
std::cout << a[i] << " ";
}
std::cout << std::endl;
for (int i = 0; i < n; ++i)
{
int q = (rand() % 10) - 5;
b.push_back(q); // inputing values [-5,4] in 'b'
}
std::cout << "Unsorted Vector b = ";
for (int i = 0; i<n; i++)
{
std::cout << b[i] << " ";
}
std::cout << std::endl;
std::cout << "min_dot_product = " << min_dot_product(a, b, n) << std::endl;
z++;
}
return 0;
}
I somehow want to generate a different set of values for vector a and b for all of the 10 test cases every time I run the code.
I have tried calling srand(i) within the respective for loops before pushing the values into the vectors, but it's not working for me, and calling srand( time( NULL ) ) again within the for loops does not help either. Is there some other simple way I can achieve this?
The problem is that you never clear out the vectors on each iteration. Since you don't, all of the new random numbers you generate are appended to the end of the vectors, and you never look at them: n never changes, so only the first n elements are read.
What you need to do is add
a.clear();
b.clear();
to the end of the while loop. This will clear out the vectors and then when you start the next iteration the new random numbers will get added into the part of the vector you use in your functions.
You could also give the vectors the proper size up front and then use [] to access the elements. This way you just overwrite the previous values and do not have to call clear():
vector<int> a(n);
vector<int> b(n);
//...
for (int i = 0; i < n; ++i)
{
a[i] = (rand() % 10) - 5;
b[i] = (rand() % 10) - 5;
}
I put both assignments in the same for loop to save space. You can do this in two separate loops but it is not needed.