Calculating Operation Count of this code - c++

I'm trying to understand the operation count of the code given below.
void add(matrix a, matrix b, matrix c, int m, int n)
{
for(int i = 0; i < m; i++)
{
count++; //for 'for' i -------(a)
for(int j = 0; j < n; j++)
{
count++; //for 'for' j -------(b)
c[i][j] = a[i][j] + b[i][j];
count++; //for assignment ------(c)
}
count++; //for last time of 'for' j ------(d)
}
count++; //for lastime of 'for' i ---------(e)
}
The variable count has been used by the author in which this code is to calculate operation count. I do know that the declarative statements (int i = 0; int j = 0) have operation count of 0.
Every time each the for loop will run it, the following expressions should do 2 operation counts for i < m; i++ when only 1 operation i < m is performed.
for(int i = 0; i < m; i++)
for(int j = 0; j < n; j++)
But the author has only calculated it once (a) and (b). Does that means i < m and j < n is not calculated in operation count?
In this line the operation count should be 2: 1 addition operation and 1 assignment operation.
c[i][j] = a[i][j] + b[i][j];
Again he has only calculated it once (c).
Because of the questions I don't know why he incremented count at (d) and (e).
//Simplified program with counting only
line void add (matrix a, matrix b, matrix c, int m, int n)
{
for(int i = 0; i < m; i++)
{
for(int j = 0; j < m; j++)
count += 2;
count += 2;
}
count++;
}
Thank you :)

Related

How to add OpenMp to triple nested for-loop

The goal is to add as much OpenMP to the following Cholesky factor function to increase parallelization. So far, I only have one #pragma omp parallel for implemented correctly. vector<vector<double>> represents a 2-D matrix. I've already tried adding #pragma omp parallel for for
for (int i = 0; i < n; ++i), for (int k = 0; k < i; ++k), and for (int j = 0; j < k; ++j) but the parallelization goes wrong. makeMatrix(n, n) initializes a vector<vector<double>> of all zeroes of size nxn.
vector<vector<double>> cholesky_factor(vector<vector<double>> input)
{
int n = input.size();
vector<vector<double>> result = makeMatrix(n, n);
for (int i = 0; i < n; ++i)
{
for (int k = 0; k < i; ++k)
{
double value = input[i][k];
for (int j = 0; j < k; ++j)
{
value -= result[i][j] * result[k][j];
}
result[i][k] = value / result[k][k];
}
double value = input[i][i];
#pragma omp parallel for
for (int j = 0; j < i; ++j)
{
value -= result[i][j] * result[i][j];
}
result[i][i] = std::sqrt(value);
}
return result;
}
I don't think you can parallelize much more than this with this algorithm, as the ith iteration of the outer loop depends on the results of the i - 1th iteration and the kth iteration of the inner loop depends on the results of the k - 1th iteration.
vector<vector<double>> cholesky_factor(vector<vector<double>> input)
{
int n = input.size();
vector<vector<double>> result = makeMatrix(n, n);
for (int i = 0; i < n; ++i)
{
for (int k = 0; k < i; ++k)
{
double value = input[i][k];
// reduction(-: value) does the same
// (private instances of value are initialized to zero and
// added to the initial instance of value when the threads are joining
#pragma omp parallel for reduction(+: value)
for (int j = 0; j < k; ++j)
{
value -= result[i][j] * result[k][j];
}
result[i][k] = value / result[k][k];
}
double value = input[i][i];
#pragma omp parallel for reduction(+: value)
for (int j = 0; j < i; ++j)
{
value -= result[i][j] * result[i][j];
}
result[i][i] = std::sqrt(value);
}
return result;
}

Partial Pivoting/Gaussian elimination- swapping columns instead of rows producing wrong output

I'm trying to implement a quick program to solve a system of linear equations. The program reads the input from a file and then writes the upper-triangular system and solutions to a file. It is working with no pivoting, but when I try to implement the pivoting it produces incorrect results.
As example input, here is the following system of equations:
w+2x-3y+4z=12
2w+2x-2y+3z=10
x+y=-1
w-x+y-2z=-4
I expect the results to be w=1, x=0, y=-1 and z=2. When I don't pivot, I get this answer (with some rounding error on x). When I add in the pivoting, I get the same numbers but in the wrong order: w=2,x=1,y=-1 and z=0.
What do I need to do to get these in the correct order? Am I missing a step somewhere? I need to do column swapping instead of rows because I need to adapt this to a parallel algorithm later that requires that. Here is the code that does the elimination and back substitution:
void gaussian_elimination(double** A, double* b, double* x, int n)
{
int maxIndex;
double temp;
int i;
for (int k = 0; k < n; k++)
{
i = k;
for (int j = k+1; j < n; j++)
{
if (abs(A[k][j]) > abs(A[k][i]))
{
i = j;
}
}
if (i != k)
{
for (int j = 0; j < n; j++)
{
temp = A[j][k];
A[j][k] = A[j][i];
A[j][i] = temp;
}
}
for (int j = k + 1; j < n; j++)
{
A[k][j] = A[k][j] / A[k][k];
}
b[k] = b[k] / A[k][k];
A[k][k] = 1;
for (i = k + 1; i < n; i++)
{
for (int j = k + 1; j < n; j++)
{
A[i][j] = A[i][j] - A[i][k] * A[k][j];
}
b[i] = b[i] - A[i][k] * b[k];
A[i][k] = 0;
}
}
}
void back_substitution(double**U, double*x, double*y, int n)
{
for (int k = n - 1; k >= 0; k--)
{
x[k] = y[k];
for (int i = k - 1; i >= 0; i--)
{
y[i] = y[i] - x[k]*U[i][k];
}
}
}
I believe what you implemented is actually complete pivoting.
With complete pivoting, you must keep track of the permutation of columns, and apply the same permutation to your answer.
You can do this with an array {0, 1, ..., n}, where you swap the i'th and k'th values in the second loop. Then, rearange the solution using this array.
If what you were trying to do is partial pivoting, you need to look for the maximum in the respective row, and swap the rows and the values of 'b' accordingly.

Skipping vector iterations based on index equality

Let's say I have three vectors.
#include <vector>
vector<long> Alpha;
vector<long> Beta;
vector<long> Gamma;
And let's assume I've filled them up with numbers, and that we know they're all the same length. (and we know that length ahead of time - let's say it's 3.)
What I want to have at the end is the minimum of all sums Alpha[i] + Beta[j] + Gamma[k] such that i, j, and k are all unequal to each other.
The naive approach would look something like this:
#include <climits>
long min = LONG_MAX;
for (int i = 0; i < 3; i++) {
for (int j = 0; j < 3; j++) {
for (int k=0; k < 3; k++) {
if (i != j && i != k && j != k) {
long sum = Alpha[i] + Beta[j] + Gamma[k];
if (sum < min)
min = sum;
}
}
}
}
Frankly, that code doesn't feel right. Is there a faster and/or more elegant way - one that skips the redundant iterations?
The computational complexity of your algorithm is an O(N^3). You can save a very small bit by using:
for (int i = 0; i < 3; i++) {
for (int j = 0; j < 3; j++) {
if ( i == j )
continue;
long sum1 = Alpha[i] + Beta[j];
for (int k=0; k < 3; k++) {
if (i != k && j != k) {
long sum2 = sum1 + Gamma[k];
if (sum2 < min)
min = sum2;
}
}
}
}
However, the complexity of the algorithm is still O(N^3).
Without the if ( i == j ) check, the innermost loop will be executed N^2 times. With that check, you will be able to avoid the innermost loop N times. It will be executed N(N-1) times. The check is almost not worth it .
If you can temporarily modify the input vectors, you can swap the used values with the end of the vectors, and just iterate over the start of the vectors:
for (int i = 0; i < size; i++) {
std::swap(Beta[i],Beta[size-1]); // swap the used index with the last index
std::swap(Gamma[i],Gamma[size-1]);
for (int j = 0; j < size-1; j++) { // don't try the last index
std::swap(Gamma[j],Gamma[size-2]); // swap j with the 2nd-to-last index
for (int k=0; k < size-2; k++) { // don't try the 2 last indices
long sum = Alpha[i] + Beta[j] + Gamma[k];
if (sum < min) {
min = sum;
}
}
std::swap(Gamma[j],Gamma[size-2]); // restore values
}
std::swap(Beta[i],Beta[size-1]); // restore values
std::swap(Gamma[i],Gamma[size-1]);
}

Adding pivoting functionality to a Doolittle algorithm

So far I have this code for an LU decomposition. It takes in an input array and it returns the lower and upper triangular matrix.
void LUFactorization ( int d, const double*S, double*L, double*U )
{
for(int k = 0; k < d; ++k){
if (
for(int j = k; j < d; ++j){
double sum = 0.;
for(int p = 0; p < k; ++p) {
sum+=L[k*d+p]*L[p*d+j];
cout << L[k*d+p] << endl;
}
sum = S[k*d+j] - sum;
L[k*d+j]=sum;
U[k*d+j]=sum;
}
for(int i = k + 1; i < d; ++i){
double sum=0.;
for(int p = 0; p < k; ++p) sum+=L[i*d+p]*L[p*d+k];
L[i*d+k]=(S[i*d+k]-sum)/L[k*d+k];
}
}
for(int k = 0; k < d; ++k){
for(int j = k; j < d; ++j){
if (k < j) L[k*d+j]=0;
else if (k == j) L[k*d+j]=1;
}
}
}
Is there some way I can adapt this to perform row exchanges? If not, is there some other algorithm I could be directed towards?
Thanks
The usual approach for LU decompositions with pivoting is to store a permutation array P which is initialized as the identity permutation (P={0,1,2,...,d - 1}) and then swapping entries in P instead of swapping rows in S
If you have this permutation array, every access to S must use P[i] instead of i for the row number.
Note that P itself is part of the output, since it represents the permutation matrix such that
P*A = L*U, so if you want to use it to solve systems of linear equations, you'll have to apply P on the right-hand side before applying backward and forward substitution

Sort entered number - difference between FOR and WHILE

I have to write a code which sort digits in one entered number.
For example: input: 4713239
output: 1233479
It doesn't work properly when I enter repeating digits(like 33) when I have the last loop as FOR:
for(int j = 0; j < arr[i]; j++) // in this loop my output is: 123479.
When I change this loop from FOR to WHILE it works properly.
It means:
while(arr[i]) // and the number is sorted correctly (1233479)
True be told, I don't know what is the difference between these operations in this code.
Why FOR loop doesn't work properly? Could somebody explain me this?
I wrote a code:
int sort(int arg)
{
int var, score = 0;
int arr[10] = {0};
for(int i = 0; i < 10; i++)
{
var = arg % 10;
arr[var]++;
arg = arg / 10;
}
for(int i = 0; i < 10; i++)
{
for(int j = 0; j < arr[i]; j++) //while(arr[i]) --> works correctly
{
score = score * 10 + i;
arr[i]--;
}
}
return score;
}
You modify both arr[i] and j, therefore the loop will end too fast when both are part of the comparison.
for(int j = 0; j < arr[i]; j++) // increase j, compare with arr[i]
{
score = score * 10 + i;
arr[i]--; // decrease arr[i]
}