I overloaded operator* to multiply 2D arrays (matrices). I have a problem with the multiplication: I don't understand exactly which indexes to use.
Here's some declarations:
int *const e; //pointer to the memory storing all integer elements of A
const int row, column; // row and column are the numbers of rows and columns, respectively
And some code:
A A::operator*(const A& matrix)const
{
MAT result(matrix.row, matrix.column);
if (column == matrix.row)
{
for (int i = 0; i < row; ++i)
{
for (int j = 0; j < matrix.column; j++)
{
result.e[j*row + i] = 0;
for (int k = 0; k < column; k++)
{
result.e[j*row + i] += e[j*row + k] * matrix.e[k*row + column];
}
}
}
}
return result;
}
I know that I need 3 loops. I think the problem is in this line:
result.e[j*row + i] += e[j*row + k] * matrix.e[k*row + column];
Do you have any clue? You can also just give me some ideas on how to figure it out myself, because I want to understand it. Thanks.
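Your line
result.e[j*row + i] += e[j*row + k] * matrix.e[k*row + column];
is broken. The product P of two matrices A (dim M,N) and B (dim N,Q) has its coefficient in position (i,j) defined by the following:
$P_{i,j} = \sum_{k=1}^{N} a_{i,k} \, b_{k,j}$
Thus the line mentioned above should be:
result.e[j*row + i] += e[j*row + k] * matrix.e[k*row + i];
Note that this version uses row as the stride for every array and treats j as the row index, so it is only consistent for square matrices. In the general case the result must also be constructed with dimensions (row, matrix.column) rather than (matrix.row, matrix.column).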
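For non-square matrices, the formula can be sketched as follows. This is a minimal illustration, assuming row-major storage (element (i, j) of a matrix with C columns lives at index i * C + j). The `Mat` struct and `multiply` function are hypothetical stand-ins, not the asker's class:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix: element (i, j) is stored at e[i * cols + j].
struct Mat {
    std::size_t rows, cols;
    std::vector<int> e;
    Mat(std::size_t r, std::size_t c) : rows(r), cols(c), e(r * c, 0) {}
};

// P(i, j) = sum over k of A(i, k) * B(k, j); the result is a.rows x b.cols.
Mat multiply(const Mat& a, const Mat& b) {
    assert(a.cols == b.rows);  // inner dimensions must agree
    Mat p(a.rows, b.cols);     // elements start at 0, so += accumulates correctly
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t j = 0; j < b.cols; ++j)
            for (std::size_t k = 0; k < a.cols; ++k)
                p.e[i * p.cols + j] += a.e[i * a.cols + k] * b.e[k * b.cols + j];
    return p;
}
```

Each array is indexed with its own column count as the stride, which is the part that is easy to get wrong when the matrices are not square.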
I have a working sequential Crout decomposition algorithm that I need to speed up if possible. I have looked online at various OpenMP methods of parallelising the algorithm, but I can only get it to work correctly on the lower triangular part of the code. The upper part yields wrong results.
I feel like I have been looking at the code too long and may be blind to a data dependency that I am overlooking.
The sequential code, which works correctly, is as follows:
for (i = 0; i < size; i++)
{
// Upper Triangle
for (j = 0; j < i; j++)
{
q = matx[j * size + i];
for (k = 0; k < j; k++)
{
q -= matx[j * size + k] * matx[k * size + i];
}
matx[j * size + i] = q;
}
// Lower Triangle
for (j = i; j < size; j++)
{
q = matx[j * size + i];
for (k = 0; k < i; k++)
{
q -= matx[j * size + k] * matx[k * size + i];
}
matx[j * size + i] = q;
}
}
Now here is the code with the appropriate OpenMP directives
for (i = 0; i < size; i++)
{
// Upper Triangle
#pragma omp parallel for private(j,k,q)
for (j = 0; j < i; j++)
{
q = matx[j * size + i];
for (k = 0; k < j; k++)
{
q -= matx[j * size + k] * matx[k * size + i];
}
matx[j * size + i] = q;
}
// Lower Triangle
#pragma omp parallel for private(j,k,q)
for (j = i; j < size; j++)
{
q = matx[j * size + i];
for (k = 0; k < i; k++)
{
q -= matx[j * size + k] * matx[k * size + i];
}
matx[j * size + i] = q;
}
}
If only the lower triangle is parallelised, I get the correct decomposition; however, parallelising the upper triangle produces discrepancies.
Many thanks for any help with this.
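One likely source of the discrepancy, offered as a sketch rather than a verified fix: in the upper-triangle loop, iteration j reads matx[k * size + i] for every k < j, and those are exactly the entries that earlier iterations of the same loop write. That is a loop-carried dependency, so the j iterations cannot safely run in parallel. The lower-triangle loop only reads entries it never writes. The function name `factor_like_question` is hypothetical, and the code assumes a row-major size x size array:

```cpp
#include <cassert>
#include <vector>

// Same loop structure as the question, with only the lower loop parallelised.
void factor_like_question(std::vector<double>& matx, int size) {
    assert(static_cast<int>(matx.size()) == size * size);
    for (int i = 0; i < size; i++) {
        // Upper triangle: iteration j reads matx[k * size + i] for k < j,
        // which earlier iterations of this loop wrote. Keep it sequential.
        for (int j = 0; j < i; j++) {
            double q = matx[j * size + i];
            for (int k = 0; k < j; k++)
                q -= matx[j * size + k] * matx[k * size + i];
            matx[j * size + i] = q;
        }
        // Lower triangle: each j writes only matx[j * size + i] (j >= i) and
        // reads entries this loop never writes, so iterations are independent.
        #pragma omp parallel for
        for (int j = i; j < size; j++) {
            double q = matx[j * size + i];  // declared inside the loop, so private
            for (int k = 0; k < i; k++)
                q -= matx[j * size + k] * matx[k * size + i];
            matx[j * size + i] = q;
        }
    }
}
```

Declaring q and k inside the loop bodies makes them private to each thread without an explicit private clause.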
Dear nice and smart people, would you mind sharing with me why my code is unable to swap rows for a matrix please? When I run the code, both rows become the same, omg.
entries[i] is the dynamic array storing the elements in the matrix.
Elements are stored row by row, from left to right.
i.e. in a 3X3 matrix, entries[2] is 3rd element on the 1st row,
entries[3] is 1st element on the 2nd row
n = number of rows in matrix
m = number of columns in matrix
void Matrix::SwapRows(int i, int j) {
double* temp;
temp = new double[n * m];
double* temp2;
temp2 = new double[n * m];
for (int a = 1; a <= n; a++) {
for (int b = 1; b <= m; b++) {
if (a == i) {
temp[(j - 1) * m + b - 1] = entries[(j - 1) * m + b - 1];
entries[(a - 1) * m + b - 1] = temp[(j - 1) * m + b - 1];
}
if (a == j) {
temp2[(i - 1) * m + b - 1] = entries[(i - 1) * m + b - 1];
entries[(a - 1) * m + b - 1] = temp2[(i - 1) * m + b - 1];
}
}
}
delete temp;
delete temp2;
}
Thanks to Jesper Juhl, swap does the trick. The correct method is below. Thank you, Jesper!
void Matrix::SwapRows(int i, int j) {
for (int a = 1; a <= n; a++) {
for (int b = 1; b <= m; b++) {
if (a == i) {
swap (entries[(a - 1) * m + b - 1], entries[(j - 1) * m + b - 1]);
}
}
}
}
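The original version failed because whichever of the two rows the loop reached first was overwritten before its old contents had been saved, so both rows ended up holding the same values. As a more compact alternative, here is a minimal sketch using std::swap_ranges. The free function `swap_rows` is a hypothetical stand-in for Matrix::SwapRows and keeps the question's 1-based row numbers:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Swap rows i and j (1-based) of an n x m array stored row by row.
void swap_rows(std::vector<double>& entries, int n, int m, int i, int j) {
    assert(1 <= i && i <= n && 1 <= j && j <= n);
    assert(static_cast<int>(entries.size()) == n * m);
    if (i == j) return;  // nothing to do, and swap_ranges needs distinct ranges
    auto row_i = entries.begin() + (i - 1) * m;
    auto row_j = entries.begin() + (j - 1) * m;
    std::swap_ranges(row_i, row_i + m, row_j);  // exchange the rows element by element
}
```

This touches only the two rows involved instead of scanning the whole matrix.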
Dear friends, I am having a problem transposing a matrix. The transposed matrix has elements that are undefined, and I am not sure what is wrong. Thank you for your time!
entries[i] is the dynamic array storing the elements in the matrix. Elements are stored row by row, from left to right. i.e. in a 3X3 matrix, entries[2] is 3rd element on the 1st row, entries[3] is 1st element on the 2nd row
n is the number of rows of matrix
m is the number of columns of matrix
Matrix Matrix::Transpose() const {
double* temp;
temp = new double[n * m];
for (int i = 1; i <= n; i++)
{
for (int j = 1; j <= m; j++)
temp[(j - 1) * m + i - 1] = entries[(i - 1) * m + j - 1];
}
Matrix Result(m, n, temp);
delete temp;
return Result;
}
When the original matrix is square, all elements of the transposed matrix are defined. When the original matrix is 1x3, the resulting transposed 3x1 matrix has undefined values for the 2nd and 3rd elements. For example, transposing (1 1 3) returns (1 -3452346326236 -12351251515).
The Matrix Print out function is below. The error likely comes from here too.
void Matrix::Print() const
{
for (int i = 1; i <= n; i++)
{
for (int j = 1; j <= m; j++)
cout << setw(13) << entries[(i - 1) * m + j - 1];
cout << endl;
}
}
When the matrix is not square, the line
temp[(j - 1) * m + i - 1] = entries[(i - 1) * m + j - 1];
is not right. It needs to be:
temp[(j - 1) * n + i - 1] = entries[(i - 1) * m + j - 1];
// ^^ needs to be n, not m.
Think of the 2D analogue. You want to use:
temp[j][i] = entries[i][j];
entries is an n x m matrix. For it, the 2D indices [i][j] are translated as [i*m + j] for the 1D index.
temp is an m x n matrix. For it, the 2D indices [j][i] are translated as [j*n + i] for the 1D index.
Suggestion for improved readability
Instead of n and m, use num_rows, and num_columns. You will find your code a lot more readable.
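Putting both points together, a minimal sketch with 0-based indices and the suggested names might look like this. The free function `transpose` working on a std::vector is an assumption for illustration, not the asker's Matrix class:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Transpose a num_rows x num_columns row-major array into a
// num_columns x num_rows row-major array.
std::vector<double> transpose(const std::vector<double>& entries,
                              std::size_t num_rows, std::size_t num_columns) {
    assert(entries.size() == num_rows * num_columns);
    std::vector<double> result(entries.size());
    for (std::size_t i = 0; i < num_rows; ++i)
        for (std::size_t j = 0; j < num_columns; ++j)
            // The result has num_rows columns, so its stride is num_rows.
            result[j * num_rows + i] = entries[i * num_columns + j];
    return result;
}
```

Using std::vector also avoids having to pair new[] with delete[] by hand.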
I have an array with the elements {7,2,1} and the idea is to do 7 * 2 + 7 * 1 + 2 * 1 which is basically this algorithm:
for(int i=0;i<n-1;++i)
for(int k=i+1;k<n;++k)
sum += a[i] * a[k];
Here a is the array holding the numbers and n is the number of elements. I need a more efficient algorithm for this and have no clue how to do it. Can someone give me a hand?
Thank you!
You can do better in the general case. Time to do some math. Let's look at the 3-element version, we have:
ab + ac + bc
= 1/2 * (2ab + 2ac + 2bc)
= 1/2 * (2ab + 2ac + 2bc + a^2 + b^2 + c^2 - (a^2 + b^2 + c^2))
= 1/2 * ((a+b+c)^2 - (a^2 + b^2 + c^2))
That is:
int sum = 0;
int sum_sq = 0;
for (int i : arr) {
sum += i;
sum_sq += i*i;
}
int result = (sum*sum - sum_sq) / 2;
This is O(n) multiplications, instead of O(n^2). This'll certainly be better than the naive implementation at some point. Whether or not it's better for just 3 elements is something I haven't timed.
@chux's suggestion is essentially to redistribute operations:
$a_i a_{i+1} + a_i a_{i+2} + \dots + a_i a_n$
-->
$a_i (a_{i+1} + \dots + a_n)$
combined with avoiding unnecessary recomputation of the partial sums $(a_{i+1} + \dots + a_n)$ by leveraging the fact that each differs from the next by the value of one element of the input array.
Here's a one-pass implementation with O(1) overhead:
int psum(size_t n, int array[n]) {
int result = 0;
int rsum = array[n - 1];
for (int i = n - 2; i >= 0; i--) {
result += array[i] * rsum;
rsum += array[i];
}
return result;
}
The sum of all elements to the right of index i is maintained from iteration to iteration in variable rsum. It's unnecessary to track its various values in an array, because we need each value only for one iteration of the loop.
This scales linearly with the number of elements in the input array. You'll see that the number and type of operations is quite similar to @Barry's answer, but nothing analogous to his final step is required, which saves a few operations.
As @Barry observes in comments, the iteration can also be run in the other direction, in conjunction with tracking the left-hand partial sums instead of the right-hand ones. That would diverge a bit more from @chux's description, but it relies on exactly the same principles.
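The left-to-right variant could be sketched like this. It is an illustration under assumptions: the name `pair_product_sum` is hypothetical, and long long is used only to reduce the risk of overflow:

```cpp
#include <cassert>
#include <vector>

// Sum of a[i] * a[k] over all pairs i < k, scanning left to right.
long long pair_product_sum(const std::vector<int>& a) {
    long long result = 0;
    long long lsum = 0;  // sum of all elements to the left of the current one
    for (int x : a) {
        result += lsum * x;  // pairs the current element with everything before it
        lsum += x;
    }
    return result;
}

// Small self-check against the example from the question: 7*2 + 7*1 + 2*1.
inline void check_question_example() {
    assert(pair_product_sum({7, 2, 1}) == 23);
}
```

Because each element is paired with the running sum of its predecessors, no special handling of the first or last index is needed.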
We have (a + b + c + ...)^2 = (a^2 + b^2 + c^2 + ...) + 2(ab + bc + ca + ...)
You want the sum S = ab + bc + ca + ..., which has O(n^2) pairs (using 2 nested loops)
You can do 2 separate loops: one calculates P = a^2 + b^2 + c^2 + ... in O(n) time, and the other calculates Q = (a + b + c + ...)^2, also in O(n) time. Then take S = (Q - P) / 2.
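A minimal sketch of that identity, computing both sums in one loop. The function name `pair_product_sum_by_squares` is hypothetical, and long long is an assumption made to reduce the risk of overflow:

```cpp
#include <cassert>
#include <vector>

// S = (Q - P) / 2, with Q the square of the sum and P the sum of squares.
long long pair_product_sum_by_squares(const std::vector<int>& a) {
    long long sum = 0;     // a + b + c + ...
    long long sum_sq = 0;  // P = a^2 + b^2 + c^2 + ...
    for (long long x : a) {
        sum += x;
        sum_sq += x * x;
    }
    long long q = sum * sum;        // Q = (a + b + c + ...)^2
    assert((q - sum_sq) % 2 == 0);  // Q - P = 2S, so it is always even
    return (q - sum_sq) / 2;
}
```

For the array {7, 2, 1} this gives Q = 100 and P = 54, so S = 23, matching 7 * 2 + 7 * 1 + 2 * 1.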
Make one pass, walking from the end of a[] to the front, to form for each index the sum of all the elements "to the right" of it.
In the second pass, multiply a[i] by sums[i] and add up the products.
O(n).
#include <stdio.h>
#include <stdlib.h>
long sum0(int a[], int n) {
long sum = 0;
for (int i = 0; i < n - 1; ++i)
for (int k = i + 1; k < n; ++k)
sum += a[i] * a[k];
return sum;
}
long sum1(int a[], int n) {
int long sums[n];
sums[n - 1] = 0;
for (int i = n - 2; i >= 0; i--) {
sums[i] = a[i+1] + sums[i + 1];
}
long sum = 0;
for (int i = 0; i < n - 1; ++i)
sum += a[i] * sums[i];
return sum;
}
void test(int a[], int n) {
long s0 = sum0(a, n);
long s1 = sum1(a, n);
if (s0 != s1) printf("%9ld %9ld\n", s0, s1);
}
void tests(int k) {
while (k--) {
int n = rand() % 10 + 2;
int a[n + 1];
for (int m = 0; m < n; m++)
a[m] = rand() % 256;
test(a, n);
}
}
int main() {
int a[3] = { 7, 2, 1 };
printf("%ld\n", sum1(a, 3));
tests(1000000);
puts("Done");
}
As it turns out, the sums[] array is not needed either, as the running sum needs only one location. This effectively makes this answer similar to the others:
long sum1(int a[], int n) {
int long sums = 0;
long sum = 0;
for (int i = n - 2; i >= 0; i--) {
sums = a[i+1] + sums;
sum += a[i] * sums;
}
return sum;
}
I am building a Game of Life CA in C++ (openFrameworks). As I am new to C++, I was wondering if someone could let me know whether I am setting up the vectors correctly in the following code. The CA does not draw to the screen, and I am not sure if this is a result of how I set up the vectors. I have to use 1D vectors as I intend to send data to Pure Data, which only handles 1D structures.
GOL::GOL() {
init();
}
void GOL::init() {
for (int i =1;i < cols-1;i++) {
for (int j =1;j < rows-1;j++) {
board.push_back(rows * cols);
board[i * cols + j] = ofRandom(2);
}
}
}
void GOL::generate() {
vector<int> next(rows * cols);
// Loop through every spot in our 2D array and check spots neighbors
for (int x = 0; x < cols; x++) {
for (int y = 0; y < rows; y++) {
// Add up all the states in a 3x3 surrounding grid
int neighbors = 0;
for (int i = -1; i <= 1; i++) {
for (int j = -1; j <= 1; j++) {
neighbors += board[((x+i+cols)%cols) * cols + ((y+j+rows)%rows)];
}
}
// A little trick to subtract the current cell's state since
// we added it in the above loop
neighbors -= board[x * cols + y];
// Rules of Life
if ((board[x * cols + y] == 1) && (neighbors < 2)) next[x * cols + y] = 0; // Loneliness
else if ((board[x * cols + y] == 1) && (neighbors > 3)) next[x * cols + y] = 0; // Overpopulation
else if ((board[x * cols + y] == 0) && (neighbors == 3)) next[x * cols + y] = 1; // Reproduction
else next[x * cols + y] = board[x * cols + y]; // Stasis
}
}
// Next is now our board
board = next;
}
This looks weird in your code:
void GOL::init() {
for (int i =1;i < cols-1;i++) {
for (int j =1;j < rows-1;j++) {
board.push_back(rows * cols);
board[i * cols + j] = ofRandom(2);
}
}
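}
"vector.push_back( value )" means "append value to the end of this vector"; see the std::vector::push_back reference.
So each iteration appends the number rows * cols to the vector. After doing this, you access board[i * cols + j] and change it into a random value, but at that point the vector only has as many elements as there have been iterations, so that index can lie past the end. What I think you are trying to do is: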
void GOL::init() {
// create the vector with cols * rows spaces:
for(int i = 0; i < cols * rows; i++){
board.push_back( ofRandom(2));
}
}
This is how you would access every element at position x,y in your vector:
for (int x = 0; x < cols; x++) {
for (int y = 0; y < rows; y++) {
board[x * rows + y] = blabla;
}
}
This means that in void GOL::generate() you are not accessing the right position when you do this:
neighbors += board[((x+i+cols)%cols) * cols + ((y+j+rows)%rows)];
I think you want to do this:
neighbors += board[((x+i+cols)%cols) * rows + ((y+j+rows)%rows)];
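So use x * rows + y instead of x * cols + y. The same applies to the other board[x * cols + y] and next[x * cols + y] accesses in generate().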
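One way to keep the indexing consistent is to route every access through a single helper. The sketch below assumes x runs over [0, cols) and y over [0, rows), as in generate(); `cell_index` and `count_neighbors` are hypothetical helpers, not part of the original class:

```cpp
#include <cassert>
#include <vector>

// Index of cell (x, y): each x owns a contiguous run of `rows` cells.
inline int cell_index(int x, int y, int rows) { return x * rows + y; }

// Count the live neighbours of (x, y) on a wrap-around board of 0/1 cells.
int count_neighbors(const std::vector<int>& board, int cols, int rows, int x, int y) {
    assert(static_cast<int>(board.size()) == cols * rows);
    int neighbors = 0;
    for (int i = -1; i <= 1; i++)
        for (int j = -1; j <= 1; j++)
            neighbors += board[cell_index((x + i + cols) % cols, (y + j + rows) % rows, rows)];
    return neighbors - board[cell_index(x, y, rows)];  // exclude the cell itself
}
```

With the stride defined in one place, a board whose rows and cols differ cannot silently end up indexed with the wrong one.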