I am writing a CUDA kernel to create a 3x3 covariance matrix for each location in the rows*cols main matrix. The resulting 3D array is rows*cols*9 in size, which I allocated with a single malloc, so I need to access it through a single index value.
The 9 values of each 3x3 covariance matrix are set according to the appropriate row r and column c from some other 2D arrays.
In other words, I need to calculate the appropriate index to access the 9 elements of the 3x3 covariance matrix, the row and column offset into the 2D input matrices, and the appropriate index into the storage array.
I have tried to simplify it down to the following:
//I am calling this kernel with 1D blocks who are 512 cols x 1row. TILE_WIDTH=512
int bx = blockIdx.x;
int by = blockIdx.y;
int tx = threadIdx.x;
int ty = threadIdx.y;
int r = by + ty;
int c = bx*TILE_WIDTH + tx;
int offset = r*cols+c;
int ndx = r*cols*rows + c*cols;
if((r < rows) && (c < cols)){ //this IF statement is trying to avoid the case where a threadblock went bigger than my original array..not sure if correct
d_cov[ndx + 0] = otherArray[offset];//otherArray just contains a value that I might do some operations on to set each of the ndx0-ndx9 values in d_cov
d_cov[ndx + 1] = otherArray[offset];
d_cov[ndx + 2] = otherArray[offset];
d_cov[ndx + 3] = otherArray[offset];
d_cov[ndx + 4] = otherArray[offset];
d_cov[ndx + 5] = otherArray[offset];
d_cov[ndx + 6] = otherArray[offset];
d_cov[ndx + 7] = otherArray[offset];
d_cov[ndx + 8] = otherArray[offset];
}
When I check this array against the values calculated on the CPU, which loops over i (rows), j (cols) and k (the 9 values), the results do not match up.
In other words, d_cov[i*rows*cols + j*cols + k] != correctAnswer[i][j][k].
Can anyone give me any tips on how to solve this problem? Is it an indexing problem, or some other logic error?
Rather than the answer (which I haven't stared hard enough to find), here's the technique I usually use for debugging these sorts of issues. First, set all values in your destination array to NaN. (You can do this via cudaMemset -- set every byte to 0xFF.) Then try uniformly setting every location to the value of the row, then inspect the results. In theory, it should look something like:
0 0 0 ... 0
1 1 1 ... 1
. . . . .
. . . . .
. . . . .
n n n ... n
If you see NaNs, you've failed to write to an element; if you see row elements out of place, something is wrong, and they'll usually be out of place in a suggestive pattern. Do something similar with the column value, and with the plane. Usually, this trick helps me find which part of the index calculation is awry, which is most of the battle. Hope that helps.
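The fill-then-inspect check described above can be tried on the host before touching the kernel. The following is a minimal Python sketch, not the original CUDA code: `count_unwritten`, `question_index` and `packed_index` are hypothetical names, and the loop over (r, c) merely stands in for the thread grid.

```python
import math

def count_unwritten(rows, cols, index_fn):
    """Fill a rows*cols*9 buffer with NaN, replay the writes, and return
    (slots still NaN, writes that fell outside the buffer)."""
    size = rows * cols * 9
    buf = [math.nan] * size
    out_of_range = 0
    for r in range(rows):              # stands in for the thread grid
        for c in range(cols):
            base = index_fn(r, c, rows, cols)
            for k in range(9):
                if 0 <= base + k < size:
                    buf[base + k] = float(r)   # write the row number
                else:
                    out_of_range += 1
    return sum(math.isnan(v) for v in buf), out_of_range

def question_index(r, c, rows, cols):
    return r * cols * rows + c * cols  # formula used in the question

def packed_index(r, c, rows, cols):
    return (r * cols + c) * 9          # nine consecutive slots per (r, c)

print(count_unwritten(4, 2, question_index))
print(count_unwritten(4, 2, packed_index))
```

Any NaN left over, or any write outside the buffer, points at the index formula rather than at the values being stored.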
I might be just stupid, but what is the logic in this line?
int ndx = r*cols*rows + c*cols;
Shouldn't you have
int ndx = offset*9;
If the size of your covariance array is rows*cols*9, then offset*9 takes you to the same location in the 3D covariance array as where you are in your input array. So offset*9+0 would be location (0,0) of the 3x3 covariance matrix of the element at offset, offset*9+1 would be (0,1), offset*9+2 would be (0,2), offset*9+3 would be (1,0) and so on until offset*9+8, which is (2,2).
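To see that offset*9 + k addresses every slot exactly once, here is a small Python sketch of the mapping and its inverse. The helper names `cov_index` and `cov_coords` are made up for illustration; only the formula comes from the answer above.

```python
def cov_index(r, c, k, cols):
    """Flat index of element k (0..8) of the 3x3 block for pixel (r, c)."""
    return (r * cols + c) * 9 + k

def cov_coords(idx, cols):
    """Inverse mapping: flat index -> (r, c, i, j), where k = 3*i + j."""
    offset, k = divmod(idx, 9)
    r, c = divmod(offset, cols)
    i, j = divmod(k, 3)
    return r, c, i, j

rows, cols = 3, 5
indices = [cov_index(r, c, k, cols)
           for r in range(rows) for c in range(cols) for k in range(9)]
# Every slot of the rows*cols*9 buffer is visited exactly once, in order.
assert indices == list(range(rows * cols * 9))
```

With this layout the CPU comparison would read d_cov[(i*cols + j)*9 + k], assuming k runs from 0 to 8.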
I am trying to learn CUDA. I started with matrix multiplication on the GPU, with the help of this article.
My main problem is that I am unable to understand how to access a 2D array in the kernel, since it is accessed differently from the conventional way (matrix[i][j]).
This is the part where I am stuck:
for (int i = 0; i < N; i++) {
tmpSum += A[ROW * N + i] * B[i * N + COL];
}
C[ROW * N + COL] = tmpSum;
I could understand how ROW and COL were derived:
int ROW = blockIdx.y*blockDim.y+threadIdx.y;
int COL = blockIdx.x*blockDim.x+threadIdx.x;
Any explanation with an example is highly appreciated. Thanks!
Matrices are stored contiguously, i.e. one row after the other at consecutive locations. What you see here is called flat addressing, i.e. turning the two-element index (ROW, COL) into an offset from the first element: element (ROW, COL) of a matrix with N columns sits at ROW * N + COL.
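As an illustration of flat addressing, the sketch below runs the same inner loop as the kernel on plain Python lists. It is a host-side stand-in, not CUDA: the two outer loops play the role of the threads, and `matmul_flat` is a made-up name.

```python
def matmul_flat(A, B, N):
    """Multiply two N x N matrices stored row by row in flat lists."""
    C = [0] * (N * N)
    for ROW in range(N):          # one (ROW, COL) pair per GPU thread
        for COL in range(N):
            tmp_sum = 0
            for i in range(N):
                # A[ROW][i] lives at ROW*N + i, B[i][COL] at i*N + COL
                tmp_sum += A[ROW * N + i] * B[i * N + COL]
            C[ROW * N + COL] = tmp_sum
    return C

# [[1, 2], [3, 4]] times [[5, 6], [7, 8]], both flattened row by row
print(matmul_flat([1, 2, 3, 4], [5, 6, 7, 8], 2))  # [19, 22, 43, 50]
```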
How can I find the element at index x,y of a given matrix after rotation, without actually rotating the whole matrix?
That is, I am only interested in that one coordinate; I don't want to rotate the whole matrix and then read the element at that index.
Example:
suppose a matrix is given
1 2 3
4 5 6
7 8 9
and I want to find the element at 1,1 after rotating the matrix by 90 degrees.
The answer should be "7".
**NOTE**: Without performing the rotation on the whole matrix.
And if I want the element at 1,2, then the answer should be "4".
I hope I communicated the question clearly. Please help if you know a solution or algorithm for this.
Thank you.
Suppose you have an m x n matrix (indices starting at 1) and you are interested in the position of M[i][j] after rotation.
So, after a rotation of 90 degrees clockwise, M[i][j] -> M[j][m+1-i].
As in your example, M[3][1] will be at M[1][3+1-3] = M[1][1] after rotation.
Hope this solves your problem.
Here's one way to solve the problem (other than using somebody else's solution).
It's fairly clear that the column index of each element is the row index of that element after rotation (at least, I hope that's clear).
So the remaining problem is to find the column index of an element after rotation.
The first row will become the last column, the second will be the second last, and so on until the last row which becomes the first column.
One way of viewing this is that we have the sequence (of rows) i = 1, 2, ..., m and want to map that to the sequence (of columns) j = m, m - 1, m - 2, ..., 2, 1.
But m = m + 1 - 1, m - 1 = m + 1 - 2, m - 2 = m + 1 - 3, ..., 1 = m + 1 - m.
So the desired sequence is j = m + 1 - i.
In other words, M[i][j] -> M[j][m + 1 - i].
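Read backwards, the mapping M[i][j] -> M[j][m + 1 - i] tells you which original element ends up at a given position. Below is a minimal Python sketch with 0-based indices; the function name `element_after_cw_rotation` is hypothetical, and a clockwise rotation is assumed.

```python
def element_after_cw_rotation(M, a, b):
    """Element at row a, column b (0-based) of M rotated 90 degrees
    clockwise, read directly from M without building the rotation."""
    m = len(M)                 # number of rows of the original matrix
    return M[m - 1 - b][a]

M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
print(element_after_cw_rotation(M, 0, 0))  # 7
print(element_after_cw_rotation(M, 0, 1))  # 4
```

The two calls correspond to positions 1,1 and 1,2 of the question once the indices are shifted to start at 0.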
You want to map:
(x, y) -> (x', y')
Assume the following (1):
x' = ax + by + c
y' = dx + ey + f
Now, (1, 1) maps to (W, 1) (2):
W = a + b + c
1 = d + e + f
(1, W) maps to (1, 1) (3):
1 = a + bW + c
1 = d + eW + f
and (W, H) maps to (1, H) (4):
1 = aW + bH + c
H = dW + eH + f
Solve equations 2, 3 and 4 and substitute the results into 1 to get the mapping. (Hint: b = -1, e = 0)
// For a 90 degree clockwise rotation, with x and y starting at 0, not 1.
// x is the column and y is the row of the element in the rotated matrix.
// Assuming a square matrix.
template<class T, int size>
T elemAfter90degRot(int x, int y, const T (&mat)[size][size]) {
    int j = y;
    int i = size - 1 - x;
    return mat[i][j];
}
I think that should do the trick for a 90 degree rotation of a square matrix.
I am trying to create a neighborhood for each pixel using the pixel matrix, i.e. the matrix of pixel values of a 1-band image. I have to form a 3*3 matrix around each element of the 9*9 matrix, with that element at the center and its neighbors around it. Thus the element at position (0,0) will have the neighborhood
[[0 0 0],
[0 2 3],
[0 3 4]]
The same happens for all elements in the first and last row and column. The attached image may help in understanding this better.
So the resultant matrix will have the size of 81*81. It is not necessary to store the small matrices in matrix form.
I have tried the following:
n = size[0]
z = 3
x = y = 0
m = 0
while all([x<0, y<0, x>=n, y>=n]):
    continue
else:
    for i in range(0, n):
        arcpy.AddMessage("Hello")
        for x in range(m, m+3):
            temp_matrix = [ [ 0 for i in range(3) ] for j in range(3) ]
            for y in range(m, m+3):
                temp_matrix[x][y] = arr_Pixels[x][y]
            m += 1
            y += 1
        temp_List.append(temp_matrix)
But I am getting the error: list assignment index out of range. It also looks too lengthy and confusing. I understand the error occurs because the length of temp_matrix is never increased.
Is there a better way to build these matrices for an image? The smaller matrices can be saved in a list rather than in a matrix. Please help me.
Update #2
n = size[0]
new_matrix = []
for i in range(0, n):
    for j in range(0, n):
        temp_mat = [ [ 0 for k in range(3) ] for l in range(3) ]
        for k in range(i-1, i+2):
            for l in range(j-1, j+2):
                if any([k<0, l<0, k>n-1, l>n-1]):
                    temp_mat[k][l] = 0
                else:
                    temp_mat[k][l] = arr_Pixels[k][l]
        new_matrix.append(temp_mat)
I think one issue is your use of while/else. The code in else executes once, when the while condition becomes false (and the loop was not left via break), and the while does not repeat after that. This question might be helpful.
Thus, once it enters else, it never checks again that x<=n and y<=n, meaning that x and y can grow beyond n, which I assume is the length of arr_Pixels.
A better way to do it would be to use two nested for loops that run from 0 to n, create the temporary neighborhood matrices, and add them to the new matrix. Here is a rough outline:
new_matrix = []  # future 9x9 matrix
for i in range(0, n):
    for j in range(0, n):
        # create a neighborhood matrix going around (i, j)
        # add temp matrix to new_matrix
        pass
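This method avoids having to check i and j themselves, because the loops guarantee that they stay between 0 and n-1. Only the neighbor indices i-1, i+1, j-1 and j+1 can still fall outside the matrix at the borders, and those cells should be treated as 0.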
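The outline can be filled in roughly as follows. This is a sketch under assumptions: `pixels` is a list of equally long rows, `neighborhoods` is a made-up name, and out-of-image neighbors are taken to be 0. Note that the 3x3 block is indexed with di + 1 and dj + 1 (always 0..2), not with the image coordinates k and l.

```python
def neighborhoods(pixels):
    """Return one 3x3 block per pixel, row by row; cells that fall
    outside the image are left at 0."""
    n_rows, n_cols = len(pixels), len(pixels[0])
    result = []
    for i in range(n_rows):
        for j in range(n_cols):
            block = [[0] * 3 for _ in range(3)]
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    k, l = i + di, j + dj      # image coordinates
                    if 0 <= k < n_rows and 0 <= l < n_cols:
                        block[di + 1][dj + 1] = pixels[k][l]
            result.append(block)
    return result

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
print(neighborhoods(image)[0])  # [[0, 0, 0], [0, 1, 2], [0, 4, 5]]
```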
I found a better way of doing it: padding the whole matrix with zeros, which resolves the negative indexing problem.
The matrix can be padded as
pixels = np.pad(arr_Pixels, (1,1), mode='constant', constant_values=(0, 0))
It adds one row and one column of zeros on each side of the matrix.
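Once the matrix is padded, every 3x3 neighborhood is a plain slice. The sketch below does the padding with ordinary lists so that it runs without NumPy; with NumPy, the padded array would come from the np.pad call above. `pad_with_zeros` and `neighborhoods_from_padded` are hypothetical helper names.

```python
def pad_with_zeros(pixels):
    """Surround the matrix with a one-cell border of zeros."""
    width = len(pixels[0]) + 2
    padded = [[0] * width]
    for row in pixels:
        padded.append([0] + list(row) + [0])
    padded.append([0] * width)
    return padded

def neighborhoods_from_padded(pixels):
    """One 3x3 block per original pixel, taken as slices of the padded
    matrix: pixel (i, j) sits at (i + 1, j + 1) after padding."""
    padded = pad_with_zeros(pixels)
    return [[row[j:j + 3] for row in padded[i:i + 3]]
            for i in range(len(pixels))
            for j in range(len(pixels[0]))]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
print(neighborhoods_from_padded(image)[0])  # [[0, 0, 0], [0, 1, 2], [0, 4, 5]]
```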
I would like to compute, for each element of the matrix, the sum over an interval of -30 to +30 around the current position. To be more precise, suppose I have an element a[i][j]; I would like the sum of all elements
a[i][j - 30] + a[i][j - 29] + a[i][j - 28] + ... + a[i][j + 28] + a[i][j + 29] + a[i][j + 30]
I have also computed the integral image of the matrix, so that I can easily and efficiently compute the sum with the formula A + D - B - C.
Here you can see a post how it works
http://docs.opencv.org/modules/imgproc/doc/miscellaneous_transformations.html#integral
My question is how I can compute the sum efficiently using the already computed integral image. Or is there another efficient way?
Thank you for your time!
P.S. I know that I could compute the sum for the first 30 elements and then, at each step, add one element at the front and subtract one at the back. But I wonder if I could do it faster.
By using integral images, you can get the sum of the values in any given rectangle from four lookups (the illustration on Wikipedia labels them A, B, C and D).
You just need to set the proper values for A, B, C and D.
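Mat1f I; // your integral image; I[y][x] is assumed to hold the sum of a[0..y][0..x] inclusive
// (the output of cv::integral has one extra row and column, so shift the indices by 1 if you use it directly)
// for each i,j (check boundaries!)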
int radius = 30;
float A = I[i-1][j-radius-1];
float B = I[i-1][j+radius];
float C = I[i][j-radius-1];
float D = I[i][j + radius];
float sum = D - B - C + A;
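For a version that can be run without OpenCV, here is a small Python sketch of the same four-lookup sum. It assumes the integral image carries one extra row and column of zeros at the top and left (the layout cv::integral produces) and clamps the window at the image border; `integral_image` and `row_window_sum` are made-up names.

```python
def integral_image(a):
    """I[y][x] = sum of a[0..y-1][0..x-1]; row 0 and column 0 are zero."""
    rows, cols = len(a), len(a[0])
    I = [[0] * (cols + 1) for _ in range(rows + 1)]
    for y in range(rows):
        for x in range(cols):
            I[y + 1][x + 1] = a[y][x] + I[y][x + 1] + I[y + 1][x] - I[y][x]
    return I

def row_window_sum(I, i, j, radius):
    """Sum of a[i][j-radius .. j+radius], clamped to the image width."""
    cols = len(I[0]) - 1
    left = max(j - radius, 0)
    right = min(j + radius, cols - 1)
    # D - B - C + A for the one-row rectangle [i, i] x [left, right]
    return I[i + 1][right + 1] - I[i][right + 1] - I[i + 1][left] + I[i][left]

a = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
I = integral_image(a)
print(row_window_sum(I, 1, 2, 1))  # 6 + 7 + 8 = 21
```

Since the window covers a single row, a per-row running (prefix) sum would give the same result with two lookups instead of four.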
My teacher asked me to use a linked list to represent a polynomial, as in this code:
class Node {
public:
int data; // only data, has no power
Node* next;
};
class PolyList {
private:
Node* pHead;
public:
PolyList();
~PolyList();
............
};
The list is read from the file input.txt.
Example: 2 4 0 3 ---> Polynomial = ( 2x^3 + 4x^2 + 3 )
How can I implement a method that multiplies two polynomials, list1 and list2?
I searched Google and this site, but only found polynomials whose nodes store both a coefficient and a power. My polynomial stores only coefficients and I cannot change this structure.
I would appreciate any help. Thanks a lot.
What you want to do is convolution of the coefficients. If you have access to Matlab (or Octave), you can try it out:
% Note this is Matlab, just for demonstration
p1 = [1 1]; % x + 1
p2 = [1 0]; % x
p3 = conv(p1, p2) %x*(x + 1) => x^2 + x
% gives p3 = [1 1 0], i.e., x^2 + x
Edit: I didn't give any details about implementing this - You can probably find examples of convolution using linked lists by googling it.
You could use two nested for-loops to multiply the two lists together, accumulating the products in a third list (pseudocode):
define list3 as a new PolyList of length (length of list1) + (length of list2) - 1, filled with zeros
for each element A at index x in list1
    for each element B at index y in list2
        set list3 element at index x + y to (A * B + (current element at index x + y))
So for example (x^3 - 2x + 1) * (x^2 + 4) = [1, 0, -2, 1] * [0, 1, 0, 4] = [0, 1, 0, 2, 1, -8, 4], i.e. x^5 + 2x^3 + x^2 - 8x + 4.
*Note: The resulting list has (length of list1) + (length of list2) - 1 elements, so almost twice the input length, because for example x^3 * x^3 = x^6, which is recorded at index 6 counting from the right (starting at 0).*
Also note: In the example both arrays are padded to the same length, but that is not required. With the coefficients stored highest power first, index x + y is the right slot for any two lengths, provided list3 has (length of list1) + (length of list2) - 1 elements.
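The nested loops can be tried out on a coefficient-only linked list like the one in the question. The sketch below is in Python rather than C++ and is only an illustration: `Node` mirrors the question's class, while `from_coeffs`, `to_coeffs` and `multiply` are hypothetical helpers. Coefficients are assumed to be stored highest power first, as in input.txt.

```python
class Node:
    """One coefficient; the power is implied by the position in the list."""
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

def from_coeffs(coeffs):
    head = None
    for value in reversed(coeffs):
        head = Node(value, head)
    return head

def to_coeffs(head):
    out = []
    while head is not None:
        out.append(head.data)
        head = head.next
    return out

def multiply(head1, head2):
    len1, len2 = len(to_coeffs(head1)), len(to_coeffs(head2))
    if len1 == 0 or len2 == 0:
        return None
    result = [0] * (len1 + len2 - 1)   # accumulator, starts at zero
    i, p = 0, head1
    while p is not None:
        j, q = 0, head2
        while q is not None:
            result[i + j] += p.data * q.data
            q, j = q.next, j + 1
        p, i = p.next, i + 1
    return from_coeffs(result)

# (x^3 - 2x + 1) * (x^2 + 4)
product = multiply(from_coeffs([1, 0, -2, 1]), from_coeffs([1, 0, 4]))
print(to_coeffs(product))  # [1, 0, 2, 1, -8, 4]
```

The accumulator is a plain list here for brevity; it could equally be a pre-built linked list of zeros that is walked alongside the two inputs.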
A good way to figure out how to program a problem like this is to imagine exactly the steps you would do to solve the problem, write those down, and then translate that into the language you're using.
One way to do it is the following:
#include <cstddef>
#include <list>
#include <vector>

std::list<int> multiply(const std::list<int>& l1, const std::list<int>& l2) {
    if (l1.empty() || l2.empty()) return {};
    // Index i of l1 only ever shifts the indices of l2 upwards,
    // so the result has l1.size() + l2.size() - 1 coefficients.
    // The accumulator must start at zero.
    std::vector<int> m(l1.size() + l2.size() - 1, 0);
    std::size_t i = 0;
    for (int a : l1) {
        std::size_t j = 0;
        for (int b : l2) {
            m[i + j] += a * b;  // accumulate the product at index i + j
            ++j;
        }
        ++i;
    }
    return std::list<int>(m.begin(), m.end());
}
You put the coefficient of the i-th power of the resulting polynomial in m[i].
To fill m, it's sufficient to iterate over every pair (i, j) in [0, l1.size) x [0, l2.size) and add to m[i + j] the product of the i-th coefficient of the first polynomial and the j-th coefficient of the second one.
If you want to multiply two polynomials of order M and N then the resulting polynomial will be of order M + N. So you need to create an output linked list with M + N + 1 nodes, i.e. one less than the sum of the lengths of the two input lists. You then just iterate through the two input lists, multiplying and summing the terms into the output list.
Hint: you might want to try doing this by hand first, i.e. with pencil and paper, so that you understand the process before you try to code it.