Iterate through 2D Array block by block in C++

I'm working on a homework assignment for an image shrinking program in C++. My picture is represented by a 2D array of pixels; each pixel is an object with members "red", "green" and "blue." To solve the problem I am trying to access the 2D array one block at a time and then call a function which finds the average RGB value of each block and adds a new pixel to a smaller image array. The size of each block (or scale factor) is input by the user.
As an example, imagine a 100-item 2D array like myArray[10][10]. If the user input a shrink factor of 3, I would need to break out mini 2D arrays of size 3 by 3. I do not have to account for overflow, so in this example I can ignore the last row and the last column.
I have most of the program written, including the function to find the average color. I am confused about how to traverse the 2D array. I know how to cycle through a 2D array sequentially (one row at a time), but I'm not sure how to get little squares within an array.
Any help would be greatly appreciated!

Something like this should work:
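for (size_t bx = 0; bx + block_width <= width; bx += block_width)
    for (size_t by = 0; by + block_height <= height; by += block_height) {
        float sum = 0;
        // Add up every element of the block whose corner is (bx, by).
        for (size_t x = 0; x < block_width; ++x)
            for (size_t y = 0; y < block_height; ++y) {
                sum += array[bx + x][by + y];
            }
        float average = sum / (block_width * block_height);
        // Each block produces exactly one element of the smaller array.
        new_array[bx / block_width][by / block_height] = average;
    }
Here width and height are the dimensions of the whole array, and block_width and block_height are the side lengths of one block (the blue squares in your diagram). The loop conditions skip any partial block at the edge. For RGB pixels, keep a separate sum for each channel.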
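Applied to the pixel image from the question, the same block traversal might look like the sketch below. The Pixel struct, the Image alias and the shrink function are assumptions made for illustration (the question only says that a pixel has red, green and blue members), and integer channels are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pixel type; the question only names the three members.
struct Pixel {
    int red;
    int green;
    int blue;
};

using Image = std::vector<std::vector<Pixel>>;  // indexed as image[row][col]

// Shrinks src by factor, averaging each factor-by-factor block.
// Leftover rows and columns that do not fill a whole block are ignored.
Image shrink(const Image& src, std::size_t factor) {
    assert(factor > 0);
    const std::size_t rows = src.size() / factor;
    const std::size_t cols = src.empty() ? 0 : src[0].size() / factor;
    Image dst(rows, std::vector<Pixel>(cols));

    for (std::size_t br = 0; br < rows; ++br) {
        for (std::size_t bc = 0; bc < cols; ++bc) {
            long r = 0, g = 0, b = 0;
            // Walk the block whose top-left corner is (br * factor, bc * factor).
            for (std::size_t y = 0; y < factor; ++y) {
                for (std::size_t x = 0; x < factor; ++x) {
                    const Pixel& p = src[br * factor + y][bc * factor + x];
                    r += p.red;
                    g += p.green;
                    b += p.blue;
                }
            }
            const long n = static_cast<long>(factor * factor);
            dst[br][bc] = Pixel{static_cast<int>(r / n),
                                static_cast<int>(g / n),
                                static_cast<int>(b / n)};
        }
    }
    return dst;
}
```

With a 10 by 10 image and a factor of 3, this yields a 3 by 3 result and never reads the last row or column.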

This is how you traverse an array in C++:
for (int i = 0; i < m; i++) {
    for (int j = 0; j < n; j++) {
        // do something with myArray[i][j], where i is the row and j is the column
    }
}
I'll leave figuring out how to cycle through the array in different ways as an exercise for the reader.

You could use two nested loops, one for x and one for y, and move the starting point of those loops across the image. As this is homework I won't post any code, but you should be able to work it out.

Related

How can I most efficiently map a kernel range for a hermitian (symmetric) matrix in OpenCL?

I'm working on an OpenCL project to generate very large hermitian (symmetric) matrices, and I am trying to determine the best way to generate the work IDs.
A hermitian matrix is symmetric along the diagonal, so that M(i,j) = M*(j,i).
In the brute force way, the for loop looks like:
for(int i = 0; i < N; i++)
{
    for(int j = 0; j < N; j++)
    {
        complex<float> result = doSomeCalculation();
        M(i,j) = result;
    }
}
However, taking advantage of the hermitian property, the loop can be made to be twice as efficient by only calculating the upper triangular part of the matrix and duplicating the result in the lower triangular part:
for(int i = 0; i < N; i++)
{
    for(int j = i; j < N; j++)
    {
        complex<float> result = doSomeCalculation();
        M(i,j) = result;
        M(j,i) = conj(result);
    }
}
In both loops, doSomeCalculation() is an expensive operation, and each entry in the matrix is completely uncorrelated from every other entry (i.e. the problem is stupidly parallel).
My question is this:
How can I implement the second loop with doSomeCalculation as an OpenCL kernel so that the thread IDs are most efficiently used (i.e. so that the thread calculates both M(i,j) and M(j,i) without having to call doSomeCalculation() twice)?
You need to use a linear index, for example you can index every element of your matrix in this way:
0    1    2     ...  N-1
*    N    N+1   ...  2N-2
*    *    2N-1  ...  3N-4
...
*    *    *     ...  N(N+1)/2-1
That is, the index k is given by:
k = i*N - i*(i+1)/2 + j
where N is the size of the matrix and (i,j) are the 0-based row and column indices, with j >= i.
This relationship can be inverted; see the answer to this question, which I reproduce here for completeness:
i = floor( ( 2*N+1 - sqrt( (2*N+1)*(2*N+1) - 8*k ) ) / 2 );
j = k - N*i + i*(i+1)/2;
So you need to enqueue a 1D kernel with N(N+1)/2 work items. The work-group size is up to you (64 items per work group is usually a good choice).
Then in the OpenCL code you can retrieve the index k with:
int k = get_group_id(0)*64 + get_local_id(0);
Then use the two relationships above to recover the indices (i,j) of the matrix element you need to compute.
Moreover, notice that you can also save space by storing your hermitian matrix as a linear vector with N(N+1)/2 elements.
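The two index formulas can be checked on the host before they go into a kernel. The following is a small C++ sketch, not OpenCL code; packedIndex and unpackIndex are hypothetical names, and the correction loops after the square root are an added safeguard against floating-point rounding for very large N.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <utility>

// Row-major packed index of element (i, j), j >= i, in the upper triangle
// of an N x N matrix: k = i*N - i*(i+1)/2 + j.
std::int64_t packedIndex(std::int64_t N, std::int64_t i, std::int64_t j) {
    assert(0 <= i && i <= j && j < N);
    return i * N - i * (i + 1) / 2 + j;
}

// Inverse mapping: recovers (i, j) from the packed index k.
std::pair<std::int64_t, std::int64_t> unpackIndex(std::int64_t N, std::int64_t k) {
    assert(0 <= k && k < N * (N + 1) / 2);
    const double b = 2.0 * N + 1.0;
    std::int64_t i = static_cast<std::int64_t>(
        std::floor((b - std::sqrt(b * b - 8.0 * k)) / 2.0));
    // Row i starts at index i*N - i*(i-1)/2; nudge i if rounding put it one off.
    while (i > 0 && i * N - i * (i - 1) / 2 > k) --i;
    while (i + 1 < N && (i + 1) * N - (i + 1) * i / 2 <= k) ++i;
    const std::int64_t j = k - N * i + i * (i + 1) / 2;
    return {i, j};
}
```

Inside the kernel, the work item with index k would compute element (i, j) once and write both M(i,j) and its conjugate to M(j,i).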
If your matrices are really big, then you can dice up your NxN matrix into (N/k)x(N/k) tiles, each of size kxk. Since you need only half of the data, you create a 1D NDRange of size roughly local_group_size * (N/k)*(N/k)/2.
Every tile of the matrix is processed by one work-group (the work-group size is your choice). The idea is to create an array on the host side that contains the position of every work-group in the matrix. The kernel stub should look like this:
__kernel void myKernel(
    __global int* coords,
    ....)
{
    int2 WorkGroupPositionInMatrix = vload2(get_group_id(0), coords);
    ...
    DoCalculation();
    ...
    WriteResultTwice();
    ...
    return;
}
What you need to handle by hand are the work-groups that lie on the matrix diagonal. If the matrix is big, the overhead of the work-groups on the diagonal is negligible.
A right triangle can be cut in half vertically and the smaller portion rotated to fit with the larger portion to form a rectangle of equal area. Therefore it is easy to make your triangular global work area into one that is rectangular, which fits OpenCL.
See my answer here: OpenCL efficient way to group a lower triangular matrix
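One way to realize this folding idea is to pair row y of the upper triangle with row N-1-y, since together they always hold N+1 elements. The C++ sketch below is an assumption about how such a mapping could look and is not necessarily the mapping used in the linked answer; rectToTriangle is a hypothetical name, and std::optional requires C++17.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Maps cell (y, x) of a rectangular work area with (N + 1) / 2 rows and
// N + 1 columns onto element (i, j), j >= i, of the upper triangle of an
// N x N matrix. Returns std::nullopt for padding cells (these occur only
// when N is odd).
std::optional<std::pair<int, int>> rectToTriangle(int N, int y, int x) {
    assert(N > 0 && 0 <= y && y < (N + 1) / 2 && 0 <= x && x <= N);
    if (x < N - y) {
        return std::make_pair(y, y + x);  // long row y, taken as-is
    }
    const int i = N - 1 - y;  // short row folded in beside row y
    if (i == y) {
        return std::nullopt;  // middle row of an odd N would be visited twice
    }
    return std::make_pair(i, x - 1);
}
```

In a kernel, y and x would come from a 2D global ID, and work items that land on a padding cell would simply return.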

Put a multidimensional array into a one-dimensional array

I've got a question. I'm writing a simple application in C++ and I have the following problem:
I want to use a two-dimensional array to specify the position of an object (x and y coordinates). But when I created such an array, I got many access violations when I accessed it. I'm not quite sure where those violations came from, but I think my stack is not big enough and I should use pointers. But when I searched for a way to put a multidimensional array on the heap and point to it, the solutions were too complicated for me.
So I remembered there's a way to use a "normal" one-dimensional array as a multidimensional array. But I do not remember exactly how to access it the right way. I declared it this way:
char array [SCREEN_HEIGHT * SCREEN_WIDTH];
Then I tried to fill it this way:
for(int y = 0; y < SCREEN_HEIGHT; y++) {
    for(int x = 0; x < SCREEN_WIDTH; x++) {
        array [y + x * y] = ' ';
    }
}
But this is not right, because the position y + x * y is not unique: different (x, y) pairs map to the same index.
But I am pretty sure there was a way to do this. Maybe I am wrong, so tell me :D
If so, a solution that uses a multidimensional array would be great!
You don't want y + x*y, you want y * SCREEN_WIDTH + x. That said, a 2D array declared as:
char array[SCREEN_HEIGHT][SCREEN_WIDTH];
Has exactly the same memory layout, and you could just access it directly the way you want:
array[y][x] = ' ';
char array2D[ROW_COUNT][COL_COUNT] = { {...} };
char array1D[ROW_COUNT * COL_COUNT];
for (int row = 0; row < ROW_COUNT; row++)
{
    for (int col = 0; col < COL_COUNT; col++)
    {
        array1D[row * COL_COUNT + col] = array2D[row][col];
    }
}
You access the correct element for your 1D array by taking "current row * total columns + current column," or vice-versa if you're looping through columns first.
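Since the question also worried about stack size, the row-major formula can be wrapped in a small class that keeps the buffer on the heap. This is only a sketch; the Grid class and its member names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper: a rows x cols grid of chars in one contiguous buffer.
class Grid {
public:
    Grid(std::size_t rows, std::size_t cols, char fill = ' ')
        : rows_(rows), cols_(cols), cells_(rows * cols, fill) {}

    // Row-major addressing: element (row, col) lives at row * cols + col.
    char& at(std::size_t row, std::size_t col) {
        assert(row < rows_ && col < cols_);
        return cells_[row * cols_ + col];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    const std::vector<char>& data() const { return cells_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<char> cells_;  // std::vector allocates on the heap, not the stack
};
```

A screen buffer would then be created as Grid screen(SCREEN_HEIGHT, SCREEN_WIDTH) and written with screen.at(y, x) = ' '.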

Generating incomplete iterated function systems

I am doing this assignment for fun.
http://groups.csail.mit.edu/graphics/classes/6.837/F04/assignments/assignment0/
There are sample outputs on the site if you want to see how it is supposed to look. It involves iterated function systems, whose algorithm, according to the assignment, is:
for "lots" of random points (x0, y0)
    for k=0 to num_iters
        pick a random transform fi
        (xk+1, yk+1) = fi(xk, yk)
    display a dot at (xk, yk)
I am running into trouble with my implementation, which is:
void IFS::render(Image& img, int numPoints, int numIterations){
    Vec3f color(0,1,0);
    float x,y;
    float u,v;
    Vec2f myVector;
    for(int i = 0; i < numPoints; i++){
        x = (float)(rand()%img.Width())/img.Width();
        y = (float)(rand()%img.Height())/img.Height();
        myVector.Set(x,y);
        for(int j = 0; j < numIterations;j++){
            float randomPercent = (float)(rand()%100)/100;
            for(int k = 0; k < num_transforms; k++){
                if(randomPercent < range[k]){
                    matrices[k].Transform(myVector);
                }
            }
        }
        u = myVector.x()*img.Width();
        v = myVector.y()*img.Height();
        img.SetPixel(u,v,color);
    }
}
This is how I pick a random transform from the input matrices:
fscanf(input,"%d",&num_transforms);
matrices = new Matrix[num_transforms];
probablility = new float[num_transforms];
range = new float[num_transforms+1];
for (int i = 0; i < num_transforms; i++) {
    fscanf (input,"%f",&probablility[i]);
    matrices[i].Read3x3(input);
    if(i == 0) range[i] = probablility[i];
    else range[i] = probablility[i] + range[i-1];
}
My output shows only the beginnings of a Sierpinski triangle (1000 points, 1000 iterations):
My dragon is better, but still needs some work (1000 points, 1000 iterations):
If you have RAND_MAX=4 and picture width 3, an evenly distributed sequence like [0,1,2,3,4] from rand() will be mapped to [0,1,2,0,1] by your modulo code, i.e. some numbers will occur more often. You need to cut off the numbers that are at or above the highest multiple of the target range that does not exceed RAND_MAX, i.e. at or above ((RAND_MAX / 3) * 3). Just check for this limit and call rand() again.
Since you have to fix that error in several places, consider writing a utility function. Then, reduce the scope of your variables. The u,v declaration makes it hard to see that these two are just used in three lines of code. Declare them as "unsigned const u = ..." to make this clear and additionally get the compiler to check that you don't accidentally modify them afterwards.
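A utility function along those lines might look like the following sketch. The name uniformBelow is hypothetical, and the function assumes the target range n is no larger than RAND_MAX.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: returns a uniformly distributed integer in [0, n)
// by rejecting the top values of rand() that would cause modulo bias.
int uniformBelow(int n) {
    assert(n > 0 && n <= RAND_MAX);
    // Largest multiple of n that does not exceed RAND_MAX.
    const int limit = (RAND_MAX / n) * n;
    int r;
    do {
        r = std::rand();
    } while (r >= limit);  // retry values at or above the limit
    return r % n;
}
```

The starting point could then be written as uniformBelow(img.Width()) instead of rand()%img.Width(). In C++11 and later, std::uniform_int_distribution from <random> does the same job.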

C++ Filling an 1D array to represent a n-dimensional object based on a straight line segment

READ FIRST: I have rewritten this question with the help of a friend to be hopefully more specific in what is required. It can be found here
I'm not very clear on n-cubes, but I believe they are what I am referring to as the square family.
New Question Wording:
Perhaps I wasn't clear enough. What I'm asking, is how to set a 1D array to hold data for a cloud of a number of evenly-spaced points that form the most complete representation of the space occupied by an n-cube of n dimensions.
In 1D this would simply fill the array with a series of 1D co-ordinates creating a line segment. A 1-cube.
In 2D however this would fill every first co-ordinate with the x value and every second with the y, generating the most complete square possible for that spacing and number of particles. The most complete possible 2-cube.
In 3D, this would fill every first with x, every second with y and every third with z, generating the most complete possible cube for that spacing and number of particles. The most complete possible 3-cube.
I wish to be able to do this for any reasonable combination of number of particles, spacing and dimensions. Ideally I could do at least up to a 4-cube using a generic fill algorithm for all n-cubes initialised to double * parts_
Yet another definition of what kind of object I'm trying to represent:
In 1D its a line. Sweep it through the second dimension it becomes a square. Sweep that square through the third, it becomes a cube. I presume this behaviour extends past three dimensions and wish to store a cloud of points representing the space taken up by one of these objects of any reasonable dimension, spacing and number of points in a 1D array.
The original wording of the question:
I'm struggling to find a good way to put this question but here goes. I'm making a system that uses a 1D array implemented as double * parts_ = new double[some_variable];. I want to use this to hold co-ordinates for a particle system that can run in various dimensions.
What I want to be able to do is write a generic fill algorithm for filling this in n dimensions with a common increment in all directions to a variable size. Examples will serve best I think.
Consider the case where the number of particles stored by the array is 4
In 1D this produces 4 elements in the array because each particle only has one co-ordinate.
1D:
{0, 25, 50, 75};
In 2D this produces 8 elements in the array because each particle has two co-ordinates..
2D:
{0, 0, 0, 25, 25, 0, 25, 25}
In 3D this produces 12 elements in the array because each particle now has three co-ordinates
{0, 0, 0, 0, 0, 25, 0, 0, 50, ... }
These examples are still not quite accurate, but they hopefully will suffice.
The way I would do this normally for two dimensions:
int i = 0;
for(int x = 0; x < parts_size_ / dims_ / dims_ * 25; x += 25) {
    for(int y = 0; y < parts_size_ / dims_ / dims_ * 25; y += 25) {
        parts_[i] = x;
        parts_[i+1] = y;
        i+=2;
    }
}
How can I implement this for n-dimensions where 25 can be any number?
The straight line part is because it seems to me logical that a line is a somewhat regular shape in 1D, as is a square in 2D, and a cube in 3D. It seems to me that it would follow that there would be similar shapes in this family that could be implemented for 4D and higher dimensions via a similar fill pattern. This is the shape I wish to set my array to represent.
EDIT: Apparently I'm trying to fill this array to represent the n-cube with the fewest missing elements for the given n, spacing and number of elements. If that makes my goal any clearer.
As I understand it, you aren't sure how to process every element of a multidimensional array (stored as a 1D array), where N is an arbitrary number of dimensions.
Processing a multidimensional array with an arbitrary number of dimensions goes like this:
#include <stdio.h>
#include <vector>
using std::vector;
int main(int argc, char** argv){
    const int numDimensions = 10;
    vector<int> counters;
    vector<int> dimensionSizes;
    counters.resize(numDimensions);
    dimensionSizes.resize(numDimensions);
    for (int i = 0; i < numDimensions; i++){
        counters[i] = 0;
        dimensionSizes[i] = 13;
    }
    long long arraySize = 1;
    for (int i = 0; i < numDimensions; i++)
        arraySize *= dimensionSizes[i];
    printf("%lld\n", arraySize);
    //13^10 does not fit in an int, so the element index must be a long long too
    for (long long elementIndex = 0; elementIndex < arraySize; elementIndex++){
        fprintf(stderr, "element %08lld: ", elementIndex);
        for (int i = 0; i < numDimensions; i++)
            fprintf(stderr, "%04d ", counters[i]);
        fprintf(stderr, "\n");
        //at this point you have the 1D element index
        //AND all n-dimensional coordinates stored in the counters array.
        //Just use them for your data
        //"counters" is the N-dimensional coord. XYZW etc.
        //advance the counters like an odometer, carrying into the next dimension
        for (int i = 0; i < numDimensions; i++){
            counters[i] = counters[i] + 1;
            if (counters[i] < dimensionSizes[i])
                break;
            else
                counters[i] = 0;
        }
    }
    return 0;
}
Just make an array of the structs you need to access in N dimensions, and index it with elementIndex at the point marked by the comments. An array of structs that represent the data you want to store in N dimensions is the better choice. If you don't want to do that, you'll have to multiply elementIndex by the number of doubles per element.
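For the flat array of doubles described in the question, the same odometer-style counter can write the coordinates directly. This is a sketch under assumptions: fillGrid, dims, perAxis and spacing are illustrative names, and perAxis (the number of points along each axis) would have to be derived from the particle count, for example as the largest integer whose dims-th power does not exceed it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fills a flat array with the coordinates of a regular grid of points:
// dims coordinates per point, perAxis points along each axis, spacing apart.
std::vector<double> fillGrid(std::size_t dims, std::size_t perAxis, double spacing) {
    assert(dims > 0 && perAxis > 0);
    std::size_t count = 1;  // total number of points, perAxis^dims
    for (std::size_t d = 0; d < dims; ++d) count *= perAxis;

    std::vector<double> parts;
    parts.reserve(count * dims);
    std::vector<std::size_t> counters(dims, 0);  // current grid position

    for (std::size_t p = 0; p < count; ++p) {
        // Emit the coordinates of the current point.
        for (std::size_t d = 0; d < dims; ++d)
            parts.push_back(counters[d] * spacing);
        // Advance like an odometer, last axis fastest.
        for (std::size_t d = dims; d-- > 0;) {
            if (++counters[d] < perAxis) break;
            counters[d] = 0;
        }
    }
    return parts;
}
```

Calling fillGrid(2, 2, 25.0) reproduces the 2D example from the question: {0, 0, 0, 25, 25, 0, 25, 25}.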

UBLAS Matrix Finding Surrounding Values of a Cell?

I am looking for an elegant way to implement this. Basically I have an m x n matrix, where each cell holds a pixel value and the rows and columns correspond to the pixel rows and pixel columns of the image.
I mapped points from an HDF file, along with their corresponding pixel values, so we have a lot of empty pixels, which are filled with 0.
Now what I need to do is take the average of the surrounding cells to estimate a pixel value for each missing cell.
I can brute force this, but it becomes ugly fast. Is there any sort of elegant solution for this?
There's a well-known optimization to this filtering problem.
Integrate the cells in one direction (say horizontally)
Integrate the cells in the other direction (say vertically)
Take the difference between each cell and its N'th neighbor to the right.
Take the difference between each cell and its N'th lower neighbor.
Like this:
for (i = 0; i < h; ++i)
    for (j = 0; j < w-1; ++j)
        A[i][j+1] += A[i][j];
for (i = 0; i < h-1; ++i)
    for (j = 0; j < w; ++j)
        A[i+1][j] += A[i][j];
for (i = 0; i < h; ++i)
    for (j = 0; j < w-N; ++j)
        A[i][j] = A[i][j+N] - A[i][j];
for (i = 0; i < h-N; ++i)
    for (j = 0; j < w; ++j)
        A[i][j] = A[i+N][j] - A[i][j];
What this does is:
The first pass makes each cell the sum of all of the cells on that row to its left, including itself.
After the 2nd pass, each cell is the sum of all of the cells in a rectangle above and left of itself (including its own row and column).
After the 3rd pass, each cell is the sum of a rectangle above and to the right of itself, N columns wide.
After the 4th pass each cell is the sum of an NxN rectangle below and to the right of itself.
This takes 4 operations per cell to compute the sum, as opposed to 8 for brute force (assuming you're doing a 3x3 averaging filter).
The cool thing is that if you use ordinary two's-complement arithmetic (in C++, unsigned integers, whose wraparound is well defined), you don't have to worry about any overflows in the first two passes; they cancel out in the last two passes.
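A variant of the same idea keeps the integrated table separate from the image, which makes the box sums easier to index at the cost of extra memory. The sketch below is that out-of-place variant (a summed-area table), not the in-place four-pass version above; Matrix, summedArea and boxSum are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<long long>>;

// Builds a summed-area table s with one extra row and column of zeros, so
// that s[i][j] is the sum of all cells a[r][c] with r < i and c < j.
Matrix summedArea(const Matrix& a) {
    const std::size_t h = a.size();
    const std::size_t w = h ? a[0].size() : 0;
    Matrix s(h + 1, std::vector<long long>(w + 1, 0));
    for (std::size_t i = 0; i < h; ++i)
        for (std::size_t j = 0; j < w; ++j)
            s[i + 1][j + 1] = a[i][j] + s[i][j + 1] + s[i + 1][j] - s[i][j];
    return s;
}

// Sum of rows [r0, r1) and columns [c0, c1), using four table lookups.
long long boxSum(const Matrix& s, std::size_t r0, std::size_t c0,
                 std::size_t r1, std::size_t c1) {
    assert(r0 <= r1 && r1 < s.size() && c0 <= c1 && c1 < s[0].size());
    return s[r1][c1] - s[r0][c1] - s[r1][c0] + s[r0][c0];
}
```

To average only the known neighbors of an empty pixel, one option is to build a second table over a 0/1 mask of the non-empty cells and divide the box sum of the values by the box sum of the mask.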
The main issues here are utilizing all available cores and cache efficiency.
You might be interested in checking a fast implementation of convolution.
However, since you are doing this with Boost, you can check how it is done in this Boost example.
I believe you only have to change the convolution kernel for your specialized task.