I wrote a piece of C++ code that works on a huge 3D matrix stored in a boost::multi_array.
For every matrix element I want to sum up the neighborhood in a certain distance dist. Every element is weighted according to its distance to the center. The center element should not be included in the sum.
The distance dist is given by the user and can vary. My program computes the right results but is slow when the matrix gets big. I sometimes have matrices with more than 100000 elements.
So my question is: Is there any way to make this computation faster? Maybe also by using another library?
The part consists mainly of two functions. In the first function I access every matrix element and calculate the sum of the neighborhood. The inputMatrix is a 3D boost multi array:
boost::multi_array<float, 3> inputMatrix = imageMatrix;
T actualElement;
int posActualElement;
for (int depth = 0; depth<inputMatrix.shape()[2]; depth++) {
for (int row = 0; row<inputMatrix.shape()[0]; row++) {
for (int col = 0; col<inputMatrix.shape()[1]; col++) {
indexOfElement[0] = row;
indexOfElement[1] = col;
indexOfElement[2] = depth;
//get actual Element if it is the centre of a whole neighborhood
actualElement = inputMatrix[row][col][depth];
if (!std::isnan(actualElement)) {
//get the sum of the actual neighborhood
sumOfActualNeighborhood = getNeighborhood3D(inputMatrix, indexOfElement);
}
}
}
}
The function getNeighborhood3D looks as follows:
template <class T, size_t R>
T NGTDMFeatures3D<T, R>::getNeighborhood3D(boost::multi_array<T, R> inputMatrix, int *indexOfElement) {
std::vector<T> neighborhood;
T actElement;
float weight;
for (int k = -dist; k<dist + 1; k++) {
for (int i = -dist; i<dist + 1; i++) {
for (int j = -dist; j<dist + 1; j++) {
if (i != 0 || j != 0 || k != 0) {
if (indexOfElement[0] + i>-1 && indexOfElement[0] + i<inputMatrix.shape()[0] && indexOfElement[1] + j>-1 && indexOfElement[1] + j<inputMatrix.shape()[1] && indexOfElement[2] + k>-1 && indexOfElement[2] + k<inputMatrix.shape()[2]) {
actElement = inputMatrix[indexOfElement[0] + i][indexOfElement[1] + j][indexOfElement[2] + k];
if (!std::isnan(actElement)) {
weight = calculateWeight3D(i, j, k, normNGTDM, actualSpacing);
neighborhood.push_back(weight*actElement);
}
}
}
}
}
}
T sum = accumulate(neighborhood.begin(), neighborhood.end(), 0);
sum = sum / neighborhood.size();
return sum;
}
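Two things in the code above are likely to cost time: getNeighborhood3D takes the matrix by value, so the whole matrix is copied on every call, and every weighted neighbor is pushed into a std::vector before being summed. A minimal sketch of the same computation without either cost is shown below. It is an illustration under assumptions, not the original class: Volume is a hypothetical flat stand-in for the boost::multi_array, weightOf stands in for calculateWeight3D, and returning 0 for an empty neighborhood is a convention chosen here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the 3D matrix: a flat buffer indexed as (x, y, z).
struct Volume {
    int nx, ny, nz;
    std::vector<float> data;
    float at(int x, int y, int z) const {
        return data[(static_cast<std::size_t>(x) * ny + y) * nz + z];
    }
};

// Weighted mean of the non-NaN neighbours within `dist` of (cx, cy, cz).
// The volume is taken by const reference, so nothing is copied per call,
// and the sum is accumulated directly instead of being stored in a vector.
template <class WeightFn>
float neighbourhoodMean(const Volume& v, int cx, int cy, int cz, int dist,
                        WeightFn weightOf) {
    assert(dist >= 0);
    // Clamp the loop ranges once instead of bounds-checking every element.
    const int x0 = std::max(cx - dist, 0), x1 = std::min(cx + dist, v.nx - 1);
    const int y0 = std::max(cy - dist, 0), y1 = std::min(cy + dist, v.ny - 1);
    const int z0 = std::max(cz - dist, 0), z1 = std::min(cz + dist, v.nz - 1);
    float sum = 0.0f;
    int count = 0;
    for (int x = x0; x <= x1; ++x) {
        for (int y = y0; y <= y1; ++y) {
            for (int z = z0; z <= z1; ++z) {
                if (x == cx && y == cy && z == cz) continue;  // skip the centre
                const float e = v.at(x, y, z);
                if (std::isnan(e)) continue;
                sum += weightOf(x - cx, y - cy, z - cz) * e;
                ++count;
            }
        }
    }
    // Assumption: an empty neighbourhood yields 0 rather than dividing by zero.
    return count > 0 ? sum / static_cast<float>(count) : 0.0f;
}
```

With boost::multi_array the same idea would mean changing the parameter to a const reference; if the weights depend only on the offsets (i, j, k), they could also be computed once per dist and reused.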
I'm building Space Invaders in C++ (using the MBed platform) for a microcontroller. I've used a 2D Vector of object pointers to organise the invaders.
The movement algorithm is below, and runs in the main while loop for the game. Basically, I get the highest/lowest x and y values of invaders in the vector, and use those to set bounds based on the screen size (the HEIGHT variable).
I also get the first invader's position, velocity, and width, which I apply changes to based on the bounds above.
Then I iterate through the whole vector again and apply all those changes. It sort of works – the invaders move – but the bounds don't seem to take effect, and so they fly off screen. I feel like I'm missing something really dumb, thanks in advance!
void Army::move_army() {
int maxy = HEIGHT - 20;
int Ymost = 0; // BOTTOM
int Yleast = 100; // TOP
int Xmost = 0; // LEFT
int Xleast = 100; // RIGHT
int first_row = _rows;
int first_column = _columns;
int firstWidth = 0;
Vector2D firstPos;
Vector2D firstVel;
for (int i = 0; i < _rows; i++) {
for (int n = 0; n < _columns; n++) {
bool state = invaders[i][n]->get_death();
if (!state) {
if (i < first_row && n < first_column) {
firstPos = invaders[i][n]->get_pos();
firstVel = invaders[i][n]->get_velocity();
firstWidth = invaders[i][n]->get_width();
}
Vector2D pos = invaders[i][n]->get_pos();
if (pos.y > Ymost) {Ymost = pos.y;} // BOTTOM
else if (pos.y < Yleast) {Yleast = pos.y;} // TOP
else if (pos.x > Xmost) {Xmost = pos.x;} // LEFT
else if (pos.x < Xleast) {Xleast = pos.x;} // RIGHT
}
}
}
firstVel.y = 0;
if (Xmost >= (WIDTH - 8) || Xleast <= 2) {
firstVel.x = -firstVel.x;
firstPos.y += _inc;
// reverse x velocity
// increment y position
}
else if (Ymost > maxy) {
_inc = -_inc;
// reverse increment
}
else if (Yleast < 2) {
_inc = -_inc;
// reverse increment
}
for (int i = 0; i < _rows; i++) {
int setx = firstPos.x;
if (i > 0) {firstPos.y += 9;}
for (int n = 0; n < _columns; n++) {
invaders[i][n]->set_velocity(firstVel);
invaders[i][n]->set_pos(setx,firstPos.y);
setx += firstWidth + 2;
}
}
It looks like you have your assignment cases reversed. Assignment always goes left <- right (the value on the right is stored into the variable on the left), so in the first case you're changing the Ymost value, not pos.y. It looks like if you swap those four assignments in your bounds checking it should work. Good luck!
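Separately from the point above, and as an observation rather than a confirmed diagnosis: the four bound checks in the question are chained with else if, so at most one of Ymost, Yleast, Xmost and Xleast is updated per invader. A minimal sketch of tracking all four bounds independently follows. Vector2D here is a stand-in struct and armyBounds is a hypothetical helper, neither is taken from the original code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Vector2D { float x; float y; };  // stand-in for the question's Vector2D

struct Bounds { float minX, maxX, minY, maxY; };

// Bounding box of all living invaders; every bound is checked for every
// position, so no update is skipped by an else-if chain.
Bounds armyBounds(const std::vector<Vector2D>& alive) {
    assert(!alive.empty());
    Bounds b{alive[0].x, alive[0].x, alive[0].y, alive[0].y};
    for (const Vector2D& p : alive) {
        b.minX = std::min(b.minX, p.x);
        b.maxX = std::max(b.maxX, p.x);
        b.minY = std::min(b.minY, p.y);
        b.maxY = std::max(b.maxY, p.y);
    }
    return b;
}
```

Seeding the bounds from the first living invader also avoids depending on start values such as 0 and 100 matching the screen size.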
The code inside the for loop is for the x and y (j and i) "coordinates" from a 2d array. How could I implement this neighbor/index finding in a 1d array?
I think I could implement it for the first four equations, but I'm confused about how to implement up-left etc.
for(int i=0; i<cols*rows; i++){
//Counts current index's 8 neighbour int values
int count=0;
int x = i%cols;
int y = i/cols; //row index: divide by the row length (cols), not by rows
//rows y i
//cols x j
count+= [grid][i][(j-1+cols)%cols] //left
+[grid][i][(j+1+cols)%cols] //right
+[grid][(i-1+rows)%rows][j] //up
+[grid][(i+1+rows)%rows][j] //down
+[grid][(i-1+rows)%rows][ (j-1+cols)%cols] //up-left
+[grid][(i+1+rows)%rows][ (j+1+cols)%cols] //down-right
+[grid][(i+1+rows)%rows][ (j-1+cols)%cols] //down-left
+[grid][(i-1+rows)%rows][ (j+1+cols)%cols] ;//up-right
}
Starting with a 1-D vector:
int rows = 10;
int cols = 10;
vector<int> grid(rows * cols);
You can manage this in different ways, for example:
for(int y = 0; y < rows; y++)
{
for(int x = 0; x < cols; x++)
{
int point = grid[y * cols + x]; // each row holds cols elements
}
}
Where you can access any point at any given x and y in a 2-dimensional plane.
Top-left is:
x = 0;
y = 0;
bottom-right is
x = cols - 1;
y = rows - 1;
And so on.
Use a function like this
inline int idx(const int i, const int j, const int cols)
{
return i * cols + j;
}
to convert the 2d indices to 1d indices. Here i is the row, j is the column, and cols is the length of one row.
This way you don't have to change your algorithm.
Usage would be grid[idx(i, (j-1+cols)%cols, cols)].
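Putting the index helper together with the wrap-around arithmetic from the question, all eight neighbors (including up-left and the other diagonals) can be visited with two small offset loops. This is a minimal sketch assuming a row-major layout; neighbourSum is a hypothetical name, and the idx defined here takes the row length (cols) as its third argument.

```cpp
#include <cassert>
#include <vector>

// Row-major index: i is the row, j is the column, cols is the row length.
inline int idx(int i, int j, int cols) { return i * cols + j; }

// Sum of the 8 neighbours of cell (i, j), wrapping around the grid edges.
int neighbourSum(const std::vector<int>& grid, int rows, int cols, int i, int j) {
    assert(rows > 0 && cols > 0);
    int count = 0;
    for (int di = -1; di <= 1; ++di) {
        for (int dj = -1; dj <= 1; ++dj) {
            if (di == 0 && dj == 0) continue;  // skip the cell itself
            const int ni = (i + di + rows) % rows;  // wrapped row
            const int nj = (j + dj + cols) % cols;  // wrapped column
            count += grid[idx(ni, nj, cols)];
        }
    }
    return count;
}
```

To drive this from a single 1d loop index n, recover the coordinates with i = n / cols and j = n % cols.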
The basic formula for computing the 1d coordinate from the 2d index pattern is usually one of the following:
row_index * row_length + column_index
column_index * column_length + row_index
Which one applies to your case depends on whether you would like to have a row-based or column-based memory layout for your 2d array. It makes sense to factor out the computation of this index into a separate function, as suggested in the other answer.
Then you just need to fill in the values somehow.
You could do it like this, for example:
// iterate big picture
// TODO: make sure to handle the edge cases appropriately
for (int i_row = 1; i_row < n_rows - 1; i_row++) {
for (int i_col = 1; i_col < n_cols -1; i_col++) {
// compute values
dst[i_row*n_cols+i_col] = 0;
for (int r = i_row-1; r < i_row+2; r++) {
for (int c = i_col-1; c < i_col+2; c++) {
dst[i_row*n_cols+i_col] += src[r*n_cols + c];
}
}
}
}
Assuming src and dst are distinct 1d vectors of size n_rows*n_cols...
Integer Range = 1;
for(Integer k = -Range; k <= Range; ++k)
{
for(Integer j = -Range; j <= Range; ++j)
{
for(Integer i = -Range; i <= Range; ++i)
{
Integer MCID = GetCellID(&CONSTANT_BOUNDINGBOX, CIDX + i, CIDY + j, CIDZ + k);
if(MCID < 0 || MCID >= c_CellNum)
{
continue;
}
unsigned int TriangleNum = c_daCell[MCID].m_TriangleNum;
for(unsigned int l = 0; l < TriangleNum; ++l)
{
TriangleID=c_daCell[MCID].m_TriangleID[l];
// No need to calculate again for the same triangle
if( TriangleID >= 0 && TriangleID < c_TriangleNum && TriangleID != NearestID)
{
CDistance Distance ;
Distance.Magnitude = CalcDistance(&c_daTriangles[TriangleID], &TargetPosition, &Distance.Direction);
if(Distance.Magnitude < NearestDistance.Magnitude)
{
NearestDistance = Distance;
NearestID = TriangleID;
}
}
}
}
}
}
}
c_daSTLDistance[ID] = NearestDistance;
c_daSTLID[ID] = NearestID;
GetCellID is the function that returns the cell ID (CID) of the cell at position CIDX, CIDY, CIDZ along the three axes.
The code above is a function that calculates the distance between a point and the triangles of an STL mesh. It runs correctly, but it is too slow because of the many nested loops. Is there any technique for optimizing these loops?
Hello, I'm having trouble with a little program I am trying to write. The problem is: given a matrix of any size (let's say 4x4 for this example), find the largest product of n numbers in a row (let's say n = 3). The 3 numbers in a row can be horizontal, vertical, or diagonal. So here's a matrix:
1 1 2 5
1 5 2 4
1 7 2 3
1 8 2 1
If n were equal to 3, my largest product would be 280 (5*7*8). I have my matrix loaded into a 2D vector. I'm not too picky about how the program works (brute force is fine). So far I know I'm going to need at least two nested for loops to go through each starting location in the matrix, but I haven't been successful in finding the correct answer. Any advice will help, thank you.
Version to find max product in rows using rolling multiplication to save some resources. This rolling procedure means that we don't have to multiply n values to find each product of these n values, but instead we have to just do one multiplication and one division:
if (currN == N) { // compute full product first time
while (currn) {
product *= (*it3++);
--currn;
}
} else { // rolling computation
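// divide out the value that left the window, then multiply in the new one
// (a / b on its own would truncate in integer arithmetic; assumes no zero entries)
product = product / (*(it3 - 1)) * (*(it3 + n - 1));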
it3 += n;
}
It is up to you to complete this to handle also columns:
populate matrix:
#include <cstdio>
#include <vector>
#include <algorithm>
#include <iterator>
#include <iostream>
using namespace std;
typedef vector< vector< int> > Matrix;
typedef Matrix::iterator outIt;
typedef vector< int>::iterator inIt;
void fillMatrix( Matrix& matrix) {
outIt it = matrix.begin();
(*it).push_back( 1);
(*it).push_back( 1);
(*it).push_back( 2);
(*it).push_back( 5);
++it;
(*it).push_back( 1);
(*it).push_back( 5);
(*it).push_back( 2);
(*it).push_back( 4);
++it;
(*it).push_back( 1);
(*it).push_back( 7);
(*it).push_back( 2);
(*it).push_back( 3);
++it;
(*it).push_back( 1);
(*it).push_back( 8);
(*it).push_back( 2);
(*it).push_back( 1);
}
print matrix and find max product in rows:
void printMatrix( Matrix& matrix) {
outIt it = matrix.begin();
while ( it != matrix.end()) {
inIt it2 = (*it).begin();
while ( it2 != (*it).end()) {
printf( "%d", *it2);
++it2;
}
printf( "\n");
++it;
}
}
/**
 * Largest product in row using rolling multiplication
 *
 * @param matrix the input matrix
 * @param n number of factors
 * @param v factors of largest product
 * @return largest product
 */
int largestProductInRow( Matrix& matrix, int n, vector< int>& v) {
if ( n > matrix.size()) return -1;
int res = 0;
int N = matrix.size() - n + 1; // number of products in row (or column)
/* search in rows */
outIt it = matrix.begin();
while (it != matrix.end()) {
inIt it2 = (*it).begin();
int currN = N;
int product = 1;
while (currN) { // rolling product calculation
inIt it3 = it2;
int currn = n;
if (currN == N) { // compute full product first time
while (currn) {
product *= (*it3++);
--currn;
}
} else { // rolling computation
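// divide out the value that left the window, then multiply in the new one
// (a / b on its own would truncate in integer arithmetic; assumes no zero entries)
product = product / (*(it3 - 1)) * (*(it3 + n - 1));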
it3 += n;
}
if (product > res) {
res = product;
copy(it3 - n, it3, v.begin());
}
--currN;
++it2;
}
++it;
}
return res;
}
usage:
/*
*
*/
int main(int argc, char** argv) {
Matrix matrix( 4, vector< int>());
fillMatrix( matrix);
printMatrix( matrix);
vector< int> v(3);
int res = largestProductInRow( matrix, 3, v);
printf( "res:%d\n", res);
copy( v.begin(), v.end(), ostream_iterator<int>(cout, ","));
return 0;
}
result:
res:42
7,2,3,
RUN SUCCESSFUL (total time: 113ms)
Let's say we have an s x t matrix (s columns and t rows), indexed as m[row][column].
int res = 0;
if(s >= n)
{
for (int r = 0; r < t; ++r) // for each row
{
for (int i = 0; i <= s-n; ++i) //moving through the row
{
int mul = m[r][i]; //first factor: row r, starting column i
for (int j = 1; j < n; ++j) //calculating product in a row
{
mul*=m[r][i+j];
}
if(mul > res)
{
res = mul;
//save i, j here if needed
}
}
}
}
if(t >= n)
{
for (int c = 0; c < s; ++c) // for each column
{
for (int i = 0; i <= t-n; ++i) //moving through the column
{
int mul = m[i][c]; //first factor: starting row i, column c
for (int j = 1; j < n; ++j) //calculating product in a column
{
mul*=m[i+j][c];
}
if(mul > res)
{
res = mul;
//save i, j here if needed
}
}
}
}
If you insist on brute-force, then as you said, you need to iterate over all [x,y],
which will be the starting points of the rows.
From these you can iterate over k adjacent elements in all directions.
You can store the directions as vectors in an array.
This would run in O(k n^2).
For an n x n matrix and k elements in a row, C-like pseudocode would look like this (note there is no bounds checking, for the sake of simplicity):
// define an array of directions as [x,y] unit vectors
// you only need to check in 4 directions, other 4 are the same, just reversed
int dirs[4][2] = {{1,0}, {1,1}, {0,1}, {-1,1}};
// iterate over all starting positions
for (x = 0; x < n; ++x) {
for (y = 0; y < n; ++y) {
// iterate over all directions
for (d = 0; d < 4; ++d) {
result = 1;
// iterate over elements in row starting at [x,y]
// going in direction dirs[d]
for (i = 0; i < k; ++i) {
// multiply current result by the element,
// which is i places far from the beginning [x,y]
// in the direction pointed by dirs[d]
new_x = x + i * dirs[d][0];
new_y = y + i * dirs[d][1];
// you need to check the bounds, i'm not writing it here
// if new_x or new_y are outside of the matrix
// then continue with next direction
result *= matrix[new_x][new_y];
}
if (result > max) {
max = result;
}
}
}
}
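The pseudocode above leaves out the bounds check. A minimal runnable sketch that adds it could look like the following; largestProduct is a hypothetical name, the matrix is assumed to be a rectangular vector of vectors indexed as m[row][column], and returning 0 when no run of length k fits is a convention chosen for this sketch.

```cpp
#include <cassert>
#include <vector>

// Largest product of k adjacent values (horizontal, vertical, or diagonal).
long long largestProduct(const std::vector<std::vector<int>>& m, int k) {
    assert(k >= 1);
    // Four directions as (row step, column step); the other four are reversals.
    static const int dirs[4][2] = {{1, 0}, {1, 1}, {0, 1}, {-1, 1}};
    const int rows = static_cast<int>(m.size());
    const int cols = rows ? static_cast<int>(m[0].size()) : 0;
    long long best = 0;
    bool found = false;
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            for (int d = 0; d < 4; ++d) {
                // Only the last element of the run needs a bounds check.
                const int endR = r + (k - 1) * dirs[d][0];
                const int endC = c + (k - 1) * dirs[d][1];
                if (endR < 0 || endR >= rows || endC < 0 || endC >= cols) continue;
                long long product = 1;
                for (int i = 0; i < k; ++i) {
                    product *= m[r + i * dirs[d][0]][c + i * dirs[d][1]];
                }
                if (!found || product > best) {
                    best = product;
                    found = true;
                }
            }
        }
    }
    return best;
}
```

For the 4x4 matrix from the question and k = 3, this finds the vertical run 5, 7, 8.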
Slightly better, less of a brute-force way would be to
start on the boundary of a matrix, pick a direction and go in this direction to the opposite side of the matrix, keeping the product of the last k numbers on the way.
While walking, you keep the product, multiplying it by the number you got to and dividing by the number you left k steps ago.
This way, with some bounds checking of course,
the product is always product of the last k numbers,
therefore if the current product is more than maximum, just let max = product.
This always runs in O(n^2).
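The rolling product along a single line (one row, column, or diagonal extracted into a vector) can be sketched as follows. One detail the description glosses over is that a zero cannot be divided back out, so this sketch counts zeros in the window separately. maxRollingProduct is a hypothetical name, and returning 0 for a line shorter than k is an assumption made here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Largest product of k consecutive values in `line`, using a rolling product.
long long maxRollingProduct(const std::vector<int>& line, std::size_t k) {
    assert(k >= 1);
    if (line.size() < k) return 0;
    long long product = 1;  // product of the non-zero values in the window
    std::size_t zeros = 0;  // number of zeros currently in the window
    long long best = 0;
    bool haveBest = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        // Value entering the window.
        if (line[i] == 0) ++zeros; else product *= line[i];
        // Value leaving the window (the one seen k steps ago).
        if (i >= k) {
            const int old = line[i - k];
            if (old == 0) --zeros; else product /= old;
        }
        // Once the window is full, compare its product with the maximum.
        if (i + 1 >= k) {
            const long long current = zeros ? 0 : product;
            if (!haveBest || current > best) {
                best = current;
                haveBest = true;
            }
        }
    }
    return best;
}
```

Running it over every row, column, and diagonal of the matrix gives the overall maximum with one multiplication and one division per step.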
I am trying to speed up a piece of code that is run a total of 150,000,000 times.
I have analysed it using "Very Sleepy", which has indicated that the code is spending the most time in these 3 areas, shown in the image:
The code is as follows:
double nonLocalAtPixel(int ymax, int xmax, int y, int x , vector<nodeStructure> &nodeMST, int squareDimension, Mat &inputImage) {
vector<double> nodeWeights(8,0);
vector<double> nodeIntensities(8,0);
bool allZeroWeights = true;
int numberEitherside = (squareDimension - 1) / 2;
int index = 0;
for (int j = y - numberEitherside; j < y + numberEitherside + 1; j++) {
for (int i = x - numberEitherside; i < x + numberEitherside + 1; i++) {
// out of range or the centre pixel
if (j<0 || i<0 || j>ymax || i>xmax || (j == y && i == x)) {
index++;
continue;
}
else {
int centreNodeIndex = y*(xmax+1) + x;
int thisNodeIndex = j*(xmax+1) + i;
// add to intensity list
Scalar pixelIntensityScalar = inputImage.at<uchar>(j, i);
nodeIntensities[index] = ((double)*pixelIntensityScalar.val);
// find weight from p to q
float weight = findWeight(nodeMST, thisNodeIndex, centreNodeIndex);
if (weight!=0 && allZeroWeights) {
allZeroWeights = false;
}
nodeWeights[index] = (weight);
index++;
}
}
}
// find min b
int minb = -1;
double bCost = -1; // double, so the cost returned by nonLocalWithb is not truncated
if (allZeroWeights) {
return 0;
}
else {
// iterate over all b values
for (int i = 0; i < nodeWeights.size(); i++) {
if (nodeWeights[i]==0) {
continue;
}
double thisbCost = nonLocalWithb(nodeIntensities[i], nodeIntensities, nodeWeights);
if (bCost<0 || thisbCost<bCost) {
bCost = thisbCost;
minb = nodeIntensities[i];
}
}
}
return minb;
}
Firstly, I assume the spent time indicated by Very Sleepy means that the majority of time is spent allocating the vector and deleting the vector?
Secondly, are there any suggestions to speed this code up?
Thanks
Use std::array instead of std::vector for the two fixed-size buffers, so that no heap allocation is needed on each call.
Reuse the vectors by passing them as arguments to the function, or by making them global variables if possible (I don't know the structure of the rest of the code, so I would need more information).
Allocate one vector of size 16 instead of two vectors of size 8. This will leave your memory less fragmented.
Use parallelism if findWeight is thread-safe (you would need to provide more details on that too).
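As a rough illustration of the first suggestion, the two per-call vectors can be replaced by fixed-size std::array members that need no heap allocation. This is a sketch under assumptions, not a drop-in replacement: it hard-codes a 3x3 window with 9 slots (the centre slot stays unused), and Neighbourhood, gather, weightAt and intensityAt are hypothetical names, the last two standing in for findWeight and inputImage.at.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Assumption: a 3x3 window, so 9 slots including the (unused) centre.
constexpr std::size_t kSlots = 9;

struct Neighbourhood {
    std::array<double, kSlots> weights{};      // zero-initialised, no heap use
    std::array<double, kSlots> intensities{};
    bool allZeroWeights = true;
};

// Collects weights and intensities around (y, x) without allocating.
template <class WeightFn, class IntensityFn>
Neighbourhood gather(int y, int x, int ymax, int xmax,
                     WeightFn weightAt, IntensityFn intensityAt) {
    assert(ymax >= 0 && xmax >= 0);
    Neighbourhood n;
    std::size_t index = 0;
    for (int j = y - 1; j <= y + 1; ++j) {
        for (int i = x - 1; i <= x + 1; ++i, ++index) {
            // Out of range or the centre pixel: leave the slot at zero.
            if (j < 0 || i < 0 || j > ymax || i > xmax || (j == y && i == x)) continue;
            n.intensities[index] = intensityAt(j, i);
            n.weights[index] = weightAt(j, i);
            if (n.weights[index] != 0) n.allZeroWeights = false;
        }
    }
    return n;
}
```

Because the arrays have a compile-time size, the whole struct can live on the stack or be reused across calls by the caller.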