C++ Multidimensional array

I have a 3D array
double values[30][30][30];
I have a loop where I assign values to the array.
Something like:
for (int z = 0; z < 30; z++)
    for (int y = 0; y < 30; y++)
        for (int x = 0; x < 30; x++)
            values[z][y][x] = intensity;
So this is how I am filling the array. The problem is that I want to store another variable alongside intensity. For instance, the second-to-last line should be something like
values[z][y][x] = intensity | distance;
I hope you get the idea. My knowledge is limited and I couldn't come up with a solution. Thanks for your suggestions.

This really depends on your data types. The easiest solution is using a struct:
struct data {
    float intensity; // or replace 'float' with whatever datatype you need
    float distance;
};
Use this struct instead of the datatype you're using now for the array, then later on set the values:
values[z][y][x].intensity = intensity;
values[z][y][x].distance = distance;
If you're only using small values (e.g. a char for each value), you could also use bitwise operators to store everything in a single integer:
values[z][y][x] = intensity << 8 | distance;
intensity = values[z][y][x] >> 8;
distance = values[z][y][x] & 255;
But I wouldn't advise doing so unless you're really comfortable with the value ranges involved (e.g. for packing bitmap/texture data).

Related

How to access matrix data in opencv by another mat with locations (indexing)

Suppose I have a Mat of indices (locations) called B. Say this Mat has dimensions 1 x 100, and suppose we have another Mat, called A, full of data, with the same dimensions as B.
Now, I would like to access the data of A through B. Usually I would create a for loop and take, for each element of B, the corresponding element of A. For the most fussy of the site, this is the code I would write:
for (int i = 0; i < B.cols; i++) {
    int index = B.at<int>(0, i);
    std::cout << A.at<int>(0, index) << std::endl;
}
Ok, now that I've shown you what I could do, I'm asking whether there is a way to access matrix A, still using the B indices, in a smarter and faster way, as one can do in Python with the numpy.take() function.
This operation is called remapping. In OpenCV, you can use function cv::remap for this purpose.
Below I present a very basic example of how the remap algorithm works; please note that I don't handle border conditions in this example, but cv::remap does - it allows you to use mirroring, clamping, etc., to specify what happens when the indices exceed the dimensions of the image. I also don't show how interpolation is done; check the cv::remap documentation that I've linked to above.
If you are going to use remapping you will probably have to convert indices to floating point; you will also have to introduce another array of indices that should be trivial (all equal to 0) if your image is one-dimensional. If this starts to represent a problem because of performance, I'd suggest you implement the 1-D remap equivalent yourself. But benchmark first before optimizing, of course.
For all the details, check the documentation, which covers everything you need to know to use the algorithm.
cv::Mat_<float> remap_example(const cv::Mat_<float>& image,
                              const cv::Mat_<float>& positions_x,
                              const cv::Mat_<float>& positions_y)
{
    // sizes of the two positions arrays must be the same
    int size_x = positions_x.cols;
    int size_y = positions_x.rows;
    cv::Mat_<float> out(size_y, size_x);
    for (int y = 0; y < size_y; ++y)
        for (int x = 0; x < size_x; ++x)
        {
            // note: cv::Mat_ is indexed (row, column)
            float ps_x = positions_x(y, x);
            float ps_y = positions_y(y, x);
            // use interpolation to determine the intensity at image(ps_y, ps_x);
            // at this point also handle border conditions
            float interpolated = bilinear_interpolation(image, ps_x, ps_y); // placeholder helper
            out(y, x) = interpolated;
        }
    return out;
}
One fast way is to use pointers for both A (data) and B (indices).
const int* pA = A.ptr<int>(0);
const int* pIndexB = B.ptr<int>(0);
int sum = 0;
for (int i = 0; i < B.cols; ++i)
{
    sum += pA[*pIndexB++];
}
Note: be careful with the pixel type; in this case (as you wrote in your code) it is int!
Note 2: using cout for each point access renders the optimization useless!
Note 3: in this article Satya compares four methods of pixel access, and the fastest seems to be forEach: https://www.learnopencv.com/parallel-pixel-access-in-opencv-using-foreach/

SSE copy data to variables

I'm optimizing a piece of code that moves particles on the screen around gravity fields. For this we're told to use SSE. Now after rewriting this little bit of code, I was wondering if there is an easier/smaller way of storing the values back in the array of particles.
Here's the code before:
for (unsigned int i = 0; i < PARTICLES; i++) {
m_Particle[i]->x += m_Particle[i]->vx;
m_Particle[i]->y += m_Particle[i]->vy;
}
And here's the code after:
for (unsigned int i = 0; i < PARTICLES; i += 4) {
// Particle position/velocity x & y
__m128 ppx4 = _mm_set_ps(m_Particle[i]->x, m_Particle[i+1]->x,
m_Particle[i+2]->x, m_Particle[i+3]->x);
__m128 ppy4 = _mm_set_ps(m_Particle[i]->y, m_Particle[i+1]->y,
m_Particle[i+2]->y, m_Particle[i+3]->y);
__m128 pvx4 = _mm_set_ps(m_Particle[i]->vx, m_Particle[i+1]->vx,
m_Particle[i+2]->vx, m_Particle[i+3]->vx);
__m128 pvy4 = _mm_set_ps(m_Particle[i]->vy, m_Particle[i+1]->vy,
m_Particle[i+2]->vy, m_Particle[i+3]->vy);
union { float newx[4]; __m128 pnx4; };
union { float newy[4]; __m128 pny4; };
pnx4 = _mm_add_ps(ppx4, pvx4);
pny4 = _mm_add_ps(ppy4, pvy4);
m_Particle[i+0]->x = newx[3]; // Particle i + 0
m_Particle[i+0]->y = newy[3];
m_Particle[i+1]->x = newx[2]; // Particle i + 1
m_Particle[i+1]->y = newy[2];
m_Particle[i+2]->x = newx[1]; // Particle i + 2
m_Particle[i+2]->y = newy[1];
m_Particle[i+3]->x = newx[0]; // Particle i + 3
m_Particle[i+3]->y = newy[0];
}
It works, but it looks way too large for something as simple as adding a value to another value. Is there a shorter way of doing this without changing the m_Particle structure?
There's no reason why you couldn't put x and y side by side in one __m128, shortening the code somewhat:
for (unsigned int i = 0; i < PARTICLES; i += 2) {
    // _mm_set_ps takes its arguments highest lane first,
    // so pnew[0] receives the last argument below
    __m128 pos = _mm_set_ps(m_Particle[i+1]->y, m_Particle[i]->y,
                            m_Particle[i+1]->x, m_Particle[i]->x);
    __m128 vel = _mm_set_ps(m_Particle[i+1]->vy, m_Particle[i]->vy,
                            m_Particle[i+1]->vx, m_Particle[i]->vx);
    union { float pnew[4]; __m128 pnew4; };
    pnew4 = _mm_add_ps(pos, vel);
    m_Particle[i+0]->x = pnew[0]; // Particle i + 0
    m_Particle[i+0]->y = pnew[2];
    m_Particle[i+1]->x = pnew[1]; // Particle i + 1
    m_Particle[i+1]->y = pnew[3];
}
But really, you've encountered the "Array of structs" vs. "Struct of arrays" issue. SSE code works better with a "Struct of arrays" like:
struct Particles
{
    float x[PARTICLES];
    float y[PARTICLES];
    float xv[PARTICLES];
    float yv[PARTICLES];
};
Another option is a hybrid approach:
struct Particles4
{
    __m128 x;
    __m128 y;
    __m128 xv;
    __m128 yv;
};
Particles4 particles[PARTICLES / 4];
Either way will give simpler and faster code than your example.
I went a slightly different route to simplify: process 2 elements per iteration and pack them as (x,y,x,y) instead of (x,x,x,x) and (y,y,y,y) as you did.
If in your particle class x and y are contiguous floats and the fields are 32-bit aligned, a single operation loading x as a double will in fact load the two floats x and y at once.
for (unsigned int i = 0; i < PARTICLES; i += 2) {
    __m128d pos = _mm_setzero_pd(); // zeroed vector
    // I assume x and y are contiguous in memory,
    // so loading a double at &x loads 2 floats: x and the following y.
    pos = _mm_loadl_pd(pos, (double*)&m_Particle[i  ]->x);
    // a register holds 4 floats, i.e. 2 positions
    pos = _mm_loadh_pd(pos, (double*)&m_Particle[i+1]->x);
    // same for the velocities
    __m128d vel = _mm_setzero_pd();
    vel = _mm_loadl_pd(vel, (double*)&m_Particle[i  ]->vx);
    vel = _mm_loadh_pd(vel, (double*)&m_Particle[i+1]->vx);
    // reinterpret the bits as 4 floats and do the math
    __m128 sum = _mm_add_ps(_mm_castpd_ps(pos), _mm_castpd_ps(vel));
    // store back the same way as we loaded
    _mm_storel_pd((double*)&m_Particle[i  ]->x, _mm_castps_pd(sum));
    _mm_storeh_pd((double*)&m_Particle[i+1]->x, _mm_castps_pd(sum));
}
Also, since you mention particles: do you intend to draw them with OpenGL / DirectX? If so, you could perform this kind of computation on the GPU faster, while also avoiding data transfers from main memory to the GPU, so it's a gain on all fronts.
If that's not the case and you intend to stay on the CPU, using an SSE friendly layout like one array for positions and one for velocities could be a solution:
struct particle_data {
std::vector<float> xys, vxvys;
};
But it would have the drawback of either breaking your architecture or requiring a copy from your current array of structs into a temporary struct of arrays. The computation would be faster, but the additional copy might outweigh the gain. Only benchmarking can tell...
A last option is to sacrifice a little performance: load your data as it is and use SSE shuffle instructions to rearrange it locally in each iteration. But arguably this would make the code even harder to maintain.
When designing for performance, avoid handling arrays of structures; work with structures of arrays instead.

C++ - Convert uint8_t* image data to double** image data

I am working on a C++ function (inside my iOS app) where I have image data in the form uint8_t*.
I obtained the image data using the CVPixelBufferGetBaseAddress() function of the iOS SDK:
uint8_t *bPixels = (uint8_t *)CVPixelBufferGetBaseAddress(imageBuffer);
I have another function (from a third-party source) that performs some of the image-processing operations I would like to use on my image data, but the input for these functions is double**.
Does anyone have any idea how to go about converting this?
What other information can I provide?
The constructor prototype for the class that use double** look like:
Image(double **iPixels, unsigned int iWidth, unsigned int iHeight);
Your uint8_t *bPixels seems to hold the image data as a 1-dimensional contiguous array of length height*width. So to access the pixel in the x-th row and y-th column you have to write bPixels[x*width+y].
Image() seems to work on 2-dimensional arrays. To access pixel like above you would have to write iPixels[x][y].
So you need to copy your existing 1-dimensional array into a 2-dimensional one:
double **mypixels = new double*[height];
for (int x = 0; x < height; x++)
{
    mypixels[x] = new double[width];
    for (int y = 0; y < width; y++)
        mypixels[x][y] = bPixels[x*width + y]; // attention: normalization may be necessary here,
                                               // e.g. mypixels[x][y] = bPixels[x*width+y] / 255.0
}
Because your 1-dimensional array has pixels of type uint8_t and the 2-dimensional one pixels of type double, you must allocate new memory. Otherwise, if both had the same pixel type, the more elegant solution (a simple mapping) would be:
uint8_t **mypixels = new uint8_t*[height];
for (int x = 0; x < height; x++)
    mypixels[x] = bPixels + x*width;
Attention: besides the possibly necessary normalization, there is also an index-compatibility problem! My examples assume that the 1-dimensional array is stored row by row and that the functions working on the 2-dimensional one index it with [x][y] (first row, then column). The declaration of Image(), however, could suggest that it expects its arrays to be indexed with [y][x] instead.
I'm going to take a giant bunch of guesses here in hopes that this will lead you towards getting at the documentation and answering back. If there's no further documentation, well, here's a starting point.
Guess 1) The Image constructor requires a doubly dimensioned array where each component is an R,G,B,Alpha channel in that order. So iPixels[0] is the red data, iPixels[1] is the green data, etc.
Guess 2) Because it's not integer data, the values range from 0 to 1.
Guess 3) All of this must be pre-allocated.
Guess 4) Image data is row-major
Guess 5) Source data is BGRA
So with that in mind, starting with bPixels:
double *redData = new double[width*height];
double *greenData = new double[width*height];
double *blueData = new double[width*height];
double *alphaData = new double[width*height];
double **iPixels = new double*[4];
iPixels[0] = redData;
iPixels[1] = greenData;
iPixels[2] = blueData;
iPixels[3] = alphaData;
for (int y = 0; y < height; y++)
{
    for (int x = 0; x < width; x++)
    {
        int alpha = bPixels[(y*width + x)*4 + 3];
        int red   = bPixels[(y*width + x)*4 + 2];
        int green = bPixels[(y*width + x)*4 + 1];
        int blue  = bPixels[(y*width + x)*4];
        redData[y*width + x]   = red   / 255.0;
        greenData[y*width + x] = green / 255.0;
        blueData[y*width + x]  = blue  / 255.0;
        alphaData[y*width + x] = alpha / 255.0;
    }
}
Image newImage(iPixels,width,height);
Some of the things that can go wrong:
The source is not BGRA but RGBA, which will make all the colors wrong.
The data is not row-major, or the destination is not in slices, which will make things look all screwed up and/or seg-fault.

Voxel unique ID in 3D space

I am looking for a way to collect a set of voxels. A voxel is a 3D cell that can be full/empty/unknown and is built upon a point cloud (for data reduction). The voxel collection, once built, is never modified (it is destroyed and rebuilt each round), but different kinds of access are required (neighborhood, iteration over all voxels, direct).
The voxel space is very sparse: out of on the order of 1,000,000 possible voxels in space, at most 1000 are used.
So I decided to collect them in a hashmap (unordered, since I'm using C++) with the voxel ID as the key; an octree is overkill, I think. Now I need a function to convert both ways: from a 3D point to a voxel ID, and from the ID back to the voxel's 3D centroid.
What I find hard is doing this very fast; I'd like to have the key as a single int value, like:
unsigned int VoxelsMap::pointToVoxelId(const Vector3f& point) {
    int x = (int)floor(point[0]);
    int y = (int)floor(point[1]);
    int z = (int)floor(point[2]);
    unsigned int id = A-BIJECTIVE-FUNCTION(x, y, z); // placeholder
    return id;
}
but I can't come up with anything very fast for the bijective function (and I don't like the casts above either, for a function that must be called so often: 200 FPS x ~1000 voxels x 3 components).
So:
is the hashmap a good data structure (what worries me is the neighborhood search)?
what could A-BIJECTIVE-FUNCTION, or the whole function, look like?
Thanks.
#include <iostream>
#include <cstdint>
using namespace std;
int main()
{
    // pack three 10-bit coordinates into a single 32-bit key
    uint32_t x = 531, y = 1012, z = 7;
    uint32_t hashed = x | (y << 10) | (z << 20);
    // unpack by shifting back and masking off the other fields
    uint32_t ux = hashed & 0x3FF;
    uint32_t uy = (hashed >> 10) & 0x3FF;
    uint32_t uz = (hashed >> 20) & 0x3FF;
    cout << hashed << endl;
    cout << ux << " " << uy << " " << uz << endl;
    return 0;
}
This pack/unpack method should be about as fast as it gets.
Note that I only use 30 bits out of the 32 bits of the integer, 10 per axis, which allows a maximum world size of 1024 x 1024 x 1024 voxels. If you want to extend the world limits, you will have to use a bigger integer, e.g. 21 bits per axis in a uint64_t.
Hope this can help.
Do you want to collect them space-wise in a way that links every adjacent voxel? If so, you could use the Hoshen-Kopelman algorithm in 3D. Writing the code for that should take a day or two, and you're done. The example in the link is for 2D; extending it to 3D isn't an issue at all.
Hope this helps.
Why not use a more elaborate key for your hashmap? Instead of a simple int, you could build a tuple from your x,y,z coordinates or implement your own struct. The latter option would require implementing operator==() and a hash function. Some information about good hash functions can be found here.

Confusion about nested loops and array access

I have some C++ I'm trying to port, and I'm confused about a couple of lines and what exactly they're doing. The code is as follows; the variable im is a 2D float array of size num_rows by num_cols.
for (x = 0; x < num_cols; x++) {
    float *im_x_cp = im[1] + x; //(1)
    for (y = 1; y < num_rows; y++, im_x_cp += num_cols) {
        float s1 = *im_x_cp;
        // et cetera
    }
}
The code marked (1) is particularly confusing to me. What part of the 2d array im is this referencing?
Thanks for your help in advance.
im[1] is a pointer to an array of floats: it's the second row of your matrix.
im[1] + x is a pointer to the element at coordinate (1, x) (recall how pointer arithmetic works), and s1 is its value. Each im_x_cp += num_cols then advances the pointer by one full row, so the inner loop walks down column x.
The type of im[1] is float *. So, according to the rules of C++ pointer arithmetic:
float* im_x_cp = im[1];
im_x_cp = im_x_cp + x;
Now it's a float* pointing to item x of that row; each subsequent += num_cols steps it one row further down the same column.