Faster mean calculation (openCV, C++) - c++

I am calculating the mean value of multiple squared regions inside an image in C++. Therefore I am shifting a squared region over the image and calculating the mean calculation with the openCV "mean" function, but replaced it with a std. mean calculation (see below), which was unexpectedly faster. Nevertheless it takes ~8ms on an Android device, since the mean calculation is called about 400 times (each mean calculation takes ~0.025ms)
uchar rectSize = 10;
Rect roi(0,0,rectSize, rectSize);
int pxNumber = rectSize * rectSize;
uchar value;
//Shifting the region to the bottom
for(uchar y=0; y<NumberOfRectangles_Y; y++)
{
p = outBitMat.ptr<uchar>(y);
roi.x = rectSize;
//Shifting the region to the right
for(uchar x=0; x<NumberOfRectangles_X; x++, ++p)
{
meanCalc(normalized(roi),rectSize, pxNumber, value);
roi.x += rectSize;
}
roi.y += rectSize;
}
void meanCalc(const cv::Mat& normalized, uchar& rectSize, int& pxNumber, uchar& value)
{
for(uchar y=0; y < rectSize; y++)
{
p = normalized.ptr<uchar>(y);
for(uchar x=0; x < rectSize; x++, ++p)
{
sum += *p;
}
}
value = sum / (float)pxNumber;
}
Is there a way to speed this mean calculation of each rectangular window inside the image up? Can I do some kind of pre-pixel ordering to only calculate the mean once and being faster?
Thanks in advance
Update
Based on the answer of User 6502 and the use of a sum table I resulted in the following:
Mat tab;
integral(image,tab);
int* input = (int*)(tab.data);
value = (input[yStart*tabWidth + xStart] + input[(yStart+rectSize)*tabWidth + xStart+rectSize]
- input[yStart*tabWidth + xStart+rectSize] - input[(yStart+rectSize)*tabWidth + xStart]) / (double)pxNumber;
Whereby this function needs almost the same times. Can it be that the sum table is only useful, when calculating a lot of overlapping regions. Because in my case, I only touch each pixel for calculation only once.

You can compute the mean over any rectangle in constant time (independently of the rectangle size) by pre-computing a "summed area table".
You need to compute a table where the element (i, j) is the sum of all original data in the rectangle from (0, 0) to (i, j) and this can be done in a single pass over the data.
Once you have the table the sum of values between (x0, y0) and (x1, y1) can be computed in constant time with:
tab(x0, y0) + tab(x1, y1) - tab(x0, y1) - tab(x1, y0)
To understand how the algorithm works it's easier to think first to the one-dimensional case: to compute in constant time the sum of values from v[x0] to v[x1] you can pre-compute a table with
st[0] = v[0];
for (int i=1; i<n; i++) st[i] = st[i-1] + v[i];
and then you can take the difference st[x1] - st[x0] to know the sum of any interval of the original data.
The algorithm can indeed be easily extended to n dimensions.
Moreover it may be not evident at a first sight but the precomputation of the summed area table can be implemented for a multi-core architecture to take advantage of parallel execution.
For 2d a simple decomposition is to consider that computing the 2d sum table is the same as computing a 1d sum table on each row and then computing an 1d sum table for each column on the result.

You can use optimized version your function menCalc from Simd Library:
SIMD_API void SimdValueSum(const uint8_t * src, size_t stride,
size_t width, size_t height, uint64_t * sum);
It is much faster because uses different SIMD like SSE, AVX and so on.

Related

Increase the resolution of a geometry shape

The task is to increase the resolution of a geometry shape like this:
So that the shape becomes this by adding data points:
The shape is define by data points which has x and y coordinates and a index. The index represent the order to connect them.
What type of algorithm should I use to achieve this?
You can use linear interpolation between segment ends.
It is not clear yet - how many new points you want to insert in every segment. Does it depend on segment length or other reasons? Seems your picture shows mixed approach: n is at least nmin, part length is at most lmax like this
n = max(nmin, int(seglength / lmax)) ;
For XStart, XEnd referring to starting and ending coordinates of segment, we insert n-1 points, dividing segment into n equal parts:
for (int i = 1; i < n; i++) {
X[i] = XStart + (XEnd - XStart) * i / n;
Y[i] = YStart + (YEnd - YStart) * i / n;
}

Procedurally generate seamless fractal noise textures

I have been generating noise textures to use as height maps for terrain generation. In this application, initially there is a 256x256 noise texture that is used to create a block of land that the user is free to roam around. When the user reaches a certain boundary in-game the application generates a new texture and thus another block of terrain.
In the code, a table of 64x64 random values are generated, and the values in the texture are the result of interpolating between these points at various 'frequencies' and 'wavelengths' using a smoothstep function, and then combined to form the final noise texture; and finally the values in the texture are divided through by its largest value to effectively normalize it. When the player is at the boundary and a new texture is created, the random number table that is created re-uses the values from the appropriate edge of the previous texture (eg. if the new texture is for a block of land that is on the +X side of the previous one, the last value in every row of the previous texture is used as the first value in every row of random numbers in the next.)
My problem is this: even though the same values are being used across the edges of adjacent textures, they are nowhere near seamless - some neighboring points on the terrain are mismatched by many many metres. My guess is that the changing frequencies that are used to sample the random number table are probably having a significant effect on all areas of the texture. So how might one generate fractal noise poceduraly, ie. as needed, AND have it look continuous with adjacent values?
Here is a section of the code that returns a value interpolated between the points on the random number table given a point P:
float MainApp::assessVal(glm::vec2 P){
//Integer component of P
int xi = (int)P.x;
int yi = (int)P.y;
//Decimal component ofP
float xr = P.x - xi;
float yr = P.y - yi;
//Find the grid square P lies inside of
int x0 = xi % randX;
int x1 = (xi + 1) % randX;
int y0 = yi % randY;
int y1 = (yi + 1) % randY;
//Get random values for the 4 nodes
float r00 = randNodes->randNodes[y0][x0];
float r10 = randNodes->randNodes[y0][x1];
float r01 = randNodes->randNodes[y1][x0];
float r11 = randNodes->randNodes[y1][x1];
//Smoother interpolation so
//texture appears less blocky
float sx = smoothstep(xr);
float sy = smoothstep(yr);
//Find the weighted value of the 4
//random values. This will be the
//final value in the noise texture
float sx0 = mix(r00, r10, sx);
float sx1 = mix(r01, r11, sx);
return mix(sx0, sx1, sy);
}
Where randNodes is a 2 dimensional array containing the random values.
And here is the code that takes all the values returned from the above function and constructs texture data:
int layers = 5;
float wavelength = 1, frequency = 1;
for (int k = 0; k < layers; k++) {
for (int i = 0; i < stepsY; i++) {
for(int j = 0; j < stepsX; j++){
//Compute value for (stepsX * stepsY) interpolation points
//across the grid of random numbers
glm::vec2 P = glm::vec2((float)j/stepsX * randX, (float)i/stepsY * randY);
buf[i * stepsY + j] += assessVal(P * wavelength) * frequency;
}
}
//repeat (layers) times with different signals
wavelength *= 0.5;
frequency *= 2;
}
for(int i = 0; i < buf.size(); i++){
//divide all data by the largest value.
//this normalises the data to avoid saturation
buf[i] /= largestVal;
}
Finally, here is an example of two textures generated by these functions that should be seamless, but aren't:
The 2 images placed side by side as they are now are obviously mis-matched.
Your code wraps the values only in the domain of the noise texture you read from, but not in the domain of the texture being generated.
For the texture T of size stepX to be repeatable (let's consider 1-d case for simplicity) you must have
T(0) == T(stepX)
Or in your case (substitute j = 0 and j = stepX):
assessVal(0) == assessVal(randX * wavelength)
For when k >= 1 this is clearly not true in your code, because
(randX / pow(2, k)) % randX != 0
One solution is to decrease randX and randY while you go up the frequencies.
But my typical approach would rather be starting from a 2x2 random texture, upscale it to 4x4 with GL_REPEAT, add a bit more per-pixel noise, continue upscaling to 8x8 etc.. till I get to the desired size.
The root cause of course is that your smoothing changes pixels to match their neighbors, but you later add new neighbors and do not re-smooth the pixels who got new neighbors.
One simple and common workaround is to keep an edge of invisible pixels, the width of which is half that of your smoothing kernel. Now, when expanding the area, you can resmooth those invisible pixels just before they're revealed. Don't forget to add a new edge of invisible pixels!

Improving C++ algorithm for finding all points within a sphere of radius r

Language/Compiler: C++ (Visual Studio 2013)
Experience: ~2 months
I am working in a rectangular grid in 3D-space (size: xdim by ydim by zdim) where , "xgrid, ygrid, and zgrid" are 3D arrays of the x,y, and z-coordinates, respectively. Now, I am interested in finding all points that lie within a sphere of radius "r" centered about the point "(vi,vj,vk)". I want to store the index locations of these points in the vectors "xidx,yidx,zidx". For a single point this algorithm works and is fast enough but when I wish to iterate over many points within the 3D-space I run into very long run times.
Does anyone have any suggestions on how I can improve the implementation of this algorithm in C++? After running some profiling software I found online (very sleepy, Luke stackwalker) it seems that the "std::vector::size" and "std::vector::operator[]" member functions are bogging down my code. Any help is greatly appreciated.
Note: Since I do not know a priori how many voxels are within the sphere, I set the length of vectors xidx,yidx,zidx to be larger than necessary and then erase all the excess elements at the end of the function.
void find_nv(int vi, int vj, int vk, vector<double> &xidx, vector<double> &yidx, vector<double> &zidx, double*** &xgrid, double*** &ygrid, double*** &zgrid, int r, double xdim,double ydim,double zdim, double pdim)
{
double xcor, ycor, zcor,xval,yval,zval;
vector<double>xyz(3);
xyz[0] = xgrid[vi][vj][vk];
xyz[1] = ygrid[vi][vj][vk];
xyz[2] = zgrid[vi][vj][vk];
int counter = 0;
// Confine loop to be within boundaries of sphere
int istart = vi - r;
int iend = vi + r;
int jstart = vj - r;
int jend = vj + r;
int kstart = vk - r;
int kend = vk + r;
if (istart < 0) {
istart = 0;
}
if (iend > xdim-1) {
iend = xdim-1;
}
if (jstart < 0) {
jstart = 0;
}
if (jend > ydim - 1) {
jend = ydim-1;
}
if (kstart < 0) {
kstart = 0;
}
if (kend > zdim - 1)
kend = zdim - 1;
//-----------------------------------------------------------
// Begin iterating through all points
//-----------------------------------------------------------
for (int k = 0; k < kend+1; ++k)
{
for (int j = 0; j < jend+1; ++j)
{
for (int i = 0; i < iend+1; ++i)
{
if (i == vi && j == vj && k == vk)
continue;
else
{
xcor = pow((xgrid[i][j][k] - xyz[0]), 2);
ycor = pow((ygrid[i][j][k] - xyz[1]), 2);
zcor = pow((zgrid[i][j][k] - xyz[2]), 2);
double rsqr = pow(r, 2);
double sphere = xcor + ycor + zcor;
if (sphere <= rsqr)
{
xidx[counter]=i;
yidx[counter]=j;
zidx[counter] = k;
counter = counter + 1;
}
else
{
}
//cout << "counter = " << counter - 1;
}
}
}
}
// erase all appending zeros that are not voxels within sphere
xidx.erase(xidx.begin() + (counter), xidx.end());
yidx.erase(yidx.begin() + (counter), yidx.end());
zidx.erase(zidx.begin() + (counter), zidx.end());
return 0;
You already appear to have used my favourite trick for this sort of thing, getting rid of the relatively expensive square root functions and just working with the squared values of the radius and center-to-point distance.
One other possibility which may speed things up (a) is to replace all the:
xyzzy = pow (plugh, 2)
calls with the simpler:
xyzzy = plugh * plugh
You may find the removal of the function call could speed things up, however marginally.
Another possibility, if you can establish the maximum size of the target array, is to use an real array rather than a vector. I know they make the vector code as insanely optimal as possible but it still won't match a fixed-size array for performance (since it has to do everything the fixed size array does plus handle possible expansion).
Again, this may only offer very marginal improvement at the cost of more memory usage but trading space for time is a classic optimisation strategy.
Other than that, ensure you're using the compiler optimisations wisely. The default build in most cases has a low level of optimisation to make debugging easier. Ramp that up for production code.
(a) As with all optimisations, you should measure, not guess! These suggestions are exactly that: suggestions. They may or may not improve the situation, so it's up to you to test them.
One of your biggest problems, and one that is probably preventing the compiler from making a lot of optimisations is that you are not using the regular nature of your grid.
If you are really using a regular grid then
xgrid[i][j][k] = x_0 + i * dxi + j * dxj + k * dxk
ygrid[i][j][k] = y_0 + i * dyi + j * dyj + k * dyk
zgrid[i][j][k] = z_0 + i * dzi + j * dzj + k * dzk
If your grid is axis aligned then
xgrid[i][j][k] = x_0 + i * dxi
ygrid[i][j][k] = y_0 + j * dyj
zgrid[i][j][k] = z_0 + k * dzk
Replacing these inside your core loop should result in significant speedups.
You could do two things. Reduce the number of points you are testing for inclusion and simplify the problem to multiple 2d tests.
If you take the sphere an look at it down the z axis you have all the points for y+r to y-r in the sphere, using each of these points you can slice the sphere into circles that contain all the points in the x/z plane limited to the circle radius at that specific y you are testing. Calculating the radius of the circle is a simple solve the length of the base of the right angle triangle problem.
Right now you ar testing all the points in a cube, but the upper ranges of the sphere excludes most points. The idea behind the above algorithm is that you can limit the points tested at each level of the sphere to the square containing the radius of the circle at that height.
Here is a simple hand draw graphic, showing the sphere from the side view.
Here we are looking at the slice of the sphere that has the radius ab. Since you know the length ac and bc of the right angle triangle, you can calculate ab using Pythagoras theorem. Now you have a simple circle that you can test the points in, then move down, it reduce length ac and recalculate ab and repeat.
Now once you have that you can actually do a little more optimization. Firstly, you do not need to test every point against the circle, you only need to test one quarter of the points. If you test the points in the upper left quadrant of the circle (the slice of the sphere) then the points in the other three points are just mirror images of that same point offset either to the right, bottom or diagonally from the point determined to be in the first quadrant.
Then finally, you only need to do the circle slices of the top half of the sphere because the bottom half is just a mirror of the top half. In the end you only tested a quarter of the point for containment in the sphere. This should be a huge performance boost.
I hope that makes sense, I am not at a machine now that I can provide a sample.
simple thing here would be a 3D flood fill from center of the sphere rather than iterating over the enclosing square as you need to visited lesser points. Moreover you should implement the iterative version of the flood-fill to get more efficiency.
Flood Fill

Optimized float Blur variations

I am looking for optimized functions in c++ for calculating areal averages of floats. the function is passed a source float array, a destination float array (same size as source array), array width and height, "blurring" area width and height.
The function should "wrap-around" edges for the blurring/averages calculations.
Here is example code that blur with a rectangular shape:
/*****************************************
* Find averages extended variations
*****************************************/
void findaverages_ext(float *floatdata, float *dest_data, int fwidth, int fheight, int scale, int aw, int ah, int weight, int xoff, int yoff)
{
printf("findaverages_ext scale: %d, width: %d, height: %d, weight: %d \n", scale, aw, ah, weight);
float total = 0.0;
int spos = scale * fwidth * fheight;
int apos;
int w = aw;
int h = ah;
float* f_temp = new float[fwidth * fheight];
// Horizontal
for(int y=0;y<fheight ;y++)
{
Sleep(10); // Do not burn your processor
total = 0.0;
// Process entire window for first pixel (including wrap-around edge)
for (int kx = 0; kx <= w; ++kx)
if (kx >= 0 && kx < fwidth)
total += floatdata[y*fwidth + kx];
// Wrap
for (int kx = (fwidth-w); kx < fwidth; ++kx)
if (kx >= 0 && kx < fwidth)
total += floatdata[y*fwidth + kx];
// Store first window
f_temp[y*fwidth] = (total / (w*2+1));
for(int x=1;x<fwidth ;x++) // x width changes with y
{
// Substract pixel leaving window
if (x-w-1 >= 0)
total -= floatdata[y*fwidth + x-w-1];
// Add pixel entering window
if (x+w < fwidth)
total += floatdata[y*fwidth + x+w];
else
total += floatdata[y*fwidth + x+w-fwidth];
// Store average
apos = y * fwidth + x;
f_temp[apos] = (total / (w*2+1));
}
}
// Vertical
for(int x=0;x<fwidth ;x++)
{
Sleep(10); // Do not burn your processor
total = 0.0;
// Process entire window for first pixel
for (int ky = 0; ky <= h; ++ky)
if (ky >= 0 && ky < fheight)
total += f_temp[ky*fwidth + x];
// Wrap
for (int ky = fheight-h; ky < fheight; ++ky)
if (ky >= 0 && ky < fheight)
total += f_temp[ky*fwidth + x];
// Store first if not out of bounds
dest_data[spos + x] = (total / (h*2+1));
for(int y=1;y< fheight ;y++) // y width changes with x
{
// Substract pixel leaving window
if (y-h-1 >= 0)
total -= f_temp[(y-h-1)*fwidth + x];
// Add pixel entering window
if (y+h < fheight)
total += f_temp[(y+h)*fwidth + x];
else
total += f_temp[(y+h-fheight)*fwidth + x];
// Store average
apos = y * fwidth + x;
dest_data[spos+apos] = (total / (h*2+1));
}
}
delete f_temp;
}
What I need is similar functions that for each pixel finds the average (blur) of pixels from shapes different than rectangular.
The specific shapes are: "S" (sharp edges), "O" (rectangular but hollow), "+" and "X", where the average float is stored at the center pixel on destination data array. Size of blur shape should be variable, width and height.
The functions does not need to be pixelperfect, only optimized for performance. There could be separate functions for each shape.
I am also happy if anyone can tip me of how to optimize the example function above for rectangluar blurring.
What you are trying to implement are various sorts of digital filters for image processing. This is equivalent to convolving two signals where the 2nd one would be the filter's impulse response. So far, you regognized that a "rectangular average" is separable. By separable I mean, you can split the filter into two parts. One that operates along the X axis and one that operates along the Y axis -- in each case a 1D filter. This is nice and can save you lots of cycles. But not every filter is separable. Averaging along other shapres (S, O, +, X) is not separable. You need to actually compute a 2D convolution for these.
As for performance, you can speed up your 1D averages by properly implementing a "moving average". A proper "moving average" implementation only requires a fixed amount of little work per pixel regardless of the averaging "window". This can be done by recognizing that neighbouring pixels of the target image are computed by an average of almost the same pixels. You can reuse these sums for the neighbouring target pixel by adding one new pixel intensity and subtracting an older one (for the 1D case).
In case of arbitrary non-separable filters your best bet performance-wise is "fast convolution" which is FFT-based. Checkout www.dspguide.com. If I recall correctly, there is even a chapter on how to properly do "fast convolution" using the FFT algorithm. Although, they explain it for 1-dimensional signals, it also applies to 2-dimensional signals. For images you have to perform 2D-FFT/iFFT transforms.
To add to sellibitze's answer, you can use a summed area table for your O, S and + kernels (not for the X one though). That way you can convolve a pixel in constant time, and it's probably the fastest method to do it for kernel shapes that allow it.
Basically, a SAT is a data structure that lets you calculate the sum of any axis-aligned rectangle. For the O kernel, after you've built a SAT, you'd take the sum of the outer rect's pixels and subtract the sum of the inner rect's pixels. The S and + kernels can be implemented similarly.
For the X kernel you can use a different approach. A skewed box filter is separable:
You can convolve with two long, thin skewed box filters, then add the two resulting images together. The center of the X will be counted twice, so will you need to convolve with another skewed box filter, and subtract that.
Apart from that, you can optimize your box blur in many ways.
Remove the two ifs from the inner loop by splitting that loop into three loops - two short loops that do checks, and one long loop that doesn't. Or you could pad your array with extra elements from all directions - that way you can simplify your code.
Calculate values like h * 2 + 1 outside the loops.
An expression like f_temp[ky*fwidth + x] does two adds and one multiplication. You can initialize a pointer to &f_temp[ky*fwidth] outside the loop, and just increment that pointer in the loop.
Don't do the division by h * 2 + 1 in the horizontal step. Instead, divide by the square of that in the vertical step.

OpenCV detect if points lie along line/plane

I am working on a form of autocalibration for an optics device which is currently performed manually. The first part of the calibration is to determine whether a light beam has illuminated the set of 'calibration' points.
I am using OpenCV and have thresholded and cropped the image to leave only the possible relevant points. I know want to determine if these points lie along a stright (horizontal) line; if they a sufficient number do the beam is in the correct position! (The points lie in a straight line but the beam is often bent so hitting most of the points suffices, there are 21 points which show up as white circles when thresholded).
I have tried using a histogram but on the thresholded image the results are not correct and am now looking at Hough lines, but this detects straight lines from edges wwhere as I want to establish if detected points lie on a line.
This is the threshold code I use:
cvThreshold(output, output, 150, 256, CV_THRESH_BINARY);
The histogram results with anywhere from 1 to 640 bins (image width) is two lines at 0 and about 2/3rds through of near max value. Not the distribution expected or obtained without thresholding.
Some pictures to try to illistrate the point (note the 'noisy' light spots which are a feature of the system setup and cannot be overcome):
12 points in a stright line next to one another (beam in correct position)
The sort of output wanted (for illistration, if the points are on the line this is all I need to know!)
Any help would be greatly appreciated. One thought was to extract the co-ordinates of the points and compare them but I don't know how to do this.
Incase it helps anyone here is a very basic (the first draft) of some simple linaear regression code I used.
// Calculate the averages of arrays x and y
double xa = 0, ya = 0;
for(int i = 0; i < n; i++)
{
xa += x[i];
ya += y[i];
}
xa /= n;
ya /= n;
// Summation of all X and Y values
double sumX = 0;
double sumY = 0;
// Summation of all X*Y values
double sumXY = 0;
// Summation of all X^2 and Y^2 values
double sumXs = 0;
double sumYs = 0;
for(int i = 0; i < n; i++)
{
sumX = sumX + x[i];
sumY = sumY + y[i];
sumXY = sumXY + (x[i] * y[i]);
sumXs = sumXs + (x[i] * x[i]);
sumYs = sumYs + (y[i] * y[i]);
}
// (X^2) and (Y^2) sqaured
double Xs = sumX * sumX;
double Ys = sumY * sumY;
// Calculate slope, m
slope = (n * sumXY - sumX * sumY) / (n* sumXs - Xs);
// Calculate intercept
intercept = ceil((sumY - slope * sumX) / n);
// Calculate regression index, r^2
double r_top = (n * sumXY - sumX * sumY);
double r_bottom = sqrt((n* sumXs - Xs) * (n* sumYs - Ys));
double r = 0;
// Check line is not perfectly vertical or horizontal
if(r_top == 0 || r_bottom == 0)
r = 0;
else
r = r_top/ r_bottom;
There are more efficeint ways of doing this (see CodeCogs or AGLIB) but as quick fix this code seems to work.
To detect Circles in OpenCV I dropped the Hough Transform and adapeted codee from this post:
Detection of coins (and fit ellipses) on an image
It is then a case of refining the co-ordinates (removing any outliers etc) to determine if the circles lie on a horizontal line from the slope and intercept values of the regression.
Obtain the x,y coordinates of the thresholded points, then perform a linear regression to find a best-fit line. With that line, you can determine the r^2 value which effectively gives you the quality of fit. Based on that fitness measure, you can determine your calibration success.
Here is a good discussion.
you could do something like this, altough it is an aproximation:
var dw = decide a medium dot width in pixels
maxdots = 0;
for each line of the image {
var dots = 0;
scan by incrementing x by dw {
if (color==dotcolor) dots++;
}
if (dots>maxdots) maxdots=dots;
}
maxdots would be the best result...