Calculating offset of linear gaussian blur based on variable amount of weights

I've recently been looking into optimising a gaussian blur shader by using a linear sampling method instead of a discrete method.
I read a very informative article:
Efficient Gaussian Blur With Linear Sampling
In case of such a merge of two texels we have to adjust the coordinates that the distance of the determined coordinate from the texel #1 center should be equal to the weight of texel #2 divided by the sum of the two weights. In the same style, the distance of the determined coordinate from the texel #2 center should be equal to the weight of texel #1 divided by the sum of the two weights.
While I understand the logic behind this, I'm not sure how they arrived at the figures for the offset with the given weights. Would anyone be kind enough to shed more light on this for me and to also explain how, given uniform weight variables we could calculate correct offsets?
Regarding non hard coded offsets, I found another post which recommended a method of calculating the offsets, however no solution was posted for a variable amount of samples. How could I achieve that?
vec2 offsets[3];
offsets[0] = vec2(0.0, 0.0);
offsets[1] = vec2(dFdx(gl_TexCoord[0].s), dFdy(gl_TexCoord[0].t));
offsets[2] = offsets[1] + offsets[1];

I just came across the same article, and found it tremendously useful as well. The formula to calculate the weights and offsets is given in it:
(source: rastergrid.com)
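The author arrived at the weights by using the 12th row of Pascal's triangle. The two outermost coefficients on each side (1 and 12) are left out, which is why the weights are divided by 4070 rather than 4096. So for example the second offset is calculated by: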
1.3846153846 = (1 * 792 + 2 * 495) / (792 + 495)
The second weight is calculated by:
0.3162162162 = (792 + 495) / 4070
I'm not sure what you mean by calculating the offsets given uniform weight variables, but if it's of help I've included a C++ program at the end of this post that outputs the offsets and weights for an arbitrary row of Pascal's triangle.
If I understand your question about non hardcoded offsets, then you want to be able to calculate the offsets on the fly in GLSL? You could do that by porting the program below, but you'll still need to hardcode the binomial coefficients, or calculate those on the fly as well. However, that will be expensive since it will have to be done for every pixel. I think a better alternative is to precalculate the offsets and weights in C (or whatever programming language you're using), and then bind them to a uniform array value in GLSL. Here's the GLSL snippet for what I mean:
uniform float offset[5];
uniform float weight[5];
uniform int numOffsets;
You'll want to replace "5" with the maximum number of offsets/weights you plan to use, and set numOffsets to the number you're using for a particular operation.
Here's the program that outputs the weights and offsets. "coeffs" should be replaced with the binomial coefficients of the row you want from Pascal's triangle, starting at the centre coefficient. The one included here is from the 22nd row.
#include <cstddef>
#include <iostream>
#include <vector>
using namespace std;
int main(int argc, char* argv[])
{
    // Right half of the row, starting at the centre coefficient
    const double coeffs[] = { 705432, 646646, 497420, 319770, 170544, 74613, 26334, 7315, 1540, 231 };
    const size_t count = sizeof(coeffs) / sizeof(coeffs[0]);
    // Texels are merged in pairs, so an unpaired trailing coefficient is dropped
    const size_t pairs = (count - 1) / 2;
    // Normalise over the coefficients actually used; every side tap is sampled twice
    double total = coeffs[0];
    for (size_t i = 1; i <= 2 * pairs; i++)
    {
        total += 2 * coeffs[i];
    }
    vector<double> offsets;
    vector<double> weights;
    offsets.push_back(0);
    weights.push_back(coeffs[0] / total);
    for (size_t i = 1; i <= pairs; i++)
    {
        // index and index + 1 are the two discrete texels merged into one sample
        const size_t index = (i - 1) * 2 + 1;
        const double weight = coeffs[index] + coeffs[index + 1];
        offsets.push_back((coeffs[index] * index + coeffs[index + 1] * (index + 1)) / weight);
        weights.push_back(weight / total);
    }
    for (size_t i = 0; i < offsets.size(); i++)
    {
        cout << offsets[i] << ", ";
    }
    cout << "\n";
    for (size_t i = 0; i < weights.size(); i++)
    {
        cout << weights[i] << ", ";
    }
    cout << "\n";
}

Related

Procedurally generate seamless fractal noise textures

I have been generating noise textures to use as height maps for terrain generation. In this application, initially there is a 256x256 noise texture that is used to create a block of land that the user is free to roam around. When the user reaches a certain boundary in-game the application generates a new texture and thus another block of terrain.
In the code, a 64x64 table of random values is generated. The values in the texture are the result of interpolating between these points at various 'frequencies' and 'wavelengths' using a smoothstep function; these layers are then combined to form the final noise texture, and finally the values in the texture are divided by the largest value to normalize it. When the player is at the boundary and a new texture is created, the new random number table re-uses the values from the appropriate edge of the previous texture (e.g. if the new texture is for a block of land on the +X side of the previous one, the last value in every row of the previous texture is used as the first value in every row of random numbers in the next).
My problem is this: even though the same values are being used across the edges of adjacent textures, they are nowhere near seamless - some neighboring points on the terrain are mismatched by many metres. My guess is that the changing frequencies that are used to sample the random number table are probably having a significant effect on all areas of the texture. So how might one generate fractal noise procedurally, i.e. as needed, AND have it look continuous with adjacent values?
Here is a section of the code that returns a value interpolated between the points on the random number table given a point P:
float MainApp::assessVal(glm::vec2 P){
//Integer component of P
int xi = (int)P.x;
int yi = (int)P.y;
//Fractional component of P
float xr = P.x - xi;
float yr = P.y - yi;
//Find the grid square P lies inside of
int x0 = xi % randX;
int x1 = (xi + 1) % randX;
int y0 = yi % randY;
int y1 = (yi + 1) % randY;
//Get random values for the 4 nodes
float r00 = randNodes->randNodes[y0][x0];
float r10 = randNodes->randNodes[y0][x1];
float r01 = randNodes->randNodes[y1][x0];
float r11 = randNodes->randNodes[y1][x1];
//Smoother interpolation so
//texture appears less blocky
float sx = smoothstep(xr);
float sy = smoothstep(yr);
//Find the weighted value of the 4
//random values. This will be the
//final value in the noise texture
float sx0 = mix(r00, r10, sx);
float sx1 = mix(r01, r11, sx);
return mix(sx0, sx1, sy);
}
Where randNodes is a 2 dimensional array containing the random values.
And here is the code that takes all the values returned from the above function and constructs texture data:
int layers = 5;
float wavelength = 1, frequency = 1;
for (int k = 0; k < layers; k++) {
for (int i = 0; i < stepsY; i++) {
for(int j = 0; j < stepsX; j++){
//Compute value for (stepsX * stepsY) interpolation points
//across the grid of random numbers
glm::vec2 P = glm::vec2((float)j/stepsX * randX, (float)i/stepsY * randY);
buf[i * stepsX + j] += assessVal(P * wavelength) * frequency;
}
}
//repeat (layers) times with different signals
wavelength *= 0.5;
frequency *= 2;
}
for(int i = 0; i < buf.size(); i++){
//divide all data by the largest value.
//this normalises the data to avoid saturation
buf[i] /= largestVal;
}
Finally, here is an example of two textures generated by these functions that should be seamless, but aren't:
The 2 images placed side by side as they are now are obviously mis-matched.

Your code wraps the values only in the domain of the noise texture you read from, but not in the domain of the texture being generated.
For the texture T of size stepsX to be repeatable (let's consider the 1-D case for simplicity) you must have
T(0) == T(stepsX)
Or in your case (substitute j = 0 and j = stepsX):
assessVal(0) == assessVal(randX * wavelength)
For k >= 1 this is clearly not true in your code, because
(randX / pow(2, k)) % randX != 0
One solution is to decrease randX and randY as you go up the frequencies.
But my typical approach would rather be to start from a 2x2 random texture, upscale it to 4x4 with GL_REPEAT, add a bit more per-pixel noise, then continue upscaling to 8x8 and so on until I reach the desired size.

The root cause of course is that your smoothing changes pixels to match their neighbors, but you later add new neighbors and do not re-smooth the pixels that got new neighbors.
One simple and common workaround is to keep an edge of invisible pixels, the width of which is half that of your smoothing kernel. Now, when expanding the area, you can resmooth those invisible pixels just before they're revealed. Don't forget to add a new edge of invisible pixels!

Suggestions to Compute the Intersections of Multiple Convex 2D Polygons

I am writing this question fishing for any state-of-the-art software or methods that can quickly compute the intersection of N 2D polygons (the convex hulls of projected convex polyhedrons) and M 2D polygons, where typically N >> M. N may be on the order of at least 1M polygons and M on the order of 50k. I've searched for some time now, but I keep coming up with the same answer shown below.
Use boost and a loop to
compute the projection of the polyhedron (not the bottleneck)
compute the convex hull of said polyhedron (bottleneck)
compute the intersection of the projected polyhedron and existing 2D polygon (major bottleneck).
This loop is repeated NK times where typically K << M, and K is the average number of 2D polygons intersecting a single projected polyhedron. This is done to reduce the number of computations.
The problem with this is that if I have N=262144 and M=19456 it takes about 129 seconds (when multithreaded by polyhedron), and this must be done about 300 times. Ideally, I would like to reduce the computation time to about 1 second for the above sizes, so I was wondering if someone could help point to some software or literature that could improve efficiency.
[EDIT]
At @sehe's request, I'm posting the most relevant parts of the code. I haven't compiled it, so this is just to get the gist... This code assumes there are voxels and pixels, but the shapes can be anything. The order of the points in the grid can be arbitrary, but the indices of where the points reside in the grid are the same.
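#include <algorithm>
#include <cmath>
#include <exception>
#include <vector>
#include <boost/geometry/geometry.hpp>
#include <boost/geometry/geometries/point.hpp>
#include <boost/geometry/geometries/box.hpp>
#include <boost/geometry/geometries/polygon.hpp>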
const std::size_t Dimension = 2;
typedef boost::geometry::model::point<float, Dimension, boost::geometry::cs::cartesian> point_2d;
typedef boost::geometry::model::polygon<point_2d, false /* is cw */, true /* closed */> polygon_2d;
typedef boost::geometry::model::box<point_2d> box_2d;
std::vector<float> getOverlaps(std::vector<float> & projected_grid_vx, // projected voxels
std::vector<float> & pixel_grid_vx, // pixels
std::vector<int> & projected_grid_m, // number of voxels in each dimension
std::vector<int> & pixel_grid_m, // number of pixels in each dimension
std::vector<float> & pixel_grid_omega, // size of the pixel grid in cm
int projected_grid_size, // total number of voxels
int pixel_grid_size) { // total number of pixels
std::vector<float> overlaps(projected_grid_size * pixel_grid_size);
std::vector<float> h(pixel_grid_m.size());
for(int d=0; d < pixel_grid_m.size(); d++) {
h[d] = (pixel_grid_omega[2*d+1] - pixel_grid_omega[2*d]) / pixel_grid_m[d];
}
for(int i=0; i < projected_grid_size; i++){
std::vector<int> point_indices(8); // integer indices of the 8 voxel corners
point_indices[0] = i;
point_indices[1] = i + 1;
point_indices[2] = i + projected_grid_m[0];
point_indices[3] = i + projected_grid_m[0] + 1;
point_indices[4] = i + projected_grid_m[0] * projected_grid_m[1];
point_indices[5] = i + projected_grid_m[0] * projected_grid_m[1] + 1;
point_indices[6] = i + (projected_grid_m[1] + 1) * projected_grid_m[0];
point_indices[7] = i + (projected_grid_m[1] + 1) * projected_grid_m[0] + 1;
std::vector<float> vx_corners(8 * projected_grid_m.size());
for(int vn = 0; vn < 8; vn++) {
for(int d = 0; d < projected_grid_m.size(); d++) {
vx_corners[vn + d * 8] = projected_grid_vx[point_indices[vn] + d * projected_grid_size];
}
}
polygon_2d proj_voxel;
for(int vn = 0; vn < 8; vn++) {
point_2d poly_pt(vx_corners[2 * vn], vx_corners[2 * vn + 1]);
boost::geometry::append(proj_voxel, poly_pt);
}
boost::geometry::correct(proj_voxel);
polygon_2d proj_voxel_hull;
boost::geometry::convex_hull(proj_voxel, proj_voxel_hull);
box_2d bb_proj_vox;
boost::geometry::envelope(proj_voxel_hull, bb_proj_vox);
point_2d min_pt = bb_proj_vox.min_corner();
point_2d max_pt = bb_proj_vox.max_corner();
// then get min and max indices of intersecting bins
std::vector<float> min_idx(projected_grid_m.size() - 1),
max_idx(projected_grid_m.size() - 1);
// compute min and max indices of incidence on the pixel grid
// this is easy assuming you have a regular grid of pixels
min_idx[0] = std::min( (float) std::max( std::floor((min_pt.get<0>() - pixel_grid_omega[0]) / h[0] - 0.5 ), 0.), (float) (pixel_grid_m[0]-1));
min_idx[1] = std::min( (float) std::max( std::floor((min_pt.get<1>() - pixel_grid_omega[2]) / h[1] - 0.5 ), 0.), (float) (pixel_grid_m[1]-1));
max_idx[0] = std::min( (float) std::max( std::floor((max_pt.get<0>() - pixel_grid_omega[0]) / h[0] + 0.5 ), 0.), (float) (pixel_grid_m[0]-1));
max_idx[1] = std::min( (float) std::max( std::floor((max_pt.get<1>() - pixel_grid_omega[2]) / h[1] + 0.5 ), 0.), (float) (pixel_grid_m[1]-1));
// iterate only over pixels which intersect the projected voxel
for(int iy = min_idx[1]; iy <= max_idx[1]; iy++) {
for(int ix = min_idx[0]; ix <= max_idx[0]; ix++) {
int idx = ix + iy * pixel_grid_m[0]; // `first' index of pixel corner point
polygon_2d pix_poly;
for(int pn = 0; pn < 4; pn++) {
point_2d pix_corner_pt(
pixel_grid_vx[idx + pn % 2 + (pn / 2) * pixel_grid_m[0]],
pixel_grid_vx[idx + pn % 2 + (pn / 2) * pixel_grid_m[0] + pixel_grid_size]
);
boost::geometry::append(pix_poly, pix_corner_pt);
}
boost::geometry::correct( pix_poly );
//make this into a convex hull since the order of the point may be any
polygon_2d pix_hull;
boost::geometry::convex_hull(pix_poly, pix_hull);
// on to perform intersection
std::vector<polygon_2d> vox_pix_ints;
polygon_2d vox_pix_int;
try {
boost::geometry::intersection(proj_voxel_hull, pix_hull, vox_pix_ints);
} catch ( const std::exception & ) {
// skip since these may coincide at a point or line
continue;
}
// both are convex so at most one intersection polygon is expected;
// the output is empty when the two shapes do not overlap
if (vox_pix_ints.empty()) continue;
vox_pix_int = vox_pix_ints[0];
overlaps[i + idx * projected_grid_size] = boost::geometry::area(vox_pix_int);
}
} // end intersection for
} //end projected_voxel for
return overlaps;
}

You could create the ratio of polygon to bounding box:
This could be computed once to arrive at an average poly-area-to-BB ratio constant R.
Or you could do it with geometry, using a circle bounded by its BB, since you're using only projected polyhedrons:
R = 0.0;
count = 0;
for (each poly) {
count++;
R += polyArea / itsBoundingBoxArea;
}
R = R/count;
Then calculate the summation of intersection of bounding boxes.
Sbb = 0.0;
for (box1, box2 where box1.isIntersecting(box2)) {
Sbb += box1.intersect(box2);
}
Then:
Approximation = R * Sbb
All of this would not work if concave polys were allowed, because a concave poly can occupy less than 1% of its bounding box. You will still have to find the convex hull.
Alternatively, if you can find the polygon's area quicker than its hull, you could use the actual computed average poly area. This would give you a decent approximation as well while avoiding both poly intersection and wrapping.

Hm, the problem seems similar to doing "collision detection" in game engines. Or "potentially visible sets".
While I don't know much about the current state of the art, I remember an optimization was to enclose objects in spheres, since checking overlaps between spheres (or circles in 2D) is really cheap.
In order to speed up checks for collisions, objects were often put into search structures (e.g. a sphere tree, or a circle tree in the 2D case), basically organizing the space into a hierarchical structure to make queries for overlaps fast.
So basically my suggestion boils down to: try looking at algorithms for collision detection in game engines.
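To show why the bounding-circle check is so cheap, here is a small sketch (the Circle struct is hypothetical, not part of any engine API). It compares squared distances, so no square root is needed.

```cpp
// Hypothetical bounding circle: centre and radius.
struct Circle {
    double x, y, radius;
};

// Two circles overlap when the distance between their centres is at most
// the sum of their radii. Comparing squared values avoids the square root.
bool circlesOverlap(const Circle& a, const Circle& b)
{
    const double dx = a.x - b.x;
    const double dy = a.y - b.y;
    const double reach = a.radius + b.radius;
    return dx * dx + dy * dy <= reach * reach;
}
```

Only pairs that pass this test would need the exact (and expensive) polygon intersection.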

Assumption
I'm assuming that you mean "intersections" and not intersection. Moreover, it is not the expected use case that most of the individual polys from M and N will overlap at the same time. If this assumption is true then:
Answer
The way this is done in 2D game engines is by having a scene graph where every object has a bounding box. Then place all the polygons into the nodes of a quadtree according to their location, as determined by their bounding boxes. The task then becomes parallel because each node can be processed separately for intersection.
Here is the wiki for quadtree:
Quadtree Wiki
An octree could be used when in 3D.
It actually doesn't even have to be an octree. You could get the same results with any space partition. You could find the maximum separation of polys (let's call it S) and create, say, partitions of size S/10. Then you would have 10 separate spaces to execute in parallel. Not only would it be concurrent, but it would no longer be M * N time, since not every poly must be compared against every other poly.
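As a sketch of the "any space partition" idea, here is a uniform grid over bounding boxes. All names (Box, cellOf, candidatePairs) and the grid parameters are assumptions for illustration. The boxes of one set are binned into the cells they touch, and each box of the other set is only tested against the boxes found in its own cells.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

struct Box { double minX, minY, maxX, maxY; };

bool boxesOverlap(const Box& a, const Box& b)
{
    return a.minX <= b.maxX && b.minX <= a.maxX &&
           a.minY <= b.maxY && b.minY <= a.maxY;
}

// Maps a coordinate to a cell index, clamped to the grid.
int cellOf(double v, double origin, double cellSize, int cells)
{
    const int c = static_cast<int>(std::floor((v - origin) / cellSize));
    return std::max(0, std::min(c, cells - 1));
}

using IndexPair = std::pair<std::size_t, std::size_t>;

// Returns the sorted (hull index, polygon index) pairs whose boxes overlap.
std::vector<IndexPair> candidatePairs(const std::vector<Box>& hulls,
                                      const std::vector<Box>& polygons,
                                      double originX, double originY,
                                      double cellSize, int cellsX, int cellsY)
{
    // Bin every polygon box into each cell it touches.
    std::vector<std::vector<std::size_t>> grid(
        static_cast<std::size_t>(cellsX) * cellsY);
    for (std::size_t j = 0; j < polygons.size(); ++j) {
        const int x0 = cellOf(polygons[j].minX, originX, cellSize, cellsX);
        const int x1 = cellOf(polygons[j].maxX, originX, cellSize, cellsX);
        const int y0 = cellOf(polygons[j].minY, originY, cellSize, cellsY);
        const int y1 = cellOf(polygons[j].maxY, originY, cellSize, cellsY);
        for (int cy = y0; cy <= y1; ++cy)
            for (int cx = x0; cx <= x1; ++cx)
                grid[cy * cellsX + cx].push_back(j);
    }
    // Each hull box only looks at the cells it touches itself.
    std::set<IndexPair> unique;  // a pair can show up in several cells
    for (std::size_t i = 0; i < hulls.size(); ++i) {
        const int x0 = cellOf(hulls[i].minX, originX, cellSize, cellsX);
        const int x1 = cellOf(hulls[i].maxX, originX, cellSize, cellsX);
        const int y0 = cellOf(hulls[i].minY, originY, cellSize, cellsY);
        const int y1 = cellOf(hulls[i].maxY, originY, cellSize, cellsY);
        for (int cy = y0; cy <= y1; ++cy)
            for (int cx = x0; cx <= x1; ++cx)
                for (std::size_t j : grid[cy * cellsX + cx])
                    if (boxesOverlap(hulls[i], polygons[j]))
                        unique.insert(IndexPair(i, j));
    }
    return std::vector<IndexPair>(unique.begin(), unique.end());
}
```

The outer loop over hulls has no shared writes apart from the result set, so it could be split across threads with one result set per thread. The exact polygon intersection is then run only on the returned pairs.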

Subsampling an array of numbers

I have a series of 100 integer values which I need to reduce/subsample to 77 values for the purpose of fitting into a predefined space on screen. This gives a fraction of 77/100 values-per-pixel - not very neat.
Assuming the 77 is fixed and cannot be changed, what are some typical techniques for subsampling 100 numbers down to 77? I get a sense that it will be a jagged mapping, by which I mean the first new value is the average of [0, 1], then the next value is [2], then the average of [3, 4], etc. But how do I approach getting the pattern for this mapping?
I am working in C++, although I'm more interested in the technique than implementation.
Thanks in advance.

Whether you downsample or oversample, you are trying to reconstruct a signal at points in time that were not sampled, so you have to make some assumptions.
The sampling theorem tells you that if you sample a signal knowing that it has no frequency components above half the sampling frequency, you can continuously and completely recover the signal over the whole time period. The reconstruction uses sinc() functions (that is, sin(x)/x).
sinc() (more precisely sin(M_PI*x/Sampling_period)/(M_PI*x/Sampling_period)) is a function that has the following properties:
Its value is 1 for x == 0.0 and 0 for x == k*Sampling_period with k == +-1, +-2, ...
It has no frequency component above half of the sampling frequency derived from Sampling_period.
So consider the functions F_k(x) = Y[k]*sinc(x/Sampling_period - k): each one is a sinc function that equals the sample value at position k and 0 at every other sample position. If you sum them over all k in your sample set, you get the best continuous function that has no components at frequencies above half the sampling frequency and takes the same values as your samples.
That said, you can resample this function at whatever position you like, which gives the best way to resample your data.
This is by far a complicated way of resampling data (it also has the problem of not being causal, so it cannot be implemented in real time), and several simpler interpolation methods have been used in the past. You have to construct all the sinc functions for each sample point and add them together. Then you have to resample the resulting function at the new sampling points and give that as the result.
Next is an example of the interpolation method just described. It accepts some input data (in_sz samples) and outputs data interpolated with the method described above. I assumed that the end points coincide, so the first and last output samples fall exactly on the first and last input samples; this is the reason for the somewhat intricate (in_sz - 1)/(out_sz - 1) calculations in the code. Change them to in_sz/out_sz if you want a plain N samples -> M samples conversion:
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
/* normalized sinc function */
double sinc(double x)
{
x *= M_PI;
if (x == 0.0) return 1.0;
return sin(x)/x;
} /* sinc */
/* interpolate a function made of in samples at point x */
double sinc_approx(double in[], size_t in_sz, double x)
{
int i;
double res = 0.0;
for (i = 0; i < in_sz; i++)
res += in[i] * sinc(x - i);
return res;
} /* sinc_approx */
/* do the actual resampling. Change (in_sz - 1)/(out_sz - 1) if you
* don't want the initial and final samples coincide, as is done here.
*/
void resample_sinc(
double in[],
size_t in_sz,
double out[],
size_t out_sz)
{
int i;
double dx = (double) (in_sz-1) / (out_sz-1);
for (i = 0; i < out_sz; i++)
out[i] = sinc_approx(in, in_sz, i*dx);
}
/* test case */
int main()
{
double in[] = {
0.0, 1.0, 0.5, 0.2, 0.1, 0.0,
};
const size_t in_sz = sizeof in / sizeof in[0];
const size_t out_sz = 5;
double out[out_sz];
int i;
for (i = 0; i < in_sz; i++)
printf("in[%d] = %.6f\n", i, in[i]);
resample_sinc(in, in_sz, out, out_sz);
for (i = 0; i < out_sz; i++)
printf("out[%.6f] = %.6f\n", (double) i * (in_sz-1)/(out_sz-1), out[i]);
return EXIT_SUCCESS;
} /* main */

There are different ways of interpolation (see wikipedia)
The linear one would be something like:
std::array<int, 77> sampling(const std::array<int, 100>& a)
{
std::array<int, 77> res;
for (int i = 0; i != 76; ++i) {
int index = i * 99 / 76;
int p = i * 99 % 76;
res[i] = ((p * a[index + 1]) + ((76 - p) * a[index])) / 76;
}
res[76] = a[99]; // done outside of loop to avoid out of bound access (0 * a[100])
return res;
}

Create 77 new pixels based on the weighted average of their positions.
As a toy example, think about the 3 pixel case which you want to subsample to 2.
Original (denote as multidimensional array original with RGB as [0, 1, 2]):
|----|----|----|
Subsample (denote as multidimensional array subsample with RGB as [0, 1, 2]):
|------|------|
Here, it is intuitive to see that the first subsample seems like 2/3 of the first original pixel and 1/3 of the next.
For the first subsample pixel, subsample[0], you make it the RGB average of the m original pixels that overlap, in this case original[0] and original[1]. But we do so in weighted fashion.
subsample[0][0] = original[0][0] * 2/3 + original[1][0] * 1/3 # for red
subsample[0][1] = original[0][1] * 2/3 + original[1][1] * 1/3 # for green
subsample[0][2] = original[0][2] * 2/3 + original[1][2] * 1/3 # for blue
In this example original[1][2] is the blue component of the second original pixel.
Keep in mind for different subsampling you'll have to determine the set of original cells that contribute to the subsample, and then normalize to find the relative weights of each.
There are much more complex graphics techniques, but this one is simple and works.

Everything depends on what you wish to do with the data - how do you want to visualize it.
A very simple approach would be to render to a 100-wide image, and then smooth scale the image down to a narrower size. Whatever graphics/development framework you're using will surely support such an operation.
Say, though, that your goal might be to retain certain qualities of the data, such as minima and maxima. In such a case, for each bin, you could draw a line of darker color up to the minimum value, and then continue with a lighter color up to the maximum. Or, instead of just putting a pixel at the average value, you could draw a line from the minimum to the maximum.
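A minimal sketch of the per-bin minimum and maximum, assuming the input has at least as many values as there are bins (binMinMax is a made-up name):

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// For each of outSize bins, returns the (min, max) of the input values whose
// index falls into that bin.
std::vector<std::pair<int, int>> binMinMax(const std::vector<int>& in,
                                           std::size_t outSize)
{
    std::vector<std::pair<int, int>> bins(outSize);
    for (std::size_t o = 0; o < outSize; ++o) {
        // Integer arithmetic keeps the bin boundaries exact.
        const std::size_t first = o * in.size() / outSize;
        // One past the last index; every bin holds at least one value.
        const std::size_t last = std::max(first + 1, (o + 1) * in.size() / outSize);
        const auto mm = std::minmax_element(
            in.begin() + static_cast<std::ptrdiff_t>(first),
            in.begin() + static_cast<std::ptrdiff_t>(last));
        bins[o] = std::make_pair(*mm.first, *mm.second);
    }
    return bins;
}
```

With 100 values and 77 bins each bin ends up holding one or two values, and the drawing code can paint from bins[o].first to bins[o].second for column o.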
Finally, you might wish to render as if you had 77 values only - then the goal is to somehow transform the 100 values down to 77. This will imply some kind of an interpolation. Linear or quadratic interpolation is easy, but adds distortions to the signal. Ideally, you'd probably want to throw a sinc interpolator at the problem. A good list of them can be found here. For theoretical background, look here.

Improving C++ algorithm for finding all points within a sphere of radius r

Language/Compiler: C++ (Visual Studio 2013)
Experience: ~2 months
I am working in a rectangular grid in 3D space (size: xdim by ydim by zdim), where "xgrid, ygrid, and zgrid" are 3D arrays of the x, y, and z coordinates, respectively. Now, I am interested in finding all points that lie within a sphere of radius "r" centered about the point "(vi,vj,vk)". I want to store the index locations of these points in the vectors "xidx,yidx,zidx". For a single point this algorithm works and is fast enough, but when I wish to iterate over many points within the 3D space I run into very long run times.
Does anyone have any suggestions on how I can improve the implementation of this algorithm in C++? After running some profiling software I found online (very sleepy, Luke stackwalker) it seems that the "std::vector::size" and "std::vector::operator[]" member functions are bogging down my code. Any help is greatly appreciated.
Note: Since I do not know a priori how many voxels are within the sphere, I set the length of vectors xidx,yidx,zidx to be larger than necessary and then erase all the excess elements at the end of the function.
void find_nv(int vi, int vj, int vk, vector<double> &xidx, vector<double> &yidx, vector<double> &zidx, double*** &xgrid, double*** &ygrid, double*** &zgrid, int r, double xdim,double ydim,double zdim, double pdim)
{
double xcor, ycor, zcor,xval,yval,zval;
vector<double>xyz(3);
xyz[0] = xgrid[vi][vj][vk];
xyz[1] = ygrid[vi][vj][vk];
xyz[2] = zgrid[vi][vj][vk];
int counter = 0;
// Confine loop to be within boundaries of sphere
int istart = vi - r;
int iend = vi + r;
int jstart = vj - r;
int jend = vj + r;
int kstart = vk - r;
int kend = vk + r;
if (istart < 0) {
istart = 0;
}
if (iend > xdim-1) {
iend = xdim-1;
}
if (jstart < 0) {
jstart = 0;
}
if (jend > ydim - 1) {
jend = ydim-1;
}
if (kstart < 0) {
kstart = 0;
}
if (kend > zdim - 1)
kend = zdim - 1;
//-----------------------------------------------------------
// Begin iterating through all points
//-----------------------------------------------------------
for (int k = kstart; k < kend+1; ++k)
{
for (int j = jstart; j < jend+1; ++j)
{
for (int i = istart; i < iend+1; ++i)
{
if (i == vi && j == vj && k == vk)
continue;
else
{
xcor = pow((xgrid[i][j][k] - xyz[0]), 2);
ycor = pow((ygrid[i][j][k] - xyz[1]), 2);
zcor = pow((zgrid[i][j][k] - xyz[2]), 2);
double rsqr = pow(r, 2);
double sphere = xcor + ycor + zcor;
if (sphere <= rsqr)
{
xidx[counter]=i;
yidx[counter]=j;
zidx[counter] = k;
counter = counter + 1;
}
else
{
}
//cout << "counter = " << counter - 1;
}
}
}
}
// erase all appending zeros that are not voxels within sphere
xidx.erase(xidx.begin() + (counter), xidx.end());
yidx.erase(yidx.begin() + (counter), yidx.end());
zidx.erase(zidx.begin() + (counter), zidx.end());
}

You already appear to have used my favourite trick for this sort of thing, getting rid of the relatively expensive square root functions and just working with the squared values of the radius and center-to-point distance.
One other possibility which may speed things up (a) is to replace all the:
xyzzy = pow (plugh, 2)
calls with the simpler:
xyzzy = plugh * plugh
You may find the removal of the function call could speed things up, however marginally.
Another possibility, if you can establish the maximum size of the target array, is to use a real array rather than a vector. I know they make the vector code as insanely optimal as possible but it still won't match a fixed-size array for performance (since it has to do everything the fixed size array does plus handle possible expansion).
Again, this may only offer very marginal improvement at the cost of more memory usage but trading space for time is a classic optimisation strategy.
Other than that, ensure you're using the compiler optimisations wisely. The default build in most cases has a low level of optimisation to make debugging easier. Ramp that up for production code.
(a) As with all optimisations, you should measure, not guess! These suggestions are exactly that: suggestions. They may or may not improve the situation, so it's up to you to test them.

One of your biggest problems, and one that is probably preventing the compiler from making a lot of optimisations is that you are not using the regular nature of your grid.
If you are really using a regular grid then
xgrid[i][j][k] = x_0 + i * dxi + j * dxj + k * dxk
ygrid[i][j][k] = y_0 + i * dyi + j * dyj + k * dyk
zgrid[i][j][k] = z_0 + i * dzi + j * dzj + k * dzk
If your grid is axis aligned then
xgrid[i][j][k] = x_0 + i * dxi
ygrid[i][j][k] = y_0 + j * dyj
zgrid[i][j][k] = z_0 + k * dzk
Replacing these inside your core loop should result in significant speedups.
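As a sketch of what that could look like for the axis-aligned case (the function and struct names are made up, and the spacings dx, dy, dz are assumed to be constant): the coordinate difference to the centre is just the index difference times the spacing, so the three coordinate arrays are not needed at all.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Index3 { int i, j, k; };

// Assumes an axis-aligned grid with constant spacing (dx, dy, dz) and
// nx * ny * nz points; r is given in the same units as the spacing.
std::vector<Index3> pointsInSphere(int vi, int vj, int vk, double r,
                                   double dx, double dy, double dz,
                                   int nx, int ny, int nz)
{
    std::vector<Index3> result;
    const double r2 = r * r;
    // How many grid steps the sphere can reach along each axis.
    const int ri = static_cast<int>(std::floor(r / dx));
    const int rj = static_cast<int>(std::floor(r / dy));
    const int rk = static_cast<int>(std::floor(r / dz));
    for (int k = std::max(0, vk - rk); k <= std::min(nz - 1, vk + rk); ++k) {
        const double z = (k - vk) * dz;
        for (int j = std::max(0, vj - rj); j <= std::min(ny - 1, vj + rj); ++j) {
            const double y = (j - vj) * dy;
            const double yz2 = y * y + z * z;  // hoisted out of the inner loop
            for (int i = std::max(0, vi - ri); i <= std::min(nx - 1, vi + ri); ++i) {
                if (i == vi && j == vj && k == vk)
                    continue;                  // skip the centre, as in the question
                const double x = (i - vi) * dx;
                if (x * x + yz2 <= r2)
                    result.push_back({i, j, k});
            }
        }
    }
    return result;
}
```

Using push_back on a single result vector also removes the need to oversize the index vectors and erase the tail afterwards.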

You could do two things. Reduce the number of points you are testing for inclusion and simplify the problem to multiple 2d tests.
If you take the sphere and look at it down the z axis, you have all the points from y+r to y-r in the sphere. Using each of these points you can slice the sphere into circles that contain all the points in the x/z plane, limited to the circle radius at the specific y you are testing. Calculating the radius of the circle is a simple matter of solving for the length of the base of a right-angled triangle.
Right now you are testing all the points in a cube, but the upper ranges of the sphere exclude most points. The idea behind the above algorithm is that you can limit the points tested at each level of the sphere to the square containing the radius of the circle at that height.
Here is a simple hand-drawn graphic, showing the sphere from the side view.
Here we are looking at the slice of the sphere that has the radius ab. Since you know the lengths ac and bc of the right-angled triangle, you can calculate ab using Pythagoras' theorem. Now you have a simple circle that you can test the points in; then move down, reduce length ac, recalculate ab and repeat.
Now once you have that you can actually do a little more optimization. Firstly, you do not need to test every point against the circle, you only need to test one quarter of the points. If you test the points in the upper left quadrant of the circle (the slice of the sphere) then the points in the other three quadrants are just mirror images of that same point, offset to the right, to the bottom or diagonally from the point determined to be in the first quadrant.
Then finally, you only need to do the circle slices of the top half of the sphere because the bottom half is just a mirror of the top half. In the end you only test a quarter of each slice in half of the slices, that is, about an eighth of the points, for containment in the sphere. This should be a huge performance boost.
I hope that makes sense, I am not at a machine now that I can provide a sample.
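A rough sketch of the slicing idea, under the assumption of an axis-aligned grid with a single constant spacing h (names are hypothetical). It goes one step further than described above: inside each circle it also computes the half-length of every row, so no per-point distance test is left. It does not use the mirror symmetry, because the clipping at the grid boundary breaks it.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct GridIndex { int i, j, k; };

// Assumes an axis-aligned grid with constant spacing h and nx * ny * nz points.
std::vector<GridIndex> sphereBySlices(int vi, int vj, int vk, double r, double h,
                                      int nx, int ny, int nz)
{
    std::vector<GridIndex> result;
    const double r2 = r * r;
    const int kReach = static_cast<int>(std::floor(r / h));
    for (int k = std::max(0, vk - kReach); k <= std::min(nz - 1, vk + kReach); ++k) {
        const double z = (k - vk) * h;
        const double circle2 = r2 - z * z;   // squared radius of this slice (Pythagoras)
        if (circle2 < 0.0) continue;
        const int jReach = static_cast<int>(std::floor(std::sqrt(circle2) / h));
        for (int j = std::max(0, vj - jReach); j <= std::min(ny - 1, vj + jReach); ++j) {
            const double y = (j - vj) * h;
            const double chord2 = circle2 - y * y;  // squared half-length of this row
            if (chord2 < 0.0) continue;
            const int iReach = static_cast<int>(std::floor(std::sqrt(chord2) / h));
            // Every point of the row between the two limits is inside the sphere.
            for (int i = std::max(0, vi - iReach); i <= std::min(nx - 1, vi + iReach); ++i) {
                if (i == vi && j == vj && k == vk)
                    continue;                // skip the centre, as in the question
                result.push_back({i, j, k});
            }
        }
    }
    return result;
}
```

Points that lie exactly on the sphere surface depend on the rounding of the square root, so if those matter it is worth comparing the output against the brute-force test once.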

A simple thing here would be a 3D flood fill from the center of the sphere rather than iterating over the enclosing cube, as you need to visit fewer points. Moreover, you should implement the iterative version of the flood fill to get more efficiency.
Flood Fill

Optimized float Blur variations

I am looking for optimized functions in C++ for calculating areal averages of floats. The function is passed a source float array, a destination float array (same size as the source array), the array width and height, and the "blurring" area width and height.
The function should "wrap-around" edges for the blurring/averages calculations.
Here is example code that blurs with a rectangular shape:
/*****************************************
* Find averages extended variations
*****************************************/
void findaverages_ext(float *floatdata, float *dest_data, int fwidth, int fheight, int scale, int aw, int ah, int weight, int xoff, int yoff)
{
printf("findaverages_ext scale: %d, width: %d, height: %d, weight: %d \n", scale, aw, ah, weight);
float total = 0.0;
int spos = scale * fwidth * fheight;
int apos;
int w = aw;
int h = ah;
float* f_temp = new float[fwidth * fheight];
// Horizontal
for(int y=0;y<fheight ;y++)
{
Sleep(10); // Do not burn your processor
total = 0.0;
// Process entire window for first pixel (including wrap-around edge)
for (int kx = 0; kx <= w; ++kx)
if (kx >= 0 && kx < fwidth)
total += floatdata[y*fwidth + kx];
// Wrap
for (int kx = (fwidth-w); kx < fwidth; ++kx)
if (kx >= 0 && kx < fwidth)
total += floatdata[y*fwidth + kx];
// Store first window
f_temp[y*fwidth] = (total / (w*2+1));
for(int x=1;x<fwidth ;x++) // x width changes with y
{
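// Subtract pixel leaving window (wrapping around the left edge; assumes w < fwidth)
if (x-w-1 >= 0)
total -= floatdata[y*fwidth + x-w-1];
else
total -= floatdata[y*fwidth + x-w-1+fwidth];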
// Add pixel entering window
if (x+w < fwidth)
total += floatdata[y*fwidth + x+w];
else
total += floatdata[y*fwidth + x+w-fwidth];
// Store average
apos = y * fwidth + x;
f_temp[apos] = (total / (w*2+1));
}
}
// Vertical
for(int x=0;x<fwidth ;x++)
{
Sleep(10); // Do not burn your processor
total = 0.0;
// Process entire window for first pixel
for (int ky = 0; ky <= h; ++ky)
if (ky >= 0 && ky < fheight)
total += f_temp[ky*fwidth + x];
// Wrap
for (int ky = fheight-h; ky < fheight; ++ky)
if (ky >= 0 && ky < fheight)
total += f_temp[ky*fwidth + x];
// Store first if not out of bounds
dest_data[spos + x] = (total / (h*2+1));
for(int y=1;y< fheight ;y++) // y width changes with x
{
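// Subtract pixel leaving window (wrapping around the top edge; assumes h < fheight)
if (y-h-1 >= 0)
total -= f_temp[(y-h-1)*fwidth + x];
else
total -= f_temp[(y-h-1+fheight)*fwidth + x];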
// Add pixel entering window
if (y+h < fheight)
total += f_temp[(y+h)*fwidth + x];
else
total += f_temp[(y+h-fheight)*fwidth + x];
// Store average
apos = y * fwidth + x;
dest_data[spos+apos] = (total / (h*2+1));
}
}
delete[] f_temp; // allocated with new[], so it must be released with delete[]
}
What I need are similar functions that, for each pixel, find the average (blur) of pixels from shapes other than rectangular.
The specific shapes are: "S" (sharp edges), "O" (rectangular but hollow), "+" and "X", where the average float is stored at the center pixel in the destination data array. The size of the blur shape should be variable in both width and height.
The functions do not need to be pixel-perfect, only optimized for performance. There could be separate functions for each shape.
I would also be happy if anyone could give me tips on how to optimize the example function above for rectangular blurring.

What you are trying to implement are various sorts of digital filters for image processing. This is equivalent to convolving two signals, where the second one is the filter's impulse response. So far, you have recognized that a "rectangular average" is separable. By separable I mean that you can split the filter into two parts: one that operates along the X axis and one that operates along the Y axis, in each case a 1D filter. This is nice and can save you lots of cycles. But not every filter is separable. Averaging along other shapes (S, O, +, X) is not separable. You need to actually compute a 2D convolution for these.
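For reference, a direct 2D convolution with an arbitrary shape might look roughly like the sketch below. It assumes the shape is given as a 0/1 mask centred on the pixel and that edges wrap around; maskedAverage and its parameters are hypothetical names, not part of the question's code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: for each pixel, averages src over the non-zero cells of
// a mask (mw x mh) centred on that pixel, wrapping around the image edges.
std::vector<float> maskedAverage(const std::vector<float>& src, int width, int height,
                                 const std::vector<unsigned char>& mask, int mw, int mh)
{
    assert(width > 0 && height > 0 && mw > 0 && mh > 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    assert(mask.size() == static_cast<std::size_t>(mw) * mh);

    // Collect the offsets of the active mask cells once, outside the pixel loops.
    std::vector<int> ox, oy;
    for (int my = 0; my < mh; ++my)
        for (int mx = 0; mx < mw; ++mx)
            if (mask[static_cast<std::size_t>(my) * mw + mx]) {
                ox.push_back(mx - mw / 2);
                oy.push_back(my - mh / 2);
            }
    assert(!ox.empty());

    std::vector<float> dst(src.size());
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            float total = 0.0f;
            for (std::size_t k = 0; k < ox.size(); ++k) {
                // Double modulo keeps the index non-negative for negative offsets.
                const int sx = ((x + ox[k]) % width + width) % width;
                const int sy = ((y + oy[k]) % height + height) % height;
                total += src[static_cast<std::size_t>(sy) * width + sx];
            }
            dst[static_cast<std::size_t>(y) * width + x] =
                total / static_cast<float>(ox.size());
        }
    return dst;
}
```

The cost is proportional to the number of active mask cells per pixel, so this is the baseline that the faster approaches try to beat.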
As for performance, you can speed up your 1D averages by properly implementing a "moving average". A proper "moving average" implementation only requires a small, fixed amount of work per pixel, regardless of the size of the averaging "window". This works because neighbouring pixels of the target image are computed from an average of almost the same source pixels. You can reuse the sum for the neighbouring target pixel by adding one new pixel intensity and subtracting the oldest one (for the 1D case).
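A minimal 1D sketch of that moving average with wrap-around, assuming a window of 2*w+1 samples that is no longer than the signal. movingAverageWrap is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: circular moving average over a window of 2*w+1 samples.
std::vector<float> movingAverageWrap(const std::vector<float>& src, int w)
{
    const int n = static_cast<int>(src.size());
    assert(w >= 0 && 2 * w + 1 <= n);
    const double count = 2 * w + 1;

    std::vector<float> dst(src.size());

    // Full sum only for the first window, centred on index 0.
    double total = 0.0;
    for (int k = -w; k <= w; ++k)
        total += src[static_cast<std::size_t>((k + n) % n)];
    dst[0] = static_cast<float>(total / count);

    // Every other window costs one subtraction and one addition.
    for (int i = 1; i < n; ++i) {
        total -= src[static_cast<std::size_t>((i - w - 1 + n) % n)];  // sample leaving
        total += src[static_cast<std::size_t>((i + w) % n)];          // sample entering
        dst[static_cast<std::size_t>(i)] = static_cast<float>(total / count);
    }
    return dst;
}
```

The running sum is kept in a double here to limit the drift that accumulates when the same total is updated many times; whether that matters depends on the data.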
In the case of arbitrary non-separable filters, your best bet performance-wise is "fast convolution", which is FFT-based. Check out www.dspguide.com. If I recall correctly, there is even a chapter on how to properly do "fast convolution" using the FFT algorithm. Although they explain it for 1-dimensional signals, it also applies to 2-dimensional signals. For images you have to perform 2D FFT/iFFT transforms.

To add to sellibitze's answer, you can use a summed area table for your O, S and + kernels (not for the X one though). That way you can convolve a pixel in constant time, and it's probably the fastest method to do it for kernel shapes that allow it.
Basically, a SAT is a data structure that lets you calculate the sum of any axis-aligned rectangle. For the O kernel, after you've built a SAT, you'd take the sum of the outer rect's pixels and subtract the sum of the inner rect's pixels. The S and + kernels can be implemented similarly.
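A small sketch of the SAT idea for the O kernel, assuming the kernel stays fully inside the image (wrap-around would need the lookups split at the edges, which is not shown). buildSat, rectSum and hollowAverage are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: summed-area table with an extra zero row and column, so
// sat[y * (width + 1) + x] is the sum of src over rows < y and columns < x.
std::vector<double> buildSat(const std::vector<float>& src, int width, int height)
{
    assert(width > 0 && height > 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    const std::size_t s = static_cast<std::size_t>(width) + 1;
    std::vector<double> sat(s * (static_cast<std::size_t>(height) + 1), 0.0);
    for (int y = 0; y < height; ++y) {
        double row = 0.0;  // running sum of the current row
        for (int x = 0; x < width; ++x) {
            row += src[static_cast<std::size_t>(y) * width + x];
            sat[(y + 1) * s + (x + 1)] = sat[y * s + (x + 1)] + row;
        }
    }
    return sat;
}

// Sum of src over the rectangle [x0, x1) x [y0, y1), using four lookups.
double rectSum(const std::vector<double>& sat, int width, int x0, int y0, int x1, int y1)
{
    const std::size_t s = static_cast<std::size_t>(width) + 1;
    return sat[y1 * s + x1] - sat[y0 * s + x1] - sat[y1 * s + x0] + sat[y0 * s + x0];
}

// Average over a hollow "O" kernel centred at (cx, cy): outer rectangle with
// half-sizes (ow, oh) minus inner rectangle with half-sizes (iw, ih).
// Assumes the outer rectangle lies entirely inside the image.
double hollowAverage(const std::vector<double>& sat, int width, int cx, int cy,
                     int ow, int oh, int iw, int ih)
{
    assert(iw >= 0 && ih >= 0 && iw < ow && ih < oh);
    assert(cx - ow >= 0 && cy - oh >= 0 && cx + ow < width);
    const double outer = rectSum(sat, width, cx - ow, cy - oh, cx + ow + 1, cy + oh + 1);
    const double inner = rectSum(sat, width, cx - iw, cy - ih, cx + iw + 1, cy + ih + 1);
    const double cells = double(2 * ow + 1) * (2 * oh + 1) - double(2 * iw + 1) * (2 * ih + 1);
    return (outer - inner) / cells;
}
```

Each pixel then costs eight table lookups regardless of the kernel size. For large images the table values grow quickly, which is why the sketch stores them as double rather than float.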
For the X kernel you can use a different approach. A skewed box filter is separable:
You can convolve with two long, thin skewed box filters, then add the two resulting images together. The center of the X will be counted twice, so you will need to convolve with another skewed box filter and subtract that.
Apart from that, you can optimize your box blur in many ways.
Remove the two ifs from the inner loop by splitting that loop into three loops - two short loops that do checks, and one long loop that doesn't. Or you could pad your array with extra elements from all directions - that way you can simplify your code.
Calculate values like h * 2 + 1 outside the loops.
An expression like f_temp[ky*fwidth + x] does two adds and one multiplication. You can initialize a pointer to &f_temp[ky*fwidth] outside the loop, and just increment that pointer in the loop.
Don't do the division by w * 2 + 1 in the horizontal step. Instead, divide once by (w * 2 + 1) * (h * 2 + 1) in the vertical step (this is the square of h * 2 + 1 when w equals h).
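Putting some of these tips together, a box blur could be sketched as below. This is only one possible arrangement, assuming wrap-around edges and windows no larger than the image: each line is padded with wrapped copies so the sliding loop needs no bounds checks, and the division is deferred to a single final pass. runningSumWrap and boxBlurWrap are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: circular running sum (not average) of 2*r+1 samples
// along one line of n samples, read and written with the given stride.
static void runningSumWrap(const float* line, int n, int stride, int r,
                           std::vector<float>& padded, float* out)
{
    // Pad with wrapped copies; padded[j] holds source sample (j - r) mod n.
    padded.resize(static_cast<std::size_t>(n) + 2 * r);
    for (int j = 0; j < n + 2 * r; ++j)
        padded[j] = line[static_cast<std::size_t>((j - r + n) % n) * stride];

    double total = 0.0;
    for (int j = 0; j <= 2 * r; ++j)
        total += padded[j];
    out[0] = static_cast<float>(total);

    // Branch-free sliding loop: window for sample i is padded[i .. i + 2r].
    for (int i = 1; i < n; ++i) {
        total += padded[i + 2 * r] - padded[i - 1];
        out[static_cast<std::size_t>(i) * stride] = static_cast<float>(total);
    }
}

// Hypothetical helper: separable box blur with wrap-around edges.
std::vector<float> boxBlurWrap(const std::vector<float>& src, int width, int height,
                               int w, int h)
{
    assert(width > 0 && height > 0 && w >= 0 && h >= 0);
    assert(2 * w + 1 <= width && 2 * h + 1 <= height);
    assert(src.size() == static_cast<std::size_t>(width) * height);

    std::vector<float> tmp(src.size()), dst(src.size()), padded;
    for (int y = 0; y < height; ++y)      // horizontal sums, row by row
        runningSumWrap(src.data() + static_cast<std::size_t>(y) * width,
                       width, 1, w, padded,
                       tmp.data() + static_cast<std::size_t>(y) * width);
    for (int x = 0; x < width; ++x)       // vertical sums, column by column
        runningSumWrap(tmp.data() + x, height, width, h, padded, dst.data() + x);

    // Single normalisation pass instead of one division per step.
    const float inv = 1.0f / static_cast<float>((2 * w + 1) * (2 * h + 1));
    for (float& v : dst)
        v *= inv;
    return dst;
}
```

The column pass walks memory with a stride of width, which is cache-unfriendly for large images; transposing the intermediate buffer is one common way around that, but it is left out to keep the sketch short.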
