Fast data structure or algorithm to find mean of each pixel in a stack of images - c++

I have a stack of images in which I want to calculate the mean of each pixel down the stack.
For example, let (x_n, y_n) denote the value of the (x, y) pixel in the nth image. The mean of pixel (x, y) over three images in the stack is then:
mean-of-(x,y) = (1/3) * ((x_1,y_1) + (x_2,y_2) + (x_3,y_3))
My first thought was to load all pixel intensities from each image into a data structure with a single linear buffer like so:
|All pixels from image 1| All pixels from image 2| All pixels from image 3|
To find the sum of a pixel down the image stack, I perform a series of nested for loops like so:
for(int col=0; col<img_cols; col++)
{
    for(int row=0; row<img_rows; row++)
    {
        for(int img=0; img<num_of_images; img++)
        {
            sum_of_px += px_buffer[(img*img_rows*img_cols)+col*img_rows+row];
        }
    }
}
Basically, img*img_rows*img_cols gives the offset of the first pixel of the nth image, and col*img_rows+row gives the offset of the (x,y) pixel that I want to find within each image in the stack.
Is there a data structure or algorithm that will help me sum up pixel intensities down an image stack that is faster and more organized than my current implementation?
I am aiming for portability so I will not be using OpenCV and am using C++ on linux.

The problem with the nested loop in the question is that it's not very cache friendly. You go skipping through memory with a long stride, effectively rendering your data cache useless. You're going to spend a lot of time just accessing the memory.
If you can spare the memory, you can create an extra image-sized buffer to accumulate totals for each pixel as you walk through all the pixels in all the images in memory order. Then you do a single pass through the buffer for the division.
Your accumulation buffer may need to use a larger type than you use for individual pixel values, since it has to accumulate many of them. If your pixel values are, say, 8-bit integers, then your accumulation buffer might need 32-bit integers or floats.
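A minimal sketch of that approach (assuming 8-bit pixels and the flat, image-after-image layout from the question; the buffer and size names are taken from the question):

#include <cstdint>
#include <vector>

// Walk every image in memory order, accumulating into an image-sized
// buffer of a wider type, then do a single pass for the division.
std::vector<uint32_t> acc(img_rows * img_cols, 0);
std::vector<float> mean(img_rows * img_cols);
for (int img = 0; img < num_of_images; img++)
{
    const uint8_t* base = &px_buffer[img * img_rows * img_cols];
    for (int px = 0; px < img_rows * img_cols; px++)
        acc[px] += base[px];                      // sequential, cache-friendly reads
}
for (int px = 0; px < img_rows * img_cols; px++)
    mean[px] = float(acc[px]) / num_of_images;    // one pass for the division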

Usually, a stack of pixels
(x_1,y_1),...,(x_n,y_n)
is conditionally independent from a stack
(a_1,b_1),...,(a_n,b_n)
And even if they weren't (for some particular dataset), modeling their interactions is a complex task and would give you only an estimate of the mean. So, if you want to compute the exact mean for each stack, you have no choice but to iterate through the three loops that you supply. Languages such as MATLAB/Octave and libraries such as Theano (Python) or Torch7 (Lua) all parallelize these iterations. If you are using C++, what you are doing is well suited for CUDA or OpenMP. As for portability, I think OpenMP is the easier solution.
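A minimal OpenMP sketch of that suggestion (assuming the flat buffer layout from the question; compile with -fopenmp or your compiler's equivalent):

#include <vector>

// Each thread computes the stack mean for a disjoint set of pixels.
std::vector<float> mean(img_rows * img_cols);
#pragma omp parallel for
for (int px = 0; px < img_rows * img_cols; px++)
{
    float sum = 0.0f;
    for (int img = 0; img < num_of_images; img++)
        sum += px_buffer[img * img_rows * img_cols + px];
    mean[px] = sum / num_of_images;
}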

A portable, fast data structure specifically for the average calculation could be:
std::vector<std::vector<std::vector<sometype> > > VoVoV;
VoVoV.resize(img_cols);
int i, j;
for (i = 0; i < img_cols; ++i)
{
    VoVoV[i].resize(img_rows);
    for (j = 0; j < img_rows; ++j)
    {
        VoVoV[i][j].resize(num_of_images);
        // The values of all images at this pixel are stored contiguously,
        // and therefore should be fast to access.
    }
}
VoVoV[col][row][img] = foo;
As a side note, 1/3 in your example will evaluate to 0 which is not what you want.
For fast summation/averaging you can now do:
sometype sum = 0;
std::vector<sometype>::iterator it = VoVoV[col][row].begin();
std::vector<sometype>::iterator it_end = VoVoV[col][row].end();
for ( ; it != it_end; ++it)
    sum += *it;
sometype avg = sum / num_of_images; // or similar for integers; check for num_of_images == 0
Basically, you should not rely on the compiler to optimize away the repeated calculation of the same offsets.

Related

Are there any good strategies to bilinear sample from a tiled image?

When doing a bilinear sample of an image one needs the 4 neighbor pixels. This is easy for an image that is linear in memory.
However, if the image is made of individual tiles in memory, in the worst case each of the four samples is in a different tile.
What are some strategies to make this fast? Assume that tiles are power-of-two squares. In most cases a gather should stay within one tile.
There is one w x h array T of tile pointers, each tile is a raw array of k x k pixels. How to make a fast Gather (x,y,dest) function that returns the four pixels at (x,y),(x+1,y),(x,y+1),(x+1,y+1)?
What if the tiles are not raw pointers but objects that potentially need to be paged in? So there needs to be a test if(T[o]==0) PageIn(o);
Also edges of the whole image should clamp, so value(-1,y)==value(0,y) etc.
This is a pretty open ended question. I know how to do it. I am looking for tricks and tips on how to do it fast.
Have your tiles store redundant pixels - that is, store pixels near a tile boundary in both neighboring tiles (or actually in 4 tiles if the pixel is near a tile's corner).
This completely eliminates the overhead of reading pixels (including near the boundary) at the cost of wasted memory. Also, writing pixels is harder - writing an individual pixel may require updating up to 4 tiles. However, if you calculate an entire image, generating the redundant pixels is a uniform procedure.
You might want to choose a particular size for a tile. For example, width = 62 pixels (when pixel = byte); after adding two redundant pixels, the width is equal to a cache line (assuming it's 64 bytes).
If using bicubic interpolation, add 2 redundant pixels from each side.
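A minimal sketch of such a padded tile (hypothetical names; assumes a one-pixel apron on each side, so a K x K payload is stored as (K+2) x (K+2)):

// Padded tile: the payload is K x K, surrounded by a 1-pixel apron copied
// from the neighboring tiles (or clamped at the image edges), so any 2x2
// bilinear gather whose top-left corner is inside the payload stays in one tile.
constexpr int K = 62;  // 62 + 2 apron pixels = 64 bytes = one cache line per row
struct PaddedTile { unsigned char px[K + 2][K + 2]; };

// Gather the 4 bilinear neighbors of payload coordinate (x, y), 0 <= x, y < K.
inline void gather(const PaddedTile& t, int x, int y, unsigned char dest[4])
{
    dest[0] = t.px[y + 1][x + 1];  // (x,     y)
    dest[1] = t.px[y + 1][x + 2];  // (x + 1, y)
    dest[2] = t.px[y + 2][x + 1];  // (x,     y + 1)
    dest[3] = t.px[y + 2][x + 2];  // (x + 1, y + 1)
}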
You can always pre-allocate pointers to lines within a tile. Sort them into a big array of pointers so that you can get to a tile scanline by indexing the pointer array:
unsigned char* ptr[numTilesH][numTilesV * tileHeight];
// ... fill the pointer array with pointers to the beginning of each
// scanline in a tile ...
// ... assume 256 grey scales
unsigned char getPixel(int x, int y) {
    int hTileNr   = x / tileWidth;
    int hTileOffs = x % tileWidth;
    unsigned char* pixelPtr = ptr[hTileNr][y];
    return pixelPtr[hTileOffs];
}
In this case, the optimisation is a typical memory vs. CPU trade-off. As you can see, access to a pixel is one division with remainder and two array accesses. It should be blazingly fast once you have set up the pointer arrays. The more tiles you have, however, the more memory you'll need for the scanline pointers - maybe up to the point where the pointer arrays simply become too expensive.

Fast 7x7 2D median filter in C and C++

I'm trying to convert the following code from MATLAB to C++:
function data = process(data)
data = medfilt2(data, [7 7], 'symmetric');
mask = fspecial('gaussian', [35 35], 12);
data = imfilter(data, mask, 'replicate', 'same');
maximum = max(data(:));
data = 1 ./ (data/maximum);
data(data > 10) = 16;
end
My problem is in the medfilt2, which is a 2D median filter. I need it to support images with 10 bits per pixel and more.
I have looked into OpenCV; it has a 5x5 median filter which supports 16 bits, but its 7x7 only supports bytes.
medianBlur
I have also looked into Intel IPP, but I can see only a 1D median filter.
https://software.intel.com/en-us/node/502283
Is there a fast implementation for a 2D filter?
I am looking for something like:
Fast Median Search: An ANSI C Implementation using parallel programming and vectorized (AVX/SSE) operations...
Two Dimensional Digital Signal Processing II: Transforms and Median Filters. Edited by T.S. Huang. Springer-Verlag, 1981.
There are more code examples in Fast median filtering with implementations in C/C++/C#/VB.NET/Delphi.
I also found Median Filtering in Constant Time.
Motivated by the fact that OpenCV does not implement 16-bit median filter for large kernel sizes (larger than 5), I tried three different strategies.
All of them are based on Huang's [2] sliding window algorithm. That is, the histogram is updated by removing and inserting pixel entries as the window slides from left to right. This is quite straightforward for an 8-bit image and is already implemented in OpenCV. However, a large 65536-bin histogram makes computation a bit difficult.
...The algorithm still remains O(log r), but storage considerations render it impractical for 16-bit images and impossible for floating-point images. [3]
I used the C++ standard library where applicable, and did not implement Weiss' additional optimization strategies.
1) A naive sorting implementation. I think this is the best starting point for arbitrary pixel type (floats particularly).
// copy pixels in the sliding window to a temporary vector and
// compute the median value (the window size is always odd)
memcpy( &v[0], &window[0], window.size() * sizeof(_Type) );
typename std::vector<_Type>::iterator it = v.begin() + v.size()/2;
std::nth_element( v.begin(), it, v.end() );
return *it;
2) A sparse histogram. We wouldn't want to step over 65536 bins to find the median of each pixel, so how about storing a sparse histogram instead? Again, this is suitable for all pixel types, but it doesn't make sense if all pixels in the window are different (e.g. floats).
typedef std::map< _Type, int > Map;
// ...
// inside the sliding window, update the histogram as follows
for ( /* pixels to remove */ )
{
    // _Type px
    typename Map::iterator it = map.find( px );
    if ( it->second > 1 )
        it->second -= 1;
    else
        map.erase( it );
}
// ...
for ( /* pixels to add */ )
{
    // _Type px
    typename Map::iterator lower = map.lower_bound( px );
    if ( lower != map.end() && lower->first == px )
        lower->second += 1;
    else
        map.insert( lower, std::pair<_Type,int>( px, 1 ) );
}
// ... and compute the median by integrating from one end
// until the appropriate sum is reached ...
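A minimal sketch of that final integration step (assuming medianIndex = windowSize / 2, counting ranks from the smallest value):

// Walk the sparse histogram from the smallest key, accumulating counts,
// until the accumulated count covers the median position.
int count = 0;
for ( typename Map::const_iterator mit = map.begin(); mit != map.end(); ++mit )
{
    count += mit->second;
    if ( count > medianIndex )
        return mit->first;  // this key is the median value
}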
3) A dense histogram. This is the plain dense histogram, but instead of a simple 65536-entry array we make searching a little easier by dividing it into sub-bins, e.g.:
[0...65535] <- px
[0...4095] <- px / 16
[0...255] <- px / 256
[0...15] <- px / 4096
This makes insertion a bit slower (by a constant amount), but search a lot faster. I found 16 to be a good number.
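A minimal sketch of a two-level variant of this idea (assuming 16-bit pixels and a 0-based rank medianIndex; the four-level version follows the same pattern):

#include <cstdint>

// Two-level dense histogram for 16-bit values: coarse[i] counts all pixels
// with value / 256 == i; fine[] is the full 65536-bin histogram.
struct Hist16
{
    uint32_t coarse[256] = {};
    uint32_t fine[65536] = {};

    void add(uint16_t px)    { ++coarse[px >> 8]; ++fine[px]; }
    void remove(uint16_t px) { --coarse[px >> 8]; --fine[px]; }

    // Find the value at rank medianIndex (0-based, from the smallest).
    uint16_t select(uint32_t medianIndex) const
    {
        uint32_t count = 0, bin = 0;
        while (count + coarse[bin] <= medianIndex)  // skip 256 values at a time
            count += coarse[bin++];
        uint32_t v = bin << 8;
        while (count + fine[v] <= medianIndex)      // then walk individual bins
            count += fine[v++];
        return (uint16_t)v;
    }
};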
Figure: methods (1) red, (2) blue and (3) black tested against each other and against 8bpp OpenCV (green). For all but OpenCV, the input image is 16-bpp grayscale. The dotted lines are truncated at the dynamic range [0, 255]; the smooth lines at [0, 8020] (via multiplication by 16 and smoothing to add more variance to the pixel values).
The divergence of the sparse histogram as the variance of the pixel values increases is interesting. nth_element is always a safe bet, OpenCV is the fastest (if 8bpp is ok), and the dense histogram trails behind.
I used Windows 7, 8 cores at 3.4 GHz, and Visual Studio 10. My implementations ran multithreaded; the OpenCV implementation is single-threaded. The input image size was 2136x3201 (http://i.imgur.com/gg9Z2aB.jpg, from Vogue).
[2]: Huang, T.: "Two-Dimensional Signal Processing II: Transforms and Median Filters", 1981
[3]: Weiss, B.: "Fast Median and Bilateral Filtering", 2006
I just implemented, in DIPlib, an efficient algorithm for computing the median filter (and the more generic percentile filter). This algorithm works for integer images of any bit depth as well as floating-point images, works for images of any number of dimensions, and works for kernels of any shape.
The algorithm is similar to the binary search tree implementation suggested by @mainactual in their answer to this question (as method #2), but uses a more appropriate order statistic tree. @mainactual's implementation needs O(n) to find the median in the search tree, for a tree with n nodes, because it iterates through half the nodes in the tree. This is only efficient if there are many fewer nodes than pixels in the kernel, which is typically only true for integer images with a small bit depth. In contrast, the order statistic tree can find the median value in O(log n), by storing an additional value in each node: the size of the subtree rooted at that node. The filter has a cost of O(k log k) for a compact 2D kernel with a height of k pixels (independent of the width).
I wrote down a more detailed description of the algorithm in my blog.
The C++ code is available on GitHub.
Here is a timing comparison for square kernels, comparing:
the new implementation in DIPlib (blue),
the naive implementation in scikit-image (which computes the median for each pixel's neighborhood independently, method #1 in @mainactual's answer, with a quadratic cost) (green), and
the O(1) implementation in OpenCV that only works for 8-bit images and square kernels (red).
"SFLOAT" stands for single-precision floating-point, "UINT8" stands for 8-bit unsigned integer, and "0-10" is also 8-bit unsigned integer, but containing only pixel values between 0 and 10 (this one tests what happens when there are many repeated values in each neighborhood).
The new implementation in DIPlib kicks in at k = 13; below that, the graph shows the naive, quadratic-cost algorithm.
I found this online. It is the same algorithm that OpenCV uses, but extended to 16 bits and optimized with SSE.
medianFilter.c
I happened to find (my) solution online as open source (image-quality-and-characterization-utilities, in include/Teisko/Image/Algorithm.hpp).
The algorithm finds the Kth element of any set of size M<=64 in N steps, where N is the number of bits in the elements.
This is a radix-2 sort algorithm, which needs the original bit pattern int16_t data[7][7]; to be transposed into N planes of uint64_t bits[N] (10 for 10-bit images), with the MSB first.
// Runs N iterations for pixel data as bit planes in `bits`
// to recover the K(th) largest item (as set by the initial threshold).
// The parameter `mask` must be initialized to contain a set bit for all those bits
// of interest in the corresponding pixel data bits[0..N-1].
// `popcount` counts the set bits in a uint64_t (e.g. std::popcount or __builtin_popcountll).
template <int N> inline
uint64_t median8x8_iteration(uint64_t (&bits)[N], uint64_t mask, uint64_t threshold)
{
    uint64_t result = 0;
    int i = 0;
    do
    {
        uint64_t ones = mask & bits[i];
        uint64_t ones_size = popcount(ones);
        uint64_t mask_size = popcount(mask);
        auto zero_size = mask_size - ones_size;
        int new_bit = 0;
        if (zero_size < threshold)
        {
            new_bit = 1;              // the Kth element has this bit set
            threshold -= zero_size;   // skip past all candidates with a zero bit
            mask = 0;                 // keep only the `ones` set (see mask ^= ones below)
        }
        result = result * 2 + new_bit;
        mask ^= ones;                 // restrict the candidate set for the next plane
    } while (++i < N);
    return result;
}
Use threshold = 25 to get the median of 49, and mask = 0xfefefefefefefe00ull in the case where the planes bits[] contain the support of 8x8 adjacent bits.
By toggling the MSB plane, one can use the same inner loop for signed integers - and by conditionally toggling the MSB and the other planes, one can use the algorithm for floating point as well.
Well after 2016, Ice Lake with AVX-512 introduced _mm256_mask_popcnt_epi64 even on consumer machines, allowing the inner loop to be almost trivially vectorised for all four submatrices in the common 8x8 support; the masks would be 0xfefefefefefefe00ull >> {0,1,8,9}.
The idea here is that the mask marks the set of pixels under inspection. Counting the number of ones (or zeros) in that set and comparing to a threshold, we can determine at each step if the Kth element belongs to the set of ones or zeros, producing also one correct output bit.
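A minimal sketch of the bit-plane preparation step (my naming; assumes 10-bit pixels in an 8x8 support, MSB plane first):

#include <cstdint>

// Transpose an 8x8 block of 10-bit pixels into 10 bit planes:
// bit (y*8 + x) of planes[p] is set iff bit (9 - p) of data[y][x] is set.
void make_bit_planes(const int16_t data[8][8], uint64_t planes[10])
{
    for (int p = 0; p < 10; ++p)
        planes[p] = 0;
    for (int y = 0; y < 8; ++y)
        for (int x = 0; x < 8; ++x)
        {
            uint64_t bit = uint64_t(1) << (y * 8 + x);
            for (int p = 0; p < 10; ++p)
                if (data[y][x] & (1 << (9 - p)))
                    planes[p] |= bit;
        }
}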
EDIT
Another method I tried was an SSE2 version, where one keeps a window of size Width*(7 + 1) sorted by rows:
original sorted
1 2 1 3 .... 1 1 0
2 1 3 4 .... 2 2 1
5 2 0 1 .... -> 5 2 3
. . . . .... . . .
Sorting 7 rows is efficiently done by a sorting network using 16 primitive sorting operations (32 instructions with 3-parameter VEX encoding + 14 instructions for memory access).
One can also incrementally remove the element input[row-1][column] from a presorted SSE2 register and add the element input[row+7][column] to the register (which takes about 12 instructions per sorted column).
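For reference, a scalar sketch of such a 7-element sorting network (16 compare-exchange operations; in the SSE2 version each min/max pair operates on eight 16-bit columns at once):

#include <algorithm>

// Compare-exchange: after the call, a <= b.
template <typename T>
inline void cx(T& a, T& b) { T lo = std::min(a, b); b = std::max(a, b); a = lo; }

// Sort 7 elements with 16 compare-exchanges (a known size-optimal network).
template <typename T>
void sort7(T v[7])
{
    cx(v[0], v[6]); cx(v[2], v[3]); cx(v[4], v[5]);
    cx(v[0], v[2]); cx(v[1], v[4]); cx(v[3], v[6]);
    cx(v[0], v[1]); cx(v[2], v[5]); cx(v[3], v[4]);
    cx(v[1], v[2]); cx(v[4], v[6]);
    cx(v[2], v[3]); cx(v[4], v[5]);
    cx(v[1], v[2]); cx(v[3], v[4]); cx(v[5], v[6]);
}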
Having 7 sorted columns in 7 SSE2 registers, one can now implement bitonic merge sort of three different widths,
which at first column will sort in groups of
(r0),(r1,r2), ((r3,r4), (r5,r6))
<merge> <merge> <merge> // level #0
<--merge---> <---- merge* ----> // level #1
<----- merge + take middle ----> // partial level #2
At column 1, one needs to sort columns
(r1,r2), ((r3,r4),(r5,r6)), (r7)
******* **** <-- merge (r1,r2,r7)
<----- merge + take middle ----> <-- partial level #2
At column 2, with
r2, ((r3,r4),(r5,r6)), (r7,r8)
<merge> // level #0
** ******* // level #1
<----- merge + take middle ----> // level #2
This takes advantage of memorizing partially sorted substrings (and does it better than e.g. a heap-based priority queue). The final merging of the 3x7 and 4x7 element substrings does not need to compute every element correctly, since we are only interested in element #24.
Still, while being better than (my implementations of) heap-based priority queues and several hierarchical / flat histogram based methods, the overall winner was the popcount method (with a 64-bit popcount instruction).

super fast median of matrix in opencv (as fast as matlab)

I'm writing some code in OpenCV and want to find the median value of a very large matrix array (single channel grayscale, float).
I tried several methods such as sorting the array (using std::sort) and picking the middle entry, but they are extremely slow compared with the median function in MATLAB. To be precise - what takes 0.25 seconds in MATLAB takes over 19 seconds in OpenCV.
My input image is originally a 12-bit greyscale image with the dimensions 3840x2748 (~10.5 megapixels), converted to float (CV_32FC1) where all the values are now mapped to the range [0,1] and at some point in the code I request the median value by calling:
double myMedianValue = medianMat(Input);
Where the function medianMat is:
double medianMat(cv::Mat Input){
    Input = Input.reshape(0, 1); // spread Input Mat to single row
    std::vector<double> vecFromMat;
    Input.copyTo(vecFromMat); // copy Input Mat to vector vecFromMat
    std::sort( vecFromMat.begin(), vecFromMat.end() ); // sort vecFromMat
    if (vecFromMat.size() % 2 == 0) // even number of elements
        return (vecFromMat[vecFromMat.size()/2 - 1] + vecFromMat[vecFromMat.size()/2]) / 2;
    return vecFromMat[(vecFromMat.size() - 1)/2]; // odd number of elements
}
I timed the function medianMat by itself and also its various parts - as expected the bottleneck is in:
std::sort( vecFromMat.begin(), vecFromMat.end() ); // sort vecFromMat
Does anyone here have an efficient solution?
Thanks!
EDIT
I have tried using std::nth_element given in the answer of Adi Shavit.
The function medianMat now reads as:
double medianMat(cv::Mat Input){
    Input = Input.reshape(0, 1); // spread Input Mat to single row
    std::vector<double> vecFromMat;
    Input.copyTo(vecFromMat); // copy Input Mat to vector vecFromMat
    std::nth_element(vecFromMat.begin(), vecFromMat.begin() + vecFromMat.size() / 2, vecFromMat.end());
    return vecFromMat[vecFromMat.size() / 2];
}
The runtime has dropped from over 19 seconds to 3.5 seconds. This is still nowhere near the 0.25 seconds in MATLAB using the median function...
Sorting and taking the middle element is not the most efficient way to find a median. It requires O(n log n) operations.
With C++ you should use std::nth_element() and take the middle iterator. This is an O(n) operation:
nth_element is a partial sorting algorithm that rearranges elements in [first, last) such that:
The element pointed at by nth is changed to whatever element would occur in that position if [first, last) was sorted.
All of the elements before this new nth element are less than or equal to the elements after the new nth element.
Also, your original data is 12-bit integers. Your implementation does a few things that make the comparison to Matlab problematic:
You converted to floating point (CV_32FC1, or double, or both); this is costly and takes time.
The code makes an extra copy into a vector<double>.
Operations on float, and especially double, cost more than on integers.
Assuming your image is continuous in memory, as is the default for OpenCV, you should use CV_16UC1 and work directly on the data array after reshape().
Another option which should be very fast is to simply build a histogram of the image - this is a single pass on the image. Then, working on the histogram, find the bin that corresponds to half the pixels on each side - this is at most a single pass over the bins.
The OpenCV docs have several tutorials on how to build histograms. Once you have the histogram, accumulate the bin values until you get past 3840x2748/2. That bin is your median.
OK.
I actually tried this before posting the question, and due to some silly mistakes I disqualified it as a solution... anyway here it is:
I basically create a histogram of values for my original input with 2^12 = 4096 bins, compute the CDF and normalize it so it is mapped from 0 to 1, and find the smallest index in the CDF that is equal to or larger than 0.5. I then divide this index by 2^12 and thus find the median value requested. Now it runs in 0.11 seconds (and that's in debug mode without heavy optimizations), which is less than half the time required in Matlab.
Here's the function (nVals = 4096 in my case corresponding with 12-bits of values):
double medianMat(cv::Mat Input, int nVals){
    // COMPUTE HISTOGRAM OF SINGLE CHANNEL MATRIX
    float range[] = { 0, (float)nVals };
    const float* histRange = { range };
    bool uniform = true; bool accumulate = false;
    cv::Mat hist;
    calcHist(&Input, 1, 0, cv::Mat(), hist, 1, &nVals, &histRange, uniform, accumulate);

    // COMPUTE CUMULATIVE DISTRIBUTION FUNCTION (CDF)
    cv::Mat cdf;
    hist.copyTo(cdf);
    for (int i = 1; i <= nVals - 1; i++){
        cdf.at<float>(i) += cdf.at<float>(i - 1);
    }
    cdf /= Input.total();

    // COMPUTE MEDIAN
    double medianVal = 0;
    for (int i = 0; i <= nVals - 1; i++){
        if (cdf.at<float>(i) >= 0.5) { medianVal = i; break; }
    }
    return medianVal / nVals;
}
It's probably faster to find it from the original data.
Since the original data has 12-bit values, there are only 4096 different possible values. That's a nice and small table! Go through all the data in one pass and count how many of each value you have. That is an O(n) operation. Then it's easy to find the median: only count size/2 items from either end of the table.
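A minimal sketch of that counting approach (assuming the raw 12-bit values are available as uint16_t):

#include <cstddef>
#include <cstdint>

// Median of 12-bit data via a counting pass: O(n) over the pixels
// plus at most one pass over the 4096-entry table.
uint16_t median12bit(const uint16_t* data, size_t n)
{
    uint32_t counts[4096] = {};
    for (size_t i = 0; i < n; ++i)
        ++counts[data[i] & 0x0FFF];

    size_t half = n / 2, seen = 0;
    for (uint16_t v = 0; v < 4096; ++v)
    {
        seen += counts[v];
        if (seen > half)
            return v;   // value at the middle position
    }
    return 4095;        // unreachable for n > 0
}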

Use map instead of array in C++ to protect searching outside of array bounds?

I have a gridded rectangular file that I have read into an array. This gridded file contains data values and NODATA values; the data values make up a continuous odd shape inside of the array, with NODATA values filling in the rest to keep the gridded file rectangular. I perform operations on the data values and skip the NODATA values.
The operations I perform on the data values consist of examining the 8 surrounding neighbors (the current cell is the center of a 3x3 grid). I can handle when any of the eight neighbors are NODATA values, but when actual data values fall in the first or last row/column, I trigger an error by trying to access an array value that doesn't exist.
To get around this I have considered three options:
Add a new first and last row/column with NODATA values, and adjust my code accordingly - I can cycle through the internal 'original' array and handle the new NODATA values like the edges I'm already handling that don't fall in the first and last row/column.
I can write special-case code for the cells with data in the first and last row/column - modified for loops that only step through the neighboring cells that actually exist. Since I still need 8 neighboring values (NODATA/non-existent cells are given the same value as the central cell), I would have to copy values into a secondary 3x3 grid, though there may be a way to avoid it. This solution is annoying, as I have to code up specialized routines for all corner cells (4 different for loops) and for any cell in the first or last row/column (another 4 different for loops), plus a single for loop for any non-edge cell.
Use a map, which based on my reading appears capable of storing the original array while letting me search for locations outside the array without triggering an error. In this case, I still have to give these non-existent cells a value (equal to the center of the array), and so may or may not have to set up a secondary 3x3 grid as well; once again there may be a way to avoid the secondary grid.
Solution 1 seems the simplest, solution 3 the most clever, and 2 the most annoying. Are there any solutions I'm missing? Or does one of these solutions deserve to be the clear winner?
My advice is to replace all read accesses to the array by a function. For example, arr[i][j] by getarr(i,j). That way, all your algorithmic code stays more or less unchanged and you can easily return NODATA for indices outside bounds.
But I must admit that it is only my opinion.
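A minimal sketch of that accessor (hypothetical names; assumes the grid dimensions and the NODATA constant from the question):

// Bounds-checked read access: indices outside the grid return NODATA,
// so the 3x3 neighborhood code needs no edge special-casing.
double getarr(int i, int j)
{
    if (i < 0 || i >= nrows || j < 0 || j >= ncols)
        return NODATA;
    return arr[i][j];
}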
I've had to do this before and the fastest solution was to expand the region with NODATA values and iterate over the interior. This way the core loop is simple for the compiler to optimize.
If this is not a computational hot-spot in the code, I'd go with Serge's approach instead though.
To minimize rippling effects I used an array structure with explicit row/column strides, something like this:
#include <memory>
#include <vector>
using std::shared_ptr;
using std::vector;

class Grid {
private:
    shared_ptr<vector<double>> data;
    int origin;
    int xStride;
    int yStride;
public:
    Grid(int nx, int ny) :
        data( new vector<double>(nx*ny) ),
        origin(0),
        xStride(1),
        yStride(nx) {
    }
    Grid(int nx, int ny, int padx, int pady) :
        data( new vector<double>((nx+2*padx)*(ny+2*pady)) ),
        origin(padx + pady*(nx+2*padx)),
        xStride(1),
        yStride(nx+2*padx) {
    }
    double& operator()(int x, int y) {
        return (*data)[origin + x*xStride + y*yStride];
    }
};
Now you can do
Grid g(5,5,1,1);
Grid g2(5,5);
// Initialise
for(int i=0; i<5; ++i) {
    for(int j=0; j<5; ++j) {
        g(i,j) = i+j;
    }
}
// Convolve (note we don't care about going outside the
// range, and our indices are unchanged between the two
// grids).
for(int i=0; i<5; ++i) {
    for(int j=0; j<5; ++j) {
        g2(i,j) = 0;
        g2(i,j) += g(i-1,j);
        g2(i,j) += g(i+1,j);
        g2(i,j) += g(i,j-1);
        g2(i,j) += g(i,j+1);
    }
}
Aside: This data structure is awesome for working with transposes, and sub-matrices. Each of those is just an adjustment of the offset and stride values.
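For example, a transposed view under this scheme is just the same buffer with the strides swapped (a hypothetical member function added to Grid; needs <utility> for std::swap):

// Hypothetical: a view of the same data with x and y swapped.
// No pixels are copied; only the stride bookkeeping changes.
Grid transposed() const {
    Grid t(*this);                    // shares the data buffer via shared_ptr
    std::swap(t.xStride, t.yStride);  // (x,y) now addresses the old (y,x)
    return t;
}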
Solution 1 is the standard solution. It takes maximum advantage of modern computer architectures, where a few extra bytes of memory are no big deal and predictable memory access accelerates performance. As you keep accessing memory in a predictable pattern (with fixed strides), the CPU prefetcher will successfully read ahead.
Solution 2 saves a small amount of memory, but the special handling of the edges incurs a real slowdown. Still, the large chunk in the middle benefits from the prefetcher.
Solution 3 is horrible. Map access is O(log N) instead of O(1), and in practice it can be 10-20 times slower. Maps have poor locality of reference; the CPU prefetcher will not kick in.
If simple means "easy to read", I'd recommend you declare a class with an overloaded [] operator. Use it like a regular array, but it'll have bounds checking to handle NODATA.
If simple means "high performance" and you have a sparse grid with isolated DATA, consider implementing linked lists to the DATA values and implement optimal operators that go directly to the DATA values.
1 wastes memory proportional to your overall rectangle size, 3/maps are clumsy here, 2 is actually very easy to do:
T d[X][Y] = ...;
for (int x = 0; x < X; ++x)
    for (int y = 0; y < Y; ++y) // move over d[x][y] centres
    {
        T r[3][3] = { { d[x][y], d[x][y], d[x][y] },
                      { d[x][y], d[x][y], d[x][y] },
                      { d[x][y], d[x][y], d[x][y] } };
        for (int i = std::max(0, x-1); i <= std::min(X-1, x+1); ++i)
            for (int j = std::max(0, y-1); j <= std::min(Y-1, y+1); ++j)
                if (d[i][j] != NoData)
                    r[i-x+1][j-y+1] = d[i][j];
        // use r for whatever...
    }
Note that I'm using signed int very deliberately so x-1 and y-1 don't become huge positive numbers (as they would with, say, size_t) and break the std::max logic... but you could express it differently if you had some reason to prefer size_t (e.g. x == 0 ? 0 : x - 1).

OpenCV: Fill in missing elements in CVMat with avg of nearest non-zero neighbors?

The basic problem is this:
I have a CVMat, type CV_8UC1, which is mostly filled in with integers (well, chars, actually, but whatever) between 1 and 100 inclusive. The remaining elements are zeros.
In this case, 0 basically means "unknown". I want to fill in the unknown elements with, essentially, the average of its nearest neighbors... i.e. if this matrix were representing a 3d surface with a bunch of holes in it, I want to smoothly fill in the holes.
Keeping in mind, of course, that it's possible there are some rather big holes.
Efficiency isn't super important, as this operation is only going to be happening once, and the matrix in question isn't bigger than around 1000x1000.
Here's the code I need to finish:
for(int x=0; x<heightMatrix.cols; x++) {
    for (int y=0; y<heightMatrix.rows; y++) {
        if (heightMatrix.at<char>(x,y) == 0) {
            // ???
        }
    }
}
Thanks!!
How about this instead:
put your data in an image and use image closing with a large kernel (or with a lot of iterations):
http://opencv.willowgarage.com/documentation/image_filtering.html#morphologyex
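A minimal sketch of that suggestion with OpenCV's C++ API (the kernel size and shape are guesses to tune for your hole sizes):

#include <opencv2/imgproc/imgproc.hpp>

// Morphological closing: dilation spreads known values into the
// zero-valued holes, then erosion pulls the boundary back.
cv::Mat closed;
cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(15, 15));
cv::morphologyEx(heightMatrix, closed, cv::MORPH_CLOSE, kernel);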
What about this?
// ... paste the following part inside the loop ...
int sum = 0;
sum += heightMatrix.at<char>(x - 1, y);
sum += heightMatrix.at<char>(x + 1, y);
sum += heightMatrix.at<char>(x, y - 1);
sum += heightMatrix.at<char>(x, y + 1);
heightMatrix.at<char>(x, y) = sum / 4;
Since you deal with a CV_8UC1 Mat you have in practice a 2d array and each pixel has just 4 nearest neighbors.
There are some caveats however:
1) Put your averaged pixels in a Mat of floats to avoid round-off!
2) Filling the whole Mat with this average may not be what you are looking for if the non-zero pixels are quite sparse: when there are a lot of empty pixels and really few non-zero ones, the further you move away from a non-zero pixel, the more the average converges to 0. This can happen in as few as 3-4 iterations (another good reason not to store the values in a Mat of integers).
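A minimal sketch combining both caveats (my variable names; it accumulates in a float Mat and averages only over known neighbors):

#include <opencv2/core/core.hpp>

// One smoothing pass: fill each unknown (zero) cell with the average of
// its known 4-neighbors, accumulating in float to avoid round-off.
cv::Mat filled;
heightMatrix.convertTo(filled, CV_32F);
for (int y = 1; y < filled.rows - 1; y++) {
    for (int x = 1; x < filled.cols - 1; x++) {
        if (heightMatrix.at<char>(y, x) != 0) continue;  // already known
        float sum = 0; int known = 0;
        const int dx[] = { -1, 1, 0, 0 }, dy[] = { 0, 0, -1, 1 };
        for (int k = 0; k < 4; k++) {
            float v = filled.at<float>(y + dy[k], x + dx[k]);
            if (v != 0) { sum += v; known++; }
        }
        if (known > 0) filled.at<float>(y, x) = sum / known;
    }
}
// Repeat the pass until no zeros remain (big holes need several passes).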
2) to fill the whole Mat with this average may not be what you are looking for if the non-zero pixels are quite sparse: when there is a lot of empty pixels and really few non-zero pixels the more you move away from a non-zero pixel, the more the average converges to 0. And this may happen in as few as 3-4 iterations (another good reason to store not to store the values in a Mat of integers).