How can I take the average of 100 images using OpenCV? - C++

I have 100 images, each one 598 * 598 pixels, and I want to remove the noise by averaging the pixels. But if I add "pixel by pixel" and then divide, I have to write a loop with 598*598 iterations for one image, and 598*598*100 for all hundred images.
Is there a method to help me with this operation?

You need to loop over each image and accumulate the results. Since this is likely to cause overflow, you can convert each image to a CV_64FC3 image and accumulate into a CV_64FC3 image. You can also use CV_32FC3 or CV_32SC3 for this, i.e. float or integer instead of double.
Once you have accumulated all values, you can use convertTo to both:
make the image a CV_8UC3
divide each value by the number of images, to get the actual mean.
This is sample code that creates 100 random images, then computes and shows the mean:
#include <opencv2/opencv.hpp>
#include <vector>
using namespace cv;
using namespace std;

Mat3b getMean(const vector<Mat3b>& images)
{
    if (images.empty()) return Mat3b();

    // Create a 0 initialized image to use as accumulator
    Mat m(images[0].rows, images[0].cols, CV_64FC3);
    m.setTo(Scalar(0, 0, 0, 0));

    // Use a temp image to hold the conversion of each input image to CV_64FC3
    // This will be allocated just the first time, since all your images have
    // the same size.
    Mat temp;
    for (size_t i = 0; i < images.size(); ++i)
    {
        // Convert the input images to CV_64FC3 ...
        images[i].convertTo(temp, CV_64FC3);

        // ... so you can accumulate
        m += temp;
    }

    // Convert back to CV_8UC3 type, applying the division to get the actual mean
    m.convertTo(m, CV_8U, 1. / images.size());
    return m;
}

int main()
{
    // Create a vector of 100 random images
    vector<Mat3b> images;
    for (int i = 0; i < 100; ++i)
    {
        Mat3b img(598, 598);
        // Fill all three channels with random values in [0, 256)
        randu(img, Scalar::all(0), Scalar::all(256));
        images.push_back(img);
    }

    // Compute the mean
    Mat3b meanImage = getMean(images);

    // Show result
    imshow("Mean image", meanImage);
    waitKey();

    return 0;
}

Suppose that the images will not need to undergo transformations (gamma, color space, or alignment). The numpy package lets you do this quickly and succinctly.
import numpy as np

# List of images, all must be the same size and data type.
images = [img0, img1, ...]
avg_img = np.mean(images, axis=0)
This will auto-promote the elements to float. If you want the result as BGR888, then:
avg_img = avg_img.astype(np.uint8)
Could also do uint16 for 16 bits per channel. If you are dealing with 8 bits per channel, you almost certainly won't need 100 images.

First, convert the images to floats. You have N = 100 images. Think of a single image as an array of average pixel values over 1 image. You need to calculate the array of average pixel values over N images.
Let A be the array of average pixel values of X images, and B the array of average pixel values of Y images. Then C = (A * X + B * Y) / (X + Y) is the array of average pixel values of X + Y images. To get better accuracy in floating point operations, X and Y should be approximately equal.
You can merge all your images like subarrays in merge sort. In your case the merge operation is C = (A * X + B * Y) / (X + Y), where A and B are arrays of average pixel values of X and Y images.
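The merge rule above can be sketched in plain C++ roughly as follows. This is only an illustration: the names PartialMean, mergeMeans and averageRange are made up for this sketch, and each image is assumed to be a flat std::vector<float> of identical length rather than a cv::Mat.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One partial result: the per-pixel mean of `count` images.
struct PartialMean {
    std::vector<float> mean;  // average pixel values
    std::size_t count;        // number of images averaged so far
};

// C = (A * X + B * Y) / (X + Y), applied pixel by pixel.
PartialMean mergeMeans(const PartialMean& a, const PartialMean& b) {
    assert(a.mean.size() == b.mean.size());
    PartialMean c{std::vector<float>(a.mean.size()), a.count + b.count};
    const float wa = static_cast<float>(a.count) / c.count;  // X / (X + Y)
    const float wb = static_cast<float>(b.count) / c.count;  // Y / (X + Y)
    for (std::size_t i = 0; i < c.mean.size(); ++i)
        c.mean[i] = a.mean[i] * wa + b.mean[i] * wb;
    return c;
}

// Average images[lo, hi) by splitting the range in half, as in merge sort,
// so that X and Y stay approximately equal at every merge.
PartialMean averageRange(const std::vector<std::vector<float>>& images,
                         std::size_t lo, std::size_t hi) {
    assert(lo < hi && hi <= images.size());
    if (hi - lo == 1) return PartialMean{images[lo], 1};
    const std::size_t mid = lo + (hi - lo) / 2;
    return mergeMeans(averageRange(images, lo, mid),
                      averageRange(images, mid, hi));
}
```

Calling averageRange(images, 0, images.size()) would then return the mean of all images in the mean field.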


Image Gaussian convolution in Fourier domain: works, while it shouldn't

The problem is that I can't fully understand the principles of convolution in the frequency domain.
I have an image of size 256x256, which I want to convolve with a 3x3 Gaussian matrix. Its coefficients are (1/16, 1/8, 1/4):
PlainImage<float> FourierRunner::getGaussMask(int sz)
{
    PlainImage<float> G(3,3);
    *, 0) = 1.0/16; *, 1) = 1.0/8; *, 2) = 1.0/16;
    *, 0) = 1.0/8; *, 1) = 1.0/4; *, 2) = 1.0/8;
    *, 0) = 1.0/16; *, 1) = 1.0/8; *, 2) = 1.0/16;
    return G;
}
To get the FFT of both the image and the filter kernel, I zero-pad them. sz_common stands for the extended size. The image and kernel are moved to the center of the h and g ComplexImages respectively, so they are zero-padded on the right, left, bottom and top.
I've read that the size should be sz_common >= sz+gsz-1 because of the circular convolution property: otherwise the filter can change image values at the boundaries in undesired ways.
But it doesn't work: I get adequate results only when sz_common = sz. When sz_common = sz+gsz-1 or sz_common = 2*sz, after the IFFT I get a convolved image that is 2-3 times smaller! Why?
I'm also confused that the filter matrix values have to be multiplied by 256, like pixel values: other questions on SO contain Matlab code without such normalization. As in the previous case, without this multiplication it works badly: I get a black image. Why?
// fft_in is shifted fourier image with center in [sz/2;sz/2]
void FourierRunner::convolveImage(ComplexImage& fft_in)
{
    int sz = 256; // equal to fft_in.width()
    // Get original complex image (backward fft_in)
    ComplexImage original_complex = fft_in;
    fft2d_backward(fft_in, original_complex);
    int gsz = 3;
    PlainImage<float> filter = getGaussMask(gsz);
    ComplexImage filter_complex = ComplexImage::fromFloat(filter);
    int sz_common = pow2ceil(sz); // should be sz+gsz-1 ???
    ComplexImage h = ComplexImage::zeros(sz_common,sz_common);
    ComplexImage g = ComplexImage::zeros(sz_common,sz_common);
    copyImageToCenter(h, original_complex);
    copyImageToCenter(g, filter_complex);
    LOOP_2D(sz_common, sz_common) g.setPoint(x, y,, y)*256);
    fft2d_forward(g, g);
    fft2d_forward(h, h);
    LOOP_2D(sz_common,sz_common) h.setPoint(x, y,, y)*, y));
    copyImageToCenter(fft_in, h);
    fft2d_backward(fft_in, fft_in);
    PlainImage<float> frequency_res(sz,sz);
    writeComplexToPlainImage(fft_in, frequency_res);
    fft2d_forward(fft_in, fft_in);
}
I tried zero-padding the image on the right and bottom, so that the smaller image is copied to the start of the bigger one, but it also doesn't work.
I wrote a convolution in the spatial domain to compare results. The frequency-domain blur results are almost the same as in the spatial domain (avg. error between pixels is 5), but only when sz_common = sz.
So, could you explain the phenomena of zero-padding and normalization for this case? Thanks in advance.
Convolution in the spatial domain is equivalent to multiplication in the Fourier domain.
This holds for continuous functions that are defined everywhere.
Yet in practice we have discrete signals and convolution kernels,
which require more careful handling.
If you have an image of size M x N and a kernel of size MM x NN, applying the DFT (the FFT is an efficient way to calculate the DFT) to them gives functions of size M x N and MM x NN respectively.
Moreover, the theorem above about the multiplication equivalence requires multiplying matching frequencies with each other.
Since in practice the kernel is much smaller than the image, it is usually zero-padded to the size of the image.
Now, by applying the DFT you'll get two matrices of the same M x N size and will be able to multiply them.
Yet this will be equivalent to the circular convolution between the image and the kernel.
To apply linear convolution you should make both of them (M + MM - 1) x (N + NN - 1) in size.
Usually this is done by applying a "replicate" boundary condition to the image and zero-padding the kernel.
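To make the difference between circular and linear convolution concrete, here is a small 1D sketch in plain C++. It computes the circular convolution directly instead of going through an FFT (the result is the same as multiplying two same-size DFTs and inverting), and it zero-pads both signals rather than replicating the border. The function names are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Circular convolution of two equal-length signals, computed directly.
// Indices wrap around modulo the signal length.
std::vector<double> circularConvolve(const std::vector<double>& x,
                                     const std::vector<double>& k) {
    assert(x.size() == k.size() && !x.empty());
    const std::size_t n = x.size();
    std::vector<double> out(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            out[(i + j) % n] += x[i] * k[j];
    return out;
}

// Zero-pad a signal at the end up to the requested length.
std::vector<double> zeroPad(std::vector<double> v, std::size_t length) {
    assert(length >= v.size());
    v.resize(length, 0.0);
    return v;
}

// Linear convolution: pad both inputs to M + MM - 1 samples first,
// so the wrapped-around terms only ever multiply zeros.
std::vector<double> linearConvolve(const std::vector<double>& x,
                                   const std::vector<double>& k) {
    assert(!x.empty() && !k.empty());
    const std::size_t n = x.size() + k.size() - 1;
    return circularConvolve(zeroPad(x, n), zeroPad(k, n));
}
```

For x = {1, 2, 3, 4} and k = {1, 1, 1}, padding only the kernel to length 4 gives the circular result {8, 7, 6, 9}, where the tail of the linear result {1, 3, 6, 9, 7, 4} has wrapped around onto the first two samples.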

How can I turn a three channel Mat into a summed up one channel Mat?

I want to add up all channels of a Mat image to a Mat image with only one sum-channel. I've tried it this way:
// sum up the channels of the image:
// 1 .store initial nr of rows/columns
int initialRows = frameVid1.rows;
int initialCols = frameVid1.cols;
// 2. check if matrix is continuous
if (!frameVid1.isContinuous())
    frameVid1 = frameVid1.clone();
// 3. reshape matrix to 3 color vectors
frameVid1 = frameVid1.reshape(3, initialRows*initialCols);
// 4. convert matrix to store bigger values than 255
frameVid1.convertTo(frameVid1, CV_32F);
// 5. sum up the three color vectors
reduce(frameVid1, frameVid1, 1, CV_REDUCE_SUM);
// 6. reshape to initial size
frameVid1 = frameVid1.reshape(1, initialRows);
// 7. convert back to CV_8UC1
frameVid1.convertTo(frameVid1, CV_8U);
But somehow reduce does not treat the color channels as a matrix dimension. Is there another function that can sum them up?
Also why does using CV_16U in step 4.) not work? (I had to put a CV_32F in there)
Thanks in advance!
You can sum the RGB channels with a single line:
cv::transform(frameVid1, frameVidSum, cv::Matx13f(1,1,1));
You may need one more line, because before applying the transform you should convert the image to an appropriate type to avoid saturation (I assumed CV_32FC3). The output array has the same size and depth as the source.
Some explanation:
cv::transform operates on per-pixel channel values.
Given the third argument cv::Matx13f(a, b, c), for each pixel [u,v] it does the following:
frameVidSum[u,v] = frameVid1[u,v].B * a + frameVid1[u,v].G * b + frameVid1[u,v].R * c
By using third argument cv::Matx13f(1,0,1) you will sum only blue and red channels.
cv::transform is so clever, you can even use cv::Matx14f and then the fourth value will be added (offset) to each pixel in the frameVidSum.
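The per-pixel formula above can be spelled out in plain C++ like this. It is not the OpenCV implementation, only an illustration of the arithmetic; the function name weightedChannelSum and the flat interleaved B, G, R float buffer are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For every pixel: sum = B * a + G * b + R * c + offset.
// `bgr` is an interleaved buffer: B0, G0, R0, B1, G1, R1, ...
std::vector<float> weightedChannelSum(const std::vector<float>& bgr,
                                      float a, float b, float c,
                                      float offset = 0.0f) {
    assert(bgr.size() % 3 == 0);
    std::vector<float> sum(bgr.size() / 3);
    for (std::size_t i = 0; i < sum.size(); ++i)
        sum[i] = bgr[3 * i] * a + bgr[3 * i + 1] * b + bgr[3 * i + 2] * c + offset;
    return sum;
}
```

With weights (1, 1, 1) this adds all three channels, with (1, 0, 1) only blue and red, and the offset parameter plays the role of the fourth value of the 1x4 matrix.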
Every 3rd element (in RGB) is one similar colour. Probably it will work if you grab every group of 3 elements (R, G and B) sum them up and store it in another 1-channel matrix. Before storing you should use saturate cast to avoid unexpected results. So, I think the better way is to use saturate cast instead of adapting your matrix.
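A minimal sketch of that idea in plain C++, assuming an interleaved 8-bit buffer with three channels per pixel. The helper saturateToByte is a hand-written clamp that stands in for cv::saturate_cast<uchar>, and both function names are made up for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Clamp an int to the 8-bit range [0, 255].
std::uint8_t saturateToByte(int v) {
    return static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
}

// Sum each group of three channel values and store the clamped result
// in a one-channel buffer.
std::vector<std::uint8_t> sumChannelsSaturated(const std::vector<std::uint8_t>& pixels) {
    assert(pixels.size() % 3 == 0);
    std::vector<std::uint8_t> out(pixels.size() / 3);
    for (std::size_t i = 0; i < out.size(); ++i) {
        // The operands are promoted to int, so the sum itself cannot overflow.
        const int s = pixels[3 * i] + pixels[3 * i + 1] + pixels[3 * i + 2];
        out[i] = saturateToByte(s);
    }
    return out;
}
```

Note that clamping loses information: every pixel whose channels add up to more than 255 ends up as 255, which is why the other answers convert to a wider type first.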
Have a look at cv::split() and cv::add() functions.
You can use the split function to split the image into separate channels and then the add function to add the images. But be careful when using add because adding may lead to saturation of values. You may have to first convert types and then add. Have a look here:

sobel filter algorithm thresholding (no external libs used)

I am writing my own implementation of Sobel edge detection. My function's interface is
void sobel_filter(volatile PIXEL * pixel_in, FLAG *EOL, volatile PIXEL * pixel_out, int rows, int cols)
(PIXEL being an 8bit greyscale pixel)
For testing I changed the interface to:
void sobel_filter(PIXEL pixels_in[MAX_HEIGHT][MAX_WIDTH],PIXEL
pixels_out[MAX_HEIGHT][MAX_WIDTH], int rows,int cols);
But still, the thing is that I get to read one pixel at a time, which brings me to the problem of managing the Sobel output values when they are bigger than 255 or smaller than 0. If I had the whole picture from the start, I could normalize all Sobel output values with their min and max values. But this is not possible for me.
This is my sobel operator code, ver1:
const char x_op[KERNEL_SIZE][KERNEL_SIZE] = { {-1,0,1},
const char y_op[KERNEL_SIZE][KERNEL_SIZE] = { {1,2,1},
short x_weight=0;
short y_weight=0;
PIXEL ans;
for (short i=0; i<KERNEL_SIZE; i++){
for(short j=0; j<KERNEL_SIZE; j++){
short val=ABS(x_weight)+ABS(y_weight);
//make sure the pixel value is between 0 and 255 and add thresholds
else if(val<100)
ans=255-(unsigned char)(val);
return ans;
this is ver 2, changes are made only after summing up the weights:
short val=ABS(x_weight)+ABS(y_weight);
unsigned char char_val=(255-(unsigned char)(val));
//make sure the pixel value is between 0 and 255 and add thresholds
else if(char_val<100)
return ans;
Now, for a 3x3 sobel both seem to be giving OK results:
But when I try with a 5x5 sobel
const char x_op[KERNEL_SIZE][KERNEL_SIZE] = { {1,2,0,-2,-1},
const char y_op[KERNEL_SIZE][KERNEL_SIZE] = { {-1,-4,-6,-4,-1},
it gets tricky:
As you can see, for the 5x5 the results are quite bad and I don't know how to normalize the values. Any ideas?
Think about the range of values that your filtered values can take.
For the Sobel 3x3, the highest X/Y value is obtained when the pixels with a positive coefficient are white (255), and the ones with a negative coefficient are black (0), which gives a total of 1020. Symmetrically, the lowest value is -1020. After taking the absolute value, the range is from 0 to 1020 = 4 x 255.
For the magnitude, Abs(X)+Abs(Y), the computation is a little more complicated as the two components cannot reach 1020 at the same time. If I am right, the range is from 0 to 1530 = 6 x 255.
Similar figures for the 5x5 are 48 x 255 and 66 x 255.
Knowing that, you should rescale the values to a smaller range (apply a reduction coefficient), and adjust the thresholds. Logically, if you apply a coefficient 3/66 to the Sobel 5x5, you will return to similar conditions.
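The range argument can be checked with a few lines of plain C++. The sketch below assumes 8-bit input and a kernel whose positive and negative coefficients mirror each other, as in Sobel; the names Kernel, maxResponse and rescaleToByte are made up for this illustration.

```cpp
#include <cassert>
#include <vector>

using Kernel = std::vector<std::vector<int>>;

// Largest possible response for 8-bit input: every pixel under a positive
// coefficient is 255 and every pixel under a negative coefficient is 0.
int maxResponse(const Kernel& k) {
    assert(!k.empty());
    int positive = 0;
    for (const auto& row : k)
        for (int c : row)
            if (c > 0) positive += c;
    return 255 * positive;
}

// Map |value| from [0, maxValue] down to [0, 255] with a reduction coefficient.
unsigned char rescaleToByte(int value, int maxValue) {
    assert(maxValue > 0);
    int mag = value < 0 ? -value : value;
    if (mag > maxValue) mag = maxValue;
    return static_cast<unsigned char>(mag * 255 / maxValue);
}
```

For the 3x3 X kernel this gives 1020 = 4 x 255. For the standard 5x5 Sobel X kernel (the outer product of {1, 4, 6, 4, 1} and {1, 2, 0, -2, -1}, assumed here because the question only shows its first row) it gives 12240 = 48 x 255, which matches the figure above.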
It all depends on the effect that you want to achieve.
Anyway, the true question is: how are the filtered values statistically distributed for typical images ? Because it is unnecessary to keep the far tails of the distribution.
You have to normalize the results of your computation. For that you have to find out how "big" the filter is, by summing the absolute values of all its coefficients. So I do this:
for(int i = 0; i < mask.length; i++)
for(int j = 0; j < mask[i].length; j++)
size += Math.abs(mask[i][j]);
Where mask is my Sobel filter of whatever size. So after applying your Sobel filter you have to normalize your values. In your code it should look like:
for (short i=0; i<KERNEL_SIZE; i++){
for(short j=0; j<KERNEL_SIZE; j++){
x_weight /= size;
y_weight /= size;
After that, for visualization, you have to shift the values by 128. Do that only if you want to visualize the image. Otherwise you get problems with later calculations (the gradient, for example).
x_weight += 128;
y_weight += 128;
Hope it works and helps.
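The same normalization written as a small C++ sketch, since the question is in C. The names Mask, maskSize and toDisplayValue are made up for this illustration, and integer division is assumed to be precise enough.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

using Mask = std::vector<std::vector<int>>;

// "Size" of the filter: the sum of the absolute values of its coefficients.
int maskSize(const Mask& mask) {
    int size = 0;
    for (const auto& row : mask)
        for (int c : row)
            size += std::abs(c);
    return size;
}

// Normalize a raw filter response, then shift it by 128 (for display only).
int toDisplayValue(int weight, int size) {
    assert(size > 0);
    return weight / size + 128;
}
```

For the 3x3 Sobel X mask the size is 8, so the raw range of -1020 to 1020 maps to 1 to 255 after the shift.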

Adaptive median filter with Opencv c++

I have a problem writing the code for an adaptive median filter.
What is the best way to compute the minimum, maximum and median pixel intensities?
So far I read every pixel value of the image:
for (int y = 0; y < h; y++) {
    uchar *ptr = (uchar*)(img->imageData + y * step);
    for (int x = 0; x < w; x++) {
        printf("%u, ", ptr[x]);
    }
}
For the maxima and minima in a rectangular window, I would look to van Herk's dilation algorithm, as grayscale dilation corresponds to the maximum operator, and grayscale erosion to the minimum operator and a rectangular structuring element could be decomposed to a vertical and a horizontal line.
For the median filtering I would look to moving histogram techniques.
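As a rough idea of what a moving histogram looks like, here is a 1D sketch in plain C++ for 8-bit samples. The window is [i - radius, i + radius], clipped at the borders, and for an even number of samples the upper median is returned. The function name slidingMedian is made up for this illustration, and a 2D filter would need the same update applied per column or per row.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

std::vector<unsigned char> slidingMedian(const std::vector<unsigned char>& row,
                                         int radius) {
    assert(radius >= 0);
    const int n = static_cast<int>(row.size());
    std::array<int, 256> hist{};  // one bin per grey level, zero-initialized
    int count = 0;                // number of samples currently in the window

    // Preload the part of the first window that precedes its last sample.
    for (int i = 0; i < std::min(n, radius); ++i) { ++hist[row[i]]; ++count; }

    std::vector<unsigned char> out(row.size());
    for (int i = 0; i < n; ++i) {
        // Sample entering the window on the right.
        if (i + radius < n) { ++hist[row[i + radius]]; ++count; }
        // Sample leaving the window on the left.
        if (i - radius - 1 >= 0) { --hist[row[i - radius - 1]]; --count; }

        // Walk the histogram until more than half of the samples are covered.
        int cum = 0, v = 0;
        while ((cum += hist[v]) <= count / 2) ++v;
        out[i] = static_cast<unsigned char>(v);
    }
    return out;
}
```

Each step updates at most two bins and then scans at most 256 bins, so the cost per pixel does not grow with the window size.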
For the min/max pixel you'll need to record the value of the first pixel and then compare every other pixel to it, storing the new value if it's lower/higher respectively. OpenCV provides cv::minMaxLoc to make this easy.
For the median you'll need to sort all pixels and select the middle one (once sorted of course, finding the min/max is trivial as they'll be on either end of the list). This is more tricky, how far have you got and what is not working?
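A straightforward (not optimized) sketch of both steps in plain C++, assuming a row-major 8-bit image and a square window clipped at the image borders. The names WindowStats and windowStats are made up for this illustration, and std::nth_element is used instead of a full sort because only the middle element is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct WindowStats {
    unsigned char min, max, median;
};

// Min, max and median of the (2 * radius + 1)^2 window centred on (cx, cy).
WindowStats windowStats(const std::vector<unsigned char>& img,
                        int width, int height, int cx, int cy, int radius) {
    assert(width > 0 && height > 0);
    assert(img.size() == static_cast<std::size_t>(width) * height);
    assert(cx >= 0 && cx < width && cy >= 0 && cy < height && radius >= 0);

    // Collect the window pixels that lie inside the image.
    std::vector<unsigned char> values;
    for (int y = std::max(0, cy - radius); y <= std::min(height - 1, cy + radius); ++y)
        for (int x = std::max(0, cx - radius); x <= std::min(width - 1, cx + radius); ++x)
            values.push_back(img[y * width + x]);

    // Read min and max before nth_element reorders the values.
    const auto mm = std::minmax_element(values.begin(), values.end());
    const unsigned char mn = *mm.first;
    const unsigned char mx = *mm.second;

    // Partial sort: puts the middle element where a full sort would put it.
    const auto mid = values.begin() + values.size() / 2;
    std::nth_element(values.begin(), mid, values.end());
    return WindowStats{mn, mx, *mid};
}
```

An adaptive median filter would call something like this with a growing radius until the median is no longer equal to the min or max of the window.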

Optimized float Blur variations

I am looking for optimized functions in C++ for calculating areal averages of floats. The function is passed a source float array, a destination float array (same size as the source array), the array width and height, and the "blurring" area width and height.
The function should "wrap-around" edges for the blurring/averages calculations.
Here is example code that blurs with a rectangular shape:
#include <cstdio>
#include <windows.h> // Sleep

/*
 * Find averages extended variations
 */
void findaverages_ext(float *floatdata, float *dest_data, int fwidth, int fheight, int scale, int aw, int ah, int weight, int xoff, int yoff)
{
    printf("findaverages_ext scale: %d, width: %d, height: %d, weight: %d \n", scale, aw, ah, weight);

    float total = 0.0;
    int spos = scale * fwidth * fheight;
    int apos;
    int w = aw;
    int h = ah;
    float* f_temp = new float[fwidth * fheight];

    // Horizontal
    for(int y=0;y<fheight ;y++)
    {
        Sleep(10); // Do not burn your processor
        total = 0.0;

        // Process entire window for first pixel (including wrap-around edge)
        for (int kx = 0; kx <= w; ++kx)
            if (kx >= 0 && kx < fwidth)
                total += floatdata[y*fwidth + kx];
        // Wrap
        for (int kx = (fwidth-w); kx < fwidth; ++kx)
            if (kx >= 0 && kx < fwidth)
                total += floatdata[y*fwidth + kx];

        // Store first window
        f_temp[y*fwidth] = (total / (w*2+1));

        for(int x=1;x<fwidth ;x++) // x width changes with y
        {
            // Subtract pixel leaving window
            if (x-w-1 >= 0)
                total -= floatdata[y*fwidth + x-w-1];
            else
                total -= floatdata[y*fwidth + x-w-1+fwidth]; // leaving pixel is on the wrapped side

            // Add pixel entering window
            if (x+w < fwidth)
                total += floatdata[y*fwidth + x+w];
            else
                total += floatdata[y*fwidth + x+w-fwidth];

            // Store average
            apos = y * fwidth + x;
            f_temp[apos] = (total / (w*2+1));
        }
    }

    // Vertical
    for(int x=0;x<fwidth ;x++)
    {
        Sleep(10); // Do not burn your processor
        total = 0.0;

        // Process entire window for first pixel
        for (int ky = 0; ky <= h; ++ky)
            if (ky >= 0 && ky < fheight)
                total += f_temp[ky*fwidth + x];
        // Wrap
        for (int ky = fheight-h; ky < fheight; ++ky)
            if (ky >= 0 && ky < fheight)
                total += f_temp[ky*fwidth + x];

        // Store first if not out of bounds
        dest_data[spos + x] = (total / (h*2+1));

        for(int y=1;y< fheight ;y++) // y width changes with x
        {
            // Subtract pixel leaving window
            if (y-h-1 >= 0)
                total -= f_temp[(y-h-1)*fwidth + x];
            else
                total -= f_temp[(y-h-1+fheight)*fwidth + x]; // leaving pixel is on the wrapped side

            // Add pixel entering window
            if (y+h < fheight)
                total += f_temp[(y+h)*fwidth + x];
            else
                total += f_temp[(y+h-fheight)*fwidth + x];

            // Store average
            apos = y * fwidth + x;
            dest_data[spos+apos] = (total / (h*2+1));
        }
    }

    delete[] f_temp;
}
What I need are similar functions that, for each pixel, find the average (blur) of pixels from shapes other than rectangular.
The specific shapes are: "S" (sharp edges), "O" (rectangular but hollow), "+" and "X", where the average float is stored at the center pixel in the destination data array. The size of the blur shape should be variable, in both width and height.
The functions do not need to be pixel-perfect, only optimized for performance. There could be separate functions for each shape.
I would also be happy if anyone can give me tips on how to optimize the example function above for rectangular blurring.
What you are trying to implement are various sorts of digital filters for image processing. This is equivalent to convolving two signals, where the second one is the filter's impulse response. So far, you have recognized that a "rectangular average" is separable. By separable I mean that you can split the filter into two parts: one that operates along the X axis and one that operates along the Y axis, in each case a 1D filter. This is nice and can save you lots of cycles. But not every filter is separable. Averaging along other shapes (S, O, +, X) is not separable. You need to actually compute a 2D convolution for these.
As for performance, you can speed up your 1D averages by properly implementing a "moving average". A proper "moving average" implementation requires only a small, fixed amount of work per pixel, regardless of the size of the averaging "window". This works because neighbouring pixels of the target image are averages of almost the same source pixels. You can reuse the sum for the neighbouring target pixel by adding one new pixel intensity and subtracting an old one (for the 1D case).
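A minimal 1D sketch of such a moving average with wrap-around, in plain C++. It assumes the window [x - w, x + w] is no wider than the row, and the function name movingAverageWrap is made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Wrap-around moving average of one row; the window is [x - w, x + w].
std::vector<float> movingAverageWrap(const std::vector<float>& row, int w) {
    const int n = static_cast<int>(row.size());
    assert(n > 0 && w >= 0 && 2 * w + 1 <= n);
    const float window = static_cast<float>(2 * w + 1);
    std::vector<float> out(row.size());

    // Sum of the first window, centred on x = 0 (wraps to the end of the row).
    float total = 0.0f;
    for (int k = -w; k <= w; ++k) total += row[(k + n) % n];
    out[0] = total / window;

    // Every other window: one subtraction and one addition per pixel.
    for (int x = 1; x < n; ++x) {
        total -= row[(x - w - 1 + n) % n];  // pixel leaving the window
        total += row[(x + w) % n];          // pixel entering the window
        out[x] = total / window;
    }
    return out;
}
```

Running this over every row and then over every column of the intermediate result gives the separable box blur. The modulo operations could be removed by padding the row, at the cost of extra memory.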
In case of arbitrary non-separable filters your best bet performance-wise is "fast convolution" which is FFT-based. Checkout If I recall correctly, there is even a chapter on how to properly do "fast convolution" using the FFT algorithm. Although, they explain it for 1-dimensional signals, it also applies to 2-dimensional signals. For images you have to perform 2D-FFT/iFFT transforms.
To add to sellibitze's answer, you can use a summed area table for your O, S and + kernels (not for the X one though). That way you can convolve a pixel in constant time, and it's probably the fastest method to do it for kernel shapes that allow it.
Basically, a SAT is a data structure that lets you calculate the sum of any axis-aligned rectangle. For the O kernel, after you've built a SAT, you'd take the sum of the outer rect's pixels and subtract the sum of the inner rect's pixels. The S and + kernels can be implemented similarly.
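Here is a small sketch of a summed area table and the "O" kernel sum in plain C++. It does not handle the wrap-around edges the question asks for, the rectangles are half-open ([x0, x1) x [y0, y1)), and the class and method names are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Summed area table with one extra row and column of zeros:
// entry (x, y) holds the sum of all pixels in [0, x) x [0, y).
class SummedAreaTable {
public:
    SummedAreaTable(const std::vector<float>& img, int width, int height)
        : w_(width), h_(height),
          sat_(static_cast<std::size_t>(width + 1) * (height + 1), 0.0) {
        assert(width > 0 && height > 0);
        assert(img.size() == static_cast<std::size_t>(width) * height);
        for (int y = 0; y < height; ++y)
            for (int x = 0; x < width; ++x)
                sat_[idx(x + 1, y + 1)] = img[y * width + x]
                    + sat_[idx(x, y + 1)] + sat_[idx(x + 1, y)] - sat_[idx(x, y)];
    }

    // Sum over the rectangle [x0, x1) x [y0, y1): four lookups, constant time.
    double rectSum(int x0, int y0, int x1, int y1) const {
        assert(0 <= x0 && x0 <= x1 && x1 <= w_);
        assert(0 <= y0 && y0 <= y1 && y1 <= h_);
        return sat_[idx(x1, y1)] - sat_[idx(x0, y1)]
             - sat_[idx(x1, y0)] + sat_[idx(x0, y0)];
    }

    // "O" kernel: the outer rectangle minus the inner rectangle.
    double ringSum(int x0, int y0, int x1, int y1, int border) const {
        assert(border >= 0 && 2 * border <= x1 - x0 && 2 * border <= y1 - y0);
        return rectSum(x0, y0, x1, y1)
             - rectSum(x0 + border, y0 + border, x1 - border, y1 - border);
    }

private:
    std::size_t idx(int x, int y) const {
        return static_cast<std::size_t>(y) * (w_ + 1) + x;
    }
    int w_, h_;
    std::vector<double> sat_;
};
```

To turn a ring sum into an average, divide it by the number of pixels in the ring, i.e. the outer area minus the inner area. The table is stored in double because the running sums grow large and float would lose precision on bigger images.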
For the X kernel you can use a different approach. A skewed box filter is separable:
You can convolve with two long, thin skewed box filters, then add the two resulting images together. The center of the X will be counted twice, so you will need to convolve with another skewed box filter and subtract that.
Apart from that, you can optimize your box blur in many ways.
Remove the two ifs from the inner loop by splitting that loop into three loops - two short loops that do checks, and one long loop that doesn't. Or you could pad your array with extra elements from all directions - that way you can simplify your code.
Calculate values like h * 2 + 1 outside the loops.
An expression like f_temp[ky*fwidth + x] does two adds and one multiplication. You can initialize a pointer to &f_temp[ky*fwidth] outside the loop, and just increment that pointer in the loop.
Don't do the division by h * 2 + 1 in the horizontal step. Instead, divide by the square of that in the vertical step.