I had a problem writing the code for an adaptive median filter.
What is the best way to compute the minimum, maximum, and median pixel intensity?
So far I read every pixel value of the image:
for (int y = 0; y < h; y++)
{
uchar *ptr = (uchar*)(img->imageData + y * step);
for (int x = 0; x < w; x++){
printf("%u, ", ptr[x]);
}
printf("\n");
}
For the maxima and minima in a rectangular window, I would look at van Herk's dilation algorithm: grayscale dilation corresponds to the maximum operator and grayscale erosion to the minimum operator, and a rectangular structuring element can be decomposed into a vertical and a horizontal line.
For median filtering I would look at moving-histogram techniques.
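As a rough illustration of the moving-histogram idea, here is a minimal 1-D sketch for 8-bit data. The function names `medianFromHistogram` and `runningMedian` are made up for this example, the window is simply truncated at the borders, and for an even number of samples the upper median is returned; a 2-D filter would apply the same add/remove update to whole columns of the window.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Median of the samples counted in a 256-bin histogram (upper median if count is even).
static std::uint8_t medianFromHistogram(const int (&hist)[256], int count) {
    int seen = 0;
    for (int v = 0; v < 256; ++v) {
        seen += hist[v];
        if (2 * seen > count) return static_cast<std::uint8_t>(v);
    }
    return 255;
}

// Running median over a window of radius r (size 2*r+1), truncated at the row borders.
std::vector<std::uint8_t> runningMedian(const std::vector<std::uint8_t>& row, int r) {
    assert(r >= 0);
    const int n = static_cast<int>(row.size());
    std::vector<std::uint8_t> out(row.size());
    int hist[256] = {0};
    int count = 0;
    // Prime the histogram with the part of the first window that lies inside the row.
    for (int k = 0; k <= r && k < n; ++k) { ++hist[row[k]]; ++count; }
    for (int x = 0; x < n; ++x) {
        out[x] = medianFromHistogram(hist, count);
        // Slide by one: add the sample entering on the right, drop the one leaving on the left.
        const int enter = x + r + 1;
        const int leave = x - r;
        if (enter < n) { ++hist[row[enter]]; ++count; }
        if (leave >= 0) { --hist[row[leave]]; --count; }
    }
    return out;
}
```

Each step touches only two histogram bins plus a scan of at most 256 bins, so the cost per pixel does not grow with the window size.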
For the min/max pixel, record the value of the first pixel and then compare every other pixel to it, storing the new value whenever it is lower/higher respectively. OpenCV provides cv::minMaxLoc to make this easy.
For the median you'll need to sort all the pixels and select the middle one (once they are sorted, finding the min/max is trivial, as they sit at either end of the list). This is trickier; how far have you got, and what is not working?
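A minimal sketch of the sort-based approach for a single window, assuming a row-major 8-bit image stored in a `std::vector`. The struct `WindowStats` and the function `windowStats` are hypothetical names for this example, and the window is clipped at the image borders.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct WindowStats { std::uint8_t min, med, max; };

// Min, median and max of the (2r+1)x(2r+1) window centred on (cx, cy),
// clipped to the image. img is row-major with w columns and h rows.
WindowStats windowStats(const std::vector<std::uint8_t>& img, int w, int h,
                        int cx, int cy, int r) {
    assert(w > 0 && h > 0 && static_cast<int>(img.size()) == w * h);
    assert(cx >= 0 && cx < w && cy >= 0 && cy < h && r >= 0);
    std::vector<std::uint8_t> vals;
    for (int y = std::max(0, cy - r); y <= std::min(h - 1, cy + r); ++y)
        for (int x = std::max(0, cx - r); x <= std::min(w - 1, cx + r); ++x)
            vals.push_back(img[y * w + x]);
    // After sorting, min and max sit at the ends and the median in the middle.
    std::sort(vals.begin(), vals.end());
    return { vals.front(), vals[vals.size() / 2], vals.back() };
}
```

An adaptive median filter would call this with a growing `r` until the median lies strictly between the minimum and maximum.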
Assume that I have two matrices, image and filter, of size MxM and NxN respectively.
My regular convolution looks like this and produces an output matrix of size (M-N+1)x(M-N+1). Basically, it places the top-left corner of the filter on a pixel, computes the weighted sum, then assigns that sum to the pixel:
for (int i=0; i<=M-N; i++) // M-N+1 output rows
for (int j=0; j<=M-N; j++) // M-N+1 output columns
{
float sum = 0;
for (int u=0; u<N; u++)
for (int v=0; v<N; v++)
sum += image[i+u][j+v] * filter[u][v];
output[i][j] = sum;
}
Next, to perform FFT:
Apply zero-padding to both image and filter on the right and bottom (that is, add zero columns to the right and zero rows to the bottom). Now both have size (M+N)x(M+N); the original image occupies
image[0..M-1][0..M-1].
(Do the same for both matrices.) Calculate the FFT of each row into a new matrix, then calculate the FFT of each column of that new matrix.
Now I have 2 matrices, imageFreq and filterFreq, both of size (M+N)x(M+N), which are the FFT-ed forms of the image and the filter.
But how can I get the convolution values that I need (as described in the sample code) from them?
Convolution between A and B using the FFT is done by per-element multiplication in the frequency domain, so in 1D it looks something like this:
convert A,B by FFT
assuming the sizes of A[N],B[M] are N,M, first zero-pad both to a common size Q, which is a power of 2 and at least M+N, and then apply the FFT:
Q = exp2(ceil(log2(M+N)));
zeropad(A,Q);
zeropad(B,Q);
a = FFT(A);
b = FFT(B);
convolve
in the frequency domain, just use element-wise multiplication:
for (i=0;i<Q;i++) a[i]*=b[i];
reconstruct result
simply apply IFFT (inverse of FFT)...
AB = IFFT(a); // crop to first N (real) elements
and use only the first N elements (unless the algorithm you use needs more; it depends on what you are doing...)
For 2D you can either convolve directly in 2D (using 2 nested for loops) or convolve each axis separately. Beware that separating the axes also requires normalizing the result by some constant (which depends on dimensionality, resolution and kernel used)
So when put together (also assuming the same resolution NxN and MxM) first zero pad to (QxQ) and then:
Q = exp2(ceil(log2(M+N)));
zeropad(A,Q,Q);
zeropad(B,Q,Q);
a = FFT(A);
b = FFT(B);
for (i=0;i<Q;i++)
for (j=0;j<Q;j++) a[i][j]*=b[i][j];
AB = IFFT(a); // crop to first NxN (real) elements
And again crop AB to NxN size (unless ...). For more info see:
How to compute Discrete Fourier Transform?
and all sublinks there... Also, at the end of this one there is a 1D convolution example using NTT (a special form of FFT) to compute bignum multiplication:
Modular arithmetics and NTT (finite field DFT) optimizations
Also, if you want a real result, just use the real parts of the result (ignore the imaginary parts).
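To make the 1D recipe concrete, here is a minimal self-contained sketch. It uses a simple recursive radix-2 FFT written for this example (not any particular library), the names `fft` and `fftConvolve` are made up, and it returns the full linear convolution of length N+M-1, leaving any cropping to the caller. The inverse transform is left unnormalised, so the result is divided by Q at the end.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// In-place recursive radix-2 FFT; a.size() must be a power of two.
// invert=true computes the inverse transform without the 1/n factor.
void fft(std::vector<cd>& a, bool invert) {
    const std::size_t n = a.size();
    if (n <= 1) return;
    std::vector<cd> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
    }
    fft(even, invert);
    fft(odd, invert);
    const double pi = std::acos(-1.0);
    const double ang = (invert ? 2.0 : -2.0) * pi / static_cast<double>(n);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const cd t = std::polar(1.0, ang * static_cast<double>(k)) * odd[k];
        a[k] = even[k] + t;
        a[k + n / 2] = even[k] - t;
    }
}

// Linear convolution of A and B via zero-padding, FFT, multiply, inverse FFT.
std::vector<double> fftConvolve(const std::vector<double>& A,
                                const std::vector<double>& B) {
    assert(!A.empty() && !B.empty());
    const std::size_t need = A.size() + B.size() - 1;
    std::size_t Q = 1;
    while (Q < need) Q <<= 1;                 // next power of two
    std::vector<cd> a(Q), b(Q);               // value-initialised to zero
    for (std::size_t i = 0; i < A.size(); ++i) a[i] = A[i];
    for (std::size_t i = 0; i < B.size(); ++i) b[i] = B[i];
    fft(a, false);
    fft(b, false);
    for (std::size_t i = 0; i < Q; ++i) a[i] *= b[i];
    fft(a, true);
    std::vector<double> out(need);
    for (std::size_t i = 0; i < need; ++i)
        out[i] = a[i].real() / static_cast<double>(Q);  // normalise, keep real part
    return out;
}
```

For example, convolving {1, 2, 3} with {1, 1} this way should reproduce the direct result {1, 3, 5, 3} up to rounding error.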
The problem is I can't fully understand the principles of convolution in frequency domain.
I have an image of size 256x256, which I want to convolve with a 3x3 Gaussian matrix. Its coefficients are (1/16, 1/8, 1/4):
PlainImage<float> FourierRunner::getGaussMask(int sz)
{
PlainImage<float> G(3,3);
*G.at(0, 0) = 1.0/16; *G.at(0, 1) = 1.0/8; *G.at(0, 2) = 1.0/16;
*G.at(1, 0) = 1.0/8; *G.at(1, 1) = 1.0/4; *G.at(1, 2) = 1.0/8;
*G.at(2, 0) = 1.0/16; *G.at(2, 1) = 1.0/8; *G.at(2, 2) = 1.0/16;
return G;
}
To get the FFT of both the image and the filter kernel, I zero-pad them. sz_common stands for the extended size. The image and the kernel are moved to the center of the h and g ComplexImages respectively, so they are zero-padded on the right, left, bottom and top.
I've read that the size should be sz_common >= sz+gsz-1 because of the circular convolution property: otherwise the filter can alter image values at the boundaries.
But it doesn't work: I get adequate results only when sz_common = sz. When sz_common = sz+gsz-1 or sz_common = 2*sz, after the IFFT I get a convolved image that is 2-3 times smaller! Why?
Also, I'm confused about why the filter matrix values should be multiplied by 256, like pixel values: other questions on SO contain Matlab code without such normalization. As in the previous case, it works badly without this multiplication: I get a black image. Why?
// fft_in is shifted fourier image with center in [sz/2;sz/2]
void FourierRunner::convolveImage(ComplexImage& fft_in)
{
int sz = 256; // equal to fft_in.width()
// Get original complex image (backward fft_in)
ComplexImage original_complex = fft_in;
fft2d_backward(fft_in, original_complex);
int gsz = 3;
PlainImage<float> filter = getGaussMask(gsz);
ComplexImage filter_complex = ComplexImage::fromFloat(filter);
int sz_common = pow2ceil(sz); // should be sz+gsz-1 ???
ComplexImage h = ComplexImage::zeros(sz_common,sz_common);
ComplexImage g = ComplexImage::zeros(sz_common,sz_common);
copyImageToCenter(h, original_complex);
copyImageToCenter(g, filter_complex);
LOOP_2D(sz_common, sz_common) g.setPoint(x, y, g.at(x, y)*256);
fft2d_forward(g, g);
fft2d_forward(h, h);
fft2d_fft_shift(g);
// CONVOLVE
LOOP_2D(sz_common,sz_common) h.setPoint(x, y, h.at(x, y)*g.at(x, y));
copyImageToCenter(fft_in, h);
fft2d_backward(fft_in, fft_in);
fft2d_fft_shift(fft_in);
// TEST DIFFERENCE BTW DOMAINS
PlainImage<float> frequency_res(sz,sz);
writeComplexToPlainImage(fft_in, frequency_res);
fft2d_forward(fft_in, fft_in);
}
I tried to zero-pad the image on the right and bottom, so that the smaller image is copied to the start of the bigger one, but that doesn't work either.
I wrote the convolution in the spatial domain to compare results; the frequency-domain blur is almost the same as the spatial-domain one (the average error between pixels is 5), but only when sz_common = sz.
So, could you explain how zero-padding and normalization behave in this case? Thanks in advance.
Convolution in the spatial domain is equivalent to multiplication in the Fourier domain.
This is true for continuous functions which are defined everywhere.
Yet in practice, we have discrete signals and convolution kernels,
which require more careful handling.
If you have an image of size M x N and a kernel of size MM x NN, then applying the DFT (the FFT is an efficient way to calculate the DFT) to them gives functions of size M x N and MM x NN respectively.
Moreover, the theorem above about the multiplication equivalence requires multiplying matching frequencies with each other.
Since in practice the kernel is much smaller than the image, it is usually zero-padded to the size of the image.
Now, by applying the DFT you'll get two matrices of the same M x N size and will be able to multiply them.
Yet this is equivalent to the circular convolution between the image and the kernel.
To apply linear convolution you should make them both of size (M + MM - 1) x (N + NN - 1).
Usually this is done by applying a "Replicate" boundary condition to the image and zero-padding the kernel.
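The difference between circular and linear convolution can be seen without any FFT at all. The sketch below, with the made-up name `circularConvolve`, computes directly what multiplying two length-P DFTs and inverting would give: indices wrap modulo P. With P equal to the signal length the tail wraps around onto the start; with P = M + MM - 1 nothing wraps and the result is the linear convolution.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Circular convolution of x and k, both treated as zero-padded to period P.
// This is the spatial-domain equivalent of multiplying their length-P DFTs.
std::vector<double> circularConvolve(const std::vector<double>& x,
                                     const std::vector<double>& k,
                                     std::size_t P) {
    assert(P >= x.size() && P >= k.size());
    std::vector<double> out(P, 0.0);
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t j = 0; j < k.size(); ++j)
            out[(i + j) % P] += x[i] * k[j];   // index wraps around at P
    return out;
}
```

With x = {1, 2, 3, 4} and k = {1, 1, 1}, P = 4 gives {8, 7, 6, 9} (the last two linear outputs, 7 and 4, are folded onto the first two), while P = 6 gives the linear result {1, 3, 6, 9, 7, 4}.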
I'm trying to program a simulation. Originally I'd randomly create points like so...
for (int c = 0; c < number; c++){
for(int d = 0; d < 3; d++){
coordinate[c][d] = randomrange(low, high);
}
}
Where randomrange() is an arbitrary range randomizer, number is the amount of created points, and d represents the x,y,z coordinate. It works, however I want to take things further. How would I define a known shape? Say I want 80 points on a circle's circumference, or 500 that form the edges of a cube. I can explain well on paper, but have a problem describing the process as coding. This doesn't pertain to the question, but I end up taking the points to txt file and then use matlab, scatter3 to plot the points. Creating the "shape" points is my issue.
Both a circle and the set of a cube's edges are 1-dimensional sets, so you can represent them as real intervals. For a circle it's straightforward: use the interval (0, 2pi) and transform a random value phi from that interval into a point:
xcentre + R cos(phi), ycentre + R sin(phi)
For a cube you have 12 segments, so use interval (0, 12) and split a random number from the interval into an integer part and a fraction. Then use the integer as an edge number and the fraction as a position inside the edge.
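A minimal sketch of both mappings, assuming an axis-aligned cube with one corner at the origin and a circle lying in the z = 0 plane. The names `Point3`, `pointsOnCircle` and `pointsOnCubeEdges` and the edge-numbering scheme are choices made for this example only.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

using Point3 = std::array<double, 3>;

// count random points on a circle of radius R centred at (xc, yc), in the plane z = 0.
std::vector<Point3> pointsOnCircle(int count, double xc, double yc, double R,
                                   std::mt19937& rng) {
    assert(count >= 0 && R >= 0.0);
    const double twoPi = 2.0 * std::acos(-1.0);
    std::uniform_real_distribution<double> angle(0.0, twoPi);
    std::vector<Point3> pts;
    for (int i = 0; i < count; ++i) {
        const double phi = angle(rng);
        Point3 p = {xc + R * std::cos(phi), yc + R * std::sin(phi), 0.0};
        pts.push_back(p);
    }
    return pts;
}

// count random points on the 12 edges of the cube [0, side]^3.
std::vector<Point3> pointsOnCubeEdges(int count, double side, std::mt19937& rng) {
    assert(count >= 0 && side > 0.0);
    std::uniform_real_distribution<double> u(0.0, 12.0);
    std::vector<Point3> pts;
    for (int i = 0; i < count; ++i) {
        const double t = u(rng);
        int edge = static_cast<int>(t);        // integer part: which edge
        if (edge > 11) edge = 11;              // guard against t == 12 from rounding
        const double frac = t - edge;          // fraction: position along the edge
        const int axis = edge / 4;             // the edge runs along this axis
        const int corner = edge % 4;           // which of the 4 parallel edges
        Point3 p = {0.0, 0.0, 0.0};
        p[axis] = frac * side;
        p[(axis + 1) % 3] = (corner & 1) ? side : 0.0;
        p[(axis + 2) % 3] = (corner & 2) ? side : 0.0;
        pts.push_back(p);
    }
    return pts;
}
```

Calling `pointsOnCircle(80, ...)` and `pointsOnCubeEdges(500, ...)` with a seeded `std::mt19937` yields coordinates that can be written to a text file in the same way as the original random points.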
Easy variant:
First work out the min/max x/y values (separately, to reduce the number of rejected samples in the step below), generate some coordinates within this range, and then check whether they fulfil e.g. a^2+b^2=r^2 (circle), within a small tolerance, since random floating-point coordinates will almost never satisfy the equality exactly.
If not, try again.
Better, but only possible for certain shapes:
Generate a radius (between 0 and max) and an angle (between 0 and 360 degrees),
or just an angle if the point should lie on the circle border,
and use some math (sin/cos...) to transform them into x and y.
http://en.wikipedia.org/wiki/Polar_coordinate_system
I am writing my own implementation of Sobel edge detection. My function's interface is
void sobel_filter(volatile PIXEL * pixel_in, FLAG *EOL, volatile PIXEL * pixel_out, int rows, int cols)
(PIXEL being an 8bit greyscale pixel)
For testing I changed the interface to:
void sobel_filter(PIXEL pixels_in[MAX_HEIGHT][MAX_WIDTH],PIXEL
pixels_out[MAX_HEIGHT][MAX_WIDTH], int rows,int cols);
But still, the thing is that I get to read one pixel at a time, which brings me to the problem of managing the Sobel output values when they are bigger than 255 or smaller than 0. If I had the whole picture from the start, I could normalize all Sobel output values with their min and max values. But this is not possible for me.
This is my sobel operator code, ver1:
PIXEL sobel_op(PIXEL_CH window[KERNEL_SIZE][KERNEL_SIZE]){
const char x_op[KERNEL_SIZE][KERNEL_SIZE] = { {-1,0,1},
{-2,0,2},
{-1,0,1}};
const char y_op[KERNEL_SIZE][KERNEL_SIZE] = { {1,2,1},
{0,0,0},
{-1,-2,-1}};
short x_weight=0;
short y_weight=0;
PIXEL ans;
for (short i=0; i<KERNEL_SIZE; i++){
for(short j=0; j<KERNEL_SIZE; j++){
x_weight+=window[i][j]*x_op[i][j];
y_weight+=window[i][j]*y_op[i][j];
}
}
short val=ABS(x_weight)+ABS(y_weight);
//make sure the pixel value is between 0 and 255 and add thresholds
if(val>200)
val=255;
else if(val<100)
val=0;
ans=255-(unsigned char)(val);
return ans;
}
this is ver 2, changes are made only after summing up the weights:
short val=ABS(x_weight)+ABS(y_weight);
unsigned char char_val=(255-(unsigned char)(val));
//make sure the pixel value is between 0 and 255 and add thresholds
if(char_val>200)
char_val=255;
else if(char_val<100)
char_val=0;
ans=char_val;
return ans;
Now, for a 3x3 sobel both seem to be giving OK results:
But when I try with a 5x5 sobel
const char x_op[KERNEL_SIZE][KERNEL_SIZE] = { {1,2,0,-2,-1},
{4,8,0,-8,-4},
{6,12,0,-12,-6},
{4,8,0,-8,-4},
{1,2,0,-2,-1}};
const char y_op[KERNEL_SIZE][KERNEL_SIZE] = { {-1,-4,-6,-4,-1},
{-2,-8,-12,-8,-2},
{0,0,0,0,0},
{2,8,12,8,2},
{1,4,6,4,1}};
it gets tricky:
As you can see, for the 5x5 the results are quite bad and I don't know how to normalize the values. Any ideas?
Think about the range of values that your filtered values can take.
For the Sobel 3x3, the highest X/Y value is obtained when the pixels with a positive coefficient are white (255), and the ones with a negative coefficient are black (0), which gives a total of 1020. Symmetrically, the lowest value is -1020. After taking the absolute value, the range is from 0 to 1020 = 4 x 255.
For the magnitude, Abs(X)+Abs(Y), the computation is a little more complicated as the two components cannot reach 1020 at the same time. If I am right, the range is from 0 to 1530 = 6 x 255.
Similar figures for the 5x5 are 48 x 255 and 66 x 255.
Knowing that, you should rescale the values to a smaller range (apply a reduction coefficient), and adjust the thresholds. Logically, if you apply a coefficient 3/66 to the Sobel 5x5, you will return to similar conditions.
It all depends on the effect that you want to achieve.
Anyway, the true question is: how are the filtered values statistically distributed for typical images ? Because it is unnecessary to keep the far tails of the distribution.
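A minimal sketch of the rescaling step, assuming 8-bit input and a kernel whose coefficients sum to zero. The helper names `maxResponse` and `scaleMagnitude` are made up for this example; the maximum magnitude is passed in as a parameter, so the bounds quoted above (6 x 255 for the 3x3, 66 x 255 for the 5x5) can be plugged in as they are.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Largest possible |response| of one kernel on 8-bit input:
// positive coefficients see 255, negative coefficients see 0.
template <std::size_t K>
int maxResponse(const int (&kernel)[K][K]) {
    int positiveSum = 0;
    for (std::size_t i = 0; i < K; ++i)
        for (std::size_t j = 0; j < K; ++j)
            if (kernel[i][j] > 0) positiveSum += kernel[i][j];
    return 255 * positiveSum;
}

// Map |gx| + |gy| from the range 0..maxMag to 0..255, clamping anything larger.
unsigned char scaleMagnitude(int gx, int gy, int maxMag) {
    assert(maxMag > 0);
    const long mag = static_cast<long>(std::abs(gx)) + std::abs(gy);
    const long scaled = mag * 255 / maxMag;
    return static_cast<unsigned char>(scaled > 255 ? 255 : scaled);
}
```

Because the scaling needs only the kernel and not the image, it works in a streaming setting where pixels arrive one at a time. The thresholds (100 and 200 in the question) can then stay the same for both kernel sizes, or `maxMag` can be lowered if the tails of the distribution are not of interest.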
You have to normalize the results of your computation. For that you have to find out how "big" the filter is, by summing the absolute values of all its coefficients. I do it like this:
int size = 0;
for(int i = 0; i < mask.length; i++)
for(int j = 0; j < mask[i].length; j++)
size += Math.abs(mask[i][j]);
Here mask is my Sobel filter of whatever size. After applying your Sobel filter you then have to normalize the values; in your code it should look like this:
for (short i=0; i<KERNEL_SIZE; i++){
for(short j=0; j<KERNEL_SIZE; j++){
x_weight+=window[i][j]*x_op[i][j];
y_weight+=window[i][j]*y_op[i][j];
}
}
x_weight /= size;
y_weight /= size;
After that, for visualization, you have to shift the values by 128. Only do that if you want to visualize the image; otherwise you will get problems with later calculations (the gradient, for example).
x_weight += 128;
y_weight += 128;
Hope it works and helps.
I wanted to detect an ellipse in an image. Since I was learning Mathematica at that time, I asked a question here and got a satisfactory result from the answer below, which used the RANSAC algorithm to detect the ellipse.
However, I recently needed to port it to OpenCV, and some of the functions exist only in Mathematica. One of the key functions is "GradientOrientationFilter".
Since a general ellipse has five parameters, I need to sample five points to determine one. However, the more points I have to sample, the lower the chance of a good guess, which lowers the success rate of the ellipse detection. The Mathematica answer therefore adds another condition: the gradient of the image must be parallel to the gradient of the ellipse equation. With the Mathematica approach, only three points are then needed to determine one ellipse using least squares. The result is quite good.
However, when I try to find the image gradient using the Sobel or Scharr operator in OpenCV, it is not accurate enough, which always leads to a bad result.
How can I calculate the gradient or the tangent of an image accurately? Thanks!
Result with gradient, three points
Result without gradient, five points
----------updated----------
I did some edge detection and median blurring beforehand and drew the result on the edge image. My original test image is like this:
In general, my final goal is to detect the ellipse in a scene or on an object. Something like this:
That's why I chose to use RANSAC to fit the ellipse from edge points.
As for your final goal, you may try
findContours and fitEllipse in OpenCV.
The pseudocode is:
1) some image processing
2) find all contours
3) fit each contour with fitEllipse
Here is part of the code I used before:
[... image process ....you get a bwimage ]
vector<vector<Point> > contours;
findContours(bwimage, contours, CV_RETR_LIST, CV_CHAIN_APPROX_NONE);
for(size_t i = 0; i < contours.size(); i++)
{
size_t count = contours[i].size();
Mat pointsf;
Mat(contours[i]).convertTo(pointsf, CV_32F);
RotatedRect box = fitEllipse(pointsf);
/* You can put some limitation about size and aspect ratio here */
if( box.size.width > 20 &&
box.size.height > 20 &&
box.size.width < 80 &&
box.size.height < 80 )
{
if( MAX(box.size.width, box.size.height) > MIN(box.size.width, box.size.height)*30 )
continue;
//drawContours(SrcImage, contours, (int)i, Scalar::all(255), 1, 8);
ellipse(SrcImage, box, Scalar(0,0,255), 1, CV_AA);
ellipse(SrcImage, box.center, box.size*0.5f, box.angle, 0, 360, Scalar(200,255,255), 1, CV_AA);
}
}
imshow("result", SrcImage);
If you focus on ellipses only (no other shape), you can treat the pixel values of the ellipse as the masses of points.
Then you can calculate the moments of inertia Ixx, Iyy, Ixy to find the angle theta that rotates a general ellipse back to the canonical form (X-Xc)^2/a^2 + (Y-Yc)^2/b^2 = 1.
Then you can find Xc and Yc from the center of mass.
Then you can find a and b from the min X and min Y (their distances from the center).
--------------- update -----------
This method can be applied to a filled ellipse too.
More than one ellipse in a single image will fail unless you segment them first.
Let me explain more.
I will use C to represent cos(theta) and S to represent sin(theta), and Ixx = Sum(m*x*x), Iyy = Sum(m*y*y), Ixy = Sum(m*x*y).
After rotation to canonical form, the new coordinates are [eq0] X=xC-yS and Y=xS+yC, where x and y are the original positions.
The correct rotation gives you the minimum IYY.
[eq1]
IYY = Sum(m*Y*Y) = Sum{m*(xS+yC)(xS+yC)} = Sum{m*(xxSS+yyCC+2xySC)} = Ixx*S^2 + Iyy*C^2 + 2*Ixy*S*C
For min IYY, d(IYY)/d(theta) = 0, that is
2IxxSC - 2IyySC + 2Ixy(CC-SS) = 0
(Ixx-Iyy)/Ixy = (SS-CC)/SC = S/C - C/S = Z - 1/Z, where Z = S/C = tan(theta)
While programming, the LHS is just a number, let's say N:
Z^2 - NZ - 1 = 0
So there are two roots for Z, and hence for theta, let's say Z1 and Z2; one minimizes IYY and the other maximizes it.
----------- pseudo code --------
Compute Ixx, Iyy, Ixy for a hollow or filled ellipse.
Compute theta1=atan(Z1) and theta2=atan(Z2).
Put these two thetas into eq1 and find which gives the smaller IYY. That is your theta.
Go back to the non-zero pixels and transform them to the new X and Y using the theta you found.
Find the center of mass Xc, Yc, and the min X and min Y, e.g. with sort().
-------------- by hand -----------
If you need the original equation of the ellipse
Just put [eq0] into the canonical form
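A minimal sketch of the orientation step, assuming unit mass per point and moments taken about the centroid. Instead of solving the quadratic in Z, it uses the equivalent double-angle form of the stationarity condition, (Ixx - Iyy)*sin(2*theta) + 2*Ixy*cos(2*theta) = 0, and then keeps whichever of the two perpendicular candidates gives the smaller Sum(Y*Y). The function name `principalAngle` is made up for this example.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Returns theta such that, with X = x*cos(theta) - y*sin(theta) and
// Y = x*sin(theta) + y*cos(theta) (x, y relative to the centroid),
// Sum(Y*Y) is minimal, i.e. the long axis of the point set ends up along X.
double principalAngle(const std::vector<std::pair<double, double>>& pts) {
    assert(!pts.empty());
    const double n = static_cast<double>(pts.size());
    double xc = 0.0, yc = 0.0;
    for (const auto& p : pts) { xc += p.first; yc += p.second; }
    xc /= n;
    yc /= n;
    double Ixx = 0.0, Iyy = 0.0, Ixy = 0.0;   // second moments, unit mass per point
    for (const auto& p : pts) {
        const double dx = p.first - xc, dy = p.second - yc;
        Ixx += dx * dx;
        Iyy += dy * dy;
        Ixy += dx * dy;
    }
    // Sum(Y*Y) as a function of the rotation angle.
    auto IYY = [&](double t) {
        const double S = std::sin(t), C = std::cos(t);
        return Ixx * S * S + Iyy * C * C + 2.0 * Ixy * S * C;
    };
    // Two stationary points, 90 degrees apart: one minimum and one maximum.
    const double t1 = 0.5 * std::atan2(-2.0 * Ixy, Ixx - Iyy);
    const double t2 = t1 + std::acos(-1.0) / 2.0;
    return IYY(t1) <= IYY(t2) ? t1 : t2;
}
```

For points lying along the line y = x this returns -pi/4, which is the rotation that maps them onto the X axis under [eq0].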
You're using terms in an unusual way.
Normally for images, the term "gradient" is interpreted as if the image is a mathematical function f(x,y). This gives us a (df/dx, df/dy) vector in each point.
Yet you're looking at the image as if it were a function y = f(x), for which the gradient would be df(x)/dx.
Now, if you look at your image, you'll see that the two interpretations are definitely related. Your ellipse is drawn as a set of contrasting pixels, and as a result there are two sharp gradients in the image - the inner and outer. These of course correspond to the two normal vectors, and therefore are in opposite directions.
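To make the first interpretation concrete, here is a minimal sketch that treats the image as a function f(x,y) and estimates (df/dx, df/dy) at one interior pixel with central differences. This is a plain finite-difference estimate chosen for illustration, not the Sobel or Scharr operator; the names `Gradient`, `gradientAt` and `orientation` are made up for this example.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Gradient { double dx, dy; };

// Central-difference gradient at interior pixel (x, y) of a row-major image f
// with w columns and h rows.
Gradient gradientAt(const std::vector<double>& f, int w, int h, int x, int y) {
    assert(w > 2 && h > 2 && static_cast<int>(f.size()) == w * h);
    assert(x > 0 && x < w - 1 && y > 0 && y < h - 1);
    const double dfdx = (f[y * w + (x + 1)] - f[y * w + (x - 1)]) / 2.0;
    const double dfdy = (f[(y + 1) * w + x] - f[(y - 1) * w + x]) / 2.0;
    return {dfdx, dfdy};
}

// Direction of steepest ascent in radians, measured from the +x axis.
double orientation(const Gradient& g) {
    return std::atan2(g.dy, g.dx);
}
```

On a one-pixel-wide binary curve the differences can only take a few discrete values, so the resulting orientations are coarsely quantised.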
Also note that your image has pixels. The gradient is also pixelated. The way your ellipse is drawn, with a single pixel width means that your local gradient takes on only values that are a multiple of 45 degrees:
▄▄ ▄▀ ▌ ▀▄