I was looking for an implementation of Generalized Hough Transform,and then I found this website,which showed me a complete implementation of GHT .
I can totally understand how the algorithm processes except this:
Vec2i referenceP = Vec2i(id_max[0]*rangeXY+(rangeXY+1)/2, id_max[1]*rangeXY+(rangeXY+1)/2);
which calculates the reference point of the object based on the maximum value of the hough space,then mutiplied by rangXY to get back to the corresponding position of origin image.(rangeXY is the dimensions in pixels of the squares in which the image is divided. )
I edited the code to
Vec2i referenceP = Vec2i(id_max[0]*rangeXY, id_max[1]*rangeXY);
and I got another reference point then show all edgePoints in the image,which apparently not fit the shape.
I just cannot figure out what the factor(rangeXY+1)/2means.
Is there anyone who has implemented this code or familiared with the rationale of GHT can tell me what the factor rangeXYmeans? Thanks~
I am familiar with the classic Hough Transform, though not with the generalised one. However, I believe you give enough information in your question for me to answer it without being familiar with the algorithm in question.
(rangeXY+1)/2 is simply integer division by 2 with rounding. For instance (4+1)/2 gives 2 while (5+1)/2 gives 3 (2.5 rounds up). Now, since rangeXY is the side of a square block of pixels and id_max is the position (index) of such a block, then id_max[dim]*rangeXY+(rangeXY+1)/2 gives the position of the central pixel in that block.
On the other hand, when you simplified the expression to id_max[dim]*rangeXY, you were getting the position of the top-left rather than the central pixel.
Related
I am writing a disparity matching algorithm using block matching, but I am not sure how to find the corresponding pixel values in the secondary image.
Given a square window of some size, what techniques exist to find the corresponding pixels? Do I need to use feature matching algorithms or is there a simpler method, such as summing the pixel values and determining whether they are within some threshold, or perhaps converting the pixel values to binary strings where the values are either greater than or less than the center pixel?
I'm going to assume you're talking about Stereo Disparity, in which case you will likely want to use a simple Sum of Absolute Differences (read that wiki article before you continue here). You should also read this tutorial by Chris McCormick before you read more here.
side note: SAD is not the only method, but it's really common and should solve your problem.
You already have the right idea. Make windows, move windows, sum pixels, find minimums. So I'll give you what I think might help:
To start:
If you have color images, first you will want to convert them to black and white. In python you might use a simple function like this per pixel, where x is a pixel that contains RGB.
def rgb_to_bw(x):
return int(x[0]*0.299 + x[1]*0.587 + x[2]*0.114)
You will want this to be black and white to make the SAD easier to computer. If you're wondering why you don't loose significant information from this, you might be interested in learning what a Bayer Filter is. The Bayer Filter, which is typically RGGB, also explains the multiplication ratios of the Red, Green, and Blue portions of the pixel.
Calculating the SAD:
You already mentioned that you have a window of some size, which is exactly what you want to do. Let's say this window is n x n in size. You would also have some window in your left image WL and some window in your right image WR. The idea is to find the pair that has the smallest SAD.
So, for each left window pixel pl at some location in the window (x,y) you would the absolute value of difference of the right window pixel pr also located at (x,y). you would also want some running value, which is the sum of these absolute differences. In sudo code:
SAD = 0
from x = 0 to n:
from y = 0 to n:
SAD = SAD + absolute_value|pl - pr|
After you calculate the SAD for this pair of windows, WL and WR you will want to "slide" WR to a new location and calculate another SAD. You want to find the pair of WL and WR with the smallest SAD - which you can think of as being the most similar windows. In other words, the WL and WR with the smallest SAD are "matched". When you have the minimum SAD for the current WL you will "slide" WL and repeat.
Disparity is calculated by the distance between the matched WL and WR. For visualization, you can scale this distance to be between 0-255 and output that to another image. I posted 3 images below to show you this.
Typical Results:
Left Image:
Right Image:
Calculated Disparity (from the left image):
you can get test images here: http://vision.middlebury.edu/stereo/data/scenes2003/
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
Please explain as to what happens to an image when we use histeq function in MATLAB? A mathematical explanation would be really helpful.
Histogram equalization seeks to flatten your image histogram. Basically, it models the image as a probability density function (or in simpler terms, a histogram where you normalize each entry by the total number of pixels in the image) and tries to ensure that the probability for a pixel to take on a particular intensity is equiprobable (with equal probability).
The premise behind histogram equalization is for images that have poor contrast. Images that look like they're too dark, or if they're too washed out, or if they're too bright are good candidates for you to apply histogram equalization. If you plot the histogram, the spread of the pixels is limited to a very narrow range. By doing histogram equalization, the histogram will thus flatten and give you a better contrast image. The effect of this with the histogram is that it stretches the dynamic range of your histogram.
In terms of the mathematical definition, I won't bore you with the details and I would love to have some LaTeX to do it here, but it isn't supported. As such, I defer you to this link that explains it in more detail: http://www.math.uci.edu/icamp/courses/math77c/demos/hist_eq.pdf
However, the final equation that you get for performing histogram equalization is essentially a 1-to-1 mapping. For each pixel in your image, you extract its intensity, then run it through this function. It then gives you an output intensity to be placed in your output image.
Supposing that p_i is the probability that you would encounter a pixel with intensity i in your image (take the histogram bin count for pixel intensity i and divide by the total number of pixels in your image). Given that you have L intensities in your image, the output intensity at this location given the intensity of i is dictated as:
g_i = floor( (L-1) * sum_{n=0}^{i} p_i )
You add up all of the probabilities from pixel intensity 0, then 1, then 2, all the way up to intensity i. This is familiarly known as the Cumulative Distribution Function.
MATLAB essentially performs histogram equalization using this approach. However, if you want to implement this yourself, it's actually pretty simple. Assume that you have an input image im that is of an unsigned 8-bit integer type.
function [out] = hist_eq(im, L)
if (~exist(L, 'var'))
L = 256;
end
h = imhist(im) / numel(im);
cdf = cumsum(h);
out = (L-1)*cdf(double(im)+1);
out = uint8(out);
This function takes in an image that is assumed to be unsigned 8-bit integer. You can optionally specify the number of levels for the output. Usually, L = 256 for an 8-bit image and so if you omit the second parameter, L would be assumed as such. The first line computes the probabilities. The next line computes the Cumulative Distribution Function (CDF). The next two lines after compute input/output using histogram equalization, and then convert back to unsigned 8-bit integer. Note that the uint8 casting implicitly performs the floor operation for us. You'll need to take note that we have to add an offset of 1 when accessing the CDF. The reason why is because MATLAB starts indexing at 1, while the intensities in your image start at 0.
The MATLAB command histeq pretty much does the same thing, except that if you call histeq(im), it assumes that you have 32 intensities in your image. Therefore, you can override the histeq function by specifying an additional parameter that specifies how many intensity values are seen in the image just like what we did above. As such, you would do histeq(im, 256);. Calling this in MATLAB, and using the function I wrote above should give you identical results.
As a bit of an exercise, let's use an image that is part of the MATLAB distribution called pout.tif. Let's also show its histogram.
im = imread('pout.tif');
figure;
subplot(2,1,1);
imshow(im);
subplot(2,1,2);
imhist(im);
As you can see, the image has poor contrast because most of the intensity values fit in a narrow range. Histogram equalization will flatten the image and thus increase the contrast of the image. As such, try doing this:
out = histeq(im, 256); %//or you can use my function: out = hist_eq(im);
figure;
subplot(2,1,1);
imshow(out);
subplot(2,1,2);
imhist(out);
This is what we get:
As you can see the contrast is better. Darker pixels tend to move towards the darker end, while lighter pixels get pushed towards the lighter end. Successful result I think! Bear in mind that not all images will give you a good result when you try and do histogram equalization. Image processing is mostly a trial and error thing, and so you put a mishmash of different techniques together until you get a good result.
This should hopefully get you started. Good luck!
I have Problem understanding all Parameter of backgroundsubtractormog2.
I looked in the code (located in bfgf_gaussmix2.cpp), but don't see the connection to the mentioned paper. For exmaple is Tb = varThreshold, but what is the name of Tb in the paper?
I am especially interested in the fat marked parameter.
Let's start with the easy parameter [my remarks]:
int nmixtures
Maximum allowed number of mixture components. Actual number is determined dynamically per pixel.
[set 0 for GMG]
uchar nShadowDetection
The value for marking shadow pixels in the output foreground mask. Default value is 127.
float fTau
Shadow threshold. The shadow is detected if the pixel is a darker version of the background. Tau is a threshold defining how much darker the shadow can be. Tau= 0.5 means that if a pixel is more than twice darker then it is not shadow.
Now to the ones i don't understand:
float backgroundRatio
Threshold defining whether the component is significant enough to be included into the background model ( corresponds to TB=1-cf from the paper??which paper??). cf=0.1 => TB=0.9 is default. For alpha=0.001, it means that the mode should exist for approximately 105 frames before it is considered foreground.
float varThresholdGen
Threshold for the squared Mahalanobis distance that helps decide when a sample is close to the existing components (corresponds to Tg). If it is not close to any component, a new component is generated. 3 sigma => Tg=3*3=9 is default. A smaller Tg value generates more components. A higher Tg value may result in a small number of components but they can grow too large. [i don't understand a word of this]
In the Constructor the variable varThreshold is used. Is it the same as varThresholdGen?
Threshold on the squared Mahalanobis distance to decide whether it is well described by the background model (see Cthr??). This parameter does not affect the background update. A typical value could be 4 sigma, that is, varThreshold=4*4=16; (see Tb??).
float fVarInit
Initial variance for the newly generated components. It affects the speed of adaptation. The parameter value is based on your estimate of the typical standard deviation from the images. OpenCV uses 15 as a reasonable value.
float fVarMin
Parameter used to further control the variance.
float fVarMax
Parameter used to further control the variance.
float fCT
Complexity reduction parameter. This parameter defines the number of samples needed to accept to prove the component exists. CT=0.05 is a default value for all the samples. By setting CT=0 you get an algorithm very similar to the standard Stauffer&Grimson algorithm.
Someone asked pretty much the same question on the OpenCV website, but without an answer.
Well, I don't think anyone could tell you which parameter is what if you don't know the details of the algorithm that you are using. Besides, you should not need anyone to tell you which parameter is what if you know the details of the algorithm. I'm telling this for detailed parameters (fCT, fVarMax, etc.) not for straightforward ones (nmixtures, nShadowDetection, etc.).
So, I think you should read the papers referenced in the documentation. Here are the links for the papers 1, 2, 3.
And also you should read this paper as well, which is the beginning of background estimation.
After reading these papers and checking out the code with, I'm sure you will understand what those parameters are.
Good luck!
I am struggling with the interpretation of kinect depth data.
In order to obtain real world distance from kinect, i used the following formula :
if(i<2047){
depthToMeterTable[i] = i * -0.0030711016 + 3.3309495161;
}
else{
depthToMeterTable[i] = 0;
}
This formula gives something pretty good as a distance estimator.
However i do obtain strange output from a 90° wall corner visualisation.
On the following image is two different information. First, the violet lines represent the wall as i SHOULD see it. A 90° corner. The red dots represent the wall seen from the kinect. As you can see, the angle of the two planes is now bigger.
http://img843.imageshack.us/img843/4061/kinectbias.jpg
Do you have any idea where i could correct this bias, and how to do it ?
Thank you for reading,
Al_th
I'm not familiar with that conversion formula (also not sure how your depthToMeterTable gets filled - what formula is used there).
There's a built-in function in libfreenect for that though: freenect_camera_to_world
Before that utility function was added I used Matt Fischer's conversion functions(RawDepthToMeters and DepthToWorld).
HTH
I was studying Perlin's Noise through some examples # http://dindinx.net/OpenGL/index.php?menu=exemples&submenu=shaders and couldn't help to notice that his make3DNoiseTexture() in perlin.c uses noise3(ni) instead of PerlinNoise3D(...)
Now why is that? Isn't Perlin's Noise supposed to be a summation of different noise frequencies and amplitudes?
Qestion 2 is what does ni, inci, incj, inck stand for? Why use ni instead of x,y coordinates? Why is ni incremented with
ni[0]+=inci;
inci = 1.0 / (Noise3DTexSize / frequency);
I see Hugo Elias created his Perlin2D with x,y coordinates, and so does PerlinNoise3D(...).
Thanks in advance :)
I now understand why and am going to answer my own question in hopes that it helps other people.
Perlin's Noise is actually a synthesis of gradient noises. In its production process, we must compute the dot product of a vector pointing from one of the corners flooring the input point to the input point itself with the random-generated gradient vector.
Now if the input point were a whole number, such as the xyz coordinates of a texture you want to create, the dot product would always return 0, which would give you a flat noise. So instead, we use inci, incj, inck as an alternative index. Yep, just an index, nothing else.
Now returning to question 1, there are two methods to implement Perlin's Noise:
1.Calculate the noise values separately and store them in the RGBA slots in the texture
2.Synthesize the noises up before-hand and store them in one of the RGBA slots in the texture
noise3(ni) is the actual implementation of method 1, while PerlinNoise3D(...) suggests the latter.
In my personal opinion, method 1 is much better because you have much more flexibility over how you use each octave in your shaders.
My guess on the reason for using noise3(ni) in make3DNoiseTexture() instead if PerlinNoise3D(...) is that when you use that noise texture in your shader you want to be able to replicate and modify the functionality of PerlinNoise3D(...) directly in the shader.
My guess for the reasoning behind ni, inci, incj, inck is that using x,y,z of the volume directly don't give a good result so by scaling the the noise with the frequency instead it is possible to adjust the resolution of the noise independently from the volume size.