Why do edge detection filters sum to 0 whereas blur filters sum to 1? - computer-vision

I am now learning about filters in computer vision. I can see that the elements of a kernel for edge detection sum to 0, whereas for blurring they sum to 1.
I am wondering, does it have to do with the fact that one is a high-pass filter and the other a low-pass filter? Is there some kind of rule or explanation?
Thanks in advance!

Blur filters must preserve the mean image intensity. This is why their kernels sum to 1. If you look at their frequency response, you’ll see that the zero-frequency component (DC component) is 1. This component is the sum over the kernel. And it being 1 means that the DC component of the image is not modified when applying the convolution. Yes, this is a property of any low-pass filter. Modifying the zero frequency means you don’t let low frequencies pass unaltered.
What you call edge detection filters are really estimators of the derivative. They sum to zero because of the definition of the derivative: the slope at any one point does not depend on how high up that point is. Adding or subtracting a constant from the function (or image) does not change the derivative; the derivatives of I and I+1 are the same. Therefore the derivative filter cannot preserve the mean image intensity: if its kernel summed to some non-zero value s, filtering I+1 would add s to every output, so you'd get a different result for dI/dx and for d(I+1)/dx, which would not make sense.
The Laplace filter (not an edge detector) is a generalized second-order derivative; the same reasoning as above applies.
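To see both properties numerically, here is a small sketch (not from the original answer) that checks the sums of OpenCV's standard blur and derivative kernels:

```cpp
#include <opencv2/imgproc.hpp>
#include <iostream>

int main() {
    // 1D Gaussian blur kernel: its coefficients sum to 1 (DC gain of 1).
    cv::Mat gauss = cv::getGaussianKernel(5, 1.0, CV_64F);
    std::cout << "Gaussian kernel sum: " << cv::sum(gauss)[0] << "\n";  // ~1.0

    // Sobel derivative kernels: the x (derivative) kernel sums to 0 (DC gain of 0).
    cv::Mat kx, ky;
    cv::getDerivKernels(kx, ky, 1, 0, 3, false, CV_64F);
    std::cout << "Sobel x-kernel sum: " << cv::sum(kx)[0] << "\n";      // 0.0
    return 0;
}
```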

Related

Optical Flow: What exactly is the temporal derivative?

I'm trying to understand what the meaning of a temporal derivative is in an image. While I understand the brightness constancy equation, I don't understand why taking the difference between two images gives me the temporal derivative.
Taking the difference between two frames gives me the difference in pixel intensity per pixel between the two, but how is that the same as asking how much the image changed over a certain span of time?
The temporal derivative dI/dt of the image I(x,y,t) is the rate of change of the image over time at a particular position. As you noted, this is the difference in pixel intensity between the two frames. Considering a single pixel at (x,y), the finite difference approximation to the derivative is
f_d = ( I(x,y,t+delta) - I(x,y,t) ) / delta so that f_d -> dI/dt as delta -> 0.
In this case delta is simply set to one. So we are approximating the image derivative (with respect to time) by the difference between adjacent frames.
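As an illustration of that finite difference (not part of the original answer; the file names and frame rate below are placeholders):

```cpp
#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>

int main() {
    // Two consecutive frames (placeholder file names).
    cv::Mat prev = cv::imread("frame0.png", cv::IMREAD_GRAYSCALE);
    cv::Mat next = cv::imread("frame1.png", cv::IMREAD_GRAYSCALE);

    // Signed difference: convert to float so negative changes survive.
    cv::Mat dIdt;
    cv::subtract(next, prev, dIdt, cv::noArray(), CV_32F);

    // With delta set to one frame this is the derivative in "intensity per frame";
    // multiply by the frame rate to get "intensity per second" (see the second
    // answer below).
    double fps = 30.0;  // assumed frame rate
    dIdt = dIdt * fps;
    return 0;
}
```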
One aspect that may be confusing is how that relates to the movement of objects in the image. If you have some physics background, for instance, you might think about the difference between Eulerian and Lagrangian frames of reference: in the more intuitive Lagrangian viewpoint, you consider an object moving by tracking it over the pixels (space) in which it moves, e.g. watching a cat as it hops over a fence. The Eulerian view, which is closer to what we do in optical flow, is to track what happens at a single pixel, and never take our eyes off of it. As the cat passes over that area of (pixel) space, the pixel's values will change, and then go back to "normal" when it's gone.
These two views are in some sense equivalent, but may be useful in different situations. In computer vision, tracking an object is hard, while computing these Eulerian-like temporal derivatives is easy. Ideally, we could track the cat: consider a point p(t)=(x_p(t),y_p(t)) on, say, its head, then compute dp/dt, figure out p(t) for all t, and use that for downstream processing. Unfortunately, this is hard, so instead we hope that brightness constancy is usually locally true, and use the optical flow to estimate dp/dt. Of course, dI/dt often does not correspond well to dp/dt (this is why brightness constancy is an assumption). For instance, consider a light moving around a stationary sphere: dI/dt will be large, but dp/dt will be zero.
The difference between subsequent frames is the finite difference approximation to the temporal derivative.
Proper units would be obtained if the value were divided by the time between frames (i.e. multiplied by the frames per second value).

Template Matching Subpixel Accuracy

I use template matching to detect a specific pattern in an image. The shift determined is very shaky. Currently I apply it to the R, G, B channels separately and average the results to obtain float values. Please suggest how to obtain subpixel accuracy. I was planning to resize the image and then return the data at the original scale; please suggest any better method.
I use the code from the OpenCV tutorial: http://docs.opencv.org/2.4/doc/tutorials/imgproc/histograms/template_matching/template_matching.html
I believe the underlying issue is that minMaxLoc has only pixel accuracy. You could try out a subpixel-accurate patch http://www.longrange.net/Temp/CV_SubPix.cpp from the discussion here: http://answers.opencv.org/question/29665/getting-subpixel-with-matchtemplate/ .
As a quick-and-dirty experiment to see whether a subpixel-accurate minMaxLoc would resolve your issue, you can scale up the template matching result image (by a factor of 4, for instance) with cubic interpolation (INTER_CUBIC, http://docs.opencv.org/2.4/modules/imgproc/doc/geometric_transformations.html#resize) and apply minMaxLoc to it. (Contrary to linear interpolation, cubic interpolation can move the location of maxima to subpixel positions.)
Apart from this, you can always apply Gaussian blur to both input images and template matching results to reduce high-frequency noise and suppress local maxima.
I would first try out the quick experiment. If it helps, you can integrate the minMaxLocSubPix implementation, but that will take longer.
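A sketch of that quick experiment (file names are placeholders; matchTemplate, resize, and minMaxLoc are the standard OpenCV calls):

```cpp
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>

int main() {
    cv::Mat image = cv::imread("scene.png", cv::IMREAD_GRAYSCALE);     // placeholder
    cv::Mat templ = cv::imread("template.png", cv::IMREAD_GRAYSCALE);  // placeholder

    cv::Mat result;
    cv::matchTemplate(image, templ, result, cv::TM_CCOEFF_NORMED);

    // Upscale the score map with cubic interpolation so the maximum can land
    // between the original pixel positions.
    const int factor = 4;
    cv::Mat resultUp;
    cv::resize(result, resultUp, cv::Size(), factor, factor, cv::INTER_CUBIC);

    double maxVal = 0.0;
    cv::Point maxLoc;
    cv::minMaxLoc(resultUp, nullptr, &maxVal, nullptr, &maxLoc);

    // Map back to the original resolution: a 1/factor-pixel position estimate.
    double x = static_cast<double>(maxLoc.x) / factor;
    double y = static_cast<double>(maxLoc.y) / factor;
    (void)x; (void)y;
    return 0;
}
```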
It's a good thing to start with pixel accuracy before moving to subpixel accuracy. Checking the whole image at subpixel accuracy would be way too expensive.
An easy solution could be to have 4 versions of your template. Besides the base one, have one that's shifted 1/2 pixel left, another that's shifted 1/2 pixel down, and finally one that's shifted 1/2 pixel in both directions. When you have a match at {x,y}, check the neighborhood to see if the half-shifted templates are a better match.
The benefit of this method is that you only need to shift the small template, and it can be done up front.
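A sketch of how those shifted templates could be built up front (assuming bilinear resampling with warpAffine is acceptable; it slightly blurs the template):

```cpp
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <vector>

// Returns the template shifted by (dx, dy) pixels, e.g. (0.5, 0) or (0.5, 0.5).
static cv::Mat shiftTemplate(const cv::Mat& templ, double dx, double dy) {
    cv::Mat M = (cv::Mat_<double>(2, 3) << 1, 0, dx,
                                           0, 1, dy);
    cv::Mat shifted;
    cv::warpAffine(templ, shifted, M, templ.size(),
                   cv::INTER_LINEAR, cv::BORDER_REPLICATE);
    return shifted;
}

int main() {
    cv::Mat templ = cv::imread("template.png", cv::IMREAD_GRAYSCALE);  // placeholder

    // Base template plus the three half-pixel-shifted versions, built once.
    std::vector<cv::Mat> bank = {
        templ,
        shiftTemplate(templ, 0.5, 0.0),
        shiftTemplate(templ, 0.0, 0.5),
        shiftTemplate(templ, 0.5, 0.5),
    };

    // At a pixel-accurate match location {x, y}, re-score the neighborhood with
    // each entry of 'bank' and keep the best to refine the match to 1/2 pixel.
    return 0;
}
```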
Having said that, it seems you're tracking an object's position over time. It may be worthwhile to low-pass filter that position.

Compute edge for intensity decreases only

I want to find edges in my image, specifically vertical changes in intensity which go from light to dark. Is this possible? I'm using the Canny/Sobel edge detectors in OpenCV but they're picking up edges where the intensity increases, which I don't want.
You can write a custom filter and use cvFilter2D (2D convolution).
To give a very simple example, the convolution kernel {1 0 -1;1 0 -1; 1 0 -1} is a 3x3 filter that can highlight intensity decreases going from left to right. You can threshold the result to get the edges.
You will have to select the right size of the kernel, and also the right values, to suit your images.
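A sketch of that suggestion using the modern C++ equivalent of cvFilter2D (cv::filter2D); the file name and threshold value are placeholders:

```cpp
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>

int main() {
    cv::Mat img = cv::imread("input.png", cv::IMREAD_GRAYSCALE);  // placeholder

    // The 3x3 kernel from the answer: responds positively where intensity
    // decreases from left (light) to right (dark).
    cv::Mat kernel = (cv::Mat_<float>(3, 3) <<
        1, 0, -1,
        1, 0, -1,
        1, 0, -1);

    cv::Mat response;
    cv::filter2D(img, response, CV_32F, kernel);

    // Keep only strong positive responses as edges.
    cv::Mat edges;
    cv::threshold(response, edges, 50.0 /* tune for your images */, 255.0,
                  cv::THRESH_BINARY);
    return 0;
}
```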
Once you understand what these filters do mathematically, it is quite clear what you have to change, and where in the pipeline that change has to happen. In his answer, Totoro already pointed out that you can pass your own filters to be run.
Sobel edge detection works by first running two filters on the image. These filters give the gradient of the image in X and Y direction. Edges and gradients are linked in the way that a large magnitude of the gradient means that there is a lot of change in the image, which would indicate an edge!
So the next step (iirc) in the Sobel algorithm is to find the magnitude of the gradient, and then to threshold it so that only large changes in the image count as edges. After that you do some edge thinning and hysteresis thresholding along the direction of the edge, but that is not very important here.
The important step where you want to differ from the Sobel algorithm is that you care about the direction of change. If you compute the direction of change from the X and Y gradients (using sine and cosine), then you can keep only the edges that go in the direction you want.
If you just care about vertical changes, you can run a convolution kernel that computes the gradient along the horizontal direction and take only positive values (whether light-to-dark shows up as positive or negative depends on the orientation of your kernel; see the sketch below). Those values indicate that there was a change from light to dark. If you want, you can then do the subsequent processing steps just as Sobel would do.
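A sketch of that last step using cv::Sobel (the file name and threshold are placeholders; note that OpenCV's Sobel x-kernel is oriented so that light-to-dark transitions from left to right give negative values):

```cpp
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>

int main() {
    cv::Mat img = cv::imread("input.png", cv::IMREAD_GRAYSCALE);  // placeholder

    // Horizontal gradient dI/dx; CV_32F keeps the sign.
    cv::Mat gx;
    cv::Sobel(img, gx, CV_32F, 1, 0, 3);

    // dI/dx < 0 where intensity drops from left (light) to right (dark),
    // so negate and clip away the other sign.
    cv::Mat negGx = -gx;
    cv::Mat lightToDark = cv::max(negGx, 0.0);

    // Threshold the remaining magnitudes to get a binary edge map.
    cv::Mat edges;
    cv::threshold(lightToDark, edges, 100.0 /* tune */, 255.0, cv::THRESH_BINARY);
    return 0;
}
```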

Opencv - How to differentiate jitter from panning?

I'm working on a video stabilizer using Opencv in C++.
At this time of the project I'm correctly able to find the translation between two consecutive frames with 3 different techniques (optical flow, phase correlation, BFMatcher on points of interest).
To obtain a stabilized image I add up all the translation vectors (between consecutive frames) into one, which is used in the warpAffine function to correct the output image.
I'm having good results with a fixed camera, but the results with a camera in translation are really bad: the image disappears from the screen.
I think I have to distinguish the jitter movement that I want to remove from the panning movement that I want to keep. But I'm open to other solutions.
Actually the whole problem is a bit more complex than you might have thought in the beginning. Let's look at it this way: when you move your camera through the world, things that are close to the camera move faster than the ones in the background, so objects at different depths change their relative distance (look at your finger while moving your head and see how it points to different things). This means the image actually transforms and does not only translate (move in x or y), so how do you want to compensate for that? What you need to do is to infer how much the camera moved (translation along x, y and z) and how much it rotated (the yaw, pan and tilt angles). This is not a very trivial task, but OpenCV comes with a very nice package: http://opencv.willowgarage.com/documentation/camera_calibration_and_3d_reconstruction.html
So I recommend you read as much as possible on homography (http://en.wikipedia.org/wiki/Homography), camera models and calibration, and then think about what you actually want to stabilize for. If it is only the rotation angles, the task is much simpler than if you would also like to stabilize for translational jitter.
If you don't want to go fancy and are willing to neglect the third dimension, I suggest that you average the optical flow, high-pass filter it, and compensate for this movement with an image translation in the opposite direction. This will keep your image more or less in the middle of the frame, and only small, fast changes will be counteracted.
I would suggest the following possible approaches (in order of complexity):
Apply some easy-to-implement IIR low-pass filter to the translation vectors before applying the stabilization. This will separate the high frequencies (jitter) from the low frequencies (panning).
Same idea, but a bit more complex: use Kalman filtering to track a motion with constant velocity or acceleration. You can use OpenCV's Kalman filter for that (see the sketch after this list).
A bit more tricky: put a threshold on the motion amplitude to decide between two states (moving vs. static camera) and filter the translation or not.
Finally, you can use some elaborate technique from machine learning to try to identify the user's desired motion (static, panning, etc.) and filter or not the motion vectors used for the stabilization.
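A minimal sketch of the Kalman option with OpenCV's cv::KalmanFilter (constant-velocity model on the measured translation; the noise covariances are placeholders to tune):

```cpp
#include <opencv2/video/tracking.hpp>

int main() {
    // State: [x, y, vx, vy]; measurement: [x, y].
    cv::KalmanFilter kf(4, 2, 0, CV_32F);
    kf.transitionMatrix = (cv::Mat_<float>(4, 4) <<
        1, 0, 1, 0,
        0, 1, 0, 1,
        0, 0, 1, 0,
        0, 0, 0, 1);
    cv::setIdentity(kf.measurementMatrix);
    cv::setIdentity(kf.processNoiseCov, cv::Scalar::all(1e-3));      // placeholder
    cv::setIdentity(kf.measurementNoiseCov, cv::Scalar::all(1e-1));  // placeholder
    cv::setIdentity(kf.errorCovPost, cv::Scalar::all(1.0));

    // Per frame: predict, then correct with the measured translation.
    float tx = 0.f, ty = 0.f;  // would come from your optical flow / matcher
    kf.predict();
    cv::Mat measurement = (cv::Mat_<float>(2, 1) << tx, ty);
    cv::Mat smoothed = kf.correct(measurement);

    // smoothed.at<float>(0), smoothed.at<float>(1) is the slow (panning) part;
    // the difference from the raw translation is the jitter to compensate.
    return 0;
}
```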
Just a threshold is not a low pass filter.
Possible low pass filters (that are easy to implement):
There is the well-known averaging (moving average), which is already a low-pass filter whose cutoff frequency depends on the number of samples that go into the averaging equation (the more samples, the lower the cutoff frequency).
One frequently used filter is the exponential filter (so called because it forgets the past with an exponential decay rate). It is simply computed as x_filt(k) = a*x_nofilt(k) + (1-a)*x_filt(k-1) with 0 <= a <= 1 (see the sketch after this list).
Another popular filter (and that can be computed beyond order 1) is the Butterworth filter.
Etc.; see low-pass filters on Wikipedia, IIR filters, ...
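A minimal sketch of the exponential filter applied to the per-frame translation (alpha is a placeholder smoothing factor to tune):

```cpp
#include <opencv2/core.hpp>

int main() {
    const double alpha = 0.1;        // 0 <= alpha <= 1; smaller = smoother
    cv::Point2d filtered(0.0, 0.0);  // x_filt(k-1)

    // In a real stabilizer this loop runs once per frame with the measured
    // translation; these measurements are dummies.
    cv::Point2d raw[] = { {1.0, 0.5}, {1.2, 0.4}, {0.9, 0.6} };
    for (const cv::Point2d& x : raw) {
        // x_filt(k) = a * x_nofilt(k) + (1 - a) * x_filt(k-1)
        filtered = alpha * x + (1.0 - alpha) * filtered;

        // 'filtered' is the slow (panning) component; x - filtered is the
        // jitter to cancel with warpAffine.
        cv::Point2d jitter = x - filtered;
        (void)jitter;
    }
    return 0;
}
```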

MagickQuantizeImage usage

I am processing some images using ImageMagick library. As part of the processing I want to minimize the number of colors if this doesn't affect image quality (too much).
For this I have tried to use the MagickQuantizeImage function. Can someone explain how I should choose the parameters?
treedepth:
Normally, this integer value is zero or one. A zero or one tells Quantize to choose an optimal tree depth of Log4(number_colors). A tree of this depth generally allows the best representation of the reference image with the least amount of memory and the fastest computational speed. In some cases, such as an image with low color dispersion (a small number of colors), a value other than Log4(number_colors) is required. To expand the color tree completely, use a value of 8.
dither:
A value other than zero distributes the difference between an original image and the corresponding color-reduced image to neighboring pixels along a Hilbert curve.
measure_error:
A value other than zero measures the difference between the original and quantized images. This difference is the total quantization error. The error is computed by summing over all pixels in an image the distance squared in RGB space between each reference pixel value and its quantized value.
PS: I have made some tests, but sometimes the image quality is severely affected, and I don't want to find a result by trial and error.
This is a really good description of the algorithm
http://www.imagemagick.org/www/quantize.html
They are referencing the command-line version, but the concepts are the same.
The parameter measure_error is meant to give you an indication of how good an answer you got. Set to non-zero, then look at the Image object's mean_error_per_pixel field after you quantize to see how good a quantization you got.
If it's not good enough, increase the number of colors.
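A minimal sketch of that flow with the MagickWand C API, assuming the ImageMagick 6 signature (where the dither argument of MagickQuantizeImage is a boolean; in ImageMagick 7 it is a DitherMethod). The file names and color count are placeholders:

```cpp
#include <wand/MagickWand.h>  // ImageMagick 6; <MagickWand/MagickWand.h> in IM 7
#include <cstdio>

int main() {
    MagickWandGenesis();
    MagickWand* wand = NewMagickWand();
    MagickReadImage(wand, "input.png");

    // 64 colors, RGB colorspace, treedepth 0 (let Quantize pick Log4(n)),
    // dithering on, and measure_error on so the error fields get filled in.
    MagickQuantizeImage(wand, 64, RGBColorspace, 0, MagickTrue, MagickTrue);

    // Inspect the quantization error on the underlying Image object.
    Image* image = GetImageFromMagickWand(wand);
    std::printf("mean error per pixel: %g\n", image->error.mean_error_per_pixel);
    std::printf("normalized mean error: %g\n", image->error.normalized_mean_error);

    MagickWriteImage(wand, "output.png");
    wand = DestroyMagickWand(wand);
    MagickWandTerminus();
    return 0;
}
```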