What is the difference between dense SIFT and HoG?

I am new to computer vision and am studying dense SIFT and HOG. For dense SIFT, the algorithm simply treats every point as an interest point and computes its descriptor, a vector built from gradients. HOG is another way to describe an image with a gradient-based vector.
I think dense SIFT is a special case of HoG. In HoG, if we set the number of orientation bins to 8, use 4 blocks per window and 4 cells per block, and make the block stride equal to the block size, we still get a 128-dimensional vector (4 x 4 x 8) for each window. We can also choose any window stride to slide the window over the whole image. If both algorithms use the same window stride, they should give identical results.
I am not sure whether I am correct. Can anyone help me?

The SIFT descriptor takes a 16x16 window and divides it into a 4x4 grid of cells of 4x4 pixels each. Over each of these 16 cells it computes a histogram of oriented gradients with 8 orientation bins, interpolating each vote between neighboring orientation bins. The gradient magnitudes are also weighted by a Gaussian with sigma equal to half the window width, centered on the 16x16 window, so the whole 128-dimensional descriptor (4 x 4 cells x 8 bins) emphasizes gradients near its center.
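To make that description concrete, here is a minimal sketch of a SIFT-style descriptor for a single 16x16 patch, assuming the patch has already been extracted and skipping rotation to a dominant orientation. It shows the 4x4 grid of cells, the 8 orientation bins with linear interpolation between neighboring bins, and the Gaussian weight with sigma equal to half the window width. `siftLikeDescriptor` is a hypothetical name, and this is not Lowe's full method (it omits trilinear spatial interpolation and the clipping/renormalization step).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Sketch of a SIFT-style descriptor for one 16x16 patch (row-major, 256 values).
// Simplified on purpose: no rotation to the keypoint orientation, no spatial
// (trilinear) interpolation between cells, no clipping/renormalization step.
std::vector<float> siftLikeDescriptor(const std::vector<float>& patch) {
    const int N = 16, cellSize = 4, cells = 4, bins = 8;
    assert(static_cast<int>(patch.size()) == N * N);
    const float pi = 3.14159265358979f;
    const float sigma = N / 2.0f;          // Gaussian sigma = half the window width
    const float center = (N - 1) / 2.0f;   // window center in pixel coordinates
    std::vector<float> desc(cells * cells * bins, 0.0f);  // 4 * 4 * 8 = 128 values
    auto at = [&](int x, int y) {          // clamp-to-border pixel access
        x = std::min(std::max(x, 0), N - 1);
        y = std::min(std::max(y, 0), N - 1);
        return patch[y * N + x];
    };
    for (int y = 0; y < N; ++y) {
        for (int x = 0; x < N; ++x) {
            const float gx = at(x + 1, y) - at(x - 1, y);  // central differences
            const float gy = at(x, y + 1) - at(x, y - 1);
            const float mag = std::sqrt(gx * gx + gy * gy);
            float angle = std::atan2(gy, gx);              // (-pi, pi]
            if (angle < 0) angle += 2 * pi;
            const float dx = x - center, dy = y - center;
            const float w = std::exp(-(dx * dx + dy * dy) / (2 * sigma * sigma));
            // Split the weighted vote linearly between the two nearest orientation bins.
            const float pos = angle / (2 * pi) * bins;     // continuous bin index in [0, 8]
            const int b0 = static_cast<int>(std::floor(pos)) % bins;
            const int b1 = (b0 + 1) % bins;
            const float frac = pos - std::floor(pos);
            const int cell = (y / cellSize) * cells + (x / cellSize);
            desc[cell * bins + b0] += w * mag * (1 - frac);
            desc[cell * bins + b1] += w * mag * frac;
        }
    }
    float norm = 0;
    for (float v : desc) norm += v * v;
    norm = std::sqrt(norm);
    if (norm > 0) for (float& v : desc) v /= norm;  // unit length, as in SIFT
    return desc;
}
```

Dense SIFT would call something like this on a patch centered at every grid point.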
HoG, on the other hand, only computes a simple histogram of oriented gradients, as the name says.
I feel that SIFT is better suited to describing the importance of a point, due to the Gaussian weighting involved, while HoG has no such bias. For this reason, HoG should (ideally) be better suited than dense SIFT to classifying images when all the feature vectors are concatenated into one huge vector (this is my opinion and may not be true).

Related

GPU computation (CUDA) tex2D/tex3D: how to deal with anisotropic pixels/voxels

I am quite new to CUDA programming and I have a question about the texXD functions. My goal is to implement a simple GPU-based ray tracer using the optimized CUDA functionality.
See the CUDA texture API documentation from NVIDIA.
In my research I have to deal with images that have a different resolution in every dimension (in CT images, for example, the resolution in (x,y) differs from the resolution in z). Resampling to an isotropic pixel/voxel size might bring up some problems (especially for medical diagnosis).
For example, I have an image of size 100px x 50px with a resolution of 2px/mm x 1px/mm. The ray enters the image at an arbitrary point and leaves it somewhere else. The ray is sampled in the direction from the entry point to the exit point. At each sample point (pos.x,pos.y), the tex2D function carries out a bilinear interpolation, weighting the neighbouring pixel values by their distance from the sample point.
example image:
In both shapes the corner points are named the same way ((x1,y1), ...). The only difference is the physical spacing between the corner points. The interpolation point is (x,y). I computed an example using the formula for rectangular grids and obtained results for both grids. But if I use the ratios of the areas of the numbered rectangles, I get a different result.
My question: will CUDA take the different resolutions of the dimensions into account, or does CUDA treat all pixels as equally spaced (i.e. as a square grid)?
The formula used by CUDA seems to be the one for a square grid (search for "CUDA texture fetching").
Or can I resample the image to a square grid before using tex2D without substantial information loss?
Any suggestions are welcome. If you need more clarification, feel free to ask and I will make my question more specific.
I don't believe what (I think) you are trying to do can be achieved using textures. The only interpolating filter mode that textures support (linear filtering) is described in the texture fetching section of the CUDA programming guide.
Some salient points:
Textures don't have resolution. They just have dimensions.
Texture data is implicitly uniformly spaced in all dimensions.
Texture interpolation is done in reduced-accuracy fixed-point arithmetic: the interpolation weights carry only 8 bits of fractional precision.
None of this seems useful for interpolation on the non-uniform grid you are describing. At a minimum you would need to perform a coordinate transformation before you could use the uniform filtering mode. The effort and expense would be about the same as writing an interpolation routine yourself in user code.
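The coordinate transformation mentioned above can be sketched in plain host-side C++ (not CUDA): convert the physical sample position in mm to continuous pixel indices using each axis's own resolution, then do ordinary bilinear interpolation on the uniform index grid. Because bilinear weights are computed per axis as fractions of a single cell, this per-axis scaling is all that different spacings require. `AnisotropicImage` and `samplePhysical` are hypothetical names, and the pixel-center convention, (i + 0.5) / res mm, is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical container for an image whose pixel spacing differs per axis.
struct AnisotropicImage {
    int width = 0, height = 0;   // size in pixels
    float resX = 1, resY = 1;    // resolution in pixels per mm along x and y
    std::vector<float> data;     // row-major, width * height values

    float at(int x, int y) const {  // clamp-to-edge addressing
        x = std::min(std::max(x, 0), width - 1);
        y = std::min(std::max(y, 0), height - 1);
        return data[y * width + x];
    }
};

// Bilinear sample at a physical position given in mm. The per-axis resolution is
// handled entirely by the mm -> pixel-index transform; after that the interpolation
// runs on a uniform grid, with pixel i centered at (i + 0.5) / res mm.
float samplePhysical(const AnisotropicImage& img, float xMm, float yMm) {
    assert(static_cast<int>(img.data.size()) == img.width * img.height);
    const float u = xMm * img.resX - 0.5f;  // continuous pixel coordinates
    const float v = yMm * img.resY - 0.5f;
    const int x0 = static_cast<int>(std::floor(u));
    const int y0 = static_cast<int>(std::floor(v));
    const float a = u - x0, b = v - y0;     // fractional offsets in [0, 1)
    return (1 - a) * (1 - b) * img.at(x0, y0) + a * (1 - b) * img.at(x0 + 1, y0)
         + (1 - a) * b * img.at(x0, y0 + 1) + a * b * img.at(x0 + 1, y0 + 1);
}
```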

Noise reduction in OpenCV skin detection sample

I'm working on a face detection app that segments skin pixels with a predefined skin model in YCrCb space.
I'm loosely basing my algorithms on this report by Douglas Chai and King N. Ngan: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=767122&tag=1
I first segment out all skin pixels (see left).
After that I perform some calculations to reduce the noise (see steps below). This results in a filtered bitmap 1/8th the size of the original. Ideally it would be noise-free in both the face and background areas, but it isn't. I have already tried to reduce the noise by using my density map, then checking the neighbouring 3x3 pixels and eroding/dilating pixel values depending on their neighbours. Then I resize this bitmap and apply the result as a mask to the original image (see the right image for the result; ignore my censorship).
My question is, what methods do you recommend for getting rid of the noise?
Also, are there any good methods to get smoother contours? Ideally I would like to avoid "find the biggest contour and flood fill" and use something more sophisticated.
There also seems to be a bit of displacement in the resized mask (it cuts off a bit too much on my right side of the face and shows a bit too much on the left side). What could be causing this?
The easiest way to get smoother contours is to interpolate your data to a higher resolution using an interpolation scheme such as bilinear or bicubic interpolation (OpenCV's resize function offers both). That will result in smoother transitions between points.
I hope it will help a little bit. Good luck.

OpenCV HOGDescriptor return value

Why does the HOG descriptor return a vector of float and not int? It's supposed to return a histogram.
To complement the previous answers, which are right in my opinion: according to this HoG note, which I found clearer than the original Dalal & Triggs paper, two normalization steps are involved:
Block Normalization
Group the cells into overlapping blocks of 2 x 2 cells each, so that each block has size 2C x 2C pixels. Two horizontally or vertically consecutive blocks overlap by two cells, that is, the block stride is C pixels. As a consequence, each internal cell is covered by four blocks. Concatenate the four cell histograms in each block into a single block feature b and normalize the block feature by its Euclidean norm.
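The block normalization step can be sketched as follows, assuming each cell histogram is a std::vector<float>. `normalizeBlock` is a hypothetical name, and the small eps added under the square root (as in the L2-norm variant Dalal & Triggs describe) is an assumed value that only guards against division by zero.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// L2 block normalization: concatenate the four cell histograms of a 2x2 block
// into one feature vector b and divide it by sqrt(||b||^2 + eps^2).
std::vector<float> normalizeBlock(const std::vector<std::vector<float>>& cellHists,
                                  float eps = 1e-6f) {
    assert(cellHists.size() == 4);
    std::vector<float> b;
    for (const auto& h : cellHists) b.insert(b.end(), h.begin(), h.end());
    float sumSq = 0;
    for (float v : b) sumSq += v * v;
    const float norm = std::sqrt(sumSq + eps * eps);
    for (float& v : b) v /= norm;  // fractional values, hence float rather than int
    return b;
}
```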
HOG Feature Normalization
The final normalization makes the HOG feature independent of overall
image contrast.
There should also be bilinear interpolation when voting between two consecutive bins, to prevent quantization artifacts.
Also, it cannot be an int, because you do not only count the gradient vectors that fall in a bin; you also add their gradient magnitudes.
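A minimal sketch of a single vote, assuming unsigned orientations in [0, 180) degrees and bin centers at (k + 0.5) * 180 / nbins (9 bins, as in Dalal & Triggs, gives 20-degree bins). `voteOrientation` is a hypothetical name. The vote is the gradient magnitude, split linearly between the two nearest bin centers with wrap-around, so the histogram entries are fractional.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Add one gradient's vote to a cell histogram over unsigned orientation [0, 180).
// The magnitude is split between the two nearest bin centers (bilinear voting).
void voteOrientation(std::vector<float>& hist, float angleDeg, float magnitude) {
    const int nbins = static_cast<int>(hist.size());
    assert(nbins > 0);
    const float binWidth = 180.0f / nbins;
    angleDeg = std::fmod(angleDeg, 180.0f);
    if (angleDeg < 0) angleDeg += 180.0f;
    const float pos = angleDeg / binWidth - 0.5f;  // position relative to bin centers
    const int b0 = static_cast<int>(std::floor(pos));  // in [-1, nbins - 1]
    const float frac = pos - b0;
    const int lo = (b0 % nbins + nbins) % nbins;   // wrap -1 to the last bin
    const int hi = (b0 + 1) % nbins;               // wrap nbins to bin 0
    hist[lo] += magnitude * (1 - frac);
    hist[hi] += magnitude * frac;
}
```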
I believe that @Micka is right: the histograms are probably normalized (maybe not to 1). On the Wikipedia page on HOG Descriptors, it is written that:
For improved accuracy, the local histograms can be contrast-normalized by calculating a measure of the intensity across a larger region of the image, called a block, and then using this value to normalize all cells within the block. This normalization results in better invariance to changes in illumination and shadowing.
Hence the need for a vector<float> instead of vector<int>.

How can I remove small parallel line in image?

I have a black-and-white image after binarization. It looks like the one below:
How can I remove the small lines parallel to the long curves using OpenCV? I can remove them by removing all small objects, but I want to remove only the small parallel lines.
This looks like a Canny artifact (or some kind of ringing artifact) to me. There are several ways to remove them.
An empirical but not very computationally intensive method would be to locate all small features and superimpose them on the same image shifted by ±X, ±Y. If the feature is completely coincident with the shifted image, i.e. all pixels in the white feature are also white in the shifted image, then you are probably looking at an artifact.
To evaluate the "smallness" of a feature, you can use a basic flood fill. This method is cheap because you can simulate shifting with pointers, without really allocating four shifted images. It is prone to false positives wherever you really have small parallel lines, and to false negatives if the artifacts are very large.
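Below is a minimal sketch of the shift-and-compare idea, assuming a binary image stored as a row-major std::vector<int> (1 = white). `labelComponents` and `isShiftCoincident` are hypothetical helpers. Requiring the shifted feature to land on a *different* component, so that it cannot trivially coincide with itself, is my interpretation of the test described above.

```cpp
#include <cassert>
#include <vector>

// Label 4-connected white components (1 = white) with an iterative flood fill.
// Returns a label per pixel (0 = background); sizes[id] is the pixel count of component id.
std::vector<int> labelComponents(const std::vector<int>& img, int w, int h, std::vector<int>& sizes) {
    assert(static_cast<int>(img.size()) == w * h);
    std::vector<int> label(img.size(), 0), stack;
    sizes.assign(1, 0);  // index 0 is the background
    for (int start = 0; start < w * h; ++start) {
        if (!img[start] || label[start]) continue;
        const int id = static_cast<int>(sizes.size());
        sizes.push_back(0);
        label[start] = id;
        stack.push_back(start);
        while (!stack.empty()) {
            const int p = stack.back();
            stack.pop_back();
            ++sizes[id];
            const int x = p % w, y = p / w;
            const int nx[4] = {x - 1, x + 1, x, x}, ny[4] = {y, y, y - 1, y + 1};
            for (int k = 0; k < 4; ++k) {
                if (nx[k] < 0 || nx[k] >= w || ny[k] < 0 || ny[k] >= h) continue;
                const int q = ny[k] * w + nx[k];
                if (img[q] && !label[q]) { label[q] = id; stack.push_back(q); }
            }
        }
    }
    return label;
}

// Flag a component as a probable artifact if it has at most maxSize pixels and,
// for some shift of 1..maxShift pixels along +x, -x, +y or -y, every one of its
// pixels lands on a white pixel belonging to a different component. The shift is
// plain index arithmetic, so no shifted copies of the image are allocated.
bool isShiftCoincident(const std::vector<int>& img, const std::vector<int>& label,
                       int w, int h, int id, int size, int maxSize, int maxShift) {
    if (size > maxSize) return false;
    const int dirs[4][2] = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
    for (int d = 1; d <= maxShift; ++d) {
        for (const auto& dir : dirs) {
            bool coincident = true;
            for (int p = 0; p < w * h && coincident; ++p) {
                if (label[p] != id) continue;
                const int x = p % w + dir[0] * d, y = p / w + dir[1] * d;
                coincident = x >= 0 && x < w && y >= 0 && y < h &&
                             img[y * w + x] && label[y * w + x] != id;
            }
            if (coincident) return true;
        }
    }
    return false;
}
```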
Another method would be to posterize the original image twice with different thresholds. While the "real" lines will stay together, the ringing artifacts will have a different strength. At that point you evaluate the image difference and consider as artifacts all features that are farther than a given threshold from the image track. This is a bit more computationally intensive and yields better results, but it depends on what your original image is, i.e. on your workflow.
It is possible that reevaluating the workflow (altering the edge detection phase) could avoid the creation of the artifacts altogether.
Use the cvBlobslib library to detect the white patches as blobs. cvBlobslib provides functions to compute different features of the blobs, such as area and ellipticity. So if you want only the smaller patches parallel to the long curve:
Get the long curve on the basis of the area covered by the blob or its perimeter, i.e. the contour length of the blob.
Get the ellipticity or the orientation of the major axis of the long curve after fitting an ellipse (cvBlobslib will do that for you).
Filter out all blobs whose area or contour length is below a threshold and that have the same orientation as the long curve.
Hope this works.
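For reference, here is a library-free sketch of the same filtering, assuming each blob is available as a list of (x, y) pixel coordinates. The orientation comes from central second-order moments, which gives the major axis of the ellipse with the same second moments as the blob. `BlobShape`, `blobShape`, `axisAngleDiff` and `isSmallParallelBlob` are hypothetical names, and treating the long curve as having a single axis only makes sense if it is not strongly curved.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Hypothetical summary of one blob: pixel count and major-axis orientation.
struct BlobShape {
    double area;         // number of pixels
    double orientation;  // radians, in [-pi/2, pi/2]
};

// Orientation from the central second-order moments mu20, mu02, mu11.
BlobShape blobShape(const std::vector<std::pair<int, int>>& pixels) {
    assert(!pixels.empty());
    const double n = static_cast<double>(pixels.size());
    double cx = 0, cy = 0;
    for (const auto& p : pixels) { cx += p.first; cy += p.second; }
    cx /= n; cy /= n;
    double mu20 = 0, mu02 = 0, mu11 = 0;
    for (const auto& p : pixels) {
        const double dx = p.first - cx, dy = p.second - cy;
        mu20 += dx * dx; mu02 += dy * dy; mu11 += dx * dy;
    }
    return {n, 0.5 * std::atan2(2 * mu11, mu20 - mu02)};
}

// Smallest angle between two undirected axes, in [0, pi/2].
double axisAngleDiff(double a, double b) {
    const double pi = std::acos(-1.0);
    const double d = std::fmod(std::fabs(a - b), pi);
    return d > pi / 2 ? pi - d : d;
}

// A blob counts as "small and parallel" if its area is at most maxArea and its
// axis is within maxAngle radians of the long curve's axis (thresholds to tune).
bool isSmallParallelBlob(const BlobShape& blob, const BlobShape& longCurve,
                         double maxArea, double maxAngle) {
    return blob.area <= maxArea &&
           axisAngleDiff(blob.orientation, longCurve.orientation) <= maxAngle;
}
```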
If you know the orientation of your line in advance, you can do a morphological closing with a custom structuring element adapted to your needs.
See morphomat on Wikipedia
See the OpenCV documentation
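If the long curves run in a known direction, the idea can be sketched with a line-shaped structuring element. This minimal version, assuming white lines (1) on black (0) in a row-major std::vector<int>, implements erosion, dilation, opening and closing by hand; `lineElement`, `morph`, `opening` and `closing` are hypothetical names, not OpenCV calls. For white lines on black, an opening with a line longer than the small segments removes them while keeping the long ones; whether you want opening or closing depends on your image's polarity.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Image = std::vector<int>;                    // binary, row-major, 1 = white
using Element = std::vector<std::pair<int, int>>;  // structuring element as (dx, dy) offsets

// Symmetric line-shaped element along the integer step (sx, sy),
// e.g. (1, 0) horizontal, (0, 1) vertical, (1, 1) diagonal.
Element lineElement(int halfLength, int sx, int sy) {
    Element e;
    for (int k = -halfLength; k <= halfLength; ++k) e.push_back({k * sx, k * sy});
    return e;
}

// Binary erosion (erode = true) or dilation (erode = false). Offsets that fall
// outside the image are ignored. Dilation uses the offsets unreflected, which is
// only correct for symmetric elements such as the line above.
Image morph(const Image& img, int w, int h, const Element& se, bool erode) {
    assert(static_cast<int>(img.size()) == w * h);
    Image out(img.size(), 0);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            bool all = true, any = false;
            for (const auto& o : se) {
                const int xx = x + o.first, yy = y + o.second;
                if (xx < 0 || xx >= w || yy < 0 || yy >= h) continue;
                const bool v = img[yy * w + xx] != 0;
                all = all && v;
                any = any || v;
            }
            out[y * w + x] = erode ? all : any;
        }
    }
    return out;
}

// Opening removes white features that cannot contain the element; closing fills
// black gaps that cannot contain it (essentially an opening of the inverted image).
Image opening(const Image& img, int w, int h, const Element& se) {
    return morph(morph(img, w, h, se, true), w, h, se, false);
}
Image closing(const Image& img, int w, int h, const Element& se) {
    return morph(morph(img, w, h, se, false), w, h, se, true);
}
```

In OpenCV the same operations are available through cv::morphologyEx with cv::MORPH_OPEN or cv::MORPH_CLOSE and a custom kernel Mat.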
Perhaps similar to what the others said, but in simpler words: since the small lines seem to be roughly half as thick as the long ones, if you don't really care about preserving the long lines exactly as they are, you could repeatedly apply a simple algorithm that "makes the lines thinner" until the small ones disappear. Scan the image pixel by pixel, and whenever a white pixel is directly above, below, left or right of a black pixel, store its coordinates in a vector. After traversing the entire image, make all the pixels stored in the vector black. You can choose the number of iterations empirically.
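The thinning pass described above can be sketched as follows, assuming a binary image in a row-major std::vector<int> (1 = white). `peelOnce` and `thinLines` are hypothetical names, and treating pixels outside the image as non-black is an assumption. Collecting the pixels first and clearing them afterwards matters: clearing them during the scan would let a single pass eat through an entire line.

```cpp
#include <cassert>
#include <vector>

// One pass: collect every white pixel that has a black pixel directly above,
// below, left or right, then blacken them all at once (exactly one layer per pass).
void peelOnce(std::vector<int>& img, int w, int h) {
    assert(static_cast<int>(img.size()) == w * h);
    std::vector<int> toClear;
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            if (!img[y * w + x]) continue;
            const bool touchesBlack =
                (x > 0 && !img[y * w + x - 1]) || (x + 1 < w && !img[y * w + x + 1]) ||
                (y > 0 && !img[(y - 1) * w + x]) || (y + 1 < h && !img[(y + 1) * w + x]);
            if (touchesBlack) toClear.push_back(y * w + x);
        }
    }
    for (int p : toClear) img[p] = 0;
}

// Repeat the pass an empirically chosen number of times.
void thinLines(std::vector<int>& img, int w, int h, int iterations) {
    for (int i = 0; i < iterations; ++i) peelOnce(img, w, h);
}
```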
Here are steps exploiting the fact that parallel lines increase edge density.
1) Apply an adaptive threshold to the gray image to get many edges.
2) Erode with a small (e.g. 3x3) structuring element; experiment with the size.
3) Take the logical NOT to get the edge density.
4) Dilate with something like 3x3 or 5x5. This merges the edges into a region.
5) Now erode with 7x7 (or experiment, but larger than the last dilation). This removes most of the unwanted regions: long lines and small stray areas.
The output is a mask of the region to remove. For high-precision removal, apply contour detection to the original image and remove the contour objects whose positions match the mask.
Or, if you don't need a high-precision result, simply AND the image with the NOT of the mask.
Why not do something like this:
Find the long curves (using findContours and filter by size).
Find the small curves
For each long curve, calculate the minimal distance between each point of every small curve and the long curve.
Calculate the mean and the standard deviation of these minimal distances.
Reject the small curves whose mean minimal distance to the long curve is too large, or whose minimal distances have a large standard deviation.
The result will probably be better (and faster) if you skeletonize the image first.
Good luck with it.
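The distance and statistics steps of that answer can be sketched with brute-force distances, assuming each curve is already available as a list of points (for example from findContours). `Pt`, `DistStats`, `minDistanceStats` and `looksParallel` are hypothetical names, and the thresholds are yours to tune.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

struct Pt { double x, y; };
struct DistStats { double mean, stddev; };

// For every point of the small curve, find the distance to the nearest point of
// the long curve (brute force, O(n*m)); return the mean and standard deviation.
DistStats minDistanceStats(const std::vector<Pt>& smallCurve, const std::vector<Pt>& longCurve) {
    assert(!smallCurve.empty() && !longCurve.empty());
    std::vector<double> d;
    for (const Pt& p : smallCurve) {
        double best = std::numeric_limits<double>::max();
        for (const Pt& q : longCurve) best = std::min(best, std::hypot(p.x - q.x, p.y - q.y));
        d.push_back(best);
    }
    double mean = 0;
    for (double v : d) mean += v;
    mean /= d.size();
    double var = 0;
    for (double v : d) var += (v - mean) * (v - mean);
    var /= d.size();
    return {mean, std::sqrt(var)};
}

// A small curve that stays close to the long curve at a nearly constant distance is
// treated as a parallel artifact; all others are rejected (i.e. kept in the image).
bool looksParallel(const DistStats& s, double maxMean, double maxStd) {
    return s.mean <= maxMean && s.stddev <= maxStd;
}
```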

Gaussian Blur with FFT Questions

I have a current implementation of Gaussian blur using regular convolution. It is efficient enough for small kernels, but once the kernel size gets a little bigger, performance takes a hit. So I am thinking of implementing the convolution using FFT. I've never had any experience with FFT-related image processing, so I have a few questions.
Is a 2D FFT-based convolution also separable into two 1D convolutions?
If so, does it go like this: a 1D FFT on every row, then a 1D FFT on every column, then multiplication by the 2D kernel, then the inverse transform of every column and the inverse transform of every row? Or do I have to multiply by a 1D kernel after each 1D FFT?
I understand that the kernel should be the same size as the image (the same length as a row, in the 1D case). But how will that affect the edges? Do I have to pad the image edges with zeros? If so, should the kernel size equal the image size before or after padding?
Also, this is a C++ project, and I plan on using kissFFT since this is a commercial project. You are welcome to suggest better alternatives. Thank you.
EDIT: Thanks for the responses, but I have a few more questions.
I see that the imaginary part of the input image will be all zeros. But will the imaginary part of the output also be zero? Do I have to multiply both the real and imaginary parts by the Gaussian kernel?
I have instances of the same image to be blurred at different scales, i.e. the same image is scaled to different sizes and blurred at different kernel sizes. Do I have to perform a FFT every time I scale the image or can I use the same FFT?
Lastly, if I wanted to visualize the FFT, I understand that a log filter has to be applied to it. But I am really lost on which part should be used to visualize the FFT: the real part or the imaginary part?
Also, for an image of size 512x512, what will be the sizes of the real and imaginary parts? Will they be the same length?
Thank you again for your detailed replies.
The 2-D FFT is separable, and you are correct about how to perform it, except that you must multiply by the 2-D FFT of the 2-D kernel. If you are using kissfft, an easier way to perform the 2-D FFT is to use kiss_fftnd in the tools directory of the kissfft package, which does multi-dimensional FFTs.
The kernel does not have to be any particular size. If the kernel is smaller than the image, you just need to zero-pad it up to the image size before performing the 2-D FFT. You should also zero-pad the image edges, since the convolution performed by multiplication in the frequency domain is actually circular convolution, and results wrap around at the edges.
So to summarize (given that the image size is M x N):
come up with a 2-D kernel of any size (U x V)
zero-pad the kernel up to (M+U-1) x (N+V-1)
take the 2-D FFT of the kernel
zero-pad the image up to (M+U-1) x (N+V-1)
take the 2-D FFT of the image
multiply FFT of kernel by FFT of image
take inverse 2-D FFT of result
trim off garbage at edges
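The steps above can be sketched as follows, assuming double-precision images stored row-major and a small hand-written radix-2 FFT in place of kissfft (so both padded dimensions are rounded up to a power of two, which kissfft would not require). The 2-D transform is done as 1-D FFTs on every row and then on every column, which is the separability discussed above. `fft`, `fft2d`, `nextPow2` and `fftConvolve` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Recursive radix-2 FFT (size must be a power of two). With invert = true it
// computes the inverse; halving at each of the log2(n) levels gives the 1/n scale.
void fft(std::vector<cd>& a, bool invert) {
    const std::size_t n = a.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    if (n == 1) return;
    std::vector<cd> even(n / 2), odd(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) { even[i] = a[2 * i]; odd[i] = a[2 * i + 1]; }
    fft(even, invert);
    fft(odd, invert);
    const double pi = std::acos(-1.0);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const cd t = std::polar(1.0, (invert ? 2 : -2) * pi * k / n) * odd[k];
        a[k] = even[k] + t;
        a[k + n / 2] = even[k] - t;
    }
    if (invert) for (cd& x : a) x /= 2.0;
}

// 2-D FFT as 1-D FFTs on every row, then on every column.
void fft2d(std::vector<cd>& img, std::size_t rows, std::size_t cols, bool invert) {
    std::vector<cd> line(cols);
    for (std::size_t r = 0; r < rows; ++r) {
        std::copy(img.begin() + r * cols, img.begin() + (r + 1) * cols, line.begin());
        fft(line, invert);
        std::copy(line.begin(), line.end(), img.begin() + r * cols);
    }
    line.resize(rows);
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) line[r] = img[r * cols + c];
        fft(line, invert);
        for (std::size_t r = 0; r < rows; ++r) img[r * cols + c] = line[r];
    }
}

std::size_t nextPow2(std::size_t n) { std::size_t p = 1; while (p < n) p <<= 1; return p; }

// Full linear convolution of an M x N image with a U x V kernel, (M+U-1) x (N+V-1).
std::vector<double> fftConvolve(const std::vector<double>& img, std::size_t M, std::size_t N,
                                const std::vector<double>& ker, std::size_t U, std::size_t V) {
    const std::size_t outR = M + U - 1, outC = N + V - 1;
    const std::size_t R = nextPow2(outR), C = nextPow2(outC);
    std::vector<cd> A(R * C), B(R * C);  // zero padding: unused entries stay 0
    for (std::size_t r = 0; r < M; ++r) for (std::size_t c = 0; c < N; ++c) A[r * C + c] = img[r * N + c];
    for (std::size_t r = 0; r < U; ++r) for (std::size_t c = 0; c < V; ++c) B[r * C + c] = ker[r * V + c];
    fft2d(A, R, C, false);
    fft2d(B, R, C, false);
    for (std::size_t i = 0; i < R * C; ++i) A[i] *= B[i];  // complex point-wise product
    fft2d(A, R, C, true);
    std::vector<double> out(outR * outC);
    for (std::size_t r = 0; r < outR; ++r)
        for (std::size_t c = 0; c < outC; ++c) out[r * outC + c] = A[r * C + c].real();  // imag ~ 0
    return out;
}
```

The result is the full (M+U-1) x (N+V-1) convolution; to get an output the size of the original image, crop starting at row (U-1)/2 and column (V-1)/2, which is the "trim off garbage at edges" step.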
If you are performing the same filter multiple times on different images, you don't have to repeat the first three steps (building, padding and transforming the kernel) every time.
Note: The kernel size will have to be rather large for this to be faster than direct computation of convolution. Also, did you implement your direct convolution taking advantage of the fact that a 2-D gaussian filter is separable (see this a few paragraphs into the "Mechanics" section)? That is, you can perform the 2-D convolution as 1-D convolutions on the rows and then the columns. I have found this to be faster than most FFT-based approaches unless the kernels are quite large.
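For comparison, here is a minimal sketch of the separable direct approach, assuming a grayscale float image and edge replication at the borders (which differs from the zero padding used with the FFT). `gaussianKernel1D` and `gaussianBlurSeparable` are hypothetical names, and the 3-sigma kernel radius is an assumed cut-off.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Normalized 1-D Gaussian taps; the radius ceil(3 * sigma) is an assumed cut-off.
std::vector<float> gaussianKernel1D(float sigma) {
    const int radius = static_cast<int>(std::ceil(3 * sigma));
    std::vector<float> k(2 * radius + 1);
    float sum = 0;
    for (int i = -radius; i <= radius; ++i) {
        k[i + radius] = std::exp(-(i * i) / (2 * sigma * sigma));
        sum += k[i + radius];
    }
    for (float& v : k) v /= sum;
    return k;
}

// Separable Gaussian blur: a 1-D pass along every row, then along every column.
// A K x K kernel costs K^2 multiplies per pixel; the two passes cost 2K.
// Borders are handled by clamping coordinates (replicating edge pixels).
std::vector<float> gaussianBlurSeparable(const std::vector<float>& img, int w, int h, float sigma) {
    assert(static_cast<int>(img.size()) == w * h);
    const std::vector<float> k = gaussianKernel1D(sigma);
    const int r = static_cast<int>(k.size()) / 2;
    auto clampi = [](int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); };
    std::vector<float> tmp(img.size()), out(img.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float s = 0;
            for (int i = -r; i <= r; ++i) s += k[i + r] * img[y * w + clampi(x + i, 0, w - 1)];
            tmp[y * w + x] = s;
        }
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float s = 0;
            for (int i = -r; i <= r; ++i) s += k[i + r] * tmp[clampi(y + i, 0, h - 1) * w + x];
            out[y * w + x] = s;
        }
    return out;
}
```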
Response to Edit
If the input is real, the output will still be complex except for rare circumstances. The FFT of your gaussian kernel will also be complex, so the multiply must be a complex multiplication. When you perform the inverse FFT, the output should be real since your input image and kernel are real. The output will be returned in a complex array, but the imaginary components should be zero or very small (floating point error) and can be discarded.
If you are using the same image, you can reuse the image FFT, but you will need to zero pad based on your biggest kernel size. You will have to compute the FFTs of all of the different kernels.
For visualization, the magnitude of the complex output should be used. The log scale just helps to visualize smaller components of the output when larger components would drown them out in a linear scale. The decibel scale is often used and is given by either 20*log10(abs(x)) or 10*log10(x*x'), which are equivalent (x is the complex FFT output and x' is the complex conjugate of x).
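A minimal sketch of the decibel conversion, assuming the FFT output is held in a std::vector<std::complex<double>>. `toDecibels` is a hypothetical name, and the -120 dB floor used for zero-magnitude bins (where the logarithm is undefined) is an arbitrary assumption. `std::norm` returns the squared magnitude, i.e. x*x'.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Decibel magnitude of complex FFT output for display: 20*log10(|X|), which equals
// 10*log10(X * conj(X)). Zero bins are mapped to floorDb (an assumed floor value).
std::vector<double> toDecibels(const std::vector<std::complex<double>>& X, double floorDb = -120.0) {
    std::vector<double> out(X.size());
    for (std::size_t i = 0; i < X.size(); ++i) {
        const double mag = std::abs(X[i]);
        out[i] = mag > 0 ? std::max(20.0 * std::log10(mag), floorDb) : floorDb;
    }
    return out;
}
```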
The input and output of the FFT will be the same size. Also the real and imaginary parts will be the same size since one real and one imaginary value form a single sample.
Remember that convolution in space is equivalent to multiplication in the frequency domain. This means that once you have the FFTs of both the image and the mask (kernel), you only have to do a point-by-point multiplication and then an IFFT of the result. Having said that, here are a few words of caution.
You probably know that in digital signal processing we often use circular convolution, not linear convolution. This happens because of periodicity. In simple terms, the DFT (and the FFT, its computationally efficient variant) assumes that your signal is periodic, so when you filter an image of N x M pixels this way, the pixel at (1,m) is treated as a neighbor of the pixel at (N,m) for every m. Your signal virtually wraps around onto itself. This means that your Gaussian mask will average pixels on the far right with pixels on the far left, and the same goes for top and bottom. This might or might not be desired, but in general you have to deal with edge artifacts anyway. It is, however, much easier to forget about this issue when working with FFT multiplication, because the problem stops being apparent. There are many ways to take care of it; the best is simply to pad your image with zeros and remove the extra pixels later.
A very neat thing about using a Gaussian filter in the frequency domain is that you never really have to take its FFT. It is a well-known fact that the Fourier transform of a Gaussian is a Gaussian (some technical details here). All you have to do then is pad your image with zeros (both top and bottom), generate a Gaussian in the frequency domain, multiply them together and take the IFFT. Then you're done.
Hope this helps.
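A minimal 1-D sketch of building the Gaussian directly in the frequency domain, assuming a real signal and a Gaussian of standard deviation sigma samples normalized to unit sum: its transform is, to a very good approximation for sigma of about a sample or more, exp(-2*pi^2*sigma^2*f^2) with f in cycles per sample, so it can be written straight into the frequency bins. A naive O(n^2) DFT keeps the example short (use an FFT in practice); in 2-D the frequency response is the product of the row and column factors. `dft` and `gaussianBlurFreq` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Naive O(n^2) DFT; invert = true computes the inverse including the 1/n scale.
std::vector<cd> dft(const std::vector<cd>& a, bool invert) {
    const std::size_t n = a.size();
    const double pi = std::acos(-1.0);
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cd s = 0;
        for (std::size_t t = 0; t < n; ++t)
            s += a[t] * std::polar(1.0, (invert ? 2.0 : -2.0) * pi * double(k * t % n) / n);
        out[k] = invert ? s / double(n) : s;
    }
    return out;
}

// Gaussian blur without transforming a spatial kernel: the Gaussian's transform is
// written directly into the frequency bins. The signal is zero-padded by `pad`
// samples on each side so the circular wrap-around lands in the padding.
std::vector<double> gaussianBlurFreq(const std::vector<double>& x, double sigma, std::size_t pad) {
    assert(sigma > 0);
    const std::size_t n = x.size() + 2 * pad;
    const double pi = std::acos(-1.0);
    std::vector<cd> a(n);  // zero-initialized
    for (std::size_t i = 0; i < x.size(); ++i) a[pad + i] = x[i];
    std::vector<cd> A = dft(a, false);
    for (std::size_t k = 0; k < n; ++k) {
        const double f = (k <= n / 2 ? double(k) : double(k) - double(n)) / n;  // signed frequency
        A[k] *= std::exp(-2 * pi * pi * sigma * sigma * f * f);
    }
    std::vector<cd> y = dft(A, true);
    std::vector<double> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) out[i] = y[pad + i].real();  // drop the padding
    return out;
}
```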