Implementation of Non Local Means Noise reduction algorithm in image processing - c++

I am working on an implementation of the Non-Local Means noise reduction algorithm in C++. There are papers on this algorithm (such as this paper), but they are not very clear about it.
I know it uses a weighted mean, but I don't understand the purpose of the research window here or how it relates to the comparison window.
As a new user, I am not allowed to upload images on StackOverflow, but you can find the formula under the NL-means section of the paper linked above.

From the paper you refer to, when determining the result value for a given pixel p, all the other pixels of the image will be weighted and summed according to the similarity between their neighborhoods and the neighborhood of the pixel p.
But that is computationally very expensive. So the authors restrict the number of pixels which contribute to the weighted sum; that must be what you call the research window (I will call it the search window). This search window is a 21x21 region centered on the pixel p. The neighborhoods being compared are of size 7x7 (section 5).
I quickly made a prototype in Mathematica, and I can confirm that it becomes very costly as the size of the search window increases. I expect the same behavior when you implement it in C++.
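To make the roles of the two windows concrete, here is a minimal pure-Python sketch for a single pixel. It is not the authors' code: the function name `nlm_pixel`, the filtering parameter `h`, the plain exponential weight, and the clamped border handling are all assumptions made for illustration (the paper uses a Gaussian-weighted patch distance).

```python
import math

def nlm_pixel(img, r, c, patch_radius=3, search_radius=10, h=10.0):
    """Denoise one pixel of a 2D grayscale image (list of lists of floats).

    patch_radius=3 gives the 7x7 comparison window,
    search_radius=10 gives the 21x21 search window.
    """
    rows, cols = len(img), len(img[0])

    def px(y, x):
        # Clamp coordinates so patches near the border stay inside the image
        # (an assumption; other border policies are possible).
        y = min(max(y, 0), rows - 1)
        x = min(max(x, 0), cols - 1)
        return img[y][x]

    def patch_distance(y1, x1, y2, x2):
        # Mean squared difference between the two comparison windows.
        total = 0.0
        for dy in range(-patch_radius, patch_radius + 1):
            for dx in range(-patch_radius, patch_radius + 1):
                d = px(y1 + dy, x1 + dx) - px(y2 + dy, x2 + dx)
                total += d * d
        return total / (2 * patch_radius + 1) ** 2

    weight_sum = 0.0
    value_sum = 0.0
    # Only pixels inside the search window contribute to the weighted mean.
    for y in range(max(r - search_radius, 0), min(r + search_radius, rows - 1) + 1):
        for x in range(max(c - search_radius, 0), min(c + search_radius, cols - 1) + 1):
            w = math.exp(-patch_distance(r, c, y, x) / (h * h))
            weight_sum += w
            value_sum += w * img[y][x]
    return value_sum / weight_sum
```

The cost per pixel is (search window area) x (comparison window area), which is why enlarging the search window gets expensive so quickly.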

There's some GPL'd C++ code along with a brief writeup of the algorithm by the original authors here: http://www.ipol.im/pub/algo/bcm_non_local_means_denoising/

This has been added to OpenCV:
http://docs.opencv.org/modules/photo/doc/denoising.html

Related

Bundle adjustment with focal length correction not converging

I am trying to add a new feature to our existing implementation of the bundle adjustment in code.
The algorithm uses the Gauss-Newton method and has been working for well over a decade. The least squares "A" matrix is populated using initial approximations of the image exterior orientations, as well as the object points. The book from Kraus - "Photogrammetry: Fundamental and Standard Processes" - was used for this.
A while ago, self calibration was added to this algorithm, however, only the formulae by Ebner and Gruen were added (formula for Ebner here). I am now trying to add the "Brown-Conrady" formula which is well documented in this paper (final algorithm under "concluding remarks"). It uses 10 parameters to determine deltaX and deltaY.
When I include all the parameters except for deltaC (the correction to the focal length/camera constant), our algorithm works and the adjustment converges and produces the desired residuals. However, as soon as I introduce deltaC (which mathematically I see as "allowing" the image points to scale by some amount in X and Y) the adjustment diverges.
The input to the algorithm is a large set of already undistorted aerial images, along with their control points and a large number of image points. We are therefore expecting the distortion/correction parameters to be close to zero, since the images are already undistorted. This is indeed the case for Ebner and Gruen.
For Brown, however, some of the parameters (and therefore the delta corrections) grow uncontrollably. I have tried scaling these parameters (the principal point and the focal length correction deltaC) so that they are closer in magnitude to the other parameters (K1, K2, K3, P1, P2); however, this did not help: the adjustment diverges all the same.
Is there any reason for this? Could it perhaps be because the images are already undistorted? Or something to do with this aerial job in particular?
I have not provided code as it is simply too complex; however, I suspect the problem lies in my understanding of the implementation rather than in specific code.
Thanks!

What's the difference between "BB regression algorithms used in R-CNN variants" vs "BB in YOLO" localization techniques?

Question:
What's the difference between the bounding box (BB) produced by "BB regression algorithms in region-based object detectors" and the "bounding box in single-shot detectors"? Can they be used interchangeably, and if not, why not?
While studying variants of the R-CNN and YOLO algorithms for object detection, I came across two major techniques for performing object detection, i.e. region-based (R-CNN) and niche-sliding-window-based (YOLO).
Both regimes include variants ranging from complicated to simple, but in the end they all localize objects in the image using bounding boxes. Below I focus only on localization (assuming classification is happening), since that is most relevant to the question, and briefly explain my understanding:
Region-based:
Here, we let the neural network predict continuous variables (the BB coordinates) and refer to that as regression.
The regression that is defined (which is not linear at all) is just a CNN or another variant (all layers are differentiable). The outputs are four values (𝑟,𝑐,ℎ,𝑤), where (𝑟,𝑐) specify the position of the left corner and (ℎ,𝑤) the height and width of the BB.
To train this NN, a smooth L1 loss is used to learn the precise BB by penalizing outputs of the NN that are very different from the labeled (𝑟,𝑐,ℎ,𝑤) in the training set.
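For reference, a smooth L1 loss can be sketched as follows. The function name and the `beta` parameter (the point where the loss switches from quadratic to linear) are illustrative choices; with `beta=1.0` this is the form given in the Fast R-CNN paper. It is applied here directly to (r, c, h, w) as described above, whereas real detectors usually regress parameterized offsets.

```python
def smooth_l1(pred, target, beta=1.0):
    """Sum of smooth L1 terms over the box coordinates.

    Quadratic for small errors (|d| < beta), linear for large ones,
    so outliers are penalized less harshly than with a squared loss.
    """
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        if d < beta:
            total += 0.5 * d * d / beta
        else:
            total += d - 0.5 * beta
    return total

# Example: predicted vs. labeled (r, c, h, w)
loss = smooth_l1([10.0, 20.0, 50.0, 40.0], [10.5, 20.0, 50.0, 42.0])
```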
Niche-sliding-window (convolutionally implemented) based:
First, we divide the image into, say, 19*19 grid cells.
An object is assigned to a grid cell by taking the midpoint of the object and assigning the object to whichever grid cell contains that midpoint. So each object, even if it spans multiple grid cells, is assigned to only one of the 19 by 19 grid cells.
Now, you take the two coordinates of this grid-cell and calculate the precise BB(bx, by, bh, bw) for that object using some method such as
(bx, by, bh, bw) are relative to the grid cell, where bx and by are the center point and bh and bw are the height and width of the precise BB, i.e. the height and width of the bounding box are specified as fractions of the height and width of the grid cell, so bh and bw can be > 1.
There are multiple ways of calculating the precise BB specified in the paper.
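The grid-cell assignment and the cell-relative coordinates described above could be sketched like this. The function name `encode_box` and its arguments are hypothetical, and the encoding follows the description in this question rather than any specific YOLO paper (the papers use slightly different normalizations).

```python
def encode_box(cx, cy, w, h, img_w, img_h, grid=19):
    """Assign a box (center cx, cy; size w, h; all in pixels) to one grid cell.

    Returns (row, col, (bx, by, bh, bw)) where bx, by are the center's
    offsets inside the cell (0..1) and bh, bw are measured in cell units,
    so they can exceed 1 for objects larger than a cell.
    """
    cell_w = img_w / grid
    cell_h = img_h / grid
    # The cell that contains the midpoint owns the object.
    col = min(int(cx // cell_w), grid - 1)
    row = min(int(cy // cell_h), grid - 1)
    bx = cx / cell_w - col
    by = cy / cell_h - row
    bw = w / cell_w
    bh = h / cell_h
    return row, col, (bx, by, bh, bw)

# A 30x5 pixel box centered at (25, 57) in a 190x190 image (10x10 pixel cells)
row, col, (bx, by, bh, bw) = encode_box(25, 57, 30, 5, 190, 190)
```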
Both algorithms:
output precise bounding boxes.
work in a supervised learning setting: they use a labeled dataset where the labels are bounding boxes (manually marked by an annotator using tools like labelimg) stored for each image in a JSON/XML file.
I am trying to understand the two localization techniques on a more abstract level (as well as getting an in-depth idea of both techniques) to get more clarity on:
in what sense they are different,
why two were created, i.e. where one succeeds or fails compared with the other, and
whether they can be used interchangeably, and if not, why not.
Please feel free to correct me if I am wrong somewhere; feedback is highly appreciated! Citations of a particular section of a research paper would be even more valuable.
The essential difference is that two-stage, Faster R-CNN-like detectors are typically more accurate, while single-stage, YOLO/SSD-like detectors are typically faster.
In two-stage architectures, the first stage is usually region proposal, while the second stage performs classification and more accurate localization. You can think of the first stage as similar to the single-stage architectures, with the difference that the region proposal only separates "object" from "background", while the single-stage architecture distinguishes between all object classes. More explicitly, in the first stage, also in a sliding-window-like fashion, an RPN says whether an object is present or not and, if there is one, roughly gives the region (bounding box) in which it lies. This region is used by the second stage for classification and bounding box regression (for better localization) by first pooling the relevant features from the proposed region and then passing them through the Fast R-CNN-like architecture (which does the classification + regression).
Regarding your question about interchanging between them: why would you want to do so? Usually you would choose an architecture according to your most pressing needs (e.g. latency/power/accuracy), and you wouldn't want to interchange between them unless there's some sophisticated idea which will help you somehow.

opencv clahe parameters explanation

I would like a proper explanation of the CLAHE parameters, i.e. clipLimit and tileGridSize.
How does the clipLimit value affect the contrast of the image, and what factors (such as image resolution or object sizes) should be considered when selecting tileGridSize?
Thanks in advance.
This question is from a long time ago, but I searched for the answer and came across it, and then found some links which may help. Most of the information below comes from different sites.
AHE is a computer image processing technique used to improve contrast in images. It differs from ordinary histogram equalization in the respect that the adaptive method computes several histograms, each corresponding to a distinct section of the image, and uses them to redistribute the lightness values of the image. It is therefore suitable for improving the local contrast and enhancing the definitions of edges in each region of an image.
However, AHE has a tendency to over-amplify noise in relatively homogeneous regions of an image. A variant of adaptive histogram equalization called contrast limited adaptive histogram equalization (CLAHE) prevents this by limiting the amplification.
for first one this image can be useful:
CLAHE limits the amplification by clipping the histogram at a predefined value (called the clip limit).
tileGridSize refers to the size of the grid for histogram equalization. The input image is divided into equally sized rectangular tiles; tileGridSize defines the number of tiles per row and column.
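As a rough illustration of the clipping step, here is a small sketch of clipping one tile's histogram and redistributing the excess evenly over all bins. The function name `clip_histogram` is hypothetical, and the code treats `clip_limit` as an absolute count per bin; how OpenCV scales its clipLimit value internally is not covered here.

```python
def clip_histogram(hist, clip_limit):
    """Clip a histogram (list of integer counts) at clip_limit.

    Counts above the limit are cut off and the excess is spread evenly
    over all bins, so the total number of pixels stays the same.
    """
    clipped = [min(count, clip_limit) for count in hist]
    excess = sum(hist) - sum(clipped)
    share, remainder = divmod(excess, len(hist))
    result = [count + share for count in clipped]
    # Hand out the few leftover counts one per bin.
    for i in range(remainder):
        result[i] += 1
    return result

# One dominant bin (10) is flattened, which limits contrast amplification.
flattened = clip_histogram([10, 0, 2, 0], clip_limit=4)
```

A lower clip limit flattens the histogram more and therefore gives less local contrast enhancement (and less noise amplification); a very high limit behaves like plain AHE. Note that after redistribution a bin can end up slightly above the limit; some implementations repeat the step.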
This is the OpenCV documentation of its available functions:
https://docs.opencv.org/master/d6/db6/classcv_1_1CLAHE.html
and these links were also good:
https://en.wikipedia.org/wiki/Adaptive_histogram_equalization#Contrast_Limited_AHE
http://www.cs.utah.edu/~sujin/courses/reports/cs6640/project2/clahe.html
clipLimit is the threshold value.
tileGridSize defines the number of tiles in row and column.
More Information

Target Detection - Algorithm suggestions

I am trying to do image detection in C++. I have two images:
Image Scene: 1024x786
Person: 36x49
I need to identify this particular person in the scene. I've tried to use correlation, but the image is too noisy, so it doesn't give accurate results.
I've been thinking/researching methods that would best solve this task and these seem the most logical:
Gaussian filters
Convolution
FFT
Basically, I would like to remove the noise from the images, so that I can then use correlation to find the person more effectively.
I understand that an FFT will be hard to implement and/or may be slow especially with the size of the image I'm using.
Could anyone offer any pointers to solving this? What would the best technique/algorithm be?
In Andrew Ng's Machine Learning class we did this exact problem using neural networks and a sliding window:
train a neural network to recognize the particular feature you're looking for using data with tags for what the images are, using a 36x49 window (or whatever other size you want).
for recognizing a new image, take the 36x49 rectangle and slide it across the image, testing at each location. When you move to a new location, move the window right by a certain number of pixels, call it the jump_size (say 5 pixels). When you reach the right-hand side of the image, go back to 0 and increment the y of your window by jump_size.
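The window traversal described above can be sketched as a small generator. The names `sliding_windows` and `best_window` are illustrative, and `score_fn` stands in for whatever classifier you trained (it is a hypothetical callable taking the window's top-left x and y and returning a score).

```python
def sliding_windows(img_w, img_h, win_w=36, win_h=49, jump_size=5):
    """Yield the top-left (x, y) of every window position, row by row."""
    for y in range(0, img_h - win_h + 1, jump_size):
        for x in range(0, img_w - win_w + 1, jump_size):
            yield x, y

def best_window(score_fn, img_w, img_h, **kwargs):
    """Return the window position with the highest classifier score."""
    return max(sliding_windows(img_w, img_h, **kwargs),
               key=lambda pos: score_fn(*pos))

# For the 1024x786 scene and a 36x49 window with a 5-pixel jump:
positions = list(sliding_windows(1024, 786))
```

A smaller jump_size gives finer localization at the cost of more classifier evaluations.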
Neural networks are good for this because the noise isn't a huge issue: you don't need to remove it. They are also good because they can recognize images that are similar to ones they have seen before but slightly different (the face is at a different angle, the lighting is slightly different, etc.).
Of course, the downside is that you need the training data to do it. If you don't have a set of pre-tagged images then you might be out of luck - although if you have a Facebook account you can probably write a script to pull all of yours and your friends' tagged photos and use that.
An FFT only makes sense when you have already sorted the image with a kd-tree or a hierarchical tree. I would suggest mapping the image's 2D RGB values to a 1D curve and reducing some complexity before doing a frequency analysis.
I do not have an exact algorithm to propose because I have found that target detection methods depend greatly on the specific situation. Instead, I have some tips and advice. Here is what I would suggest: find a specific characteristic of your target and design your code around it.
For example, if you have access to the color image, use the fact that Wally doesn't have much green and blue color. Subtract the average of blue and green from the red image, and you'll have a much better starting point. (Apply the same operation to both the image and the target.) This will not work, though, if the noise is color-dependent (i.e. is different in each color channel).
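The channel subtraction is a one-liner per pixel. In this sketch the image is assumed to be a list of rows of (r, g, b) tuples, and the function name is made up for illustration.

```python
def red_minus_blue_green(pixels):
    """Return a single-channel image: red minus the average of green and blue.

    Strongly red regions get large positive values; green/blue or gray
    regions end up near zero or negative.
    """
    return [[r - (g + b) / 2.0 for (r, g, b) in row] for row in pixels]

# A reddish pixel stands out, a gray-green one is suppressed.
transformed = red_minus_blue_green([[(200, 20, 40), (10, 100, 100)]])
```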
You could then use correlation on the transformed images with better result. The negative point of correlation is that it will work only with an exact cut-out of the first image... Not very useful if you need to find the target to help you find the target! Instead, I suppose that an averaged version of your target (a combination of many Wally pictures) would work up to some point.
My final advice: in my personal experience of working with noisy images, spectral analysis is usually a good thing because the noise tends to contaminate only one particular scale (which would hopefully be a different scale than Wally's!). In addition, correlation is mathematically equivalent to comparing the spectral characteristics of your image and the target.

Noise Removal in Opencv

I'm currently working on a project where noise removal in document images is required, but I can't come up with any useful code to start my project. Thanks.
According to what I've studied, noise (specifically salt-and-pepper noise) produced by a faulty scanner can be removed with the k-Fill algorithm, but I can't understand the theory.
I'm using OpenCV in C++ with the Code::Blocks IDE.
I'm new to the world of image processing.
Source code or any related links would be appreciated.
If you do not understand k-fill, try a simpler approach first.
Here is an article on alternative noise reduction algorithms and their performance.
I would suggest that you give opening a try.
The OpenCV documentation has a short explanation of the built-in morphological operations. You can experiment with the example code as well.
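To show what opening does, here is a minimal pure-Python sketch on a binary image with a 3x3 neighborhood. It is not OpenCV's implementation; it assumes noise and ink are the value 1 on a background of 0, and it simply ignores neighbors that fall outside the image.

```python
def _morph(img, combine):
    """Apply min (erosion) or max (dilation) over each 3x3 neighborhood."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            neighbours = [
                img[yy][xx]
                for yy in range(max(y - 1, 0), min(y + 2, rows))
                for xx in range(max(x - 1, 0), min(x + 2, cols))
            ]
            out[y][x] = combine(neighbours)
    return out

def erode(img):
    return _morph(img, min)

def dilate(img):
    return _morph(img, max)

def opening(img):
    # Erosion removes specks smaller than the neighborhood;
    # dilation then restores the size of what survived.
    return dilate(erode(img))
```

Be aware that strokes thinner than the neighborhood are removed along with the noise, so this needs care on fine text.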
K-fill isn't that hard to understand.
Take a small area (e.g. 3x3 pixels or 5x5 pixels or so).
Now count the 'enabled' (e.g. dark) pixels on the border.
If the total count is greater than n, fill the central pixel(s) (a single pixel for a 3x3 grid), or delete it if the border count is lower than n. Repeat this over the whole image.
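The counting rule described above could be sketched as follows for the 3x3 case. This is a simplification of the description in this answer, not the full published kFill algorithm. The function name and the two thresholds `n_fill` and `n_clear` are assumptions (the description uses a single n for both; pass the same value twice to get that behavior), and enabled pixels are assumed to be 1.

```python
def simple_fill_pass(img, n_fill=6, n_clear=1):
    """One pass of a simplified 3x3 fill/clear rule over a binary image.

    For each interior pixel, count the enabled pixels among its 8 border
    neighbors. Fill the center if the count is greater than n_fill,
    clear it if the count is lower than n_clear, otherwise leave it.
    """
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]  # read from img, write to a copy
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            border = sum(
                img[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dy, dx) != (0, 0)
            )
            if border > n_fill:
                out[y][x] = 1
            elif border < n_clear:
                out[y][x] = 0
    return out
```

With the defaults, an isolated speck (no enabled neighbors) is deleted and a one-pixel hole surrounded by ink is filled. Thin strokes have only a few neighbors, so raising n_clear will start to erase them.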
I do not know how effective k-fill can be, but I will explain this approach in case it is useful for someone else.
I will give an example in Python, but C++ and Java should be similar.
One way to reduce noise is the median filter (medianBlur), which does reduce image quality. How much the quality decreases depends on the ksize parameter. Select a small value for this parameter (an odd number, for example 3 or 5) so that the quality does not drop too much; this eliminates very small noise.
import cv2
im = cv2.imread("noisy_flower.png")
im = cv2.medianBlur(im, ksize=5)
cv2.imwrite("clean_flower.png", im)
This approach is suitable for photos. For text inside a photo, you may be able to create a mask and copy the text back into the final image according to the mask. It depends a lot on your case.
Java Version:
Imgproc.medianBlur(src, dst, 5);