Relation between ROI and wavelet transform output in JPEG2000

Consider a single-tile image.
I read that ROI coding involves scaling the coefficients of the matrix obtained after the wavelet transform.
My understanding is that we multiply the ROI-related coefficients by a power of 2 so that they get greater precision than the nearby, unchanged coefficients.
My question is: suppose I am the encoder and I want to emphasise a specific rectangular region of the image, how do I know which coefficients I should manipulate?
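For reference, this is my rough mental model of the mapping, written as a small sketch (not verified against the standard: the halving per level is how I picture the dyadic DWT, and the filter-support margin is a guess rather than the exact mask generation the standard describes):

    #include <algorithm>
    #include <cstdio>

    // Hypothetical sketch: propagate an ROI rectangle through the decomposition levels.
    // At each level the coordinates halve (each subband is half the size of the previous
    // LL band), and the rectangle is padded by an assumed filter-support margin. The real
    // JPEG2000 ROI mask generation traces the inverse DWT dependencies exactly, per filter.
    struct Rect { int x0, y0, x1, y1; };

    int main()
    {
        Rect roi = {128, 64, 255, 191};   // example ROI in image coordinates
        const int levels = 3;             // number of decomposition levels
        const int margin = 2;             // assumed half filter support

        for (int level = 1; level <= levels; ++level) {
            roi = { std::max(roi.x0 / 2 - margin, 0),
                    std::max(roi.y0 / 2 - margin, 0),
                    (roi.x1 + 1) / 2 + margin,
                    (roi.y1 + 1) / 2 + margin };
            std::printf("level %d: scale coefficients [%d..%d] x [%d..%d] in the HL, LH, HH "
                        "subbands (and in the LL band at the last level)\n",
                        level, roi.x0, roi.x1, roi.y0, roi.y1);
        }
        return 0;
    }

Is this roughly the right picture, or does the encoder have to derive the mask differently?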
Thanks.

Related

Final Descriptor in SIFT

I am new to computer vision and have started to learn a very popular topic in the community, SIFT. But I am confused about one implementation detail:
After a key point is detected, we construct a 4 by 4 grid of local histograms, which together form the final SIFT descriptor, right? Each local histogram accumulates the orientations of a local neighborhood of 4 by 4 pixels. So overall we have 16 times 16 = 256 pixels within a neighborhood around the key point, i.e. the neighborhood is a 16 by 16 grid of pixels.
But how exactly is this neighborhood determined? Is it rotated according to the orientation of the key point? Are the pixels within this 256-pixel neighborhood sampled according to the scale at which the key point was detected?
Thanks for any help!
First, SIFT keypoints are extracted at multiple scales, and each descriptor is computed using the scale of its keypoint. So I would not say 'pixels', since that can be ambiguous. For your question, I would like to quote the original paper (Section 6.1):
"First the image gradient magnitudes and orientations are sampled around the keypoint location, using the scale of the keypoint to select the level of Gaussian blur for the image. In order to achieve orientation invariance, the coordinates of the descriptor and the gradient orientations are rotated relative to the keypoint orientation. A Gaussian weighting function with σ equal to one half the width of the descriptor window is used to assign a weight to the magnitude of each sample point."
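To make the scale and rotation handling concrete, here is a small sketch of how the descriptor's samples could be placed on the image. It follows my reading of the quoted passage rather than any particular implementation: descriptorSamples is my own name, and the per-sample spacing factor is an assumption (implementations such as OpenCV differ in these details).

    #include <cmath>
    #include <vector>

    // Sketch: lay out the 16x16 descriptor samples around a keypoint, scaled by the
    // keypoint scale and rotated by the keypoint orientation, with the Gaussian
    // weighting from the quote (sigma = half the descriptor window width).
    struct Sample { float x, y, weight; };

    std::vector<Sample> descriptorSamples(float kpX, float kpY,
                                          float kpScale, float kpAngleRad)
    {
        const int   d       = 16;                  // 16x16 samples around the keypoint
        const float spacing = kpScale;             // assumed sample spacing, proportional to scale
        const float sigma   = 0.5f * d * spacing;  // "one half the width of the descriptor window"
        const float c = std::cos(kpAngleRad);
        const float s = std::sin(kpAngleRad);

        std::vector<Sample> samples;
        samples.reserve(d * d);
        for (int i = 0; i < d; ++i)
            for (int j = 0; j < d; ++j) {
                // grid coordinates relative to the keypoint, before rotation
                const float u = (j - (d - 1) / 2.0f) * spacing;
                const float v = (i - (d - 1) / 2.0f) * spacing;
                // rotate by the keypoint orientation, then translate to the keypoint
                const float x = kpX + c * u - s * v;
                const float y = kpY + s * u + c * v;
                const float w = std::exp(-(u * u + v * v) / (2.0f * sigma * sigma));
                samples.push_back({x, y, w});
            }
        return samples;
    }

The gradient magnitude and orientation would then be sampled at each (x, y) in the Gaussian-blurred image corresponding to the keypoint's scale, with the orientations measured relative to kpAngleRad, before being accumulated into the 4x4 histograms.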
I hope this answers your question. Please do not hesitate to ask if something is unclear.

OpenCV weight approach for correspondence search and disparities C++

I have an OpenCV application in which I have to implement a correspondence search using varying support weights between the two images of a stereo pair. This work is very similar to "Adaptive Support-Weight Approach for Correspondence Search" by Kuk-Jin Yoon and In So Kweon. The support weights are defined within a given support window.
I calculate the dissimilarity between pixels using the support weights in the two images. The dissimilarity between pixel p and p_d is given by

    E(p, p_d) = \frac{\sum_{q \in N_p,\, q_d \in N_{p_d}} w(p,q)\, w(p_d,q_d)\, e(q,q_d)}{\sum_{q \in N_p,\, q_d \in N_{p_d}} w(p,q)\, w(p_d,q_d)}

where p_d and q_d are the pixels in the target image that correspond to pixels p and q in the reference image under a disparity value d, and N_p and N_{p_d} are the support windows.
After this, the disparity of each pixel is selected by the WTA (Winner-Takes-All) method as

    d_p = \arg\min_{d \in S_d} E(p, p_d)

where S_d is the set of candidate disparities.
What I would like to know is how to proceed, starting from the dissimilarity formula above (I have already written the function that computes the dissimilarity and the weights): which pixels do I consider, and where do I start? Any suggestion?
The final result should be a dense disparity map, similar to the results shown in the paper.
What could be a good way to do it?
UPDATE1
Should I start by creating a new image, then consider the cost between pixel (0,0) and all the other pixels, find the minimum value, and set that as the value of the new image at pixel (0,0)? And so on with the other pixels?
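To make UPDATE1 more concrete, this is the loop structure I have in mind. It is only a sketch: supportWeight() and pixelDissimilarity() are placeholders for the weight and dissimilarity functions I have already written.

    #include <limits>
    #include <opencv2/core/core.hpp>

    // Placeholders for w(p,q) and e(q, q_d) from the paper.
    double supportWeight(const cv::Mat& img, cv::Point p, cv::Point q);
    double pixelDissimilarity(const cv::Mat& ref, const cv::Mat& tgt,
                              cv::Point q, cv::Point qd);

    cv::Mat computeDisparity(const cv::Mat& ref, const cv::Mat& tgt,
                             int maxDisparity, int windowRadius)
    {
        cv::Mat disparity(ref.size(), CV_8U, cv::Scalar(0));

        for (int y = windowRadius; y < ref.rows - windowRadius; ++y) {
            for (int x = windowRadius; x < ref.cols - windowRadius; ++x) {
                const cv::Point p(x, y);
                double bestCost = std::numeric_limits<double>::max();
                int bestD = 0;

                // WTA: try every candidate disparity for this pixel, keep the cheapest.
                for (int d = 0; d <= maxDisparity && x - d >= windowRadius; ++d) {
                    const cv::Point pd(x - d, y);      // corresponding pixel in the target image
                    double num = 0.0, den = 0.0;

                    for (int dy = -windowRadius; dy <= windowRadius; ++dy)
                        for (int dx = -windowRadius; dx <= windowRadius; ++dx) {
                            const cv::Point q (x + dx,     y + dy);
                            const cv::Point qd(x - d + dx, y + dy);
                            const double w = supportWeight(ref, p, q) * supportWeight(tgt, pd, qd);
                            num += w * pixelDissimilarity(ref, tgt, q, qd);
                            den += w;
                        }

                    const double cost = num / den;     // E(p, p_d) above
                    if (cost < bestCost) { bestCost = cost; bestD = d; }
                }
                disparity.at<uchar>(y, x) = static_cast<uchar>(bestD);
            }
        }
        return disparity;
    }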

Luminance of an 8x8 block

Is there a C++ function or an OpenCV function that can calculate the average log of the luminance of a given 8x8 block, or of the whole image? My aim is to calculate the average luminance and store it back in the block. Also, is there another scientifically grounded way to calculate the overall or average luminance that fits the human visual system? If someone can point me to a library or function in C++ I would appreciate it.
To calculate the average luminance of an 8x8 block, centred at each pixel in an input greyscale image, you could perform a 2D convolution of that image with an 8x8 kernel containing the value 1/64 i.e. 1/(8*8) in each cell.
This is referred to as a normalised box filter / box blur.
You can then sample the resulting image e.g. at (x,y) to yield the average luminance of the 8x8 block centred at (x,y).
There is code in the OpenCV manual for a normalised box filter, with user selectable size.
http://docs.opencv.org/doc/tutorials/imgproc/imgtrans/filter_2d/filter_2d.html
Regarding the 'log' of this value, you can use the OpenCV function cv::log (cvLog in the old C API). Note that taking the log of the box-filtered image gives you the log of the average luminance; if what you actually want is the average of the log (the log-average luminance commonly used in tone mapping, which matches the roughly logarithmic response of the human visual system), take the log of the image first and then apply the box filter.
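As a sketch of that pipeline with the C++ API (the function name is mine; it assumes an 8-bit greyscale input, and the small offset added before the log only avoids log(0)):

    #include <opencv2/core/core.hpp>
    #include <opencv2/imgproc/imgproc.hpp>

    // Sketch: per-pixel average of the log-luminance over the surrounding 8x8 block.
    // Swap cv::log and cv::boxFilter if you want the log of the average instead.
    cv::Mat logAverageLuminance(const cv::Mat& grey8u)
    {
        cv::Mat luminance;
        grey8u.convertTo(luminance, CV_32F, 1.0 / 255.0, 1e-4);  // float in [0,1], offset avoids log(0)

        cv::Mat logLum;
        cv::log(luminance, logLum);                              // log first...

        cv::Mat avgLog;
        cv::boxFilter(logLum, avgLog, -1, cv::Size(8, 8));       // ...then the normalised 8x8 box filter

        return avgLog;   // avgLog(y, x) = average log-luminance of the 8x8 block around (x, y)
    }

For the whole image you can skip the filter and use cv::mean(logLum)[0]; exponentiating that value gives the usual log-average (geometric mean) luminance used in tone-mapping operators.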

Calculating the precision of homography on 2D plane

I am trying to find a way to parametrize the precision of my homography calculation. I would like to obtain a value that describes the precision of the homography calculation for a measurement taken at a certain position.
I have successfully calculated the homography (with cv::findHomography) and I can use it to map a point in my camera image onto a 2D map (using cv::perspectiveTransform). Now I want to track these objects on my 2D map, and to do this I want to take into account that objects in the back of my camera image have a less precise position on my 2D map than objects all the way in the front.
I have looked at the following example on this website that mentions plane fitting but I don't really understand how to fill the matrices correctly using this method. The visualisation of the result does seem to fit my needs. Is there any way to do this with standard OpenCV functions?
EDIT:
Thanks Francesco for your recommendations, but I think I am looking for something different from what your answer describes. I am not looking to test the precision of the homography itself, but the relation between the density of measurements in the real camera view and the actual size on the map I create. I want to know, when my detection is 1 pixel off in the camera image, how many meters that corresponds to on my map at that point.
I could of course compute this by taking some pixels around my measurement in the camera image and using the homography to see how many meters they span on the map, but I don't want to do that every time. What I would like is a formula that gives me the relation between pixels in my image and pixels on my map, so I can take it into account for my tracking on the map.
What you are looking for is called "predictive error bars" or "prediction uncertainty". You should definitely consult a good introductory book on estimation theory for details (e.g. this one). But briefly, the predictive uncertainty is the probability that...
A certain pixel p in image 1 is the mapping H(p') of a pixel p' in image 2 under the homography H, ...
Given the uncertainty in H which is due to the errors in the matched pairs (q0, q0'), (q1, q1'), ..., that have been used to estimate H, ...
But assuming the model is correct, that is, that the true map between images 1 and 2 is, in fact, a homography (although the estimated parameters of the homography itself may be affected by errors).
In order to estimate this probability distribution you'll need a model for the errors in the measurements, and a model for how they propagate through the (homography) model.
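To make the "how they propagate" part concrete for the question in the EDIT (how far one image pixel moves you on the map), here is a sketch that treats the estimated H as exact and simply differentiates the mapping at an image point p; the function name is mine:

    #include <opencv2/core/core.hpp>

    // Sketch: 2x2 Jacobian of the homography mapping at image point p.
    // It describes how a small offset (in pixels) around p in the camera image
    // maps to an offset on the 2D map, assuming the estimated H itself is exact.
    cv::Matx22d homographyJacobian(const cv::Matx33d& H, const cv::Point2d& p)
    {
        const double w = H(2,0) * p.x + H(2,1) * p.y + H(2,2);        // projective denominator
        const double u = (H(0,0) * p.x + H(0,1) * p.y + H(0,2)) / w;  // mapped x on the map
        const double v = (H(1,0) * p.x + H(1,1) * p.y + H(1,2)) / w;  // mapped y on the map

        return cv::Matx22d((H(0,0) - u * H(2,0)) / w, (H(0,1) - u * H(2,1)) / w,
                           (H(1,0) - v * H(2,0)) / w, (H(1,1) - v * H(2,1)) / w);
    }

The two singular values of this Jacobian are the best- and worst-case map-units-per-pixel at p, so a 1-pixel detection error corresponds to at most (largest singular value) units on the map at that location. Because the Jacobian depends on p, this also formalises why detections far in the back of the camera image end up less precise on the map.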

Affine homography computation

Suppose you have a homography H between two images. The first image is the reference image, in which the planar object covers the entire image (and is parallel to the image plane). The second image depicts the planar object from another arbitrary view (the run-time image). Now, given a point p=(x,y) in the reference image, I have a rectangular region of pixels of size SxS (with S<=20 pixels) around p (call it a patch). I can unwarp this patch using the pixels in the run-time image and the inverse homography H^(-1).
Now, what I want to do is to compute, given H, an affine homography H_affine suitable for the patch around the point p. The naive way I am using is to compute 4 point correspondences: the four corners of the patch and the corresponding points in the run-time image (computed using the full homography H). Given these four point correspondences (all belonging to a small neighborhood of the point p), one can compute the affine homography by solving a simple linear system (using the gold standard algorithm). The affine homography computed this way approximates the full projective homography with reasonable precision (below 0.5 pixel), since we are in a small neighborhood of p (as long as the scale is not too unfavorable, that is, the SxS patch does not correspond to a large region in the run-time image). A sketch of this naive approach is given below.
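For clarity, the naive approach looks roughly like this (a sketch only; affineAroundPoint is my own name, and I let cv::estimateRigidTransform with fullAffine=true do the small least-squares fit, but any 4-point affine solver works):

    #include <vector>
    #include <opencv2/core/core.hpp>
    #include <opencv2/video/tracking.hpp>   // cv::estimateRigidTransform

    // Sketch of the naive approach: map the four patch corners with the full
    // homography H, then fit an affine transform to the four correspondences.
    cv::Mat affineAroundPoint(const cv::Mat& H, const cv::Point2f& p, float S)
    {
        const float h = S / 2.0f;
        std::vector<cv::Point2f> corners = {
            cv::Point2f(p.x - h, p.y - h), cv::Point2f(p.x + h, p.y - h),
            cv::Point2f(p.x + h, p.y + h), cv::Point2f(p.x - h, p.y + h) };

        std::vector<cv::Point2f> warped;
        cv::perspectiveTransform(corners, warped, H);   // corners in the run-time image

        // Least-squares affine fit to the four correspondences (fullAffine = true).
        return cv::estimateRigidTransform(corners, warped, true);   // 2x3 affine matrix
    }

(In newer OpenCV versions cv::estimateAffine2D plays the same role.)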
Is there a faster way to compute H_affine given H (related to the point p and the patch SxS)?
You say that you already know H, but then it sounds like you're trying to compute it all over again, only this time calling the result H_affine. The correct H is a projective transformation, and it can be uniquely decomposed into 3 parts representing the projective part, the affine part and the similarity part. If you already know H and only want the affine part and below, then decompose H and ignore its projective component. If you don't know H, then the 4-point correspondence approach is the way to go.
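For reference, the decomposition referred to here is, if I remember Hartley and Zisserman (Multiple View Geometry, Section 2.4.6) correctly:

    H = H_S H_A H_P
      = \begin{bmatrix} sR & t \\ 0^\top & 1 \end{bmatrix}
        \begin{bmatrix} K  & 0 \\ 0^\top & 1 \end{bmatrix}
        \begin{bmatrix} I  & 0 \\ v^\top & v \end{bmatrix}

where H_S is a similarity (rotation R, isotropic scale s, translation t), H_A is a pure affinity (K upper-triangular with det K = 1) and H_P carries the purely projective part (the 2-vector v and the scalar v); the decomposition is valid for v != 0 and unique if s is chosen positive. Dropping H_P and keeping H_S H_A is the "ignore its projective component" step mentioned above.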