I would like to calculate the corner points or contours of a star in a larger image. To do that, I scale the image down to a smaller size, and on the smaller image I can get these points clearly. How do I map these points back to the original image? I'm using OpenCV with C++.
Consider a trivial example: the image size is reduced exactly by half.
So the cartesian coordinate (x, y) in the original image becomes coordinate (x/2, y/2) in the reduced image, and a coordinate (x', y') in the reduced image corresponds to coordinate (x'*2, y'*2) in the original image.
Of course, fractional coordinates typically get rounded off in a reduced-scale image, so the exact mapping is only possible for even-numbered coordinates in this example's original image.
Generalizing this, if the image is scaled by a factor of w horizontally and h vertically, coordinate (x, y) in the original becomes coordinate (x*w, y*h) in the scaled image, rounded off. In the example I gave, both w and h are 1/2, or 0.5.
You should be able to figure out the values of w and h yourself and map the coordinates trivially. Of course, due to the rounding, you will not always be able to recover the exact coordinates in the original image.
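As a minimal sketch of that mapping (the file name and working size are hypothetical, since the question does not give them), in OpenCV/Python it could look like this: find the points on the resized image, then multiply each coordinate by the ratio of original size to resized size.
import cv2
import numpy as np
orig = cv2.imread('star.png')            # hypothetical input image
small = cv2.resize(orig, (320, 240))     # arbitrary reduced working size (width, height)
# Scale factors from the small image back to the original.
sx = orig.shape[1] / small.shape[1]
sy = orig.shape[0] / small.shape[0]
# Suppose these corner points were found on the small image (e.g. via findContours/approxPolyDP).
corners_small = np.array([[50, 40], [120, 38], [90, 150]], dtype=np.float32)
# Map each (x, y) back into the original image's coordinate frame.
corners_orig = corners_small * np.array([sx, sy], dtype=np.float32)
print(corners_orig)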
I realize this is an old question. I just wanted to add to Sam's answer above, to deal with "rounding off", in case other readers are wondering the same thing I faced.
This rounding off becomes obvious with an even number of pixels along a coordinate axis. For instance, along a 1-D axis, a point demarcating the 2nd quartile gets mapped to an inaccurate value:
axis_prev = [0, 1, 2, 3]
axis_new = [0, 1, 2, 3, 4, 5, 6, 7]
w_prev = len(axis_prev) # This is an axis of length 4
w_new = len(axis_new) # This is an axis of length 8
x_prev = 2
x_new = x_prev * w_new / w_prev
print(x_new)
>>> 4
### x_new should be 5
In Python, one strategy would be to linearly interpolate coordinates from one axis resolution to the other. Say, for the above, we wish to map a point from the smaller image to its corresponding point of the star in the larger image:
import numpy as np
from scipy.interpolate import interp1d
# Both arrays must have the same number of samples for interp1d;
# here 641 positions on the small axis map onto the range [0, 768] of the large axis.
x_old = np.linspace(0, 640, 641)
x_new = np.linspace(0, 768, 641)
f = interp1d(x_old, x_new)
x = 35
x_prime = f(x)  # the corresponding coordinate on the larger axis
I am looking at the KITTI dataset and particularly at how to convert a world point into image coordinates. I looked at the README, and it says (quoted below) that I need to transform to camera coordinates first and then multiply by the projection matrix. I have 2 questions, coming from a non-computer-vision background:
I looked at the numbers in calib.txt, and in particular each matrix is 3x4 with non-zero values in the last column. I always thought this matrix = K[I|0], where K is the camera's intrinsic matrix. So why is the last column non-zero, and what does it mean? E.g. P2 is
array([[7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
[0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
[0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03]])
After applying the projection to get [u, v, w] and dividing u and v by w, are these values with respect to an origin at the center of the image, or at the top left of the image?
README:
calib.txt: Calibration data for the cameras: P0/P1 are the 3x4 projection matrices after rectification. Here P0 denotes the left and P1 denotes the right camera. Tr transforms a point from velodyne coordinates into the left rectified camera coordinate system. In order to map a point X from the velodyne scanner to a point x in the i'th image plane, you thus have to transform it like:
x = Pi * Tr * X
Refs:
How to understand the KITTI camera calibration files?
Format of parameters in KITTI's calibration file
http://www.cvlibs.net/publications/Geiger2013IJRR.pdf
Answer:
I strongly recommend you read the references above; they should answer most, if not all, of your questions.
For question 2: the projected points on the images are with respect to an origin at the top left. See refs 2 & 3: the image coordinates of a very distant 3D point are (center_x, center_y), whose values are provided in the P_rect matrices. Or you can verify this with some simple code:
import numpy as np
p = np.array([[7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
[0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
[0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03]])
x = [0, 0, 1E8, 1] # A far 3D point
y = np.dot(p, x)
y[0] /= y[2]
y[1] /= y[2]
y = y[:2]
print(y)
You will see some output like:
array([6.018873e+02, 1.831104e+02 ])
which is quite near the (p[0, 2], p[1, 2]), a.k.a. (center_x, center_y).
All the P matrices (3x4) have the form:
P(i)rect = [[fu,  0, cx, -fu*bx],
            [ 0, fv, cy, -fv*by],
            [ 0,  0,  1,       0]]
The last column encodes the baseline of each camera (in meters) w.r.t. the reference camera 0, multiplied by the focal length. You can see that P0 has all zeros in the last column because it is the reference camera.
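As a quick sketch of that relationship (reusing the P2 values quoted above), the x-baseline can be recovered by dividing the last-column entry by the focal length:
import numpy as np
p2 = np.array([[7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
               [0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
               [0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03]])
fu = p2[0, 0]            # focal length in pixels
bx = -p2[0, 3] / fu      # signed x-baseline w.r.t. camera 0, in meters (about -0.066 here)
print(bx)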
This post has more details:
How Kitti calibration matrix was calculated?
I have a few questions regarding Scharr derivatives and their OpenCV implementation.
I am interested in second-order image derivatives with 3x3 kernels.
I started with the Sobel second derivative, which failed to find some thin lines in the images. After reading the Sobel and Scharr comparison at the bottom of this page, I decided to try Scharr instead by changing this line:
Sobel(gray, grad, ddepth, 2, 2, 3, scale, delta, BORDER_DEFAULT);
to this line:
Scharr(img, gray, ddepth, 2, 2, scale, delta, BORDER_DEFAULT );
My problem is that cv::Scharr seems to allow only a first-order derivative along one axis at a time, so I get the following error:
error: (-215) dx >= 0 && dy >= 0 && dx+dy == 1 in function getScharrKernels
(see assertion line here)
Following this restriction, I have a few questions regarding Scharr derivatives:
Is it considered bad practice to use higher-order Scharr derivatives? Why did OpenCV choose to assert dx + dy == 1?
If I am to call Scharr twice for each axis, What is the correct way to combine the results?
I am currently using:
addWeighted( abs_grad_x, 0.5, abs_grad_y, 0.5, 0, grad );
but I am not sure that this is how the Sobel function combines the two axes, or in what order it should be done for all 4 derivatives.
If I am to compute the (dx=2, dy=2) derivative by using 4 different kernels, I would like to reduce processing time by unifying all 4 kernels into 1 before applying it to the image (I assume that this is what cv::Sobel does). Is there a reasonable way to create such a combined Scharr kernel and convolve it with my image?
Thanks!
I've never read the original Scharr paper (the dissertation is in German) so I don't know the answer to why the Scharr() function doesn't allow higher order derivatives. Maybe because of the first point I make in #3 below?
The Scharr function is supposed to be a derivative. And the total differential of a multivariable function f(x0, ..., xN) is
df = (df/dx0)*dx0 + ... + (df/dxN)*dxN
That is, the sum of the partials, each multiplied by the change in that variable. In the case of images, the change dx in the input is a single pixel, so it's equivalent to 1. In other words, just sum the partials; don't weight them by half. You can use addWeighted() with 1s as the weights, or you can simply add the two results, but to make sure you won't saturate your image you'll need to convert it to a float or 16-bit image first. It's also pretty common to compute the Euclidean magnitude of the two derivatives instead, if what you're after is the gradient rather than the summed derivative.
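For example, a minimal sketch (assuming a hypothetical grayscale input file, as in the example further below) of summing the two first-order Scharr responses on a float image, plus the gradient-magnitude variant:
import cv2
import numpy as np
img = cv2.imread('cameraman.png', 0).astype(np.float32)   # hypothetical input, promoted to float
gx = cv2.Scharr(img, cv2.CV_32F, 1, 0)     # df/dx
gy = cv2.Scharr(img, cv2.CV_32F, 0, 1)     # df/dy
total = gx + gy                            # sum of the partials, weights of 1 (no halving)
magnitude = cv2.magnitude(gx, gy)          # Euclidean gradient magnitude, if that's what you need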
However, that's just for the first-order derivative. For higher orders, you need to apply some chain ruling. See here for the details of combining second-order derivatives.
Note that an optimized kernel for first-order derivatives is not necessarily the optimal kernel for second-order derivatives obtained by applying it twice. Scharr himself has a paper on optimizing second-order derivative kernels; you can read it here.
With that said, filters are split into x and y directions to make linearly separable filters, which turns a 2D convolution into two 1D convolutions with smaller kernels. Think of the Sobel and Scharr x-direction kernels: they have one non-zero column on each side with the same magnitudes (one side negated) and a zero middle column. When you slide the full kernel across the image, at the first location you're multiplying the first and third image columns by the values in your kernel. And then two steps later, you're multiplying the third and the fifth. But the third was already computed, so that's wasteful. Instead, since both sides use the same values, you can multiply each image column by the 1D vector once, and then just look up the results for columns 1 and 3 and subtract them.
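As a small aside (a sketch only, again assuming the hypothetical 'cameraman.png'), OpenCV exposes those 1-D factors directly, so you can check the separability of a single Scharr derivative yourself:
import cv2
import numpy as np
img = cv2.imread('cameraman.png', 0).astype(np.float32)   # hypothetical input
kx, ky = cv2.getDerivKernels(1, 0, -1)                     # ksize = -1 (FILTER_SCHARR) gives the 3x3 Scharr factors
separable = cv2.sepFilter2D(img, cv2.CV_32F, kx, ky)       # two 1-D passes: rows with kx, columns with ky
direct = cv2.Scharr(img, cv2.CV_32F, 1, 0)                 # the full 3x3 kernel in one pass
print(np.allclose(separable, direct))                      # expected: True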
In short, I don't think you can combine them with built-in separable filter functions, because certain values are positive sometimes, and negative otherwise; and the only way to know when applying a filter linearly is to do them separately. However, we can examine the result of applying both filters and see how they affect a single pixel, construct the 2D kernel, and then convolve with OpenCV.
Suppose we have a 3x3 image:
image
=====
a b c
d e f
g h i
And we have the Scharr kernels:
kernel_x
========
-3 0 3
-10 0 10
-3 0 3
kernel_y
========
-3 -10 -3
0 0 0
3 10 3
The result of applying each kernel to this image gives us:
image * kernel_x
================
-3a +0b +3c
-10d +0e +10f
-3g +0h +3i
image * kernel_y
================
-3a -10b -3c
+0d +0e +0f
+3g +10h +3i
Each of these sets of products is summed and placed into pixel e. Since the total derivative is the sum of the two partials, we can simply add all of these values into pixel e in one go.
image * kernel_x + image * kernel_y
===================================
-3a +3c -10d +10f -3g +3i
-3a -10b -3c +3g +10h +3i
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
-6a -10b +0c -10d +10f +0g +10h +6i
And this is the same result we'd have gotten if we multiplied by the kernel
kernel_xy
=============
-6 -10 0
-10 0 10
0 10 6
So there's a single 2D kernel that computes the sum of the two first-order derivatives in one pass. Notice anything interesting? It's just the sum of the two kernels. Is that surprising? Not really, since convolution is linear: image * (kernel_x + kernel_y) = image * kernel_x + image * kernel_y. Now we can pass that into filter2D() to compute the sum of the derivatives. Does that actually give the same result?
import cv2
import numpy as np
img = cv2.imread('cameraman.png', 0).astype(np.float32)
kernel = np.array([[-6, -10, 0],
[-10, 0, 10],
[0, 10, 6]])
total_first_derivative = cv2.filter2D(img, -1, kernel)
scharr_x = cv2.Scharr(img, -1, 1, 0)
scharr_y = cv2.Scharr(img, -1, 0, 1)
print((total_first_derivative == (scharr_x + scharr_y)).all())
True
Yep. Now I guess you can just do it twice.
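If it helps, here is a minimal sketch of that "do it twice" idea with filter2D (same hypothetical image as above). Keep in mind that applying the summed kernel twice corresponds to (d/dx + d/dy) applied twice, which includes the mixed xy term, so check whether that is the combination you actually need:
import cv2
import numpy as np
img = cv2.imread('cameraman.png', 0).astype(np.float32)   # hypothetical input
kernel = np.array([[ -6, -10,  0],
                   [-10,   0, 10],
                   [  0,  10,  6]], dtype=np.float32)
first = cv2.filter2D(img, -1, kernel)      # df/dx + df/dy
second = cv2.filter2D(first, -1, kernel)   # the same operator applied again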
I have a picture taken from a front view, and I want to turn it into a bird's eye view.
Now I want to calculate, for each point (x, y) in the rectangle, where the transformed (x, y) lands in the trapezoid.
There must be a formula for this transformation, given x and y and the angle of the trapezoid (a).
I am programming in C and using OpenCV.
Thanks a lot in advance.
Did you consider the homography transform? You use it to create or correct perspective in an image; I think it is exactly what you want.
With OpenCV, you can use cv::findHomography(). The arguments are the 4 initial points (the vertices of your rectangle) and the 4 final points (the vertices of the trapezoid). You get a transformation matrix that you can then use with cv::warpPerspective() or cv::perspectiveTransform().
I was able to figure out a way for your problem.
Here is the code I used for the same:
Importing the required packages:
import cv2
import numpy as np
Reading the image to be used:
filename = '1.jpg'
img = cv2.imread(filename)
cv2.imwrite('img.jpg',img)
Storing the height and width of the image in separate variables:
ih, iw, _ = img.shape
Creating a black image whose size is bigger than that of the input and storing its height and width in separate variables:
black = np.zeros((ih + 300, iw + 300, 3), np.uint8)
cv2.imwrite('black.jpg',black)
bh, bw, _ = black.shape
Storing the 4 corner points of the image in an array:
pts_src = np.array([[0.0, 0.0],[float(iw), 0.0],[float(iw), float(ih)],[0.0,float(ih)]])
Storing the 4 corner points of the trapezoid to be obtained:
pts_dst = np.array([[bw * 0.25, 0],[bw * 0.75, 0.0],[float(bw), float(bh)],[0.0,float(bh)]])
Calculating the homography matrix using pts_src and pts_dst:
h, status = cv2.findHomography(pts_src, pts_dst)
Warping the given rectangular image into the trapezoid:
im_out = cv2.warpPerspective(img, h, (black.shape[1],black.shape[0]))
cv2.imwrite("im_outImage.jpg", im_out)
cv2.waitKey(0)
cv2.destroyAllWindows()
If you alter the values in the array pts_dst you will be able to get different kinds of quadrilaterals.
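If you only need where individual points of the rectangle land in the trapezoid (rather than warping the whole image), the same matrix h can be applied per point with cv2.perspectiveTransform. A small sketch continuing from the variables defined above (h, iw, ih):
# Points in the original rectangle, shaped (N, 1, 2) as perspectiveTransform expects.
pts = np.float32([[[0.0, 0.0]], [[iw / 2.0, ih / 2.0]], [[float(iw), float(ih)]]])
mapped = cv2.perspectiveTransform(pts, h)   # where each point lands in the trapezoid
print(mapped.reshape(-1, 2))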
I'm trying to calculate the camera's position for an image. I have 2 images of a Rubik's Cube. The first image is considered to be the base image, and the next image is the image after the camera has moved. So for the first image I assume that the camera is at (0,0,0). On this image I then identify the 4 corners of the front face of the Rubik's Cube as shown here (4 corners identified by the 4 blue circles).
Then for the next image (after camera movement), I identify the same face of the Rubik's Cube as shown here.
So, taking the first image as the base image, does anyone know if/how I can calculate how much the camera has moved for image 2, as shown here:
I would suggest you use OpenCV for this. I also think this question would be more suited to Stack Overflow.
The textbook on this subject would be "Multiple-View Geometry" by Hartley and Zisserman. http://www.robots.ox.ac.uk/~vgg/hzbook/ (There is a sample chapter on the Fundamental Matrix on that website.)
Basically, first find the Fundamental Matrix, then by knowing the intrinsic parameters of the camera, find a solution to the position.
Fundamental Matrix: http://en.wikipedia.org/wiki/Fundamental_matrix_%28computer_vision%29
Intrinsic Parameters: Stuff like the focal length and where the principal point is on the image plane. If you have F, then E = K^t * F * K, if K is the intrinsic matrix and the same for both images.
How to find a solution to the camera position: http://en.wikipedia.org/wiki/Essential_matrix#Determining_R_and_t_from_E
Algorithm
This is how I would do it in OpenCV. I have done this before, so it ought to work (see the sketch in code after this list).
1. Run Feature Detection and Detector Extractor on both images.
2. Match Features.
3. Use F = cv::findFundamentalMat with RANSAC.
4. E = K.t() * F * K. // K needs to be found beforehand.
5. Do SingularValueDecomposition of E such that E = U * S * V.t()
6. R = U * W.inv() * V.t() // W = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
7. Tx = U * Z * U.t() // Z = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]
8. get t from Tx (matrix version of cross product)
9. Find the correct solution: the alternative rotation (built with W in place of W.inv()) and -t are also possibilities, so pick the combination for which triangulated points lie in front of both cameras.
10. Get the overall scale by knowing the length of the side of the Rubik's Cube.
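Here is a minimal Python/OpenCV sketch of steps 1-9 under some assumptions: the file names and the intrinsic matrix K are hypothetical placeholders, ORB with brute-force matching is just one possible feature choice, and cv2.recoverPose is used to wrap the SVD and disambiguation steps (5-9):
import cv2
import numpy as np
K = np.array([[700.0,   0.0, 320.0],      # hypothetical intrinsics; use your calibrated K
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])
img1 = cv2.imread('cube_base.png', 0)     # hypothetical file names
img2 = cv2.imread('cube_moved.png', 0)
# Steps 1-2: detect and match features.
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
# Steps 3-4: fundamental matrix with RANSAC, then E = K^T F K.
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)
E = K.T @ F @ K
# Steps 5-9: recoverPose performs the decomposition and picks the solution
# with points in front of both cameras; t is only known up to scale.
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
print(R, t)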
Alternative Solutions
I am certain that a more straightforward approach can also work. The benefit of this approach is that no human input is needed (unsupervised). This is not true for the optional step 10 (determining scale).
A different solution would exploit the knowledge of the geometry of the Rubik's Cube. For example, six (5.5) point correspondences are needed to estimate the position of the camera, if the points' 3D positions are known.
Unfortunately, I am not aware of any software that does this for you automatically.
So here is the alternative algorithm:
Write down the coordinates of the corners of the cube as (X_i, Y_i, Z_i), and possibly also points with other knowable positions.
Mark the corresponding points u_i = (x_i, y_i).
For every correspondence create two lines in a matrix A.
(X_i, Y_i, Z_i, 1, 0, 0, 0, 0, -x_i*X_i, -x_i*Y_i, -x_i*Z_i, -x_i)
(0, 0, 0, 0, X_i, Y_i, Z_i, 1, -y_i*X_i, -y_i*Y_i, -y_i*Z_i, -y_i)
Then find p such that Ap = 0, i.e. p is the right null space of A, or the least-squares solution to Ap = 0.
Reshape p into a 3x4 matrix P. A sketch of these steps in code follows.
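A minimal numpy sketch of this DLT procedure (the function name is mine, and at least 6 correspondences with non-degenerate geometry are assumed):
import numpy as np
def dlt_projection_matrix(world_pts, image_pts):
    # Build two rows of A per correspondence, exactly as written above.
    A = []
    for (X, Y, Z), (x, y) in zip(world_pts, image_pts):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -x*X, -x*Y, -x*Z, -x])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -y*X, -y*Y, -y*Z, -y])
    A = np.asarray(A, dtype=float)
    # Least-squares solution of A p = 0: the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    p = Vt[-1]
    return p.reshape(3, 4)   # the 3x4 projection matrix P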
I calculated the histogram (a simple 1D array) for a 3D grayscale image.
Now I would like to calculate the gradient of this histogram at each point. So this would actually mean I have to calculate the gradient of a 1D function at certain points. However, I do not have an analytic function. So how can I calculate it with concrete x and y values?
For the sake of simplicity, could you explain this to me with an example histogram, for example with the following values (x is the intensity, and y the frequency of this intensity):
x1 = 1; y1 = 3
x2 = 2; y2 = 6
x3 = 3; y3 = 8
x4 = 4; y4 = 5
x5 = 5; y5 = 9
x6 = 6; y6 = 12
x7 = 7; y7 = 5
x8 = 8; y8 = 3
x9 = 9; y9 = 5
x10 = 10; y10 = 2
I know that this is also a math problem, but since I need to solve it in C++ I thought you could help me here.
Thank you for your advice
Marc
I think you can calculate your gradient using the same approach used in image edge detection (which is a gradient computation). If your histogram is in a vector, you can calculate an approximation* of the gradient as:
for each point in the histogram compute
gradient[x] = (hist[x+1] - hist[x])
This is a very simple way to do it, but I'm not sure if it is the most accurate.
*An approximation, because you are working with discrete data instead of continuous data.
Edited:
Other operators may emphasize small differences more (small gradients become more pronounced). Roberts' algorithm derives from the definition of the derivative:
f'(x) = lim (delta -> 0) [f(x + delta) - f(x)] / delta
delta tends to 0 (in order to avoid division by zero) but is never zero. Since that is impossible with data stored in a computer's memory, the smallest delta we can use is 1 (because 1 is the smallest distance between two points in an image or histogram). Substituting delta = 1, we get
[f(x + 1) - f(x)] / 1 = f(x + 1) - f(x)  =>  hist[x+1] - hist[x]
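A minimal numpy sketch of this forward difference, using the example frequencies from the question:
import numpy as np
hist = np.array([3, 6, 8, 5, 9, 12, 5, 3, 5, 2], dtype=float)   # frequencies for x = 1..10
gradient = np.diff(hist)    # hist[x+1] - hist[x], one value fewer than the histogram
print(gradient)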
Two general approaches here:
a discrete approximation to the derivative
take the real derivative of a fitted function
In the first case try:
g_i = (y_(i+1) - y_(i-1)) / (2*dx)
at all the points except the ends, or one of
g_left-end = (y_(i+1) - y_i)/dx
g_right-end = (y_i - y_(i-1))/dx
where dx is the spacing between x points. (Unlike the equally correct definition Andres suggested, this one is symmetric. Whether that matters or not depends on your use case.)
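A minimal numpy sketch of this symmetric scheme, with dx = 1 and the example frequencies from the question (np.gradient gives the same result with its default settings):
import numpy as np
y = np.array([3, 6, 8, 5, 9, 12, 5, 3, 5, 2], dtype=float)   # frequencies for x = 1..10
dx = 1.0                                  # spacing between intensity bins
g = np.empty_like(y)
g[1:-1] = (y[2:] - y[:-2]) / (2 * dx)     # symmetric difference at interior points
g[0] = (y[1] - y[0]) / dx                 # one-sided at the left end
g[-1] = (y[-1] - y[-2]) / dx              # one-sided at the right end
print(g)                                  # matches np.gradient(y, dx)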
In the second case, fit a spline to your data[*], and ask the spline library the derivative at the point you want.
[*] Use a library! Do not implement this yourself unless this is a learning project. I'd use ROOT because I already have it on my machine, but it is a pretty heavy package just to get a spline...
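For the spline route, here is a sketch with scipy instead of ROOT (a deliberate substitution, since scipy is lighter to pull in just for a spline); the s parameter controls smoothing, so a value above 0 also addresses the noise point below:
import numpy as np
from scipy.interpolate import UnivariateSpline
x = np.arange(1, 11, dtype=float)                            # intensities 1..10 from the question
y = np.array([3, 6, 8, 5, 9, 12, 5, 3, 5, 2], dtype=float)   # frequencies
spline = UnivariateSpline(x, y, k=3, s=0)   # s=0 interpolates exactly; s > 0 also smooths noisy data
slope = spline.derivative()(x)              # derivative of the fitted curve at each x
print(slope)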
Finally, if your data is noisy, you may want to smooth it before doing slope detection. That way you avoid chasing the noise and only look at large-scale slopes.
Take some squared paper and draw on it your histogram. Draw also vertical and horizontal axes through the 0,0 point of your histogram.
Take a straight edge and, at each point you are interested in, rotate the straight edge until it accords with your idea of what the gradient at that point is. It is most important that you do this: your definition of the gradient is the one you want.
Once the straight edge is at the angle you desire draw a line at that angle.
Drop perpendiculars from any 2 points on the line you just drew. It will be easier to take the following step if the horizontal distance between the 2 points you choose is about 25% or more of the width of your histogram. From the same 2 points draw horizontal lines to intersect the vertical axis of your histogram.
Your lines now define an x-distance and a y-distance, i.e. the lengths of the horizontal and vertical axes (respectively) marked out by their intersections with the perpendiculars and horizontal lines. The gradient you want is the y-distance divided by the x-distance.
Now, to translate this into code is very straightforward, apart from step 2. You have to define what the criteria are for determining what the gradient at any point on the histogram is. Simple choices include:
a) at each point, set down your straight edge to pass through the point and the next one to its right;
b) at each point, set down your straight edge to pass through the point and the next one to its left;
c) at each point, set down your straight edge to pass through the point to the left and the point to the right.
You may want to investigate more complex choices such as fitting a curve (such as a quadratic or higher-order polynomial) through a number of points on your histogram and using the derivative of that to represent the gradient.
Until you understand the question on paper avoid coding in C++ or anything else. Once you do understand it, coding should be trivial.