I'm struggling with this problem:
I have an image and I want to apply a perspective warp to it (I already have the transformation matrix), but instead of the output containing only the transformed area (like the example below), I want to be able to see the whole image.
EXAMPLE http://docs.opencv.org/trunk/_images/perspective.jpg
Instead of getting just the transformed region, as in this example, I want to transform the whole original image.
How can I achieve this?
Thanks!
It seems that you are computing the perspective transform by selecting the corners of the sudoku grid in the input image and requesting them to be warped to fixed locations in the output image. In your example, you seem to be requesting the top-left corner to be warped to coordinates (0,0), the top-right corner to (300,0), the bottom-left to (0,300) and the bottom-right to (300,300).
This will always result in the cropping of the image area to the left of the two left corners and above the two top corners (i.e. the image area where x<0 or y<0 in the output image). Also, if you specify an output image size of 300x300, this results in the cropping of the image area to the right of the right corners and below the bottom corners.
If you want to keep the whole image, you need to use different output coordinates for the corners. For example, warp the TLC to (100,100), the TRC to (400,100), the BLC to (100,400) and the BRC to (400,400), and specify an output image size of, say, 600x600.
You can also calculate the optimal coordinates as follows:
Compute the default perspective transform H0 (as you are doing now)
Transform the corners of the input image using H0, and compute the minimum and maximum values for the x and y coordinates of these corners. Let's denote them xmin, xmax, ymin, ymax.
Compute the translation necessary to map the point (xmin,ymin) to (0,0). The matrix of this translation is T = [1, 0, -xmin; 0, 1, -ymin; 0, 0, 1].
Compute the optimised perspective transform H1 = T*H0 and specify an output image size of (xmax-xmin) x (ymax-ymin).
This way, you are guaranteed that:
the four corners of your input sudoku grid will form a true square
the output image will be translated so that no useful image data is cropped above or to the left of the grid corners
the output image will be sized so that no useful image data is cropped below or to the right of the grid corners
However, this will generate black areas, since the warped input no longer fills a perfect rectangle in the output image, hence some pixels in the output image won't have any correspondence in the input image.
Edit 1: If you want to replace the black areas with something else, you can initialize the destination matrix as you wish and then set the borderMode parameter of the warpPerspective function to BORDER_TRANSPARENT.
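Putting steps 1-4 together (and the Edit 1 tip), here is a minimal C++/OpenCV sketch; it assumes img is your input image and H0 is the 3x3 homography (CV_64F) you already have, and the function name is just for illustration:

#include <opencv2/opencv.hpp>
#include <vector>

cv::Mat warpWholeImage(const cv::Mat& img, const cv::Mat& H0)
{
    // Warp the four corners of the input image with H0.
    std::vector<cv::Point2f> corners = {
        {0.f, 0.f},
        {(float)img.cols, 0.f},
        {(float)img.cols, (float)img.rows},
        {0.f, (float)img.rows}};
    std::vector<cv::Point2f> warped;
    cv::perspectiveTransform(corners, warped, H0);

    // Bounding box of the warped corners gives xmin, ymin, xmax, ymax.
    cv::Rect bbox = cv::boundingRect(warped);

    // Translation T that maps (xmin, ymin) to (0, 0).
    cv::Mat T = (cv::Mat_<double>(3, 3) << 1, 0, -bbox.x,
                                           0, 1, -bbox.y,
                                           0, 0, 1);

    // Optimised transform H1 = T*H0 and an output size that keeps the whole image.
    cv::Mat H1 = T * H0;
    cv::Mat result;
    // To fill the black areas with something else, pre-initialise "result"
    // and pass cv::BORDER_TRANSPARENT as the borderMode argument (see Edit 1).
    cv::warpPerspective(img, result, H1, bbox.size());
    return result;
}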
I know that I must call one of the following before each call to glVertex:
glTexCoord(0,0);
glTexCoord(0,1);
glTexCoord(1,1);
glTexCoord(1,0);
But I have no idea what they mean. I know, however, that if I multiply (or is that divide?) the right side (or is it all the ones?) by two, my texture expands, and if I do the opposite, my texture repeats twice. I've managed to code a texture atlas by applying operations until it worked. But I have no proper idea about what's going on. Why does dividing these coordinates affect the image and why does reversing them mirror it? How do texture coordinates work?
Texture coordinates specify the point in the texture image that will correspond to the vertex you are specifying them for. Think of a rectangular rubber sheet with your texture image printed on it, where the length of each side is normalized to the range 0-1. Now let's say you wanted to draw a triangle using that texture. You'd take 3 pins and place them in the rubber sheet in the positions of each of your desired texture coordinates. (Say [0, 0], [1, 0] and [1, 1]) then move those pins (without taking them out) to your desired vertex coordinates (Say [0, 0], [0.5, 0] and [1, 1]), so that the rubber sheet is stretched out and the image is distorted. That's basically how texture coordinates work.
If you use texture coordinates greater than 1 and your texture is set to repeat, then it's as if the rubber sheet was infinite in size and the texture was tiled across it. Therefore if your texture coordinates for two vertices were 0, 0 and 4, 0, then the image would have to be repeated 4 times between those vertices.
#b1nary.atr0phy Image for all you visual thinkers!
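To make the rubber-sheet idea concrete, here is a small legacy immediate-mode sketch (my own illustration; it assumes a GL context is current and a 2D texture is bound and enabled, with GL_REPEAT wrap mode for the second quad):

#include <GL/gl.h>

void drawTexturedQuads()
{
    // Each glTexCoord2f pins a point of the texture "rubber sheet"
    // to the vertex that follows it.
    glBegin(GL_QUADS);
        glTexCoord2f(0.0f, 0.0f); glVertex2f(-1.0f, -1.0f); // texture bottom-left
        glTexCoord2f(1.0f, 0.0f); glVertex2f( 0.0f, -1.0f); // texture bottom-right
        glTexCoord2f(1.0f, 1.0f); glVertex2f( 0.0f,  0.0f); // texture top-right
        glTexCoord2f(0.0f, 1.0f); glVertex2f(-1.0f,  0.0f); // texture top-left
    glEnd();

    // Coordinates up to 2.0 make the texture tile twice in each
    // direction across the quad (with GL_REPEAT).
    glBegin(GL_QUADS);
        glTexCoord2f(0.0f, 0.0f); glVertex2f(0.0f, 0.0f);
        glTexCoord2f(2.0f, 0.0f); glVertex2f(1.0f, 0.0f);
        glTexCoord2f(2.0f, 2.0f); glVertex2f(1.0f, 1.0f);
        glTexCoord2f(0.0f, 2.0f); glVertex2f(0.0f, 1.0f);
    glEnd();
}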
OpenGL uses inverse texturing. It takes coordinates from world space (X,Y,Z) to texture space (X,Y) to discrete space (U,V), where the discrete space is in the [0,1] domain.
Take a polygon, think of it as a sheet of paper. With this:
glTexCoord(0,0);
glTexCoord(0,1);
glTexCoord(1,1);
glTexCoord(1,0);
You tell OpenGL to draw on the whole sheet of paper. When you apply modifications, the texture space is mapped according to the coordinates you give. That is why, for example, when you divide the coordinates the texture expands: you tell OpenGL to map only half of your sheet instead of the whole sheet of paper.
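For example, halving the coordinates from the snippet in the question maps only half of the sheet in each direction (the lower-left quarter of the image) over the same quad, so the texture appears magnified (a fragment in the same immediate-mode style, assuming a bound texture):

glBegin(GL_QUADS);
    glTexCoord2f(0.0f, 0.0f); glVertex2f(0.0f, 0.0f);
    glTexCoord2f(0.5f, 0.0f); glVertex2f(1.0f, 0.0f);
    glTexCoord2f(0.5f, 0.5f); glVertex2f(1.0f, 1.0f);
    glTexCoord2f(0.0f, 0.5f); glVertex2f(0.0f, 1.0f);
glEnd();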
Chapter 9 of the Red Book explains this in detail and is available for free online.
http://www.glprogramming.com/red/chapter09.html
Texture coordinates map positions on a face into the 0-1 range of the texture's width and height. The texture is then stretched like a rubber sheet over the triangles. It is best explained with pictures, and the Red Book does this.
For 2D image textures, 0,0 in texture coordinates corresponds to the bottom left corner of the image, and 1,1 in texture coordinates corresponds to the top right corner of the image. Note that "bottom left corner of the image" is not at the center of the bottom left pixel, but at the edge of the pixel.
Also interesting when uploading images:
8.5.3 Texture Image Structure
The texture image itself (referred to by data) is a sequence of groups of values. The first group is the lower left back corner of the texture image. Subsequent groups fill out rows of width width from left to right; height rows are stacked from bottom to top forming a single two-dimensional image slice; and depth slices are stacked from back to front.
Note that most image formats have the data start at the top, not at the bottom row.
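If your loader hands you rows starting at the top (as most formats do), one simple option is to flip the rows before calling glTexImage2D. A small sketch, assuming a tightly packed RGBA8 buffer (the function name is mine):

#include <algorithm>
#include <cstdint>
#include <vector>

// Flip an RGBA8 image vertically so the first row becomes the bottom
// row, matching the order glTexImage2D expects.
void flipRowsInPlace(std::vector<std::uint8_t>& pixels, int width, int height)
{
    const int rowBytes = width * 4; // 4 bytes per RGBA8 pixel
    for (int y = 0; y < height / 2; ++y)
        std::swap_ranges(pixels.begin() + y * rowBytes,
                         pixels.begin() + (y + 1) * rowBytes,
                         pixels.begin() + (height - 1 - y) * rowBytes);
}

Alternatively, you can leave the data as-is and flip the vertical (t) texture coordinate when drawing.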
I am using 2 different methods to render an image (as an OpenCV matrix):
an implemented projection function that uses the camera intrinsics (focal length, principal point; distortion is disabled) - this function is used in other software packages and is supposed to work correctly (repository)
a 2D to 2D image warping (here, I'm determining the intersections of the corner-rays of my camera with my 2D image that should be warped into my camera frame); this backprojection of the corner points is using the same camera model as above
Now, I overlay these two images and what should basically happen is that a projected pen-tip (method 1.) should line up with a line that is drawn on the warped image (method 2.). However, this is not happening.
There is a tiny shift in both directions, depending on the orientation of the pen that is writing, and it is reduced when I am shifting the principal point of the camera. Now my question is, since I am not considering the principal point in the 2D-2D image warping, can this be the cause of the mismatch? Or is it generally impossible to align those two, since the image warping is a simplification of the projection process?
Grey Point: projected origin (should fall in line with the edges of the white area)
Blue Reticle: penTip that should "write" the Bordeaux-colored line
Grey Line: pen approximation
Red Edge: "x-axis" of white image part
Green Edge: "y-axis" of white image part
EDIT:
I also did the same projection with the origin of the coordinate system, and here the mismatch grows the further the origin moves away from the center of the image (so delta[warp,project] gets larger towards the image borders compared to the center).
I have to implement a fisheye transformation with bilinear interpolation. After the transformation of one pixel I don't have integer coordinates anymore, and I would like to map this pixel to integer coordinates using bilinear interpolation. The problem is that everything I found on bilinear interpolation on the internet (see for example Wikipedia) does the opposite thing: it gives the value of one non-integer pixel by using the coordinates of four neighbors that have integer coordinates. I would like to do the opposite, i.e. map the one pixel with non-integer coordinates to the four neighbors with integer coordinates. Surely there is something that I am missing, and it would be helpful to understand where I am wrong.
EDIT:
To be more clear: let's say that I have the pixel (i,j)=(2,2) of the starting image. After the fisheye transformation I obtain non-integer coordinates, for example (2.1,2.2). I want to save this new pixel to a new image, but obviously I don't know in which pixel to save it because of the non-integer coordinates. The easiest way is to truncate the coordinates, but the image quality is not very good: I have to use bilinear interpolation. Despite this, I don't understand how it works, because I want to split my non-integer pixel among the neighboring pixels with integer coordinates of the new (transformed) image, but I found descriptions only of the opposite operation, i.e. finding non-integer coordinates starting from four integer pixels (http://en.wikipedia.org/wiki/Bilinear_interpolation)
Your question is a little unclear. From what I understand, you have a regular image which you want to transform into a fisheye-like image. To do this, I am guessing you take each pixel coordinate {xr,yr} from the regular image, use the fisheye transformation to obtain the corresponding coordinates {xf,yf} in the fisheye-like image. You would like to assign the initial pixel intensity to the destination pixel, however you do not know how to do this since {xf,yf} are not integer values.
If that's the case, you are actually taking the problem backwards. You should start from integer pixel coordinates in the fisheye image, use the inverse fisheye transformation to obtain floating-point pixel coordinates in the regular image, and use bilinear interpolation to estimate the intensity of the floating point coordinates from the 4 closest integer coordinates.
The basic procedure is as follows:
Start with integer pixel coordinates (xf,yf) in the fisheye image (e.g. (2,3) in the fisheye image). You want to estimate the intensity If associated to these coordinates.
Find the corresponding point in the "starting" image, by mapping (xf,yf) into the "starting" image using the inverse fisheye transformation. You obtain floating-point pixel coordinates (xs,ys) in the "starting" image (e.g. (2.2,2.5) in the starting image).
Use Bilinear Interpolation to estimate the intensity Is at coordinates (xs,ys), based on the intensity of the 4 closest integer pixel coordinates in the "starting" image (e.g. (2,2), (2,3), (3,2), (3,3) in the starting image)
Assign Is to If
Repeat from step 1 with the next integer pixel coordinates, until the intensities of all pixels of the fisheye image have been found.
Note that deriving the inverse fisheye transformation might be a little tricky, depending on the equations... However, that is how image resampling has to be performed.
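A minimal sketch of this procedure in C++ with OpenCV: inverseFisheye below is a placeholder for your inverse transformation (destination coordinates back to "starting" coordinates), and cv::remap performs the bilinear interpolation of steps 3-4 for you:

#include <opencv2/opencv.hpp>

cv::Point2f inverseFisheye(float xf, float yf); // assumed: provided by you

cv::Mat warpFisheye(const cv::Mat& src, cv::Size dstSize)
{
    cv::Mat mapX(dstSize, CV_32FC1), mapY(dstSize, CV_32FC1);
    for (int yf = 0; yf < dstSize.height; ++yf)
        for (int xf = 0; xf < dstSize.width; ++xf)
        {
            // Step 2: inverse-map the integer destination pixel (xf, yf)
            // to floating-point source coordinates (xs, ys).
            cv::Point2f p = inverseFisheye((float)xf, (float)yf);
            mapX.at<float>(yf, xf) = p.x;
            mapY.at<float>(yf, xf) = p.y;
        }

    // Steps 3-5: remap interpolates bilinearly at the floating-point
    // coordinates and writes the result to each destination pixel.
    cv::Mat dst;
    cv::remap(src, dst, mapX, mapY, cv::INTER_LINEAR);
    return dst;
}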
You need to find the inverse fisheye transform first, and use "backward warping" to go from the destination image to the source image.
I'll give you a simple example. Say you want to expand the image by a non-integral factor of 1.5. So you have
x_dest = x_source * 1.5, y_dest = y_source * 1.5
Now if you iterate over the coordinates in the original image, you'll get non-integral coordinates in the destination image. E.g., (1,1) will be mapped to (1.5, 1.5). And this is your problem, and in general the problem with "forward warping" an image.
Instead, you reverse the transformation and write
x_source = x_dest / 1.5, y_source = y_dest / 1.5
Now you iterate over the destination image pixels. For example, pixel (4,4) in the destination image comes from (4/1.5, 4/1.5) ≈ (2.67, 2.67) in the source image. These are non-integral coordinates, and you use the 4 neighboring pixels in the source image to estimate the color at this coordinate (in our example the pixels at (2,2), (2,3), (3,2) and (3,3)).
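For completeness, here is the same backward loop with the bilinear weights written out by hand (a sketch for an 8-bit grayscale image; the factor 1.5 matches the example above):

#include <algorithm>
#include <cmath>
#include <opencv2/opencv.hpp>

cv::Mat scaleUp(const cv::Mat& src, double factor = 1.5) // src: CV_8UC1
{
    cv::Mat dst((int)std::round(src.rows * factor),
                (int)std::round(src.cols * factor), CV_8UC1);
    for (int yd = 0; yd < dst.rows; ++yd)
        for (int xd = 0; xd < dst.cols; ++xd)
        {
            // Backward mapping: destination pixel -> source coordinates.
            double xs = xd / factor, ys = yd / factor;
            int x0 = (int)std::floor(xs), y0 = (int)std::floor(ys);
            int x1 = std::min(x0 + 1, src.cols - 1);
            int y1 = std::min(y0 + 1, src.rows - 1);
            double fx = xs - x0, fy = ys - y0;

            // Weighted average of the 4 neighbouring source pixels.
            double v = (1 - fx) * (1 - fy) * src.at<uchar>(y0, x0)
                     +      fx  * (1 - fy) * src.at<uchar>(y0, x1)
                     + (1 - fx) *      fy  * src.at<uchar>(y1, x0)
                     +      fx  *      fy  * src.at<uchar>(y1, x1);
            dst.at<uchar>(yd, xd) = (uchar)std::lround(v);
        }
    return dst;
}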
I'm working on image warping. The transformed versions of the real coordinates of an image are x and y, and the polar coordinates of the transformed image are r and theta
(cylindrical anamorphosis). I have the transformation functions, but I'm confused about a few things. I'm getting the polar coordinates from the transformation functions, which can easily be converted to Cartesian. But how do I draw this transformed image, given that the new size will be different from the old image size?
EDIT: I have the image as shown in the cylinder. I have the transformation function to convert it into the illusion image as shown. As this image's size is different from the original image's, how do I ensure that all points in the main image are transformed? Moreover, the coordinates of those points in the transformed image are polar. Can I use OpenCV to form the new image using the transformed polar coordinates?
REF: http://www.physics.uoguelph.ca/phyjlh/morph/Anamorph.pdf
You have two problems here. In my understanding, the bigger problem arises because you are converting discrete integral coordinates into floating point coordinates. The other problem is that the resulting image's size is larger or smaller than the original image's size. Additionally, the resulting image does not have to be rectangular, so it will have to be either cropped, or filled with black pixels along the corners.
According to http://opencv.willowgarage.com/documentation/geometric_image_transformations.html there is no radial transformation routine.
I'd suggest you do the following:
Upscale the original image to have width*2, height*2. Set the new image to black. (cvResize, cvZero)
Run over each pixel in the original image. Find the new coordinates of the pixel. Add 1/9 of its value to all 8 neighbors of the new coordinates, and to the new coordinates itself. (CV_IMAGE_ELEM(...) += 1.0/9 * ....)
Downscale the new image back to the original width, height.
Depending on the result, you may want to use a sharpening routine.
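Here is a rough sketch of steps 1-3 using the C++ OpenCV API instead of the legacy CV_IMAGE_ELEM macro; transformPixel stands in for your anamorphic mapping (assumed to be provided by you and to return coordinates in the upscaled image), and src is assumed to be 8-bit grayscale:

#include <opencv2/opencv.hpp>

cv::Point2i transformPixel(int x, int y); // assumed: your forward mapping

cv::Mat forwardWarp(const cv::Mat& src)
{
    // Step 1: work at twice the resolution and start from a black image.
    cv::Mat bigSrc, bigDst = cv::Mat::zeros(src.rows * 2, src.cols * 2, CV_32FC1);
    cv::resize(src, bigSrc, bigDst.size());

    // Step 2: forward-map every pixel and spread 1/9 of its value over
    // the destination pixel and its 8 neighbours.
    for (int y = 0; y < bigSrc.rows; ++y)
        for (int x = 0; x < bigSrc.cols; ++x)
        {
            cv::Point2i p = transformPixel(x, y);
            float v = bigSrc.at<uchar>(y, x) / 9.0f;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                {
                    int nx = p.x + dx, ny = p.y + dy;
                    if (nx >= 0 && ny >= 0 && nx < bigDst.cols && ny < bigDst.rows)
                        bigDst.at<float>(ny, nx) += v;
                }
        }

    // Step 3: downscale back to the original size; sharpen afterwards if needed.
    cv::Mat out;
    cv::resize(bigDst, out, src.size(), 0, 0, cv::INTER_AREA);
    out.convertTo(out, CV_8UC1);
    return out;
}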
If you want to KEEP some pixels that go out of bounds, that's a different question. Basically you want to find the Min and Max of the coordinates you receive; so, for example, if your original image has Min,Max = [0,1024] and your new coordinates have MinNew,MaxNew = [-200,1200], you make a function like
void normalize(int &convertedx, int &convertedy)
{
    // Map the extended range [MinNew, MaxNew] back into the original
    // image range [Min, Max], so pixels that went out of bounds are kept.
    convertedx = (int)(MinX + double(MaxX - MinX) / (MaxNewX - MinNewX) * (convertedx - MinNewX));
    convertedy = (int)(MinY + double(MaxY - MinY) / (MaxNewY - MinNewY) * (convertedy - MinNewY));
}