How to combine two remap() operations into one? - c++

I have a tight loop, where I get a camera image, undistort it and also transform it according to some transformation (e.g. a perspective transform). I have already figured out how to use cv::remap(...) for each operation, which is already much more efficient than using plain matrix operations.
In my understanding it should be possible to combine the lookup maps into one and call remap just once in every loop iteration. Is there a canonical way to do this? I would prefer not to implement all the interpolation stuff myself.
Note: The procedure should work with differently sized maps. In my particular case the undistortion preserves the image dimensions, while the other transformation scales the image to a different size.
Code for illustration:
// input arguments
const cv::Mat_<math::flt> intrinsic = getIntrinsic();
const cv::Mat_<math::flt> distortion = getDistortion();
const cv::Mat newCameraMatrix = cv::getOptimalNewCameraMatrix(intrinsic, distortion, myImageSize, 0);

// output arguments
cv::Mat undistortMapX;
cv::Mat undistortMapY;

// computes undistortion maps
cv::initUndistortRectifyMap(intrinsic, distortion, cv::Mat(),
                            newCameraMatrix, myImageSize, CV_16SC2,
                            undistortMapX, undistortMapY);

// computes the transformation (skew) maps
// ...computation of mapX and mapY omitted
cv::convertMaps(mapX, mapY, skewMapX, skewMapY, CV_16SC2);

for (;;) {
    cv::Mat originalImage = getNewImage();

    cv::Mat undistortedImage;
    cv::remap(originalImage, undistortedImage, undistortMapX, undistortMapY, cv::INTER_LINEAR);

    cv::Mat skewedImage;
    cv::remap(undistortedImage, skewedImage, skewMapX, skewMapY, cv::INTER_LINEAR);

    outputImage(skewedImage);
}

You can apply remap() to undistortMapX and undistortMapY themselves:
cv::remap(undistortMapX, undistrtSkewX, skewMapX, skewMapY, cv::INTER_LINEAR);
cv::remap(undistortMapY, undistrtSkewY, skewMapX, skewMapY, cv::INTER_LINEAR);
Then you can use:
cv::remap(originalImage, skewedImage, undistrtSkewX, undistrtSkewY, cv::INTER_LINEAR);
It works because the skew maps and the undistort maps are arrays of coordinates into an image, so composing them amounts to looking up the location of a location.
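With the composed maps computed once, the per-frame loop from the question then needs only a single remap() call. A minimal sketch, reusing the names above (and assuming the maps are stored in a format remap() accepts as input images, e.g. CV_32FC1):
// Precompute the composed maps once, outside the loop
cv::Mat undistrtSkewX, undistrtSkewY;
cv::remap(undistortMapX, undistrtSkewX, skewMapX, skewMapY, cv::INTER_LINEAR);
cv::remap(undistortMapY, undistrtSkewY, skewMapX, skewMapY, cv::INTER_LINEAR);

for (;;) {
    cv::Mat originalImage = getNewImage();
    cv::Mat skewedImage;
    cv::remap(originalImage, skewedImage, undistrtSkewX, undistrtSkewY, cv::INTER_LINEAR);
    outputImage(skewedImage);
}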
Edit (answer to comments):
I think I need to make some clarification. The remap() function calculates pixels in the new image from pixels of the old image. With linear interpolation, each pixel in the new image is a weighted average of 4 pixels from the old image. The weights differ from pixel to pixel according to the values in the provided maps. If the map value is close to an integer, most of the weight is taken from a single pixel, and as a result the new image will be as sharp as the original image. On the other hand, if the value is far from an integer (i.e. an integer + 0.5), the weights are similar, which creates a smoothing effect. To get a feeling for what I am talking about, look at the undistorted image. You will see that some parts of the image are sharper/smoother than other parts.
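To make the weighting concrete, here is a small illustration (not from the answer, just a hypothetical map entry) of how the four bilinear weights come out of a fractional map value:
// Hypothetical map entry pointing at (10.25, 20.75) in the source image
float mx = 10.25f, my = 20.75f;
int x0 = (int)mx, y0 = (int)my;     // top-left pixel of the 2x2 neighbourhood: (10, 20)
float fx = mx - x0, fy = my - y0;   // fractional parts: 0.25 and 0.75
float w00 = (1 - fx) * (1 - fy);    // weight of (10,20) -> 0.1875
float w10 = fx * (1 - fy);          // weight of (11,20) -> 0.0625
float w01 = (1 - fx) * fy;          // weight of (10,21) -> 0.5625
float w11 = fx * fy;                // weight of (11,21) -> 0.1875
// dst pixel = w00*src(20,10) + w10*src(20,11) + w01*src(21,10) + w11*src(21,11)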
Now back to the explanation of what happens when you combine the two remap operations into one. The coordinates in the combined maps are correct, i.e. a pixel in skewedImage is calculated from the correct 4 pixels of originalImage with the correct weights. But it is not identical to the result of two remap operations. Each pixel in undistortedImage is a weighted average of 4 pixels from originalImage. This means that each pixel of skewedImage would be a weighted average of 9-16 pixels from originalImage. Conclusion: using a single remap() can NOT possibly give a result that is identical to two uses of remap().
The discussion about which of the two possible images (single remap() vs double remap()) is better is quite complicated. Normally it is good to make as few interpolations as possible, because each interpolation introduces different artifacts, especially if the artifacts are not uniform in the image (some regions become smoother than others). In some cases those artifacts may have a good visual effect on the image, like reducing some of the jitter. But if this is what you want, you can achieve it in cheaper and more consistent ways, for example by smoothing the original image prior to remapping, as in the sketch below.
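For instance (a tiny sketch, not from the answer; the kernel size is just a placeholder):
cv::Mat smoothed;
cv::GaussianBlur(originalImage, smoothed, cv::Size(3, 3), 0);  // mild, uniform smoothing
cv::remap(smoothed, skewedImage, undistrtSkewX, undistrtSkewY, cv::INTER_LINEAR);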

In the case of two general mappings, there is no choice but to use the approach suggested by @MichaelBurdinov.
However, in the special case of two mappings with known inverse mappings, an alternative approach is to compute the maps manually. This manual approach is more accurate than the double-remap one, since it does not involve interpolation of the coordinate maps.
In practice, most of the interesting applications match this special case. Yours does too, because your first map corresponds to image undistortion (whose inverse operation is image distortion, which is associated with a well-known analytical model) and your second map corresponds to a perspective transform (whose inverse can be expressed analytically).
Computing the maps manually is actually quite easy. As stated in the documentation (link) these maps contain, for each pixel in the destination image, the (x,y) coordinates where to find the appropriate intensity in the source image. The following code snippet shows how to compute the maps manually in your case:
int dst_width=..., dst_height=...;    // Initialize the size of the output image
cv::Mat Hinv=H.inv(), Kinv=K.inv();   // Precompute the inverse perspective matrix and the inverse camera matrix
cv::Mat map_undist_warped_x32f(dst_height,dst_width,CV_32F);    // Allocate the x map to the correct size (n.b. the data type used is float)
cv::Mat map_undist_warped_y32f(dst_height,dst_width,CV_32F);    // Allocate the y map to the correct size (n.b. the data type used is float)

// Loop on the rows of the output image
for(int y=0; y<dst_height; ++y) {
    std::vector<cv::Point3f> pts_undist_norm(dst_width);
    // For each pixel on the current row, first use the inverse perspective mapping, then multiply by the
    // inverse camera matrix (i.e. map from pixels to normalized coordinates to prepare use of projectPoints function)
    for(int x=0; x<dst_width; ++x) {
        cv::Mat_<float> pt(3,1); pt << x,y,1;
        pt = Kinv*Hinv*pt;
        pts_undist_norm[x].x = pt(0)/pt(2);
        pts_undist_norm[x].y = pt(1)/pt(2);
        pts_undist_norm[x].z = 1;
    }
    // For each pixel on the current row, compose with the inverse undistortion mapping (i.e. the distortion
    // mapping) using projectPoints function
    std::vector<cv::Point2f> pts_dist;
    cv::projectPoints(pts_undist_norm,cv::Mat::zeros(3,1,CV_32F),cv::Mat::zeros(3,1,CV_32F),intrinsic,distortion,pts_dist);
    // Store the result in the appropriate pixel of the output maps
    for(int x=0; x<dst_width; ++x) {
        map_undist_warped_x32f.at<float>(y,x) = pts_dist[x].x;
        map_undist_warped_y32f.at<float>(y,x) = pts_dist[x].y;
    }
}

// Finally, convert the float maps to signed-integer maps for best efficiency of the remap function
cv::Mat map_undist_warped_x16s, map_undist_warped_y16s;
cv::convertMaps(map_undist_warped_x32f,map_undist_warped_y32f,map_undist_warped_x16s,map_undist_warped_y16s,CV_16SC2);
Note: H above is your perspective transform, while K should be the camera matrix associated with the undistorted image, so it should be what in your code is called newCameraMatrix (which, BTW, is not an output argument of initUndistortRectifyMap). Depending on your specific data, there might also be some additional cases to handle (e.g. division by pt(2) when it might be zero, etc).
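Once these maps have been computed (only once, up front), the loop from the question reduces to a single remap() per frame. A minimal sketch, reusing the names above:
for (;;) {
    cv::Mat originalImage = getNewImage();
    cv::Mat skewedImage;
    cv::remap(originalImage, skewedImage,
              map_undist_warped_x16s, map_undist_warped_y16s, cv::INTER_LINEAR);
    outputImage(skewedImage);
}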

I found this question when looking to combine dewarping (undistortion) and projection transforms in Python, but there is no direct Python answer.
Here is a direct conversion of BConic's answer to Python:
import numpy as np
import cv2
dst_width = ...
dst_height = ...
h_inv = np.linalg.inv(h)
k_inv = np.linalg.inv(new_camera_matrix)
map_x = np.zeros((dst_height, dst_width), dtype=np.float32)
map_y = np.zeros((dst_height, dst_width), dtype=np.float32)
for y in range(dst_height):
    pts_undist_norm = np.zeros((dst_width, 3, 1))
    for x in range(dst_width):
        pt = np.array([x, y, 1]).reshape(3, 1)
        pt2 = k_inv @ h_inv @ pt
        pts_undist_norm[x][0] = pt2[0] / pt2[2]
        pts_undist_norm[x][1] = pt2[1] / pt2[2]
        pts_undist_norm[x][2] = 1
    r_vec = np.zeros((3, 1))
    t_vec = np.zeros((3, 1))
    pts_dist, _ = cv2.projectPoints(pts_undist_norm, r_vec, t_vec, intrinsic, distortion)
    pts_dist = pts_dist.squeeze()
    for x2 in range(dst_width):
        map_x[y][x2] = pts_dist[x2][0]
        map_y[y][x2] = pts_dist[x2][1]
# using CV_16SC2 introduced substantial image artifacts for me
map_x_final, map_y_final = cv2.convertMaps(map_x, map_y, cv2.CV_32FC1, cv2.CV_32FC1)
This is obviously really slow since it is using a double for loop and iterating through every pixel, so you can do it much faster using numpy. You should be able to do something similar in C++ to eliminate the for loops and do a single matrix multiplication.
import numpy as np
import cv2
dst_width = ...
dst_height = ...
h_inv = np.linalg.inv(h)
k_inv = np.linalg.inv(new_camera_matrix)
m_grid = np.mgrid[0:dst_width, 0:dst_height].reshape(2, dst_height*dst_width)
m_grid = np.insert(m_grid, 2, 1, axis=0)
m_grid_result = k_inv @ h_inv @ m_grid
pts_undist_norm = m_grid_result[:2, :] / m_grid_result[2, :]
pts_undist_norm = np.insert(pts_undist_norm, 2, 1, axis=0)
r_vec = np.zeros((3,1))
t_vec = np.zeros((3,1))
pts_dist, _ = cv2.projectPoints(pts_undist_norm, r_vec, t_vec, intrinsic, distortion)
pts_dist = pts_dist.squeeze().astype(np.float32)
map_x = pts_dist[:, 0].reshape(dst_width, dst_height).swapaxes(0,1)
map_y = pts_dist[:, 1].reshape(dst_width, dst_height).swapaxes(0,1)
# using CV_16SC2 introduced substantial image artifacts for me
map_x_final, map_y_final = cv2.convertMaps(map_x, map_y, cv2.CV_32FC1, cv2.CV_32FC1)
This numpy implementation is roughly 25-75x faster than the first method.

I came across the same problem and tried to implement AldurDisciple's answer. Instead of calculating the transformation in a loop, I fill a Mat with mat.at<Vec2f>(x,y) = Vec2f(x,y) and apply perspectiveTransform to this Mat. Then I add a 3rd channel of "1" to the result Mat and apply projectPoints.
Here is my code:
Mat xy(2000, 2500, CV_32FC2);
float *pxy = (float*)xy.data;
for (int y = 0; y < 2000; y++)
    for (int x = 0; x < 2500; x++)
    {
        *pxy++ = x;
        *pxy++ = y;
    }

// perspective transformation of coordinates of destination image,
// which generates the map from destination image to norm points
Mat pts_undist_norm(2000, 2500, CV_32FC2);
Mat matPerspective = transRot3x3;
perspectiveTransform(xy, pts_undist_norm, matPerspective);

// add 3rd channel of 1
vector<Mat> channels;
split(pts_undist_norm, channels);
Mat channel3(2000, 2500, CV_32FC1, cv::Scalar(float(1.0)));
channels.push_back(channel3);
Mat pts_undist_norm_3D(2000, 2500, CV_32FC3);
merge(channels, pts_undist_norm_3D);

// projectPoints to extend the map from norm points back to the original captured image
pts_undist_norm_3D = pts_undist_norm_3D.reshape(0, 5000000);
Mat pts_dist(5000000, 1, CV_32FC2);
projectPoints(pts_undist_norm_3D, Mat::zeros(3, 1, CV_64F), Mat::zeros(3, 1, CV_64F), intrinsic, distCoeffs, pts_dist);
Mat maps[2];
pts_dist = pts_dist.reshape(0, 2000);
split(pts_dist, maps);

// apply map
remap(originalImage, skewedImage, maps[0], maps[1], INTER_LINEAR);
The transformation matrix used to map to the norm points is a bit different from the one used in AldurDisciple's answer: transRot3x3 is composed from the tvec and rvec generated by calibrateCamera.
double transData[] = { 0, 0, tvecs[0].at<double>(0),
                       0, 0, tvecs[0].at<double>(1),
                       0, 0, tvecs[0].at<double>(2) };
Mat translate3x3(3, 3, CV_64F, transData);
Mat rotation3x3;
Rodrigues(rvecs[0], rotation3x3);
Mat transRot3x3(3, 3, CV_64F);
rotation3x3.col(0).copyTo(transRot3x3.col(0));
rotation3x3.col(1).copyTo(transRot3x3.col(1));
translate3x3.col(2).copyTo(transRot3x3.col(2));
Added:
I realized that if the only map needed is the final map, why not just use projectPoints on a Mat with mat.at<Vec3f>(x,y) = Vec3f(x,y,0).
// generate a 3-channel mat with each entry containing its own coordinates
Mat xyz(2000, 2500, CV_32FC3);
float *pxyz = (float*)xyz.data;
for (int y = 0; y < 2000; y++)
    for (int x = 0; x < 2500; x++)
    {
        *pxyz++ = x;
        *pxyz++ = y;
        *pxyz++ = 0;
    }

// project coordinates of destination image,
// which generates the map from destination image to source image directly
xyz = xyz.reshape(0, 5000000);
Mat pts_dist(5000000, 1, CV_32FC2);
projectPoints(xyz, rvecs[0], tvecs[0], intrinsic, distCoeffs, pts_dist);
Mat maps[2];
pts_dist = pts_dist.reshape(0, 2000);
split(pts_dist, maps);

// apply map
remap(originalImage, skewedImage, maps[0], maps[1], INTER_LINEAR);

Related

C++ OpenCV use vector<Point> as index of a matrix

I have a matrix img (480*640 pixels, 64-bit float) to which I apply a complex mask. After this, I need to multiply my matrix by a value, but in order to save time I want to do this multiplication only on the non-zero elements, because for now the multiplication is too slow: I have to iterate the operation 2000 times on 2000 different matrices, but with the same mask. So I found the indices (on the x/y axes) of the non-zero pixels, which I keep in a vector of Point. But I don't manage to use this vector to do the multiplication only on the pixels indexed by it.
Here is an example (with a simple mask) to understand my problem:
Mat img_temp(480, 640, CV_64FC1);
Mat img = img_temp.clone();
Mat mask = Mat::ones(img.size(), CV_8UC1);
double value = 3.56;
// Apply mask
img_temp.copyTo(img, mask);
// Finding non zero elements
vector<Point> nonZero;
findNonZero(img, nonZero);
// Previous multiplication (long because on all pixels)
Mat result = img.clone()*value;
// What I wish to do : multiplication only on non-zero pixels (not functional)
Mat result = Mat::zeros(img.size(), CV_64FC1);
result.at<int>(nonZero) = img.at(nonZero).clone() * value
What is tricky is that my pixels are not on a range (for example pixels 3, 4 and 50, 51 on a line).
Thank you in advance.
I would suggest using Mat.convertTo.
Basically, for the parameter alpha, which is the scaling factor, use the value you want to multiply by (3.56 in your case). Make sure that the Mat is of type CV_32F or CV_64F.
This will be faster than finding all non-zero pixels, saving their coordinates in a vector and iterating (it was faster for me in Java).
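In C++ this is a one-liner; a minimal sketch using the value from the question (note that zero pixels stay zero after scaling, so no explicit masking is needed):
cv::Mat result;
img.convertTo(result, CV_64FC1, 3.56);  // result = img * 3.56, element-wise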
Hope it helps!
Constructing vector of points will also increase computation time. I think you should consider iterating over all pixels and multiply if the pixel is not equal to zero.
Iterating will be faster if you have the matrix as raw data.
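A minimal sketch of that kind of raw-data loop, assuming a CV_64FC1 matrix img and a factor value as in the question:
cv::Mat result = cv::Mat::zeros(img.size(), CV_64FC1);
for (int y = 0; y < img.rows; ++y) {
    const double* src = img.ptr<double>(y);
    double* dst = result.ptr<double>(y);
    for (int x = 0; x < img.cols; ++x) {
        if (src[x] != 0.0)            // only multiply the non-zero pixels
            dst[x] = src[x] * value;
    }
}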
If you do
Mat result = img * value;
instead of
Mat result = img.clone() * value;
the speed will be almost 10 times as fast.
I have also tested your suggestion with the vector, but this is even slower than your first solution.
Below is the code I used to test your first suggestion:
cv::Mat multMask(cv::Mat &img, std::vector<cv::Point> mask, double fact)
{
    if (img.type() != CV_64FC1) throw "invalid format";

    cv::Mat res = cv::Mat::zeros(img.size(), img.type());
    int iLen = (int)mask.size();
    for (int i = 0; i < iLen; i++)
    {
        cv::Point &p = mask[i];
        ((double*)(res.data + res.step.p[0] * p.y))[p.x] =
            ((double*)(img.data + img.step.p[0] * p.y))[p.x] * fact;
    }
    return res;
}

Implementing Structured Tensor

I am trying to implement a paper called Structured Tensor Based Image Interpolation. The paper uses the structure tensor to classify each pixel in an image into one of three classes (uniform, corner and edge), based on the eigenvalues of the structure tensor.
To achieve this I have written the following code:
void tensorComputation(Mat dx, Mat dy, Mat magnitude)
{
    Mat dx2, dy2, dxy;
    GaussianBlur(magnitude, magnitude, Size(3, 3), 0, 0, BORDER_DEFAULT);

    // Calculate image derivatives
    multiply(dx, dx, dx2);
    multiply(dy, dy, dy2);
    multiply(dx, dy, dxy);

    Mat t(2, 2, CV_32F); // tensor matrix

    // Insert values to the tensor matrix.
    t.at<float>(0, 0) = sum(dx2)[0];
    t.at<float>(0, 1) = sum(dxy)[0];
    t.at<float>(1, 0) = sum(dxy)[0];
    t.at<float>(1, 1) = sum(dy2)[0];

    // eigen decomposition to get the main gradient direction.
    Mat eigVal, eigVec;
    eigen(t, eigVal, eigVec);

    // This should compute the angle of the gradient direction based on the first eigenvector.
    float* eVec1 = eigVec.ptr<float>(0);
    float* eVec2 = eigVec.ptr<float>(1);
    cout << fastAtan2(eVec1[0], eVec1[1]) << endl;
    cout << fastAtan2(eVec2[0], eVec2[1]) << endl;
}
Here dx, dy and magnitude are the derivative along the x-axis, the derivative along the y-axis and the gradient magnitude of the image, respectively.
What I have computed is the structure tensor of the entire image. But my problem is that I need to compute the structure tensor for each pixel in the image. How can I achieve this?
In your code you blur magnitude, but then don't use it. You don't need this magnitude at all.
You build the structure tensor correctly, but you average over the whole image. What you want to do is apply local averaging. For each pixel, the structure tensor is the average of your matrix over the pixels in the neighborhood. You compute this by applying a Gaussian blur to each of the components of the tensor: dx2, dy2, and dxy.
The larger the sigma of the Gaussian, the larger the neighborhood you average over. You get more regularization (less sensitive to noise) but also less resolution (less sensitive to small variations and short edges). Play around with the parameter until you get what you need. Sigma between 2 and 5 are quite common.
Next, you need to compute the eigendecomposition per pixel. I don't know if OpenCV makes this easy. I recommend you use DIPlib 3 instead. It has the right infrastructure to compute and use the structure tensor. See here how easy it can be.
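Not OP's code, but a minimal C++ sketch of the recipe above, assuming dx and dy are CV_32F derivative images as in the question. The per-pixel eigenvalues of the 2x2 tensor can even be obtained in closed form, without an explicit eigen() call:
// Per-pixel tensor components, locally averaged with a Gaussian
cv::Mat dx2, dy2, dxy;
cv::multiply(dx, dx, dx2);
cv::multiply(dy, dy, dy2);
cv::multiply(dx, dy, dxy);
double sigma = 3.0;                         // neighbourhood size; 2-5 is typical
cv::GaussianBlur(dx2, dx2, cv::Size(0, 0), sigma);
cv::GaussianBlur(dy2, dy2, cv::Size(0, 0), sigma);
cv::GaussianBlur(dxy, dxy, cv::Size(0, 0), sigma);
// Eigenvalues of the symmetric 2x2 tensor [dx2 dxy; dxy dy2] at every pixel
cv::Mat trace = dx2 + dy2;
cv::Mat diff  = dx2 - dy2;
cv::Mat disc  = diff.mul(diff) + 4.0 * dxy.mul(dxy);
cv::sqrt(disc, disc);
cv::Mat lambda1 = 0.5 * (trace + disc);     // larger eigenvalue per pixel
cv::Mat lambda2 = 0.5 * (trace - disc);     // smaller eigenvalue per pixel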

OpenCV Center Homography

I am trying to create a stitching algorithm. I have been mostly successful in creating it, with a few tweaks still needed. The photos below are examples of my stitching program so far. I am able to provide it with an unordered list of images (so long as the images are along the flight path or side by side, it will work regardless of their orientation to one another).
The issue is that if the images are reversed, some of the image doesn't make it into the final product. Here is the code for the actual stitching. Assume that finding keypoints, matching, and homography is done correctly.
By altering this code, is there a way to centre the first image in the blank destination image and still stitch to it? Also, I got this code on Stack Overflow (Opencv Image Stitching or Panorama) and am not fully sure how it works, so I would love it if someone could explain it.
Thanks for any help in advance!
Mat stitchMatches(Mat image1, Mat image2, Mat homography){
    Mat result;
    vector<Point2f> fourPoint;

    //-Get the four corners of the first image (master)
    fourPoint.push_back(Point2f (0,0));
    fourPoint.push_back(Point2f (image1.size().width,0));
    fourPoint.push_back(Point2f (0, image1.size().height));
    fourPoint.push_back(Point2f (image1.size().width, image1.size().height));

    Mat destination;
    perspectiveTransform(Mat(fourPoint), destination, homography);

    double min_x, min_y, tam_x, tam_y;
    float min_x1, min_x2, min_y1, min_y2, max_x1, max_x2, max_y1, max_y2;
    min_x1 = min(fourPoint.at(0).x, fourPoint.at(1).x);
    min_x2 = min(fourPoint.at(2).x, fourPoint.at(3).x);
    min_y1 = min(fourPoint.at(0).y, fourPoint.at(1).y);
    min_y2 = min(fourPoint.at(2).y, fourPoint.at(3).y);
    max_x1 = max(fourPoint.at(0).x, fourPoint.at(1).x);
    max_x2 = max(fourPoint.at(2).x, fourPoint.at(3).x);
    max_y1 = max(fourPoint.at(0).y, fourPoint.at(1).y);
    max_y2 = max(fourPoint.at(2).y, fourPoint.at(3).y);
    min_x = min(min_x1, min_x2);
    min_y = min(min_y1, min_y2);
    tam_x = max(max_x1, max_x2);
    tam_y = max(max_y1, max_y2);

    Mat Htr = Mat::eye(3,3,CV_64F);
    if (min_x < 0){
        tam_x = image2.size().width - min_x;
        Htr.at<double>(0,2)= -min_x;
    }
    if (min_y < 0){
        tam_y = image2.size().height - min_y;
        Htr.at<double>(1,2)= -min_y;
    }

    result = Mat(Size(tam_x*2,tam_y*2), CV_32F);
    warpPerspective(image2, result, Htr, result.size(), INTER_LINEAR, BORDER_CONSTANT, 0);
    warpPerspective(image1, result, (Htr*homography), result.size(), INTER_LINEAR, BORDER_TRANSPARENT, 0);
    return result;
}
It's normally easy to center an image; you simply create a bigger matrix padded with zeros (or whatever color you want), define an ROI in the center with the same size as your image, and place it in there. However, you cannot in general do this with your two images. The problem is that if an image is shifted, or rotated, so that parts of it are outside your destination image bounds, then the warped image returned by warpPerspective is cut off at those bounds. What you need to do is create the padded image, insert the image that is not being warped wherever you like, and modify the transformation (the homography, in this case) by adding in the translation to those pixels.
For example, if your centered image has its top-left point at (400,500) in the padded image, then you need to add a translation of (400, 500) to your homography so the pixels get mapped to the correct space, and as long as your padded image is large enough, none of it will be cut off.
You will need to create a translational homography and compose it with your original homography to add the translation in. For example, suppose your anchor point for the non-warped image inside the padded image is at (x,y). Translation in a homography is given by the last column; if your homography is a 3x3 matrix H then (using normal mathematical indexing) H(1,3) is your translation in x and H(2,3) is the translation in y given by your homography. So we need to create a new identity homography H_t and add those translations in:
        | 1 0 x |
H_t =   | 0 1 y |
        | 0 0 1 |
Then you can compose this with your original homography H (using matrix multiplication): H_n = H_t * H. Using the new homography H_n we can warp the image into this padded space with that added translation to move it to the correct spot using warpPerspective as usual.
You can also automate this to pad the image precisely as much as it needs, so that you don't have excess padding and the padding will stretch only as needed. See my answer here for a detailed explanation of how to calculate that and warp your images into the padded space.
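Not from the original answer, but a minimal sketch of that idea using the names from the question's stitchMatches function; the canvas size and the anchor point (tx, ty) are placeholders you would choose for your data:
int tx = 400, ty = 500;                          // placeholder anchor for the non-warped image
int padded_width = 3000, padded_height = 2000;   // placeholder canvas size
cv::Mat H_t = cv::Mat::eye(3, 3, CV_64F);
H_t.at<double>(0, 2) = tx;                       // translation in x
H_t.at<double>(1, 2) = ty;                       // translation in y
cv::Mat H_n = H_t * homography;                  // composed homography with the translation added
cv::Mat canvas = cv::Mat::zeros(padded_height, padded_width, image2.type());
image2.copyTo(canvas(cv::Rect(tx, ty, image2.cols, image2.rows)));   // place the non-warped image
cv::warpPerspective(image1, canvas, H_n, canvas.size(),
                    cv::INTER_LINEAR, cv::BORDER_TRANSPARENT);       // warp the other image on top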

creating a bounding box around a field of optical flow paths

I have used cv::calcOpticalFlowFarneback to calculate the optical flow in the current and previous frames of video with ofxOpenCv in openFrameworks.
I then draw the video with the optical flow field on top and then draw vectors showing the flow of motion in areas that are above a certain threshold.
What I want to do now is create a bounding box of those areas of motion and get the centroid and store that x,y position in a variable for tracking.
This is how I'm drawing my flow field if that helps.
if (calculatedFlow){
    ofSetColor( 255, 255, 255 );
    video.draw( 0, 0);
    int w = gray1.width;
    int h = gray1.height;

    //1. Input images + optical flow
    ofPushMatrix();
    ofScale( 4, 4 );

    //Optical flow
    float *flowXPixels = flowX.getPixelsAsFloats();
    float *flowYPixels = flowY.getPixelsAsFloats();
    ofSetColor( 0, 0, 255 );
    for (int y=0; y<h; y+=5) {
        for (int x=0; x<w; x+=5) {
            float fx = flowXPixels[ x + w * y ];
            float fy = flowYPixels[ x + w * y ];
            //Draw only long vectors
            if ( fabs( fx ) + fabs( fy ) > .5 ) {
                ofDrawRectangle( x-0.5, y-0.5, 1, 1 );
                ofDrawLine( x, y, x + fx, y + fy );
            }
        }
    }
}
For what you are asking, there is no simple answer. Here is a suggested solution. It involves multiple steps, but if your domain is simple enough, you could simplify this.
For each frame:
1. Calculate the flow as two images flow_x, flow_y, comparing the current frame with the previous frame using the Farneback method (you seem to be doing this in your code).
2. Translate the flow images into an HSV image, where the hue component of each pixel denotes the angle of the flow, atan2(flow_y, flow_x), and the value component of each pixel denotes the magnitude of the flow, sqrt(flow_x**2 + flow_y**2).
3. In the above step, use your thresholding mechanism to suppress flow pixels (make them black) whose magnitude falls below a certain threshold.
4. Segment the HSV image based on color ranges. You could use a priori information about your domain, or you could take a histogram of the hue components and identify prominent ranges of hues to classify pixels. As a result of this step, you can assign a class to each pixel.
5. Separate the pixels belonging to each class into multiple images. All pixels belonging to segmented class-1 go to image-1, all pixels belonging to segmented class-2 go to image-2, etc. Each segmented image now contains the pixels of the HSV image in a particular color range.
6. Transform each segmented image into a black and white image and, using OpenCV's morphological operations, split it into multiple regions using connectivity (connected components).
7. Find the centroid of each connected component.
I found this reference to be helpful in this context.
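Not part of the answer above, but a minimal OpenCV C++ sketch of the thresholding and connected-components steps, assuming flow is the CV_32FC2 output of calcOpticalFlowFarneback and 0.5 is the same magnitude threshold used in the question:
std::vector<cv::Mat> planes;
cv::split(flow, planes);                                  // planes[0] = flow_x, planes[1] = flow_y
cv::Mat magnitude, angle;
cv::cartToPolar(planes[0], planes[1], magnitude, angle);  // per-pixel magnitude and direction
cv::Mat motionMask;
cv::threshold(magnitude, motionMask, 0.5, 255, cv::THRESH_BINARY);
motionMask.convertTo(motionMask, CV_8U);
cv::morphologyEx(motionMask, motionMask, cv::MORPH_CLOSE,
                 cv::getStructuringElement(cv::MORPH_RECT, cv::Size(5, 5)));
cv::Mat labels, stats, centroids;
int n = cv::connectedComponentsWithStats(motionMask, labels, stats, centroids);
for (int i = 1; i < n; ++i) {                             // label 0 is the background
    cv::Rect box(stats.at<int>(i, cv::CC_STAT_LEFT),  stats.at<int>(i, cv::CC_STAT_TOP),
                 stats.at<int>(i, cv::CC_STAT_WIDTH), stats.at<int>(i, cv::CC_STAT_HEIGHT));
    cv::Point2f centre(centroids.at<double>(i, 0), centroids.at<double>(i, 1));
    // box is the bounding box of one motion region; centre is its centroid for tracking
}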
I resolved my problem by creating a new image from my flowX and flowY. This was done by adding flowX and flowY into a new CV FloatImage.
flowX +=flowY;
flowXY = flowX;
Then I was able to do the contour finding from the pixels of the newly created image and then I could store all the centroids of all the blobs of movement.
Like so:
contourFinder.findContours( mask, 10, 10000, 20, false );
//Storing the objects centers with contour finder.
vector<ofxCvBlob> &blobs = contourFinder.blobs;
int n = blobs.size(); //Get number of blobs
obj.resize( n ); //Resize obj array
for (int i=0; i<n; i++) {
    obj[i] = blobs[i].centroid; //Fill obj array
}
I initially noticed that movement was only being tracked in one direction along the x-axis and y-axis because of negative values. I resolved this by changing the calculation for my optical flow, calling abs() on the cv::Mat.
Mat img1( gray1.getCvImage() ); //Create OpenCV images
Mat img2( gray2.getCvImage() );
Mat flow;
calcOpticalFlowFarneback( img1, img2, flow, 0.7, 3, 11, 5, 5, 1.1, 0 );
//Split flow into separate images
vector<Mat> flowPlanes;
Mat newFlow;
newFlow = abs(flow); //abs flow so values are absolute. Allows tracking in both directions.
split( newFlow, flowPlanes );
//Copy float planes to ofxCv images flowX and flowY
IplImage iplX( flowPlanes[0] );
flowX = &iplX;
IplImage iplY( flowPlanes[1] );
flowY = &iplY;

Cumulative Homography Wrongly Scaling

I'm to build a panorama image of the ground covered by a downward facing camera (at a fixed height, around 1 metre above ground). This could potentially run to thousands of frames, so the Stitcher class' built in panorama method isn't really suitable - it's far too slow and memory hungry.
Instead I'm assuming the floor and motion is planar (not unreasonable here) and trying to build up a cumulative homography as I see each frame. That is, for each frame, I calculate the homography from the previous one to the new one. I then get the cumulative homography by multiplying that with the product of all previous homographies.
Let's say I get H01 between frames 0 and 1, then H12 between frames 1 and 2. To get the transformation to place frame 2 onto the mosaic, I need to get H01*H12. This continues as the frame count increases, such that I get H01*H12*H23*H34*H45*....
In code, this is something akin to:
cv::Mat previous, current;

// Init cumulative homography
cv::Mat cumulative_homography = cv::Mat::eye(3, 3, CV_64F);
video_stream >> previous;
for(;;) {
    video_stream >> current;

    // Here I do some checking of the frame, etc

    // Get the homography using my DenseMosaic class (using Farneback to get OF)
    cv::Mat tmp_H = DenseMosaic::get_homography(previous, current);

    // Now normalise the homography by its bottom right corner
    tmp_H /= tmp_H.at<double>(2, 2);
    cumulative_homography *= tmp_H;
    previous = current.clone();
}
It works pretty well, except that as the camera moves "up" in the viewpoint, the homography scale decreases. As it moves down, the scale increases again. This gives my panoramas a perspective type effect that I really don't want.
For example, this is taken on a few seconds of video moving forward then backward. The first frame looks ok:
The problem comes as we move forward a few frames:
Then when we come back again, you can see the frame gets bigger again:
I'm at a loss as to where this is coming from.
I'm using Farneback dense optical flow to calculate pixel-pixel correspondences as below (sparse feature matching doesn't work well on this data) and I've checked my flow vectors - they're generally very good, so it's not a tracking problem. I also tried switching the order of the inputs to find homography (in case I'd mixed up the frame numbers), still no better.
cv::calcOpticalFlowFarneback(grey_1, grey_2, flow_mat, 0.5, 6,50, 5, 7, 1.5, flags);
// Using the flow_mat optical flow map, populate grid point correspondences between images
std::vector<cv::Point2f> points_1, points_2;
median_motion = DenseMosaic::dense_flow_to_corresp(flow_mat, points_1, points_2);
cv::Mat H = cv::findHomography(cv::Mat(points_2), cv::Mat(points_1), CV_RANSAC, 1);
Another thing I thought it could be was the translation I include in the transformation to ensure my panorama is centred within the scene:
cv::warpPerspective(init.clone(), warped, translation*homography, init.size());
But having checked the values in the homography before the translation is applied, the scaling issue I mention is still present.
Any hints are gratefully received. There's a lot of code I could put in, but it seems irrelevant; please do let me know if something is missing.
UPDATE
I've tried switching out the *= operator for the full multiplication and tried reversing the order the homographies are multiplied in, but no luck. Below is my code for calculating the homography:
/**
 \brief Calculates the homography between the current and previous frames
*/
cv::Mat DenseMosaic::get_homography()
{
    cv::Mat grey_1, grey_2; // Grayscale versions of frames
    cv::cvtColor(prev, grey_1, CV_BGR2GRAY);
    cv::cvtColor(cur, grey_2, CV_BGR2GRAY);

    // Calculate the dense flow
    int flags = cv::OPTFLOW_FARNEBACK_GAUSSIAN;
    if (frame_number > 2) {
        flags = flags | cv::OPTFLOW_USE_INITIAL_FLOW;
    }
    cv::calcOpticalFlowFarneback(grey_1, grey_2, flow_mat, 0.5, 6, 50, 5, 7, 1.5, flags);

    // Convert the flow map to point correspondences
    std::vector<cv::Point2f> points_1, points_2;
    median_motion = DenseMosaic::dense_flow_to_corresp(flow_mat, points_1, points_2);

    // Use the correspondences to get the homography
    cv::Mat H = cv::findHomography(cv::Mat(points_2), cv::Mat(points_1), CV_RANSAC, 1);
    return H;
}
And this is the function I use to find the correspondences from the flow map:
/**
 \brief Calculate pixel->pixel correspondences given a map of the optical flow across the image
 \param[in] flow_mat Map of the optical flow across the image
 \param[out] points_1 The set of points from #cur
 \param[out] points_2 The set of points from #prev
 \param[in] step_size The size of spaces between the grid lines
 \return The median motion as a point

 Uses a dense flow map (such as that created by cv::calcOpticalFlowFarneback) to obtain a set of point correspondences across a grid.
*/
cv::Point2f DenseMosaic::dense_flow_to_corresp(const cv::Mat &flow_mat, std::vector<cv::Point2f> &points_1, std::vector<cv::Point2f> &points_2, int step_size)
{
    std::vector<double> tx, ty;
    for (int y = 0; y < flow_mat.rows; y += step_size) {
        for (int x = 0; x < flow_mat.cols; x += step_size) {
            /* Flow is basically the delta between left and right points */
            cv::Point2f flow = flow_mat.at<cv::Point2f>(y, x);
            tx.push_back(flow.x);
            ty.push_back(flow.y);

            /* There's no need to calculate for every single point,
               if there's not much change, just ignore it
            */
            if (fabs(flow.x) < 0.1 && fabs(flow.y) < 0.1)
                continue;

            points_1.push_back(cv::Point2f(x, y));
            points_2.push_back(cv::Point2f(x + flow.x, y + flow.y));
        }
    }

    // I know this should be median, not mean, but it's only used for plotting the
    // general motion direction so it's unimportant.
    cv::Point2f t_median;
    cv::Scalar mtx = cv::mean(tx);
    t_median.x = mtx[0];
    cv::Scalar mty = cv::mean(ty);
    t_median.y = mty[0];
    return t_median;
}
It turns out this was because my viewpoint was close to the features, meaning that the non-planarity of the tracked features was introducing skew into the homography. I managed to prevent this (it's more of a hack than a method...) by using estimateRigidTransform instead of findHomography, as this does not estimate perspective variations.
In this particular case it makes sense to do so, as the view only ever undergoes rigid transformations.
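Not from the original answer, but a minimal sketch of that substitution inside get_homography(), reusing the point vectors from the code above. estimateRigidTransform returns a 2x3 matrix, so it has to be embedded into a 3x3 matrix before it can be accumulated like the homographies were (in newer OpenCV versions, estimateAffinePartial2D plays the same role):
// fullAffine = false: rotation, translation and uniform scale only; no shear or perspective
cv::Mat A = cv::estimateRigidTransform(cv::Mat(points_2), cv::Mat(points_1), false);  // 2x3, CV_64F
cv::Mat H = cv::Mat::eye(3, 3, CV_64F);
if (!A.empty())
    A.copyTo(H(cv::Rect(0, 0, 3, 2)));  // embed the 2x3 estimate as the top two rows
return H;  // accumulate exactly as before: cumulative_homography *= H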