Image shadowing / rotating / flipping when applying Gaussian kernel in Fourier space - c++

DISCLAIMER UP FRONT:
I know I can smooth/blur images using OpenCV, but my assignment is more under the hood so I can understand the process. I'm new to image processing and DFT's in general, so this is proving to be a challenge.
DESCRIPTION:
I'm doing some image processing using the FFTW library and OpenCV to read an image into an OpenCV::Mat object and grayscale color space. I convert the data to type double so I can create a double pointer to it which FFTW's function for my FFT requires. This is in a Mat object named imageDouble and the pointer to it is pImageDouble.
std::string filename = "<path_to_your_image>";
Mat image = imread(filename, IMREAD_GRAYSCALE);
/***********************************************************************
*** creating Mat object of type *doube* and copying image contents to it
*** then creating a pointer to it.
***********************************************************************/
Mat imageDouble;
image.convertTo(imageDouble, CV_64FC1);
double* pImageDouble = imageDouble.ptr<double>(0);
I also have a Gaussian kernel that I've read into a Mat object. I perform a circular shift where the center (and highest) value of the Gaussian kernel is shifted to the top left (the (0,0) position) of the kernel, and then I zero-pad the kernel to the size of the original image that I read in. The results (type double) are stored in a Mat object named paddedGKernel and the pointer is named pPaddedGKernel.
Mat paddedGKernel = padGKernelWithZeros(nRows, nCols, rolledGKernel);
double* pPaddedGKernel = paddedGKernel.ptr<double>(0);
I initializefftw_complex objects to output the FFT results into, an fftw_plan, and then I allocate the memory for the fftw_complex objects and execute the plan. I use FFTW's fftw_plan_dft_r2c_2d() function to perform an FFT on both 2d objects (image and shifted/padded Gaussian kernel), then perform a pointwise multiplication in Fourier space to apply the Gaussian filter to the original image.
fftw_complex* out; //for result of FFT for original image
fftw_complex* outGaussian; //for result of FFT for Gaussian kernel
fftw_plan p;
/*************************************************
*** Allocating memory for the fftw_complex objects
*************************************************/
out = (fftw_complex*)fftw_malloc(sizeof(fftw_complex) * nRows * nCols);
outGaussian = (fftw_complex*)fftw_malloc(sizeof(fftw_complex) * nRows * nCols);
/****************************************
*** FFT on imageDouble, outputting to out
****************************************/
p = fftw_plan_dft_r2c_2d(imageDouble.rows, imageDouble.cols, pImageDouble, out, FFTW_ESTIMATE);
fftw_execute(p);
/**************************************************
*** FFT on paddedGKernel, outputting to outGuassian
**************************************************/
p = fftw_plan_dft_r2c_2d(paddedGKernel.rows, paddedGKernel.cols, pPaddedGKernel, outGaussian, FFTW_ESTIMATE);
fftw_execute(p);
/****************************************************
*** Pointwise multiplication to apply Gaussian kernel
****************************************************/
for (size_t i = 0; i < nCCols * nRows; i++)
{
out[i][0] = out[i][0] * (1 / outGaussian[i][0]);
}
I then perform the inverse FFT fftw_plan_dft_c2r_2d() on the result of the pointwise multiplication and output that to an OpenCV::Mat object imageCloneDouble and convert the data back to uchar, placing the data in OpenCV::Mat imageBack.
/***********************************************************************
*** Making Mat object (type *double*) to put data into and pointer to it
***********************************************************************/
Mat imageCloneDouble = Mat::zeros(nRows, nCols, CV_64FC1);
double* pImageCloneDouble = imageCloneDouble.ptr<double>(0);
/*****************************************************************************
*** Making and executing plan for inverse FFT, outputting to imageCloneDouble,
*** then normalizing since this function puts out unnormalized values.
*****************************************************************************/
fftw_plan pp = fftw_plan_dft_c2r_2d(imageCloneDouble.rows, imageCloneDouble.cols, out, pImageCloneDouble, FFTW_BACKWARD | FFTW_ESTIMATE);
fftw_execute(pp);
imageCloneDouble = imageCloneDouble / (nCols * nRows *2);
/***************************************************************************
*** Making Mat object (type *uchar*) and copying data inverse FFT data to it
***************************************************************************/
Mat imageBack;
imageCloneDouble.convertTo(imageBack, CV_8UC1);
When I show the transformed image imageBack I expect to get the original with the Gaussian kernel applied, but it looks like the original with potentially some modification overlaid with a rotated version of the original image that appears to have some modification. I can't understand where and why this flip/rotation is happening and why it is overlaid with what appears to be the original image, but I suspect it happens when I'm in Fourier space and performing a pointwise or element-wise multiplication. Either that or I'm omitting a process I need to perform on the result of the pointwise multiplication.
What do I need to do to this data in Fourier space to get back to the original image?

This is your problem:
I perform a circular shift where the center (and highest) value of the Gaussian kernel is shifted to the top left (the (0,0) position) of the kernel, and then I zero-pad the kernel to the size of the original image that I read in.
You need to reverse these two operations: first pad to the right size, then apply the circular shift. You indeed need the center of the kernel to be at (0,0). But you need the kernel to be continuous in the circularly periodic world (i.e. replicating the image you should be able to see complete Gaussian kernels). When you pad after the circular shift, you separate the four quadrants of the kernel in this circularly periodic world.
A second problem is in this line:
out[i][0] = out[i][0] * (1 / outGaussian[i][0]);
First, why divide, not multiply? You are trying to apply a convolution, no?
Second, you are only multiplying the real component of the complex numbers, leaving the imaginary component unchanged. You need to do a full complex multiplication of two complex numbers to produce a new complex number. Cast the out pointer to std::complex<double>*, then use the compiler's knowledge of complex arithmetic to do your bidding!
std::size_t N = nRows * nCols;
fftw_complex* out = fftw_alloc_complex(N);
fftw_complex* outGaussian = fftw_alloc_complex(N);
// ...
auto* out_p = reinterpret_cast<std::complex<double>*>(out);
auto* outGaussian_p = reinterpret_cast<std::complex<double>*>(outGaussian);
for (std::size_t i = 0; i < N; i++)
{
out_p[i] *= outGaussian_p[i];
}
Note that std::complex<double> and fftw_complex have the same memory layout, guaranteed by the C++ spec for std::complex, with the intention of being able to do this type of casting. No reinterpretation is actually done, only the C++ type system thinks so.

The key to fixing this issue started with what Cris pointed out as the second problem. So the first thing to do was to implement the code provided in that part of the reply.
auto* out_p = reinterpret_cast<std::complex<double>*>(out);
auto* outGaussian_p = reinterpret_cast<std::complex<double>*>(outGaussian);
for (std::size_t i = 0; i < N; i++)
{
out_p[i] *= outGaussian_p[i];
}
I had the Gaussian kernel normalized the usual way (dividing by the sum of all the values). After making the changes above from Cris, the image had the blur applied and the only issue was the range of pixel values had changed.
To fix that, I applied the formula for contrast stretching to match the original range and that solved the problem.
/* Finding the original image's minimum and maximum value */
double oMin, oMax;
minMaxLoc(image, &oMin, &oMax);
/* Finding min and max values for our 'imageBack' and rounding them. */
double min, max;
minMaxLoc(imageCloneDouble, &min, &max);
min = round(min);
max = round(max);
/* Code version of formula for contrast stretching */
if (oMin != min | oMax != max) {
double numerator = oMax - oMin;
double denominator = max - min;
double scaledMin = -(min)*numerator;
double scaledOMin = oMin * denominator;
double innerTerm = scaledMin + scaledOMin;
imageCloneDouble = ((numerator * imageCloneDouble) + innerTerm ) / denominator;
}

Related

C++ OpenCV use vector<Point> as index of a matrix

I have a matrix img (480*640 pixel, float 64 bits) on which I apply a complex mask. After this, I need to multiply my matrix by a value but in order to win time I want to do this multiplication only on the non-zero elements because for now the multiplication is too long because I have to iterate the operation 2000 times on 2000 different matrix but with the same mask. So I found the index (on x/y axes) of the nonzero pixels which I keep in a vector of Point. But I don't succeed to use this vector to do the multplication only on the pixels indexed in this same vector.
Here is an example (with a simple mask) to understand my problem :
Mat img_temp(480, 640, CV_64FC1);
Mat img = img_temp.clone();
Mat mask = Mat::ones(img.size(), CV_8UC1);
double value = 3.56;
// Apply mask
img_temp.copyTo(img, mask);
// Finding non zero elements
vector<Point> nonZero;
findNonZero(img, nonZero);
// Previous multiplication (long because on all pixels)
Mat result = img.clone()*value;
// What I wish to do : multiplication only on non-zero pixels (not functional)
Mat result = Mat::zeros(img.size(), CV_64FC1);
result.at<int>(nonZero) = img.at(nonZero).clone() * value
What is tricky is that my pixels are not on a range (for example pixels 3, 4 and 50, 51 on a line).
Thank you in advance.
I would suggest using Mat.convertTo.
Basically, for the parameter alpha, which is the scaling factor, use the value of the mask (3.56 in your case). Make sure that the Mat is of type CV_32 or CV_64.
This will be faster than finding all non-zero pixels, saving their coordinates in a Vector and iterating (it was faster for me in Java).
Hope it helps!
Constructing vector of points will also increase computation time. I think you should consider iterating over all pixels and multiply if the pixel is not equal to zero.
Iterating will be faster if you have the matrix as raw data.
If you do
Mat result = img*value;
Instead of
Mat result = img.clone()*value;
The speed will be almost 10 times as fast
I have also tested your suggestion with vector but this is even slower than your first solution.
Below the code I used to test your firs suggestion
cv::Mat multMask(cv::Mat &img, std::vector<cv::Point> mask, double fact)
{
if (img.type() != CV_64FC1) throw "invalid format";
cv::Mat res = cv::Mat::zeros(img.size(), img.type());
int iLen = (int)mask.size();
for (int i = 0; i < iLen; i++)
{
cv::Point &p = mask[i];
((double*)(res.data + res.step.p[0] * p.y))[p.x] = ((double*)(img.data + img.step.p[0] * p.y))[p.x] * fact;
}
return res;
}

Replace a chain of image blurs with one blur

In this question I asked how to implement a chain of blurs in one single step.
Then I found out from the gaussian blur page of Wikipedia that:
Applying multiple, successive gaussian blurs to an image has the same
effect as applying a single, larger gaussian blur, whose radius is the
square root of the sum of the squares of the blur radii that were
actually applied. For example, applying successive gaussian blurs with
radii of 6 and 8 gives the same results as applying a single gaussian
blur of radius 10, since sqrt {6^{2}+8^{2}}=10.
So I thought that blur and singleBlur were the same in the following code:
cv::Mat firstLevel;
float sigma1, sigma2;
//intialize firstLevel, sigma1 and sigma2
cv::Mat blur = gaussianBlur(firstLevel, sigma1);
blur = gaussianBlur(blur, sigma2);
float singleSigma = std::sqrt(std::pow(sigma1,2)+std::pow(sigma2,2));
cv::Mat singleBlur = gaussianBlur(firstLevel, singleSigma);
cv::Mat diff = blur != singleBLur;
// Equal if no elements disagree
assert( cv::countNonZero(diff) == 0);
But this assert fails (and actually, for example, the first row of blur is different from the first one of singleBlur).
Why?
UPDATE:
After different comments asking for more information, I'll update the answer.
What I'm trying to do is to parallelize this code. In particular, I'm focusing now on computing all the blurs at all levels in advance. The serial code (which works correctly) is the following:
vector<Mat> blurs ((par.numberOfScales+3)*levels, Mat());
cv::Mat octaveLayer = firstLevel;
int scaleCycles = par.numberOfScales+2;
//compute blurs at all layers (not parallelizable)
for(int i=0; i<levels; i++){
blurs[i*scaleCycles+1] = octaveLayer.clone();
for (int j = 1; j < scaleCycles; j++){
float sigma = par.sigmas[j]* sqrt(sigmaStep * sigmaStep - 1.0f);
blurs[j+1+i*scaleCycles] = gaussianBlur(blurs[j+i*scaleCycles], sigma);
if(j == par.numberOfScales)
octaveLayer = halfImage(blurs[j+1+i*scaleCycles]);
}
}
Where:
Mat halfImage(const Mat &input)
{
Mat n(input.rows/2, input.cols/2, input.type());
float *out = n.ptr<float>(0);
for (int r = 0, ri = 0; r < n.rows; r++, ri += 2)
for (int c = 0, ci = 0; c < n.cols; c++, ci += 2)
*out++ = input.at<float>(ri,ci);
return n;
}
Mat gaussianBlur(const Mat input, const float sigma)
{
Mat ret(input.rows, input.cols, input.type());
int size = (int)(2.0 * 3.0 * sigma + 1.0); if (size % 2 == 0) size++;
GaussianBlur(input, ret, Size(size, size), sigma, sigma, BORDER_REPLICATE);
return ret;
}
I'm sorry for the horrible indexes above, but I tried to respect the original code system (which is horrible, like starting counting from 1 instead of 0). The code above has scaleCycles=5 and levels=6, so 30 blurs are generated in total.
This is the "single blur" version, where first I compute the sigmas for each blur that has to be computed (following Wikipedia's formula) and then I apply the blur (notice that this is still serial and not parallelizable):
vector<Mat> singleBlurs ((par.numberOfScales+3)*levels, Mat());
vector<float> singleSigmas(scaleCycles);
float acc = 0;
for (int j = 1; j < scaleCycles; j++){
float sigma = par.sigmas[j]* sqrt(sigmaStep * sigmaStep - 1.0f);
acc += pow(sigma, 2);
singleSigmas[j] = sqrt(acc);
}
octaveLayer = firstLevel;
for(int i=0; i<levels; i++){
singleBlurs[i*scaleCycles+1] = octaveLayer.clone();
for (int j = 1; j < scaleCycles; j++){
float sigma = singleSigmas[j];
std::cout<<"j="<<j<<" sigma="<<sigma<<std::endl;
singleBlurs[j+1+i*scaleCycles] = gaussianBlur(singleBlurs[j+i*scaleCycles], sigma);
if(j == par.numberOfScales)
octaveLayer = halfImage(singleBlurs[j+1+i*scaleCycles]);
}
}
Of course the code above generates 30 blurs also with the same parameters of the previous version.
And then this is the code to see the difference between each signgleBlurs and blurs:
assert(blurs.size() == singleBlurs.size());
vector<Mat> blurDiffs(blurs.size());
for(int i=1; i<levels*scaleCycles; i++){
cv::Mat diff;
absdiff(blurs[i], singleBlurs[i], diff);
std::cout<<"i="<<i<<"diff rows="<<diff.rows<<" cols="<<diff.cols<<std::endl;
blurDiffs[i] = diff;
std::cout<<"blurs rows="<<blurs[i].rows<<" cols="<<blurs[i].cols<<std::endl;
std::cout<<"singleBlurs rows="<<singleBlurs[i].rows<<" cols="<<singleBlurs[i].cols<<std::endl;
std::cout<<"blurDiffs rows="<<blurDiffs[i].rows<<" cols="<<blurDiffs[i].cols<<std::endl;
namedWindow( "blueDiffs["+std::to_string(i)+"]", WINDOW_AUTOSIZE );// Create a window for display.
//imshow( "blueDiffs["+std::to_string(i)+"]", blurDiffs[i] ); // Show our image inside it.
//waitKey(0); // Wait for a keystroke in the window
Mat imageF_8UC3;
std::cout<<"converting..."<<std::endl;
blurDiffs[i].convertTo(imageF_8UC3, CV_8U, 255);
std::cout<<"converted"<<std::endl;
imwrite( "blurDiffs_"+std::to_string(i)+".jpg", imageF_8UC3);
}
Now, what I've seen is that blurDiffs_1.jpg and blurDiffs_2.jpg are black, but suddendly from blurDiffs_3.jpg until the blurDiffs_29.jpg becomes whiter and whiter. For some reason, blurDiffs_30.jpg is almost completely black.
The first (correct) version generates 1761 descriptors. The second (uncorrect) version generates >2.3k descriptors.
I can't post the blurDiffs matrices because (especially the first ones) are very big and post's space is limited. I'll post some samples. I'll not post blurDiffs_1.jpg and blurDiffs_2.jpg because they're totally blacks. Notice that because of halfImage the images become smaller and smaller (as expected).
blurDiffs_3.jpg:
blurDiffs_6.jpg:
blurDiffs_15.jpg:
blurDiffs_29.jpg:
How the image is read:
Mat tmp = imread(argv[1]);
Mat image(tmp.rows, tmp.cols, CV_32FC1, Scalar(0));
float *out = image.ptr<float>(0);
unsigned char *in = tmp.ptr<unsigned char>(0);
for (size_t i=tmp.rows*tmp.cols; i > 0; i--)
{
*out = (float(in[0]) + in[1] + in[2])/3.0f;
out++;
in+=3;
}
Someone here suggested to divide diff by 255 to see the real difference, but I don't understand why of I understood him correctly.
If you need any more details, please let me know.
A warning up front: I have no experience with OpenCV, but the following is relevant to computing Gaussian blur in general. And it is applicable as I threw a glance at the OpenCV documentation wrt border treatment and use of finite kernels (FIR filtering).
As an aside: your initial test was sensitive to round-off, but you have cleared up that issue and shown the errors to be much larger.
Beware image border effects. For pixels near the edge, the image is virtually extended using one of the offered methods (BORDER_DEFAULT, BORDER_REPLICATE, etc...). If your image is |abcd| and you use BORDER_REPLICATE you are effectively filtering an extended image aaaa|abcd|dddd. The result is klmn|opqr|stuv. There are new pixel values (k,l,m,n,s,t,u,v) that are immediately discarded to yield the output image |opqr|. If you now apply another Gaussian blur, this blur will operate on a newly extended image oooo|opqr|rrrr - different from the "true" intermediate result and thus giving you a result different from that obtained by a single Gaussian blur with a larger sigma. These extension methods are safe though: REFLECT, REFLECT_101, WRAP.
Using a finite kernel size the G(s1)*G(s2)=G(sqrt(s1^2+s2^2)) rule does not hold in general due to the tails of the kernel being cut off. You can reduce the error thus introduced by increasing the kernel size relative to the sigma, e.g.:
int size = (int)(2.0 * 10.0 * sigma + 1.0); if (size % 2 == 0) size++;
Point 3 seems to be the issue that is "biting" you. Do you really care whether the G(s1)*G(s2) property is preserved. Both results are wrong in a way. Does it affect the methodology that works on the result in a major way? Note that the example of using 10x sigma I have given may resolve the difference, but will be very much slower.
Update: I forgot to add what might the most practical resolution. Compute the Gaussian blur using a Fourier transform. The scheme would be:
Compute Fourier transform (FFT) of your input image
Multiply with the Fourier transform of the Gaussian kernel and compute inverse Fourier transform. Ignore the imaginary part of the complex output.
You can find the equation for the Gaussian in the frequency domain on wikipedia
You can perform the second step separately (i.e. in parallel) for each scale (sigma). The border condition implied computing the blur this way is BORDER_WRAP. If you prefer you can achieve the same but with BORDER_REFLECT if you use a discrete cosine transform (DCT) instead. Do not know if OpenCV provides one. You will be after the DCT-II
It's basically as what G.M. says. Remember you are not only rounding by floating points, you are also rounding by looking only at integer points (both on the image and on the Gaussian kernels).
Here's what I got from a small (41x41) image:
where blur and single are rounded by convertTo(...,CV8U) and diff is where they are different. So, in DSP terms, it may not be a great agreement. But in Image Processing, it's not that bad.
Also, I suspect that the different will be less significant as you perform Gaussian on bigger images.

How to combine two remap() operations into one?

I have a tight loop, where I get a camera image, undistort it and also transform it according to some transformation (e.g. a perspective transform). I already figured out to use cv::remap(...) for each operation, which is already much more efficient than using plain matrix operations.
In my understanding it should be possible to combine the lookup maps into one and call remap just once in every loop iteration. Is there a canonical way to do this? I would prefer not to implement all the interpolation stuff myself.
Note: The procedure should work with differently sized maps. In my particular case the undistortion preserves the image dimensions, while the other transformation scales the image to a different size.
Code for illustration:
// input arguments
const cv::Mat_<math::flt> intrinsic = getIntrinsic();
const cv::Mat_<math::flt> distortion = getDistortion();
const cv::Mat mNewCameraMatrix = cv::getOptimalNewCameraMatrix(intrinsic, distortion, myImageSize, 0);
// output arguments
cv::Mat undistortMapX;
cv::Mat undistortMapY;
// computes undistortion maps
cv::initUndistortRectifyMap(intrinsic, distortion, cv::Mat(),
newCameraMatrix, myImageSize, CV_16SC2,
undistortMapX, undistortMapY);
// computes undistortion maps
// ...computation of mapX and mapY omitted
cv::convertMaps(mapX, mapY, skewMapX, skewMapY, CV_16SC2);
for(;;) {
cv::Mat originalImage = getNewImage();
cv::Mat undistortedImage;
cv::remap(originalImage, undistortedImage, undistortMapX, undistortMapY, cv::INTER_LINEAR);
cv::Mat skewedImage;
cv::remap(undistortedImage, skewedImage, skewMapX, skewMapY, cv::INTER_LINEAR);
outputImage(skewedImage);
}
You can apply remap on undistortMapX and undistortMapY.
cv::remap(undistortMapX, undistrtSkewX, skewMapX, skewMapY, cv::INTER_LINEAR);
cv::remap(undistortMapY, undistrtSkewY, skewMapX, skewMapY, cv::INTER_LINEAR);
Than you can use:
cv::remap(originalImage , skewedImage, undistrtSkewX, undistrtSkewY, cv::INTER_LINEAR);
It works because skewMaps and undistortMaps are arrays of coordinates in image, so it should be similar to taking location of location...
Edit (answer to comments):
I think I need to make some clarification. remap() function calculates pixels in new image from pixels of old image. In case of linear interpolation each pixel in new image is a weighted average of 4 pixels from the old image. The weights differ from pixel to pixel according to values from provided maps. If the value is more or less integer, then most of the weight is taken from single pixel. As a result new image will be as sharp is original image. On the other hand, if the value is far from being integer (i.e. integer + 0.5) then the weights are similar. This will create smoothing effect. To get a feeling of what I am talking about, look at the undistorted image. You will see that some parts of the image are sharper/smoother than other parts.
Now back to the explanation about what happened when you combined two remap operations into one. The coordinates in combined maps are correct, i.e. pixel in skewedImage is calculated from correct 4 pixels of originalImage with correct weights. But it is not identical to result of two remap operations. Each pixel in undistortedImage is a weighted average of 4 pixels from originalImage. This means that each pixel of skewedImage would be a weighted average of 9-16 pixels from orginalImage. Conclusion: using single remap() can NOT possibly give result that is identical to two usages of remap().
Discussion about which of the two possible images (single remap() vs double remap()) is better is quite complicated. Normally it is good to make as little interpolations as possible, because each interpolation introduces different artifacts. Especially if the artifacts are not uniform in the image (some regions became more smooth than others). In some cases those artifacts may have good visual effect on the image - like reducing some of the jitter. But if this is what you want, you can achieve this in cheaper and more consistent ways. For example by smoothing original image prior to remaping.
In the case of two general mappings, there is no choice but to use the approach suggested by #MichaelBurdinov.
However, in the special case of two mappings with known inverse mappings, an alternative approach is to compute the maps manually. This manual approach is more accurate than the double remap one, since it does not involve interpolation of coordinate maps.
In practice, most of the interesting applications match this special case. It does too in your case because your first map corresponds to image undistortion (whose inverse operation is image distortion, which is associated to a well known analytical model) and your second map corresponds to a perspective transform (whose inverse can be expressed analytically).
Computing the maps manually is actually quite easy. As stated in the documentation (link) these maps contain, for each pixel in the destination image, the (x,y) coordinates where to find the appropriate intensity in the source image. The following code snippet shows how to compute the maps manually in your case:
int dst_width=...,dst_height=...; // Initialize the size of the output image
cv::Mat Hinv=H.inv(), Kinv=K.inv(); // Precompute the inverse perspective matrix and the inverse camera matrix
cv::Mat map_undist_warped_x32f(dst_height,dst_width,CV_32F); // Allocate the x map to the correct size (n.b. the data type used is float)
cv::Mat map_undist_warped_y32f(dst_height,dst_width,CV_32F); // Allocate the y map to the correct size (n.b. the data type used is float)
// Loop on the rows of the output image
for(int y=0; y<dst_height; ++y) {
std::vector<cv::Point3f> pts_undist_norm(dst_width);
// For each pixel on the current row, first use the inverse perspective mapping, then multiply by the
// inverse camera matrix (i.e. map from pixels to normalized coordinates to prepare use of projectPoints function)
for(int x=0; x<dst_width; ++x) {
cv::Mat_<float> pt(3,1); pt << x,y,1;
pt = Kinv*Hinv*pt;
pts_undist_norm[x].x = pt(0)/pt(2);
pts_undist_norm[x].y = pt(1)/pt(2);
pts_undist_norm[x].z = 1;
}
// For each pixel on the current row, compose with the inverse undistortion mapping (i.e. the distortion
// mapping) using projectPoints function
std::vector<cv::Point2f> pts_dist;
cv::projectPoints(pts_undist_norm,cv::Mat::zeros(3,1,CV_32F),cv::Mat::zeros(3,1,CV_32F),intrinsic,distortion,pts_dist);
// Store the result in the appropriate pixel of the output maps
for(int x=0; x<dst_width; ++x) {
map_undist_warped_x32f.at<float>(y,x) = pts_dist[x].x;
map_undist_warped_y32f.at<float>(y,x) = pts_dist[x].y;
}
}
// Finally, convert the float maps to signed-integer maps for best efficiency of the remap function
cv::Mat map_undist_warped_x16s,map_undist_warped_y16s;
cv::convertMaps(map_undist_warped_x32f,map_undist_warped_y32f,map_undist_warped_x16s,map_undist_warped_y16s,CV_16SC2);
Note: H above is your perspective transform while Kshould be the camera matrix associated with the undistorted image, so it should be what in your code is called newCameraMatrix (which BTW is not an output argument of initUndistortRectifyMap). Depending on your specific data, there might also be some additional cases to handle (e.g. division by pt(2) when it might be zero, etc).
I found this question when looking to combine dewarping (undistortion) and projection tranforms in python, but there is no direct python answer.
Here is an direct conversion of BConic's answer in python
import numpy as np
import cv2
dst_width = ...
dst_height = ...
h_inv = np.linalg.inv(h)
k_inv = np.linalg.inv(new_camera_matrix)
map_x = np.zeros((dst_height, dst_width), dtype=np.float32)
map_y = np.zeros((dst_height, dst_width), dtype=np.float32)
for y in range(dst_height):
pts_undist_norm = np.zeros((dst_width, 3, 1))
for x in range(dst_width):
pt = np.array([x, y, 1]).reshape(3,1)
pt2 = k_inv # h_inv # pt
pts_undist_norm[x][0] = pt2[0]/pt2[2]
pts_undist_norm[x][1] = pt2[1]/pt2[2]
pts_undist_norm[x][2] = 1
r_vec = np.zeros((3,1))
t_vec = np.zeros((3,1))
pts_dist, _ = cv2.projectPoints(pts_undist_norm, r_vec, t_vec, intrinsic, distortion)
pts_dist = pts_dist.squeeze()
for x2 in range(dst_width):
map_x[y][x2] = pts_dist[x2][0]
map_y[y][x2] = pts_dist[x2][1]
# using CV_16SC2 introduced substantial image artifacts for me
map_x_final, map_y_final = cv2.convertMaps(map_x, map_y, cv2.CV_32FC1, cv2.CV_32FC1)
This is obviously really slow since it is using a double for loop and iterating through every pixel, so you can do it much faster using numpy. You should be able to do something similar in C++ to eliminate the for loops and do a single matrix multiplication.
import numpy as np
import cv2
dst_width = ...
dst_height = ...
h_inv = np.linalg.inv(h)
k_inv = np.linalg.inv(new_camera_matrix)
m_grid = np.mgrid[0:dst_width, 0:dst_height].reshape(2, dst_height*dst_width)
m_grid = np.insert(m_grid, 2, 1, axis=0)
m_grid_result = k_inv # h_inv # m_grid
pts_undist_norm = m_grid_result[:2, :] / m_grid_result[2, :]
pts_undist_norm = np.insert(pts_undist_norm, 2, 1, axis=0)
r_vec = np.zeros((3,1))
t_vec = np.zeros((3,1))
pts_dist = cv2.projectPoints(pts_undist_norm, r_vec, t_vec, intrinsic, distortion)
pts_dist = pts_dist.squeeze().astype(np.float32)
map_x = pts_dist[:, 0].reshape(dst_width, dst_height).swapaxes(0,1)
map_y = pts_dist[:, 1].reshape(dst_width, dst_height).swapaxes(0,1)
# using CV_16SC2 introduced substantial image artifacts for me
map_x_final, map_y_final = cv2.convertMaps(map_x, map_y, cv2.CV_32FC1, cv2.CV_32FC1)
This numpy implementation is roughly 25-75x faster than the first method.
I came across the same problem. I tried to implement AldurDisciple's answer. Instead of calculating transformation in a loop. I'm having a mat with mat.at <Vec2f>(x,y)=Vec2f(x,y) and applying perspectiveTransform to this mat. Add a 3rd channel of "1" to the result mat and apply projectPoints.
Here is my code
Mat xy(2000, 2500, CV_32FC2);
float *pxy = (float*)xy.data;
for (int y = 0; y < 2000; y++)
for (int x = 0; x < 2500; x++)
{
*pxy++ = x;
*pxy++ = y;
}
// perspective transformation of coordinates of destination image,
// which generates the map from destination image to norm points
Mat pts_undist_norm(2000, 2500, CV_32FC2);
Mat matPerspective =transRot3x3;
perspectiveTransform(xy, pts_undist_norm, matPerspective);
//add 3rd channel of 1
vector<Mat> channels;
split(pts_undist_norm, channels);
Mat channel3(2000, 2500, CV_32FC1, cv::Scalar(float(1.0)));
channels.push_back(channel3);
Mat pts_undist_norm_3D(2000, 2500, CV_32FC3);
merge(channels, pts_undist_norm_3D);
//projectPoints to extend the map from norm points back to the original captured image
pts_undist_norm_3D = pts_undist_norm_3D.reshape(0, 5000000);
Mat pts_dist(5000000, 1, CV_32FC2);
projectPoints(pts_undist_norm_3D, Mat::zeros(3, 1, CV_64F), Mat::zeros(3, 1, CV_64F), intrinsic, distCoeffs, pts_dist);
Mat maps[2];
pts_dist = pts_dist.reshape(0, 2000);
split(pts_dist, maps);
// apply map
remap(originalImage, skewedImage, maps[0], maps[1], INTER_LINEAR);
The transformation matrix used to map to norm points is a bit different from the one used in AldurDisciple's answer. transRot3x3 is composed from tvec and rvec generated by calibrateCamera.
double transData[] = { 0, 0, tvecs[0].at<double>(0), 0, 0,
tvecs[0].at<double>(1), 0, 0, tvecs[0].at<double>(2) };
Mat translate3x3(3, 3, CV_64F, transData);
Mat rotation3x3;
Rodrigues(rvecs[0], rotation3x3);
Mat transRot3x3(3, 3, CV_64F);
rotation3x3.col(0).copyTo(transRot3x3.col(0));
rotation3x3.col(1).copyTo(transRot3x3.col(1));
translate3x3.col(2).copyTo(transRot3x3.col(2));
Added:
I realized if the only needed map is the final map why not just use projectPoints to a mat with mat.at(x,y)=Vec2f(x,y,0) .
//generate a 3-channel mat with each entry containing it's own coordinates
Mat xyz(2000, 2500, CV_32FC3);
float *pxyz = (float*)xyz.data;
for (int y = 0; y < 2000; y++)
for (int x = 0; x < 2500; x++)
{
*pxyz++ = x;
*pxyz++ = y;
*pxyz++ = 0;
}
// project coordinates of destination image,
// which generates the map from destination image to source image directly
xyz=xyz.reshape(0, 5000000);
Mat pts_dist(5000000, 1, CV_32FC2);
projectPoints(xyz, rvecs[0], tvecs[0], intrinsic, distCoeffs, pts_dist);
Mat maps[2];
pts_dist = pts_dist.reshape(0, 2000);
split(pts_dist, maps);
//apply map
remap(originalImage, skewedImage, maps[0], maps[1], INTER_LINEAR);

opencv function transpose() giving false results, memory leak?

I have a matrix of integer data type having dimensions 100 x 7000. I want to transpose it. I've used transpose() function from opencv library. but it gives the false results. Most of the values becomes floating point numbers and very high, which are not present in the original matrix. Here is my code
cv::Mat data; //data matrix with integer values, dimension is 100 x 7000
cv::Mat data_tp = cv::Mat(data.cols, data.rows, CV_32F);
cv::transpose(data, data_tp);
I think this might be the problem of memory leak or any sort of memory mismanagement. because this is just a part of a big code. Any tips regarding the memory management or anyone else faced this issue??
cv::Mat data; //data matrix with integer values, dimension is 100 x 7000
// here are 2 problems:
// - you never need to pre-allocate the result.
// - you try to transpose an int Mat into a float one.
cv::Mat data_tp = cv::Mat(data.cols, data.rows, CV_32F);
cv::transpose(data, data_tp);
// instead, just use:
cv::Mat data_tp = data.t();

Cumulative Homography Wrongly Scaling

I'm to build a panorama image of the ground covered by a downward facing camera (at a fixed height, around 1 metre above ground). This could potentially run to thousands of frames, so the Stitcher class' built in panorama method isn't really suitable - it's far too slow and memory hungry.
Instead I'm assuming the floor and motion is planar (not unreasonable here) and trying to build up a cumulative homography as I see each frame. That is, for each frame, I calculate the homography from the previous one to the new one. I then get the cumulative homography by multiplying that with the product of all previous homographies.
Let's say I get H01 between frames 0 and 1, then H12 between frames 1 and 2. To get the transformation to place frame 2 onto the mosaic, I need to get H01*H12. This continues as the frame count increases, such that I get H01*H12*H23*H34*H45*....
In code, this is something akin to:
cv::Mat previous, current;
// Init cumulative homography
cv::Mat cumulative_homography = cv::Mat::eye(3);
video_stream >> previous;
for(;;) {
video_stream >> current;
// Here I do some checking of the frame, etc
// Get the homography using my DenseMosaic class (using Farneback to get OF)
cv::Mat tmp_H = DenseMosaic::get_homography(previous,current);
// Now normalise the homography by its bottom right corner
tmp_H /= tmp_H.at<double>(2, 2);
cumulative_homography *= tmp_H;
previous = current.clone( );
}
It works pretty well, except that as the camera moves "up" in the viewpoint, the homography scale decreases. As it moves down, the scale increases again. This gives my panoramas a perspective type effect that I really don't want.
For example, this is taken on a few seconds of video moving forward then backward. The first frame looks ok:
The problem comes as we move forward a few frames:
Then when we come back again, you can see the frame gets bigger again:
I'm at a loss as to where this is coming from.
I'm using Farneback dense optical flow to calculate pixel-pixel correspondences as below (sparse feature matching doesn't work well on this data) and I've checked my flow vectors - they're generally very good, so it's not a tracking problem. I also tried switching the order of the inputs to find homography (in case I'd mixed up the frame numbers), still no better.
cv::calcOpticalFlowFarneback(grey_1, grey_2, flow_mat, 0.5, 6,50, 5, 7, 1.5, flags);
// Using the flow_mat optical flow map, populate grid point correspondences between images
std::vector<cv::Point2f> points_1, points_2;
median_motion = DenseMosaic::dense_flow_to_corresp(flow_mat, points_1, points_2);
cv::Mat H = cv::findHomography(cv::Mat(points_2), cv::Mat(points_1), CV_RANSAC, 1);
Another thing I thought it could be was the translation I include in the transformation to ensure my panorama is centred within the scene:
cv::warpPerspective(init.clone(), warped, translation*homography, init.size());
But having checked the values in the homography before the translation is applied, the scaling issue I mention is still present.
Any hints are gratefully received. There's a lot of code I could put in but it seems irrelevant, please do let me know if there's something missing
UPDATE
I've tried switching out the *= operator for the full multiplication and tried reversing the order the homographies are multiplied in, but no luck. Below is my code for calculating the homography:
/**
\brief Calculates the homography between the current and previous frames
*/
cv::Mat DenseMosaic::get_homography()
{
cv::Mat grey_1, grey_2; // Grayscale versions of frames
cv::cvtColor(prev, grey_1, CV_BGR2GRAY);
cv::cvtColor(cur, grey_2, CV_BGR2GRAY);
// Calculate the dense flow
int flags = cv::OPTFLOW_FARNEBACK_GAUSSIAN;
if (frame_number > 2) {
flags = flags | cv::OPTFLOW_USE_INITIAL_FLOW;
}
cv::calcOpticalFlowFarneback(grey_1, grey_2, flow_mat, 0.5, 6,50, 5, 7, 1.5, flags);
// Convert the flow map to point correspondences
std::vector<cv::Point2f> points_1, points_2;
median_motion = DenseMosaic::dense_flow_to_corresp(flow_mat, points_1, points_2);
// Use the correspondences to get the homography
cv::Mat H = cv::findHomography(cv::Mat(points_2), cv::Mat(points_1), CV_RANSAC, 1);
return H;
}
And this is the function I use to find the correspondences from the flow map:
/**
\brief Calculate pixel->pixel correspondences given a map of the optical flow across the image
\param[in] flow_mat Map of the optical flow across the image
\param[out] points_1 The set of points from #cur
\param[out] points_2 The set of points from #prev
\param[in] step_size The size of spaces between the grid lines
\return The median motion as a point
Uses a dense flow map (such as that created by cv::calcOpticalFlowFarneback) to obtain a set of point correspondences across a grid.
*/
cv::Point2f DenseMosaic::dense_flow_to_corresp(const cv::Mat &flow_mat, std::vector<cv::Point2f> &points_1, std::vector<cv::Point2f> &points_2, int step_size)
{
std::vector<double> tx, ty;
for (int y = 0; y < flow_mat.rows; y += step_size) {
for (int x = 0; x < flow_mat.cols; x += step_size) {
/* Flow is basically the delta between left and right points */
cv::Point2f flow = flow_mat.at<cv::Point2f>(y, x);
tx.push_back(flow.x);
ty.push_back(flow.y);
/* There's no need to calculate for every single point,
if there's not much change, just ignore it
*/
if (fabs(flow.x) < 0.1 && fabs(flow.y) < 0.1)
continue;
points_1.push_back(cv::Point2f(x, y));
points_2.push_back(cv::Point2f(x + flow.x, y + flow.y));
}
}
// I know this should be median, not mean, but it's only used for plotting the
// general motion direction so it's unimportant.
cv::Point2f t_median;
cv::Scalar mtx = cv::mean(tx);
t_median.x = mtx[0];
cv::Scalar mty = cv::mean(ty);
t_median.y = mty[0];
return t_median;
}
It turns out this was because my viewpoint was close to the features, meaning that the non-planarity of the tracked features was causing skew to the homography. I managed to prevent this (it's more of a hack than a method...) by using estimateRigidTransform instead of findHomography, as this does not estimate for perspective variations.
In this particular case, it makes sense to do so, as the view does only ever undergo rigid transformations.