C++ OpenCV use vector<Point> as index of a matrix

C++ OpenCV use vector<Point> as index of a matrix - c++

I have a matrix img (480*640 pixel, float 64 bits) on which I apply a complex mask. After this, I need to multiply my matrix by a value but in order to win time I want to do this multiplication only on the non-zero elements because for now the multiplication is too long because I have to iterate the operation 2000 times on 2000 different matrix but with the same mask. So I found the index (on x/y axes) of the nonzero pixels which I keep in a vector of Point. But I don't succeed to use this vector to do the multplication only on the pixels indexed in this same vector.
Here is an example (with a simple mask) to understand my problem :
Mat img_temp(480, 640, CV_64FC1);
Mat img = img_temp.clone();
Mat mask = Mat::ones(img.size(), CV_8UC1);
double value = 3.56;
// Apply mask
img_temp.copyTo(img, mask);
// Finding non zero elements
vector<Point> nonZero;
findNonZero(img, nonZero);
// Previous multiplication (long because on all pixels)
Mat result = img.clone()*value;
// What I wish to do : multiplication only on non-zero pixels (not functional)
Mat result = Mat::zeros(img.size(), CV_64FC1);
result.at<int>(nonZero) = img.at(nonZero).clone() * value
What is tricky is that my pixels are not on a range (for example pixels 3, 4 and 50, 51 on a line).
Thank you in advance.

I would suggest using Mat.convertTo.
Basically, for the parameter alpha, which is the scaling factor, use the value of the mask (3.56 in your case). Make sure that the Mat is of type CV_32 or CV_64.
This will be faster than finding all non-zero pixels, saving their coordinates in a Vector and iterating (it was faster for me in Java).
Hope it helps!

Constructing vector of points will also increase computation time. I think you should consider iterating over all pixels and multiply if the pixel is not equal to zero.
Iterating will be faster if you have the matrix as raw data.

If you do
Mat result = img*value;
Instead of
Mat result = img.clone()*value;
The speed will be almost 10 times as fast
I have also tested your suggestion with vector but this is even slower than your first solution.
Below the code I used to test your firs suggestion
cv::Mat multMask(cv::Mat &img, std::vector<cv::Point> mask, double fact)
{
if (img.type() != CV_64FC1) throw "invalid format";
cv::Mat res = cv::Mat::zeros(img.size(), img.type());
int iLen = (int)mask.size();
for (int i = 0; i < iLen; i++)
{
cv::Point &p = mask[i];
((double*)(res.data + res.step.p[0] * p.y))[p.x] = ((double*)(img.data + img.step.p[0] * p.y))[p.x] * fact;
}
return res;
}

Related

OpenCV - RGB Channels in Float Data Type and Intensity Range within 0-255

How can I achieve the values of the RGB channels as
Float data type
Intensity range within 0-255
I used CV_32FC4 as the matrix type since I'll perform floating-point mathematical operations to implement Daltonization. I was expecting that the intensity range is the same with the intensity range of the RGB Channels in CV_8UC3, just having a different data type. But when I printed the matrix I noticed that the intensities of the channels are not within 0-255. I realized that it due to the range of the float matrix type.
Mat mFrame(height, width, CV_32FC4, (unsigned char *)pNV21FrameData);
for(int y = 0 ; y < height ; y++){
for(int x = 0 ; x < width ; x++){
Vec4f BGRA = mFrame.at<Vec4f>(y,x);
// Algorithm Implementation
mFrame.at<Vec4f>(y,x) = BGRA;
}
}
Mat mResult;
mFrame.convertTo(mResult, CV_8UC4, 1.0/255.0);
I need to manipulate the pixels like BGRA[0] = BGRA[0] * n; then assign it back to the matrix.

By your comments and the link in it I see that the data comes in BGRA. The data is in uchar.
I assume this from this line:
Mat mResult(height, width, CV_8UC4, (unsigned char *)poutPixels);
To solve this you can create the matrix and then convert it to float.
Mat mFrame(height, width, CV_8UC4, (unsigned char *)pNV21FrameData);
Mat mFloatFrame;
mFrame.convertTo(mFloatFrame, CV_32FC4);
Notice that this will keep the current ranges (0-255) if you need another one (like 0-1) you may put the scaling factor.
Finally you can convert back, but beware that this function does saturate_cast. If you have an specific way you want to manage the overflow or the decimals, you will have to do it before converting it.
Mat mResult;
mFloatFrame.convertTo(mResult, CV_8UC4);
Note that 1.0/255.0 is not there, since the data is already in the range of 0-255 (at least before the operations).
One final comment, the link in your comments use IplImage and other old C (deprecated) versions of OpenCv. If you are working in c++, stick to the c++ versions like Mat. This is not in the code you show here, but in the you linked. This comment is more for you to avoid future headaches.

Replace a chain of image blurs with one blur

In this question I asked how to implement a chain of blurs in one single step.
Then I found out from the gaussian blur page of Wikipedia that:
Applying multiple, successive gaussian blurs to an image has the same
effect as applying a single, larger gaussian blur, whose radius is the
square root of the sum of the squares of the blur radii that were
actually applied. For example, applying successive gaussian blurs with
radii of 6 and 8 gives the same results as applying a single gaussian
blur of radius 10, since sqrt {6^{2}+8^{2}}=10.
So I thought that blur and singleBlur were the same in the following code:
cv::Mat firstLevel;
float sigma1, sigma2;
//intialize firstLevel, sigma1 and sigma2
cv::Mat blur = gaussianBlur(firstLevel, sigma1);
blur = gaussianBlur(blur, sigma2);
float singleSigma = std::sqrt(std::pow(sigma1,2)+std::pow(sigma2,2));
cv::Mat singleBlur = gaussianBlur(firstLevel, singleSigma);
cv::Mat diff = blur != singleBLur;
// Equal if no elements disagree
assert( cv::countNonZero(diff) == 0);
But this assert fails (and actually, for example, the first row of blur is different from the first one of singleBlur).
Why?
UPDATE:
After different comments asking for more information, I'll update the answer.
What I'm trying to do is to parallelize this code. In particular, I'm focusing now on computing all the blurs at all levels in advance. The serial code (which works correctly) is the following:
vector<Mat> blurs ((par.numberOfScales+3)*levels, Mat());
cv::Mat octaveLayer = firstLevel;
int scaleCycles = par.numberOfScales+2;
//compute blurs at all layers (not parallelizable)
for(int i=0; i<levels; i++){
blurs[i*scaleCycles+1] = octaveLayer.clone();
for (int j = 1; j < scaleCycles; j++){
float sigma = par.sigmas[j]* sqrt(sigmaStep * sigmaStep - 1.0f);
blurs[j+1+i*scaleCycles] = gaussianBlur(blurs[j+i*scaleCycles], sigma);
if(j == par.numberOfScales)
octaveLayer = halfImage(blurs[j+1+i*scaleCycles]);
}
}
Where:
Mat halfImage(const Mat &input)
{
Mat n(input.rows/2, input.cols/2, input.type());
float *out = n.ptr<float>(0);
for (int r = 0, ri = 0; r < n.rows; r++, ri += 2)
for (int c = 0, ci = 0; c < n.cols; c++, ci += 2)
*out++ = input.at<float>(ri,ci);
return n;
}
Mat gaussianBlur(const Mat input, const float sigma)
{
Mat ret(input.rows, input.cols, input.type());
int size = (int)(2.0 * 3.0 * sigma + 1.0); if (size % 2 == 0) size++;
GaussianBlur(input, ret, Size(size, size), sigma, sigma, BORDER_REPLICATE);
return ret;
}
I'm sorry for the horrible indexes above, but I tried to respect the original code system (which is horrible, like starting counting from 1 instead of 0). The code above has scaleCycles=5 and levels=6, so 30 blurs are generated in total.
This is the "single blur" version, where first I compute the sigmas for each blur that has to be computed (following Wikipedia's formula) and then I apply the blur (notice that this is still serial and not parallelizable):
vector<Mat> singleBlurs ((par.numberOfScales+3)*levels, Mat());
vector<float> singleSigmas(scaleCycles);
float acc = 0;
for (int j = 1; j < scaleCycles; j++){
float sigma = par.sigmas[j]* sqrt(sigmaStep * sigmaStep - 1.0f);
acc += pow(sigma, 2);
singleSigmas[j] = sqrt(acc);
}
octaveLayer = firstLevel;
for(int i=0; i<levels; i++){
singleBlurs[i*scaleCycles+1] = octaveLayer.clone();
for (int j = 1; j < scaleCycles; j++){
float sigma = singleSigmas[j];
std::cout<<"j="<<j<<" sigma="<<sigma<<std::endl;
singleBlurs[j+1+i*scaleCycles] = gaussianBlur(singleBlurs[j+i*scaleCycles], sigma);
if(j == par.numberOfScales)
octaveLayer = halfImage(singleBlurs[j+1+i*scaleCycles]);
}
}
Of course the code above generates 30 blurs also with the same parameters of the previous version.
And then this is the code to see the difference between each signgleBlurs and blurs:
assert(blurs.size() == singleBlurs.size());
vector<Mat> blurDiffs(blurs.size());
for(int i=1; i<levels*scaleCycles; i++){
cv::Mat diff;
absdiff(blurs[i], singleBlurs[i], diff);
std::cout<<"i="<<i<<"diff rows="<<diff.rows<<" cols="<<diff.cols<<std::endl;
blurDiffs[i] = diff;
std::cout<<"blurs rows="<<blurs[i].rows<<" cols="<<blurs[i].cols<<std::endl;
std::cout<<"singleBlurs rows="<<singleBlurs[i].rows<<" cols="<<singleBlurs[i].cols<<std::endl;
std::cout<<"blurDiffs rows="<<blurDiffs[i].rows<<" cols="<<blurDiffs[i].cols<<std::endl;
namedWindow( "blueDiffs["+std::to_string(i)+"]", WINDOW_AUTOSIZE );// Create a window for display.
//imshow( "blueDiffs["+std::to_string(i)+"]", blurDiffs[i] ); // Show our image inside it.
//waitKey(0); // Wait for a keystroke in the window
Mat imageF_8UC3;
std::cout<<"converting..."<<std::endl;
blurDiffs[i].convertTo(imageF_8UC3, CV_8U, 255);
std::cout<<"converted"<<std::endl;
imwrite( "blurDiffs_"+std::to_string(i)+".jpg", imageF_8UC3);
}
Now, what I've seen is that blurDiffs_1.jpg and blurDiffs_2.jpg are black, but suddendly from blurDiffs_3.jpg until the blurDiffs_29.jpg becomes whiter and whiter. For some reason, blurDiffs_30.jpg is almost completely black.
The first (correct) version generates 1761 descriptors. The second (uncorrect) version generates >2.3k descriptors.
I can't post the blurDiffs matrices because (especially the first ones) are very big and post's space is limited. I'll post some samples. I'll not post blurDiffs_1.jpg and blurDiffs_2.jpg because they're totally blacks. Notice that because of halfImage the images become smaller and smaller (as expected).
blurDiffs_3.jpg:
blurDiffs_6.jpg:
blurDiffs_15.jpg:
blurDiffs_29.jpg:
How the image is read:
Mat tmp = imread(argv[1]);
Mat image(tmp.rows, tmp.cols, CV_32FC1, Scalar(0));
float *out = image.ptr<float>(0);
unsigned char *in = tmp.ptr<unsigned char>(0);
for (size_t i=tmp.rows*tmp.cols; i > 0; i--)
{
*out = (float(in[0]) + in[1] + in[2])/3.0f;
out++;
in+=3;
}
Someone here suggested to divide diff by 255 to see the real difference, but I don't understand why of I understood him correctly.
If you need any more details, please let me know.

A warning up front: I have no experience with OpenCV, but the following is relevant to computing Gaussian blur in general. And it is applicable as I threw a glance at the OpenCV documentation wrt border treatment and use of finite kernels (FIR filtering).
As an aside: your initial test was sensitive to round-off, but you have cleared up that issue and shown the errors to be much larger.
Beware image border effects. For pixels near the edge, the image is virtually extended using one of the offered methods (BORDER_DEFAULT, BORDER_REPLICATE, etc...). If your image is |abcd| and you use BORDER_REPLICATE you are effectively filtering an extended image aaaa|abcd|dddd. The result is klmn|opqr|stuv. There are new pixel values (k,l,m,n,s,t,u,v) that are immediately discarded to yield the output image |opqr|. If you now apply another Gaussian blur, this blur will operate on a newly extended image oooo|opqr|rrrr - different from the "true" intermediate result and thus giving you a result different from that obtained by a single Gaussian blur with a larger sigma. These extension methods are safe though: REFLECT, REFLECT_101, WRAP.
Using a finite kernel size the G(s1)*G(s2)=G(sqrt(s1^2+s2^2)) rule does not hold in general due to the tails of the kernel being cut off. You can reduce the error thus introduced by increasing the kernel size relative to the sigma, e.g.:
int size = (int)(2.0 * 10.0 * sigma + 1.0); if (size % 2 == 0) size++;
Point 3 seems to be the issue that is "biting" you. Do you really care whether the G(s1)*G(s2) property is preserved. Both results are wrong in a way. Does it affect the methodology that works on the result in a major way? Note that the example of using 10x sigma I have given may resolve the difference, but will be very much slower.
Update: I forgot to add what might the most practical resolution. Compute the Gaussian blur using a Fourier transform. The scheme would be:
Compute Fourier transform (FFT) of your input image
Multiply with the Fourier transform of the Gaussian kernel and compute inverse Fourier transform. Ignore the imaginary part of the complex output.
You can find the equation for the Gaussian in the frequency domain on wikipedia
You can perform the second step separately (i.e. in parallel) for each scale (sigma). The border condition implied computing the blur this way is BORDER_WRAP. If you prefer you can achieve the same but with BORDER_REFLECT if you use a discrete cosine transform (DCT) instead. Do not know if OpenCV provides one. You will be after the DCT-II

It's basically as what G.M. says. Remember you are not only rounding by floating points, you are also rounding by looking only at integer points (both on the image and on the Gaussian kernels).
Here's what I got from a small (41x41) image:
where blur and single are rounded by convertTo(...,CV8U) and diff is where they are different. So, in DSP terms, it may not be a great agreement. But in Image Processing, it's not that bad.
Also, I suspect that the different will be less significant as you perform Gaussian on bigger images.

Why does StereoSGBM give negative and above numberOfDisparities numbers

I'm writing a function in OpenCV to compute v and u-disparities, so I need first the disparity image. I set sgbm.minDisparity = 0 and numberOfDisparities = 160.
The disparity image is CV_16SC1, and I need Unsigned values to go on programming my function.
I printed the whole Mat and there are negative values and values above 160. If I understood well the documentation, the disparity image represents the disparity values*16. Does that mean that the maximum value is 16*160 in my case?. If not, what could be wrong?. And anyway, why there are values less than zero when minDisparity is set to 0? Here's the code:
void Stereo_SGBM(){
int numberOfDisparities;
StereoSGBM sgbm;
Mat img1, img2;
img1=left_frame; //left and right frames are global variables
img2=right_frame;
Size img_size = img1.size();
//I make sure the number of disparities is divisible by 16
numberOfDisparities = 160;
int cn=1; //Grayscale
sgbm.preFilterCap = 63;
sgbm.SADWindowSize = 3;
sgbm.P1 = 8*cn*sgbm.SADWindowSize*sgbm.SADWindowSize;
sgbm.P2 = 32*cn*sgbm.SADWindowSize*sgbm.SADWindowSize;
sgbm.minDisparity = 0;
sgbm.numberOfDisparities = numberOfDisparities;
sgbm.uniquenessRatio = 10;
sgbm.speckleWindowSize = 100;
sgbm.speckleRange = 2;
sgbm.disp12MaxDiff = 1;
sgbm.fullDP = false;
Mat disp; // CV_16SC1
Mat disp8; //CV_8UC1 (used later in the code
sgbm(img1, img2, disp);
//disp contains negative values and larger than 160!!!
//img1 and img2 are left and right channels of size 1242x375 grayscale
}

The way I see it, the disparity is meant to be a float, and it reflects on the parameters. If you convert the result to float, and divide by 16, things makes a little more sense:
The algorithm apparently reports -1 (actually minDisparity - 1) where it could not match. And numberOfDisparities is more "max disparity - min disparity", rather than an actual number of values.
For example, if you give minDisparity=2 and numberOfDisparities=144, you will get results in the range: 1.0 - 145.0. The number of different values will actually be 144*16 because it goes in 1/16 increments.
So yes, in your case, using integers, this means you will get 16*160 max value.

How to combine two remap() operations into one?

I have a tight loop, where I get a camera image, undistort it and also transform it according to some transformation (e.g. a perspective transform). I already figured out to use cv::remap(...) for each operation, which is already much more efficient than using plain matrix operations.
In my understanding it should be possible to combine the lookup maps into one and call remap just once in every loop iteration. Is there a canonical way to do this? I would prefer not to implement all the interpolation stuff myself.
Note: The procedure should work with differently sized maps. In my particular case the undistortion preserves the image dimensions, while the other transformation scales the image to a different size.
Code for illustration:
// input arguments
const cv::Mat_<math::flt> intrinsic = getIntrinsic();
const cv::Mat_<math::flt> distortion = getDistortion();
const cv::Mat mNewCameraMatrix = cv::getOptimalNewCameraMatrix(intrinsic, distortion, myImageSize, 0);
// output arguments
cv::Mat undistortMapX;
cv::Mat undistortMapY;
// computes undistortion maps
cv::initUndistortRectifyMap(intrinsic, distortion, cv::Mat(),
newCameraMatrix, myImageSize, CV_16SC2,
undistortMapX, undistortMapY);
// computes undistortion maps
// ...computation of mapX and mapY omitted
cv::convertMaps(mapX, mapY, skewMapX, skewMapY, CV_16SC2);
for(;;) {
cv::Mat originalImage = getNewImage();
cv::Mat undistortedImage;
cv::remap(originalImage, undistortedImage, undistortMapX, undistortMapY, cv::INTER_LINEAR);
cv::Mat skewedImage;
cv::remap(undistortedImage, skewedImage, skewMapX, skewMapY, cv::INTER_LINEAR);
outputImage(skewedImage);
}

You can apply remap on undistortMapX and undistortMapY.
cv::remap(undistortMapX, undistrtSkewX, skewMapX, skewMapY, cv::INTER_LINEAR);
cv::remap(undistortMapY, undistrtSkewY, skewMapX, skewMapY, cv::INTER_LINEAR);
Than you can use:
cv::remap(originalImage , skewedImage, undistrtSkewX, undistrtSkewY, cv::INTER_LINEAR);
It works because skewMaps and undistortMaps are arrays of coordinates in image, so it should be similar to taking location of location...
Edit (answer to comments):
I think I need to make some clarification. remap() function calculates pixels in new image from pixels of old image. In case of linear interpolation each pixel in new image is a weighted average of 4 pixels from the old image. The weights differ from pixel to pixel according to values from provided maps. If the value is more or less integer, then most of the weight is taken from single pixel. As a result new image will be as sharp is original image. On the other hand, if the value is far from being integer (i.e. integer + 0.5) then the weights are similar. This will create smoothing effect. To get a feeling of what I am talking about, look at the undistorted image. You will see that some parts of the image are sharper/smoother than other parts.
Now back to the explanation about what happened when you combined two remap operations into one. The coordinates in combined maps are correct, i.e. pixel in skewedImage is calculated from correct 4 pixels of originalImage with correct weights. But it is not identical to result of two remap operations. Each pixel in undistortedImage is a weighted average of 4 pixels from originalImage. This means that each pixel of skewedImage would be a weighted average of 9-16 pixels from orginalImage. Conclusion: using single remap() can NOT possibly give result that is identical to two usages of remap().
Discussion about which of the two possible images (single remap() vs double remap()) is better is quite complicated. Normally it is good to make as little interpolations as possible, because each interpolation introduces different artifacts. Especially if the artifacts are not uniform in the image (some regions became more smooth than others). In some cases those artifacts may have good visual effect on the image - like reducing some of the jitter. But if this is what you want, you can achieve this in cheaper and more consistent ways. For example by smoothing original image prior to remaping.

In the case of two general mappings, there is no choice but to use the approach suggested by #MichaelBurdinov.
However, in the special case of two mappings with known inverse mappings, an alternative approach is to compute the maps manually. This manual approach is more accurate than the double remap one, since it does not involve interpolation of coordinate maps.
In practice, most of the interesting applications match this special case. It does too in your case because your first map corresponds to image undistortion (whose inverse operation is image distortion, which is associated to a well known analytical model) and your second map corresponds to a perspective transform (whose inverse can be expressed analytically).
Computing the maps manually is actually quite easy. As stated in the documentation (link) these maps contain, for each pixel in the destination image, the (x,y) coordinates where to find the appropriate intensity in the source image. The following code snippet shows how to compute the maps manually in your case:
int dst_width=...,dst_height=...; // Initialize the size of the output image
cv::Mat Hinv=H.inv(), Kinv=K.inv(); // Precompute the inverse perspective matrix and the inverse camera matrix
cv::Mat map_undist_warped_x32f(dst_height,dst_width,CV_32F); // Allocate the x map to the correct size (n.b. the data type used is float)
cv::Mat map_undist_warped_y32f(dst_height,dst_width,CV_32F); // Allocate the y map to the correct size (n.b. the data type used is float)
// Loop on the rows of the output image
for(int y=0; y<dst_height; ++y) {
std::vector<cv::Point3f> pts_undist_norm(dst_width);
// For each pixel on the current row, first use the inverse perspective mapping, then multiply by the
// inverse camera matrix (i.e. map from pixels to normalized coordinates to prepare use of projectPoints function)
for(int x=0; x<dst_width; ++x) {
cv::Mat_<float> pt(3,1); pt << x,y,1;
pt = Kinv*Hinv*pt;
pts_undist_norm[x].x = pt(0)/pt(2);
pts_undist_norm[x].y = pt(1)/pt(2);
pts_undist_norm[x].z = 1;
}
// For each pixel on the current row, compose with the inverse undistortion mapping (i.e. the distortion
// mapping) using projectPoints function
std::vector<cv::Point2f> pts_dist;
cv::projectPoints(pts_undist_norm,cv::Mat::zeros(3,1,CV_32F),cv::Mat::zeros(3,1,CV_32F),intrinsic,distortion,pts_dist);
// Store the result in the appropriate pixel of the output maps
for(int x=0; x<dst_width; ++x) {
map_undist_warped_x32f.at<float>(y,x) = pts_dist[x].x;
map_undist_warped_y32f.at<float>(y,x) = pts_dist[x].y;
}
}
// Finally, convert the float maps to signed-integer maps for best efficiency of the remap function
cv::Mat map_undist_warped_x16s,map_undist_warped_y16s;
cv::convertMaps(map_undist_warped_x32f,map_undist_warped_y32f,map_undist_warped_x16s,map_undist_warped_y16s,CV_16SC2);
Note: H above is your perspective transform while Kshould be the camera matrix associated with the undistorted image, so it should be what in your code is called newCameraMatrix (which BTW is not an output argument of initUndistortRectifyMap). Depending on your specific data, there might also be some additional cases to handle (e.g. division by pt(2) when it might be zero, etc).

I found this question when looking to combine dewarping (undistortion) and projection tranforms in python, but there is no direct python answer.
Here is an direct conversion of BConic's answer in python
import numpy as np
import cv2
dst_width = ...
dst_height = ...
h_inv = np.linalg.inv(h)
k_inv = np.linalg.inv(new_camera_matrix)
map_x = np.zeros((dst_height, dst_width), dtype=np.float32)
map_y = np.zeros((dst_height, dst_width), dtype=np.float32)
for y in range(dst_height):
pts_undist_norm = np.zeros((dst_width, 3, 1))
for x in range(dst_width):
pt = np.array([x, y, 1]).reshape(3,1)
pt2 = k_inv # h_inv # pt
pts_undist_norm[x][0] = pt2[0]/pt2[2]
pts_undist_norm[x][1] = pt2[1]/pt2[2]
pts_undist_norm[x][2] = 1
r_vec = np.zeros((3,1))
t_vec = np.zeros((3,1))
pts_dist, _ = cv2.projectPoints(pts_undist_norm, r_vec, t_vec, intrinsic, distortion)
pts_dist = pts_dist.squeeze()
for x2 in range(dst_width):
map_x[y][x2] = pts_dist[x2][0]
map_y[y][x2] = pts_dist[x2][1]
# using CV_16SC2 introduced substantial image artifacts for me
map_x_final, map_y_final = cv2.convertMaps(map_x, map_y, cv2.CV_32FC1, cv2.CV_32FC1)
This is obviously really slow since it is using a double for loop and iterating through every pixel, so you can do it much faster using numpy. You should be able to do something similar in C++ to eliminate the for loops and do a single matrix multiplication.
import numpy as np
import cv2
dst_width = ...
dst_height = ...
h_inv = np.linalg.inv(h)
k_inv = np.linalg.inv(new_camera_matrix)
m_grid = np.mgrid[0:dst_width, 0:dst_height].reshape(2, dst_height*dst_width)
m_grid = np.insert(m_grid, 2, 1, axis=0)
m_grid_result = k_inv # h_inv # m_grid
pts_undist_norm = m_grid_result[:2, :] / m_grid_result[2, :]
pts_undist_norm = np.insert(pts_undist_norm, 2, 1, axis=0)
r_vec = np.zeros((3,1))
t_vec = np.zeros((3,1))
pts_dist = cv2.projectPoints(pts_undist_norm, r_vec, t_vec, intrinsic, distortion)
pts_dist = pts_dist.squeeze().astype(np.float32)
map_x = pts_dist[:, 0].reshape(dst_width, dst_height).swapaxes(0,1)
map_y = pts_dist[:, 1].reshape(dst_width, dst_height).swapaxes(0,1)
# using CV_16SC2 introduced substantial image artifacts for me
map_x_final, map_y_final = cv2.convertMaps(map_x, map_y, cv2.CV_32FC1, cv2.CV_32FC1)
This numpy implementation is roughly 25-75x faster than the first method.

I came across the same problem. I tried to implement AldurDisciple's answer. Instead of calculating transformation in a loop. I'm having a mat with mat.at <Vec2f>(x,y)=Vec2f(x,y) and applying perspectiveTransform to this mat. Add a 3rd channel of "1" to the result mat and apply projectPoints.
Here is my code
Mat xy(2000, 2500, CV_32FC2);
float *pxy = (float*)xy.data;
for (int y = 0; y < 2000; y++)
for (int x = 0; x < 2500; x++)
{
*pxy++ = x;
*pxy++ = y;
}
// perspective transformation of coordinates of destination image,
// which generates the map from destination image to norm points
Mat pts_undist_norm(2000, 2500, CV_32FC2);
Mat matPerspective =transRot3x3;
perspectiveTransform(xy, pts_undist_norm, matPerspective);
//add 3rd channel of 1
vector<Mat> channels;
split(pts_undist_norm, channels);
Mat channel3(2000, 2500, CV_32FC1, cv::Scalar(float(1.0)));
channels.push_back(channel3);
Mat pts_undist_norm_3D(2000, 2500, CV_32FC3);
merge(channels, pts_undist_norm_3D);
//projectPoints to extend the map from norm points back to the original captured image
pts_undist_norm_3D = pts_undist_norm_3D.reshape(0, 5000000);
Mat pts_dist(5000000, 1, CV_32FC2);
projectPoints(pts_undist_norm_3D, Mat::zeros(3, 1, CV_64F), Mat::zeros(3, 1, CV_64F), intrinsic, distCoeffs, pts_dist);
Mat maps[2];
pts_dist = pts_dist.reshape(0, 2000);
split(pts_dist, maps);
// apply map
remap(originalImage, skewedImage, maps[0], maps[1], INTER_LINEAR);
The transformation matrix used to map to norm points is a bit different from the one used in AldurDisciple's answer. transRot3x3 is composed from tvec and rvec generated by calibrateCamera.
double transData[] = { 0, 0, tvecs[0].at<double>(0), 0, 0,
tvecs[0].at<double>(1), 0, 0, tvecs[0].at<double>(2) };
Mat translate3x3(3, 3, CV_64F, transData);
Mat rotation3x3;
Rodrigues(rvecs[0], rotation3x3);
Mat transRot3x3(3, 3, CV_64F);
rotation3x3.col(0).copyTo(transRot3x3.col(0));
rotation3x3.col(1).copyTo(transRot3x3.col(1));
translate3x3.col(2).copyTo(transRot3x3.col(2));
Added:
I realized if the only needed map is the final map why not just use projectPoints to a mat with mat.at(x,y)=Vec2f(x,y,0) .
//generate a 3-channel mat with each entry containing it's own coordinates
Mat xyz(2000, 2500, CV_32FC3);
float *pxyz = (float*)xyz.data;
for (int y = 0; y < 2000; y++)
for (int x = 0; x < 2500; x++)
{
*pxyz++ = x;
*pxyz++ = y;
*pxyz++ = 0;
}
// project coordinates of destination image,
// which generates the map from destination image to source image directly
xyz=xyz.reshape(0, 5000000);
Mat pts_dist(5000000, 1, CV_32FC2);
projectPoints(xyz, rvecs[0], tvecs[0], intrinsic, distCoeffs, pts_dist);
Mat maps[2];
pts_dist = pts_dist.reshape(0, 2000);
split(pts_dist, maps);
//apply map
remap(originalImage, skewedImage, maps[0], maps[1], INTER_LINEAR);

Assign 3x1 mat to 3 channels mat

This question is continuance from my question in this link. After i get mat matrix, the 3x1 matrix is multiplied with 3x3 mat matrix.
for (int i = 0; i < im.rows; i++)
{
for (int j = 0; j < im.cols; j++)
{
for (int k = 0; k < nChannels; k++)
{
zay(k) = im.at<Vec3b>(i, j)[k]; // get pixel value and assigned to Vec4b zay
}
//convert to mat, so i can easily multiplied it
mat.at <double>(0, 0) = zay[0];
mat.at <double>(1, 0) = zay[1];
mat.at <double>(2, 0) = zay[2];
We get 3x1 mat matrix and do multiplication with the filter.
multiply= Filter*mat;
And i get mat matrix 3x1. I want to assign the value into my new 3 channels mat matrix, how to do that? I want to construct an images using this operation. I'm not use convolution function, because i think the result is different. I'm working in c++, and i want to change the coloured images to another color using matrix multiplication. I get the algorithm from this paper. In that paper, we need to multiplied several matrix to get the result.

OpenCV gives you a reshape function to change the number of channels/rows/columns implicitly:
http://docs.opencv.org/modules/core/doc/basic_structures.html#mat-reshape
This is very efficient since no data is copied, only the matrix header is changed.
try:
cv::Mat mat3Channels = mat.reshape(3,1);
Didn't test it, but should work. It should give you a 1x1 matrix with 3 channel element (Vec3d) if you want a Vec3b element instead, you have to convert it:
cv::Mat mat3ChannelsVec3b;
mat3Channels.convertTo(mat3ChannelsVec3b, CV_8UC3);
If you just want to write your mat back, it might be better to create a single Vec3b element instead:
cv::Vec3b element3Channels;
element3Channels[0] = multiply.at<double>(0,0);
element3Channels[1] = multiply.at<double>(1,0);
element3Channels[2] = multiply.at<double>(2,0);
But care in all cases, that Vec3b elements can't save values < 0 and > 255
Edit: After reading your question again, you ask how to assign...
I guess you have another matrix:
cv::Mat outputMatrix = cv::Mat(im.rows, im.cols, CV_8UC3, cv::Scalar(0,0,0));
Now to assign multiply to the element in outputMatrix you ca do:
cv::Vec3b element3Channels;
element3Channels[0] = multiply.at<double>(0,0);
element3Channels[1] = multiply.at<double>(1,0);
element3Channels[2] = multiply.at<double>(2,0);
outputMatrix.at<Vec3b>(i, j) = element3Channels;
If you need alpha channel too, you can adapt that easily.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js