How to get a depth image from sparse depth data? - c++

I am currently working on a problem where I have created a uint16 image of type CV_16UC1 from Velodyne data. Let's say 98% of the pixels are black (value 0) and the remaining pixels hold metric depth information (the distance to that point). These pixels correspond to the Velodyne points from the cloud.
cv::Mat depthMat = cv::Mat::zeros(frame.size(), CV_16UC1);
depthMat = ... // here the matrix is filled
If I try to display this image I get this:
On the image you can see that the brightest (white) pixels correspond to the pixels with the greatest depth. From this I need to get a denser depth image, or something that resembles a proper depth image, like in the example shown in this video:
https://www.youtube.com/watch?v=4yZ4JGgLE0I
This would require proper interpolation and extrapolation of those points (the pixels of the 2D image), and this is where I am stuck. I am a beginner when it comes to interpolation techniques. Does anyone know how this can be done, or can at least point me to a working solution or example algorithm for creating a depth map from sparse data?
I tried the following from the Kinect examples but it did not change the output:
cv::Mat depthf;
depthMat.convertTo(depthf, CV_8UC1, 255.0/65535.0); // squeeze the full 16-bit range into 8 bits
const unsigned char noDepth = 255; // note: in my image the empty pixels are 0, not 255
cv::Mat small_depthf, temp, temp2;
cv::resize(depthf, small_depthf, cv::Size(), 0.01, 0.01);
cv::inpaint(small_depthf, (small_depthf == noDepth), temp, 5.0, cv::INPAINT_TELEA);
cv::resize(temp, temp2, depthf.size());
temp2.copyTo(depthf, (depthf == noDepth));
cv::imshow("window",depthf);
cv::waitKey(3);

I managed to get the desired output (something that resembles a depth image) by simply dilating the sparse depth image:
cv::Mat result;
dilate(depthMat, result, cv::Mat(), cv::Point(-1, -1), 10, 1, 1); // default 3x3 kernel, 10 iterations
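For comparison, here is a minimal sketch of the inpainting route adapted to this data, where the unknown pixels are 0 rather than 255 (the function name densifyDepth is mine, and inpainting a 98%-empty image can be slow, so dilating first as above and inpainting only the remaining holes may work better):
#include <opencv2/opencv.hpp>
#include <opencv2/photo.hpp> // cv::inpaint

cv::Mat densifyDepth(const cv::Mat& depthMat) // depthMat: CV_16UC1, 0 = no data
{
    // inpaint only accepts 8-bit images, so scale the 16-bit depth down first
    cv::Mat depth8;
    depthMat.convertTo(depth8, CV_8UC1, 255.0 / 65535.0);
    // in this data the unknown pixels are 0, not 255 as in the Kinect example
    cv::Mat mask = (depth8 == 0);
    cv::Mat dense;
    cv::inpaint(depth8, mask, dense, 5.0, cv::INPAINT_TELEA);
    return dense;
}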

Related

Correct kernel for blur filter

I am attempting to use my own kernel to blur an image (for educational purposes). But my kernel just makes my whole image white. Is my blur kernel correct? I believe the proper name of the blur filter I am trying to apply is a normalised blur.
void blur_img(const Mat& src, Mat& output) {
    // src is a 1 channel CV_8UC1
    float kdata[] = { 0.0625f, 0.125f, 0.0625f,
                      0.125f,  0.25f,  0.125f,
                      0.0625f, 0.125f, 0.0625f };
    //float kdata[] = { -1,-1,-1, -1,8,-1, -1,-1,-1}; // outline filter works fine
    Mat kernel(3, 3, CV_32F, kdata);
    // results in output being a completely white image
    filter2D(src, output, CV_32F, kernel);
}
Your image is not white, it is float. I am sure you are displaying the image with imshow later on, and it looks all white. This is explained in the imshow documentation. Specifically:
If the image is 32-bit floating-point, the pixel values are multiplied
by 255. That is, the value range [0,1] is mapped to [0,255].
This means that if the image is float, its values have to be in [0,1] to be displayed correctly.
Now that we know what causes it, let's see how to solve it. I can think of 3 possible ways:
1) normalize the image to [0,1]
cv::Mat dst;
cv::normalize(outputFromBlur, dst, 0, 1, cv::NORM_MINMAX);
This function normalizes the values, so it may shift the colors; it is not the best option when the value range is known, but works well for depth maps or other matrices with values in an unknown range.
2) convertTo uchar:
cv::Mat dst;
outputFromBlur.convertTo(dst, CV_8U);
This function applies saturate_cast, so it handles possible overflow/underflow.
3) use filter2D with another output depth:
cv::filter2D(src, output, -1, kernel);
With -1, the output will be of the same type as the source (I assume your source is CV_8U).
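Putting option 3 back into the original function, a minimal corrected sketch would be:
void blur_img(const cv::Mat& src, cv::Mat& output) {
    // src is a 1 channel CV_8UC1
    float kdata[] = { 0.0625f, 0.125f, 0.0625f,
                      0.125f,  0.25f,  0.125f,
                      0.0625f, 0.125f, 0.0625f };
    cv::Mat kernel(3, 3, CV_32F, kdata);
    // depth -1: output keeps the CV_8U depth of src, so imshow displays it correctly
    cv::filter2D(src, output, -1, kernel);
}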
I hope this helps you; if not, leave a comment.

opencv Mat CV_8UC1 type (uchar) to *unsigned short (*UINT16)

This is mainly a C++ variable/pointer handling/casting question.
I am trying to apply one of the OpenCV library image filters to a depth image from the Kinect v2 SDK (16-bit grayscale, values between 0 and 8092).
I want to do this after getting the depth image but BEFORE using the Kinect SDK to do rgb-depth registration and conversion to a point cloud. Therefore I want the final filtered image/array to be of the same type as I received before filtering, so I can pass it back to the Kinect SDK.
Initial code:
// get the Kinect depth frame as a pointer
UINT nBufferSize = nDepthFrameHeight * nDepthFrameWidth;
hr = pDepthFrame->CopyFrameDataToArray(nBufferSize, pDepth);
// create 2 matrices, along with the conversion between 16 bit and 8 bit
// (the OpenCV inpainting below works with 8-bit greyscale)
Mat depthMat(height, width, CV_16UC1, depth); // from kinect
Mat depthf(height, width, CV_8UC1);
depthMat.convertTo(depthf, CV_8UC1, 255.0/2048.0);
imshow("original-depth", depthf);
const unsigned char noDepth = 0; // change to 255 if the no-depth pixels use the max value
Mat temp, temp2;
// step 1 - downsize for performance: use a smaller version of the depth image
Mat small_depthf;
resize(depthf, small_depthf, Size(), 0.2, 0.2);
// step 2 - inpaint only the masked "unknown" pixels
cv::inpaint(small_depthf, (small_depthf == noDepth), temp, 5.0, INPAINT_TELEA);
// step 3 - upscale to original size and replace inpainted regions in the original depth image
resize(temp, temp2, depthf.size());
temp2.copyTo(depthf, (depthf == noDepth)); // add to the original signal
imshow("depth-inpaint", depthf); // show results
Problematic Part:
When I try to reverse the process (even with loss of information for now)
cv::Mat newDepth(nDepthFrameHeight, nDepthFrameWidth, CV_16UC1);
depthf.convertTo(newDepth, CV_16UC1, 8092.0 / 255.0);
I have found no way to convert these cv::Mat types back to *ushort (*UINT16 in this case).
I have tried things like reinterpret_cast, depthf.data and depthf.ptr(), but the final data keeps showing up as uchar when hovering over it, unless I force it like in the ptr case above, in which case it crashes.
Any ideas?
P.S.: The code works flawlessly if I don't try to filter the depth. Also, the crash occurs when the SDK tries to map color and depth using pDepth in
pCoordinateMapper->MapColorFrameToDepthSpace(nDepthFrameWidth * nDepthFrameHeight, pDepth, nColorFrameWidth * nColorFrameHeight, (DepthSpacePoint*)pDepthSpacePoints);
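For what it's worth, one untested sketch of getting the data back into a UINT16 buffer, assuming newDepth is a continuous CV_16UC1 Mat (Mats allocated as above are):
// ptr<UINT16>() reinterprets the Mat's buffer as UINT16 without copying
CV_Assert(newDepth.type() == CV_16UC1 && newDepth.isContinuous());
UINT16* filtered = newDepth.ptr<UINT16>(0);
// if the SDK owns pDepth, copy the filtered values back into it
memcpy(pDepth, filtered, nDepthFrameWidth * nDepthFrameHeight * sizeof(UINT16));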

OpenCV Object Detection (HOGDescriptor) on iOS

I'm trying to get the people detector provided by the OpenCV library running. So far I get decent performance on my iPhone 6, but the detection is super bad and almost never correct, and I'm not really sure why, since you can find example videos using the same default HOG descriptor with way better detection.
Here is the code:
- (void)processImage:(Mat&)image {
    cv::Mat cvImg, result;
    cvtColor(image, cvImg, COLOR_BGR2HSV);
    cv::vector<cv::Rect> found, found_filtered;
    hog.detectMultiScale(cvImg, found, 0, cv::Size(4,4), cv::Size(8,8), 1.5, 0);
    size_t i;
    for (i = 0; i < found.size(); i++) {
        cv::Rect r = found[i];
        rectangle(image, r.tl(), r.br(), Scalar(0,255,0), 2);
    }
}
The video input comes from the iPhone camera itself and "processImage:" is called for every frame. For the HOGDescriptor I use the default people detector:
_hog.setSVMDetector(cv::HOGDescriptor::getDefaultPeopleDetector());
I appreciate any help. :)
I'm new to openCV, so take this with a grain of salt:
The line cvtColor(image, cvImg, COLOR_BGR2HSV); converts the image from the BGR color space to the HSV color space. Essentially, it changes each pixel from being represented by how much blue, green, and red it has, to being represented by hue (color), saturation (how much color) and value (how bright). Clearly, the hogDescriptor acts on a BGR image, not an HSV image. You need to pass it a CV_8UC3 image: an image with 3 channels per pixel (C3), e.g. BGR, and an 8-bit unsigned number for each channel (8U). Less importantly: what are you passing into the method processImage()? It should be one of those types; if not, you need to know its type and convert it to CV_8UC3 using the cvtColor() method.
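As a rough, untested sketch of that fix, keeping the question's detection parameters and only changing the color handling (grayscale also works for HOG and is cheaper than a 3-channel image):
- (void)processImage:(Mat&)image {
    // HOG expects BGR or grayscale input, not HSV
    cv::Mat gray;
    cvtColor(image, gray, COLOR_BGR2GRAY);
    std::vector<cv::Rect> found;
    hog.detectMultiScale(gray, found, 0, cv::Size(4,4), cv::Size(8,8), 1.5, 0);
    for (size_t i = 0; i < found.size(); i++) {
        cv::Rect r = found[i];
        rectangle(image, r.tl(), r.br(), Scalar(0,255,0), 2);
    }
}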

OpenCV keep background transparent during warpAffine

I create a bird's-eye-view image with the warpPerspective() function like this:
warpPerspective(frame, result, H, result.size(), CV_WARP_INVERSE_MAP, BORDER_TRANSPARENT);
The result looks very good and also the border is transparent:
Bird-View-Image
Now I want to put this image on top of another image "out". I try doing this with the function warpAffine like this:
warpAffine(result, out, M, out.size(), CV_INTER_LINEAR, BORDER_TRANSPARENT);
I also converted "out" to a four channel image with alpha channel according to a question which was already asked on stackoverflow:
Convert Image
This is the code: cvtColor(out, out, CV_BGR2BGRA);
I expected to see the chessboard but not the gray background. But in fact, my result looks like this:
Result Image
What am I doing wrong? Do I forget something to do? Is there another way to solve my problem? Any help is appreciated :)
Thanks!
Best regards
DamBedEi
I hope there is a better way, but here is something you could do:
1. Do warpAffine normally (without the transparency part)
2. Find the contour that encloses the warped image
3. Use this contour to create a mask (white inside the warped image, black at the borders)
4. Use this mask to copy the warped image into the other image
Sample code:
// load images
cv::Mat image2 = cv::imread("lena.png");
cv::Mat image = cv::imread("IKnowOpencv.jpg");
cv::resize(image, image, image2.size());
// perform warp perspective
std::vector<cv::Point2f> prev;
prev.push_back(cv::Point2f(-30,-60));
prev.push_back(cv::Point2f(image.cols+50,-50));
prev.push_back(cv::Point2f(image.cols+100,image.rows+50));
prev.push_back(cv::Point2f(-50,image.rows+50 ));
std::vector<cv::Point2f> post;
post.push_back(cv::Point2f(0,0));
post.push_back(cv::Point2f(image.cols-1,0));
post.push_back(cv::Point2f(image.cols-1,image.rows-1));
post.push_back(cv::Point2f(0,image.rows-1));
cv::Mat homography = cv::findHomography(prev, post);
cv::Mat imageWarped;
cv::warpPerspective(image, imageWarped, homography, image.size());
// find external contour and create mask
std::vector<std::vector<cv::Point> > contours;
cv::Mat imageWarpedCloned = imageWarped.clone(); // clone the image because findContours will modify it
cv::cvtColor(imageWarpedCloned, imageWarpedCloned, CV_BGR2GRAY); //only if the image is BGR
cv::findContours (imageWarpedCloned, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_NONE);
// create mask
cv::Mat mask = cv::Mat::zeros(image.size(), CV_8U);
cv::drawContours(mask, contours, 0, cv::Scalar(255), -1);
// copy warped image into image2 using the mask
cv::erode(mask, mask, cv::Mat()); // to avoid artefacts
imageWarped.copyTo(image2, mask); // copy the image using the mask
//show images
cv::imshow("imageWarpedCloned", imageWarpedCloned);
cv::imshow("warped", imageWarped);
cv::imshow("image2", image2);
cv::waitKey();
One of the easiest ways to approach this (not necessarily the most efficient) is to warp the image twice, but set the OpenCV constant boundary value to different values each time (i.e. zero the first time and 255 the second time). These constant values should be chosen towards the minimum and maximum values in the image.
Then it is easy to find a binary mask where the two warp values are close to equal.
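For example, a sketch of such a mask, assuming warp_const_0 and warp_const_255 are the two BGR warp results and using a guessed threshold:
// pixels where the two warps agree came from the image, not from the border
cv::Mat diff;
cv::absdiff(warp_const_255, warp_const_0, diff);
cv::cvtColor(diff, diff, CV_BGR2GRAY);
cv::Mat mask = (diff < 10); // 255 where the warped image is valid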
More importantly, you can also create a transparency effect through simple algebra like the following:
new_image = np.float32((warp_const_255 - warp_const_0) * preferred_bkg_img) / 255.0 + np.float32(warp_const_0)
The main reason I prefer this method is that openCV seems to interpolate smoothly down (or up) to the constant value at the image edges. A fully binary mask will pick up these dark or light fringe areas as artifacts. The above method acts more like true transparency and blends properly with the preferred background.
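Translated into C++, a sketch of the same idea (the file names and the transform are placeholders, and the blend is the formula above):
cv::Mat image = cv::imread("input.png");           // image to warp (placeholder path)
cv::Mat background = cv::imread("background.png"); // preferred background, defines the output size
// sample affine transform: scale down and translate a little
cv::Mat M = (cv::Mat_<double>(2, 3) << 0.5, 0, 50, 0, 0.5, 50);
// warp twice, once with a constant border of 0 and once with 255
cv::Mat warp0, warp255;
cv::warpAffine(image, warp0, M, background.size(), CV_INTER_LINEAR, cv::BORDER_CONSTANT, cv::Scalar::all(0));
cv::warpAffine(image, warp255, M, background.size(), CV_INTER_LINEAR, cv::BORDER_CONSTANT, cv::Scalar::all(255));
// inside the warped image the two results agree (difference 0 = opaque);
// in the border they differ by 255 (= fully transparent), with a smooth fringe between
cv::Mat w0f, w255f, bgf, blended;
warp0.convertTo(w0f, CV_32FC3);
warp255.convertTo(w255f, CV_32FC3);
background.convertTo(bgf, CV_32FC3);
// new_image = ((warp_const_255 - warp_const_0) * preferred_bkg_img) / 255 + warp_const_0
blended = (w255f - w0f).mul(bgf) / 255.0f + w0f;
blended.convertTo(blended, CV_8UC3);
cv::imshow("blended", blended);
cv::waitKey(0);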
Here's a small test program that warps with transparent "border", then copies the warped image to a solid background.
int main()
{
    cv::Mat input = cv::imread("../inputData/Lenna.png");
    cv::Mat transparentInput, transparentWarped;
    cv::cvtColor(input, transparentInput, CV_BGR2BGRA);
    //transparentInput = input.clone();
    // create sample transformation mat
    cv::Mat M = cv::Mat::eye(2,3, CV_64FC1);
    // as a sample, just scale down and translate a little:
    M.at<double>(0,0) = 0.3;
    M.at<double>(0,2) = 100;
    M.at<double>(1,1) = 0.3;
    M.at<double>(1,2) = 100;
    // warp to same size with transparent border:
    cv::warpAffine(transparentInput, transparentWarped, M, transparentInput.size(), CV_INTER_LINEAR, cv::BORDER_TRANSPARENT);
    // NOW: merge image with background, here I use the original image as background:
    cv::Mat background = input;
    // create output buffer with same size as input
    cv::Mat outputImage = input.clone();
    for(int j=0; j<transparentWarped.rows; ++j)
        for(int i=0; i<transparentWarped.cols; ++i)
        {
            cv::Scalar pixWarped = transparentWarped.at<cv::Vec4b>(j,i);
            cv::Scalar pixBackground = background.at<cv::Vec3b>(j,i);
            float transparency = pixWarped[3] / 255.0f; // pixel value: 0 (0.0f) = fully transparent, 255 (1.0f) = fully solid
            outputImage.at<cv::Vec3b>(j,i)[0] = transparency * pixWarped[0] + (1.0f-transparency)*pixBackground[0];
            outputImage.at<cv::Vec3b>(j,i)[1] = transparency * pixWarped[1] + (1.0f-transparency)*pixBackground[1];
            outputImage.at<cv::Vec3b>(j,i)[2] = transparency * pixWarped[2] + (1.0f-transparency)*pixBackground[2];
        }
    cv::imshow("warped", outputImage);
    cv::imshow("input", input);
    cv::imwrite("../outputData/TransparentWarped.png", outputImage);
    cv::waitKey(0);
    return 0;
}
I use this as input:
and get this output:
which looks like the ALPHA channel isn't set to ZERO by warpAffine but to something like 205...
But in general this is the way I would do it (unoptimized)

Matching small grayscale images

I want to test whether two images match. Partial matches also interest me.
The problem is that the images suffer from strong noise. Another problem is that the images might be rotated with an unknown angle. The objects shown in the images will roughly always have the same scale!
The images show area scans from a top-shot perspective. "Lines" are mostly walls and other objects are mostly trees and different kinds of plants.
Another problem was that the left image was very blurry and the right one's lines were very thin.
To compensate for this difference I used dilation. The resulting images are the ones I uploaded.
Although it can easily be seen that these images match almost perfectly, I cannot convince my algorithm of this fact.
My first idea was a feature based matching, but the matches are horrible. It only worked for a rotation angle of -90°, 0° and 90°. Although most descriptors are rotation invariant (in past projects they really were), the rotation invariance seems to fail for this example.
My second idea was to split the images into several smaller segments and to use template matching. So I segmented the images and, again, for the human eye they are pretty easy to match. The goal of this step was to segment the different walls and trees/plants.
The upper row shows parts of the left image, and the lower row parts of the right image. After the segmentation, the segments were dilated again.
As already mentioned: Template matching failed, as did contour based template matching and contour matching.
I think the dilation of the images was very important, because it was nearly impossible for the human eye to match the segments without dilation before the segmentation. Another dilation after the segmentation made this even less difficult.
Your first job should be to fix the orientation. I am not sure what the best algorithm for that is, but here is an approach I would use: fix one of the images and start rotating the other. For each rotation, compute a histogram of the color intensity for each of the rows/columns. Compute some distance between the resulting vectors (e.g. use cross product). Choose the rotation that results in the smallest cross product. It may be a good idea to combine this approach with hill climbing.
Once you have the images aligned in approximately the same direction, I believe matching should be easier. As the two images are supposed to be at the same scale, compute something analogous to the geometrical center for both images: compute a weighted sum of all pixels - a completely white pixel would have a weight of 1, and a completely black one a weight of 0; the sum should be a vector of size 2 (x and y coordinates). After that, divide those values by the dimensions of the image and call this the "geometrical center of the image". Overlay the two images so that the two centers coincide, and then once more compute the cross product for the difference between the images. I would say this should be their difference.
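Here is a rough sketch of the rotation search from the first paragraph (the names are mine; I use an L2 distance between the row/column intensity sums instead of a cross product, and I assume both images have the same size):
#include <cfloat>
#include <opencv2/opencv.hpp>

// per-row and per-column intensity sums, stacked into one column vector
cv::Mat intensityProfile(const cv::Mat& img)
{
    cv::Mat rows, cols, profile;
    cv::reduce(img, rows, 1, CV_REDUCE_SUM, CV_32F); // one sum per row
    cv::reduce(img, cols, 0, CV_REDUCE_SUM, CV_32F); // one sum per column
    cv::vconcat(rows, cols.t(), profile);
    return profile;
}

// brute-force search for the rotation of `moving` that best matches `fixed`
double bestRotation(const cv::Mat& fixed, const cv::Mat& moving)
{
    cv::Mat target = intensityProfile(fixed);
    double bestAngle = 0, bestDist = DBL_MAX;
    for (int angle = 0; angle < 360; ++angle)
    {
        cv::Mat rot = cv::getRotationMatrix2D(
            cv::Point2f(moving.cols / 2.0f, moving.rows / 2.0f), angle, 1.0);
        cv::Mat rotated;
        cv::warpAffine(moving, rotated, rot, moving.size());
        double d = cv::norm(target, intensityProfile(rotated)); // L2 distance
        if (d < bestDist) { bestDist = d; bestAngle = angle; }
    }
    return bestAngle;
}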
You can also try the following methods to find rotation and similarity.
Use image moments to get the rotation as shown here.
Once you rotate the image, use cross-correlation to evaluate the similarity.
EDIT
I tried this with OpenCV and C++ for the two sample images. I'm posting the code and results below as it seems to work well at least for the given samples.
Here's the function to calculate the orientation vector using image moments:
Mat orientVec(Mat& im)
{
    Moments m = moments(im);
    // covariance matrix of the image intensity distribution
    double cov[4] = {m.mu20/m.m00, m.mu11/m.m00, m.mu11/m.m00, m.mu02/m.m00};
    Mat covMat(2, 2, CV_64F, cov);
    Mat evals, evecs;
    eigen(covMat, evals, evecs);
    // the eigenvector of the largest eigenvalue gives the dominant orientation
    return evecs.row(0);
}
Rotate and match sample images:
Mat im1 = imread(INPUT_FOLDER_PATH + string("WojUi.png"), 0);
Mat im2 = imread(INPUT_FOLDER_PATH + string("XbrsV.png"), 0);
// get the orientation vector
Mat v1 = orientVec(im1);
Mat v2 = orientVec(im2);
double angle = acos(v1.dot(v2))*180/CV_PI;
// rotate im2. try rotating with -angle and +angle. here using -angle
Mat rot = getRotationMatrix2D(Point(im2.cols/2, im2.rows/2), -angle, 1.0);
Mat im2Rot;
warpAffine(im2, im2Rot, rot, Size(im2.rows, im2.cols));
// add a border to rotated image
int borderSize = im1.rows > im2.cols ? im1.rows/2 + 1 : im1.cols/2 + 1;
Mat im2RotBorder;
copyMakeBorder(im2Rot, im2RotBorder, borderSize, borderSize, borderSize, borderSize,
BORDER_CONSTANT, Scalar(0, 0, 0));
// normalized cross-correlation
Mat& image = im2RotBorder;
Mat& templ = im1;
Mat nxcor;
matchTemplate(image, templ, nxcor, CV_TM_CCOEFF_NORMED);
// take the max
double max;
Point maxPt;
minMaxLoc(nxcor, NULL, &max, NULL, &maxPt);
// draw the match
Mat rgb;
cvtColor(image, rgb, CV_GRAY2BGR);
rectangle(rgb, maxPt, Point(maxPt.x+templ.cols-1, maxPt.y+templ.rows-1), Scalar(0, 255, 255), 2);
cout << "max: " << max << endl;
With the -angle rotation in the code, I get max = 0.758. Below is the rotated image in this case with the matching region.
Otherwise max = 0.293