Or, perhaps, not warped enough...
So I'm trying to take an image and specify four corners - then move those four corners into a (near-)perfect square in the middle of the image.
UPDATE 1 (at bottom): 9/10/13, 8:20 PM GMT. The math & matrix tags were added with this update. (If you read this update before 8:20, I must apologize - I gave you really bad info!)
I don't need it super-accurate, but my current results are very clearly not working, yet after looking at multiple other examples I cannot see what I've been doing wrong.
Here is my mock-up:
And through some magical process I obtain the coordinates. For this 640x480 mock-up the points are as follows:
Corners:
Upper-Left: 186, 87
Upper-Right: 471, 81
Lower-Left: 153, 350
Lower-Right: 500, 352
And the points I want to move the corners to are as follows:
Upper-Left: 176, 96
Upper-Right: 464, 96
Lower-Left: 176, 384
Lower-Right: 464, 384
Now, the end-goal here is to get the coordinates of the black dot-thing relative to the corners. I'm not making the box fill the entire image because, given a different picture, that point in the center could land outside the box, so I want to keep enough room "outside the box" to generalize the process. I know I can read the point's position after it has been moved; I'm just having trouble moving it correctly. My current warpPerspective attempts produce the following result:
Ok, so it looks like it's trying to fit things properly, but the corners didn't actually end up where we thought they would. The top-left is too far to the right, the bottom-left is too high, and the two on the right are both too far right and too close together. Well, ok... so let's try expanding the destination coordinates so the box fills up the screen.
Just seems zoomed in? Are my coordinates somehow off? Do I need to feed it new coordinates? Source, destination, or both?
Here's my code: (This is obviously edited down to the key pieces of information, but if I missed something, please ask me to re-include it.)
Mat frame = inputPicture;
Point2f sourceCoords[4], destinationCoords[4];
// These values were pre-determined, and given above for this image.
sourceCoords[0].x = UpperLeft.X;
sourceCoords[0].y = UpperLeft.Y;
sourceCoords[1].x = UpperRight.X;
sourceCoords[1].y = UpperRight.Y;
sourceCoords[2].x = LowerLeft.X;
sourceCoords[2].y = LowerLeft.Y;
sourceCoords[3].x = LowerRight.X;
sourceCoords[3].y = LowerRight.Y;
// We need to make a square in the image. The 'if' is just in case the
// picture used is not longer left-to-right than it is top-to-bottom.
int top = 0;
int bottom = 0;
int left = 0;
int right = 0;
if (frame.cols >= frame.rows)
{
    int longSideMidpoint = frame.cols/2.0;
    int shortSideFifthpoint = frame.rows/5.0;
    int shortSideTenthpoint = frame.rows/10.0;
    top = shortSideFifthpoint;
    bottom = shortSideFifthpoint*4;
    left = longSideMidpoint - (3*shortSideTenthpoint);
    right = longSideMidpoint + (3*shortSideTenthpoint);
}
else
{
    int longSideMidpoint = frame.rows/2.0;
    int shortSideFifthpoint = frame.cols/5.0;
    int shortSideTenthpoint = frame.cols/10.0;
    top = longSideMidpoint - (3*shortSideTenthpoint);
    bottom = longSideMidpoint + (3*shortSideTenthpoint);
    left = shortSideFifthpoint;
    right = shortSideFifthpoint*4;
}
// This code was used instead when putting the destination coords on the edges.
//top = 0;
//bottom = frame.rows-1;
//left = 0;
//right = frame.cols-1;
destinationCoords[0].y = left; // UpperLeft
destinationCoords[0].x = top; // UL
destinationCoords[1].y = right; // UpperRight
destinationCoords[1].x = top; // UR
destinationCoords[2].y = left; // LowerLeft
destinationCoords[2].x = bottom; // LL
destinationCoords[3].y = right; // LowerRight
destinationCoords[3].x = bottom; // LR
Mat warp_matrix = cvCreateMat(3, 3, CV_32FC1);
warp_matrix = getPerspectiveTransform(sourceCoords, destinationCoords); // This seems to set the warp_matrix to 3x3 even if it isn't.
warpPerspective(frame, frame, warp_matrix, frame.size(), CV_INTER_LINEAR, 0);
IplImage *savePic = new IplImage(frame);
sprintf(fileName, "test/%s/photo%i-7_warp.bmp", startupTime, Count);
cvSaveImage(fileName, savePic);
delete savePic;
I've also tried using perspectiveTransform, but that led to the error:
OpenCV Error: Assertion failed (scn + 1 == m.cols && (depth == CV_32F
|| depth == CV_64F)) in unknown function, file
C:\slave\builds\WinInstallerMegaPack\src\opencv\modules\core\src\matmul.cpp,
line 1926
which led me to trying findHomography, which gave THIS error instead:
OpenCV Error: Assertion failed (npoints >= 0 && points2.checkVector(2)
== npoints && points1.type()) in unknown function, file C:\slave\builds\WinInstallerMegaPack\src\opencv\modules\calib3d\src\fundam.cpp,
line 1074
(I checked - npoints IS greater than zero, because it equals points1.checkVector(2). It's failing because points1.checkVector(2) and points2.checkVector(2) don't match - but I don't understand what checkVector(2) does. points1 and points2 are taken from the coordinates I tried feeding findHomography - the same coordinates as above.)
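For what it's worth, perspectiveTransform operates on a list of points rather than on an image, and that first assertion is typically triggered when the points are not packed as 2-channel floating-point data (for example a std::vector<cv::Point2f>). A minimal sketch of the usage I believe it expects (the function and variable names here are mine):

#include <opencv2/core/core.hpp>
#include <vector>

// Map a single point through an existing 3x3 homography using perspectiveTransform.
// 'warp_matrix' is assumed to be the CV_32F/CV_64F matrix from getPerspectiveTransform.
cv::Point2f warpPoint(const cv::Mat& warp_matrix, const cv::Point2f& dot)
{
    std::vector<cv::Point2f> in(1, dot), out;
    cv::perspectiveTransform(in, out, warp_matrix); // expects 2-channel float points
    return out[0];
}

That would also be a direct way to find where the black dot lands without warping the whole image.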
Any idea how to get the output I'm looking for? I've been left confused for a while now. :/
^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^-^
UPDATE 1:
Ok, so I've figured out how it's supposed to calculate things.
getPerspectiveTransform is supposed to find a 3x3 matrix such that:
|M00 M10 M20| |X| |c*X'|
|M01 M11 M21| * |Y| = |c*Y'|
|M02 M12 M22| |1| | c |
Where the Mxx values are the entries of the constant matrix, X & Y are the input coordinates, and X' & Y' are the output coordinates. This has to hold for all four input/output coordinate pairs. (The idea is simple, even if the math to get there may not be - I'm still not sure how they're supposed to derive that matrix... input here would be appreciated - since I only need one actual coordinate, I would not mind bypassing getPerspectiveTransform and warpPerspective entirely and just using a mathematical solution.)
After you've gotten the perspective transform matrix, WarpPerspective basically just moves each pixel's coordinates by multiplying:
|M00 M10 M20| |X|
|M01 M11 M21| * |Y|
|M02 M12 M22| |1|
for each coordinate, then dividing c*X' and c*Y' by c (obviously). Finally, some interpolation needs to be done, since the results are unlikely to land on perfect integer pixel positions.
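Since you say you would not mind bypassing getPerspectiveTransform and warpPerspective for the single dot, the multiply-then-divide-by-c step can also be done by hand, along these lines (a sketch; it assumes the matrix is CV_64F, which is what getPerspectiveTransform returns, and uses at<double>(row, col) indexing):

#include <opencv2/core/core.hpp>

// Forward-map one point (X, Y) through the 3x3 homography M by hand:
// [c*X', c*Y', c]^T = M * [X, Y, 1]^T, then divide by c.
cv::Point2f mapOnePoint(const cv::Mat& M, double X, double Y)
{
    double cX = M.at<double>(0,0)*X + M.at<double>(0,1)*Y + M.at<double>(0,2);
    double cY = M.at<double>(1,0)*X + M.at<double>(1,1)*Y + M.at<double>(1,2);
    double c  = M.at<double>(2,0)*X + M.at<double>(2,1)*Y + M.at<double>(2,2);
    return cv::Point2f((float)(cX / c), (float)(cY / c));
}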
Ok, the basic idea is easy enough but here's the problem; getPerspectiveTransform does not seem to be working!
In my above example I modified the program to print out the matrix it was getting from getPerspectiveTransform. It gave me this:
|1.559647 0.043635 -37.808761|
|0.305521 1.174385 -50.688854|
|0.000915 0.000132 1.000000|
In my above example, I gave for the upper-left coordinate 186, 87 to be moved to 176, 96. Unfortunately when I multiply the above warp_matrix with the input coordinate (186, 87) I get not 176, 96 - but 217, 92! Which is the same result WarpPerspective gets. So at least we know WarpPerspective is working...
Um, Alex...
Your X & Y coordinates are reversed on both your Source and Destination coordinates.
In other words:
sourceCoords[0].x = UpperLeft.X;
sourceCoords[0].y = UpperLeft.Y;
should be
sourceCoords[0].x = UpperLeft.Y;
sourceCoords[0].y = UpperLeft.X;
and
destinationCoords[0].y = left;
destinationCoords[0].x = top;
should be
destinationCoords[0].y = top;
destinationCoords[0].x = left;
Jus' thought you'd like to know.
I have a set of images and would like to detect a specific object among others in the image/video, already knowing its real physical dimensions in advance. I have one sample image (it's an airplane door) and would like to find the window in the airplane door, knowing its physical dimensions (let's say it has an inner radius of 20 cm and an outer radius of 23 cm) and its real-world position in the door (for example, its minimal distance to the door frame is 15 cm). I also know my camera's resolution in advance. Is there any MATLAB code or OpenCV C++ that can do that automatically with image processing?
Here is my image sample
And a more complex image with round logos:
I ran the code on the second, more complex image and did not get the same results. Here is the image result.
You are looking for a circle in the image, so I suggest you use the Hough circle transform.
Convert the image to gray.
Find edges in the image.
Use the Hough circle transform to find circles in the image.
For each candidate circle, sample the values along the circle and accept it if they correspond to a predefined range of values.
The code:
clear all
% Parameters
minValueWindow = 90;
maxValueWindow = 110;
% Read file
I = imread('image1.jpg');
Igray = rgb2gray(I);
[row,col] = size(Igray);
% Edge detection
Iedge = edge(Igray,'canny',[0 0.3]);
% Hough circle transform
rad = 40:80; % The approximate radius in pixels
detectedCircle = {};
detectedCircleIndex = 1;
for radIndex=1:1:length(rad)
    [y0detect,x0detect,Accumulator] = houghcircle(Iedge,rad(1,radIndex),rad(1,radIndex)*pi/2);
    if ~isempty(y0detect)
        circles = struct;
        circles.X = x0detect;
        circles.Y = y0detect;
        circles.Rad = rad(1,radIndex);
        detectedCircle{detectedCircleIndex} = circles;
        detectedCircleIndex = detectedCircleIndex + 1;
    end
end
% For each detection run a color filter
ang=0:0.01:2*pi;
finalCircles = {};
finalCircleIndex = 1;
for i=1:1:detectedCircleIndex-1
    rad = detectedCircle{i}.Rad;
    xp = rad*cos(ang);
    yp = rad*sin(ang);
    for detectedPointIndex=1:1:length(detectedCircle{i}.X)
        % Take each detected center and sample the gray image
        samplePointsX = round(detectedCircle{i}.X(detectedPointIndex) + xp);
        samplePointsY = round(detectedCircle{i}.Y(detectedPointIndex) + yp);
        sampleValueInd = sub2ind([row,col],samplePointsY,samplePointsX);
        sampleValueMean = mean(Igray(sampleValueInd));
        % Check if the circle color is good
        if(sampleValueMean > minValueWindow && sampleValueMean < maxValueWindow)
            circle = struct();
            circle.X = detectedCircle{i}.X(detectedPointIndex);
            circle.Y = detectedCircle{i}.Y(detectedPointIndex);
            circle.Rad = rad;
            finalCircles{finalCircleIndex} = circle;
            finalCircleIndex = finalCircleIndex + 1;
        end
    end
end
% Find the main circle by merging close hypotheses together
for finaCircleInd=1:1:length(finalCircles)
    circleCenter(finaCircleInd,1) = finalCircles{finaCircleInd}.X;
    circleCenter(finaCircleInd,2) = finalCircles{finaCircleInd}.Y;
    circleCenter(finaCircleInd,3) = finalCircles{finaCircleInd}.Rad;
end
[ind,C] = kmeans(circleCenter,2);
c = [length(find(ind==1));length(find(ind==2))];
[~,maxInd] = max(c);
xCircle = median(circleCenter(ind==maxInd,1));
yCircle = median(circleCenter(ind==maxInd,2));
radCircle = median(circleCenter(ind==maxInd,3));
% Plot circle
imshow(Igray);
hold on
ang=0:0.01:2*pi;
xp=radCircle*cos(ang);
yp=radCircle*sin(ang);
plot(xCircle+xp,yCircle+yp,'Color','red', 'LineWidth',5);
The resulting image:
Remarks:
For other images you will still have to fine-tune several parameters, such as the radius range you search over, the color window, the Hough circle threshold, and the Canny edge thresholds.
In the function I searched for circles with radii from 40 to 80 pixels. Here you can use your prior information about the real-world radius of the window and the resolution of the camera: if you know approximately the distance from the camera to the airplane, the camera resolution, and the window radius in cm, you can convert that radius to pixels and use it for the Hough circle transform, as sketched below.
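For a rough sense of how that prior knowledge could become a pixel radius, the pinhole approximation is enough (a sketch in C++; the names are placeholders, not anything from the code above):

// Approximate radius in pixels from the physical radius (pinhole model).
// focalLengthPx: focal length in pixels (from calibration or the sensor spec),
// radiusCm: real-world window radius, distanceCm: camera-to-door distance.
double radiusInPixels(double focalLengthPx, double radiusCm, double distanceCm)
{
    return focalLengthPx * radiusCm / distanceCm;
}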
I wouldn't worry too much about the exact geometry and calibration and rather find the window by its own characteristics.
Binarization works relatively well, be it on the whole image or in a large region of interest.
Then you can select the most likely blob based on its approximate area and/or circularity.
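A possible sketch of that idea in OpenCV C++ (the Otsu binarization, minimum area and circularity cut-off are guesses you would have to tune for your images):

#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <vector>

// Pick the most "window-like" blob: binarize, then score contours by circularity.
int main()
{
    cv::Mat gray = cv::imread("door.jpg", 0); // file name is a placeholder
    cv::Mat bw;
    cv::threshold(gray, bw, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);

    std::vector<std::vector<cv::Point> > contours;
    cv::findContours(bw, contours, CV_RETR_LIST, CV_CHAIN_APPROX_SIMPLE);

    int best = -1;
    double bestArea = 0.0;
    for (size_t i = 0; i < contours.size(); ++i)
    {
        double area = cv::contourArea(contours[i]);
        double perim = cv::arcLength(contours[i], true);
        if (area < 500 || perim <= 0) continue;                    // area cut-off is a guess
        double circularity = 4.0 * CV_PI * area / (perim * perim); // 1.0 = perfect circle
        if (circularity > 0.8 && area > bestArea)                  // keep the largest circular blob
        {
            bestArea = area;
            best = (int)i;
        }
    }
    // 'best' now indexes the most likely window contour (or -1 if nothing qualified).
    return 0;
}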
My project uses an isometric perspective; for the time being I am showing the co-ordinates in grid format above the tiles for debugging. However, when it comes to collision/grid-locking of the player, I have an issue.
Due to the nature of sprite drawing, my maths is creating some issues with the 'triangular' empty corner areas of the textures. I think that the issue is something like below (blue is how I think my tiles are being detected, whereas red is how they ideally should be detected for accurate roaming movement on the tiles):
As you can see, the boolean that checks the tile I am stood on (which takes the pixel central to the player's feet, the player will later be a car and take a pixel based on the direction of movement) is returning false and denying movement in several scenarios, as well as letting the player move in some places that shouldn't be allowed.
I think that it's because the cutoff areas of each texture are (I think) being considered part of the grid area, so when the player is in one of these corner areas it is not truly checking the correct tile, and so returning the wrong results.
The code I'm using for creating the grid is this:
int VisualComponent::TileConversion(Tile* tileToConvert, bool xOrY)
{
    int X = (tileToConvert->x - tileToConvert->y) * 64; //change 64 to TILE_WIDTH_HALF
    int Y = (tileToConvert->x + tileToConvert->y) * 25;
    /*int X = (tileToConvert->x * 128 / 2) + (tileToConvert->y * 128 / 2) + 100;
    int Y = (tileToConvert->y * 50 / 2) - (tileToConvert->x * 50 / 2) + 100;*/
    if (xOrY)
    {
        return X;
    }
    else
    {
        return Y;
    }
}
and the code for checking the player's movement is:
bool Clsentity::CheckMovementTile(int xpos, int ypos, ClsMapData* mapData) //check if the movement will end on a legitimate road tile UNOPTIMISED AS RUNS EVERY FRAME FOR EVERY TILE
{
    int x = xpos + 7; //get the center bottom pixel as this is more suitable than the first on an iso grid (more realistic 'foot' placement)
    int y = ypos + 45;
    int mapX = (x / 64 + y / 25) / 2; //64 is TILE-WIDTH HALF and 25 is TILE HEIGHT
    int mapY = (y / 25 - (x / 64)) / 2;
    for (int i = 0; i < mapData->tilesList.size(); i++) //for each tile of the map
    {
        if (mapData->tilesList[i]->x == mapX && mapData->tilesList[i]->y == mapY) //if there is an existing tile that will be entered
        {
            if (mapData->tilesList[i]->movementTile)
            {
                HAPI->DebugText(std::to_string(mapX) + " is the x and the y is " + std::to_string(mapY));
                return true;
            }
        }
    }
    return false;
}
I'm a little stuck on progression until having this fixed in the game loop aspect of things. If anyone thinks they either know the issue from this or might be able to help it'd be great and I would appreciate it. For reference also, my tile textures are 128x64 pixels and the math behind drawing them to screen treats them as 128x50 (to cleanly link together).
Rather than writing specific routines for rendering and click mapping, seriously consider thinking of these as two views on the data, which can be transformed in terms of matrix transformations of a coordinate space. You can have two coordinate spaces - one is a nice rectangular grid that you use for positioning and logic. The other is the isometric view that you use for display and input.
If you're not familiar with linear algebra, it'll take a little bit to wrap your head around it, but once you do, it makes everything trivial.
So, how does that work? Your isometric view is merely a rotation of a bog standard grid view, right? Well, close. Isometric view also changes the dimensions if you're starting with a square grid. Anyhow: can we just do a simple coordinate transformation?
Logical coordinate system -> display system (e.g. for rendering)
Texture point => Rotate 45 degrees => Scale by sqrt(2) because a 45 degree rotation changes the dimension of the block by sqrt(1 * 1 + 1 * 1)
Display system -> logical coordinate system (e.g. for mapping clicks into logical space)
Click point => descale by sqrt(2) to unsquish => unrotate by 45 degrees
Why?
If you can do coordinate transformations, then you'd be dealing with a pretty bog-standard rectangular grid for everything else you write, which will make all of your other logic MUCH simpler. Your calculations there won't involve computing angles or slopes. E.g. now your "can I move 'down'" logic is much simpler.
Let's say you have 64 x 64 tiles, for simplicity. Now transforming a screen space click to a logical tile is simply:
(int, int) whichTile(clickX, clickY) {
    logicalX, logicalY = transform(clickX, clickY)
    return (logicalX / 64, logicalY / 64)
}
You can do checks like seeing whether x0,y0 and x1,y1 are on the same tile, in the logical space, by something as simple as:
bool isSameTile(x0, y0, x1, y1) {
    return floor(x0/64) == floor(x1/64) && floor(y0/64) == floor(y1/64)
}
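For concreteness, here is one possible shape for the transform() used above, written against the 128x50 tiles (half-width 64, half-height 25) from the question; the constants and names are assumptions, and the key point is that the divisions stay in floating point until you floor down to a tile index:

// Logical (grid) space <-> isometric screen space, for 128x50 tiles.
// TILE_W_HALF / TILE_H_HALF are taken from the question's numbers.
const float TILE_W_HALF = 64.0f;
const float TILE_H_HALF = 25.0f;

// Grid coordinates (in tiles, possibly fractional) -> screen pixels.
void gridToScreen(float gridX, float gridY, float& screenX, float& screenY)
{
    screenX = (gridX - gridY) * TILE_W_HALF;
    screenY = (gridX + gridY) * TILE_H_HALF;
}

// Screen pixels -> grid coordinates (the exact inverse of the above).
void screenToGrid(float screenX, float screenY, float& gridX, float& gridY)
{
    gridX = (screenX / TILE_W_HALF + screenY / TILE_H_HALF) / 2.0f;
    gridY = (screenY / TILE_H_HALF - screenX / TILE_W_HALF) / 2.0f;
}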
Everything gets much simpler once you define the transforms and work in the logical space.
http://en.wikipedia.org/wiki/Rotation_matrix
http://en.wikipedia.org/wiki/Scaling_%28geometry%29#Matrix_representation
http://www.alcove-games.com/advanced-tutorials/isometric-tile-picking/
If you don't want to deal with some matrix library, you can do the equivalent math pretty straightforwardly, but if you separate concerns of logic management from display / input through these transformations, I suspect you'll have a much easier time of it.
So, here is the code for my 2D point class to rotate:
float nx = (x * cos(angle)) - (y * sin(angle));
float ny = (y * cos(angle)) + (x * sin(angle));
x = nx;
y = ny;
x and y are local variables in the point class.
And here is the code for my sprite class's rotation:
//Make clip
SDL_Rect clip;
clip.w = width;
clip.h = height;
clip.x = (width * _frameX) + (sep * (_frameX) + osX);
clip.y = (height * _frameY) + (sep * (_frameY) + osY);
//Make a rotated image
col bgColor = image->format->colorkey;
//Surfaces
img *toEdit = newImage(clip.w, clip.h);
img *toDraw = 0;
//Copy the source into the workspace
drawRect(0, 0, toEdit->w, toEdit->h, toEdit, bgColor);
drawImage(0, 0, image, toEdit, &clip);
//Edit the image
toDraw = SPG_Transform(toEdit, bgColor, angle, xScale, yScale, SPG_NONE);
SDL_SetColorKey(toDraw, SDL_SRCCOLORKEY, bgColor);
//Find new origin and offset by pivot
2DVec *pivot = new xyVec(pvX, pvY);
pivot->rotate(angle);
//Draw and remove the finished image
drawImage(_x - pivot->x - (toDraw->w / 2), _y - pivot->y - (toDraw->h / 2), toDraw, _destination);
//Delete stuff
deleteImage(toEdit);
delete pivot;
deleteImage(toDraw);
The code uses the center of the sprite as the origin. It works fine if I leave the pivot at (0,0), but if I move it somewhere else, the character's shoulder for instance, it starts making the sprite dance around as it spins like a spirograph, instead of the pivot staying on the character's shoulder.
The image rotation function is from SPriG, a library for drawing primitives and transformed images in SDL. Since the pivot is coming from the center of the image, I figure the new size of the clipped surface produced by rotating shouldn't matter.
[EDIT]
I've messed with the code a bit. By slowing it down, I found that for some reason, the vector is rotating 60 times faster than the image, even though I'm not multiplying anything by 60. So, I tried to just divide the input by 60, only now, it's coming out all jerky and not rotating to anything between multiples of 60.
The vector rotation code I found on this very site, and people have repeatedly confirmed that it works, so why does it only rotate in increments of 60?
I haven't touched the source of SPriG in a long time, but I can give you some info.
If SPriG has problems with rotating off of center, it would probably be faster and easier for you to migrate to SDL_gpu (and I suggest SDL 2.0). That way you get a similar API but the performance is much better (it uses the graphics card).
I can guess that the vector does not rotate 60 times faster than the image, but rather more like 57 times faster! This is because you are rotating the vector with sin() and cos(), which accept values in radians. The image is being rotated by an angle in degrees. The conversion factor for radians to degrees is 180/pi, which is about 57. SPriG can use either degrees or radians, but uses degrees by default. Use SPG_EnableRadians(1) to switch that behavior. Alternatively, you can stick to degree measure in your angle variable by multiplying the argument to sin() and cos() by pi/180.
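In other words, the point-class rotation could convert explicitly, along these lines (a sketch assuming the angle you pass around is in degrees, like the sprite rotation):

#include <cmath>

// Rotate the point (x, y) about the origin by an angle given in degrees,
// converting to radians because sin() and cos() expect radians.
void rotateDegrees(float& x, float& y, float angleDegrees)
{
    float r = angleDegrees * 3.14159265358979f / 180.0f;
    float nx = x * std::cos(r) - y * std::sin(r);
    float ny = y * std::cos(r) + x * std::sin(r);
    x = nx;
    y = ny;
}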
Currently I am developing a tool for the Kinect for Windows v2 (similar to the one in the XBOX ONE). I tried to follow some examples, and I have a working example that shows the camera image, the depth image, and an image that maps the depth to the RGB using OpenCV. But I see that it duplicates my hand when doing the mapping, and I think it is due to something wrong in the coordinate mapper part.
Here is an example of it:
And here is the code snippet that creates the image (the rgbd image in the example):
void KinectViewer::create_rgbd(cv::Mat& depth_im, cv::Mat& rgb_im, cv::Mat& rgbd_im){
    HRESULT hr = m_pCoordinateMapper->MapDepthFrameToColorSpace(cDepthWidth * cDepthHeight, (UINT16*)depth_im.data, cDepthWidth * cDepthHeight, m_pColorCoordinates);
    rgbd_im = cv::Mat::zeros(depth_im.rows, depth_im.cols, CV_8UC3);
    double minVal, maxVal;
    cv::minMaxLoc(depth_im, &minVal, &maxVal);
    for (int i=0; i < cDepthHeight; i++){
        for (int j=0; j < cDepthWidth; j++){
            if (depth_im.at<UINT16>(i, j) > 0 && depth_im.at<UINT16>(i, j) < maxVal * (max_z / 100) && depth_im.at<UINT16>(i, j) > maxVal * min_z /100){
                double a = i * cDepthWidth + j;
                ColorSpacePoint colorPoint = m_pColorCoordinates[i*cDepthWidth+j];
                int colorX = (int)(floor(colorPoint.X + 0.5));
                int colorY = (int)(floor(colorPoint.Y + 0.5));
                if ((colorX >= 0) && (colorX < cColorWidth) && (colorY >= 0) && (colorY < cColorHeight))
                {
                    rgbd_im.at<cv::Vec3b>(i, j) = rgb_im.at<cv::Vec3b>(colorY, colorX);
                }
            }
        }
    }
}
Does anyone have a clue of how to solve this? How to prevent this duplication?
Thanks in advance
UPDATE:
If I do a simple depth image thresholding I obtain the following image:
This is what more or less I expected to happen, and not having a duplicate hand in the background. Is there a way to prevent this duplicate hand in the background?
I suggest you use the BodyIndexFrame to identify whether a specific value belongs to a player or not. This way, you can reject any RGB pixel that does not belong to a player and keep the rest of them. I do not think that CoordinateMapper is lying.
A few notes:
Include the BodyIndexFrame source to your frame reader
Use MapColorFrameToDepthSpace instead of MapDepthFrameToColorSpace; this way, you'll get the HD image for the foreground
Find the corresponding DepthSpacePoint and depthX, depthY, instead of ColorSpacePoint and colorX, colorY
Here is my approach when a frame arrives (it's in C#):
depthFrame.CopyFrameDataToArray(_depthData);
colorFrame.CopyConvertedFrameDataToArray(_colorData, ColorImageFormat.Bgra);
bodyIndexFrame.CopyFrameDataToArray(_bodyData);

_coordinateMapper.MapColorFrameToDepthSpace(_depthData, _depthPoints);

Array.Clear(_displayPixels, 0, _displayPixels.Length);

for (int colorIndex = 0; colorIndex < _depthPoints.Length; ++colorIndex)
{
    DepthSpacePoint depthPoint = _depthPoints[colorIndex];

    if (!float.IsNegativeInfinity(depthPoint.X) && !float.IsNegativeInfinity(depthPoint.Y))
    {
        int depthX = (int)(depthPoint.X + 0.5f);
        int depthY = (int)(depthPoint.Y + 0.5f);

        if ((depthX >= 0) && (depthX < _depthWidth) && (depthY >= 0) && (depthY < _depthHeight))
        {
            int depthIndex = (depthY * _depthWidth) + depthX;
            byte player = _bodyData[depthIndex];

            // Identify whether the point belongs to a player
            if (player != 0xff)
            {
                int sourceIndex = colorIndex * BYTES_PER_PIXEL;

                _displayPixels[sourceIndex] = _colorData[sourceIndex++]; // B
                _displayPixels[sourceIndex] = _colorData[sourceIndex++]; // G
                _displayPixels[sourceIndex] = _colorData[sourceIndex++]; // R
                _displayPixels[sourceIndex] = 0xff;                      // A
            }
        }
    }
}
Here is the initialization of the arrays:
BYTES_PER_PIXEL = (PixelFormats.Bgr32.BitsPerPixel + 7) / 8;
_colorWidth = colorFrame.FrameDescription.Width;
_colorHeight = colorFrame.FrameDescription.Height;
_depthWidth = depthFrame.FrameDescription.Width;
_depthHeight = depthFrame.FrameDescription.Height;
_bodyIndexWidth = bodyIndexFrame.FrameDescription.Width;
_bodyIndexHeight = bodyIndexFrame.FrameDescription.Height;
_depthData = new ushort[_depthWidth * _depthHeight];
_bodyData = new byte[_depthWidth * _depthHeight];
_colorData = new byte[_colorWidth * _colorHeight * BYTES_PER_PIXEL];
_displayPixels = new byte[_colorWidth * _colorHeight * BYTES_PER_PIXEL];
_depthPoints = new DepthSpacePoint[_colorWidth * _colorHeight];
Notice that the _depthPoints array has a 1920x1080 size.
Once again, the most important thing is to use the BodyIndexFrame source.
Finally I've got some time to write the long-awaited answer.
Let's start with some theory to understand what is really happening, and then a possible answer.
We should start by knowing how to go from a 3D point cloud, with the depth camera as the origin of the coordinate system, to an image in the image plane of the RGB camera. To do that, it is enough to use the camera pinhole model:
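(The equation in the original answer is an image; in the usual notation it looks roughly like this, where fx, fy are the focal lengths and cx, cy the principal point of the RGB camera:)

|c*u|   |fx  0  cx|               |X|
|c*v| = | 0 fy  cy| * [ R | t ] * |Y|
| c |   | 0  0   1|               |Z|
                                  |1|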
Here, u and v are the coordinates in the image plane of the RGB camera. The first matrix on the right-hand side of the equation is the camera matrix, a.k.a. the intrinsics of the RGB camera. The following matrix is the rotation and translation of the extrinsics, or better said, the transformation needed to go from the depth camera coordinate system to the RGB camera coordinate system. The last part is the 3D point.
Basically, something like this is what the Kinect SDK does. So, what could go wrong that makes the hand get duplicated? Well, actually, more than one point projects to the same pixel...
To put it in other words, and in the context of the problem in the question:
The depth image is a representation of an ordered point cloud, and I am querying the u, v values of each of its pixels, which in reality can easily be converted to 3D points. The SDK gives you the projection, but several points can end up at the same pixel (usually, a large difference in the z axis between two neighbouring points produces this problem quite easily).
Now, the big question: how can you avoid this? Well, I am not sure it can be done with the Kinect SDK, since you do not know the Z value of the points AFTER the extrinsics are applied, so it is not possible to use a technique like Z-buffering. However, you may assume the Z values will be quite similar and use the ones from the original point cloud (at your own risk).
If you were doing it manually, and not with the SDK, you could apply the extrinsics to the points and then project them into the image plane, marking in another matrix which point is mapped to which pixel; if another point is already mapped there, compare the z values and always keep the point closest to the camera. Then you would have a valid mapping without any problems. This is a naive approach; you can probably find better ones, since the problem is now clear :) A rough sketch follows below.
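If you ever do go that manual route, the z-buffer idea above could look roughly like this (a sketch with made-up names; K, R and t would have to come from your own calibration of the two cameras):

#include <opencv2/core/core.hpp>
#include <vector>
#include <limits>

// Project a depth-camera point cloud into the RGB image plane, keeping only
// the closest point per pixel so background points cannot "bleed through".
void projectWithZBuffer(const std::vector<cv::Point3f>& cloud,  // points in depth-camera coordinates
                        const cv::Mat& K,                       // 3x3 RGB intrinsics (CV_64F)
                        const cv::Mat& R, const cv::Mat& t,     // depth -> RGB rotation (3x3) and translation (3x1)
                        int width, int height,
                        cv::Mat& zbuffer, cv::Mat& pointIndex)
{
    zbuffer = cv::Mat(height, width, CV_32F, cv::Scalar(std::numeric_limits<float>::max()));
    pointIndex = cv::Mat(height, width, CV_32S, cv::Scalar(-1));

    for (size_t i = 0; i < cloud.size(); ++i)
    {
        // Apply the extrinsics: move the point into the RGB camera coordinate system.
        cv::Mat p = (cv::Mat_<double>(3, 1) << cloud[i].x, cloud[i].y, cloud[i].z);
        cv::Mat q = R * p + t;
        double z = q.at<double>(2);
        if (z <= 0) continue;

        // Pinhole projection into the RGB image plane.
        int u = (int)(K.at<double>(0,0) * q.at<double>(0) / z + K.at<double>(0,2) + 0.5);
        int v = (int)(K.at<double>(1,1) * q.at<double>(1) / z + K.at<double>(1,2) + 0.5);
        if (u < 0 || u >= width || v < 0 || v >= height) continue;

        // Keep only the point closest to the RGB camera for this pixel.
        if ((float)z < zbuffer.at<float>(v, u))
        {
            zbuffer.at<float>(v, u) = (float)z;
            pointIndex.at<int>(v, u) = (int)i;
        }
    }
}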
I hope it is clear enough.
P.S.:
I do not have a Kinect 2 at the moment, so I can't check whether there has been an update related to this issue or whether the same thing still happens. I used the first released version (not a pre-release) of the SDK, so a lot may have changed since then... If someone knows whether this was solved, just leave a comment :)
I've been struggling with a small problem for some time now and just can't figure out what is wrong.
So I have a black 126 x 126 image with a 1-pixel blue border ([B,G,R] = [255, 0, 0]).
What I want is the pixel which is furthest away from all blue pixels (such as the border). I understand how this is done: iterate through every pixel; if it is black, compute the distance to every blue pixel, looking for the minimum; then select the black pixel with the largest minimum distance to any blue pixel.
Note: I don't need to actually know the true distance, so when doing the sum of the squares for distance I don't square root, I only want to know which distance is larger (less expensive).
First thing I do is loop through every pixel and if it is blue, add the row and column to a vector. I can confirm this part works correctly. Next, I loop through all pixels again and compare every black pixel's distance to every pixel in the blue pixel vector.
Where blue is a vector of Blue objects (each holding a row and a column) and image is the input image:
int distance;
int localShortest = 0;
int bestDist = 0;
int posX = 0;
int posY = 0;

for(int i = 0; i < image.rows; i++)
{
    for(int j = 0; j < image.cols; j++)
    {
        //Make sure pixel is black
        if(image.at<cv::Vec3b>(i,j)[0] == 0
            && image.at<cv::Vec3b>(i,j)[1] == 0
            && image.at<cv::Vec3b>(i,j)[2] == 0)
        {
            for(int k = 0; k < blue.size(); k++)
            {
                //Distance between pixels
                distance = (i - blue.at(k).row)*(i - blue.at(k).row) + (j - blue.at(k).col)*(j - blue.at(k).col);
                if(k == 0)
                {
                    localShortest = distance;
                }
                if(distance < localShortest)
                {
                    localShortest = distance;
                }
            }
            if(localShortest > bestDist)
            {
                posX = i;
                posY = j;
                bestDist = localShortest;
            }
        }
    }
}
This works absolutely fine for a 1 pixel border around the edge.
https://dl.dropboxusercontent.com/u/3879939/works.PNG
Similarly, if I add more blue but keep a square ish black region, then it also works.
https://dl.dropboxusercontent.com/u/3879939/alsoWorks.PNG
But as soon as the black portion of the image is not square, but rectangular for instance, the 'furthest away' point is off. Sometimes it even says a blue pixel is the furthest away from blue, which is just not right.
https://dl.dropboxusercontent.com/u/3879939/off.PNG
Any help much appreciated! Hurting my head a bit.
One possibility, given that you're using OpenCV anyway, is to just use the supplied distance transform function.
For your particular case, you would need to do the following:
Convert your input to a single-channel binary image (e.g. map black to white and blue to black)
Run the cv::distanceTransform function with CV_DIST_L2 (Euclidean distance)
Examine the resulting greyscale image to get the results.
Note that there may be more than one pixel at the maximum distance from the border, so you need to handle this case according to your application.
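A sketch of those three steps in OpenCV C++ (it assumes the border really is pure blue, i.e. [255, 0, 0] in BGR, as described in the question):

#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

// Find a pixel that is furthest from any blue pixel using the distance transform.
cv::Point furthestFromBlue(const cv::Mat& image) // 3-channel BGR input
{
    // 1. Binary mask: blue pixels become 0, everything else 255, so the
    //    distance transform measures distance to the nearest blue pixel.
    cv::Mat blueMask, notBlue;
    cv::inRange(image, cv::Scalar(255, 0, 0), cv::Scalar(255, 0, 0), blueMask);
    cv::bitwise_not(blueMask, notBlue);

    // 2. Euclidean distance to the nearest zero (i.e. nearest blue) pixel.
    cv::Mat dist;
    cv::distanceTransform(notBlue, dist, CV_DIST_L2, 3);

    // 3. The brightest pixel of the distance map is the answer
    //    (ties are possible; minMaxLoc simply returns one of them).
    double maxVal;
    cv::Point maxLoc;
    cv::minMaxLoc(dist, 0, &maxVal, 0, &maxLoc);
    return maxLoc;
}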
The brightest pixels in the distance transform will be the ones that you need. For example, here is a white rectangle and its distance transform:
In a square, due to its symmetry, the furthest black point (the center) is also the furthest no matter in which direction you look from there. But now try to imagine a very long rectangle with a very short height. There will be multiple points on its horizontal axis whose largest minimum distance is the short distance to both the top and bottom sides, because the left and right sides are far away. In this case the pixel your algorithm finds can be any one on this line, and the result will depend on your pixel scanning order.
It's because there is a line (more than one pixel) of points that meet your condition for a rectangle.