Calculate y derivative of image - c++

I want to implement the y derivative of an image.
Given is the following OpenCV call, which uses a Sobel operator:
Sobel(gray_input_picture, y_derivative_picture, CV_32FC1, 0, 1, 3, BORDER_DEFAULT);
Here is an example of what it looks like:
Now I need to implement it myself without using this Sobel function.
I found some help here: Gradient direction computation
What I have implemented so far is:
for (int x = 0; x < gray_input_picture.rows; x++)
{
    for (int y = 0; y < gray_input_picture.cols; y++)
    {
        if (x == gray_input_picture.rows-1 || x == 0)
        {
            y_derivative_picture.at<Vec3b>(x,y) = gray_input_picture.at<Vec3b>(x,y);
        }
        else
        {
            Vec3b color;
            color[0] = gray_input_picture.at<Vec3b>(x+1,y)[0] - gray_input_picture.at<Vec3b>(x-1,y)[0];
            color[1] = gray_input_picture.at<Vec3b>(x+1,y)[1] - gray_input_picture.at<Vec3b>(x-1,y)[1];
            color[2] = gray_input_picture.at<Vec3b>(x+1,y)[2] - gray_input_picture.at<Vec3b>(x-1,y)[2];
            y_derivative_picture.at<Vec3b>(x,y) = color;
        }
    }
}
bitwise_not(y_derivative_picture, y_derivative_picture2);
The x and y swap comes from OpenCV's (row, column) indexing; that's why it looks a little different. What I don't understand is that I get a picture which I then need to invert (black to white, white to black).
The result is a little different too: it contains blue areas:
Does anyone know how I can implement the y derivative better (so that it looks similar to the Sobel function)?
Or does anyone know what the problem in my implementation is?
Thanks!

I think you would probably need to check out what the Sobel operator does here:
http://en.wikipedia.org/wiki/Sobel_operator
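For a 3x3 kernel with dx = 0, dy = 1, Sobel uses the kernel [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]. A minimal sketch of applying it by hand to a single-channel 8-bit image, writing the signed result into a CV_32FC1 output like the Sobel() call in the question (border pixels are simply left at zero here, whereas OpenCV reflects the border by default), could look like this:
cv::Mat manualSobelY(const cv::Mat& gray)          // gray is assumed to be CV_8UC1
{
    cv::Mat dy = cv::Mat::zeros(gray.size(), CV_32FC1);
    for (int y = 1; y < gray.rows - 1; ++y) {
        for (int x = 1; x < gray.cols - 1; ++x) {
            float g = -1.0f * gray.at<uchar>(y - 1, x - 1)
                    -  2.0f * gray.at<uchar>(y - 1, x)
                    -  1.0f * gray.at<uchar>(y - 1, x + 1)
                    +  1.0f * gray.at<uchar>(y + 1, x - 1)
                    +  2.0f * gray.at<uchar>(y + 1, x)
                    +  1.0f * gray.at<uchar>(y + 1, x + 1);
            dy.at<float>(y, x) = g;                // signed value, may be negative
        }
    }
    return dy;
}
Keeping the result in a float Mat preserves the negative responses; in the posted code the differences are written into unsigned Vec3b channels, where negative values wrap around, which is one reason the hand-rolled result can look inverted or oddly coloured.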


Plot a function using SFML

I'm new to SFML. I searched Google to find a way to plot multiple points in SFML from an equation. For example, I want to plot 200 points (x,y) such that y = 2x, in the range (-10 < x < 10).
I couldn't seem to find the right functions to plot points in SFML, because most functions just draw circles and other geometric shapes. If anyone knows of any functions for graphing in SFML, please tell me (something like this: https://www.youtube.com/watch?v=jMrnSa6CHfE&t=42s, not the animation, just the plotting part).
Thanks a lot!
As Galik suggested, drawing pixels onto an image is a good solution.
You could try something along the lines of this:
sf::Vector2u size(400, 400); // size of the plot image; pick whatever resolution you need
sf::Image graph;
graph.create(size.x, size.y, sf::Color(255, 255, 255)); // white background

// y = 2x (note that image coordinates grow downwards, so the line appears mirrored vertically)
for (unsigned int x = 0; x < size.x; x++)
{
    unsigned int y = 2u * x;
    if (y < size.y)
    {
        graph.setPixel(x, y, sf::Color(0, 0, 0)); // black pixel on the curve
    }
}
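To actually see the result, the image is typically copied into a texture and drawn with a sprite; a minimal sketch, assuming an existing sf::RenderWindow named window:
sf::Texture texture;
texture.loadFromImage(graph);   // upload the plotted pixels to the GPU
sf::Sprite sprite(texture);
window.clear();
window.draw(sprite);            // draw the plotted image
window.display();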

Find rectangular object quality with perspective

I get an image from a camera (calibrated and without lens distortions) and I need to detect a rectangular object. Markers are a good example. For markers I check corner count, minimum size, board contrast and convexity. I had an idea on how to improve this in cases where there is a large number of false rectangles.
Here is an example image:
Normally all of these are valid, because without knowing anything about the camera we cannot determine whether perspective allows these kinds of shapes. I know the size (or at least the aspect ratio) of the rectangle in real life. So I had the idea that I should be able to disregard many of these shapes just by reprojecting them and checking the error.
For example, if I use solvePnPRansac it would not be able to converge if the shape is not possible; if it doesn't converge, I just disregard the shape. Sadly, none of the OpenCV solve functions allow me to check for error or convergence. I actually need some ratio or quality measure, because it is possible that some of the rectangles overlap. For example, my object finder identifies these rectangles:
One of the three is actually correct, or at least "the best". But I need some way to know which one it is. I cannot use things like line lengths because of the camera perspective. So I just thought I could solve and see which has the smallest error.
There are no lens distortions in the image, but even if there were, solvePnP usually allows passing D to it as well.
Is this even possible or am I missing something?
I guess I could try hacking around solvePnPRansac just to return convergence, but maybe there is a simpler way?
I figured I can do something like what is done during calibration with a grid. I can calculate the reprojection error. So first I solve to get the transformation matrix. Then I transform the points in 3D using the transformation matrix and afterwards use projectPoints to project them back in 2D. Then I check distance between original 2D points and the projected 2D points. This can then be used for quality. Objects that are not possible often have 100 pixels or more reprojection error in my images, but possible objects have less than 20px. So I just did a 25 pixel cutoff and it seems to work fine.
Note that more transformations are possible than I thought. In my original image maybe two are not possible with my current camera, but it still rejected a lot of fakes.
If nobody else has some ideas I will accept this as answer.
Here is some code for the method I use:
// This is the object in 3D
double width = 50.0;  // object is 50 mm wide
double height = 30.0; // object is 30 mm tall
cv::Mat object_points(4, 3, CV_64FC1);
object_points.at<double>(0,0) = 0;
object_points.at<double>(0,1) = 0;
object_points.at<double>(0,2) = 0;
object_points.at<double>(1,0) = width;
object_points.at<double>(1,1) = 0;
object_points.at<double>(1,2) = 0;
object_points.at<double>(2,0) = width;
object_points.at<double>(2,1) = height;
object_points.at<double>(2,2) = 0;
object_points.at<double>(3,0) = 0;
object_points.at<double>(3,1) = height;
object_points.at<double>(3,2) = 0;

// Check all rectangles for reprojection error
cv::Mat image_points(4, 2, CV_64FC1);
for (size_t i = 0; i < rectangles_to_test.size(); i++) {
    // Get rectangle points
    for (size_t c = 0; c < 4; ++c) {
        image_points.at<double>(c,0) = rectangles_to_test[i].points[c].x;
        image_points.at<double>(c,1) = rectangles_to_test[i].points[c].y;
    }
    // Calculate transformation matrix
    cv::Mat rvec, tvec;
    cv::solvePnP(object_points, image_points, M1, D1, rvec, tvec);
    cv::Mat rotation;
    Matrix4<double> transform;
    transform.init_identity();
    cv::Rodrigues(rvec, rotation);
    for (size_t row = 0; row < 3; ++row) {
        for (size_t col = 0; col < 3; ++col) {
            transform.set(row, col, rotation.at<double>(row, col));
        }
        transform.set(row, 3, tvec.at<double>(row, 0));
    }
    // Calculate projection
    std::vector<cv::Point3f> p3(4);
    std::vector<cv::Point2f> p2;
    Vector4<double> p = transform * Vector4<double>(0, 0, 0, 1);
    p3[0] = cv::Point3f((float)p.x, (float)p.y, (float)p.z);
    p = transform * Vector4<double>(width, 0, 0, 1);
    p3[1] = cv::Point3f((float)p.x, (float)p.y, (float)p.z);
    p = transform * Vector4<double>(width, height, 0, 1);
    p3[2] = cv::Point3f((float)p.x, (float)p.y, (float)p.z);
    p = transform * Vector4<double>(0, height, 0, 1);
    p3[3] = cv::Point3f((float)p.x, (float)p.y, (float)p.z);
    cv::projectPoints(p3, cv::Mat::zeros(1, 3, CV_64FC1), cv::Mat::zeros(1, 3, CV_64FC1), M1, D1, p2);
    // Calculate reprojection error
    rectangles_to_test[i].reprojection_error = 0.0;
    for (size_t c = 0; c < 4; ++c) {
        double dx = p2[c].x - rectangles_to_test[i].points[c].x;
        double dy = p2[c].y - rectangles_to_test[i].points[c].y;
        rectangles_to_test[i].reprojection_error += std::sqrt(dx*dx + dy*dy);
    }
    if (rectangles_to_test[i].reprojection_error > reprojection_error_threshold) {
        // rectangle is no good
    }
}

Rendering Tilemap on the screen correctly

I'm having a strange problem rendering my tilemap-based level correctly.
On the y axis all the tiles are aligned normally, but on the x axis they seem to be separated by a gap, and I can't figure out why.
I created a matrix with enum values (from 0 to 2) and I cycle through the matrix in a for loop to render the tile for the current number, e.g. GROUND = 0, etc.
Here is a photo of what it looks like:
http://it.tinypic.com/r/ali261/8
Here is the sprite for the tile:
http://it.tinypic.com/r/21kggw5/8
I will add the code down here.
for (int y = 0; y < 15; y++)
{
    for (int x = 0; x < 20; x++)
    {
        if (map[y][x] == GROUND)
            render(tileTex, x*64 - camera.x, y*64 - camera.y, &gTileSprite[0], 0, NULL, SDL_FLIP_NONE);
        else if (map[y][x] == UGROUND)
            render(tileTex, x*64 - camera.x, y*64 - camera.y, &gTileSprite[1], 0, NULL, SDL_FLIP_NONE);
        else if (map[y][x] == SKY)
            render(tileTex, x*64 - camera.x, y*64 - camera.y, &gTileSprite[2], 0, NULL, SDL_FLIP_NONE);

        tBox[y][x].x = x*64;
        tBox[y][x].y = y*64;
        tBox[y][x].w = TILE_WIDTH;
        tBox[y][x].h = TILE_HEIGHT;
    }
}
Further to the comments above, one must be careful to avoid any blurring along the edges of tiles, since their repetition will make any defects more obvious than if they were viewed in isolation.
Blurring may be introduced in the process of drawing portions of the tilemap to the final/intermediate target, or as seems (and has been confirmed) in this case, the source material may have blurred edges.
Particularly when working with images of such low pixel dimensions, one must be vigilant and ensure that any and all resizing operations are performed in an image editor without re-sampling.
While bilinear/cubic re-sampling may be desired when blitting the assembled image to the screen, it is never desirable for such re-sampling to happen to the source material.
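If, on the other hand, the blurring were being introduced by the renderer scaling the tiles, one thing worth checking in SDL2 (an assumption; the question does not show how the renderer is configured) is the texture scale-quality hint, which should be set to nearest-neighbour before the textures are created:
// Ask SDL2 for nearest-neighbour sampling so scaled tiles keep hard edges.
// This hint must be set before the textures are created in order to take effect.
SDL_SetHint(SDL_HINT_RENDER_SCALE_QUALITY, "nearest");   // "0" also selects nearest-neighbour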

Kinect for Windows v2 depth to color image misalignment

Currently I am developing a tool for the Kinect for Windows v2 (similar to the sensor in the Xbox One). I tried to follow some examples, and I have a working example that shows the camera image, the depth image, and an image that maps the depth to the RGB image using OpenCV. But I see that it duplicates my hand when doing the mapping, and I think it is due to something wrong in the coordinate mapper part.
Here is an example of it:
And here is the code snippet that creates the image (the rgbd image in the example):
void KinectViewer::create_rgbd(cv::Mat& depth_im, cv::Mat& rgb_im, cv::Mat& rgbd_im){
    HRESULT hr = m_pCoordinateMapper->MapDepthFrameToColorSpace(cDepthWidth * cDepthHeight, (UINT16*)depth_im.data, cDepthWidth * cDepthHeight, m_pColorCoordinates);
    rgbd_im = cv::Mat::zeros(depth_im.rows, depth_im.cols, CV_8UC3);
    double minVal, maxVal;
    cv::minMaxLoc(depth_im, &minVal, &maxVal);
    for (int i = 0; i < cDepthHeight; i++){
        for (int j = 0; j < cDepthWidth; j++){
            if (depth_im.at<UINT16>(i, j) > 0 && depth_im.at<UINT16>(i, j) < maxVal * (max_z / 100) && depth_im.at<UINT16>(i, j) > maxVal * min_z / 100){
                double a = i * cDepthWidth + j;
                ColorSpacePoint colorPoint = m_pColorCoordinates[i * cDepthWidth + j];
                int colorX = (int)(floor(colorPoint.X + 0.5));
                int colorY = (int)(floor(colorPoint.Y + 0.5));
                if ((colorX >= 0) && (colorX < cColorWidth) && (colorY >= 0) && (colorY < cColorHeight))
                {
                    rgbd_im.at<cv::Vec3b>(i, j) = rgb_im.at<cv::Vec3b>(colorY, colorX);
                }
            }
        }
    }
}
Does anyone have a clue how to solve this? How can I prevent this duplication?
Thanks in advance
UPDATE:
If I do a simple depth image thresholding I obtain the following image:
This is more or less what I expected to happen, without a duplicate hand in the background. Is there a way to prevent this duplicate hand in the background?
I suggest you use the BodyIndexFrame to identify whether a specific value belongs to a player or not. This way, you can reject any RGB pixel that does not belong to a player and keep the rest of them. I do not think that CoordinateMapper is lying.
A few notes:
Include the BodyIndexFrame source in your frame reader
Use MapColorFrameToDepthSpace instead of MapDepthFrameToColorSpace; this way, you'll get the HD image for the foreground
Find the corresponding DepthSpacePoint and depthX, depthY, instead of ColorSpacePoint and colorX, colorY
Here is my approach when a frame arrives (it's in C#):
depthFrame.CopyFrameDataToArray(_depthData);
colorFrame.CopyConvertedFrameDataToArray(_colorData, ColorImageFormat.Bgra);
bodyIndexFrame.CopyFrameDataToArray(_bodyData);

_coordinateMapper.MapColorFrameToDepthSpace(_depthData, _depthPoints);

Array.Clear(_displayPixels, 0, _displayPixels.Length);

for (int colorIndex = 0; colorIndex < _depthPoints.Length; ++colorIndex)
{
    DepthSpacePoint depthPoint = _depthPoints[colorIndex];

    if (!float.IsNegativeInfinity(depthPoint.X) && !float.IsNegativeInfinity(depthPoint.Y))
    {
        int depthX = (int)(depthPoint.X + 0.5f);
        int depthY = (int)(depthPoint.Y + 0.5f);

        if ((depthX >= 0) && (depthX < _depthWidth) && (depthY >= 0) && (depthY < _depthHeight))
        {
            int depthIndex = (depthY * _depthWidth) + depthX;
            byte player = _bodyData[depthIndex];

            // Identify whether the point belongs to a player
            if (player != 0xff)
            {
                int sourceIndex = colorIndex * BYTES_PER_PIXEL;

                _displayPixels[sourceIndex] = _colorData[sourceIndex++]; // B
                _displayPixels[sourceIndex] = _colorData[sourceIndex++]; // G
                _displayPixels[sourceIndex] = _colorData[sourceIndex++]; // R
                _displayPixels[sourceIndex] = 0xff;                      // A
            }
        }
    }
}
Here is the initialization of the arrays:
BYTES_PER_PIXEL = (PixelFormats.Bgr32.BitsPerPixel + 7) / 8;
_colorWidth = colorFrame.FrameDescription.Width;
_colorHeight = colorFrame.FrameDescription.Height;
_depthWidth = depthFrame.FrameDescription.Width;
_depthHeight = depthFrame.FrameDescription.Height;
_bodyIndexWidth = bodyIndexFrame.FrameDescription.Width;
_bodyIndexHeight = bodyIndexFrame.FrameDescription.Height;
_depthData = new ushort[_depthWidth * _depthHeight];
_bodyData = new byte[_depthWidth * _depthHeight];
_colorData = new byte[_colorWidth * _colorHeight * BYTES_PER_PIXEL];
_displayPixels = new byte[_colorWidth * _colorHeight * BYTES_PER_PIXEL];
_depthPoints = new DepthSpacePoint[_colorWidth * _colorHeight];
Notice that the _depthPoints array has a 1920x1080 size.
Once again, the most important thing is to use the BodyIndexFrame source.
Finally I got some time to write the long-awaited answer.
Let's start with some theory to understand what is really happening, and then a possible answer.
We should start with the way to go from a 3D point cloud, which has the depth camera as its coordinate-system origin, to an image in the image plane of the RGB camera. To do that it is enough to use the pinhole camera model:
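In standard pinhole notation (K for the RGB intrinsics, [R | t] for the depth-to-RGB extrinsics, and (X, Y, Z) for the 3D point) the projection can be written as:

s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \underbrace{\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}}_{K} \begin{bmatrix} R \mid t \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}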
Here, u and v are the coordinates in the image plane of the RGB camera. The first matrix on the right side of the equation is the camera matrix, a.k.a. the intrinsics of the RGB camera. The following matrix is the rotation and translation of the extrinsics, or better said, the transformation needed to go from the depth camera coordinate system to the RGB camera coordinate system. The last part is the 3D point.
Basically, something like this is what the Kinect SDK does. So what could go wrong that makes the hand get duplicated? Well, actually, more than one point projects to the same pixel...
To put it in other words, in the context of the problem in the question:
The depth image is a representation of an ordered point cloud, and I am querying the u, v values of each of its pixels, which in reality can easily be converted to 3D points. The SDK gives you the projection, but several points can end up at the same pixel (usually, the larger the distance along the z axis between two neighbouring points, the more easily this problem appears).
Now, the big question: how can you avoid this? Well, I am not sure it can be done with the Kinect SDK, since you do not know the Z value of the points after the extrinsics are applied, so it is not possible to use a technique like Z-buffering. However, you may assume the Z values will be quite similar and use the ones from the original point cloud (at your own risk).
If you were doing it manually, and not with the SDK, you could apply the extrinsics to the points and then project them into the image plane, marking in another matrix which point is mapped to which pixel; if there is already a point mapped to that pixel, compare the z values and always keep the point closest to the camera. Then you will have a valid mapping without any problems. This is kind of a naive way; you can probably find better ones, since the problem is now clear :)
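A minimal, self-contained sketch of that z-buffer idea (the intrinsics below are placeholders, not real Kinect calibration values, and the toy points are assumed to already be expressed in the RGB camera frame; none of this uses the Kinect SDK):
// Naive z-buffer over the colour image: every 3D point is projected, and for
// each colour pixel only the point closest to the camera is kept.
#include <vector>
#include <limits>
#include <cmath>

struct Point3 { float x, y, z; };

int main()
{
    const int   colorWidth = 1920, colorHeight = 1080;
    const float fx = 1050.f, fy = 1050.f, cx = 960.f, cy = 540.f;   // placeholder intrinsics

    // Two toy points on the same ray; the nearer one should win the pixel.
    std::vector<Point3> cloud = { {0.10f, 0.05f, 1.2f}, {0.20f, 0.10f, 2.4f} };

    std::vector<float> zbuffer(colorWidth * colorHeight, std::numeric_limits<float>::max());
    std::vector<int>   owner(colorWidth * colorHeight, -1);         // which point "won" each pixel

    for (int i = 0; i < static_cast<int>(cloud.size()); ++i) {
        const Point3& p = cloud[i];
        if (p.z <= 0.f) continue;                                    // behind the camera
        const int u = static_cast<int>(std::lround(fx * p.x / p.z + cx));
        const int v = static_cast<int>(std::lround(fy * p.y / p.z + cy));
        if (u < 0 || u >= colorWidth || v < 0 || v >= colorHeight) continue;
        const int idx = v * colorWidth + u;
        if (p.z < zbuffer[idx]) {                                    // closer than what is stored
            zbuffer[idx] = p.z;
            owner[idx] = i;                                          // keep only the closest point
        }
    }
    return 0;
}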
I hope it is clear enough.
P.S.:
I do not have a Kinect 2 at the moment, so I can't check whether there is an update related to this issue or whether the same thing still happens. I used the first released version (not a pre-release) of the SDK, so a lot may have changed. If someone knows whether this was solved, just leave a comment :)

Erase parts of image to make pixels transparent

I'm working in openFrameworks, a set of C++ libraries.
What I'm trying to do is simply 'erase' certain pixels of an ofImage (a class that loads/displays an image and has access to its pixel array) when a shape (an eraser) passes over the appropriate pixels of the image. Pretty simple stuff - I think - but I'm having a mental block!
ofImage has two methods - getPixels() and getPixelsRef() that seem to approach what I am trying to do, but the methodology I am using is not quite giving me the results I want.
Here is an example of an attempt to update the pixels of a foreground image from the pixels of a background image:
ofPixelsRef fore = foreground.getPixelsRef();
ofPixelsRef back = background.getPixelsRef();
for (int x = 0; x < foreground.getWidth()/2; x++) {
    for (int y = 0; y < foreground.getHeight(); y++) {
        ofColor c = back.getColor(x, y);
        fore.setColor(x, y, c);
    }
}
foreground.setFromPixels(fore);
and here is an attempt to statically colour the foreground with a predetermined colour (which I think is what I want to do, with a transparent white ?!?):
ofPixelsRef fore = foreground.getPixelsRef();
ofColor c(0, 127);
for (int x = 0; x < foreground.getWidth(); x++) {
    for (int y = 0; y < foreground.getHeight(); y++) {
        fore.setColor(x, y, c);
    }
}
foreground.setFromPixels(fore);
Neither are quite where I want to get to, but I think they're a stab in the right direction.
If anyone has any ideas on where to proceed, I'm all ears.
I'd consider moving to the ofFbo class, or even GLSL if there's a clean lead/example.
Feel free to post vanilla C++ as well, and I'll see what I can do about porting it to oF.
Thanks,
~ Jesse
FYI, I've found a solution detailed at this page: http://forum.openframeworks.cc/index.php/topic,12899.0.html
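For reference, here is a minimal sketch of erasing directly on the pixels (not necessarily identical to the linked solution). It assumes the foreground image has an alpha channel, e.g. via foreground.setImageType(OF_IMAGE_COLOR_ALPHA), that alpha blending is enabled when drawing, and that eraserX, eraserY and radius are hypothetical values describing the eraser:
ofPixels& pix = foreground.getPixelsRef();
for (int x = eraserX - radius; x <= eraserX + radius; x++) {
    for (int y = eraserY - radius; y <= eraserY + radius; y++) {
        if (x < 0 || y < 0 || x >= foreground.getWidth() || y >= foreground.getHeight()) continue;
        if ((x - eraserX) * (x - eraserX) + (y - eraserY) * (y - eraserY) > radius * radius) continue; // outside the circular eraser
        ofColor c = pix.getColor(x, y);
        c.a = 0;                         // make this pixel fully transparent
        pix.setColor(x, y, c);
    }
}
foreground.update();                     // push the modified pixels back to the texture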