Map real-world height to point cloud height - computer-vision

I have a point cloud that consists of a cube on a table. I'm currently trying to filter out everything except the cube in the point cloud. Given the real-world table height and the camera pose and intrinsics, is there a way to determine the coordinates of the table's bounding box so that I could filter everything else out? Thank you in advance!
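One possible approach, sketched below under the assumption that the cloud is given in camera coordinates and that the pose is known as a world-to-camera rotation R and translation t: transform the cloud into the world frame and keep only points above the known table height. All names and values here are placeholders, not a definitive implementation.

```python
import numpy as np

# Placeholder inputs: an Nx3 point cloud in camera coordinates, the camera
# pose as a world-to-camera rotation R (3x3) and translation t (3,), and
# the known real-world table height (here in metres).
points_cam = np.load("cloud.npy")   # assumed Nx3 array of XYZ points
R = np.eye(3)                       # placeholder rotation (world -> camera)
t = np.zeros(3)                     # placeholder translation (world -> camera)
table_z = 0.75                      # known table height in the world frame
margin = 0.005                      # tolerance above the table surface

# Transform the cloud into world coordinates: p_world = R^T (p_cam - t).
# For row vectors, (p - t) @ R is the same as R.T @ (p - t) per point.
points_world = (points_cam - t) @ R

# Keep only points above the table plane; the table top, floor, etc. are dropped.
# If the table's XY extent is also known, the same kind of mask can crop to it.
cube_candidates = points_world[points_world[:, 2] > table_z + margin]
```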

Related

Axis-Aligned Bounding Box Calculation (AABB) for different orientations of 3D object

I am trying to calculate the Axis-Aligned Bounding Box of a 3d CAD model (.stp file) for different orientations.
More specifically, imagine a 3d object lying on a virtual workbench and we have a top view of it in a CAD program.
We only care about the top view (representing the projection of the object on the XY plane).
The final goal is to create a table containing the ratio of the bounding box X and Y sides for every degree of rotation.
The following sketches clarify what I mean.
Any ideas/suggestions for any part of the task?
I've got two ideas for solution approaches (depending on the capabilities of your CAD software); a sketch of the per-degree measurement follows these two ideas.
Use some kind of "extreme point" function: get the coordinates of these extreme points by varying the direction in which the point is generated.
Create a straight line (or a plane in 3D) which does not intersect your geometry. Measure the minimal distance between your body and the line/plane. Rotate the line/plane around your body (around the centre of gravity) stepwise to get multiple measurements.
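If the projected outline can be exported as a set of 2D points, a rough sketch of that rotate-and-measure loop might look like this (the function and file names are hypothetical):

```python
import numpy as np

def aabb_ratio_per_degree(points_xy):
    """For each degree of rotation, rotate the 2D projection and return
    the ratio of the axis-aligned bounding box's X and Y extents."""
    ratios = []
    for deg in range(360):
        a = np.radians(deg)
        rot = np.array([[np.cos(a), -np.sin(a)],
                        [np.sin(a),  np.cos(a)]])
        rotated = points_xy @ rot.T
        extent = rotated.max(axis=0) - rotated.min(axis=0)  # [dx, dy]
        ratios.append(extent[0] / extent[1])
    return ratios

# Usage with a hypothetical Nx2 array of projected vertices:
# table = aabb_ratio_per_degree(np.loadtxt("projected_outline.xy"))
```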

How do I return absolute coordinates for bounding boxes with CustomVision API?

I'm wondering whether it is possible for the Azure Cognitive Custom Vision Prediction API to return absolute coordinates instead of percentage-based ones:
In the above screenshot, you can see the top, left, width and height properties of the prediction results.
Is there any way to let the API return absolute coordinates instead of - what I assume are - percentage-based coordinates?
Extra: does anyone have an idea why it returns this type of coordinates?
There is no way of getting absolute values in the current API.
You just have to multiply those relative values by your image width / height. I made an answer about that a moment ago, you can have a look at it here: How to use Azure custom vision service response boundingBox to plot shape
For your extra question: I guess the result is relative because the processing scales/resizes the image to a specific ratio. As you can see in the sample showing consumption of an exported Custom Vision model here, the image is rescaled to a 256x256 square.
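For completeness, the conversion the answer describes is just a multiplication; a minimal sketch, assuming a boundingBox dictionary with the relative left/top/width/height fields shown in the prediction results:

```python
def to_absolute(bounding_box, image_width, image_height):
    """Convert a relative boundingBox (values in 0..1) to absolute pixels."""
    left = bounding_box["left"] * image_width
    top = bounding_box["top"] * image_height
    width = bounding_box["width"] * image_width
    height = bounding_box["height"] * image_height
    return left, top, width, height

# e.g. to_absolute({"left": 0.25, "top": 0.1, "width": 0.5, "height": 0.3}, 640, 480)
# -> (160.0, 48.0, 320.0, 144.0)
```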

How to segment whiteboard region?

This is a case of whiteboard e-learning. The video shows the instructor teaching using the whiteboard.
The student is asked to select the four corners of the whiteboard. Two of the corners may not be in the visible region. Can anyone suggest an algorithm which finds the whiteboard area based on the four corner points selected?
I want to do something like what we see in the CamScanner app.
If you can mark 4 points in image coordinates, then you can map these points to another quadrangle, or more specifically a rectangle, using a homography. You will also need the whiteboard's aspect ratio so that your result has equal X and Y scales. With this homography you can warp the video to straighten the board.
Note that the warping will cause some blurring of more distant areas, and anything outside the whiteboard plane, e.g. the instructor, will be unrealistically warped.
If some of these points are outside the image, you will have to approximate their position for a proper view.
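A minimal OpenCV sketch of that warp, assuming the four selected corners are given in top-left, top-right, bottom-right, bottom-left order and the board's aspect ratio is known (the function name and default values are illustrative):

```python
import cv2
import numpy as np

def rectify_whiteboard(frame, corners, aspect_ratio=4 / 3, out_height=720):
    """Warp the region bounded by `corners` (TL, TR, BR, BL in image
    coordinates, possibly extrapolated outside the frame) to an
    axis-aligned rectangle with the given aspect ratio."""
    out_width = int(round(out_height * aspect_ratio))
    src = np.float32(corners)
    dst = np.float32([[0, 0], [out_width, 0],
                      [out_width, out_height], [0, out_height]])
    homography = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(frame, homography, (out_width, out_height))
```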

Grouping different scale bounding boxes

I've created an OpenCV application for human detection on images.
I run my algorithm on the same image over different scales, and when detections are made, at the end I have information about the bounding box position and the scale at which it was detected. Then I want to transform that rectangle back to the original scale, given that its position and size will vary.
I've been trying to wrap my head around this and I've gotten nowhere. This should be rather simple, but at the moment I am clueless.
Help anyone?
OK, got the answer elsewhere:
"What you should do is store the scale you are at for each detection. Then transforming should be rather easy. Imagine you have the following:
X and Y coordinates (the center of the bounding box) at scale 1/2 of the original. This means that you should multiply by the inverse of the scale to get the location in the original, which would be 2X, 2Y (again for the bounding box center).
So first transform the center of the bounding box, then calculate the width and height of your bounding box in the original, again by multiplying by the inverse. Then from that center, your box will span +-width/2 and +-height/2 (using the rescaled width and height)."
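A small sketch of that back-transformation, assuming each detection stores its center, size, and the scale factor it was found at (the function name is made up):

```python
def to_original_scale(cx, cy, w, h, scale):
    """Map a detection (center cx, cy and size w, h) found on an image
    resized by `scale` (e.g. 0.5) back to the original image."""
    inv = 1.0 / scale
    cx0, cy0 = cx * inv, cy * inv          # center in the original image
    w0, h0 = w * inv, h * inv              # size in the original image
    # Corner form: the box spans +/- half the size around the center.
    x0, y0 = cx0 - w0 / 2, cy0 - h0 / 2
    return x0, y0, w0, h0

# e.g. a 64x128 box centered at (100, 80) on a half-scale image maps to
# a 128x256 box centered at (200, 160) on the original:
# to_original_scale(100, 80, 64, 128, 0.5) -> (136.0, 32.0, 128.0, 256.0)
```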

Coordinate Transformation C++

I have a webcam pointed at a table at a slant and with it I track markers.
I have a transformationMatrix in OpenSceneGraph and its translation part contains the relative coordinates from the tracked object to the camera.
Because the camera is pointed at a slant, when I move the marker across the table both the Y and Z values are updated, although all I want updated is the Z value, because the height of the marker doesn't change, only its distance to the camera.
This has the effect that when I project a model onto the marker in OpenSceneGraph, the model is slightly off, and when I move the marker around the Y and Z values are updated incorrectly.
So my guess is that I need a transformation matrix with which I multiply each point, so that I have a new coordinate system that is aligned with the table surface.
Something like this: A * v1 = v2, with v1 being the camera coordinates and v2 being my "table coordinates".
So what I did was choose 4 points to "calibrate" my system. I placed the marker at the top-left corner of the screen, defined v1 as the current camera coordinates and v2 as (0,0,0), and did that for 4 different points.
Then, taking the linear equations I get from an unknown matrix and pairs of known vectors, I solved for the matrix.
I thought the values I would get for the matrix would be the values I needed to multiply the camera coordinates with so that the model would be updated correctly on the marker.
But when I multiplied the known camera coordinates I had gathered with the matrix, I didn't get anything close to what my "table coordinates" were supposed to be.
Is my approach completely wrong, or did I just mess something up in the equations? (I solved them with the help of wolframalpha.com.) Is there an easier or better way of doing this?
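For reference, that calibration can also be solved numerically with least squares rather than by hand; a sketch under the assumption of an affine model table = A * camera + b and at least four point pairs (all names here are hypothetical):

```python
import numpy as np

def fit_affine(camera_points, table_points):
    """Least-squares fit of table = A @ camera + b from corresponding
    3D points (Nx3 each). Needs at least 4 points, and they must not
    all be coplanar for the fit to be well determined."""
    cam = np.asarray(camera_points, dtype=float)
    tab = np.asarray(table_points, dtype=float)
    # Homogeneous form: [camera | 1] @ M = table, where M is 4x3.
    cam_h = np.hstack([cam, np.ones((len(cam), 1))])
    M, *_ = np.linalg.lstsq(cam_h, tab, rcond=None)
    A, b = M[:3].T, M[3]
    return A, b

# A new marker position then maps via: table_point = A @ camera_point + b
```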
Any help would be greatly appreciated, as I am kind of lost and under some time pressure :-/
Thanks,
David
when I move the marker across the table both the Y and Z values are updated, although all I want updated is the Z value, because the height of the marker doesn't change, only its distance to the camera.
Only true when your camera's view direction is aligned with your Y axis (or Z axis). If the camera is not aligned with Y, it means the transform will apply a rotation around the X axis, hence modifying both the Y and Z coordinates of the marker.
So my guess is that I need a transformation matrix with which I multiply each point, so that I have a new coordinate system that is aligned with the table surface.
Yes it is. After that, you will have 2 transforms:
T_table to express the marker's coordinates in the table frame of reference,
T_camera to express table coordinates in the camera frame of reference.
Finding T_camera from a single 2d image is hard because there's no depth information.
This is known as the pose problem -- it has been studied by, among others, Daniel DeMenthon, who developed a fast and robust algorithm to find the pose of an object:
articles available on his research homepage, section 4 "Model Based Object Pose" (in particular "Model-Based Object Pose in 25 Lines of Code", 1995);
code at the same place, section "POSIT (C and Matlab)".
Note that the OpenCV library offers an implementation of DeMenthon's algorithm. The library also offers a convenient and easy-to-use interface to grab images from a webcam. It's worth a try: OpenCV homepage
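In more recent OpenCV versions the same pose problem can also be solved with cv2.solvePnP from known 3D-2D correspondences; a rough sketch, where the corner positions, pixel locations, and intrinsics are all placeholder values:

```python
import cv2
import numpy as np

# Hypothetical calibration data: table corners in table coordinates
# (metres, Z = 0 on the table plane) and their observed pixel positions.
object_points = np.float32([[0, 0, 0], [0.8, 0, 0], [0.8, 0.6, 0], [0, 0.6, 0]])
image_points = np.float32([[102, 334], [511, 340], [478, 122], [130, 118]])
camera_matrix = np.float32([[800, 0, 320], [0, 800, 240], [0, 0, 1]])
dist_coeffs = np.zeros(5)                 # assume no lens distortion

ok, rvec, tvec = cv2.solvePnP(object_points, image_points,
                              camera_matrix, dist_coeffs)
R, _ = cv2.Rodrigues(rvec)                # rotation: table -> camera

# A marker's camera-space position p_cam then maps to table coordinates via
# p_table = R.T @ (p_cam - tvec.ravel())
```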
If you know the location in the physical world of your four markers and you've recorded the positions as they appear on the camera, you ought to be able to derive some sort of transform.
When you do the calibration, surely you'd want to put the marker at the four corners of the table, not the screen? If you're just doing the corners of the screen, I imagine you're probably not taking into account the slant of the table.
Is the table literally just slanted relative to the camera or is it also rotated at all?