I am using 3 ArUco Markers stuck on a 3D head phantom model to do pose estimation using OpenCV in C++. My algorithm for pose estimation is giving me the translation with respect to the camera, but I want to now know the coordinates of the marker with respect to the model coordinate system. Therefore I have scanned the head model using a 3D scanner and have an object file and the texture file with me. My question is what is the easiest or best way to get the coordinates of the markers with respect to the head model. Should I use OpenGL, blender or some other software for it? Looking for some pointers or advice.
Sounds like you have the coordinates for the markers with respect to the camera as the coordinate system, so coordinates in "eye space" or camera space. Which is when you have coordinates where the camera is at the origin.
This article has a brilliant diagram which explain the different spaces and how to transform in to different spaces:
http://antongerdelan.net/opengl/raycasting.html
If you want these same coordinates but in model space you need the matrices that will get you in to that space.
In this case you are going from eye/camera space -> model space so you need to multiply those coordinates by the inverse view matrix then by the inverse model matrix. Then your coordinate would be in model space.
But this is a lot more difficult when you are using a physical camera, as opposed to a software camera, in OpenGL for example.
To do that you will need to use OpenCV to obtain your camera's intrinsic and extrinsic parameters.
See this tutorial for more details:
https://docs.opencv.org/3.1.0/dc/dbb/tutorial_py_calibration.html
Related
I am interested in finding the Rotation Matrix of an Aruco Marker from a Stereo Camera.
I know that estimateposesinglemarkers gives a Rotation Vector (which can be converted to matrix via Rodrigues)and Translation Vector but the values are not that stable and is supposedly written for MonoCam.
I can get Stable 3D points of the Marker from a Stereo Camera, however i am struggling in creating a Rotation Matrix. My Main goal is to achieve what Ali has achieved in this following blog Relative Position of Aruco Markers.
I have tried working with Euler Angles from here by creating a plane of the Aruco Marker from the 3D points that i get from the Stereo Camera but in vain.
I know my algorithm is failing because the values of the Relative Co-ordinates keeps on changing on moving the camera which should not happen as the Relative Co-ordinates b/w the Markers Should remain Constant.
I have a properly Calibrated camera with all the required matrices.
I tried using SolvePnP, but i believe it gives Rvecs and Tvecs which when combined together brings points from the model coordinate system to the camera coordinate system.
Any idea on how i can create the Rotation Matrix of the Marker with my fairly Stable 3D points so that on moving the camera, the relative Co-ordinates doesn't Change ?
Thanks in Advance.
I've got a picture of a plane with 4 known points on it. I've got the intrinsic and extrinsic camera parameters and also (using the Rodriguez function) the position of the camera. The plane is defined as my ground level (Z = 0). If I select a point in my image, is there an easy way to calculate the coordinates, where this point would be on my plane?
Not much can be labeled as 'easy' when dealing with 3D rendering.
For your question, I would look into ray tracing. I am not going to try to explain it, as most sites will do a better job of explaining it then I can.
When you look at opencv in calib3d module. you will see this equation:
https://docs.opencv.org/master/d9/d0c/group__calib3d.html
Please scroll down the link and see the perspective transformation equations
From what you say, you declare the plane ground level(Z=0). you also know the internsic (focal point in pixels , image center) camera parameter and you know the exterinsic (rotation and translation) camera parameter. and you want to access some pixels in your image (is it?) and from there, you want to estimate where it is on the plane??
You can use triangulatePoints() function in calib3D module of opencv. you need at least 2 images tough.
But your case seems unlikely to me, if you try to detect 4 known points, you will have to define the world coordinate of those plane first, usually, you define top left corner of the plane as original (0,0,0), then, you will know the position of those 4 known points in world coordinate by manual calculation. when you detect it in opencv program, it gives you the pixel coordinates of those 4 points. then, usually, what people expect to calculate is the pose ( rotation and translation ).
Alternatively, if your case is what you said, you can make a simple matrix operation code based on perspective transformation equation.
I'm building an interactive floor. The main idea is to match the detections made with a Xtion camera with objects I draw in a floor projection and have them following the person.
I also detect the projection area on the floor which translates to a polygon. the camera can detect outside the "screen" area.
The problem is that the algorithm detects the the top most part of the person under it using depth data and because of the angle between that point and the camera that point isn't directly above the person's feet.
I know the distance to the floor and the height of the person detected. And I know that the camera is not perpendicular to the floor but I don't know the camera's tilt angle.
My question is how can I project that 3D point onto the polygon on the floor?
I'm hoping someone can point me in the right direction. I've been reading about camera projections but I'm not seeing how to use it in this particular problem.
Thanks in advance
UPDATE:
With the awnser from Diego O.d.L I was able to get an almost perfect detection. I'll write the steps I used for those who might be looking for the same solution (I won't get into much detail on how detection is made):
Step 1 : Calibration
Here I get some color and depth frames from the camera, using openNI, with the projection area cleared.
The projection area is detected on the color frames.
I then convert the detection points to real world coordinates (using OpenNI's CoordinateConverter). With the new real world detection points I look for the plane that better fits them.
Step 2: Detection
I use the detection algorithm to get new person detections and to track them using the depth frames.
These detection points are converted to real world coordinates and projected to the plane previously computed. This corrects the offset between the person's height and the floor.
The points are mapped to screen coordinates using a perspective transform.
Hope this helps. Thank you again for the awnsers.
Work with the camera coordinate system initially. I'm assuming you don't have problems converting from (row,column,distance) to a real world system aligned with the camera axis (x,y,z):
calculate the plane with three or more points (for robustness) with
the camera projection (x,y,z). (choose your favorite algorithm,
i.e
Then Find the projection of your head point to the floor plane
(example)
Finally, you can convert it to the floor coordinate system or just
keep it in the camera system
From the description of your intended application, it is probably more useful for you to recover the image coordinates, I guess.
This type of problems usually benefits from clearly defining the variables.
In this case, you have a head at physical position {x,y,z} and you want the ground projection {x,y,0}. That's trivial, but your camera gives you {u,v,d} (d being depth) and you need to transform that to {x,y,z}.
The easiest solution to find the transform for a given camera positioning may be to simply put known markers on the floor at {0,0,0}, {1,0,0}, {0,1,0} and see where they pop up in your camera.
I'm attempting to animate a skeleton in C++. I currently have the coordinates of all the joints and which joints connect to which.
Does anyone know how I would go about, say, raising the arm up by 90 degrees and then calculate the new coordinates for all the joints further down the arm?
I'm guessing I'd have to get the vectors for each bone and rotate those and take it from there, but I'm not sure how to go about doing that.
(I'm using openGL to display them)
Calculate the point and axis you wish to rotate around. The axis is perpendicular to the rotation plane. The point is the shoulder in your case. You would rotate all points in the arm hierarchy the same amount. There are many examples on the web for rotation about an arbitrary axis. Here's one: http://paulbourke.net/geometry/rotate/
Another way to go about it is to reinterpret your skeleton using local transformations: ie each new bone has a transformation from its parent bone's world space. This is useful for forward kinematics (FK) where you simply pose a skeleton based on local rotations. Most motion capture data is stored in this way. To calculate the world co-ordinates of each joint, you must multiply all local matrices up the hierarchy.
If you currently only have skeleton joint positions in world space it is a pain to generate the local transformation matrices, because you don't necessarily know the local matrix of each joint. This is a bigger topic. I did it years ago when I worked on the motion capture retargetting module in a 3dsmax plugin called CAT.
I worked with two popular formats: BVH and HTR. From memory, BVH uses global positions (and is a pain), whereas HTR uses local joint matrices and is much easier to import.
I am using opencv with C, and I am trying to get the extrinsic parameters (Rotation and translation) between 2 cameras.
I'm told that a checkerboard pattern can be used to calibrate, but I can't find any good samples on this. How do I go about doing this?
edit
The suggestions given are for calibrating a single camera with a checkerboard. How would you find the rotation and translation between 2 cameras given the checkerboard images from both views?
I was using code from http://www.starlino.com/opencv_qt_stereovision.html. It has some useful information and code of the author is pretty easy to understand and analyze, it covers both - chessboard calibrate and getting depth image from stereo cameras. I think it's based on this OpenCV book
opencv library here and about 3 chapters of the opencv book
A picture from a camera is just a projection of a bunch of color samples onto a plane. Assuming that the camera itself creates pictures with square pixels, the possible position of a given pixel is a vector from the camera's origin through the plane the pixel was projected onto. We'll refer to that plane as the picture plane.
One sample doesn't give you that much information. Two samples tells you a little bit more - the position of the camera relative to the plane created by three points: the two sample points and the camera position. And a third sample tells you the relative position of the camera in the world; this will be a single point in space.
If you take the same three samples and find them in another picture taken from a different camera, you will be able to determine the relative position of the cameras from the three samples (and their orientations based on the right and up vectors of the picture plane). To the correct distance, you need to know the distance between the actual sample points. In the case of a checkerboard, it's the physical dimensions of the checkerboard.