Comparison of Two 3D Models to Determine Orientation Difference - computer-vision

I am working on a project where I am trying to compare a 3D reconstructed model with a predefined 3D model of the same object to find the orientation shift between them. An example of these types of models can be seen here: example models.
I was thinking about using Kabsch's algorithm to compare them, but I'm not sure that will work: the two models don't have the same number of vertices, and I don't know of a good way to make that happen. I also don't know the correspondence information, i.e. which point in set 1 corresponds to which point in set 2.
Regardless, I have the models' PLY files, which contain the coordinates of each vertex, so I'm looking for some way to compare them that will match up the corresponding features in each object. Here is a GitHub repo with both PLY files in case that would be useful.
Thanks in advance for any help, I've been stuck trying to figure out this problem for a while!
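One common workaround for the missing correspondences is iterative closest point (ICP) registration, which estimates the correspondences and the rigid transform together and does not require equal vertex counts. Below is a minimal sketch using the Open3D library (my choice, not something mentioned in the question); the file names, voxel size, and distance threshold are placeholders.

```python
# Hedged sketch: rigid registration of two PLY models with Open3D's ICP
# (recent Open3D, o3d.pipelines namespace). File names and parameters are
# placeholders, not taken from the question.
import numpy as np
import open3d as o3d

source = o3d.io.read_point_cloud("reconstructed.ply")   # reconstructed model
target = o3d.io.read_point_cloud("reference.ply")       # predefined model

# Downsampling makes the two clouds comparable even though their vertex
# counts and densities differ.
src = source.voxel_down_sample(voxel_size=2.0)
tgt = target.voxel_down_sample(voxel_size=2.0)
src.estimate_normals()
tgt.estimate_normals()

# Point-to-plane ICP from an identity initial guess. For large orientation
# differences, a global registration step (e.g. RANSAC over FPFH features)
# is usually run first to provide a rough initial transform.
result = o3d.pipelines.registration.registration_icp(
    src, tgt, 5.0, np.eye(4),
    o3d.pipelines.registration.TransformationEstimationPointToPlane())

print(result.transformation)   # 4x4 matrix: upper-left 3x3 block is the rotation
```

The rotation block of that 4x4 matrix is the orientation shift between the two models; it can be converted to Euler angles or an axis-angle pair as needed.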

Related

Orient two objects to face the same direction (Computer Visualization)

I have two STL models of a scanned skull that are similar but not the same. When they are rendered side by side as actors in a vtkRenderer, they are facing different directions and one has been rotated 180 degrees.
Normally, I would just hard-code in the transformation so that they are both oriented facing the screen, but in this case, there will be lots of similar but different skulls uploaded, all of which might face different directions.
So, can anyone suggest a VTK-specific way to programmatically orient the skulls so they both face the same direction? If not, is there a generally accepted method to do this elsewhere in computer visualization software?
If you know the rotation angles for each skull, I would suggest using that knowledge (e.g., prepare a file with the rotation angles for each model) and rotating them on load.
If not, then you have a real problem. Assuming these skulls are pretty similar, I would suggest trying to align them to each other, so that as a result they face the same direction.
You can achieve that through dedicated software like Geomagic, CloudCompare, or MeshLab, or you can write your own algorithm (e.g., least-squares matching). You can also use a library with alignment algorithms already implemented, such as PCL.
Manual approach: you can use a 3-point alignment method to achieve that. It will be much faster than trying to do it through manual rotations and translations. (How it works)
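For the "write your own algorithm" route, the core of a rigid least-squares fit is a short SVD computation (essentially the Kabsch algorithm). A rough numpy sketch, assuming you already have corresponding point pairs, e.g. from the manual 3-point picking mentioned above:

```python
# Hedged sketch: rigid least-squares fit (Kabsch) between two sets of
# corresponding 3D points, src[i] <-> dst[i]. Both are (N, 3) arrays.
import numpy as np

def rigid_fit(src, dst):
    """Return R (3x3) and t (3,) such that R @ src[i] + t ~= dst[i]."""
    src_c = src - src.mean(axis=0)        # center both point sets
    dst_c = dst - dst.mean(axis=0)
    H = src_c.T @ dst_c                   # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:              # guard against a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t
```

Three well-spread point pairs are already enough to determine the rigid transform uniquely; more pairs simply make the least-squares fit more robust.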

VTK: Aligning two actors

I am experimenting and currently have two objects in an iOS app using the VES/VTK framework, and I can move the vesActors in the scene. What I don't understand is how I can take the position of one object and apply it to a second object. In other words, how do I make two planes parallel (essentially a planar homography) within the VTK framework using actors, mappers, and/or transforms? Are there any examples of this?
If you can pick 3 points on/relative to your planes, you can use vtkLandmarkTransform (http://www.vtk.org/doc/nightly/html/classvtkLandmarkTransform.html), as demonstrated here: http://www.vtk.org/Wiki/VTK/Examples/Cxx/PolyData/AlignFrames
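In Python VTK the same idea looks roughly like the sketch below; the landmark coordinates and the sphere source are placeholders just to keep the example self-contained.

```python
# Hedged sketch: rigid alignment from 3 picked point pairs with
# vtkLandmarkTransform (Python VTK). Coordinates are placeholders.
import vtk

source_pts, target_pts = vtk.vtkPoints(), vtk.vtkPoints()
for p in [(0, 0, 0), (1, 0, 0), (0, 1, 0)]:      # points picked on the first plane
    source_pts.InsertNextPoint(p)
for p in [(0, 0, 1), (0, 1, 1), (-1, 0, 1)]:     # matching points on the second
    target_pts.InsertNextPoint(p)

landmark = vtk.vtkLandmarkTransform()
landmark.SetSourceLandmarks(source_pts)
landmark.SetTargetLandmarks(target_pts)
landmark.SetModeToRigidBody()                    # rotation + translation only
landmark.Update()

# Apply the transform to some polydata (a sphere source stands in here
# for the real actor's data).
sphere = vtk.vtkSphereSource()
xform = vtk.vtkTransformPolyDataFilter()
xform.SetTransform(landmark)
xform.SetInputConnection(sphere.GetOutputPort())
xform.Update()
# Alternatively, hand the transform to the actor: actor.SetUserTransform(landmark)
```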

Object Detection: Training Required or No Training Required?

This question is related to object detection, and basically to detecting any "known" object. For example, imagine I have the objects below.
Table
Bottle
Camera
Car
I will take 4 photos of each of these individual objects: one from the left, another from the right, and the other 2 from above and below. I originally thought it would be possible to recognize these objects with these 4 photos each, because you have photos from all 4 angles; no matter how you see the object, you can detect it.
But I got confused by someone's suggestion of training the engine with thousands of positive and negative images of each object. I really don't think this is required.
So, simply speaking, my question is: in order to identify an object, do I need those thousands of positive and negative images, or are 4 photos from 4 angles enough?
I am expecting to use OpenCV for this.
Update
Actually, the main thing is something like this: imagine that I have 2 laptops, one a Dell and the other an HP. Both are laptops, but they have clearly visible differences, including the logo. Can we do this using feature description? If not, how "hard" is the "training" process? How many pictures are needed?
Update 2
I need to detect "specific" objects, not all cars, all bottles, etc. For example, "Maruti Car Model 123" and "Ferrari Car Model 234" are both cars, but they are different. Imagine I have pictures of the Maruti and Ferrari models mentioned above; then I need to detect them. I don't have to worry about other cars or vehicles, or even about other models of Maruti and Ferrari. But the above-mentioned "Maruti Car Model 123" should be identified as "Maruti Car Model 123", and the above-mentioned "Ferrari Car Model 234" should be identified as "Ferrari Car Model 234". How many pictures do I need for this?
Answers:
If you want to detect a specific object and you don't need to account for viewpoint changes, you can use 2D features:
http://docs.opencv.org/doc/tutorials/features2d/feature_homography/feature_homography.html
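A hedged sketch of that approach, using ORB features as a patent-free stand-in for the SURF features of the linked tutorial; the file names and thresholds are placeholders.

```python
# Hedged sketch: detect a specific, known object in a scene with 2D features
# and a RANSAC homography. File names are placeholders.
import cv2
import numpy as np

obj = cv2.imread("object.jpg", cv2.IMREAD_GRAYSCALE)    # reference photo of the object
scene = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)   # image to search

orb = cv2.ORB_create(2000)
kp1, des1 = orb.detectAndCompute(obj, None)
kp2, des2 = orb.detectAndCompute(scene, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:50]

if len(matches) >= 4:                     # a homography needs at least 4 pairs
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    print("object found" if mask.sum() > 10 else "too few inliers")
```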
To distinguish between 2 logos, you'll probably need to build a detector for each logo which will be trained on a set of images. For example, you can train a Haar cascade classifier.
To distinguish between different models of cars, you'll probably need to train a classifier using training images of each car. However, I encountered an application which does that using a nearest neighbor approach - it just extracts features from the given test image and compares them to a known set of images of different car models.
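As a rough illustration of that nearest neighbor idea (the file names and the distance threshold are placeholders of mine, not from the answer): extract descriptors from the test photo and pick the reference model whose image yields the most good matches.

```python
# Hedged sketch: label a test image by nearest-neighbor descriptor matching
# against one reference photo per known model. Paths are placeholders.
import cv2

references = {"Maruti Model 123": "maruti.jpg", "Ferrari Model 234": "ferrari.jpg"}
orb = cv2.ORB_create()
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

test = cv2.imread("test.jpg", cv2.IMREAD_GRAYSCALE)
_, test_des = orb.detectAndCompute(test, None)

def score(path):
    # Count "good" matches (Hamming distance below a placeholder threshold).
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, des = orb.detectAndCompute(img, None)
    return sum(1 for m in bf.match(test_des, des) if m.distance < 40)

best = max(references, key=lambda label: score(references[label]))
print("closest known model:", best)
```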
Also, I can recommend specific approaches and packages if you explain more about the application.
To answer the question you asked in the title: if you want to be able to determine what the object in the picture is, you need a supervised (i.e., trained) algorithm. Otherwise you would be able to determine, in some cases, the edges or the presence of an object, but not what kind of object it is. In order to tell what the object is, you need a labelled training set.
Regarding the contents of the question, the number of possible angles in a picture of an object is infinite. If you have only four pictures in your training set, the test example could be taken at an angle that falls halfway between training example A and training example B, making it hard for your algorithm to recognize. The larger the training set, the higher the probability of recognizing the object. Be careful: you never reach absolute certainty that your algorithm will recognize the object; it just becomes more likely.

What are `query` and `train` in OpenCV features2D?

Everywhere in the features2D classes I see the terms query and train. For example, matches have trainIdx and queryIdx, and matchers have a train() method.
I know the definitions of the words train and query in English, but I can't understand the meaning of these properties and methods.
P.S. I understand that this is a very silly question, but maybe it's because English is not my native language.
To complete sansuiso's answer: I suppose the reason for choosing these names is that in some applications we have a set of images (training images) beforehand, for example 10 images taken inside your office. The features can be extracted and the feature descriptors computed for these images, and at run-time an image is given to the system to query the trained database; hence that image is called the query image. I really don't like the way these parameters were named. When you have a pair of stereo images and want to match their features, the names don't make much sense, but you have to choose a convention, say, always call the left image the query image and the right image the training image. I did my PhD in computer vision, and some naming conventions in OpenCV seem really confusing/silly to me. So if you find these confusing or silly, you're not alone.
train: this function builds the classifier's inner state in order to make it operational. For example, think of training an SVM, or building a kd-tree from the reference data. Maybe you are confused because this step is often referred to as learning in the literature.
query is the action of finding the nearest neighbors to a set of points, and by extension it also refers to the whole set of points for which you want a nearest neighbor. Recall that you can ask for the neighbors of 1 point, or of a whole lot of points in the same function call (by stacking the feature points in a matrix).
trainIdx and queryIdx refer to the index of a point in the reference / query set respectively, i.e. you ask the matcher for the nearest point (stored at the trainIdx position) to some other point (stored at the queryIdx position). Of course, trainIdx is only known after the function call. If your points are stored in a matrix, the index will be the row of the considered feature.
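A tiny illustration of where those indices point, using OpenCV's brute-force matcher; the image names are placeholders.

```python
# Hedged sketch: queryIdx indexes into the query keypoints, trainIdx into
# the trained (reference) keypoints.
import cv2

query_img = cv2.imread("query.jpg", cv2.IMREAD_GRAYSCALE)   # image you ask about
train_img = cv2.imread("train.jpg", cv2.IMREAD_GRAYSCALE)   # reference / database image

orb = cv2.ORB_create()
kp_q, des_q = orb.detectAndCompute(query_img, None)
kp_t, des_t = orb.detectAndCompute(train_img, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matcher.add([des_t])            # "train" set: the reference descriptors
matcher.train()                 # builds the matcher's internal state
matches = matcher.match(des_q)  # "query": find nearest neighbors for des_q

m = matches[0]
print(kp_q[m.queryIdx].pt,      # keypoint in the query image
      kp_t[m.trainIdx].pt)      # its nearest neighbor in the trained image
```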
I understand "query" and "train" in a very naive but useful way:
"train": a data or image is preprocessed to get a database
"query": an input data or image that will be queried in the database which we trained before.
Hope it helps u as well.

GJK collision detection implementation from 2D to 3D

I apologize for the length of this question, and pre-emptive thanks to anyone who reads through it!
So I've spent the last few days going over the GJK algorithm. I understand the general concepts behind it, and most of the nitty-gritty of its 2D implementation, thanks to the wonderful article by William Bittle at http://www.codezealot.org/archives/88 .
I've implemented his pseudocode (found at the end of the article) in my own C++ project; however, I want to make a 3D implementation. My weak point is using the dot products to test the Voronoi regions and the triple products to get perpendicular directions, but I'm trying to read up more on that.
My problem comes down to the containsOrigin function. I'm having trouble visualizing and accounting for the new Voronoi regions that the z axis adds. I just can't seem to wrap my head around how to determine which region contains the origin. I assume there are 4 I have to account for, each extending from one of the triangular planes that comprise the 4 faces of the tetrahedron simplex. If the origin is not within any of those regions, then it is contained, and we have a collision.
How do I go about testing whether it is contained in a particular Voronoi region, or equivalently, which triangular face is pointing in the direction of the origin?
The current 2D algorithm checks if a triangle has been made; if not, the simplex is a line and it finds the 3rd point. I assume the 3D algorithm will check if a tetrahedron has been made; if not, it will check for a triangle, and if one exists it will find a 4th point to make a tetrahedron (how would I get this? using a normal in the direction of the origin?). If a triangle hasn't been made, it will find a 3rd point to make a triangle (do I still use the triple product for this like in 2D?).
Any suggestions, outlines, resources, code augmentations, or comments are much appreciated.
Depending on what result you expect from the GJK algorithm, you might want to look at this nice tutorial from Molly Rocket: https://mollyrocket.com/849
Be aware, though, that his implementation only outputs a yes/no intersection answer. But it might be a nice start.
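For the tetrahedron case asked about above, the usual shortcut is to test only the three faces that contain the newest support point a: since a was found by searching in the direction of the origin, the origin cannot lie behind the face opposite to a. Below is a hedged numpy sketch of that check (function and variable names are illustrative); a complete implementation would fall through to the triangle and edge cases after dropping a vertex.

```python
# Hedged sketch: tetrahedron step of a 3D GJK containsOrigin check.
import numpy as np

def tetrahedron_case(simplex):
    """simplex = [b, c, d, a], with a added last.
    Returns (True, None, None) if the origin is enclosed, otherwise
    (False, reduced_simplex, new_search_direction)."""
    b, c, d, a = simplex
    ao = -a                          # vector from a toward the origin
    ab, ac, ad = b - a, c - a, d - a

    def outward(p, q, opposite):
        # Face normal spanned by p and q, flipped so it points away from
        # the remaining (opposite) vertex of the tetrahedron.
        n = np.cross(p, q)
        return -n if np.dot(n, opposite) > 0 else n

    n_abc = outward(ab, ac, ad)
    n_acd = outward(ac, ad, ab)
    n_adb = outward(ad, ab, ac)

    # If the origin lies outside one of the three faces containing a,
    # drop the vertex opposite that face and keep searching along its normal.
    if np.dot(n_abc, ao) > 0:
        return False, [b, c, a], n_abc
    if np.dot(n_acd, ao) > 0:
        return False, [c, d, a], n_acd
    if np.dot(n_adb, ao) > 0:
        return False, [d, b, a], n_adb

    # Inside all three faces (and in front of face bcd by construction):
    # the origin is enclosed, so the two shapes intersect.
    return True, None, None
```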