The default settings for clustering seem to work fine, specifically the EuclideanDistanceFunction. However, I want to run the clustering on spatial data in the form of lng/lat, and when I change the distance function ELKI crashes on me:
Running: -dbc.in /tmp/test_data_lnglat-test.dat -db.index tree.spatial.rstarvariants.deliclu.DeLiCluTreeFactory -algorithm clustering.DeLiClu -algorithm.distancefunction geo.LngLatDistanceFunction -deliclu.minpts 4
Task failed
java.lang.UnsupportedOperationException: MBR to MBR mindist is not yet implemented.
at de.lmu.ifi.dbs.elki.distance.distancefunction.geo.LngLatDistanceFunction.doubleMinDist(Unknown Source)
at de.lmu.ifi.dbs.elki.algorithm.KNNJoin.processDataPagesDouble(Unknown Source)
at de.lmu.ifi.dbs.elki.algorithm.KNNJoin.processDataPagesOptimize(Unknown Source)
at de.lmu.ifi.dbs.elki.algorithm.KNNJoin.initHeaps(Unknown Source)
at de.lmu.ifi.dbs.elki.algorithm.KNNJoin.run(Unknown Source)
at de.lmu.ifi.dbs.elki.algorithm.clustering.DeLiClu.run(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at de.lmu.ifi.dbs.elki.algorithm.AbstractAlgorithm.run(Unknown Source)
at de.lmu.ifi.dbs.elki.workflow.AlgorithmStep.runAlgorithms(Unknown Source)
at […]
It's not clear (to me) what this error means. Is it possible the clustering functions do not work with geospatial data?
Is there a simple workaround for this? Would it be difficult to implement the required function (mindist)?
As the error pretty clearly states:
MBR to MBR mindist is not yet implemented.
The algorithm you tried to use - DeLiClu - needs to compute the minimum distance between two rectangles (MBRs), in geodetic coordinates, not in the 2d plane.
You are welcome to contribute the adequate formulas. Spherical geometry is not trivial, and computing the minimal rectangle-to-rectangle distance on the sphere even less so: it is not sufficient to look at the four corners. So far, we have only solved this for the point-to-rectangle case. It's doable - as the rectangles are axis-aligned - but nobody so far has bothered to sit down and do the math, and then sit down some more and optimize the formulas to require as few trigonometric function evaluations as possible.
The simplest workaround probably is to use OPTICS with a regular R-tree (use bulk loading with STR!) instead of DeLiClu, because this algorithm will yield an almost identical result, but does not need the rectangle-to-rectangle minimum distance. In theory, DeLiClu is faster; in practice this does not necessarily hold, because of the much more complex (and thus harder to optimize) code of KNN joins on R-trees.
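For illustration only - the class and parameter names below are my best guess and may differ between ELKI versions, so check them against your release - the invocation would look roughly like this:
-dbc.in /tmp/test_data_lnglat-test.dat -db.index tree.spatial.rstarvariants.rstar.RStarTreeFactory -spatial.bulkstrategy SortTileRecursiveBulkSplit -algorithm clustering.OPTICS -algorithm.distancefunction geo.LngLatDistanceFunction -optics.minpts 4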
I'm writing to ask about homography and perspective projection.
I'm trying to write a piece of code that will "warp" my image so that its corners align with 4 reference points in 3D space. However, the game engine I'm running it in already lets me get their screen positions, so I already have both sets of screen-space coordinates, (xi, yi) and (ui, vi), normalized to values between 0 and 1.
I have to mention that I don't have a degree in mathematics, which seems to be a requirement in the posts I've seen on this topic so far, but I'm hoping there is actually a solution to this problem that one can comprehend. I never had a chance to take classes in Computer Vision.
The reason I came here is that in all the posts I've seen online, the simplest explanation I came across is that each point must be put into a 1x3 homogeneous vector and multiplied by a 3x3 homography, which consists of 9 components h1, h2, h3, ..., h9, and this transformation matrix will map each point to the correct perspective. And that's where I'm hitting a brick wall: how do I calculate the transformation matrix? It feels like it should be a relatively simple algebraic task, but apparently it's not.
At this point I have spent days reading on the topic, and the solutions I've come across are either based on MATLAB (which has a ton of mathematical functions built in), or include elaborations and discussions that don't really explain much; sometimes they suggest lots of different parameters and simplifications but rarely explain why or what their purpose is, or they reference books and studies that have since been removed from the web. I found myself more confused than I was at the beginning. Most of the resources I managed to find online also come from a different context - image stitching and 3D engine development.
I also want to mention that I need to run this code every frame on the CPU, so I'm fairly concerned about the cost of running many matrix transformations and solving a lot of linear algebra equations.
I apologize for not asking about any specific code, but my general question is - can anyone point me in the right direction with this issue?
Limit the problem you deal with.
For example, if you always warp the entire rectangular image, you can treat the coordinates of the image corners as {(0,0), (1,0), (0,1), (1,1)}.
This simplifies the equations enough that you can solve them by yourself and implement the answer.
Note: a homography is scale invariant, so you can reduce the degrees of freedom to 8 (e.g. solve the equations under the constraint h9 = 1).
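To make that concrete, here is a minimal sketch in plain C++ (no libraries; the function and variable names are mine) of the closed-form square-to-quad homography you get once the source corners are fixed to the unit square and h9 is set to 1. It assumes the four target points are supplied in the cyclic order (0,0) -> p0, (1,0) -> p1, (1,1) -> p2, (0,1) -> p3.

#include <array>
#include <cstdio>

// 3x3 homography stored row-major as h1..h9, with h9 fixed to 1.
using Homography = std::array<double, 9>;

// Maps the unit square (0,0),(1,0),(1,1),(0,1) onto the quad
// (x[0],y[0]) .. (x[3],y[3]), given in that same cyclic order.
Homography unitSquareToQuad(const std::array<double, 4>& x,
                            const std::array<double, 4>& y) {
    const double sx  = x[0] - x[1] + x[2] - x[3];
    const double sy  = y[0] - y[1] + y[2] - y[3];
    const double dx1 = x[1] - x[2], dx2 = x[3] - x[2];
    const double dy1 = y[1] - y[2], dy2 = y[3] - y[2];
    const double den = dx1 * dy2 - dx2 * dy1;      // zero only for degenerate quads

    const double g = (sx * dy2 - dx2 * sy) / den;  // h7
    const double h = (dx1 * sy - sx * dy1) / den;  // h8
    return { x[1] - x[0] + g * x[1],  x[3] - x[0] + h * x[3],  x[0],
             y[1] - y[0] + g * y[1],  y[3] - y[0] + h * y[3],  y[0],
             g,                       h,                       1.0 };
}

// Applies the homography to a point (u,v) of the unit square.
void warp(const Homography& H, double u, double v, double& outX, double& outY) {
    const double w = H[6] * u + H[7] * v + H[8];
    outX = (H[0] * u + H[1] * v + H[2]) / w;
    outY = (H[3] * u + H[4] * v + H[5]) / w;
}

int main() {
    // Hypothetical target points in normalized screen space, cyclic order.
    std::array<double, 4> x = {0.1, 0.9, 0.8, 0.2};
    std::array<double, 4> y = {0.2, 0.1, 0.9, 0.8};
    Homography H = unitSquareToQuad(x, y);
    double px, py;
    warp(H, 1.0, 1.0, px, py);                     // should print (0.8, 0.9)
    std::printf("(1,1) -> (%.3f, %.3f)\n", px, py);
}

There is no linear solve involved, so recomputing this every frame on the CPU is cheap; if you later need arbitrary source points, the usual route is the 8x8 DLT system or a library call such as OpenCV's cv::getPerspectiveTransform.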
Best advice I can give: read a good book on the subject, for example "Multiple View Geometry in Computer Vision" by Hartley and Zisserman.
I have created a point cloud of an irregular (non-planar) complex object using SfM. Each one of those 3D points was viewed in more than one image, so it has multiple (SIFT) features associated with it.
Now, I want to solve for the pose of this object in a new, different set of images using a PnP algorithm matching the features detected in the new images with the features associated with the 3D points in the point cloud.
So my question is: which descriptor do I associate with the 3D point to get the best results?
So far I've come up with a number of possible solutions...
Average all of the descriptors associated with the 3D point (taken from the SfM pipeline) and use that "mean descriptor" to do the matching in PnP. This approach seems a bit far-fetched to me - I don't know enough about feature descriptors (specifically SIFT) to comment on its merits and drawbacks.
"Pin" all of the descriptors calculated during the SfM pipeline to their associated 3D point. During PnP, you would essentially have duplicate points to match against (one duplicate for each descriptor). This is obviously computationally intensive.
Find the "central" viewpoint that the feature appears in (from the SfM pipeline) and use the descriptor from that view for PnP matching. So if the feature appears in images taken at -30, 10, and 40 degrees (from the surface normal), use the descriptor from the 10-degree image. This, to me, seems like the most promising solution.
Is there a standard way of doing this? I haven't been able to find any research or advice online regarding this question, so I'm really just curious if there is a best solution, or if it is dependent on the object/situation.
The descriptors that are used for matching in most SLAM or SFM systems are rotation and scale invariant (and, to some extent, robust to intensity changes). That is why we are able to match them from different viewpoints in the first place. So, in general it doesn't make much sense to try to use them all, average them, or use the ones from a particular image. If the matching in your SFM was done correctly, the descriptors of the reprojection of a 3d point from your point cloud in any of its observations should be very close, so you can use any of them [1].
Also, it seems to me that you are trying to directly match the 2d points to the 3d points. From a computational point of view, I think this is not a very good idea, because by matching 2d points with 3d ones you lose the spatial information of the images and have to search for matches in a brute-force manner. This in turn can introduce noise. But if you do your matching from image to image and then propagate the results to the 3d points, you will be able to enforce priors (if you roughly know where you are, e.g. from an IMU, or if you know that your images are close), you can restrict the neighborhood where you look for matches in your images, etc. Additionally, once you have computed your pose and refined it, you will need to add more points, no? How will you do that if you haven't done any 2d/2d matching, but only 2d/3d matching?
Now, the way to implement that usually depends on your application (how much covisibility or baseline you have between the poses from your SFM, etc.). As an example, let's call your candidate image I_0, and the images from your SFM I_1, ..., I_n. First, match between I_0 and I_1. Now, assume q_0 is a 2d point from I_0 that has successfully been matched to q_1 from I_1, which corresponds to some 3d point Q. To ensure consistency, consider the reprojection of Q in I_2, and call it q_2. Match I_0 and I_2. Does the point to which q_0 is matched in I_2 fall close to q_2? If yes, keep the 2d/3d match between q_0 and Q, and so on.
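As a rough illustration of that bookkeeping, here is a sketch in OpenCV C++. The SfmView struct, the function name and the pose/index fields are assumptions about what your SFM pipeline can provide, not any library's API; only BFMatcher and projectPoints are actual OpenCV calls.

#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/calib3d.hpp>
#include <cmath>
#include <vector>

// Hypothetical container for one reconstructed SFM view: keypoints, SIFT descriptors,
// the pose recovered by SFM, and the index of the 3d point each keypoint observes
// (-1 if the keypoint was not triangulated). Adapt to whatever your pipeline stores.
struct SfmView {
    std::vector<cv::KeyPoint> kpts;
    cv::Mat descriptors;            // CV_32F
    cv::Mat rvec, tvec;             // pose of the view
    std::vector<int> pointIdx;      // keypoint -> index into the point cloud
};

// Builds 2d/3d correspondences for the query image I_0, keeping only those that are
// consistent with a second SFM view I_2, as described above.
void collectChecked2d3d(const std::vector<cv::KeyPoint>& kpts0, const cv::Mat& desc0,
                        const SfmView& view1, const SfmView& view2,
                        const std::vector<cv::Point3f>& cloud, const cv::Mat& K,
                        double maxPixelDist,
                        std::vector<cv::Point2f>& img0Pts,
                        std::vector<cv::Point3f>& objPts) {
    cv::BFMatcher matcher(cv::NORM_L2, /*crossCheck=*/true);
    std::vector<cv::DMatch> m01, m02;
    matcher.match(desc0, view1.descriptors, m01);   // I_0 <-> I_1
    matcher.match(desc0, view2.descriptors, m02);   // I_0 <-> I_2

    // Index the I_0 <-> I_2 matches by their I_0 keypoint for quick lookup.
    std::vector<int> matchInView2(kpts0.size(), -1);
    for (const cv::DMatch& m : m02) matchInView2[m.queryIdx] = m.trainIdx;

    for (const cv::DMatch& m : m01) {
        const int ptIdx = view1.pointIdx[m.trainIdx];   // 3d point Q seen by q_1
        const int idx2  = matchInView2[m.queryIdx];     // where q_0 matched in I_2
        if (ptIdx < 0 || idx2 < 0) continue;

        // q_2: reprojection of Q into I_2; compare it with the I_0 <-> I_2 match.
        std::vector<cv::Point2f> proj;
        cv::projectPoints(std::vector<cv::Point3f>{cloud[ptIdx]},
                          view2.rvec, view2.tvec, K, cv::noArray(), proj);
        const cv::Point2f d = proj[0] - view2.kpts[idx2].pt;
        if (std::hypot(d.x, d.y) < maxPixelDist) {      // consistent: keep q_0 <-> Q
            img0Pts.push_back(kpts0[m.queryIdx].pt);
            objPts.push_back(cloud[ptIdx]);
        }
    }
}

The surviving img0Pts/objPts pairs are what you would then feed to cv::solvePnPRansac.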
I don't have enough information about your data and your application, but I think that depending on your constraints (real-time or not, etc), you could come up with some variation of the above. The key idea anyway is, as I said previously, to try to match from frame to frame and then propagate to the 3d case.
Edit: Thank you for your clarifications in the comments. Here are a few thoughts (feel free to correct me):
Let's consider a SIFT descriptor s_0 from I_0, and let's note F(s_1,...,s_n) your aggregated descriptor (that can be an average or a concatenation of the SIFT descriptors s_i in their corresponding I_i, etc). Then when matching s_0 with F, you will only want to use a subset of the s_i that belong to images that have close viewpoints to I_0 (because of the 30deg problem that you mention, although I think it should be 50deg). That means that you have to attribute a weight to each s_i that depends on the pose of your query I_0. You obviously can't do that when constructing F, so you have to do it when matching. However, you don't have a strong prior on the pose (otherwise, I assume you wouldn't be needing PnP). As a result, you can't really determine this weight. Therefore I think there are two conclusions/options here:
SIFT descriptors are not adapted to the task. You can try coming up with a perspective-invariant descriptor. There is some literature on the subject.
Try to keep some visual information in the form of "Key-frames", as in many SLAM systems. It wouldn't make sense to keep all of your images anyway, just keep a few that are well distributed (pose-wise) in each area, and use those to propagate 2d matches to the 3d case.
If you only match between the 2d points of your query and the 3d descriptors without any form of consistency check (such as the one I proposed earlier), you will introduce a lot of noise...
tl;dr I would keep some images.
[1] Since you say that you obtain your 3d reconstruction from an SFM pipeline, some of the points are probably considered inliers and some outliers (indicated by a boolean flag). If they are outliers, just ignore them; if they are inliers, then they are the result of matching and triangulation, and their position has been refined multiple times, so you can trust any of their descriptors.
I am using OpenCV C++ and am a new user. I am interested in object detection problems. So far I have studied and implemented the use of sparse optical flow (the Lucas-Kanade method) in a video from a stationary camera. After trying k-means and background subtraction, I have decided to move to a more difficult problem: the moving camera.
I have so far studied some documentation and found out that I could use cv::findHomography to find the inliers and outliers across the sequence of frames in my video and then work out from the returned values which movement is caused by camera motion and which by object motion. In addition, I could use SURF features to track some objects and then decide which of them are good points.
However, I was wondering how I could implement this in practice. For example, should I use the first frame as ground truth, detect some features using SURF, and then for the rest of the video use findHomography for each frame? Any ideas/help is welcome!
Detecting moving objects from a moving camera is quite a challenging task and requires a solid understanding of multiple view geometry; besides, there is less information available on this topic (than, for example, on structure from motion), so be warned!
Anyway, a homography matrix will not be a good choice for detecting moving objects (unless you are 100% sure that your background can be represented accurately enough by a flat surface). You should probably use a fundamental matrix or a trifocal tensor.
The fundamental matrix is computed from point correspondences between 2 frames. It associates points in one image with lines in the other image (so-called epipolar lines), and this way it is independent of the scene structure. After you have obtained the F matrix using some robust estimation method, like RANSAC or LMedS (RANSAC seems to be better for this kind of task), you can calculate the reprojection error for each point. Objects that move independently of the scene will not be described accurately by the F matrix and will have a bigger error. So, outliers of the F matrix calculated from image matches over two frames can be considered moving objects. One note though: objects that move along epipolar lines will not be detected by this approach, since their parallax can also be explained by some depth level.
The trifocal tensor does not have the depth/motion ambiguity for objects that move along epipolar lines, but it is harder to estimate and it is not included in OpenCV. It can be calculated from correspondences over 3 frames, and its usage can conceptually be described as triangulating a point from 2 views and then calculating the reprojection error in the third view.
As for the matching: I still think that LK tracking will be better than SURF matching if you work with video sequences, since in that case you don't need to consider very distant points as matches, and tracking is usually faster than detection + matching.
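Putting the two previous points together, here is a rough two-frame sketch in OpenCV C++ (the function name and threshold values are mine, chosen for illustration): track corners with LK, robustly estimate F, and treat the RANSAC outliers as candidate moving points.

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/video/tracking.hpp>
#include <opencv2/calib3d.hpp>
#include <vector>

// Flags candidate moving points between two consecutive frames of a moving camera:
// track corners with LK, robustly fit a fundamental matrix, and treat RANSAC
// outliers as points that do not obey the camera's epipolar geometry.
std::vector<cv::Point2f> findMovingPointCandidates(const cv::Mat& prevGray,
                                                   const cv::Mat& currGray) {
    std::vector<cv::Point2f> prevPts, currPts;
    cv::goodFeaturesToTrack(prevGray, prevPts, 1000, 0.01, 8);

    std::vector<unsigned char> status;
    std::vector<float> err;
    cv::calcOpticalFlowPyrLK(prevGray, currGray, prevPts, currPts, status, err);

    // Keep only successfully tracked points.
    std::vector<cv::Point2f> p0, p1;
    for (std::size_t i = 0; i < status.size(); ++i)
        if (status[i]) { p0.push_back(prevPts[i]); p1.push_back(currPts[i]); }

    std::vector<cv::Point2f> moving;
    if (p0.size() < 8) return moving;   // need at least 8 correspondences for F

    std::vector<unsigned char> inlierMask;
    cv::Mat F = cv::findFundamentalMat(p0, p1, cv::FM_RANSAC, 1.0, 0.99, inlierMask);
    if (F.empty()) return moving;

    for (std::size_t i = 0; i < p1.size(); ++i)
        if (!inlierMask[i]) moving.push_back(p1[i]);   // epipolar outlier -> candidate
    return moving;
}

In practice you would accumulate this evidence over several frames (or move to the trifocal tensor) before declaring a point part of a moving object, since single-frame outliers also include plain tracking failures.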
I have a matrix (a vector of vectors) with several points (measurements from sensors) that are supposed to represent walls. All the walls are parallel or perpendicular to each other.
I want to fit these points to the respective walls. I thought of using RANSAC, but I can't find an easy way to run it on the matrix in C++ without having to pull in a framework with visualization code, like the Point Cloud Library.
Do I have to write my own RANSAC or does this exist?
You may try the RANSAC routines in the OpenCV library. If they are not enough, take the code (it is open source) and modify it according to the details of your problem.
Or you may add some pictures here so the details of your issue are easier to understand.
The Point Cloud Library has a RANSAC implementation for 3D. You can use it in your own application; it can identify planes too.
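If you do decide to write your own, a bare-bones 2D line RANSAC is quite short. Here is a sketch in plain C++ (types and names are mine) that you could adapt to your point matrix:

#include <cmath>
#include <cstdlib>
#include <vector>

struct Point2 { double x, y; };
struct Line   { double a, b, c; };   // a*x + b*y + c = 0, with (a, b) normalized

// Minimal 2D line RANSAC over a vector of points, no external libraries.
// Returns the line supported by the most inliers within 'tol';
// 'inliers' receives the indices of those points.
Line ransacLine(const std::vector<Point2>& pts, double tol, int iterations,
                std::vector<std::size_t>& inliers) {
    Line best{0, 0, 0};
    inliers.clear();
    if (pts.size() < 2) return best;

    for (int it = 0; it < iterations; ++it) {
        // 1. Sample two distinct points and build the candidate line through them.
        std::size_t i = std::rand() % pts.size();
        std::size_t j = std::rand() % pts.size();
        if (i == j) continue;
        double a = pts[j].y - pts[i].y;
        double b = pts[i].x - pts[j].x;
        const double n = std::hypot(a, b);
        if (n == 0.0) continue;
        a /= n; b /= n;
        const double c = -(a * pts[i].x + b * pts[i].y);

        // 2. Count inliers: points within 'tol' of the candidate line.
        std::vector<std::size_t> current;
        for (std::size_t k = 0; k < pts.size(); ++k)
            if (std::fabs(a * pts[k].x + b * pts[k].y + c) < tol)
                current.push_back(k);

        // 3. Keep the candidate with the largest consensus set.
        if (current.size() > inliers.size()) {
            best = {a, b, c};
            inliers = current;
        }
    }
    return best;
}

To extract several walls, run it repeatedly, removing the inliers of each detected line before the next pass; the parallel/perpendicular constraint can then be enforced by snapping each line's normal to the dominant direction.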
I am wondering whether the PCA widget performs centred and/or normalized PCA. I didn't find any corresponding option in the widget.
Does anyone know the answer, and whether there is a plan to add these options?
Thanks a lot. Best regards,
Yes, the 'PCA' widget both centers and standardizes the data.
Usually, PCA is defined to operate on standardized data.
The use of the term PCA for performing SVD on other matrices (such as non-central correlation matrices) is much less common - because there is a proper term for that, too: SVD.
Furthermore, using it on non-centered data means that the transformation is sensitive to data translation; an aspect one usually does not want.
So unless explicitly specified, you should assume that PCA is implemented by:
1. Centering the data
2. Computing the covariance matrix
3. Applying SVD on the covariance matrix to decompose it as desired
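For concreteness, here is a minimal sketch of those three steps using OpenCV matrices (the function name is mine; any linear-algebra library works the same way):

#include <opencv2/core.hpp>

// PCA exactly as described above: center the data, form the covariance matrix,
// and decompose it with SVD. 'data' holds one observation per row (CV_64F).
void pcaBySvd(const cv::Mat& data, cv::Mat& components, cv::Mat& variances) {
    // 1. Center the data: subtract the per-column mean from every row.
    cv::Mat mean;
    cv::reduce(data, mean, 0, cv::REDUCE_AVG);                 // 1 x d row of means
    cv::Mat centered = data - cv::repeat(mean, data.rows, 1);

    // 2. Covariance matrix (d x d).
    cv::Mat cov = (centered.t() * centered) / double(data.rows - 1);

    // 3. SVD of the symmetric covariance matrix: the columns of U are the
    //    principal axes, w holds the variance explained by each of them.
    cv::SVD svd(cov);
    components = svd.u.t();    // one principal component per row
    variances  = svd.w;        // eigenvalues of the covariance matrix
}

For standardized PCA, as the widget does, additionally divide each centered column by its standard deviation before step 2; the covariance matrix then becomes a correlation matrix.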