Get outliers from PCL sample consensus - c++

I am using the functionality for SampleConsensus provided by PCL as per the example here: http://pointclouds.org/documentation/tutorials/random_sample_consensus.php#random-sample-consensus
The question is, the implementation allows for the retrieval of ransac inliers through the use of getInliers this can then be easily transferred into a point cloud using the common function copyPointCloud(in, inliers, out). I am interested in looking at the outliers. There doesnt seem to be functionality to return a list of outliers. If the inlier list is sorted then I could loop through the point list and check against the current inlier as so:
for i in point cloud
if i == currentInlier
currentInlier++
else
add point cloud (i) to new outlier cloud
But I am not sure that the inlier list is guaranteed to be sorted even though it seems like it would be created that way?
Also surely there is a native way to do this in PCL?

You almost certainly want pcl::ExtractIndices. It is documented here:
http://docs.pointclouds.org/1.7.0/classpcl_1_1_extract_indices.html
You can see how it is used here:
http://pointclouds.org/documentation/tutorials/extract_indices.php#extract-indices
See in particular:
pcl::ExtractIndices<pcl::PointXYZ> extract;
...
extract.setInputCloud (cloud_filtered);
extract.setIndices (inliers);
...
extract.setNegative (true);
extract.filter (*cloud_f);

Related

How to use uncertainties to weight residuals in a Savitzky-Golay filter.

Is there a way to incorporate the uncertainties on my data set into the result of the Savitzky Golay fit? Since I am not passing this information into the function, I asume that it is simply calcuating the 'best fit' via an unweighted least-squares process. I am currently working with data that has non-uniform uncertainty, and so the fit of the data could be improved by including the errors that I have for my main dataset.
The wikipedia page for the Savitzky-Golay filter suggests how I might go about alter the process of calculating the coefficients of the fit, and I am staring at the code for scipy.signal.savgol_filter, but I cannot get my head around what I need to adjust so that this will do what I want it to.
Are there any ready-made weighted SG filters floating about? I find it hard to believe that no-one else has ever needed this tool in Python, but maybe I have missed something.
Check out this Python module: https://github.com/surhudm/savitzky_golay_with_errors
This python script improves upon the traditional Savitzky-Golay filter
by accounting for errors or covariance in the data. The inputs and
arguments are all modelled after scipy.signal.savgol_filter
Matlab function sgolayfilt supports weights. Check the documentation.

Principal component analysis on PCL

I am using PCL with C++ and want to perform PCA on an already clustered pointcloud (that is on every individual cluster). The idea is to eliminate all clusters that are too large/small by measuring their size along the eigenvectors. So intended algorithm is :Get eigenvectors of every cluster, project the cluster points on the corresponding eigenvectors in order to measure maximum distances of points along these dimensions, and from there eliminate all clusters that have "bad" dimensions/ratio of dimensions. I am having difficulties with the implementation. The more help you could spare, the merrier. This is how my data is organized (just to be clear, these are snippets, cluster_indices have already been extracted properly which is checked), and what I started:
std::vector<pcl::PointIndices> cluster_indices;
pcl::PCA<pcl::PointXYZ> cpca = new pcl::PCA<pcl::PointXYZ>;
cpca.setInputCloud(input);
Eigen::Vector3f pca_vector(3,3);
// and now iterating with a for loop over the clusters, but having issues using setIndices already
for (std::vector<pcl::PointIndices>::const_iterator it = cluster_indices.begin (); it != cluster_indices.end (); ++it, n++)
{
cpca.setIndices(it->indices); //not sure what to put in brackets, everything I thought of returns an error!
pca_vector = cpca.getEigenVectors();
}
It seems that there is a general issue with this. So I found this solution:
Instead of iterating with pointers, use a regular iterator, and then use these formulation.
PointIndicesPtr pi_ptr(new PointIndices);
pi_ptr->indices = cluster_indices[i].indices;
//now can use pi_ptr as input

How to detect and delete noise in rapidminer?

I am new in rapid miner 5, just want to know how to find noise in my data and show them in chart and how to delete them?
A complex problem because it depends what you mean by noise.
If you mean finding individual attributes whose values are plain wrong then you could plot a histogram view and work out some sort of limits on what constitutes a valid value. You could then impose that rule by using Filter Examples to remove them.
If you mean finding attributes that have some sort of random jitter applied to them it would be difficult to detect these. Only by knowing beforehand what the expected shape of the distribution is could you compare with observation and do something about it. However, the action to take is by no means obvious.
If you mean finding examples within an example set that are obviously different from other examples then you could consider using the various outlier functions. The simplest one to get started is Detect Outlier (Distances). This finds a set number of outliers (default 10) based on a distance calculation that uses all the attributes for examples. It creates a new attribute called outlier that is set to true or false. You could then use the Filter Examples operator to remove those that are set to true.
Hope that helps at least as a start.

sequential/online kmeans clustering, how does it work? Existing codes?

I'm a little confused about online kmeans clustering. I know that it allows me to cluster with just one data at a time. But,is this all limited to one session? Suppose that I have a bunch of data clustered via this method and I get the clustered data result, would I be able to add more data to the cluster in the future?
I've also been looking for implementations of this code, and to no avail. Anyone know of any?
Update:
To clarify more. Here is how my code works right now:
Image is taken from live video feed, once enough pictures are saved, get kmeans of sift features.
Repeat step 1, a new batch of live feed pictures, get kmeans again. Combine the kmeans vectors with the previous kmeans like :[A B]
You can see that this is bad, because I quickly get too much clusters, and each batch of clusters will definitely have overlaps with another batch.
What I want:
Image taken from live video feed, once pics are saved, get kmeans
Repeat step 1, get kmeans again, which updates and adds new clusters to the previous cluster.
Nothing that I've seen could accommodate that, unless I'm just not understanding them correctly.
If you look at the original (!) publications, the method proposed by MacQueen - where the name k-means comes from - was in fact an online algorithm. I'm not sure if MacQueen did multiple passes over the data to improve the result. I believe he used a single pass, and objects would never be reassigned to a different cluster. If so, it was already an online algorithm!
Means are commonly computed as sum / count. This is not very sensible from a numerical point of view. E.g. in the classic Knuth book you can find a method for incrementally updating means. Wikipedia has it also.
Things get slightly more complicated once you actually want to reassign earlier points. But usually in a streaming context you do not know the previous points, so you cannot do that anyway.

How to visualize generated RNA secondary structure

I'm working on a tool to visualize RNA secondary structure, for this purpose I have implemented Nussinov's algorithm which generates the RNA secondary structure as list with the corresponding indices, the code can be found here [0]
[0] http://dpaste.com/596262/
But I really stuck with understanding how I should visualize it (as a planar graph), the code above gives me a sequential list of the secondary structure, so can someone please suggest me as to how I can visualize the structure.An example of such tool can be found here [1]
[1] http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi
and I know there are better algorithms but for now I would just want to visualize with this and once I understand visualization, I will go for a better algorithm.
Visualizing the secondary structure of RNA (or any graph, for that matter) algorithmically is a difficult problem. You need to take care that there are as few overlaps as possible while maintaining consistent link lengths. As the other answers have pointed out, there are a number of existing implementations that you can already use. I'll just throw in another one that's quite easy to use and requires no downloads:
forna - nibiru.tbi.univie.ac.at/forna
Here you just need to enter a dotbracket string:
>molecule_name
CGCUUCAUAUAAUCCUAAUGAUAUGGUUUGGGAGUUUCUACCAAGAGCCUUAAACUCUUGAUUAUGAAGUG
((((((((((..((((((.........))))))......).((((((.......))))))..)))))))))
This will give you a visualization that looks something like this:
This is computed using a combination of the ViennaRNA RNAplot program and d3's force-directed graph algorithm.
You could do this with jmol . Jmol allows you to add arbitrary bonds / atoms to a coordinate space using its java or I believe its javascript api also.
In general, of course, PDB file formats would be used for such data.
RNAviz is old but still commonly used. JalView apparently was supposed to get RNA secondary structure rendering thru a GSoC project last year, but I'm not sure what the status in the program is.