I have been looking for methods to register (align) organized point clouds with normal information.
I could only find generic point cloud registration methods (for example in PCL).
I am using Microsoft Kinect to get my point clouds, but the problem is that they are quite big.
What I would like to know:
Are there fast ways to register organized point clouds?
Are there down-sampling methods that are very fast (and that perhaps exploit the fact that the point clouds are organized)?
I was also thinking about using OpenCV filters, since an organized point cloud can be thought of as an image with grey values (a 2D matrix of depth values). For example, using the OpenCV resize method on the matrix, and some derivative-type filters (because edges are important for me in the scene). Is that a good idea?
Also, down-sampling looks like a data-parallel problem, which could be a great candidate for GPU implementation. Do you know about any such implementation?
What I have done so far is the following.
- Several down-sampling methods (random, voxel-based, uniform), but the problem is that they all took a lot of time (in PCL). The voxel-based one worked best.
- Then I ran ICP on the down-sampled point clouds, which was fast and accurate enough for me.
So for me, currently, a good solution would be a fast way of down-sampling my point clouds. For example a GPU-based implementation for it.
Thinking of an organized point cloud as an image with grey values (a simple 2D matrix) turns out to be a good idea.
Down-sampling methods for 2D matrices implemented on the GPU are available in, for example, OpenCV's CUDA module.
Also, it is easy to implement your own fast down-sampling on 2D matrices, depending on how much accuracy matters. For example, simply take every kth element. If needed, you can average around those elements to blur, or apply derivative-type filters to sharpen (edge enhancement). You can also come up with special picking schemes based on what you know about the frames (e.g. if your objects tend to be in the center, pick more points around that area).
All three of the above will give faster results and are probably more tuned to your problem (especially the third). "More tuned" also implies less robust.
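As a rough sketch of the "take every kth element" idea, assuming the organized cloud's depth channel is already in a CV_32F cv::Mat (getting it out of the PCL cloud is not shown here):

    // Minimal sketch: treat the organized cloud's depth as a 2D matrix and
    // down-sample it by taking every k-th row/column, or via cv::resize.
    #include <opencv2/opencv.hpp>

    cv::Mat downsampleDepth(const cv::Mat& depth, int k)   // depth: CV_32FC1
    {
        // Option 1: plain decimation - keep every k-th element.
        cv::Mat decimated(depth.rows / k, depth.cols / k, depth.type());
        for (int r = 0; r < decimated.rows; ++r)
            for (int c = 0; c < decimated.cols; ++c)
                decimated.at<float>(r, c) = depth.at<float>(r * k, c * k);

        // Option 2: let OpenCV do it. INTER_NEAREST avoids mixing depths
        // across edges; INTER_AREA would average instead.
        cv::Mat resized;
        cv::resize(depth, resized, decimated.size(), 0, 0, cv::INTER_NEAREST);

        return resized;   // or 'decimated', whichever variant you prefer
    }

If I remember correctly, the same operation is available on the GPU (cv::gpu::resize or cv::cuda::resize on a GpuMat, depending on your OpenCV version), so the approach ports over directly.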
Related
I'm working in C++ with large voxel grids in a scientific context and I'm trying to decide which library to use. Only a fraction of the voxel grid holds values - but possibly several per voxel (e.g. a struct), which are determined by ray tracing. I'm not trying to render anything, but I have to determine the potential number of rays passing through the entire target area, so an awful lot of ray-box intersections will have to be calculated, preferably very fast...
So far, I found
OpenVDB http://www.openvdb.org/
Field3d http://sites.google.com/site/field3d/
The latter appeals a bit more, because it seems simpler/easier to use.
My question is: which of them would be better suited for tasks that are not aimed at rendering/visualization? Which one is faster/better when computing a lot of ray-box intersections (no viewpoint-dependent culling possible)? Suggestions, anyone?
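For context, the per-ray/per-box primitive that has to run that many times is the standard slab test; a minimal stand-alone sketch (independent of either library):

    // Standard slab test for ray vs. axis-aligned box - the primitive that
    // gets evaluated an "awful lot" of times, whichever library manages the
    // sparse voxel grid.
    #include <algorithm>

    struct Vec3 { double x, y, z; };

    bool rayIntersectsBox(const Vec3& origin, const Vec3& invDir,   // invDir = 1/direction
                          const Vec3& boxMin, const Vec3& boxMax,
                          double& tNear, double& tFar)
    {
        double t1 = (boxMin.x - origin.x) * invDir.x;
        double t2 = (boxMax.x - origin.x) * invDir.x;
        tNear = std::min(t1, t2);
        tFar  = std::max(t1, t2);

        t1 = (boxMin.y - origin.y) * invDir.y;
        t2 = (boxMax.y - origin.y) * invDir.y;
        tNear = std::max(tNear, std::min(t1, t2));
        tFar  = std::min(tFar,  std::max(t1, t2));

        t1 = (boxMin.z - origin.z) * invDir.z;
        t2 = (boxMax.z - origin.z) * invDir.z;
        tNear = std::max(tNear, std::min(t1, t2));
        tFar  = std::min(tFar,  std::max(t1, t2));

        return tFar >= std::max(tNear, 0.0);   // hit if the slab intervals overlap in front of the ray
    }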
In any case, I want to use an existing C++ library and not write a kd-tree/octree etc. myself. I don't have the time to reinvent the wheel.
I would advise
OpenSceneGraph
Ogre3D
VTK
I have personally used the first two. However, VTK is also a popular alternative. All three of them support voxel based rendering.
I'm using the Kinect and the OpenNI library to track a user's hands.
As far as I can see, there are two ways to do this; either using the HandsGenerator and tracking each hand separately, or using UserGenerator, and then asking for the hand positions using GetSkeletonJoint and XN_SKEL_LEFT_HAND/XN_SKEL_RIGHT_HAND.
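For concreteness, the UserGenerator route looks roughly like this (a sketch assuming OpenNI 1.x, with the context already running and skeleton tracking already started for the user; GetSkeletonJointPosition is the position-only variant; error handling omitted):

    #include <XnCppWrapper.h>

    // Rough sketch of querying the left hand via the skeleton capability.
    XnSkeletonJointPosition leftHand(xn::UserGenerator& userGen, XnUserID user)
    {
        XnSkeletonJointPosition joint;
        userGen.GetSkeletonCap().GetSkeletonJointPosition(user, XN_SKEL_LEFT_HAND, joint);

        // joint.position holds X/Y/Z; joint.fConfidence is 0..1.  The jitter is
        // usually worst when confidence is low, so low-confidence samples can be
        // dropped or filtered before use.
        return joint;
    }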
For various reasons, it'd be much more convenient if I could just use the UserGenerator, but the coordinates it gives me for the two hands are extremely jittery to the point of being unusable, even when setting a high smoothing value. In comparison, the coordinates given by HandsGenerator are very precise and stable.
How come the precision of the two methods is so different, and is there anything I can do to improve the precision of the coordinates given by the UserGenerator method?
I'm working on the analysis of a particle's trajectory in a 2D plane. This trajectory typically consists of 5 to 50 (in rare cases more) points (discrete integer coordinates). I have already matched the points of my dataset to form a trajectory (thus I have time resolution).
I'd like to perform some analysis on the curvature of this trajectory; unfortunately the analysis framework I'm using has no support for fitting a trajectory. From what I've heard, one can use splines/Bézier curves to get this done, but I'd like your opinion and/or suggestions on what to use.
As this is only an optional part of my work I can not invest a vast amount of time for implementing a solution on my own or understanding a complex framework. The solution has to be as simple as possible.
Let me specify the features I need from a possible library:
- create trajectory from varying number of points
- as the points are discrete it should interpolate their position; no need for exact matches for all points as long as the resulting distance between trajectory and point is less than a threshold
- it is essential that the library can yield the derivative of the trajectory for any given point
- it would be beneficial if the library could report a quality level (like chiSquare for fits) of the interpolation
EDIT: After reading the comments I'd like to add some more:
It is not necessary that the trajectory exactly matches the points. The points are created from values of a pixel matrix and thus they form a discrete matrix of coordinates with a space resolution limited by the number of pixel per given distance. Therefore the points (which are placed at the center of the firing pixel) do not (exactly) match the actual trajectory of the particle. Either interpolation or fit is fine for me as long as the solution can cope with a trajectory which may/most probably will be neither bijective nor injective.
Thus most traditional fit approaches (like fitting polynomials or exponential functions with least squares) can't fulfil my criteria.
Additionally, all traditional fit approaches I have tried yield a function which seems to describe the trajectory quite well, but when looking at its first derivative (or at higher resolution) one finds numerous "micro-oscillations" which (from my interpretation) result from fitting non-straight functions to (nearly) straight parts of the trajectory.
Edit2: There has been some discussion in the comments about what those trajectories may look like. Essentially they may have any shape, length and "curliness", although I try to exclude trajectories which overlap or cross in the previous steps. I have included two examples below; ignore the colored boxes, they're just a representation of the values of the raw pixel matrix. The black, circular dots are the points which I'd like to match to a trajectory; as you can see, they are always centered on the pixels and therefore can have only discrete (integer) values.
Thanks in advance for any help & contribution!
This MIGHT be the way to go
http://alglib.codeplex.com/
From your description I would say that a parametric spline interpolation may suit your requirements. I have not used the above library myself, but it does have support for spline interpolation. Using an interpolant means you will not have to worry about goodness of fit - the curve will pass through every point that you give it.
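I have not verified ALGLIB's exact function names, so here is just a hand-rolled sketch of the idea itself - a parametric (here Catmull-Rom) spline through the points, interpolating x and y separately against a parameter t, which also gives the derivative analytically at any t:

    // Hand-rolled parametric Catmull-Rom spline - NOT ALGLIB's API, just an
    // illustration.  The curve passes through every input point, and because
    // x and y are interpolated against t, the trajectory does not need to be
    // a function y = f(x).
    #include <vector>
    #include <cstddef>

    struct Pt { double x, y; };

    // Position and derivative of the segment between p1 and p2 (p0/p3 are the
    // neighbouring support points) at local parameter u in [0,1].
    static void catmullRom(const Pt& p0, const Pt& p1, const Pt& p2, const Pt& p3,
                           double u, Pt& pos, Pt& deriv)
    {
        const double u2 = u * u, u3 = u2 * u;
        auto eval = [&](double a, double b, double c, double d, double& f, double& df) {
            f  = 0.5 * ((2*b) + (-a + c)*u + (2*a - 5*b + 4*c - d)*u2 + (-a + 3*b - 3*c + d)*u3);
            df = 0.5 * ((-a + c) + 2*(2*a - 5*b + 4*c - d)*u + 3*(-a + 3*b - 3*c + d)*u2);
        };
        eval(p0.x, p1.x, p2.x, p3.x, pos.x, deriv.x);
        eval(p0.y, p1.y, p2.y, p3.y, pos.y, deriv.y);
    }

    // Evaluate the whole trajectory at global parameter t in [0, n-1].
    // Assumes p.size() >= 2; endpoints are clamped.
    void evalTrajectory(const std::vector<Pt>& p, double t, Pt& pos, Pt& deriv)
    {
        const std::size_t n = p.size();
        std::size_t i = static_cast<std::size_t>(t);
        if (i >= n - 1) i = n - 2;
        const double u = t - static_cast<double>(i);
        const Pt& p0 = p[i == 0 ? 0 : i - 1];
        const Pt& p3 = p[i + 2 < n ? i + 2 : n - 1];
        catmullRom(p0, p[i], p[i + 1], p3, u, pos, deriv);
    }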
If you don't mind using matrix libraries, linear least squares is the easiest solution (look at the end of the General Problem section for the equation to use). You can also use linear/polynomial regression to solve something like this.
Linear least squares will always give the best solution, but it doesn't scale well, because the matrix operations get moderately expensive. Regression is an iterative heuristic method, so you can just run it until you have a "sufficiently good" answer. I've seen guidelines putting the cutoff at about 1000-10000 dimensions in your data. So, with your data set, I'd recommend linear least squares, unless you decide to make it highly dimensional for some reason.
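Here is a rough sketch of that route using Eigen (an assumption on my part - any matrix library works); x and y are each fit as a polynomial in the point index t, which sidesteps the "not a function of x" issue:

    // Minimal least-squares sketch using Eigen (assumed available).
    #include <Eigen/Dense>
    #include <vector>

    // Coefficients c0..cd of the degree-d polynomial best fitting (t_i, v_i)
    // in the least-squares sense.
    Eigen::VectorXd polyFit(const std::vector<double>& t,
                            const std::vector<double>& v, int degree)
    {
        const int n = static_cast<int>(t.size());
        Eigen::MatrixXd A(n, degree + 1);          // Vandermonde matrix
        Eigen::VectorXd b(n);
        for (int i = 0; i < n; ++i)
        {
            double p = 1.0;
            for (int j = 0; j <= degree; ++j) { A(i, j) = p; p *= t[i]; }
            b(i) = v[i];
        }
        // QR is numerically safer than forming the normal equations A^T A.
        return A.colPivHouseholderQr().solve(b);
    }

The derivative at any t then falls out of the fitted coefficients directly.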
Is there a pathfinding algorithm also suited for real 3D environments e.g. real Buildings with multiple stairs etc. A C++ library or open implementation would be splendid ;-)
One solution I saw was Dijkstra, but I wonder whether there is something more optimal.
Plain A* would not work better than Dijkstra here, since the distance heuristic does not work well (e.g. a position one floor above the destination).
Another solution that I'm currently pondering is mapping the 3D environment onto a 2D graph. So if there is some available C++ implementation/library going this way, that would be helpful too.
If the path has to take into account the ability to navigate among obstacles (i.e. the movement is that of some entity with known volume in space), then I'd recommend looking into the literature on robot motion planning. The notion of a configuration space allows you to handle changes in pose in order to deal with obstacles. See the classic textbook by Jean-Claude Latombe.
For simpler scenarios, you can probably make do with the path-planning algorithms used in first-person computer games, which are similar to Dijkstra and A* (example).
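To show how little code plain A* needs on a voxelised building, here is a rough sketch (not an off-the-shelf library): a 6-connected occupancy grid with a straight-line heuristic, where stairs are simply free cells connecting the floors.

    #include <vector>
    #include <queue>
    #include <array>
    #include <cmath>
    #include <utility>

    struct Cell { int x, y, z; };

    // A* over a 6-connected 3D occupancy grid; 'blocked' marks walls/floors.
    std::vector<Cell> aStar(const std::vector<std::vector<std::vector<bool>>>& blocked,
                            Cell start, Cell goal)
    {
        const int X = blocked.size(), Y = blocked[0].size(), Z = blocked[0][0].size();
        auto idx = [&](const Cell& c) { return (c.x * Y + c.y) * Z + c.z; };
        auto h = [&](const Cell& c) {                        // admissible straight-line heuristic
            return std::sqrt(double((c.x-goal.x)*(c.x-goal.x) +
                                    (c.y-goal.y)*(c.y-goal.y) +
                                    (c.z-goal.z)*(c.z-goal.z)));
        };

        std::vector<double> g(X * Y * Z, 1e30);
        std::vector<int> parent(X * Y * Z, -1);
        using Node = std::pair<double, int>;                 // (f = g + h, flat index)
        std::priority_queue<Node, std::vector<Node>, std::greater<Node>> open;

        g[idx(start)] = 0.0;
        open.push({h(start), idx(start)});

        const std::array<Cell, 6> moves{{{1,0,0},{-1,0,0},{0,1,0},{0,-1,0},{0,0,1},{0,0,-1}}};
        while (!open.empty())
        {
            auto [f, cur] = open.top(); open.pop();
            Cell c{cur / (Y * Z), (cur / Z) % Y, cur % Z};
            if (c.x == goal.x && c.y == goal.y && c.z == goal.z) break;
            if (f > g[cur] + h(c) + 1e-9) continue;          // stale queue entry

            for (const Cell& m : moves)
            {
                Cell n{c.x + m.x, c.y + m.y, c.z + m.z};
                if (n.x < 0 || n.y < 0 || n.z < 0 || n.x >= X || n.y >= Y || n.z >= Z) continue;
                if (blocked[n.x][n.y][n.z]) continue;
                const double ng = g[cur] + 1.0;              // unit edge cost
                if (ng < g[idx(n)])
                {
                    g[idx(n)] = ng;
                    parent[idx(n)] = cur;
                    open.push({ng + h(n), idx(n)});
                }
            }
        }

        if (g[idx(goal)] >= 1e29) return {};                 // goal unreachable
        std::vector<Cell> path;                              // built goal -> start; reverse if needed
        for (int i = idx(goal); i != -1; i = parent[i])
            path.push_back({i / (Y * Z), (i / Z) % Y, i % Z});
        return path;
    }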
For an approximation algorithm you can easily map the 3D space to a 1D curve and traverse an octree with a Gray code. That way you can reorder each path. I don't know if there is a guarantee of staying close to the optimal solution, but it should be better than any heuristic method.
I thought about:
1) Implement everything for the b/w images, then write wrappers for the methods that check whether the image is colour. If it is, split the channels, run the operation on each individually, and then merge them (a rough sketch follows after this list).
2) Use functors to correctly update the values depending on what I'm dealing with. The problem is that the compiler errors would be really complicated and I'm not used to them, and I think I may end up needing quite a few functors. Not sure if this is a good idea, to be honest.
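Roughly what I mean by 1), assuming OpenCV's cv::Mat as the image type:

    // Sketch of option 1): run a greyscale-only operation on a colour image
    // by splitting it into channels, processing each, and merging the results.
    #include <opencv2/opencv.hpp>
    #include <functional>
    #include <vector>

    cv::Mat applyPerChannel(const cv::Mat& img,
                            const std::function<cv::Mat(const cv::Mat&)>& greyOp)
    {
        if (img.channels() == 1)
            return greyOp(img);                    // already b/w: no wrapping needed

        std::vector<cv::Mat> channels;
        cv::split(img, channels);                  // one single-channel Mat per channel
        for (cv::Mat& ch : channels)
            ch = greyOp(ch);

        cv::Mat out;
        cv::merge(channels, out);
        return out;
    }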
There might be a suitable design pattern here that I'm not seeing. There could also be a channel/colour-agnostic way to do this in OpenCV, though I haven't found it yet, and so far the book I'm reading (OpenCV 2 Computer Vision Application Programming Cookbook) hasn't shown such a possibility.
If speed is important, Don't.
It sounds like you're trying to encapsulate or abstract away the type of pixel using OO techniques or the like. This could add an extra level of indirection for every pixel access, killing your performance.
Calling a function directly, rather than through a pointer (e.g. a delegate, overridden method, or functor), can still be faster for the CPU, but if you're making per-pixel function calls at all, reconsider; they are still extra work. If you can nest everything inside the outer for loop it will look ugly and functional-programming snobs will sneer at you, but remember: this isn't a big LOB app that will become hard to maintain. That's why engineers can still perfectly maintain 30-year-old QuickBASIC code - the problem space doesn't need anything smarter (though usually their problems themselves need something a lot smarter than I!)
It's best to implement simple things (e.g. a threshold op or resizing) optimized for each kind of image if you want speed. You can also look into transformation matrices and see if you can express your work that way. Then you only write the two transform routines (b/w and colour) and, using a similar (or the same) matrix, do the same thing for both types of picture.
Hence you accomplish a major goal of abstraction anyway: seamless reuse and separation of concerns. And speed to boot (but hopefully not reboot!). Good luck.
Splitting the channels could work well with algorithms that work with the channels independently; not all of them do, so this will be quite limiting. You'll also spend a bit of time and space making all those copies.
By functors I presume you mean making templates out of your algorithm functions, with a pixel type as the template parameter. That could work also, but it means defining your basic pixel operations in a way that they could be implemented as functions or operators on a generic pixel type. This is harder than it looks and should be done after you've had some experience in implementing the algorithms.
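As a rough illustration of that template approach (the pixel types and the scale helper below are made up for the example, not OpenCV types):

    // The algorithm is written once against a generic pixel type; Gray and RGB
    // only need to provide the operations the algorithm actually uses.
    #include <vector>
    #include <cstdint>

    struct Gray { std::uint8_t v; };
    struct RGB  { std::uint8_t r, g, b; };

    inline Gray scale(Gray p, float f) { return { static_cast<std::uint8_t>(p.v * f) }; }
    inline RGB  scale(RGB  p, float f) { return { static_cast<std::uint8_t>(p.r * f),
                                                  static_cast<std::uint8_t>(p.g * f),
                                                  static_cast<std::uint8_t>(p.b * f) }; }

    // One implementation of "darken" serves both image types.
    template <typename Pixel>
    void darken(std::vector<Pixel>& pixels, float factor)
    {
        for (Pixel& p : pixels)
            p = scale(p, factor);
    }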
A third option not mentioned is to promote the b/w images to full color, process them, and convert back to b/w. This optimizes the full color processing at the expense of the b/w.
For most algorithms it is not necessary to worry about monochrome vs. colour images. You either use the grey value of the monochrome image or you calculate the luminance/intensity/whatever of the colour and use that. You choose the measure luminance etc. by looking at which colour space will give you the result you want.
When you have calculated how you are going to modify your images you use some pixel aware processing, e.g. blending two pixels might be pixel_a*0.5 + pixel_b*0.5, your pixel class will sort out how to apply that to the different colour channels, i.e. Pixel::operator+(const Pixel &), Pixel::operator*(float) and so on.
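For illustration, a minimal Pixel along those lines (a sketch; the field names are just for the example):

    // Colour-aware Pixel with the operators mentioned above; the blend then
    // reads the same whether the caller thinks in colour or not.
    struct Pixel
    {
        float r, g, b;

        Pixel operator+(const Pixel& o) const { return { r + o.r, g + o.g, b + o.b }; }
        Pixel operator*(float s)        const { return { r * s,   g * s,   b * s   }; }
    };

    inline Pixel blend(const Pixel& a, const Pixel& b)
    {
        return a * 0.5f + b * 0.5f;   // the class decides how this hits each channel
    }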
There are algorithms that are applied individually to each colour channel but they are not as common and often there is some correlation between the spatiotemporal changes in the colours so you wouldn't do something as basic as process each channel totally independently of each other.
My own Image class uses a planar structure (that is, color channels are separate) instead of an interleaved structure. However this is VERY limiting when it comes to image quantization and other joint color processing tasks.
I am planning to rewrite it to use the other approach, i.e. simply a two-dimensional array of pixels. At the moment I am not sure exactly how I will implement it (a template pixel class, a Pixel base class, or a simple three-dimensional array).
I also plan to write a planar wrapper for this interleaved image structure to ease any disadvantages I might encounter. One thing is sure: this wrapper will be much more efficient than a pixel wrapper would be for planar images.
Frankly, I believe splitting into planes is rather inefficient, since you pay various overheads several times. For example, if you want to resize an image, computing the various filter coefficients is very expensive, and it would be MUCH better to compute them once and apply Pixel::operator* and + instead of doing the same with the underlying subpixel components separately.
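A rough sketch of that point, with a placeholder Pixel type and no bounds handling: the filter weights are computed once per output sample and then applied to whole pixels, instead of being recomputed for every colour plane.

    #include <vector>
    #include <cstddef>

    struct Pixel
    {
        float r, g, b;
        Pixel operator+(const Pixel& o) const { return { r + o.r, g + o.g, b + o.b }; }
        Pixel operator*(float s)        const { return { r * s,   g * s,   b * s   }; }
    };

    // Apply a precomputed 1-D kernel at 'center' of an interleaved row:
    // one pass touches all channels, the weights are reused as-is.
    Pixel filterAt(const std::vector<Pixel>& row, std::size_t center,
                   const std::vector<float>& weights)
    {
        Pixel acc{0.f, 0.f, 0.f};
        const std::size_t half = weights.size() / 2;
        for (std::size_t i = 0; i < weights.size(); ++i)
            acc = acc + row[center - half + i] * weights[i];
        return acc;
    }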