Using binary search tree to store multidimensional vectors

Using binary search tree to store multidimensional vectors - c++

I have an array of vector-3 values:
struct Vector3 {
int x, y, z;
...
};
I would like to put these values into a binary search tree for quick search and finding duplicates. In a binary search tree we need to compare values to set them in a correct order. But would there be a way to compare vectors that makes sense?
I thought about comparing them by length but this will not find duplicates correctly, as two vectors can point in different directions but have the same length.
Comparing each element doesn't make sense either since it will give different results of comparison based on order in which you compare two vectors.
So does utilizing a binary search tree for multidimensional vectors makes sense? Or should I look into hash tables?
Edit: The suggested answer implements comparison between two two-dimensional vectors (points) like this (x < pt.x) || ((!(pt.x < x)) && (y < pt.y));. But there's no explanation as to how/why this comparison works.

When dealing with performance work, I do like John Carmack does in "Masters of Doom." Do the simplest possible thing first. You could start with a binary tree with a sort function sorting by x, then y, and then z. I'd measure and if that wasn't fast enough, depending on your ranges for X, Y, and Z, I might try a hash map (maybe hashed using bitshifting based on the possible ranges for X, Y and Z), or look into storing them in an octtree.

Related

Graph with std::vectors?

I thought that a cool way of using vectors could be to have one vector class template hold an two separate int variables for x/y-coordinates to graph.
example:
std::vector<int, int> *name*;
// First int. being the x-intercept on a graph
// Second int. being the y-intercept on a graph
(I also understand that I could just make every even/odd location or two separate vectors to classify each x/y-coordinate, but for me I would just like to see if this could work)
However, after making this vector type, I came across an issue with assigning which int within the vector will be written to or extracted from. Could anyone tell me how to best select and std::cout both x/y ints appropriately?
P.S. - My main goal, in using vectors this way, is to make a very basic graph output to Visual Studio terminal. While being able to change individual x/y-intercepts by 'selecting' and changing if needed. These coordinates will be outputted to the terminal via for/while loops.
Also, would anyone like to list out different ways to best make x/y-coordinates with different containers?

Your question rather broad, in other words it is asking for a bit too much. I will just try to give you some pointers from which you can work your way to what you like.
A) equidistant x
If your x values are equidistant, ie 0, 0.5, 1, 1.5 then there is no need to store them, simply use a
std::vector<int> y;
if the number of variables is not known at compile time, otherwise a
std::array<int,N> y;
B) arbitrary x
There are several options that depend on what you actually want to do. For simply storing (x,y)-pairs and printing them on the screen, they all work equally well.
map
std::map<int,int> map_x_to_y = { { 1,1}, {2,4}, {3,9}};
// print on screen
for (const auto& xy : map_x_to_y) {
std::cout << xy.first << ":" xy.second;
}
a vector of pairs
std::vector<std::pair<int,int>> vector_x_and_y = { { 1,1}, {2,4}, {3,9}};
Printing on screen is actually the same as with map. The advantage of the map is that it has its elements ordered, while this is not the case for the vector.
C) not using any container
For leightweight calculations you can consider to not store the (xy) pairs at all, but simply use a function:
int fun(int x) { return x*x; }
TL;DR / more focussed
A vector stores one type. You cannot have a std::vector<int,int>. If you look at the documentation of std::vector you will find that the second template parameter is an allocator (something you probably dont have to care about for some time). If you want to store two values as one element in a vector you either have to use std::vector<std::pair<double,double>> or a different container.
PS
I used std::pair in the examples above. However, I do consider it as good practice to name things whenever I can and leave std::pair for cases when I simply cannot give names better than first and second. In this spirit you can replace std::pair in the above examples with a
struct data_point {
int x;
int y;
};

How to use lower_bound on vector of vectors?

I am relative new at C++ and I have little problem. I have vector and in that vector are vectors with 3 integers.
Inner vector represents like one person. 3 integers inside that inner vector represents distance from start, velocity and original index (because in input integers aren't sorted and in output I need to print original index not index in this sorted vector).
Now I have given some points representing distance from start and I need to find which person will be first at that point so I have been thinking that my first step would be that I would find closest person to the given point so basically I need to find lower_bound/upper_bound.
How can I use lower_bound if I want to find the lower_bound of first item in inner vectors? Or should I use struct/class instead of inner vectors?

You would use the version of std::lower_bound which takes a custom comparator (the versions marked "(2)" at the link); and you would write a comparator of vectors which compares vectors by their first item (or whatever other way you like).
Howerver:
As #doctorlove points out, std::lower_bound doesn't compare the vectors to each other, it compares them to a given value (be it a vector or a scalar). So it's possible you actually want to do something else.
It's usually not a good idea to keep fixed-length sequences of elements in std::vector's. Have you considered std::array?
It's very likely that your "vectors with 3 integers" actually stand for something else, e.g. points in a 3-dimensional geometric space; in which case, yes, they should be in some sort of class.

I am not sure that your inner things should be std::vector-s of 3 elements.
I believe that they should std::array-s of 3 elements (because you know that the size is 3 and won't change).
So you probably want to have
typedef std::array<double,3> element_ty;
then use std::vector<element_ty> and for the rest (your lower_bound point) do like in einpoklum's answer.
BTW, you probably want to use std::min_element with an explicit compare.
Maybe you want something like:
std::vector<element_ty> vec;
auto minit =
std::min_element(vec.begin(), vec.end(),
[](const element_ty& x, const element_ty&y) {
return x[0] < y[0]));

Is there a find () of a map to use a comparator with parameters?

To explain, what I want, let there is a map
std::map<Point, SomeClass> hm_map;
Is there a way to use in a find () of a map the comparator with parameter? - the radius of interesting me Point, so the find () must return the set of proper pairs. I think, I have chosen the incorrect container for it.
Edit:
comparator::distance = someNumber;
setOfProperPairs = hmap.find (key, comparator);
where
struct comparator
{
static double distance;
bool operator()(Point ptg, Point p) const
{ return ptg.Hit (p, distance); }
};
Do you know what the container to use for it?

std::map supports one-dimensional sorted data.
If you want geometric sorting in two or more dimensions, there is no std support for that.
You will want a quad tree (or oct tree or higher dimensional analogues), or an r tree, or a kd-tree, or similar.
They are a bit tricky to code.
Now, if you know the radius you care about before you build your structure, you can hack a simpler implementation. Create a square n dimensional grid where the spacing between the grid sides is about 1/2 to 2/3 said radius.
Store data in a multimap from grid cell to exact location and data.
Now when doing a lookup, figure out what grid the center is in, work out what grid cells could have hits in them, and search through said grid cells, doing a final check on the location to see if it is a hit.

Finding all possible pairs of subsets using recursion

I am given
struct point
{
int x;
int y;
};
and the table of points:
point tab[MAX];
Program should return the minimal distance between the centers of gravity of any possible pair of subsets from tab. Subset can be any size (of course >=1 and < MAX).
I am obliged to write this program using recursion.
So my function will be int type because I have to return int.
I globally set variable min (because while doing recurssion I have to compare some values with this min)
int min = 0;
My function should for sure, take number of elements I add, sum of Y coordinates and sum of X coordinates.
int return_min_distance(int sY, int sX, int number, bool iftaken[])
I will be glad for any help further.
I thought about another table of bools which I pass as a parameter to determine if I took value or not from table. Still my problem is how to implement this, I do not know how to even start.

I think you need a function that can iterate through all subsets of the table, starting with either nothing or an existing iterator. The code then gets easy:
int min_distance = MAXINT;
SubsetIterator si1(0, tab);
while (si1.hasNext())
{
SubsetIterator si2(&si1, tab);
while (si2.hasNext())
{
int d = subsetDistance(tab, si1.subset(), si2.subset());
if (d < min_distance)
{
min_distance = d;
}
}
}
The SubsetIterators can be simple base-2 numbers capable of counting up to MAX, where a 1 bit indicates membership in the subset. Yes, it's a O(N^2) algorithm, but I think it has to be.
The trick is incorporating recursion. Sorry, I just don't see how it helps here. If I can think of a way to use it, I'll edit my answer.
Update: I thought about this some more, and while I still can't see a use for recursion, I found a way to make the subset processing easier. Rather than run through the entire table for every distance computation, the SubsetIterators could store precomputed sums of the x and y values for easy distance computation. Then, on every iteration, you subtract the values that are leaving the subset and add the values that are joining. A simple bit-and operation can reveal these. To be even more efficient, you could use gray coding instead of two's complement to store the membership bitmap. This would guarantee that at each iteration exactly one value enters and/or leaves the subset. Minimal work.

Fastest way to find if a 3D coordinate is already used

Using C++ (and Qt), I need to process a big amount of 3D coordinates.
Specifically, when I receive a 3D coordinate (made of 3 doubles), I need to check in a list if this coordinate has already been processed.
If not, then I process it and add it to the list (or container).
The amount of coordinates can become very big, so I need to store the processed coordinates in a container which will ensure that checking if a 3D coordinate is already contained in the container is fast.
I was thinking of using a map of a map of a map, storing the x coordinate, then the y coordinate then the z coordinate, but this makes it quite tedious to use, so I'm actually hoping there is a much better way to do it that I cannot think of.

Probably the simplest way to speed up such processing is to store the already-processed points in Octree. Checking for duplication will become close to logarithmic.
Also, make sure you tolerate round-off errors by checking the distance between the points, not the equality of the coordinates.

Divide your space into discrete bins. Could be infinitely deep squares, or could be cubes. Store your processed coordinates in a simple linked list, sorted if you like in each bin. When you get a new coordinate, jump to the enclosing bin, and walk the list looking for the new point.
Be wary of floating point comparisons. You need to either turn values into integers (say multiply by 1000 and truncate), or decide how close 2 values are to be considered equal.

You can easily use a set as follows:
#include <set>
#include <cassert>
const double epsilon(1e-8);
class Coordinate {
public:
Coordinate(double x, double y, double z) :
x_(x), y_(y), z_(z) {}
private:
double x_;
double y_;
double z_;
friend bool operator<(const Coordinate& cl, const Coordinate& cr);
};
bool operator<(const Coordinate& cl, const Coordinate& cr) {
if (cl.x_ < cr.x_ - epsilon) return true;
if (cl.x_ > cr.x_ + epsilon) return false;
if (cl.y_ < cr.y_ - epsilon) return true;
if (cl.y_ > cr.y_ + epsilon) return false;
if (cl.z_ < cr.z_ - epsilon) return true;
return false;
}
typedef std::set<Coordinate> Coordinates;
// Not thread safe!
// Return true if real processing is done
bool Process(const Coordinate& coordinate) {
static Coordinates usedCoordinates;
// Already processed?
if (usedCoordinates.find(coordinate) != usedCoordinates.end()) {
return false;
}
usedCoordinates.insert(coordinate);
// Here goes your processing code
return true;
}
// Test it
int main() {
assert(Process(Coordinate(1, 2, 3)));
assert(Process(Coordinate(1, 3, 3)));
assert(!Process(Coordinate(1, 3, 3)));
assert(!Process(Coordinate(1+epsilon/2, 2, 3)));
}

Assuming you already have a Coordinate class, add a hash function and maintain a hash_set of the coordinates.
Would look something like:
struct coord_eq
{
bool operator()(const Coordinate &s1, const Coordinate &s2) const
{
return s1 == s2;
// or: return s1.x() == s2.x() && s1.y() == s2.y() && s1.z() == s2.z();
}
};
struct coord_hash
{
size_t operator()(const Coordinate &s) const
{
union {double d, unsigned long ul} c[3];
c[0].d = s.x();
c[1].d = s.y();
c[2].d = s.z();
return static_cast<size_t> ((3 * c[0].ul) ^ (5 * c[1].ul) ^ (7 * c[2].ul));
}
};
std::hash_map<Coordinate, coord_hash, coord_eq> existing_coords;

Well, it depends on what's most important... if a tripple map is too tedious to use, then is implementing other data structures not worth the effort?
If you want to get around the uglyness of the tripple map solution, just wrap it up in another container class with an access function with three parameter, and hide all the messing around with maps internally in that.
If you're more worried about the runtime performance of this thing, storing the coordinates in an Octree might be a good idea.
Also worth mentioning is that doing these sorts of things with floats or doubles you should be very careful about precision -- if (0, 0, 0.01) the same coordinate as (0, 0, 0.01000001)? If it is, you'll need to look at the comparison functions you use, regardless of the data structure. That also depends on the source of your coordinates I guess.

Are you expecting/requiring exact matches? These might be hard to enforce with doubles. For example, if you have processed (1.0, 1.0, 1.0) and you then receive (0.9999999999999, 1.0, 1.0) would you consider it the same? If so, you will need to either apply some kind of approximation or else define error bounds.
However, to answer the question itself: the first method that comes to mind is to create a single index (either a string or a bitstring, depending how readable you want things to be). For example, create the string "(1.0,1.0,1.0)" and use that as the key to your map. This will make it easy to look up the map, keeps the code readable (and also lets you easily dump the contents of the map for debugging purposes) and gives you reasonable performance. If you need much faster performance you could use a hashing algorithm to combine the three coordinates numerically without going via a string.

Use any unique transformation of your 3D coordinates and store only the list of the results.
Example:
md5('X, Y, Z') is unique and you can store only the resulting string.
The hash is not a performant idea but you get the concept. Find any methematic unique transformation and you have it.
/Vey

Use an std::set. Define a type for the 3d coordinate (or use a boost::tuple) that has operator< defined. When adding elements, you can add it to the set, and if it was added, do your processing. If it was not added (because it already exists in there), do not do your processing.
However, if you are using doubles, be aware that your algorithm can potentially lead to unpredictable behavior. IE, is (1.0, 1.0, 1.0) the same as (1.0, 1.0, 1.000000001)?

How about using a boost::tuple for the coordinates, and storing the tuple as the index for the map?
(You may also need to do the divide-by-epsilon idea from this answer.)

Pick a constant to scale the coordinates by so that 1 unit describes an acceptably small box and yet the integer part of the largest component by magnitude will fit into a 32-bit integer; convert the X, Y and Z components of the result to integers and hash them together. Use that as a hash function for a map or hashtable (NOT as an array index, you need to deal with collisions).
You may also want to consider using a fudge factor when comparing the coordinates, since you may get floating point values which are only slightly different, and it is usually preferable to weld those together to avoid cracks when rendering.

If you write a helper class with a simple public interface, that greatly reduces the practical tedium of implementation details like use of a map<map<map<>>>. The beauty of encapsulation!
That said, you might be able to rig a hashmap to do the trick nicely. Just hash the three doubles together to get the key for the point as a whole. If you're concerned about to many collisions between points with symmetric coordinates (e.g., (1, 2, 3) and (3, 2, 1) and so on), just make the hash key asymmetric with respect to the x, y, and z coordinates, using bit shift or some such.

You could use a hash_set of any hashable type - for example, turn each tuple into a string "(x, y, z)". hash_set does fast lookups but handles collisions well.

Whatever your storage method, I would suggest you decide on an epsilon (minimum floating point distance that differentiates two coordinates), then divide all coordinates by the epsilon, round and store them as integers.

Something in this direction maybe:
struct Coor {
Coor(double x, double y, double z)
: X(x), Y(y), Z(z) {}
double X, Y, Z;
}
struct coords_thesame
{
bool operator()(const Coor& c1, const Coor& c2) const {
return c1.X == c2.X && c1.Y == c2.Y && c1.Z == c2.Z;
}
};
std::hash_map<Coor, bool, hash<Coor>, coords_thesame> m_SeenCoordinates;
Untested, use at your own peril :)

You can easily define a comparator for a one-level std::map, so that lookup becomes way less cumbersome. There is no reason of being afraid of that. The comparator defines an ordering of the _Key template argument of the map. It can then also be used for the multimap and set collections.
An example:
#include <map>
#include <cassert>
struct Point {
double x, y, z;
};
struct PointResult {
};
PointResult point_function( const Point& p ) { return PointResult(); }
// helper: binary function for comparison of two points
struct point_compare {
bool operator()( const Point& p1, const Point& p2 ) const {
return p1.x < p2.x
|| ( p1.x == p2.x && ( p1.y < p2.y
|| ( p1.y == p2.y && p1.z < p2.z )
)
);
}
};
typedef std::map<Point, PointResult, point_compare> pointmap;
int _tmain(int argc, _TCHAR* argv[])
{
pointmap pm;
Point p1 = { 0.0, 0.0, 0.0 };
Point p2 = { 0.1, 1.0, 1.0 };
pm[ p1 ] = point_function( p1 );
pm[ p2 ] = point_function( p2 );
assert( pm.find( p2 ) != pm.end() );
return 0;
}

There are more than a few ways to do it, but you have to ask yourself first what are your assumptions and conditions.
So, assuming that your space is limited in size and you know what is the maximum accuracy, then you can form a function that given (x,y,z) will convert them to a unique number or string -this can be done only if you know that your accuracy is limited (for example - no two entities can occupy the same cubic centimeter).
Encoding the coordinate allows you to use a single map/hash with O(1).
If this is not tha case, you can always use 3 embedded maps as you suggested, or go towards space division algorithms (such as OcTree as mentioned) which although given O(logN) on a average search, they also give you additional information you might want (neighbors, population, etc..), but of course it is harder to implement.

You can either use a std::set of 3D coordinates, or a sorted std::vector. Both will give you logarithmic time lookup. In either case, you'll need to implement the less than comparison operator for your 3D coordinate class.

Why bother? What "processing" are you doing? Unless it's very complex, it's probably faster to just do the calculation again, rather then waste time looking things up in a huge map or hashtable.
This is one of the more counter-intuitive things about modern cpu's. Computation is fast, memory is slow.
I realize this isn't really an answer to your question, it's questioning your question.

Good question... it's one that has many solutions, because this type of problem comes
up many times in Graphical and Scientific applications.
Depending on the solution you require it may be rather complex under the hood, in this
case less code doesn't necessarily mean faster.
"but this makes it quite tedious to use" --- generally, you can get around this by
typedefs or wrapper classes (wrappers in this case would be highly recommended).
If you don't need to use the 3D co-ordinates in any kind of spacially significant way (
things like "give me all the points within X distance of point P") then I suggest you
just find a way to hash each point, and use a single hash map... O(n) creation, O(1)
access (checking to see if it's been processed), you can't do much better than that.
If you do need more spacial information you'll need a container that explicitly takes
it into account.
The type of container you choose will be dependant on your data set. If you have good
knowledge of the range of values that you recieve this will help.
If you are recieving well distributed data over a known range... go with octree.
If you have a distribution that tends to cluster, then go with k-d trees. You'll need
to rebuild a k-d tree after inputting new co-ordinates (not necessarily every time,
just when it becomes overly imbalanced). Put simply, Kd-trees are like Octrees, but with non uniform division.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Using binary search tree to store multidimensional vectors - c++

Related

Graph with std::vectors?

How to use lower_bound on vector of vectors?

Is there a find () of a map to use a comparator with parameters?

Finding all possible pairs of subsets using recursion

Fastest way to find if a 3D coordinate is already used

Categories

Resources