Geometric search in a red-black binary search tree - C++

I'm currently implementing a red-black binary search tree for geometric interval search.
I'm storing segments in the tree. Each segment has a start point and an end point, and the start point is the key.
My concern is being able to store segments that share the same start point (in other words, that have the same key).
It's a kind of C++ multimap for geometric search.
The solution I came up with is: for each key that has duplicates, store a list (or vector) of the segments with that key.
I see two problems with this approach:
1. It will reduce search efficiency if there are a large number of duplicate keys.
2. I'll have to use more memory to store the duplicate keys.
My question is: is there a more efficient way to implement this?
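For comparison, a minimal sketch of the standard-library route (not from the original thread): `std::multimap`, which common standard library implementations build on a red-black tree, stores every duplicate key in its own node, so no per-key list is needed, and `equal_range` finds all segments with a given start. `Segment` and `segmentsStartingAt` are hypothetical names.

```cpp
#include <map>
#include <vector>

struct Segment { int start, end; };  // hypothetical segment type

// All segments whose start point equals `start`, in insertion order.
std::vector<Segment> segmentsStartingAt(const std::multimap<int, Segment>& m, int start) {
    std::vector<Segment> out;
    auto range = m.equal_range(start);  // [first, second) holds every node with this key
    for (auto it = range.first; it != range.second; ++it) out.push_back(it->second);
    return out;
}
```

Since C++11, inserting into a multimap places a new element at the end of its range of equivalent keys, so segments sharing a start point come back in insertion order.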

I'd recommend not using a plain binary search tree. Instead, have a look at interval trees: I suspect this structure suits your needs better.
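A minimal interval-tree sketch, assuming closed integer segments: a binary search tree keyed on the start point in which every node also records the largest end point in its subtree (`maxEnd`), so a "stabbing" query can skip subtrees that cannot contain the query point. Duplicate start points need no special handling; they simply go into the right subtree as ordinary nodes. To keep it short this tree is not balanced; in a red-black tree the `maxEnd` fields would also have to be recomputed after every rotation. `IntervalTree`, `Segment` and `stab` are hypothetical names.

```cpp
#include <algorithm>
#include <memory>
#include <vector>

struct Segment { int start, end; };  // closed segment [start, end]

// Unbalanced sketch: a red-black version must also recompute maxEnd after rotations.
class IntervalTree {
public:
    void insert(Segment s) { insert(root_, s); }

    // Appends every stored segment with start <= x <= end to out.
    void stab(int x, std::vector<Segment>& out) const { stab(root_.get(), x, out); }

private:
    struct Node {
        Segment seg;
        int maxEnd;  // largest end point anywhere in this subtree
        std::unique_ptr<Node> left, right;
        explicit Node(Segment s) : seg(s), maxEnd(s.end) {}
    };
    std::unique_ptr<Node> root_;

    static void insert(std::unique_ptr<Node>& n, Segment s) {
        if (!n) { n = std::make_unique<Node>(s); return; }
        n->maxEnd = std::max(n->maxEnd, s.end);
        // Equal start points go right, so duplicates are just ordinary nodes.
        insert(s.start < n->seg.start ? n->left : n->right, s);
    }

    static void stab(const Node* n, int x, std::vector<Segment>& out) {
        if (!n || n->maxEnd < x) return;  // nothing in this subtree reaches x
        stab(n->left.get(), x, out);
        if (n->seg.start <= x && x <= n->seg.end) out.push_back(n->seg);
        // Starts in the right subtree are >= this start, so skip it once this start > x.
        if (n->seg.start <= x) stab(n->right.get(), x, out);
    }
};
```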

Related

Efficiently search among pairs of adjacent elements in a `set`

I'm currently working on a problem where I want to maintain the convex hull of a set of linear functions. It might look something like this:
I'm using a set<Line> to maintain the lines so that I can dynamically insert lines, which works fine. The lines are ordered by increasing slope, which is defined by the operator< of the lines. By throwing out "superseded" lines, the data structure guarantees that every line will have some segment that is a part of the convex hull.
Now the problem is that I want to search this data structure for the crossing point whose X coordinate precedes a given x. Since those crossing points are only implicitly defined by adjacency in the set (in the image above, those are the points N, Q, etc.), it seems impossible to solve with the set alone, since I don't have:
1. the option to find an element by anything but the primary compare function;
2. the option to "binary search" in the underlying search tree myself, that is, compute the pre-order predecessor or successor of an iterator;
3. the option to access elements by index efficiently.
I am thus inclined to use a second set<pair<set<Line>::iterator, set<Line>::iterator> >, but this seems incredibly hacky. Since we mainly need this for programming contests, I want to minimize code size, so I'd like to avoid a second set or a custom BBST data structure.
Is there a good way to model this scenario that still lets me maintain the lines dynamically and binary search by the value of a function on adjacent elements, with a reasonable amount of code?
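One compact contest idiom for this (not from the original thread; it follows the well-known "line container" trick, e.g. the LineContainer in the KACTL notebook) keeps a single set. Each line stores, in a `mutable` field `p`, the x-coordinate where its successor overtakes it; `p` is updated through the set's const iterators whenever neighbours change. A transparent comparator (`std::less<>`, C++14) plus an extra `operator<(ll)` then lets `lower_bound(x)` binary search by `p` instead of by slope. This sketch maintains the upper envelope (maximum); for a minimum, insert `(-k, -m)` and negate query results.

```cpp
#include <cassert>
#include <climits>
#include <functional>
#include <set>

using ll = long long;
constexpr ll INF = LLONG_MAX;

struct Line {
    // Slope k, intercept m, and p = largest integer x at which this line still beats
    // its successor. p is mutable so it can be updated through const set iterators.
    mutable ll k, m, p;
    bool operator<(const Line& o) const { return k < o.k; }  // set ordering: by slope
    bool operator<(ll x) const { return p < x; }             // used by lower_bound(x)
};

// Upper envelope (maximum) of lines y = k*x + m (contest-style inheritance from multiset).
struct LineContainer : std::multiset<Line, std::less<>> {
    static ll floorDiv(ll a, ll b) { return a / b - ((a ^ b) < 0 && a % b != 0); }

    // Sets x->p to where x's successor y overtakes it.
    // Returns true if y is then no longer part of the envelope.
    bool isect(iterator x, iterator y) {
        if (y == end()) { x->p = INF; return false; }
        if (x->k == y->k) x->p = x->m > y->m ? INF : -INF;
        else x->p = floorDiv(y->m - x->m, x->k - y->k);
        return x->p >= y->p;
    }

    void add(ll k, ll m) {
        auto z = insert({k, m, 0}), y = z++, x = y;
        while (isect(y, z)) z = erase(z);  // drop successors the new line supersedes
        if (x != begin() && isect(--x, y)) isect(x, y = erase(y));  // new line is superseded
        while ((y = x) != begin() && (--x)->p >= y->p)  // drop superseded predecessors
            isect(x, erase(y));
    }

    // Maximum of k*x + m over all lines; the container must be non-empty.
    ll query(ll x) {
        assert(!empty());
        auto it = lower_bound(x);  // first line whose breakpoint p is >= x
        return it->k * x + it->m;
    }
};
```

The iterator returned by `lower_bound(x)` is the line that is active at `x`, so its neighbours give the adjacent crossing points without a second set.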

Range tree construction

Let us consider the following picture
This is a so-called range tree. There is one thing I don't understand: it looks similar to a binary search tree, so when inserting elements we could use the same procedure as for binary search tree insertion. So what is the difference?
I have read a tutorial and I guess it is a variation of kd-trees and other query search trees (as used for geometric point searching and so on), but how is it constructed? Like a binary search tree, or does it need additional parameters? Maybe like this:
struct range
{
int lowerbound;
int upperbound;
int element;
};
and during insertion we have to check
if(element>lowerbound && element <upperbound)
then insert node
Please help me to understand correctly how to construct a range tree.
In a binary search tree, when you insert a value you add a new node.
The range tree is more similar to a binary indexed tree. These two data structures have a fixed structure: when you add or subtract a point in a given range, you update the values stored in the nodes, but you do not introduce new nodes.
The construction of this structure is very similar to that of a KD-tree: based on the given points, you choose the most appropriate splitting points.
When you learn a new data structure, always consider the operations it supports; this will help you understand it faster. In your case it would have helped you distinguish between a binary search tree and a range tree.
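To make the "fixed structure" point concrete, here is a minimal sketch of a static 2D range tree that counts points in an axis-aligned rectangle (hypothetical class `RangeTree2D`, integer coordinates assumed). It is built once rather than by repeated BST insertions: the points are sorted by x, the sorted array is split recursively at its middle index, and every node stores the sorted y-values of the points in its slice. A query decomposes the x-range into O(log n) nodes and binary searches the y-values in each, for O(log^2 n) per query (no fractional cascading) and O(n log n) memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

class RangeTree2D {
public:
    explicit RangeTree2D(std::vector<std::pair<int, int>> pts) : n_(pts.size()) {
        std::sort(pts.begin(), pts.end());  // order by x (ties by y)
        for (const auto& p : pts) xs_.push_back(p.first);
        if (n_ > 0) { ys_.resize(4 * n_); build(1, 0, n_, pts); }
    }

    // Number of points with x1 <= x <= x2 and y1 <= y <= y2.
    int count(int x1, int x2, int y1, int y2) const {
        if (x1 > x2 || y1 > y2) return 0;
        auto lo = static_cast<std::size_t>(std::lower_bound(xs_.begin(), xs_.end(), x1) - xs_.begin());
        auto hi = static_cast<std::size_t>(std::upper_bound(xs_.begin(), xs_.end(), x2) - xs_.begin());
        if (lo >= hi) return 0;
        return query(1, 0, n_, lo, hi, y1, y2);
    }

private:
    std::size_t n_;
    std::vector<int> xs_;               // x-coordinates in sorted order
    std::vector<std::vector<int>> ys_;  // ys_[node] = sorted y-values of the node's slice

    // Node covers sorted indices [l, r); children split the slice at its middle.
    void build(std::size_t node, std::size_t l, std::size_t r,
               const std::vector<std::pair<int, int>>& pts) {
        if (r - l == 1) { ys_[node] = {pts[l].second}; return; }
        std::size_t mid = (l + r) / 2;
        build(2 * node, l, mid, pts);
        build(2 * node + 1, mid, r, pts);
        std::merge(ys_[2 * node].begin(), ys_[2 * node].end(),
                   ys_[2 * node + 1].begin(), ys_[2 * node + 1].end(),
                   std::back_inserter(ys_[node]));
    }

    int query(std::size_t node, std::size_t l, std::size_t r, std::size_t ql, std::size_t qr,
              int y1, int y2) const {
        if (qr <= l || r <= ql) return 0;
        if (ql <= l && r <= qr) {  // canonical node: count its y-values in [y1, y2]
            const auto& v = ys_[node];
            return static_cast<int>(std::upper_bound(v.begin(), v.end(), y2) -
                                    std::lower_bound(v.begin(), v.end(), y1));
        }
        std::size_t mid = (l + r) / 2;
        return query(2 * node, l, mid, ql, qr, y1, y2) +
               query(2 * node + 1, mid, r, ql, qr, y1, y2);
    }
};
```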

Binary Tree using int array as key (Euclidean distances)?

I have written a binary search tree that stores data on ships; the key for the search is their acoustic signature.
When searching the tree I want to return either the ship with the matching signature or the one with the closest match to the searched signature (by seeing which ship has the smallest Euclidean distance).
The problem I am having is how to compare signatures other than by their actual numerical value. Wouldn't that mean that any search would be sequential rather than binary?
Any ideas?
What you're doing comes down to a nearest-neighbour search in multiple dimensions. You can't solve this efficiently with just a binary tree; you'll need some space partitioning structure.
If your array length N is small (single digits), you can use a 2^N-ary tree (quadtree, octree, ...) as a generalization of a binary tree.
A popular choice which also works well for higher dimensions is a Kd-tree.
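A minimal kd-tree nearest-neighbour sketch, assuming each acoustic signature is a fixed-length array of `D` ints and each ship is identified by a hypothetical `int` id. Each level splits on one coordinate (cycling through the axes) at the median; the search first descends into the side containing the query and visits the other side only if the splitting plane is closer than the best match found so far. It compares squared distances, which yields the same nearest neighbour as true Euclidean distance. `KdTree`, `Item` and `nearest` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

template <std::size_t D>
class KdTree {
public:
    using Point = std::array<int, D>;
    using Item = std::pair<Point, int>;  // (signature, hypothetical ship id)

    explicit KdTree(std::vector<Item> items) : items_(std::move(items)) {
        build(0, items_.size(), 0);
    }

    // Id of the item with the smallest Euclidean distance to q (items must be non-empty).
    int nearest(const Point& q) const {
        std::size_t best = 0;
        long long bestD = std::numeric_limits<long long>::max();
        search(0, items_.size(), 0, q, best, bestD);
        return items_[best].second;
    }

private:
    std::vector<Item> items_;

    // Implicit tree: the median of [l, r) along `axis` is the node, the halves its subtrees.
    void build(std::size_t l, std::size_t r, std::size_t axis) {
        if (r - l <= 1) return;
        std::size_t mid = l + (r - l) / 2;
        std::nth_element(items_.begin() + l, items_.begin() + mid, items_.begin() + r,
                         [axis](const Item& a, const Item& b) { return a.first[axis] < b.first[axis]; });
        build(l, mid, (axis + 1) % D);
        build(mid + 1, r, (axis + 1) % D);
    }

    static long long dist2(const Point& a, const Point& b) {  // squared distance, no sqrt
        long long s = 0;
        for (std::size_t i = 0; i < D; ++i) {
            long long d = static_cast<long long>(a[i]) - b[i];
            s += d * d;
        }
        return s;
    }

    void search(std::size_t l, std::size_t r, std::size_t axis, const Point& q,
                std::size_t& best, long long& bestD) const {
        if (l >= r) return;
        std::size_t mid = l + (r - l) / 2;
        long long d = dist2(items_[mid].first, q);
        if (d < bestD) { bestD = d; best = mid; }
        long long diff = static_cast<long long>(q[axis]) - items_[mid].first[axis];
        std::size_t next = (axis + 1) % D;
        // Search q's side of the splitting plane first; cross the plane only if it is
        // closer than the best match so far.
        std::size_t nearL = diff < 0 ? l : mid + 1, nearR = diff < 0 ? mid : r;
        std::size_t farL = diff < 0 ? mid + 1 : l, farR = diff < 0 ? r : mid;
        search(nearL, nearR, next, q, best, bestD);
        if (diff * diff < bestD) search(farL, farR, next, q, best, bestD);
    }
};
```

An exact match simply comes back at distance 0. With many dimensions the pruning becomes less effective and the search degrades toward a linear scan.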

Range Minimum/Maximum Query

I have coordinate points (x, y); say I have 10,000 points. Now, when a new point (p, q) is given as a test query, I have to check it against every point in the set: if the x coordinate of the test query, that is
P Y
From online searches I learned that an RMQ (range minimum/maximum query) data structure could help me, but I am not sure how to use it. Can someone help me with this? Any references or C++ code would be a great help. Thank you.
If your goal is to check whether the point exists in the data set, then there are a number of really useful data structures you could use to hold the data, each of which supports very efficient lookups.
For starters, if all you need to know is whether or not the point exists, you could always store all the points in a standard hash table or balanced binary search tree. This would offer O(1) or O(log n) lookup time, respectively. Plus these structures tend to be available in most programming languages.
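A minimal sketch of the hash-table option, assuming 32-bit `int` coordinates: each (x, y) pair is packed into one 64-bit value, which is then hashed with `std::hash`. `PointHash`, `PointSet` and `contains` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>
#include <utility>

// Assumes 32-bit int coordinates: pack (x, y) into one 64-bit key and hash that.
struct PointHash {
    std::size_t operator()(const std::pair<int, int>& p) const {
        std::uint64_t key = (static_cast<std::uint64_t>(static_cast<std::uint32_t>(p.first)) << 32)
                          | static_cast<std::uint32_t>(p.second);
        return std::hash<std::uint64_t>{}(key);
    }
};

using PointSet = std::unordered_set<std::pair<int, int>, PointHash>;

// Expected O(1) membership test.
bool contains(const PointSet& s, int x, int y) { return s.count({x, y}) != 0; }
```

A `std::set<std::pair<int, int>>` needs no custom hash and gives the O(log n) balanced-tree variant instead.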
If, on the other hand, you're planning to do fancier operations on the data, such as searching for the k points in the data set nearest to some test point, or finding all of the points in some bounding area, you might want to consider using a kd-tree or a quadtree. These variants of the standard binary search tree offer fast lookups (typically O(log n) time). The kd-tree also supports very fast k-nearest-neighbor searches and searches inside bounding volumes. Moreover, the kd-tree is surprisingly easy to implement if you have any experience implementing a standard binary search tree.
Hope this helps!

Tree matching using serialization of tree and unique id generation for each subtree

Binary tree http://img9.imageshack.us/img9/9981/binarytree.jpg
What would be the best way to serialize a given binary tree and, in turn, compute a unique id for each serialized binary tree?
For example, I need to serialize the sub-tree (2,7,(5,6,11)) and generate a unique id 'x' representing that sub-tree so that whenever I come across a similar sub-tree (2,7,(5,6,11)) it would serialize to the same value 'x' and hence I can deduce that I've found a match.
Here we assume that each node has properties that are unique. In the above example, it would be the numbers assigned to each node and hence they would always generate the same ids for similar sub-trees. I'm trying to do this in C++.
Do algorithms already exist to perform such serialized tree matching?
Do you want to be able to match any arbitrary part of the tree, or a subtree running down to some leaf node(s)? IIUC, you are looking at suffix matching.
You can also look at Compact Directed Acyclic Word Graph for ideas.
I would compute a hash value (in a Rabin-Karp fashion) based on the nodes' IDs and their positions in the tree, i.e.:
long h = 0
for each node in sub tree:
h ^= node.id << (node.depth % 30)
in pseudocode. The downside is that different subtrees may yield the same hash value. The advantage is that hash values are fast to compare, and when a match is found you can then check the actual subtrees for equality.
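A collision-free alternative (a different technique from the hash above, sometimes called hash-consing) gives each distinct subtree a small integer id, bottom-up: a subtree's id is looked up from the triple (node value, id of left subtree, id of right subtree), and a new id is issued the first time a triple is seen. Two subtrees then get the same id exactly when they hold the same values in the same shape, with left and right children distinguished. `Node`, `makeNode` and `SubtreeIds` are hypothetical names.

```cpp
#include <map>
#include <memory>
#include <tuple>
#include <utility>

struct Node {
    int value;
    std::unique_ptr<Node> left, right;
};

// Hypothetical helper for building trees.
std::unique_ptr<Node> makeNode(int v, std::unique_ptr<Node> l = nullptr,
                               std::unique_ptr<Node> r = nullptr) {
    return std::unique_ptr<Node>(new Node{v, std::move(l), std::move(r)});
}

// Gives every structurally distinct subtree its own integer id. Equal ids
// mean equal values in the same shape (left and right children distinguished).
class SubtreeIds {
public:
    int idOf(const Node* n) {
        if (!n) return -1;  // id reserved for the empty subtree
        int l = idOf(n->left.get());
        int r = idOf(n->right.get());
        auto key = std::make_tuple(n->value, l, r);
        auto it = table_.find(key);
        if (it != table_.end()) return it->second;  // seen this subtree before
        int id = static_cast<int>(table_.size());   // first occurrence: issue a new id
        table_.emplace(key, id);
        return id;
    }

private:
    std::map<std::tuple<int, int, int>, int> table_;
};
```

Use one `SubtreeIds` object for all trees being compared, since ids are only meaningful within the table that issued them.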
If you're not looking for high efficiency, you might want to use a very simple depth-first-search algorithm.
"2,7,2,U,6,5,U,11,U,U,U,5,9,4"
As you can see, I added U ("up") commands to show where the next child would be created. Of course you can make this more efficient, but I believe it's a start.
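A variant of this depth-first serialization, sketched under the assumption of integer node values: a preorder traversal that writes an explicit marker (here `#`) for every missing child instead of "up" commands. The markers make the encoding unambiguous, so a node with only a right child serializes differently from one with only a left child, and identical subtrees always produce identical strings. `serialize`, `serializeInto` and `makeNode` are hypothetical names.

```cpp
#include <memory>
#include <string>
#include <utility>

struct Node {
    int value;
    std::unique_ptr<Node> left, right;
};

// Hypothetical helper for building trees.
std::unique_ptr<Node> makeNode(int v, std::unique_ptr<Node> l = nullptr,
                               std::unique_ptr<Node> r = nullptr) {
    return std::unique_ptr<Node>(new Node{v, std::move(l), std::move(r)});
}

// Preorder: value, then left subtree, then right subtree; '#' marks an empty child.
void serializeInto(const Node* n, std::string& out) {
    if (!n) { out += "#,"; return; }
    out += std::to_string(n->value);
    out += ',';
    serializeInto(n->left.get(), out);
    serializeInto(n->right.get(), out);
}

std::string serialize(const Node* n) {
    std::string out;
    serializeInto(n, out);
    return out;
}
```

For example, the subtree 7(2, 6(5, 11)) serializes to `7,2,#,#,6,5,#,#,11,#,#,`, and such strings can serve directly as keys of a `std::unordered_map` to detect repeated subtrees.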
Also, you might want to have a look at Boost.Graph (BGL) for implementation.
What's wrong with the parentheses notation like you used in your question?