Binary Tree using int array as key (Euclidean distances)? - c++

I have written a binary search tree that stores data on ships; the search key is each ship's acoustic signature.
When searching the tree, I want to return either the ship with the exact signature or the one whose signature is the closest match (the one with the smallest Euclidean distance to the searched signature).
The problem is that I don't know how to compare the signatures other than by their actual numerical values, which would mean that any search has to be sequential rather than binary.
Any ideas?

What you're doing comes down to a nearest-neighbour search in multiple dimensions. You can't solve this efficiently with just a binary tree; you'll need some space partitioning structure.
If your array length N is small (single digits), you can use a 2^N-ary tree (quadtree, octree, ...) as a generalization of a binary tree.
A popular choice which also works well for higher dimensions is a Kd-tree.
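As a rough illustration of the k-d tree suggestion, here is a minimal sketch of building a tree over ship signatures and finding the closest one by Euclidean distance. The signature length K, the Ship record and all function names are assumptions made for the example, not taken from the question.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <limits>
#include <memory>
#include <vector>

// Hypothetical types: the signature length K and the Ship fields are assumptions.
constexpr std::size_t K = 3;
using Signature = std::array<int, K>;

struct Ship {
    Signature signature;
    int id;
};

struct KdNode {
    Ship ship;
    std::unique_ptr<KdNode> left, right;
};

// Squared Euclidean distance; the square root is not needed to compare distances.
long long squaredDistance(const Signature& a, const Signature& b) {
    long long sum = 0;
    for (std::size_t i = 0; i < K; ++i) {
        long long d = static_cast<long long>(a[i]) - b[i];
        sum += d * d;
    }
    return sum;
}

// Build a balanced tree over ships[first, last): each level splits on the
// median of one axis, cycling through the axes by depth.
std::unique_ptr<KdNode> build(std::vector<Ship>& ships, std::size_t first,
                              std::size_t last, std::size_t depth = 0) {
    if (first >= last) return nullptr;
    std::size_t axis = depth % K;
    std::size_t mid = first + (last - first) / 2;
    std::nth_element(ships.begin() + first, ships.begin() + mid, ships.begin() + last,
                     [axis](const Ship& a, const Ship& b) {
                         return a.signature[axis] < b.signature[axis];
                     });
    auto node = std::make_unique<KdNode>();
    node->ship = ships[mid];
    node->left = build(ships, first, mid, depth + 1);
    node->right = build(ships, mid + 1, last, depth + 1);
    return node;
}

void nearest(const KdNode* node, const Signature& target, std::size_t depth,
             const Ship*& best, long long& bestDist) {
    if (!node) return;
    long long d = squaredDistance(node->ship.signature, target);
    if (d < bestDist) {
        bestDist = d;
        best = &node->ship;
    }
    std::size_t axis = depth % K;
    long long diff = static_cast<long long>(target[axis]) - node->ship.signature[axis];
    const KdNode* closer = diff < 0 ? node->left.get() : node->right.get();
    const KdNode* farther = diff < 0 ? node->right.get() : node->left.get();
    nearest(closer, target, depth + 1, best, bestDist);
    // Visit the other side only if the splitting plane is nearer than the best match so far.
    if (diff * diff < bestDist) nearest(farther, target, depth + 1, best, bestDist);
}

// Returns the ship with the closest signature, or nullptr for an empty tree.
const Ship* findClosest(const KdNode* root, const Signature& target) {
    const Ship* best = nullptr;
    long long bestDist = std::numeric_limits<long long>::max();
    nearest(root, target, 0, best, bestDist);
    return best;
}
```

With a vector of ships, build(ships, 0, ships.size()) produces the root and findClosest(root.get(), query) returns the exact match when one exists, otherwise the nearest ship. The pruning test is what keeps the search from degenerating into a sequential scan, although in high dimensions it prunes less and less.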

Related

geometric search in red black binary search tree

I'm currently implementing a red-black binary search tree for geometric interval search.
The tree stores segments, each with a start point and an end point, and the start point is used as the key.
My concern is how to store segments that share the same start point (that is, segments with duplicate keys).
It's a kind of C++ multimap for geometric search.
The solution I came up with is: for each key that occurs more than once, store a list (or vector) of all segments that share that key.
The problems I see with this approach are twofold:
1. It will reduce the efficiency of the search if there are a large number of duplicate keys.
2. I'll have to use more memory to store the duplicate keys.
My question is: is there another way to implement this more efficiently?
I'd recommend not using a normal binary search tree! Instead, have a look at interval trees: I suspect this structure suits your needs better.
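To make the interval tree idea more concrete, here is a minimal sketch of one common variant: segments sorted by start point, viewed as an implicit balanced tree, with each node also recording the largest end point in its subtree. It is a static structure (built once, no insertions), which is a simplification and not necessarily what the asker needs; the Segment type and all names are assumptions. Duplicate start points need no special handling here.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical segment type: a 1-D interval [start, end].
struct Segment {
    double start;
    double end;
};

class StaticIntervalTree {
public:
    explicit StaticIntervalTree(std::vector<Segment> segments)
        : segs_(std::move(segments)), maxEnd_(segs_.size()) {
        std::sort(segs_.begin(), segs_.end(),
                  [](const Segment& a, const Segment& b) { return a.start < b.start; });
        if (!segs_.empty()) computeMaxEnd(0, segs_.size());
    }

    // All stored segments that overlap the closed interval [lo, hi].
    std::vector<Segment> overlapping(double lo, double hi) const {
        std::vector<Segment> out;
        query(0, segs_.size(), lo, hi, out);
        return out;
    }

private:
    // The subtree over [first, last) is rooted at its middle element.
    double computeMaxEnd(std::size_t first, std::size_t last) {
        std::size_t mid = first + (last - first) / 2;
        double m = segs_[mid].end;
        if (first < mid) m = std::max(m, computeMaxEnd(first, mid));
        if (mid + 1 < last) m = std::max(m, computeMaxEnd(mid + 1, last));
        maxEnd_[mid] = m;
        return m;
    }

    void query(std::size_t first, std::size_t last, double lo, double hi,
               std::vector<Segment>& out) const {
        if (first >= last) return;
        std::size_t mid = first + (last - first) / 2;
        if (maxEnd_[mid] < lo) return;  // nothing in this subtree reaches lo
        query(first, mid, lo, hi, out);
        if (segs_[mid].start <= hi) {
            if (segs_[mid].end >= lo) out.push_back(segs_[mid]);
            query(mid + 1, last, lo, hi, out);  // later starts may still be <= hi
        }
    }

    std::vector<Segment> segs_;   // sorted by start; declared before maxEnd_ on purpose
    std::vector<double> maxEnd_;  // maxEnd_[i] = largest end in the subtree rooted at i
};
```

The maxEnd_ check lets a query skip whole subtrees that end before the query begins, so the cost depends on the number of matches rather than on how many segments share a key. A dynamic version would keep the same max-end field in the nodes of a red-black tree and update it during rotations.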

Range tree construction

Let us consider the following picture
This is a so-called range tree. There is one thing I don't understand: it looks similar to a binary search tree, so when inserting elements we could use the same procedure as for binary search tree insertion. So what is the difference?
I have read a tutorial and guess that it is a variation of k-d trees or query search trees (for geometric point searching and so on), but how is it constructed? Like a binary search tree, or does it need additional parameters? Maybe like this:
struct range
{
    int lowerbound;
    int upperbound;
    int element;
};
and during insertion we have to check
if(element>lowerbound && element <upperbound)
then insert node
Please help me to understand correctly how to construct a range tree.
In a binary search tree, inserting a value inserts a new node.
The range search tree is more similar to a binary indexed tree. Both of these data structures have a fixed shape: when you add a point to a given range or remove one from it, you update the values stored in the existing nodes but do not introduce new nodes.
The construction of this structure is very similar to that of a k-d tree: based on the given points, you choose the most appropriate splitting points.
When you learn a new data structure, always consider the operations it supports; this will help you understand the structure faster. In your case it would have helped you distinguish between a binary search tree and a range tree.

Huffman tree implementation using minheap

In the book I'm using for my class (and from what I've seen from a few other places), it seems like the algorithm for creating a huffman tree stems from
(1) Building a minheap based on the frequency of each character in whatever file or string is being read in.
(2) Popping off the 2 smallest values from the minheap and combining their weights into a new node.
(3) Re-inserting the new node back into the same minheap.
I'm confused about step 3. Most Huffman trees I've seen have attributes more similar to a max heap than a minheap (although they are not complete trees). That is to say, the root contains the maximum weight (or rather the combination of weights), while all of its children have lesser weights. How does this implementation give a Huffman tree when the combined nodes are put back into a minheap? I've been struggling with this for a while now.
A similar question has already been posted here (with the same book as me): I don't understand this Huffman algorithm implementation
In case you wanted to see the exact function described in (3).
Thanks for any help!
A Huffman tree is often not a complete binary tree, and so is not a min-heap.
The Huffman algorithm is easily understood as a list of frequencies from which a tree is built. Small branches are constructed first, which will eventually all be merged into a single tree. Each list item starts off as a symbol, and later may be a symbol or a sub-tree that has been built. Each list item always has a frequency (an integer count usually).
Take the two smallest frequencies out of the list (ties don't matter -- any choice will result in an optimal code, though there may be more than one optimal code). Construct a single-level binary tree from those two, where the two leaves are the symbols for those frequencies. Add the frequencies to make a new frequency representing the tree. Put that frequency back in the list. The list now has one less frequency in it.
Repeat. Now the binary tree constructed at each step may have symbol leaves on each branch, or one leaf and a previously constructed tree, or two trees (at earliest in the third step).
Keep going until there is only one frequency left in the list. That will be the sum of all the original frequencies. That frequency has the complete Huffman tree associated with it.
Now you can (arbitrarily) assign a 0 and a 1 to the two branches of each node. You build or decode codes by traversing the tree from the root to a symbol. The bits on the branches of that traversal, in order, are the Huffman code for that symbol.
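The procedure above can be sketched with std::priority_queue as the min-heap. Note that the heap only holds the pending list items ordered by frequency; the Huffman tree itself lives in the child links of the nodes, which is why its root ends up with the largest weight. The node layout (indices into a vector) and the function names are assumptions chosen for brevity.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct HuffNode {
    unsigned long long weight;
    char symbol;  // meaningful only for leaves
    int left;     // index of the child node, or -1 for a leaf
    int right;
};

struct HuffTree {
    std::vector<HuffNode> nodes;
    int root = -1;
};

HuffTree buildHuffmanTree(const std::map<char, unsigned long long>& freq) {
    HuffTree tree;
    using Entry = std::pair<unsigned long long, int>;  // (weight, node index)
    // std::greater turns the default max-heap into a min-heap.
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

    for (const auto& kv : freq) {
        tree.nodes.push_back(HuffNode{kv.second, kv.first, -1, -1});
        heap.push(Entry{kv.second, static_cast<int>(tree.nodes.size()) - 1});
    }
    while (heap.size() > 1) {
        Entry a = heap.top(); heap.pop();  // the two smallest weights
        Entry b = heap.top(); heap.pop();
        unsigned long long combined = a.first + b.first;
        tree.nodes.push_back(HuffNode{combined, '\0', a.second, b.second});
        // The merged subtree goes back into the heap as a single item.
        heap.push(Entry{combined, static_cast<int>(tree.nodes.size()) - 1});
    }
    if (!heap.empty()) tree.root = heap.top().second;
    return tree;
}

// Walk from the root, appending '0' for a left branch and '1' for a right branch.
void assignCodes(const HuffTree& tree, int node, const std::string& prefix,
                 std::map<char, std::string>& codes) {
    if (node < 0) return;
    const HuffNode& n = tree.nodes[static_cast<std::size_t>(node)];
    if (n.left < 0 && n.right < 0) {
        codes[n.symbol] = prefix.empty() ? "0" : prefix;  // lone-symbol input
        return;
    }
    assignCodes(tree, n.left, prefix + '0', codes);
    assignCodes(tree, n.right, prefix + '1', codes);
}
```

After tree = buildHuffmanTree(freq), calling assignCodes(tree, tree.root, "", codes) fills codes with one bit string per symbol. Ties in the heap are broken by node index here, which is one arbitrary choice among several that all give an optimal code.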

Range Minimum/ Maximum Query

I have a set of coordinate points (x, y), say 10,000 of them. When a new point (p, q) is given as a test query, I have to check it against every point in the set: if the x coordinate of the test query, that is
P Y
From online searches I learned that an RMQ (range minimum/maximum query) data structure could help me, but I am not sure how to use it. Can someone help me with this? Any references or C++ code would be a great help. Thank you.
If your goal is to check whether the point exists in the data set, then there are a number of really useful data structures you could use to hold the data, each of which supports very efficient lookups.
For starters, if all you need to know is whether or not the point exists, you could always store all the points in a standard hash table or balanced binary search tree. This would offer O(1) or O(log n) lookup time, respectively. Plus these structures tend to be available in most programming languages.
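For the simple membership case, a minimal C++ sketch could look like the following. std::set is a balanced tree and works with std::pair out of the box; std::unordered_set needs a hash for pairs, and the PointHash combiner below is only one plausible choice. The type aliases and the containsPoint helper are names made up for this example.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <unordered_set>
#include <utility>

using Point = std::pair<int, int>;  // (x, y); integer coordinates are an assumption

// The standard library has no hash for std::pair, so combine the two field hashes.
struct PointHash {
    std::size_t operator()(const Point& p) const {
        std::size_t h1 = std::hash<int>{}(p.first);
        std::size_t h2 = std::hash<int>{}(p.second);
        return h1 ^ (h2 + 0x9e3779b9u + (h1 << 6) + (h1 >> 2));
    }
};

using PointTreeSet = std::set<Point>;                       // O(log n) lookup
using PointHashSet = std::unordered_set<Point, PointHash>;  // O(1) average lookup

// Works with either container type.
template <typename Set>
bool containsPoint(const Set& points, const Point& query) {
    return points.find(query) != points.end();
}
```

Filling either set with the 10,000 points once and then calling containsPoint(points, Point{p, q}) avoids comparing the query against every stored point.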
If, on the other hand, you're planning on doing fancier operations on the data, such as searching for the k points in the data set nearest some test point, or trying to find all of the points in some bounding area, you might want to consider using a kd-tree or a quadtree. These variants on the standard binary search tree typically offer fast lookups (O(log n) time when the tree is balanced). The kd-tree also supports efficient k-nearest-neighbor searches and searches inside of bounding volumes. Moreover, the kd-tree is surprisingly easy to implement if you have any experience implementing a standard binary search tree.
Hope this helps!

Binary tree that stores partial sums: Name and existing implementations

Consider a sequence of n positive real numbers, (a_i), and its partial sum sequence, (s_i). Given a number x ∊ (0, s_n], we have to find i such that s_{i−1} < x ≤ s_i. Also we want to be able to change one of the a_i's without having to update all partial sums. Both can be done in O(log n) time by using a binary tree with the a_i's as leaf node values, and the values of the non-leaf nodes being the sum of the values of the respective children. If n is known and fixed, the tree doesn't have to be self-balancing and can be stored efficiently in a linear array. Furthermore, if n is a power of two, only 2n − 1 array elements are required. See Blue et al., Phys. Rev. E 51 (1995), pp. R867–R868 for an application. Given the genericity of the problem and the simplicity of the solution, I wonder whether this data structure has a specific name and whether there are existing implementations (preferably in C++). I've already implemented it myself, but writing data structures from scratch always seems like reinventing the wheel to me; I'd be surprised if nobody had done it before.
This is known as a finger tree in functional programming but apparently there are implementations in imperative languages. In the articles there is a link to a blog post explaining an implementation of this data structure in C# which could be useful to you.
A Fenwick tree (also known as a binary indexed tree) is a data structure that maintains a sequence of elements and can compute the cumulative sum of any range of consecutive elements in O(log n) time. Changing the value of any single element takes O(log n) time as well.
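A minimal Fenwick tree sketch covering both operations from the question might look like this. It is not the explicit leaf-sum tree the question describes but the more compact Fenwick layout, which needs only n + 1 array slots. The class and method names are made up for the example, indices are 1-based, and findByPrefixSum assumes all a_i are positive and 0 < x ≤ s_n.

```cpp
#include <cstddef>
#include <vector>

class FenwickTree {
public:
    explicit FenwickTree(std::size_t n) : tree_(n + 1, 0.0) {}

    // Add delta to a_i (1-based index) in O(log n).
    void add(std::size_t i, double delta) {
        for (; i < tree_.size(); i += lowestBit(i)) tree_[i] += delta;
    }

    // s_i = a_1 + ... + a_i in O(log n).
    double prefixSum(std::size_t i) const {
        double sum = 0.0;
        for (; i > 0; i -= lowestBit(i)) sum += tree_[i];
        return sum;
    }

    // Smallest i with s_i >= x, i.e. s_{i-1} < x <= s_i, in O(log n).
    // Descends by powers of two, subtracting the block sums it steps over.
    std::size_t findByPrefixSum(double x) const {
        std::size_t n = tree_.size() - 1;
        std::size_t step = 1;
        while (step * 2 <= n) step *= 2;
        std::size_t pos = 0;
        for (; step > 0; step /= 2) {
            std::size_t next = pos + step;
            if (next <= n && tree_[next] < x) {
                pos = next;
                x -= tree_[next];
            }
        }
        return pos + 1;
    }

private:
    static std::size_t lowestBit(std::size_t i) { return i & (~i + 1); }

    std::vector<double> tree_;  // tree_[i] holds the sum of a block ending at index i
};
```

Changing a_i to a new value is add(i, newValue - oldValue). With floating-point values, many incremental updates accumulate rounding error, so rebuilding the tree from the raw a_i occasionally may be worthwhile.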