Binary Search Tree - to Make a dictionary - c++

I want to make a dictionary using a BST, but I have no idea how to store the entries in the tree:
struct node
{
char word[50];
char meaning[256];
struct node *left, *right;
};
I started like that, but I don't know which words to put on the left and which on the right...

Instead of a binary tree, you should use something like a trie (prefix tree). BSTs are really meant for "greater/less-than" relationships, which are less natural to map onto words. In a trie, the nodes correspond to characters, and branches eventually lead to nodes that mark actual words.

Which words go left and which go right still follows the basic rule of a BST: all nodes in the left subtree of a given node are less than that node's value, and all nodes in its right subtree are greater than or equal to that node's value.
Apply the same principle to your dictionary. I don't know whether you're using C or C++, but if you're using C++, I would recommend making a "Word" struct and overloading its comparison operators (at least < and ==). Then your "node" struct just needs a Word plus pointers to the left and right nodes.
A BST is not the best choice of data structure for a dictionary, though. I would look into the different types of maps and into hashing.

Typically, the words that are lexicographically smaller than the word in the current node go left, and the rest go right. Use < to do the comparison on C++'s std::string, and strcmp for C-style strings (NUL-terminated char arrays).

Related

I want help writing C++ code to swap the data in a node of a Binary Search Tree

It takes two keys, key1 and key2, as input, finds the node with key1, and then changes it
to key2. This may break the BST property (smaller keys to the left, bigger ones
to the right), so the function should then adjust the tree so that it becomes a valid BST
again.
I tried linking the new data to the links of the old node being replaced, but I can't get my tree organized correctly.
The easiest way is to find the LCA (lowest common ancestor) of key1 and key2, then delete key1 and key2 one after the other, and finally add their values back to the subtree rooted at the LCA.
If the LCA is either key1 or key2, use the root of the whole tree as the LCA.
If key1 or key2 is the LCA and also the root of the whole tree, just return the original tree before the swap.
This will give you a possibly different tree. Since you didn't specify any criteria for how the tree should be organized after the swap, this is one possibility. Total complexity should be O(n).
I partly agree with @Daniel, except that key2 is not in your tree.
In your question, replace the word "swap" with "replace". I think "swap" is what creates the impression that you want to exchange two nodes.
For this problem, simply remove the node with key1 in it and insert a new node with key2. Here I am assuming you already have the code in place to do these things.

alphabetic binary search tree BST algorithm

I want to declare a class for an alphabetic BST that stores nodes by name (strings or char arrays). What is the best insertion algorithm to get the best search time and an ideal-case BST?
Note also that the names are not all the same length, may start with the same words, and will not be sorted before they are inserted into the BST.
Search and insertion are both fast (O(log n)) in balanced binary search trees, so implement either a red-black tree or an AVL tree. You could also use a B-tree if you wish.
Next, you need to decide what to store in a node of the BST. In your case, store the string or char array. To compare two keys, you already have functions for both string and char array, namely std::string::compare and strcmp, respectively.
These two things, a balanced BST and a comparable data type for the nodes, are all you need to do what you asked.

Remove an element from unbalanced binary search tree

I have been wanting to write a remove() method for my binary search tree (which happens to use an array representation). Before writing it, I must consider all cases. Leaving aside the easy ones, consider the case where the node has two children: most of the explanations I have read so far remove an element from an already balanced binary search tree. In the few cases where I have seen an element removed from an unbalanced binary search tree, they balance it first through zigs and zags and then remove the element.
Is there a way that I can possibly remove an element from an unbalanced binary search tree without having to balance it beforehand?
If not, would it be easier to write an AVL tree (in array representation)?
You don't need to balance it, but you do need to recursively go down the tree performing some swaps here and there so you actually end up with a valid BST.
Deletion of a node with 2 children in an (unbalanced) BST: (from Wikipedia)
Call the node to be deleted N. Do not delete N. Instead, choose either its in-order successor node or its in-order predecessor node, R. Copy the value of R to N, then recursively call delete on R until reaching one of the first two cases.
Deleting a node with two children from a binary search tree. First the rightmost node in the left subtree, the inorder predecessor 6, is identified. Its value is copied into the node being deleted. The inorder predecessor can then be easily deleted because it has at most one child. The same method works symmetrically using the inorder successor labelled 9.
Although, why do you want an unbalanced tree? All operations on it take longer (or at least as long), and the additional overhead of balancing doesn't change the asymptotic complexity of any operation. And if you're using the array representation where the node at index i has children at indices 2i and 2i+1, the array may end up fairly sparse, i.e. there will be quite a bit of wasted memory.

Is a Trie a K-ary tree?

If you look at the node definitions for a simple Trie and a simple K-ary tree, they look the same.
(using C++ notation)
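#include <cstddef>

template <std::size_t K>
struct trieNode
{
    trieNode* children[K];  // one child pointer per possible key element
};

template <std::size_t K>
struct KaryNode
{
    KaryNode* children[K];  // at most K children
};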
At its simplest a K-ary tree has multiple children per node (2 for a binary tree)
And a Trie has "multiple children per node"
It seems that a K-ary tree makes its choice of child based on comparison (< or >) of keys,
while a Trie makes its choice of child based on (unary) equality of sub-spans of the key.
Since neither data structure has made it into any standard, what would be the best definition of each, and how would they be differentiated?
From the point of view of the shape of the data structure, a trie is clearly an N-ary tree, in the same way that a balanced binary search tree is a binary tree, the difference being in how the data structure manages the data.
A binary search tree is a binary tree with the additional constraint that the keys in the nodes are ordered; a balanced binary search tree adds on top of that a constraint on the difference between the lengths of different branches.
Similarly, a trie is an N-ary tree with additional constraints that determine how the keys are managed.
Let's try a definition of what a trie is:
A trie is an efficient data structure used to implement a dictionary in which the keys are sequences, ordered lexicographically. The implementation uses an N-ary tree where the branching factor is the range of valid values for each element in the key sequence [1], and each node may or may not hold a value but always holds a subsequence of the key being stored [2]. For each node in the tree, the concatenation of the key subsequences stored in the nodes from the root to that node represents the key for the stored value (if the node holds a value) and/or a common prefix of all keys in that node's subtree.
This layout allows lookups in time linear in the length of the key, and sharing prefixes allows compact representations for many natural languages (like Spanish, where different forms of each verb differ only in the last few characters).
1: That keys are sequences is an important premise, as the main advantage of tries is that they split each key across different nodes along the path.
2: Depending on the implementation each node might maintain a single element (character) from the sequence or a combination.
A binary tree refers to the shape of the tree without saying anything about how the tree will be used. A binary search tree is a binary tree that is being used in a particular way.
Similarly, a k-ary tree = n-ary tree = multi-way tree refers to the shape of the tree. A trie is a multiway tree that is being used in a particular way.
(But, be careful, just like there are many variations on binary search trees, there are many different variations on tries.)
So, what makes a trie a trie?
A trie is usually used to represent a collection of sequences, such as strings. A particular key is stored, not in a single node like in a binary search tree, but rather split up across many levels of the tree. Here's a picture of a trie containing the strings "can", "car", "cat", and "do".
        .
       / \
     c/   \d
     /     \
    .       .
    |       |
   a|       |o
    |       |
    .       .
   /|\
 n/ | \t
 /  |r \
.   .   .
As you can see, it may be easier to think of the characters as being associated with the edges instead of the nodes, but any particular implementation might represent them either way.
The many varieties of tries differ in things like how they handle cases where one key is a prefix of another (e.g., "cat" and "catastrophe") and how, or whether, they compress long common substrings.
K-ary tree: each node has at most K children.
Trie: the number of children of each node is not limited (in theory). In practice, of course, there is always a limit. For example, in a trie of words in an Asian language, the number of children per node is bounded by the number of characters in that script, probably something like 5,000 or 10,000.
Thanks to user534498's comment pointing to Knuth's TAOCP, Volume 3, Sections 6.2 and 6.3.
Knuth writes (Section 6.3):
A trie is essentially an M-ary tree, whose nodes are M-place vectors
with components corresponding to digits or characters. Each node on
level l represents the set of all keys that begin with a certain
sequence of l characters; the node specifies an M-way branch,
depending on the (l + 1)st character.
K-ary, M-ary and N-ary being synonyms, it seems the answer is yes.

Binary Tree Search & Tracking

I have a binary tree whose nodes contain an integer and a char. I'm working on Huffman coding and I want to get the binary representation of the nodes. A '0' is appended to the string for every left branch and a '1' for every right branch.
I'm thinking of searching for a char while keeping track of the branches taken: if it's not in the left subtree, remove the last '0' appended to the string, go back up, and check the right.
This seems like a lot of work. Is there another way for me to keep track of the node?
EDIT:
I have to use a Binary Tree.
Sounds like a stack data structure (see Wikipedia and the C++ reference):
You keep track of where you are in the tree by using the stack in this way:
std::stack<int> path;
move up to parent == path.pop()
move to left child == path.push(0)
move to right child == path.push(1)
Edit:
You may want to actually use a std::vector<int> with push_back and pop_back instead. It still behaves like a stack, but you can get the entire list of 0's and 1's at the end if you use a vector.
Are you talking about encoding the Huffman output?
You will want to build a table of output codes and lengths for each possible input character - don't traverse the tree on each input character.