Binary Tree Search & Tracking - c++

I have a binary tree of nodes, each containing an integer and a char. I'm working on Huffman coding and I want to get the binary representation of the nodes. A '0' is appended to the string for every left branch and a '1' is appended for every right branch.
I'm thinking of searching for a char but keeping track of its branches, if it's not in the left node, remove the last '0' appended to the string and go back up and check the right.
This looks very tasking. Is there another way for me to keep track of the node?
EDIT:
I have to use a Binary Tree.

Sounds like a stack data structure:
Wikipedia
C++ reference
You keep track of where you are in the tree by using the stack in this way:
path = std::stack<int>
move up to parent == pop()
move to left child == push(0)
move to right child == push(1)
Edit:
You may want to actually use a std::vector<int> with push_back and pop_back instead. It still behaves like a stack, but you can get the entire list of 0's and 1's at the end if you use a vector.
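A minimal sketch of this idea, assuming a hypothetical `Node` layout (the question only says each node holds an int and a char) and assuming symbols live in the leaves, as in a Huffman tree. The names `findPath` and `toBitString` are illustrative, not from the original post:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node layout for illustration.
struct Node {
    int weight;
    char symbol;
    Node* left;
    Node* right;
};

// Depth-first search for `target`. `path` is used like a stack: a bit is
// pushed before descending and popped again if neither subtree matched.
bool findPath(const Node* node, char target, std::vector<int>& path) {
    if (node == nullptr) return false;
    const bool isLeaf = node->left == nullptr && node->right == nullptr;
    if (isLeaf) return node->symbol == target;

    path.push_back(0);                                   // move to left child
    if (findPath(node->left, target, path)) return true;
    path.back() = 1;                                     // left failed, try right child
    if (findPath(node->right, target, path)) return true;
    path.pop_back();                                     // move back up to the parent
    return false;
}

// Turn the recorded 0/1 steps into a string such as "10".
std::string toBitString(const std::vector<int>& path) {
    std::string bits;
    for (int bit : path) {
        assert(bit == 0 || bit == 1);
        bits += (bit == 0 ? '0' : '1');
    }
    return bits;
}
```

When `findPath` returns true, `path` holds the branches from the root down to the leaf in order, so no reversal is needed.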

Are you talking about encoding the Huffman output?
You will want to build a table of output codes and lengths for each possible input character - don't traverse the tree on each input character.
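As a rough sketch of the table approach: walk the tree once, record the code of every leaf, and then encode by lookup. `HuffNode`, `CodeTable`, `buildTable` and `encode` are hypothetical names, and codes are kept as strings of '0'/'1' for readability (the length of each code is simply the string's size); a real encoder would pack bits.

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical node type; only leaves carry a meaningful symbol.
struct HuffNode {
    char symbol;
    HuffNode* left;
    HuffNode* right;
};

// One code string per possible byte value; unused symbols stay empty.
using CodeTable = std::array<std::string, 256>;

// Single traversal: `prefix` holds the branches taken so far.
void fillTable(const HuffNode* node, std::string& prefix, CodeTable& table) {
    assert(node != nullptr);
    if (node->left == nullptr && node->right == nullptr) {
        table[static_cast<unsigned char>(node->symbol)] = prefix;
        return;
    }
    if (node->left != nullptr) {
        prefix.push_back('0');
        fillTable(node->left, prefix, table);
        prefix.pop_back();
    }
    if (node->right != nullptr) {
        prefix.push_back('1');
        fillTable(node->right, prefix, table);
        prefix.pop_back();
    }
}

CodeTable buildTable(const HuffNode* root) {
    CodeTable table;
    std::string prefix;
    if (root != nullptr) fillTable(root, prefix, table);
    return table;
}

// Encoding is now one table lookup per input character.
std::string encode(const std::string& text, const CodeTable& table) {
    std::string out;
    for (unsigned char ch : text) out += table[ch];
    return out;
}
```

Note that a tree consisting of a single leaf would get an empty code in this sketch; that case needs separate handling.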

Related

"Guessing" what side of a doubly linked list to start on

A bit hard to explain what I'm planning, but here it goes. I have a doubly linked list of objects which are ordered alphabetically by a member attribute called name. I wish to remove a Node with a specific name, but I would like to remove it in such a way that it is more likely to start looking for it on the side of the list closer to it.
So I was thinking that I would have to find the 'midpoint' between the first Node's name and the last Node's name. Then I will check to see if that midpoint is less than the name of the Node. If it is less, I will start from the tail, otherwise I will start from the head.
The problem I am having is that I am unable to convert a string directly into an int. My potential solution is this:
Convert each individual character in the head and tail to an int
Put each conversion into an int array, one array for the head, one for the tail
Convert each int into a string again and put them into a new array
Make each converted string have a length of 3 by inserting 0s into them if they have less than a length of 3
Add the strings in each array together
Convert the strings to int again and find the difference between the two ints and divide that by 2
Add the new value to the first Node's converted name
Find if this 'midpoint' is less than the name of the Node I want to remove
If it is, start searching from the tail
Else, search from the head
Is there any easier way to go about doing this?
Alf's comment is realistically what you want. In order to decide which end to be on, you are getting maximum resolution by simply finding the first different character and then picking based on the midpoint.
Algorithm idea
list = ["apple", "banana", "orange"]
word_to_search_for = "banana"
index = 0
while list[0][index] == list[last][index]:
    if word_to_search_for[index] != list[0][index]:
        return "word not in list"
    ++index
spread = list[last][index] - list[0][index]
if (word_to_search_for[index] - list[0][index]) > spread/2:
    start at last
else:
    start at 0
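The algorithm idea above might look like this in C++. This is only a sketch: `chooseEnd`, `Start` and `charAt` are made-up names, the names are assumed to be plain `std::string` values, and a string that has already ended is treated as if it continued with character 0.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class Start { Head, Tail, NotInList };

// Character at position i, or 0 once the string has ended.
inline int charAt(const std::string& s, std::size_t i) {
    return i < s.size() ? static_cast<unsigned char>(s[i]) : 0;
}

// `first` and `last` are the names stored in the head and tail nodes.
Start chooseEnd(const std::string& first, const std::string& last,
                const std::string& target) {
    assert(first <= last);  // the list is sorted by name
    if (target < first || target > last) return Start::NotInList;

    // Skip the prefix shared by head and tail. Because
    // first <= target <= last, the target shares that prefix too.
    std::size_t i = 0;
    while (i < first.size() && i < last.size() && first[i] == last[i]) ++i;

    const int lo = charAt(first, i);
    const int hi = charAt(last, i);
    const int mid = charAt(target, i);
    // Past the midpoint of the first differing character: start at the tail.
    return (mid - lo) * 2 > (hi - lo) ? Start::Tail : Start::Head;
}
```

This is only a heuristic: it says nothing about how many nodes actually lie on each side, so it helps most when names are spread fairly evenly over the alphabet.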
As others have already alluded, your main problem is that you're using the wrong data structure. Your question shouldn't be "How do I make a double linked list operate in a manner that is distinctly unlike a double linked list?", it should be "What is the best data structure for {insert your specific use case}?".
Reading between the lines, it appears that you're after something that allows for insertions, removals and relatively high speed scans. This leads me to suggest a Left Leaning Red Black Tree: see https://en.wikipedia.org/wiki/Left-leaning_red%E2%80%93black_tree
You could create an array of pointers to some sub-set of nodes in the list, like pointers to the first, middle, and last node of a list. You could use more pointers to reduce the search time, perhaps 4 to 16 pointers. Sort of a hierarchical overall structure. The array would need to be updated as nodes are deleted (at least the pointers to deleted nodes, pick the node before or after if this happens, or shrink the array). At some point, a tree like structure would be better.

Given a binary search tree and a number, find a path whose nodes' data add up to the given number.

Given a binary search tree and a number, find whether there is a path from the root to a leaf such that all the numbers on the path add up to the given number.
I know how to do it recursively, but I would prefer an iterative solution.
If we walk from the root to a leaf separately for each path, work is repeated, because many paths share the same nodes near the root.
What if the tree is not a binary search tree?
Thanks
Basically this problem can be solved using Dynamic Programming on tree to avoid those overlapping paths.
The basic idea is to keep track of the possible lengths from each leaf to a given node in a table f[node]. If we implement it as a 2-dimensional boolean array, it is something like f[node][len], which indicates whether there is a path from a leaf to node with length equal to len. We can also use a vector<int> to store the values in each f[node] instead of using a boolean array. Whichever representation you use, the relation between the different f values is straightforward:
f[node] is the union of f[node->left] + len_left[node] and f[node->right] + len_right[node].
This is the case of binary tree, but it is really easy to extend it to non-binary-tree cases.
If there is anything unclear, please feel free to comment.
Anything you can do recursively, you can also do iteratively. However, if you are not having performance issues with the recursive solution, I would leave it as is. The iterative version will more likely than not be harder to write and to read.
However if you insist, you can transform your recursive solution into an iterative one by using a stack. Every time you make a recursive call, push the state variables in your current function call onto the stack. When you are done with a call, pop off the variables.
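A sketch of that transformation for the root-to-leaf sum check, assuming a hypothetical `TreeNode` type. Each stack entry stores a node together with the sum of the values above it, which is the state a recursive call would have carried as a parameter:

```cpp
#include <cassert>
#include <stack>
#include <utility>

// Hypothetical node type for illustration.
struct TreeNode {
    int value;
    TreeNode* left;
    TreeNode* right;
};

// Returns true if some root-to-leaf path sums to `target`.
bool hasPathSum(const TreeNode* root, int target) {
    if (root == nullptr) return false;

    // Explicit stack replacing the call stack: (node, sum of its ancestors).
    std::stack<std::pair<const TreeNode*, int>> pending;
    pending.emplace(root, 0);

    while (!pending.empty()) {
        const TreeNode* node = pending.top().first;
        assert(node != nullptr);
        const int sum = pending.top().second + node->value;
        pending.pop();

        if (node->left == nullptr && node->right == nullptr) {
            if (sum == target) return true;   // reached a leaf with the right total
            continue;
        }
        if (node->left != nullptr) pending.emplace(node->left, sum);
        if (node->right != nullptr) pending.emplace(node->right, sum);
    }
    return false;
}
```

Every node is pushed and popped once, so shared prefixes of different paths are not recomputed. Nothing here relies on the ordering property of a binary search tree, so the same loop works for an arbitrary binary tree.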
For BST:
Node current, final = (initialize)
List nodesInPath;
nodesInPath.add(current);
while (current != final) {
    List childrenNodes = current.children;
    if (noChildren) noSolution;
    if (current < final) {
        // choose right child if there is one, otherwise no solution
        current = children[right];
    } else if (current > final) {
        // choose left child if there is one, otherwise no solution
        current = children[left];
    }
    nodesInPath.add(current);
}
check sum in the nodesInPath
However, for a non-BST you should apply a solution using dynamic programming, as derekhh suggests, if you don't want to calculate the same paths over and over again. I think you can store the total length between each processed node and the root node, calculating these distances as you expand the nodes. Then you would apply breadth-first search so that you do not traverse the same paths again and can reuse the previously computed totals. The algorithm that comes to mind is a little complex; sorry, but I do not have enough time to write it out.

Binary Search Tree - to Make a dictionary

I wanted to make a dictionary using a BST, but I did not have any idea how to store the words in the tree.
struct node
{
    char word[50];
    char meaning[256];
    struct node *left, *right;
};
I started like that, but I don't know which words to put on the left and which on the right...
Instead of a binary tree, you should use something like a trie (prefix tree). BSTs are really more for "greater/less-than" relationships, which would be hard to map with words. With a trie, your nodes are characters and branches eventually lead to leaves representing an actual word.
Which words to put left and which to put right would still follow the basic rules of a BST: All nodes to the left of a given root are guaranteed to be less than that root's value, and all nodes to the right of a given root are guaranteed to be greater than or equal to that root's value.
Apply that same principle to your dictionary. I don't know if you're using C or C++, but if you're using C++, I would recommend making a "Word" struct and overloading its comparison operators. Then in your "node" struct, just have a Word, a left Node, and a right Node.
A BST is not the best choice of a data structure for a dictionary though. I would look into different types of maps and hashing.
Typically, the words that are lexicographically smaller than the word in the current node go left, and the rest go right. Use < to do the comparison on C++'s std::string, or strcmp for C-style strings (NUL-terminated char arrays).
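As an illustration of that rule, here is a small sketch built on the struct from the question and `strcmp`. The functions `insert` and `lookup` are hypothetical names; the sketch overwrites the meaning when a word is inserted twice, silently truncates overlong strings, and never frees nodes, all of which a real program would need to handle.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

struct node {
    char word[50];
    char meaning[256];
    node *left, *right;
};

// Insert or update a word; returns the (possibly new) subtree root.
node* insert(node* root, const char* word, const char* meaning) {
    assert(word != nullptr && meaning != nullptr);
    if (root == nullptr) {
        node* fresh = new node{};  // zero-initialised, so left/right are null
        std::snprintf(fresh->word, sizeof fresh->word, "%s", word);
        std::snprintf(fresh->meaning, sizeof fresh->meaning, "%s", meaning);
        return fresh;
    }
    const int cmp = std::strcmp(word, root->word);
    if (cmp < 0) {
        root->left = insert(root->left, word, meaning);    // smaller words go left
    } else if (cmp > 0) {
        root->right = insert(root->right, word, meaning);  // larger words go right
    } else {
        std::snprintf(root->meaning, sizeof root->meaning, "%s", meaning);
    }
    return root;
}

// Returns the stored meaning, or nullptr if the word is absent.
const char* lookup(const node* root, const char* word) {
    while (root != nullptr) {
        const int cmp = std::strcmp(word, root->word);
        if (cmp == 0) return root->meaning;
        root = cmp < 0 ? root->left : root->right;
    }
    return nullptr;
}
```

Inserting words in already sorted order degenerates this tree into a linked list, which is one reason the other answers point towards balanced trees or hash maps.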

MinMax Heap implementation without an array

I found lots of MinMax Heap implementations that store their data in an array. That is really easy to implement, which is why I am looking for something different. I want to create a MinMax Heap using only heap elements with pointers to a left child and a right child (and of course a key to compare). So the heap has only a pointer to the root object (min level), the root object has pointers to its children (max level), and so on. I know how to insert a new object (finding the proper path by using the binary representation of an int that depends on the heap size), but I don't know how to implement the rest (pushing an element up or down, finding a parent or grandparent).
Thx for help
A priority queue using a heap-ordered binary tree can be implemented using a triply linked structure instead of an array. You will need three links per node: two to traverse down and one to traverse up.
The source code of Python's heapq module shows how to implement the steps for pushing up and down. To switch from an array implementation to a pointer implementation, replace the arr[2*n+1] computation with node.left and arr[2*n+2] with node.right. For parent references such as arr[(n-1)>>1], every node will need a pointer to its parent, node.parent.
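A sketch of that substitution for a plain min-heap (not yet the min-max variant, which additionally compares against grandparents). `HeapNode`, `siftUp` and `siftDown` are hypothetical names, and the sketch swaps keys rather than relinking nodes, which is the simplest choice when keys are cheap to copy:

```cpp
#include <cassert>
#include <utility>

// Triply linked node: two links down, one link up.
struct HeapNode {
    int key;
    HeapNode* left;
    HeapNode* right;
    HeapNode* parent;
};

// Array version: while n > 0 and arr[n] < arr[(n-1)>>1], swap and move up.
void siftUp(HeapNode* node) {
    assert(node != nullptr);
    while (node->parent != nullptr && node->key < node->parent->key) {
        std::swap(node->key, node->parent->key);
        node = node->parent;             // replaces n = (n-1)>>1
    }
}

// Array version: compare arr[n] with arr[2*n+1] and arr[2*n+2].
void siftDown(HeapNode* node) {
    assert(node != nullptr);
    for (;;) {
        HeapNode* smallest = node;
        if (node->left != nullptr && node->left->key < smallest->key)
            smallest = node->left;       // replaces arr[2*n+1]
        if (node->right != nullptr && node->right->key < smallest->key)
            smallest = node->right;      // replaces arr[2*n+2]
        if (smallest == node) return;    // heap order restored
        std::swap(node->key, smallest->key);
        node = smallest;
    }
}
```

For a min-max heap the same pointer substitutions apply; a grandparent is simply `node->parent->parent`, checked for null at each step.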
Alternatively, you can adopt a functional style which makes this all very easy to implement. I found the code for treaps implemented in Lisp to be an inspiration for how to do this.
I have solved this problem as part of an assignment long back. You can find it here
I have multiple implementations in Java and C++ implementing MinHeap with and without arrays. See my Java implementations for the solution. And yes it is very much possible to implement Heap without arrays. You just have to remember where to insert the next node and how to heapify and reverse heapify.
Edit1: I also tried to look up any existing solutions for min heap without arrays but couldn't find any. So, I am posting it here so it could be helpful for anyone who wishes to know the approach.
Yes, you can implement it without relying on an array.
I personally relied on a binary counter...
Here is my implementation (https://github.com/mohamedadnane8/HeapsUsingPointers) in C.
Note that this is still a very fast implementation, with O(log n) work per insertion or deletion.
1 => "1"
2 => "10", 3 => "11"
4 => "100", 5 => "101", 6 => "110", 7 => "111"
In this program I use the sequence of node numbers to insert and delete. As you can see above, positions in the tree can easily be represented as the binary strings of those numbers.
The first '1' in the binary string marks the start (the root).
After that, the sequence of 0s and 1s determines where to go: '1' means go to the left and '0' means go to the right.
Also, note that this implementation relies on a very small array of characters or integers to make the calculation of the binary numbers faster, but you can instead rely on a bin()-style function to convert your counter to a binary string (I implemented the array just to practice my problem-solving skills a bit).
Sorry if I couldn't explain it very well, I lack a bit in my communication skills.
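The bit-walk described in this answer could be sketched as follows. This is not taken from the linked repository: `PNode`, `pathBits` and `attachAt` are made-up names, and the code simply follows the convention stated above ('1' goes left, '0' goes right). Swapping the two branches gives the mirrored layout that matches the usual array numbering.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct PNode {
    int key;
    PNode* left;
    PNode* right;
    PNode* parent;
};

// Bits of `position` after the leading 1, most significant first.
// Example: 6 is "110", so the path is {1, 0}.
std::vector<bool> pathBits(unsigned position) {
    assert(position >= 1);
    int top = 0;                                   // index of the leading 1 bit
    for (unsigned p = position; p > 1; p >>= 1) ++top;
    std::vector<bool> bits;
    for (int i = top - 1; i >= 0; --i)
        bits.push_back(((position >> i) & 1u) != 0);
    return bits;
}

// Attach `fresh` as node number `position` (1-based, counting level by level).
// Assumes nodes 1 .. position-1 are already in the tree.
void attachAt(PNode*& root, PNode* fresh, unsigned position) {
    const std::vector<bool> bits = pathBits(position);
    if (bits.empty()) {                            // position 1 is the root itself
        root = fresh;
        fresh->parent = nullptr;
        return;
    }
    PNode* cur = root;
    for (std::size_t i = 0; i + 1 < bits.size(); ++i)
        cur = bits[i] ? cur->left : cur->right;    // '1' = left, '0' = right
    (bits.back() ? cur->left : cur->right) = fresh;
    fresh->parent = cur;
}
```

To insert into a heap holding n elements, attach the new node at position n + 1 and then push it up through its parent pointers. The walk visits one node per bit, so it costs O(log n).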
It is hard to implement a binary heap without an array, because while inserting you have to keep track of every parent you pass on the way down so that you can push up or down afterwards. For example, keep a list like [parent_1, parent_2, ..., parent_k]; then, if parent_(k+1) < parent_k, push up and rearrange their left and right children.

Efficient Huffman tree search while remembering path taken

As a follow-up to my question about an efficient way of storing Huffman trees, I was wondering what the fastest and most efficient way would be to search a binary tree (based on the Huffman coding output) while storing the path taken to a particular node.
This is what I currently have:
Add root node to queue
while queue is not empty, pop item off queue
    check if it is what we are looking for
        yes:
            Follow a head pointer back to the root node, noting at each node we visit whether it is a left or a right child.
            break out of the search
    enqueue left and right node
Since this is a Huffman tree, all of the entries that I am looking for will exist. The above is a breadth first search, which is considered the best for Huffman trees since items that are in the source more often are higher up in the tree to get better compression, however I can't figure out a good way to keep track of how we got to a particular node without backtracking using the head pointer I put in the node.
In this case, I am also getting all of the right/left paths in reverse order, for example, if we follow the head to the root, and we find out that from the root it is right, left, left, we get left, left, right. or 001 in binary, when what I am looking for is to get 100 in an efficient way.
Storing the path from root to the node as a separate value inside the node was also suggested, however this would break down if we ever had a tree that was larger than however many bits the variable we created for that purpose could hold, and at that point storing the data would also take up huge amounts of memory.
Create a dictionary of value -> bit-string, that would give you the fastest lookup.
If the values are a known size, you can probably get by with just an array of bit-strings and look up the values by their index.
If you're decoding Huffman-encoded data one bit at a time, your performance will be poor. As much as you'd like to avoid using lookup tables, that's the only way to go if you care about performance. The way Huffman codes are created, they are left-to-right unique and lend themselves perfectly to a fast table lookup.
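A rough sketch of such a lookup table, under several assumptions: the codes are given as strings of '0' and '1', `maxLen` is the length of the longest code, and the input is also a '0'/'1' string (a real decoder would read packed bits, which is where the speed comes from). `Entry`, `buildDecodeTable` and `decode` are hypothetical names. Because the codes are prefix-free, every possible window of `maxLen` bits begins with at most one code, so each code can claim all table slots that start with it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Entry {
    char symbol;
    int length;   // number of bits the code really uses; 0 marks an unused slot
};

// `codes` holds (symbol, bit-string) pairs forming a prefix-free code.
std::vector<Entry> buildDecodeTable(
        const std::vector<std::pair<char, std::string>>& codes, int maxLen) {
    std::vector<Entry> table(std::size_t{1} << maxLen, Entry{'\0', 0});
    for (const auto& code : codes) {
        const int len = static_cast<int>(code.second.size());
        assert(len >= 1 && len <= maxLen);
        std::size_t prefix = 0;
        for (char bit : code.second) prefix = (prefix << 1) | (bit == '1' ? 1u : 0u);
        // Fill every slot whose top `len` bits equal this code.
        const std::size_t first = prefix << (maxLen - len);
        const std::size_t count = std::size_t{1} << (maxLen - len);
        for (std::size_t i = 0; i < count; ++i) table[first + i] = Entry{code.first, len};
    }
    return table;
}

std::string decode(const std::string& bits, const std::vector<Entry>& table, int maxLen) {
    std::string out;
    std::size_t pos = 0;
    while (pos < bits.size()) {
        // Peek at the next maxLen bits, padding with zeros past the end.
        std::size_t window = 0;
        for (int i = 0; i < maxLen; ++i) {
            const std::size_t at = pos + static_cast<std::size_t>(i);
            window = (window << 1) | (at < bits.size() && bits[at] == '1' ? 1u : 0u);
        }
        const Entry& e = table[window];
        const std::size_t used = static_cast<std::size_t>(e.length);
        if (used == 0 || pos + used > bits.size()) break;  // invalid or truncated input
        out += e.symbol;
        pos += used;                                        // consume only the real code
    }
    return out;
}
```

The table has 2^maxLen entries, so for long codes practical decoders usually cap the table width and fall back to a second-level table or a tree walk for the rare long codes.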