Stuck on an Iterator Implementation of a Trie - C++

I have to implement a homemade Trie and I'm stuck on the Iterator part. I can't seem to figure out the increment method for the trie.
I hope someone can help me clear things up.
Here's the code for the Iterator:
template <typename T> class Trie<T>::IteratorPrefixe{
friend class Trie<T>;
public:
IteratorPrefixe() : tree(NULL), currentNode(NULL), currentKey("") {};
pair<string, T*> operator*() {return make_pair(currentKey, currentNode -> element);} ;
IteratorPrefixe operator++()throw(runtime_error);
void operator=(IteratorPrefixe iter) {tree = iter.tree; currentNode = iter.currentNode; currentKey = iter.currentKey;};
bool operator==(IteratorPrefixe iter) {return tree == iter.tree && currentNode == iter.currentNode;};
bool operator!=(IteratorPrefixe iter) {return tree != iter.tree || currentNode != iter.currentNode;};
private:
Trie<T> * tree;
Trie<T> * currentNode;
string currentKey;
};
And here's my Trie:
template <typename T> class Trie {
friend class IteratorPrefixe;
public:
// Create a Trie<T> from the alphabet of nbletters, where nbletters must be
// between 1 and NBLETTERSMAX inclusively
Trie(unsigned nbletters) throw(runtime_error);
// Add a key element of which is given in the first argument and content second argument
// The content must be defined (different from NULL pointer)
// The key is to be composed of valid letters (the letters between A + inclusive and exclusive nbletters
// Eg if nblettres is 3, a, b and c are the only characters permitted;
// If nblettres is 15, only the letters between a and o inclusive are allowed.
// Returns true if the insertion was achieved, returns false otherwise.
bool addElement(string, T*) throw(runtime_error);
// Deletes a key element of which is given as an argument and returns the contents of the node removed
// The key is to be composed of letters valid (see above)
// Can also delete at the same time the reference of the ancestors, if these ancestors are no longer used.
// Returns NULL if the item has no delete
T* removeElement(string cle) throw(runtime_error);
// Find a key element of which is given as an argument and returns the associated content
// The key is to be composed of letters valid (see above)
// Returns NULL if the key does not exist
T* searchElement(string cle) throw();
// Iterator class to browse the Trie <T> in preorder mode
class IteratorPrefixe;
// Returns an iterator pointing to the first element
IteratorPrefixe pbegin() throw(runtime_error);
// Returns an iterator pointing beyond the last item
IteratorPrefixe pend() throw();
private:
unsigned nbLetters;
T* element;
vector<Trie<T> *> childs;
Trie<T> * parent;
// This function removes a node and its ancestors if became unnecessary. It is essentially the same work
// as deleteElement that is how to designate remove a node that is changing. Moreover, unlike
// deleteElement, it does not return any information on the node removed.
void remove(Trie<T> * node) throw();
// This function is seeking a node based on a given key. It is essentially the same work
// searchElement but that returns a reference to the node found (or null if the node does not exist)
// The key is to be composed of letters valid (see above)
Trie<T>* search(string key) throw(runtime_error);
};

I'm glad to see Tries are still taught, they're an important data structure that is often neglected.
There may be a design problem in your code, since you should probably have a Trie class and a Node class. The way you wrote it, it looks like each node in your Trie is its own trie, which can work, but will make some things complicated.
It's not really clear from your question what you are having the problem with: figuring out the order, or figuring out the actual code?
From the name of the iterator, it sounds like it would have to work in prefix order. Since your trie stores words and its child nodes are organized by letters, you are essentially expected to go over all the words in alphabetical order. Each increment brings you to the next word.
The invariant of your iterator is that at any point (as long as it is valid), it should be pointing at a node with a "terminator character" for a valid word. Figuring out that word merely involves scanning upwards through the parent chain until you have collected the entire string. Moving to the next word means continuing a depth-first search: go up once, scan for links in later "brothers", see if you find a word, and if not, recursively go up, etc. (If, as in your class, a word can also end at an inner node, try that node's children first before going up.)
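The traversal described above can be sketched roughly as follows. This is only a minimal sketch, not the asker's class: TrieNode, slotOf, preorderStep, nextWord and keyOf are hypothetical names, the alphabet is assumed to start at 'a', and a node is assumed to end a word exactly when its element pointer is non-null.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node type: one child slot per letter, null when unused.
struct TrieNode {
    int* element = nullptr;            // non-null marks the end of a stored word
    TrieNode* parent = nullptr;        // null only for the root
    std::vector<TrieNode*> children;
    explicit TrieNode(std::size_t nbLetters) : children(nbLetters, nullptr) {}
};

// Index of `child` inside its parent's child vector (child must not be the root).
inline std::size_t slotOf(const TrieNode* child) {
    const std::vector<TrieNode*>& s = child->parent->children;
    return static_cast<std::size_t>(std::find(s.begin(), s.end(), child) - s.begin());
}

// One preorder step: the first child if there is one, otherwise the next
// sibling of the closest ancestor that has one. Returns nullptr at the end.
inline TrieNode* preorderStep(TrieNode* n) {
    for (TrieNode* c : n->children)
        if (c) return c;
    while (n->parent) {
        std::vector<TrieNode*>& s = n->parent->children;
        for (std::size_t i = slotOf(n) + 1; i < s.size(); ++i)
            if (s[i]) return s[i];
        n = n->parent;                 // no later sibling here, go up again
    }
    return nullptr;
}

// What operator++ would do: step until a node that ends a word is reached.
inline TrieNode* nextWord(TrieNode* n) {
    do { n = preorderStep(n); } while (n && !n->element);
    return n;
}

// Rebuild the key by walking the parent chain up to the root.
inline std::string keyOf(const TrieNode* n) {
    std::string key;
    for (; n->parent; n = n->parent)
        key.insert(key.begin(), static_cast<char>('a' + slotOf(n)));
    return key;
}
```

Calling nextWord on the root gives the first word, and a nullptr result plays the role of the end iterator. Looking up the slot with std::find is linear in the alphabet size; storing the slot index in each node would avoid that.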

You may want to see my modified trie implementations at:
jdkoftinoff's trie
Specifically, you may find useful the discussion I had on comp.lang.c++.moderated about implementing iterators for tries in an STL-compliant way. This is a problem because the STL's map-like containers expose their elements as std::pair<>, so the iterator must therefore contain the value instead of just a reference to the single node in the trie.

For one thing, the code shown does not actually describe a trie. Rather, it appears to be a tree containing a pair of elements in each node (T* and unsigned). You can by discipline use a tree of tuples as a trie, but it's only by convention, not enforcement. This is part of why you're having such a hard time implementing operator++.
What you need to do is have each Trie contain a left-right disjoint ADT, rather than just the raw elements. It's a layer of abstraction which is more commonly found in functional languages (e.g. Scala's Either). Unfortunately, C++'s type system isn't quite powerful enough to do something that elegant. However, there's nothing preventing you from doing this:
template <class L, class R>
class Either
{
public:
Either(L *l) : left(l), right(0)
{}
Either(R *r) : left(0), right(r)
{}
L *get_left() const
{
return left;
}
R *get_right() const
{
return right;
}
bool is_left() const
{
return left != 0;
}
bool is_right() const
{
return right != 0;
}
private:
L *left;
R *right;
};
Then your Trie's data members would be defined as follows:
private:
Either<unsigned, T*> disjoint;
vector<Trie<T> *> children; // english pluralization
Trie<T> * parent;
I'm playing fast and loose with your pointers, but you get the gist of what I'm saying. The important bit is that no given node can contain both an unsigned and a T*.
Try this, and see if that helps. I think you'll find that being able to easily determine whether you are on a leaf or a branch will help you tremendously in your attempt to iterate.

Related

Tree traversal falls into infinite loop (with huffman algorithm implementation)

I am trying to implement the Huffman algorithm following the steps described in this tutorial: https://www.programiz.com/dsa/huffman-coding, and so far I have this code:
void encode(string filename) {
List<HuffmanNode> priorityQueue;
List<Node<HuffmanNode>> encodeList;
BinaryTree<HuffmanNode> toEncode;
//Map<char, string> encodeTable;
fstream input;
input.open(filename, ios_base::in);
if (input.is_open()) {
char c;
while (!input.eof()) {
input.get(c);
HuffmanNode node;
node.data = c;
node.frequency = 1;
int pos = priorityQueue.find(node);
if(pos) {
HuffmanNode value = priorityQueue.get(pos)->getData();
value++;
priorityQueue.update(pos, value);
} else {
priorityQueue.insert(node);
}
}
}
input.close();
priorityQueue.sort();
for(int i=1; i<=priorityQueue.size(); i++)
encodeList.insert( priorityQueue.get(i) );
while(encodeList.size() > 1) {
Node<HuffmanNode> * left = new Node<HuffmanNode>(encodeList.get(1)->getData());
Node<HuffmanNode> * right = new Node<HuffmanNode>(encodeList.get(2)->getData());
HuffmanNode z;
z.data = 0;
z.frequency = left->getData().frequency + right->getData().frequency;
Node<HuffmanNode> z_node;
z_node.setData(z);
z_node.setPrevious(left);
z_node.setNext(right);
encodeList.remove(1);
encodeList.remove(1);
encodeList.insert(z_node);
}
Node<HuffmanNode> node_root = encodeList.get(1)->getData();
toEncode.setRoot(&node_root);
}
Full code for main.cpp is here: https://pastebin.com/Uw5g9s7j.
When I run this, the program reads the bytes from the file, groups the characters by frequency and sorts the list. But once the Huffman tree is generated I am unable to traverse it: the traversal always falls into an infinite loop (the method gets stuck in the nodes containing the first 2 items from the priorityQueue above).
I tried the tree class with BinaryTree<int>, and everything works fine in that case, but with the code above the issue happens. The code for the tree is this (in the code, previous == left and next == right; I am reusing the Node class already implemented for my List class): https://pastebin.com/ZKLjuBc8.
The code for the List used in this example is: https://pastebin.com/Dprh1Pfa. And the code for the Node class used for both the List and the BinaryTree classes is: https://pastebin.com/ATLvYyft. Can anyone tell me what I am missing here? What am I getting wrong?
UPDATE
I have tried a version using only the C++ STL (with no custom List or BinaryTree implementations), but the same problem happened. The code is here: https://pastebin.com/q0wrVYBB.
Too many things to mention as comments so I'm using an answer, sorry:
So going top to bottom through the code:
Why are you defining all methods outside the class? That just makes the code so much harder to read and is much more work to type.
Node::Node()
NULL is C code, use nullptr. And why not use member initialization in the class?
class Node {
private:
T data{};
Node * previous{nullptr};
Node * next{nullptr};
...
Node::Node(Node * node) {
What is that supposed to be? You create a new node, copy the value and attach it to the existing list of Nodes like a Remora.
Is this supposed to replace the old Node? Be a move constructor?
Node::Node(T data)
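Write, inside the class definition,
Node(T data_ = T{}) : data{data_} { }
and remove the default constructor. The default member initializers shown above take care of the remaining members. (For a member of a class template, the default argument has to appear on the declaration inside the class.)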
Node::Node(T data, Node * previous, Node * next)
Again creating a Remora. This is not inserting into an existing list.
T Node::getData(), void Node::setData(T value)
If everyone can get and set data then just make it public. That will also mean it will work with const Node<T>. Your functions are not const-correct because you lack all the const versions.
Same for previous and next. But those should actually do something when you set the member. The node you point to should point back to you, or be made to do so:
void Node::setPrevious(Node * previous) {
// don't break an existing list
assert(this->previous == nullptr);
assert(previous->next == nullptr);
this->previous = previous;
previous->next = this;
}
Think about the copy and move constructors and assignment.
Follow the rule of 0/3/5: https://en.cppreference.com/w/cpp/language/rule_of_three . This goes for Node, List, ... all the classes.
List::List()
Simpler to use
Node<T> * first{nullptr};
List::~List()
You are deleting the elements of the list front to back, each time traversing the list from the front until you reach index i. Besides being horribly inefficient, this walks through front nodes that have already been deleted. This is "use after free".
void List::insert(T data)
this->first = new Node<T>();
this->first->setData(data);
just write
first = new Node<T>(data);
And if insert will append to the tail of the list then why not keep track of the tail so the insert runs in O(1)?
void List::update(int index, T data)
If you need access to a list by index that is a clear sign that you are using the wrong data structure. Use a vector, not a list, if you need this.
void List::remove(int index)
As mentioned in the comments, there are 2 memory leaks here. Also, aux->next->previous still points at the deleted aux, likely causing "use after free" later on.
int List::size()
Nothing wrong here, that's a first. But if you need this frequently you could keep track of the size of the list in the List class.
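Several of the List points above (a destructor that frees each node exactly once, O(1) insertion at the tail, a cached size) can be sketched together. This is a minimal sketch with hypothetical, stripped-down SimpleNode and SimpleList types, not a drop-in replacement for the classes in the question; it is singly linked and forbids copying to keep the example short.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical stripped-down node; the Node class in the question has more members.
template <typename T>
struct SimpleNode {
    T data;
    SimpleNode* next{nullptr};
    explicit SimpleNode(T d) : data(std::move(d)) {}
};

template <typename T>
class SimpleList {
public:
    SimpleList() = default;
    // Rule of three/five: forbid copies rather than share nodes by accident.
    SimpleList(const SimpleList&) = delete;
    SimpleList& operator=(const SimpleList&) = delete;

    ~SimpleList() {
        SimpleNode<T>* cur = first;
        while (cur != nullptr) {
            SimpleNode<T>* next = cur->next;  // read the link before freeing the node
            delete cur;
            cur = next;
        }
    }

    // Append in O(1) by remembering the tail.
    void insert(T value) {
        SimpleNode<T>* node = new SimpleNode<T>(std::move(value));
        if (last != nullptr) last->next = node;
        else first = node;
        last = node;
        ++count;
    }

    std::size_t size() const { return count; }      // O(1), no traversal
    const T& front() const { return first->data; }  // precondition: list not empty
    const T& back() const { return last->data; }    // precondition: list not empty

private:
    SimpleNode<T>* first{nullptr};
    SimpleNode<T>* last{nullptr};
    std::size_t count{0};
};
```

The destructor visits every node once, so it is linear in the list length and never touches a node after deleting it.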
Node * List::get(int index)
Nothing wrong except the place where you use this has already freed the nodes so this blows up. Missing the const counterpart. And again a strong indication you should be using a vector.
void List::set(int index, Node * value)
What's this supposed to do? Replace the n-th node in a list with a new node? Insert the node at a specific position? What it actually does is follow the list for index steps and then assign value to the local variable aux. Meaning it does absolutely nothing, slowly.
int List::find(T data)
Why return an index? Why not return a reference to the node? Also const and non-const version.
void List::sort()
This code looks like a bubblesort. Assuming it wasn't totally broken by all the previous issues, it would be O(n^4). I'm assuming the if(jMin != i) is supposed to swap the two elements in the list. Well, it doesn't.
I'm giving up now. This is all just the support classes to implement the BinaryTree, which itself is just support. 565 lines of code before you even start with your actual problem, and it seems a lot of it is broken one way or another. None of it can work with the state Node and List are in. Especially with copy construction / copy assignment of lists.
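For comparison, the tree-building step itself needs very little code once the standard containers do the bookkeeping. The following is a minimal sketch under stated assumptions, not a repair of the code in the question: HNode, HuffmanTree, ByFrequency, buildHuffman and codeLengths are hypothetical names, the input is a std::string rather than a file, and every node lives in a pool owned by the returned tree. One thing that is visible in the posted encode() is that toEncode.setRoot(&node_root) stores the address of a local variable, which stops being valid when the function returns; the sketch avoids that by keeping all nodes on the heap.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <queue>
#include <string>
#include <vector>

struct HNode {
    char symbol;            // meaningful only for leaves
    std::size_t frequency;
    HNode* left;            // non-owning; the pool below owns every node
    HNode* right;
};

struct HuffmanTree {
    std::vector<std::unique_ptr<HNode>> pool;  // owns all nodes of the tree
    HNode* root = nullptr;                      // null for empty input
};

struct ByFrequency {
    bool operator()(const HNode* a, const HNode* b) const {
        return a->frequency > b->frequency;     // smallest frequency on top
    }
};

inline HuffmanTree buildHuffman(const std::string& text) {
    std::map<char, std::size_t> freq;
    for (char c : text) ++freq[c];

    HuffmanTree tree;
    auto make = [&tree](char s, std::size_t f, HNode* l, HNode* r) -> HNode* {
        tree.pool.push_back(std::unique_ptr<HNode>(new HNode{s, f, l, r}));
        return tree.pool.back().get();
    };

    std::priority_queue<HNode*, std::vector<HNode*>, ByFrequency> queue;
    for (const auto& entry : freq)
        queue.push(make(entry.first, entry.second, nullptr, nullptr));

    // Repeatedly merge the two least frequent nodes under a new parent.
    while (queue.size() > 1) {
        HNode* a = queue.top(); queue.pop();
        HNode* b = queue.top(); queue.pop();
        queue.push(make(0, a->frequency + b->frequency, a, b));
    }
    if (!queue.empty()) tree.root = queue.top();
    return tree;
}

// The depth of each leaf equals the length of that symbol's code.
inline void codeLengths(const HNode* n, std::size_t depth,
                        std::map<char, std::size_t>& out) {
    if (!n) return;
    if (!n->left && !n->right) { out[n->symbol] = depth; return; }
    codeLengths(n->left, depth + 1, out);
    codeLengths(n->right, depth + 1, out);
}
```

Because the children are reached through plain pointers into heap nodes, traversing the tree after buildHuffman returns is safe for as long as the HuffmanTree object is alive.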

The proper way to increment my binary tree

Here is the class I've created
#include <memory>
template <typename T>
class binary_tree {
private:
T t_data;
std::unique_ptr<binary_tree<T>> t_left, t_right;
class binary_tree_iterator { // -----------------------
private:
T data;
public:
binary_tree_iterator(T d) : data(d) {} // Iterator class
T& operator*() {return data;}
binary_tree_iterator& operator++() {} // <--------- ??????
};
// ------------------------
public:
binary_tree(T d) : t_data(d), t_left(nullptr), t_right(nullptr)
{}
void insert(T data) {
if(data <= t_data) {
if(t_left == nullptr) {
t_left = std::unique_ptr<binary_tree<T>>(new binary_tree<T>(data));
} else {
t_left->insert(data);
}
} else {
if(t_right == nullptr)
t_right = std::unique_ptr<binary_tree<T>>(new binary_tree<T>(data));
else
t_right->insert(data);
}
}
const T data() const {
return t_data;
}
const std::unique_ptr<binary_tree<T>>& left() const {
return t_left;
}
const std::unique_ptr<binary_tree<T>>& right() const {
return t_right;
}
binary_tree_iterator begin() {
if(t_left == nullptr) {
return binary_tree_iterator(t_data);
} else {
return t_left->begin();
}
}
binary_tree_iterator end() {
if(t_right == nullptr) {
return binary_tree_iterator(t_data);
} else {
return t_right->end();
}
}
};
I've declared my iterator class inside of my container class. This may have been a mistake, but either way I'm not sure how to define my overloaded increment function. Once I've found begin() I've lost my way back. It seems like unique_ptr is designed for one-way pointing. Assuming I have to use unique_ptr in this fashion, is there some workaround here? I've thought about giving each instance of binary_tree a head member that points back from whence it came, but each node should only be accessible from the node above it. I could make some sort of index, but that seems to completely defeat the purpose of this container type. I'm solving an exercise, so I'm restricted to using unique_ptr.
You defined your iterator as containing the data value in your tree.
This is not what iterators are all about. Iterators do not contain the value they're referencing, but rather a reference (in the common meaning of the word, and not a C++ term) to it, typically a pointer.
Of course you can't figure out what to do with ++. For your iterator, it is natural to expect that the ++ operator will advance the iterator to the next node in your tree, but since the iterator does not contain a pointer to anything, you have nothing to advance there, and run into a mental block.
You will need to redesign your iterator so that it contains a pointer to your binary_tree; its * overload dereferences; and the ++ advances to the next element in your binary tree, which it will then be able to do, using its pointer.
At this point you will run into another mental block. Iterating through an entire binary tree requires, at some point, backing up to parent nodes in the tree. After all, after recursing into the left part of the binary tree you will need to, somehow, wind up in the right part of the binary tree. However, as designed, your binary_tree has no means of navigating to any node's parent. That's another design flaw you will need to address, in some fashion.
It is possible, I suppose, to implement this entire backtracking in the iterator itself, having the iterator record each node it has visited, so it can back up to it when needed. But iterators are supposed to be lightweight objects, barely more than a pointer themselves, and not a full-blown data structure that implements complicated operations.
In summary, you have several holes in the design of your binary tree that you will need to address, before you can implement an effective iterator for it.

Custom iterator on a tree structure in c++

I am implementing a tree structure in c++ with a node class like this:
class Node {
protected:
// relations
Node *_parent;
std::vector<Node*> _children;
public:
// some example method
void someMethod(Node *node) {
// do something with *node
for (std::size_t i = 0; i < node->_children.size(); i++) {
someMethod(node->_children[i]); // recurse into each child
}
}
};
Now, to work on the nodes in my tree I am implementing recursive functions like someMethod in my example.
It works, but I end up writing the same recursion code over and over again for every new function that works on my tree.
Is there a generic way to iterate a tree structure like I would on a plain array?
Some method that returns the next object, until I'm done with the whole branch.
EDIT:
Thanks to everybody who has commented so far, with your help I could narrow down the problem.
From my understanding (I'm new to c++), I need an iterator class that encapsulates the code for traversing my tree.
Accessing all tree members should be as simple as this:
for (Node<Node*>::iterator it = _node.begin(); it != _node.end(); ++it) {
Node *node = *it;
// do something with *node
}
Now the question is:
How do I implement such an iterator?
Pass a function pointer to the recursive function that returns the node that you are seeking.
This is the power of function pointers and function pointer arrays in C/C++.
Many functions do not simply iterate over all nodes. If the tree is sorted (as it normally is), then to find the largest value you only need to look in the right subtree.
If you search for the minimum, it is in the leftmost subtree.
Therefore it does not always make sense to have an iterator that iterates over the whole tree.
But if you really need to iterate over all nodes, you can use function pointers or the Visitor pattern (Erich Gamma et al., Design Patterns).
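The callback approach could look roughly like this. It is a minimal sketch with a hypothetical TreeNode stand-in for the question's Node and a hypothetical helper forEachNode; it takes a std::function, which accepts plain function pointers as well as lambdas, so the recursion is written once and reused for every operation.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for the Node class from the question.
struct TreeNode {
    int value;
    std::vector<TreeNode*> children;
};

// Visit `node` first and then every descendant, depth first (preorder).
inline void forEachNode(TreeNode* node,
                        const std::function<void(TreeNode&)>& visit) {
    if (node == nullptr) return;
    visit(*node);
    for (TreeNode* child : node->children)
        forEachNode(child, visit);
}
```

A caller then passes whatever should happen per node, for example a lambda that collects values or a free function that prints them.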

generic "out of bounds", "past end" iterator

In my application I have an (unbalanced) tree data structure. This tree is simply made of "std::list of std::lists": each node holds an arbitrary "list" of sub-nodes. Using this instead of a single list made the rest of the application a lot easier. (The program is about moving nodes from one tree to another tree / another part of the tree / its own tree.)
Now an obvious task is to find a subtree inside a "tree". For non-recursive searches it is simple enough:
subtree_iterator find_subtree(const N& n) {
auto iter(subtrees.begin());
auto e(subtrees.end());
while (iter != e) {
if ((*iter)->name == n) {
return iter;
}
++iter;
}
return e;
}
Which returns an iterator to the subtree position. The problem, however, starts when I try to implement a multi-level search. I.e., I wish to search for hello.world.test where the dots mark a new level.
Searching worked alright
subtree_iterator find_subtree(const pTree_type& otree, std::string identify) const {
pTree_type tree(otree);
boost::char_separator<char> sep(".");
boost::tokenizer<boost::char_separator<char> > tokens(identify, sep);
auto token_iter(tokens.begin());
auto token_end(tokens.end());
subtree_iterator subtree_iter;
for (auto token_iter(tokens.begin()); token_iter != token_end; ++token_iter) {
std::string subtree_string(*token_iter);
subtree_iter = tree->find_subtree_if(subtree_string);
if (subtree_iter == tree->subtree_end()) {
return otree->subtree_end();
} else {
tree = *subtree_iter;
}
}
return subtree_iter;
}
At first glance it seemed to work correctly; however, when I try to use it, it fails. Using it would look like
auto tIn(find_subtree(ProjectTree, "hello.world.test"));
if (tIn != ProjectTree->subtree_end()) {
//rest
}
However, that gives a debug assertion error "list iterators not compatible". This isn't too weird: I'm comparing iterators from different lists to each other. But how could I implement such a thing? My "backup" option would be to return a std::pair<bool,iterator> where the boolean part indicates whether the subtree actually exists. Is there another method, short of making the whole tree a single list?
You should not work on iterators internally. Use nodes instead.
template <typename T>
struct Node {
T item;
Node<T>* next;
};
Then encapsulate your Node in an iterator facade like this:
template<typename T>
class iterator {
private:
Node<T>* node;
public:
...
};
Then use a generic invalid node (node is nullptr) as the value returned whenever end() is requested or reached.
Note that what I suggest is a singly linked list (not a doubly linked list like the standard one). This is because you can't go back from a generic end() iterator that points to a null node.
If you don't use the iterator's operator--() in your algorithms, this should be fine.
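Filled in, the facade might look like the following. This is a minimal sketch: ListNode, NodeIterator and findItem are hypothetical names chosen to avoid clashing with anything in the question, and the list is singly linked as suggested. The point it illustrates is that a default-constructed iterator is the same "past the end" value for every list, so results from different lists can be compared safely.

```cpp
template <typename T>
struct ListNode {
    T item;
    ListNode<T>* next;
};

template <typename T>
class NodeIterator {
public:
    explicit NodeIterator(ListNode<T>* n = nullptr) : node(n) {}
    T& operator*() const { return node->item; }
    NodeIterator& operator++() { node = node->next; return *this; }
    bool operator==(const NodeIterator& o) const { return node == o.node; }
    bool operator!=(const NodeIterator& o) const { return node != o.node; }

private:
    ListNode<T>* node;  // nullptr means "past the end", whatever the list
};

// Search one level; a default-constructed iterator signals "not found".
template <typename T>
NodeIterator<T> findItem(ListNode<T>* head, const T& wanted) {
    for (NodeIterator<T> it(head); it != NodeIterator<T>(); ++it)
        if (*it == wanted) return it;
    return NodeIterator<T>();
}
```

A multi-level search can then return NodeIterator<T>() from any depth, and the caller compares against NodeIterator<T>() instead of a particular list's end().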
std::vector<list_iterator> stack to traverse? Where the .back() of the stack is the only one allowed to be equal to end() of the previous one, and .front() is an iterator to the root list?

Good Node definition for ordered tree in C++

I have a rooted ordered tree representing sets of integers. Each node stores the size of the associated subtree, and also the max and min elements in this subtree. The branch degree of all the nodes is fixed (but determined at runtime). Also, for sufficiently small subtrees I would like to change the representation to a bitmap for the associated subset. For example, the root node may store a set of size 1000000, one of its children would store a subset of size 100000, then again one of that node's children would store a subset of size 10000, and in the next level we would stop using this representation and store just a plain bitmap for the associated subset.
I'm trying to implement this structure in C++ and my definition for the node type stores three integers (size, min and max), an array of pointers (something like node_t ** children) to subtrees and the bitmap (in case we are using this representation). The problem is that all the nodes are storing at least one element which is irrelevant (if the set is big enough we would be using the array of pointers but not the bitmap, for example). How should the node type be declared to solve this problem ? I thought about using two subtypes of node (one for each case) but I am not sure what the impact on the performance at runtime would be.
Thanks in advance.
PS. Please let me know if the question is unclear so I can edit it.
Since you're using multiple representations, you'll probably need at least two node types: The first will be a generic node that handles the root as well as nearby descendants, and the second type will contain a pointer to a map. The latter nodes don't have any children per se, but their immediate ancestors should see them as an entire sub-tree rather than a terminating node that points to a map.
Since each of the upper nodes have pointers to their children, you'll need a way to ensure that these pointers are also able to point to the mapNodes as well as the branching ones. A good way to do this is to create a virtual base node type with a virtual function that returns whatever data you're looking for. For example:
class baseNode {
public:
virtual ~baseNode() {} //nodes are deleted through baseNode pointers
virtual int getLargest() = 0;
virtual baseNode* addData(int) = 0;
};
class leafNode : public baseNode { //for non-map termination
public:
leafNode(int in) {Data = in;}
int getLargest() {return Data;}
baseNode* addData(int);
int Data;
};
class treeNode : public baseNode {
public:
int getLargest(); //returns leftChild->getLargest(), etc
baseNode* addData(int);
baseNode* leftChild;//can point to either a treeNode or mapNode
baseNode* rightChild;
};
class mapNode : public baseNode {
public:
baseNode* addData(int);
int getLargest(); //parses subMap to find/return the desired value
Map* subMap;
};
You'll need a bit of finessing to get it to do what you need it to, but the principle is the same. Keep in mind that with 1m objects, every byte you add increases the net memory use by about a megabyte, so do try to keep things minimal. If all of your branching nodes eventually reach a mapNode, you can eliminate the leafNode declaration altogether.
Adding data to the structure is tricky, especially since you're working with multiple types and the parents (hopefully) don't know anything about their neighbors; use virtual accessors to do what's needed. In many scenarios, if a branching node tries to add a value 'down the line', the child node it references may need to change type. In this case, the child should construct the new substructure, then return it to the parent. This can be done like so:
baseNode* treeNode::addData(int in) {
if ((childCount+1) < threshold) { //not enough to merit a map
//....
//if (input needs to go to the leftChild) {
if (leftChild == 0) {
leftChild = new leafNode(in);
} else {
leftChild = leftChild->addData(in);
}
//}
return (baseNode*)this; //casting may be optional
} else { //new Data merits converting self + kids into a map
mapNode* newMap = new mapNode();
//Set newMap->subMap to children, deleting as you go
delete this;//remove self after return
return (baseNode*)newMap; //return the mapNode holding subtree
}
}
baseNode* leafNode::addData(int in) {
treeNode* tmpNode = new treeNode(); //create replacement
tmpNode->leftChild = this; //pin self to new node
tmpNode->rightChild = new leafNode(in); //store data
return (baseNode*)tmpNode;
}
baseNode* mapNode::addData(int in) {
subMap->addValue(in);//However you do it...
return (baseNode*)this; //parent is always a treeNode
}
The leftChild = leftChild->addData(in); usually won't actually modify anything, especially if it points to a treeNode. However, it doesn't really hurt anything to do so, and an extra if (newPtr != leftChild) check would just add unnecessary overhead. Note that it will cause a change if a leafNode needs to change into a treeNode with multiple kids, or if it's a treeNode with enough children to merit changing itself (and its kids!) into a mapNode.