What does the following statement in a trie implementation do? (C++)

I have been trying to implement insertion into a trie data structure in C++, working through a blog post in which there are a few things I am unable to understand: http://theoryofprogramming.com/2015/01/16/trie-tree-implementation/
#include <cstdio>
#include <cstdlib>
#include <vector>
#define ALPHABETS 26
#define CASE 'a'
#define MAX_WORD_SIZE 25
using namespace std;
struct Node
{
struct Node * parent;
struct Node * children[ALPHABETS];
vector<int> occurrences;
};
// Inserts the word 'word' into the Trie Tree
// 'trieTree' and marks its occurrence as 'index'.
void InsertWord(struct Node * trieTree, char word[], int index)
{
struct Node * traverse = trieTree;
while (*word != '\0') { // Until there is something to process
if (traverse->children[*word - CASE] == NULL) {
// There is no node in 'trieTree' corresponding to this alphabet
// Allocate using calloc(), so that components are initialised
traverse->children[*word - CASE] = (struct Node *) calloc(1, sizeof(struct Node));
traverse->children[*word - CASE]->parent = traverse; // Assigning parent
}
traverse = traverse->children[*word - CASE];
++word; // The next alphabet
}
traverse->occurrences.push_back(index); // Mark the occurrence of the word
}
// Prints the 'trieTree' in a Pre-Order or a DFS manner
// which automatically results in a Lexicographical Order
void LexicographicalPrint(struct Node * trieTree, vector<char> word)
{
int i;
bool noChild = true;
if (trieTree->occurrences.size() != 0) {
// Condition trieTree->occurrences.size() != 0
// is a necessary and sufficient condition to
// tell if a node is associated with a word or not
vector<char>::iterator charItr = word.begin();
while (charItr != word.end()) {
printf("%c", *charItr);
++charItr;
}
printf(" -> # index -> ");
vector<int>::iterator counter = trieTree->occurrences.begin();
// This is to print the occurrences of the word
while (counter != trieTree->occurrences.end()) {
printf("%d, ", *counter);
++counter;
}
printf("\n");
}
for (i = 0; i < ALPHABETS; ++i) {
if (trieTree->children[i] != NULL) {
noChild = false;
word.push_back(CASE + i); // Select a child
// and explore everything associated with the child
LexicographicalPrint(trieTree->children[i], word);
word.pop_back();
// Remove the alphabet as we dealt
// everything associated with it
}
}
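// No trailing pop_back() here: 'word' is passed by value, and at the
// root it is empty, so popping it would be undefined behaviour.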
}
int main()
{
int n, i;
vector<char> printUtil; // Utility variable to print tree
// Creating the Trie Tree using calloc
// so that the components are initialised
struct Node * trieTree = (struct Node *) calloc(1, sizeof(struct Node));
char word[MAX_WORD_SIZE];
printf("Enter the number of words-\n");
scanf("%d", &n);
for (i = 1; i <= n; ++i) {
scanf("%s", word);
InsertWord(trieTree, word, i);
}
printf("\n"); // Just to make the output more readable
LexicographicalPrint(trieTree, printUtil);
return 0;
}
I am unable to understand what this statement in InsertWord() does:
if (traverse->children[*word - CASE] == NULL)
Also, since we initialised all the elements to 1 in the main function, how can it be NULL?

The function InsertWord() adds a new word to the trie and, in the process, creates a new node whenever a prefix of that word is not yet the prefix of some word already in the trie.
This is exactly what your line tests for. traverse points to the node for the current prefix of the word, and *word is the next character after that prefix. If the node for the current k-prefix has no child (the pointer is NULL) in the slot for that next character, a new node must be allocated for the (k+1)-prefix of the word.
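On the second part of the question: in calloc(1, sizeof(struct Node)) the 1 is the number of elements to allocate, not an initial value. calloc() zero-fills the memory, which on common platforms makes every child pointer compare equal to NULL. (It does not run the std::vector constructor, so idiomatic C++ would use new Node() instead.) The test itself can be sketched as follows; SketchNode, childIndex and needsNewChild are hypothetical names used only for illustration and are not part of the blog's code.

```cpp
constexpr int kAlphabetSize = 26;   // plays the role of ALPHABETS
constexpr char kFirstLetter = 'a';  // plays the role of CASE

// Hypothetical simplified node: the empty initialiser sets every child
// pointer to nullptr, much as calloc() zero-fills the original Node.
struct SketchNode {
    SketchNode* children[kAlphabetSize] = {};
};

// Maps a lowercase letter to its slot: 'a' -> 0, 'b' -> 1, ..., 'z' -> 25.
// Assumes the letter is in the range 'a'..'z'.
int childIndex(char letter) {
    return letter - kFirstLetter;
}

// True when 'node' has no child for 'letter' yet, which is the case in
// which InsertWord() has to allocate a new node.
bool needsNewChild(const SketchNode& node, char letter) {
    return node.children[childIndex(letter)] == nullptr;
}
```

For a freshly created node, needsNewChild() is true for every letter; once a child is stored in the slot for 'c', it becomes false for 'c' only.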

Related

How to display a tree by level in C++

I'm trying to display a tree level by level, but I'm having trouble understanding the concept. I know that I need to use a queue to push the different values in and then pop them later on. The part that confuses me most is determining when I need to start a new line. Here's most of the code.
Each node has a character, a count, and left and right pointers.
//CharacterAnalyzer.cpp
string CharacterAnalyzer::displayByLevel(nodeptr_t const node, Function visit) const {
/* TODO (2):
* Display the tree one level at a time. For each level
* you must display all the nodes at that level.
*
* If a child is absent (i.e. nullptr), then output 'null'.
*
* So, for the text "Hello all!" the level display will be:
* (H,1)
* (E,1)(L,4)
* (A,1)(null)(null)(O,1)
* (null)(null)(null)(null)
*/
}// end displayByLevel()
I'm assuming this is what the Function visit parameter refers to. This is from CharacterAnalyzer.h:
// using Function = string (const char character, const int count) *;
typedef string (* Function)(const char character, const int count);
Here's one method I tried that failed:
string str = "";
if (node == nullptr) {
return str += "(null)";
}
queue<nodeptr_t> q;
q.push(node);
while (q.empty() == false)
{
// nodeCount (queue size) indicates number
// of nodes at current level.
int nodeCount = q.size();
while (nodeCount > 0)
{
nodeptr_t node = q.front();
str += visit(node->character, node->count) + " ";
q.pop();
if (node->left != nullptr)
q.push(node->left);
if (node->right != nullptr)
q.push(node->right);
nodeCount--;
}
str += "";
}
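One way to decide where the line breaks go is to record the queue size at the start of each level and emit a newline after exactly that many nodes. The sketch below also pushes absent children into the queue so that they can be shown as (null). LevelNode, formatNode and displayByLevelSketch are hypothetical stand-ins, since the real nodeptr_t and CharacterAnalyzer definitions are not shown in the question.

```cpp
#include <cstddef>
#include <queue>
#include <string>

// Hypothetical node layout, based on the description in the question.
struct LevelNode {
    char character;
    int count;
    LevelNode* left;
    LevelNode* right;
};

using Function = std::string (*)(const char character, const int count);

// Hypothetical visitor producing "(H,1)"-style text.
std::string formatNode(const char character, const int count) {
    return "(" + std::string(1, character) + "," + std::to_string(count) + ")";
}

std::string displayByLevelSketch(const LevelNode* root, Function visit) {
    std::string out;
    std::queue<const LevelNode*> pending;
    pending.push(root);
    while (!pending.empty()) {
        // Everything in the queue right now belongs to one level.
        const std::size_t levelSize = pending.size();
        for (std::size_t i = 0; i < levelSize; ++i) {
            const LevelNode* node = pending.front();
            pending.pop();
            if (node == nullptr) {
                out += "(null)";  // absent child: print it, do not expand it
                continue;
            }
            out += visit(node->character, node->count);
            pending.push(node->left);   // may be nullptr on purpose
            pending.push(node->right);
        }
        out += "\n";  // the whole level has been emitted
    }
    return out;
}
```

For the tree built from "Hello all!" (H at the root, E and L below it, then A and O), this yields the four lines listed in the TODO comment, each followed by a newline.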

print all words of a dictionary using trie

I am working on a dictionary using a trie with the following struct in C:
struct trie_node {
int is_end; // 0 if this is not the end of a word, otherwise 1
char c;
struct trie_node* child[26];
};
I am able to insert and search for words, and I would like to print all the words in the dictionary, but I am not sure how to handle it. This is what I tried:
void print(struct trie_node node) {
int i = 0;
for (i = 0; i < 26; i++) {
if (node->child[i] != NULL) {
printf("%c", node->child[i]->c);
print(node->child[i]);
}
}
}
But it does not print correctly.
For example, if I have the words
beer
bee
bear
beast
it is printing
bearster
it should print
bearbeastbeebeer
How can I print the list of words correctly?
You need to keep track of the path from the root to the current node. When you reach an end node (is_end is true), you print the path, which is the dictionary word.
One approach is to use an array of char and keep track of its length, so you know how many elements you need to print. See the code below:
void print_path (char *path, int len){
int i;
for(i = 0; i < len; i++)
printf("%c", path[i]);
}
void print(struct trie_node* node, char *path, int len) {
// sanity check
if (! node)
return;
// current node is part of the current path, so add it
path[len++] = node->c;
// if it is an end node then print the path
if (node->is_end)
print_path(path, len);
// now go through the children and recursive call
int i = 0;
for (i = 0; i < 26; i++) {
if (node->child[i] != NULL) {
print(node->child[i], path, len);
}
}
}
int main(){
// proper allocation for the trie
// ...
// calling the print, assuming the height of tree is at most 128
char path[128];
print(b, path, 0);
}
You can try using node.child[i]->c. With a struct variable you must use ".", while with a pointer to a struct you must use "->" or "(*pointer).". I'm not sure whether this is the whole problem, though.

Properly exiting out of recursions?

TrieNode and Trie Object:
struct TrieNode {
char nodeChar = NULL;
map<char, TrieNode> children;
TrieNode() {}
TrieNode(char c) { nodeChar = c; }
};
struct Trie {
TrieNode *root = new TrieNode();
typedef pair<char, TrieNode> letter;
typedef map<char, TrieNode>::iterator it;
Trie(vector<string> dictionary) {
for (int i = 0; i < dictionary.size(); i++) {
insert(dictionary[i]);
}
}
void insert(string toInsert) {
TrieNode * curr = root;
int increment = 0;
// while letters still exist within the trie traverse through the trie
while (curr->children.find(toInsert[increment]) != curr->children.end()) { //letter found
curr = &(curr->children.find(toInsert[increment])->second);
increment++;
}
//when it doesn't exist we know that this will be a new branch
for (int i = increment; i < toInsert.length(); i++) {
TrieNode temp(toInsert[i]);
curr->children.insert(letter(toInsert[i], temp));
curr = &(curr->children.find(toInsert[i])->second);
if (i == toInsert.length() - 1) {
temp.nodeChar = NULL;
curr->children.insert(letter(NULL, temp));
}
}
}
vector<string> findPre(string pre) {
vector<string> list;
TrieNode * curr = root;
/*First find if the pre actually exist*/
for (int i = 0; i < pre.length(); i++) {
if (curr->children.find(pre[i]) == curr->children.end()) { //DNE
return list;
}
else {
curr = &(curr->children.find(pre[i])->second);
}
}
/*Now curr is at the end of the prefix, now we will perform a DFS*/
pre = pre.substr(0, pre.length() - 1);
findPre(list, curr, pre);
}
void findPre(vector<string> &list, TrieNode *curr, string prefix) {
if (curr->nodeChar == NULL) {
list.push_back(prefix);
return;
}
else {
prefix += curr->nodeChar;
for (it i = curr->children.begin(); i != curr->children.end(); i++) {
findPre(list, &i->second, prefix);
}
}
}
};
The problem is this function:
void findPre(vector<string> &list, TrieNode *curr, string prefix) {
/*if children of TrieNode contains NULL char, it means this branch up to this point is a complete word*/
if (curr->nodeChar == NULL) {
list.push_back(prefix);
}
else {
prefix += curr->nodeChar;
for (it i = curr->children.begin(); i != curr->children.end(); i++) {
findPre(list, &i->second, prefix);
}
}
}
The purpose is to return all words with the same prefix from a trie using DFS. I managed to retrieve all the necessary strings, but I can't exit out of the recursion.
The code completes the last iteration of the if statement and breaks. Visual Studio doesn't return any error code.
The typical end to a recursion is just as you said: return all words. A standard recursion looks something like this:
returnType function(params...){
//Do stuff
if(need to recurse){
return function(next params...);
}else{ //This should be your defined base-case
return base-case;
}
}
The issue is that your recursive function never explicitly returns anything: it either executes the push_back or calls itself again. Neither of these seems to exit properly, so it will either end quietly (with an implicit return of nothing) or keep recursing.
In your situation, you likely need to store the results of the recursion in an intermediate structure such as a list, and then return that list after iterating (since it is a tree search and ought to check all the children, not return only the first one).
On that note, you seem to be missing part of the point of recursion. It exists to serve a purpose: break a problem down into pieces until those pieces are trivial to solve, then return that case and build back up to a full solution. Any tree search must follow this basic structure, or you may miss something, like forgetting to return your results.
Check the integrity of your Trie structure. The function appears to be correct. One reason it might not terminate is that one or more of your leaf nodes doesn't have curr->nodeChar == NULL.
Another possibility is that some node (leaf or non-leaf) has a garbage child node. The recursion would then wander into garbage values with no reason to stop. Running in debug mode should break the execution with a segmentation fault.
Write another function to test whether all leaf nodes have NULL termination.
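Such a check could look roughly like the sketch below. CheckNode is a hypothetical stand-in for the TrieNode in the question: it stores children behind std::unique_ptr rather than by value, purely so that the sketch is self-contained, and leavesAreTerminated is an assumed name.

```cpp
#include <map>
#include <memory>

// Hypothetical stand-in for TrieNode; '\0' marks the end of a word.
struct CheckNode {
    char nodeChar = '\0';
    std::map<char, std::unique_ptr<CheckNode>> children;
};

// Returns true when every node without children carries the '\0' marker,
// i.e. every branch of the trie ends in a proper terminator node.
bool leavesAreTerminated(const CheckNode& node) {
    if (node.children.empty()) {
        return node.nodeChar == '\0';
    }
    for (const auto& entry : node.children) {
        // A missing child object counts as a malformed trie.
        if (!entry.second || !leavesAreTerminated(*entry.second)) {
            return false;
        }
    }
    return true;
}
```

Calling it on the root after building the trie gives a quick yes/no answer about whether any branch ends without its terminator.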
EDIT:
After posting the code, the original poster has already pointed out that the problem was that he/she was not returning the list of strings.
Apart from that, there are a few more suggestions I would like to provide based on the code:
How does this while loop terminate if the toInsert string is already in the Trie?
You will run past the end of the toInsert string and read a garbage character.
It will exit after that, but reading beyond the end of your string is a bad way to program.
// while letters still exist within the trie traverse through the trie
while (curr->children.find(toInsert[increment]) != curr->children.end())
{ //letter found
curr = &(curr->children.find(toInsert[increment])->second);
increment++;
}
This can be written as follows:
while (increment < toInsert.length() &&
curr->children.find(toInsert[increment]) != curr->children.end())
Also,
Trie( vector<string> dictionary)
should be
Trie( const vector<string>& dictionary )
because dictionary can be a large object. If you don't pass by reference, it will create a second copy. This is not efficient.
I am an idiot. I forgot to return list in the first findPre() function.
vector<string> findPre(string pre) {
vector<string> list;
TrieNode * curr = root;
/*First find if the pre actually exist*/
for (int i = 0; i < pre.length(); i++) {
if (curr->children.find(pre[i]) == curr->children.end()) { //DNE
return list;
}
else {
curr = &(curr->children.find(pre[i])->second);
}
}
/*Now curr is at the end of the prefix, now we will perform a DFS*/
pre = pre.substr(0, pre.length() - 1);
findPre(list, curr, pre);
return list; //<----- this thing
}

Logic flaw in trie search

I'm currently working on a trie implementation for practice and have run into a mental roadblock.
The issue is with my search function. I am attempting to have my trie retrieve a list of strings for a supplied prefix after they have been loaded into the program's memory.
I also understand that I could be using a queue, shouldn't use C functions in C++, etc. This is just a 'rough draft', so to speak.
This is what I have so far:
bool SearchForStrings(vector<string> &output, string data)
{
Node *iter = GetLastNode("an");
Node *hold = iter;
stack<char> str;
while (hold->visited == false)
{
int index = GetNextChild(iter);
if (index > -1)
{
str.push(char('a' + index));
//current.push(iter);
iter = iter->next[index];
}
//We've hit a leaf so we want to unwind the stack and print the string
else if (index < 0 && IsLeaf(iter))
{
iter->visited = true;
string temp("");
stringstream ss;
while (str.size() > 0)
{
temp += str.top();
str.pop();
}
int i = 0;
for (std::string::reverse_iterator it = temp.rbegin(); it != temp.rend(); it++)
ss << *it;
//Store the string we have
output.push_back(data + ss.str());
//Move our iterator back to the root node
iter = hold;
}
//We know this isnt a leaf so we dont want to print out the stack
else
{
iter->visited = true;
iter = hold;
}
}
return (output.size() > 0);
}
int GetNextChild(Node *s)
{
for (int i = 0; i < 26; i++)
{
if (s->next[i] != nullptr && s->next[i]->visited == false)
return i;
}
return -1;
}
bool IsLeaf(Node *s)
{
for (int i = 0; i < 26; i++)
{
if (s->next[i] != nullptr)
return false;
}
return true;
}
struct Node{
int value;
Node *next[26];
bool visited;
};
The code is too long, or I'd post it all. GetLastNode() retrieves the node at the end of the data passed in, so if the prefix was 'su' and the string was 'substring', the node would point to the 'u', to be used as an artificial root node.
(might be completely wrong... just typed it here, no testing)
something like:
First of all, we need a way of indicating that a node represents an entry.
So let's have:
struct Node{
int value;
Node *next[26];
bool entry;
};
I've removed your visited flag because I don't have a use for it.
You should modify your insert/update/delete functions to support the entry flag. If the flag is true, it means there is an actual entry ending at that node.
Now we can modify isLeaf():
bool isLeaf(Node *s) {
return s->entry;
}
This means we consider a node a leaf when it holds an entry... perhaps the name is wrong now, as such a leaf might have children (with "any" and "anywhere", the "y" node is a leaf, but it has children).
Now for the search:
First a public function that can be called.
bool searchForStrings(std::vector<std::string> &output, const std::string &key) {
// start the recursion
// theTrieRoot is the root node for the whole structure
return searchForStrings(theTrieRoot,output,key);
}
Then the internal function that will use for recursion.
bool searchForStrings(Node *node, std::vector<std::string> &output, const std::string &key) {
if(key.empty() && isLeaf(node)) {
// the prefix is consumed and this node is an entry - add an empty
// string; the callers prepend their characters to it.
output.push_back(std::string());
}
if(key.empty()) {
// Key is empty, collect all child nodes.
for (int i = 0; i < 26; i++)
{
if (node->next[i] != nullptr) {
std::vector<std::string> partial;
searchForStrings(node->next[i],partial,key);
// so we got a list of the children,
// add the key of this node to them.
for(const auto &s:partial) {
output.push_back(std::string(1, char('a'+i))+s);
}
}
} // end for
} // end if key.empty
else {
// key is not empty, try to get the node for the
// first character of the key.
int c=key[0]-'a';
if((c<0) || (c>=26)) {
// first character was not a letter.
return false;
}
if(node->next[c]==nullptr) {
// no match (no node where we expect it)
return false;
}
// recurse into the node matching the key
std::vector<std::string> partial;
searchForStrings(node->next[c],partial,key.substr(1));
// add the key of this node to the result
for(const auto &s:partial) {
output.push_back(std::string(1, key[0])+s);
}
}
// provide a meaningful return value
if(output.empty()) {
return false;
} else {
return true;
}
}
And the execution for "an" search is.
Call searchForStrings(root,[],"an")
root is not leaf, key is not empty. Matched next node keyed by "a"
Call searchForStrings(node(a),[],"n")
node(a) is not leaf, key is not empty. Matched next node keyed by "n"
Call searchForStrings(node(n),[],"")
node(n) is not leaf, key is empty. Need to recurse on all non-null children:
Call searchForStrings(node(s),[],"")
node(s) is not leaf, key is empty. Need to recurse on all non-null children:
... eventually we will reach node(r), which is a leaf node, so it will return [""]; going back up, this becomes ["r"] -> ["er"] -> ["wer"] -> ["swer"]
Call searchForStrings(node(y),[],"")
node(y) is leaf (add "" to the output), key is empty,
recurse, we will get ["time"]
we will return ["","time"]
At this point we will add the "y" to get ["y","ytime"]
And here we will add the "n" to get ["nswer","ny","nytime"]
Adding the "a" to get ["answer","any","anytime"]
we're done
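The same idea can also be written with a single shared string that grows and shrinks during the DFS, instead of building a partial vector at every level. This is only a sketch under assumptions: EntryNode, insertWord, collectBelow and wordsWithPrefix are hypothetical names, children are held in std::unique_ptr so nothing has to be freed by hand, and only the lowercase letters 'a' to 'z' are accepted.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical node: 'entry' marks that a stored word ends here.
struct EntryNode {
    std::unique_ptr<EntryNode> next[26];
    bool entry = false;
};

bool isLowercaseWord(const std::string& text) {
    for (char ch : text) {
        if (ch < 'a' || ch > 'z') return false;
    }
    return true;
}

// Inserts 'word', creating nodes as needed; rejects non-lowercase input.
bool insertWord(EntryNode& root, const std::string& word) {
    if (!isLowercaseWord(word)) return false;
    EntryNode* node = &root;
    for (char ch : word) {
        std::unique_ptr<EntryNode>& slot = node->next[ch - 'a'];
        if (!slot) slot.reset(new EntryNode());
        node = slot.get();
    }
    node->entry = true;
    return true;
}

// DFS below 'node'; 'current' always spells the path from the root.
void collectBelow(const EntryNode& node, std::string& current,
                  std::vector<std::string>& output) {
    if (node.entry) output.push_back(current);
    for (int i = 0; i < 26; ++i) {
        if (node.next[i]) {
            current.push_back(static_cast<char>('a' + i));
            collectBelow(*node.next[i], current, output);
            current.pop_back();  // undo the step before trying the next child
        }
    }
}

// Returns every stored word that starts with 'prefix', in sorted order.
std::vector<std::string> wordsWithPrefix(const EntryNode& root,
                                         const std::string& prefix) {
    std::vector<std::string> output;
    if (!isLowercaseWord(prefix)) return output;
    const EntryNode* node = &root;
    for (char ch : prefix) {
        node = node->next[ch - 'a'].get();
        if (node == nullptr) return output;  // prefix not in the trie
    }
    std::string current = prefix;
    collectBelow(*node, current, output);
    return output;
}
```

With "answer", "any" and "anytime" inserted, wordsWithPrefix(root, "an") returns those three words in that order, matching the walk-through above.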

push attribute data to trie, add to multiple keys

My knowledge is limited, but I have been working (hacking) at this specific data structure for a while.
I use a trie to store ontology strings that are then returned as a stack, including the 'gap' proximity, when get(string) is called. As an add-on, the trie stores attributes on the key. The further down the string, the greater the detail of the attribute. This is working well for my purposes.
As an additional add-on, I use a wildcard to apply an attribute to all child nodes. For example, to add 'paws' to all subnodes of 'mammals.dogs.' I push(mammals.dogs.*.paws). Now, all dogs have paws.
The problem is that only the first dog gets paws. The function works when pushing attributes without a wildcard.
If you want I can clean this up and give a simplified version, but in the past I've found on Stack Overflow that it is better to just give the code. I use 'z' as the '*' wildcard.
void Trie::push(ParseT & packet)
{
if (root==NULL) AddFirstNode(); // condition 1: no nodes exist, should this be in wrapper
const string codeSoFar=packet.ID;
AddRecord(root, packet, codeSoFar); //condition 2: nodes exist
}
void Trie::AddFirstNode(){ // run-once, initial condition of first node
nodeT *tempNode=new nodeT;
tempNode->attributes.planType=0;
tempNode->attributes.begin = 0;
tempNode->attributes.end = 0;
tempNode->attributes.alt_end = 0;
root=tempNode;
}
//add record to trie with mutual recursion through InsertNode
//record is entered to trie one char at a time, char is removed
//from record and function repeats until record is Null
void Trie::AddRecord(nodeT *w, ParseT &packet, string codeSoFar)
{
if (codeSoFar.empty()) {
//copy predecessor vector at level n, overwrites higher level vectors
if (!packet.predecessorTemp.empty())
w->attributes.predecessorTemp = packet.predecessorTemp;
return; //condition 0: record's last char
}
else { //keep parsing down record path
for (unsigned int i = 0; i < w->alpha.size(); i++) {
if (codeSoFar[0] == w->alpha[i].token_char || codeSoFar[0] == 'z') {
return AddRecord(w->alpha[i].next, packet, codeSoFar.substr(1)); // condition 2: char exists
}
}
InsertNode(w, packet, codeSoFar); //condition 3: no existing char --> mutual recursion
}
}
//AddRecord() helper function
void Trie::InsertNode(nodeT *w, ParseT &packet, string codeSoFar) // add new char to vector array
{
for (unsigned int i=0; i <=w->alpha.size(); i++) { // loop and insert tokens in sorted vector
if (i==w->alpha.size() || codeSoFar[0] < w->alpha[i].token_char) { //look for end of vector or indexical position
//create new TokenT
tokenT *tempChar=new tokenT;
tempChar->next=NULL;
tempChar->token_char=codeSoFar[0];
//create new nodeT
nodeT *tempLeaf=new nodeT;
tempLeaf->attributes.begin = 0;
tempLeaf->attributes.end = 0;
tempLeaf->attributes.planType = 0;
tempLeaf->attributes.alt_end = 0;
//last node
if (codeSoFar.size() == 1){
tempLeaf->attributes.predecessorTemp = packet.predecessorTemp;
}
//link TokenT with its nodeT
tempChar->next=tempLeaf;
AddRecord(tempLeaf, packet, codeSoFar.substr(1)); //mutual recursion --> add next char in record, if last char AddRecord will terminate
w->alpha.insert(w->alpha.begin()+i, *tempChar);
return;
}
}
}
root is a global nodeT *.
struct ParseT {
string ID; //XML key
int begin = 0; //planned or actual start date
int end = 0; //planned or actual end date - if end is empty then assumed started but not completed and flagged with 9999
int alt_end = 0; //in case of started without completion 9999 case, then this holds expected end
int planType = 0; //actuals == 1, forecast == 2, planned == 3
map<string, string> aux;
vector<string> resourceTemp;
vector<string> predecessorTemp;
};
In this code
for (unsigned int i = 0; i < w->alpha.size(); i++) {
if (codeSoFar[0] == w->alpha[i].token_char || codeSoFar[0] == 'z') {
return AddRecord(w->alpha[i].next, packet, codeSoFar.substr(1)); // condition 2: char exists
}
}
you are returning as soon as you call AddRecord, even if it is because of a wildcard. It might be easier to have a separate loop when codeSoFar[0] == 'z' that goes through all the alphas and adds the record. Then have an else clause that does your current code.
Edit: Here is what I meant, in code form:
else { //keep parsing down record path
// Handle wildcards
if (codeSoFar[0] == 'z') {
for (unsigned int i = 0; i < w->alpha.size(); i++) {
AddRecord(w->alpha[i].next, packet, codeSoFar.substr(1)); // condition 2: char exists
}
}
else {
// Not a wildcard, look for a match
for (unsigned int i = 0; i < w->alpha.size(); i++) {
if (codeSoFar[0] == w->alpha[i].token_char) {
return AddRecord(w->alpha[i].next, packet, codeSoFar.substr(1)); // condition 2: char exists
}
}
InsertNode(w, packet, codeSoFar); //condition 3: no existing char --> mutual recursion
}
}