Printing all the words from a prefixtree in order - c++

I've set up a program that takes user input and builds a prefix tree. Each character is a node, and the nodes are linked together. I have a "print" command that, given the input cat, car, sat, saw, prints the words as follows:
ca(R,T),sa(T,W).
I'm trying to create two functions that will instead print the words given by the user in alphabetical order. printAllWords() will do most of the work. I'm thinking of making it a recursive function that builds each word in a global string of some sort through push_back(), then removes that word again with pop_back() and moves on to the next. The second function, printWordList(), would call printAllWords() and just print out the list of words created.
I've started with some code, trying to slowly get to where I want, but at the moment when I use the command "list" (the command for the new functions) my code only gives me the parent nodes c and s, as follows: cs.
How can I go beyond just the first node of each word and get the first word in the prefix tree, "cat"?
My Header File:
#ifndef __PREFIX_TREE_H
#define __PREFIX_TREE_H
#include <iostream>
using namespace std;
const int ALPHABET_SIZE = 26;
class PrefixTreeNode;
/*
Prefix tree
Stores a collection of strings as a tree
*/
class PrefixTree
{
private:
PrefixTreeNode* root;
public:
//Constructs an empty prefix tree
PrefixTree();
//Copy constructor
PrefixTree(const PrefixTree&);
//Copy assignment
const PrefixTree& operator=(const PrefixTree&);
//Utility func: checks whether all characters in str are letters
bool isAllLetters(const string&) const;
//Returns the root of the prefix tree
PrefixTreeNode* getRoot() { return root; };
//Returns the root of the prefix tree
const PrefixTreeNode* getRoot() const { return root; };
//Returns whether or not the given word belongs to the prefixtree
bool contains(const string&) const;
//Adds the given word to the prefix tree
void addWord(const string&);
//Prints all of the words in the prefix tree
void printWordList() const;
//Destructor
~PrefixTree();
};
/*
Node of a prefix tree
*/
class PrefixTreeNode
{
friend PrefixTree;
private:
char c;
bool final;
PrefixTreeNode* link[ALPHABET_SIZE];
public:
//Constructs a new node
PrefixTreeNode();
//Copy constructor
PrefixTreeNode(const PrefixTreeNode&);
//Copy assignment
const PrefixTreeNode& operator=(const PrefixTreeNode&);
//Returns the character this node contains
char getChar() const { return c; }
//Returns whether this node is the end of a word
bool isFinal() const { return final; }
//Changes whether this node is the end of a word
void setFinal(bool b) { final = b; }
//Returns the node corresponding to the given character
PrefixTreeNode* getChild(char);
//Returns the node corresponding to the given character
const PrefixTreeNode* getChild(char) const;
//Adds a child corresponding to the given character
void addChild(char);
//Removes the child corresponding to the given character
void deleteChild(char);
//print all words that end at or below this PrefixTreeNode
void printAllWords() const;
//Destructor
~PrefixTreeNode();
};
ostream& operator<<(ostream&, const PrefixTree&);
ostream& operator<<(ostream&, const PrefixTreeNode&);
#endif
My Source File functions:
void PrefixTreeNode::printAllWords() const{
for (char c = 'a'; c < 'z' + 1; c++)
{
if (this->getChild(c) == nullptr)
continue;
this->getChild(c);
cout << c;
}
}
//Calls all words
void PrefixTree::printWordList() const{
PrefixTreeNode* node = root;
node->printAllWords();
}
PrefixTreeNode* PrefixTreeNode::getChild(char c)
{
if (isalpha(c))
return link[tolower(c)-'a'];
else
return nullptr;
}
void PrefixTree::addWord(const string& str)
{
PrefixTreeNode* node = root;
for (int i = 0; i < str.size(); i++)
{
if (node->getChild(str[i]) == nullptr)
node->addChild(str[i]);
node = node->getChild(str[i]);
}
node->setFinal(true);
}

We use recursion to print all the stored strings in the tree in order. Call the function from main using printAllWords(root, ""). If current is nullptr, we return. If current->final is true, we print the word. Then we append the current character to word, loop through all its children and call printAllWords for each of them.
The same will happen for every node.
void printAllWords(Node* current, std::string word)
{
if (current == nullptr)
return;
if (current->final)
std::cout << (word+current->c) << std::endl;
for (int i = 0; i < ALPHABET_SIZE; ++i)
printAllWords(current->link[i], word + current->c);
}
Edit: I must confess I'm not sure what c is for in the tree node. If you construct the trie so that a non-null child at index i means the letter 'a' + i continues one or more words (for example, a non-null 2nd child means b is on the path of some word), the letter can be derived from the index. The following code should make it clear:
// Recursive helper: word already includes the letter that leads to current.
void printAllWords(Node* current, std::string word)
{
if (current == nullptr)
return;
if (current->final)
std::cout << word << std::endl;
for (int i = 0; i < ALPHABET_SIZE; ++i)
printAllWords(current->link[i], word + (char)(i + 'a'));
}
// Entry point: start from each child of the root.
void printAllWords(Node* root)
{
std::string word = "";
for (int i = 0; i < ALPHABET_SIZE; ++i)
printAllWords(root->link[i], word + (char)(i + 'a'));
}
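For completeness, here is a minimal self-contained sketch of the same idea that can be compiled on its own. It is not the asker's class: Node, addWord, collectWords and listWords are hypothetical names, input is assumed to be lowercase a-z only, and the words are collected into a vector instead of being printed so the result is easy to check. It uses the push_back()/pop_back() approach described in the question.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

constexpr int kAlphabetSize = 26;

// Hypothetical minimal node: the letter is implied by the slot index.
struct Node {
    bool isFinal = false;
    std::unique_ptr<Node> link[kAlphabetSize];
};

// Inserts a word; assumes it contains only lowercase a-z.
void addWord(Node& root, const std::string& word) {
    Node* node = &root;
    for (char ch : word) {
        assert(ch >= 'a' && ch <= 'z');
        std::unique_ptr<Node>& child = node->link[ch - 'a'];
        if (!child) child = std::make_unique<Node>();
        node = child.get();
    }
    node->isFinal = true;
}

// Depth-first walk. Visiting slots 0..25 in order yields alphabetical output.
void collectWords(const Node& node, std::string& prefix,
                  std::vector<std::string>& out) {
    if (node.isFinal) out.push_back(prefix);
    for (int i = 0; i < kAlphabetSize; ++i) {
        if (node.link[i]) {
            prefix.push_back(static_cast<char>('a' + i));  // descend
            collectWords(*node.link[i], prefix, out);
            prefix.pop_back();                             // backtrack
        }
    }
}

// Returns every stored word in alphabetical order.
std::vector<std::string> listWords(const Node& root) {
    std::vector<std::string> out;
    std::string prefix;
    collectWords(root, prefix, out);
    return out;
}
```

Passing the prefix by reference and undoing each push_back() with pop_back() avoids copying the string at every level, which is the main difference from the by-value version above.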

Related

Why is this C++ Trie implementation showing odd behaviour?

I implemented this class to create a trie data structure. The functions
unsigned long Insert(string) //inserts the string in trie & return no of words in trie
void PrintAllWords(); // prints all words in trie separated by space in dictionary order
work correctly and print all the words inserted from a text file of English dictionary words when the number of words is not very large, but when supplied with a file of some 350k words the program only prints a b c d up to z.
private variables
struct TrieTree
{
std::map<char,struct TrieTree*> map_child;
std::map<char,unsigned long> map_count; //keeps incrementing count of char in map during insertion.
bool _isLeaf=false; // this flag is set true at node where word ends
};
struct TrieTree* _root=NULL;
unsigned long _wordCount=0;
unsigned long _INITIALIZE=1;
Below is complete implementation with driver program. The program is executable.
#include<iostream>
#include<map>
#include<fstream>
class Trie
{
private:
struct TrieTree
{
std::map<char,struct TrieTree*> map_child;
std::map<char,unsigned long> map_count;
bool _isLeaf=false;
};
struct TrieTree* _root=NULL;
unsigned long _wordCount=0;
unsigned long _INITIALIZE=1;
struct TrieTree* getNode()
{
return new TrieTree;
};
void printWords(struct TrieTree* Tptr,std::string pre)
{
if(Tptr->_isLeaf==true)
{
std::cout<<pre<<" ";
return;
}
std::map<char,struct TrieTree*>::iterator it;
it=Tptr->map_child.begin();
while(it!=Tptr->map_child.end())
{
pre.push_back(it->first);
printWords(it->second,pre);
pre.erase(pre.length()-1); //erase last prefix character
it++;
}
}
public:
Trie()
{
_root=getNode();
}
unsigned long WordCount()
{
return _wordCount;
}
unsigned long WordCount(std::string pre) //count words with prefix pre
{
if(WordCount()!=0)
{
struct TrieTree *Tptr=_root;
std::map<char,unsigned long>::iterator it;
char lastChar;
for(int i=0;i<pre.length()-1;i++)
{
Tptr=Tptr->map_child[pre[i]];
}
lastChar=pre[pre.length()-1];
it=Tptr->map_count.find(lastChar);
if(it!=Tptr->map_count.end())
{
return Tptr->map_count[lastChar];
}
else
{
return 0;
}
}
return 0;
}
unsigned long Insert(std::string key) //return word count after insertion
{
struct TrieTree *Tptr =_root;
std::map<char,struct TrieTree*>::iterator it;
if(!SearchWord(key))
{
for(int level=0;level<key.length();level++)
{
it=Tptr->map_child.find(key[level]);
if(it==Tptr->map_child.end())
{
//alphabet does not exist in map
Tptr->map_child[key[level]]=getNode(); // new node with value pointing to it
Tptr->map_count[key[level]] = _INITIALIZE;
Tptr=Tptr->map_child[key[level]]; //assign pointer to newly obtained node
if(level==key.length()-1)
Tptr->_isLeaf=true;
}
else
{ //alphabet exists at this level
Tptr->map_count[key[level]]++;
Tptr=Tptr->map_child[key[level]];
}
}
_wordCount++;
}
return _wordCount;
}
bool SearchWord(std::string key)
{
struct TrieTree *Tptr =_root;
std::map<char,struct TrieTree*>::iterator it;
for(int level=0;level<key.length();level++)
{
it=Tptr->map_child.find(key[level]);
// cout<<" "<<Tptr->map_child.size()<<endl; //test to count entries at each map level
if(it!=Tptr->map_child.end())
{
Tptr=Tptr->map_child[key[level]];
}
else
{
return false;
}
}
if(Tptr->_isLeaf==true)
return true;
return false;
}
void PrintAllWords()
{ //print all words in trie in dictionary order
struct TrieTree *Tptr =_root;
if(Tptr->map_child.empty())
{
std::cout<<"Trie is Empty"<<std::endl;
return;
}
printWords(Tptr,"");
}
void PrintAllWords(std::string pre)
{ //print all words in trie with prefix pre in Dictionary order
struct TrieTree *Tptr =_root;
if(Tptr->map_child.empty())
{
std::cout<<"Trie is Empty"<<std::endl;
return;
}
for(int i=0;i<pre.length();i++)
{
Tptr=Tptr->map_child[pre[i]];
}
printWords(Tptr,pre);
}
};
int main(){
Trie t;
std::string str;
std::fstream fs;
fs.open("words.txt",std::ios::in);
while(fs>>str){
t.Insert(str);
}
t.PrintAllWords();
return 0;
}
I don't understand the output, please take a look at the code and suggest a fix. Thanks
When you add the word "a", if there is no word starting with 'a' in the tree, you will add a "leaf" node with 'a' as the value. If you then add a word starting with 'a', such as "an", you will add the 'n' node as a child of the 'a' node. However, when you print all the words, you stop recursing when you hit a leaf node, meaning you ignore all the other words starting with that word.
Simple solution: remove the return from printWords.
Similarly if you already have "an" in the tree, when you add 'a', you don't mark it as a leaf, so it will never be output.
Simple solution: set _isLeaf when adding a word even if the node already exists (i.e. also set Tptr->_isLeaf=true; at the last character in the else clause of Insert).
I think you would be better off changing _isLeaf to something like _isWord, as it seems odd to have leaf nodes with child items.
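A minimal sketch of both fixes together, assuming a simplified node rather than the class from the question (TrieNode, insert and collect are hypothetical names, and the per-character count map is left out). The flag is set on the last node of every inserted word whether or not that node already existed, and the traversal records a word without returning early.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical simplified node; isWord plays the role of _isLeaf.
struct TrieNode {
    std::map<char, std::unique_ptr<TrieNode>> children;
    bool isWord = false;
};

void insert(TrieNode& root, const std::string& key) {
    TrieNode* node = &root;
    for (char ch : key) {
        std::unique_ptr<TrieNode>& child = node->children[ch];
        if (!child) child = std::make_unique<TrieNode>();
        node = child.get();
    }
    node->isWord = true;  // set even when the node already existed
}

// std::map iterates in key order, so words come out in dictionary order.
void collect(const TrieNode& node, std::string& prefix,
             std::vector<std::string>& out) {
    if (node.isWord) out.push_back(prefix);  // record, but keep descending
    for (const auto& entry : node.children) {
        assert(entry.second != nullptr);
        prefix.push_back(entry.first);
        collect(*entry.second, prefix, out);
        prefix.pop_back();
    }
}
```

With this shape, inserting "an" before "a" or "a" before "an" gives the same result, because neither order can hide a word behind another.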

Seg fault at the specified line: Hash table insert/search functions

I am getting an EXC_BAD_ACCESS error. I'm trying to insert words into a hash table and am using separate chaining. Here is my class Hash.h, which contains a nested class wordData to store the word and the page numbers the word appears on:
class Hash
{
private:
class wordData
{
public:
string word;
vector < int >pageNum;
wordData *nextWord;
// Initializing the next pointer to null in the constructor
wordData()
{
nextWord = nullptr;
}
// Constructor that accepts a word and pointer to next word
wordData(string word, wordData * nextWord)
{
this->word = word;
this->nextWord = nextWord;
}
// Getting and setting the next linked word
wordData *getNext()
{
return nextWord; //-------------------> BAD_ACCESS ERROR
}
void setNext(wordData * newInfo)
{
nextWord = newInfo;
}
// Setting info for the word node.
void setInfo(string & w, int pNum)
{
this->word = w;
this->pageNum.push_back(pNum);
}
// ******************* Gives a thread-bad access error************************
string getWord()
{
return word;
}
void addPageNums(int x)
{
this->pageNum.push_back(x);
}
};
private:
// Head to point to the head node of the linked list for a particular word
wordData ** head;
int size;
int *bucketSize;
int totalElements;
public:
// Class hash function functions
Hash();
// Function to calculate bucket number based on string passed
int hashFunction(string key);
// search if word is present
bool Search(string);
// Insert word
void Insert(string, int);
int bucketNumberOfElements(int index);
};
#endif /* Hash_h */
After running the debugger I found the value of nextWord to be 0x00000000000 which I understand is not the same as nullptr but is due to a NULL assignment although I can't seem to figure out where and why. I haven't included the Hash.cpp file because I think there is an obvious pointer manipulation that I'm doing wrong in the .h file.
Any help will be appreciated. Thanks.

Logic flaw in trie search

I'm currently working on a trie implementation for practice and have run into a mental roadblock.
The issue is with my search function. I want my trie to be able to retrieve a list of strings for a supplied prefix after they are loaded into the program's memory.
I also understand I could be using a queue, shouldn't use C functions in C++, etc. This is just a 'rough draft', so to speak.
This is what I have so far:
bool SearchForStrings(vector<string> &output, string data)
{
Node *iter = GetLastNode("an");
Node *hold = iter;
stack<char> str;
while (hold->visited == false)
{
int index = GetNextChild(iter);
if (index > -1)
{
str.push(char('a' + index));
//current.push(iter);
iter = iter->next[index];
}
//We've hit a leaf so we want to unwind the stack and print the string
else if (index < 0 && IsLeaf(iter))
{
iter->visited = true;
string temp("");
stringstream ss;
while (str.size() > 0)
{
temp += str.top();
str.pop();
}
int i = 0;
for (std::string::reverse_iterator it = temp.rbegin(); it != temp.rend(); it++)
ss << *it;
//Store the string we have
output.push_back(data + ss.str());
//Move our iterator back to the root node
iter = hold;
}
//We know this isnt a leaf so we dont want to print out the stack
else
{
iter->visited = true;
iter = hold;
}
}
return (output.size() > 0);
}
int GetNextChild(Node *s)
{
for (int i = 0; i < 26; i++)
{
if (s->next[i] != nullptr && s->next[i]->visited == false)
return i;
}
return -1;
}
bool IsLeaf(Node *s)
{
for (int i = 0; i < 26; i++)
{
if (s->next[i] != nullptr)
return false;
}
return true;
}
struct Node{
int value;
Node *next[26];
bool visited;
};
The code is too long or I'd post it all. GetLastNode() retrieves the node at the end of the data passed in, so if the prefix was 'su' and the string was 'substring', the node would point to the 'u', to be used as an artificial root node.
(might be completely wrong... just typed it here, no testing)
something like:
First of all, we need a way of indicating that a node represents an entry.
So let's have:
struct Node{
int value;
Node *next[26];
bool entry;
};
I've removed your visited flag because I don't have a use for it.
You should modify your insert/update/delete functions to support this flag. If the flag is true it means there's an actual entry up to that node.
Now we can modify the isLeaf function:
bool isLeaf(Node *s) {
return s->entry;
}
Meaning that we consider a node a leaf when there's an entry... perhaps the name is wrong now, as the leaf might have children (with "any" and "anywhere" stored, the "y" node is a leaf, but it has children).
Now for the search:
First a public function that can be called.
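// Forward declaration of the recursive overload defined below.
bool searchForStrings(Node *node, std::vector<std::string> &output, const std::string &key);
bool searchForStrings(std::vector<std::string> &output, const std::string &key) {
// start the recursion
// theTrieRoot is the root node for the whole structure
return searchForStrings(theTrieRoot,output,key);
}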
Then the internal function that will be used for recursion.
bool searchForStrings(Node *node, std::vector<std::string> &output, const std::string &key) {
if(key.empty()) {
if(isLeaf(node)) {
// the whole key is matched and this node is an entry - add an empty string.
output.push_back(std::string());
}
// Key is empty, collect all child nodes.
for (int i = 0; i < 26; i++)
{
if (node->next[i] != nullptr) {
std::vector<std::string> partial;
searchForStrings(node->next[i],partial,key);
// so we got a list of the children,
// add the key of this node to them.
for(const auto &s:partial) {
output.push_back(std::string(1,char('a'+i))+s);
}
}
} // end for
} // end if key.empty
else {
// key is not empty, try to get the node for the
// first character of the key.
int c=key[0]-'a';
if(c<0 || c>=26) {
// first character was not a letter.
return false;
}
if(node->next[c]==nullptr) {
// no match (no node where we expect it)
return false;
}
// recurse into the node matching the key
std::vector<std::string> partial;
searchForStrings(node->next[c],partial,key.substr(1));
// add the key of this node to the result
for(const auto &s:partial) {
output.push_back(std::string(1,key[0])+s);
}
}
// provide a meaningful return value
return !output.empty();
}
And the execution for "an" search is.
Call searchForStrings(root,[],"an")
root is not leaf, key is not empty. Matched next node keyed by "a"
Call searchForStrings(node(a),[],"n")
node(a) is not leaf, key is not empty. Matched next node keyed by "n"
Call searchForStrings(node(n),[],"")
node(n) is not leaf, key is empty. Need to recurse on all non-null children:
Call searchForStrings(node(s),[],"")
node(s) is not leaf, key is empty. Need to recurse on all non-null children:
... eventually we will reach node(r) which is a leaf node, so it will return [""], and going back up it becomes ["r"] -> ["er"] -> ["wer"] -> ["swer"]
Call searchForStrings(node(y),[],"")
node(y) is leaf (add "" to the output), key is empty,
recurse, we will get ["time"]
we will return ["","time"]
At this point we will add the "y" to get ["y","ytime"]
And here we will add the "n" to get ["nswer","ny","nytime"]
Adding the "a" to get ["answer","any","anytime"]
we're done
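As an alternative to building partial lists and prepending a character at every level, the walk above can also be sketched by first descending to the node for the prefix and then collecting everything below it. This is only a sketch under assumptions: it uses std::unique_ptr children instead of raw pointers, lowercase a-z input, and hypothetical helpers insert, findNode and collect that are not part of the original code.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct Node {
    std::unique_ptr<Node> next[26];
    bool entry = false;  // true if a stored word ends at this node
};

// Inserts a word; assumes it contains only lowercase a-z.
void insert(Node& root, const std::string& word) {
    Node* node = &root;
    for (char ch : word) {
        assert(ch >= 'a' && ch <= 'z');
        std::unique_ptr<Node>& child = node->next[ch - 'a'];
        if (!child) child = std::make_unique<Node>();
        node = child.get();
    }
    node->entry = true;
}

// Walks down the prefix; returns nullptr if no stored word starts with it.
const Node* findNode(const Node& root, const std::string& prefix) {
    const Node* node = &root;
    for (char ch : prefix) {
        int c = ch - 'a';
        if (c < 0 || c >= 26 || !node->next[c]) return nullptr;
        node = node->next[c].get();
    }
    return node;
}

// Appends every word at or below node, in alphabetical order.
void collect(const Node& node, std::string& word,
             std::vector<std::string>& output) {
    if (node.entry) output.push_back(word);
    for (int i = 0; i < 26; ++i) {
        if (node.next[i]) {
            word.push_back(static_cast<char>('a' + i));
            collect(*node.next[i], word, output);
            word.pop_back();
        }
    }
}

bool searchForStrings(const Node& root, std::vector<std::string>& output,
                      const std::string& key) {
    const Node* start = findNode(root, key);
    if (start == nullptr) return false;
    std::string word = key;  // results are full words, prefix included
    collect(*start, word, output);
    return !output.empty();
}
```

No visited flag is needed here, since the recursion itself remembers where it is in the tree.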

Calling new on object with pointer to of same type, seems to allocate memory to pointer

I'm trying to implement a Trie data structure on my own, without looking at other implementations, so simply based on my conceptual knowledge of the structure. I would like to avoid using vectors, simply because they are easy to use... I prefer to use pointers for dynamically allocating memory for arrays when I'm programming as practice. That said, with the structure that I currently have, I have a Node class that contains a pointer to a Node array, a letter (bool), and a marker (bool). My Trie class has a pointer to the starting Node array. Each node array has 26 elements to refer to each letter of the English alphabet from 'a' to 'z' lowercase (I convert each word inserted to lowercase). When a letter is set to 'true' then its letterArray is allocated new memory. Node has a constructor to set letter and marker to false and letterArray to nullptr. I can insert the first letter no problem and go to the next letterArray (which is nullptr at this point) after which memory is allocated to the new array. The problem is, the next letterArray of each Node is also allocated memory, but the constructor is not called on them, resulting in their letter and marker containing garbage, and I'm wondering what is the reason the constructor is not called? Hopefully the code will make this more clear:
class Node {
private:
bool letter;
bool marker;
Node* letterArray;
void initNode();
public:
Node();
bool setLetter(bool set);
bool setMarker(bool set);
bool checkLetter();
bool checkMarker();
char getLetter();
Node*& getNextLetterArray();
};
class Trie {
private:
Node* start;
int wordCount;
int letterCount;
const int totalLetters = 26;
void destroyTrie();
bool initBranch(Node*& nextBranch);
void insertCharAndMove(Node*& ptr, int, int, int);
public:
Trie();
Trie(string firstWord);
~Trie();
bool insertWord(string word);
bool deleteWord(string word);
bool getToLetter(char letter);
string getLowerCase(string word);
bool wordExists(string word);
};
insertWord:
bool Trie::insertWord(string word) {
Node* ptr = start;
string wordLower = getLowerCase(word);
int wordLength = word.length();
if (wordLength <= 0) return false;
for (int i = 0; i < wordLength; i++) {
int charIndex = (word[i] - 'a');
insertCharAndMove(ptr, charIndex, wordLength, i);
}
wordCount++;
return true;
}
void Trie::insertCharAndMove(Node*& ptr, int charIndex, int wordLength, int i) {
if (ptr[charIndex].setLetter(true)) letterCount++;
if (i < wordLength) {
ptr = ptr[i].getNextLetterArray();
initBranch(ptr);
}
else ptr[i].setMarker(true);
}
initBranch:
bool Trie::initBranch(Node*& nextBranch) {
if (nextBranch != nullptr) return false;
nextBranch = new Node[letterCount];
return true;
}
Trie Constructor:
Trie::Trie() {
start = new Node[totalLetters];
wordCount = 0;
letterCount = 0;
}
Node Constructor:
Node::Node() {
initNode();
}
void Node::initNode() {
letter = false;
marker = false;
letterArray = nullptr;
}
getNextLetterArray:
Node*& Node::getNextLetterArray() {
return letterArray;
}
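A small sketch may help narrow this down. new Node[n] default-constructs all n elements, so garbage values usually mean the code is reading outside the allocated array. Two details in the code above look suspicious, though this is a reading of the snippet rather than a tested diagnosis: initBranch allocates letterCount elements rather than totalLetters, and insertCharAndMove indexes with i (the position in the word) where charIndex seems intended. The sketch below uses a hypothetical simplified Node with public members, always allocates 26 elements, and indexes by letter.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

constexpr int kTotalLetters = 26;

// Hypothetical simplified node with public members for brevity.
struct Node {
    bool letter = false;
    bool marker = false;
    Node* letterArray = nullptr;

    Node() = default;
    Node(const Node&) = delete;             // owning raw pointer: no copies
    Node& operator=(const Node&) = delete;
    ~Node() { delete[] letterArray; }       // frees the whole subtree
};

// Allocates the branch only if it is missing. Every element of the new
// array is default-constructed, so letter and marker start out false.
bool initBranch(Node*& branch) {
    if (branch != nullptr) return false;
    branch = new Node[kTotalLetters];
    return true;
}

// Inserts a word; assumes it contains only lowercase a-z.
bool insertWord(Node* start, const std::string& word) {
    if (word.empty()) return false;
    Node* ptr = start;
    for (std::size_t i = 0; i < word.size(); ++i) {
        int charIndex = word[i] - 'a';
        assert(charIndex >= 0 && charIndex < kTotalLetters);
        ptr[charIndex].letter = true;
        if (i + 1 < word.size()) {
            initBranch(ptr[charIndex].letterArray);  // index by letter
            ptr = ptr[charIndex].letterArray;
        } else {
            ptr[charIndex].marker = true;            // last letter of the word
        }
    }
    return true;
}
```

The caller is assumed to create the first array with new Node[kTotalLetters] and release it with delete[], which then frees the rest through the destructor.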

find a node in a tree and replace with a new node with update private members

I am a newbie trying to learn C++. I am writing a program that counts how many times a word occurs in a text file.
My program stores elements of the class word in a bintree. The word class has two private members: the string representing the word from the text file, and the count. If a word already exists I have to increment the count by one.
class word {
private:
string myWord;
int count;
public:
word(): myWord(""), count(1)
{
}
word(string input): myWord(input), count(1)
{
}
<overload operators>
<some methods>
void addCount(int oldCount)
{
count += oldCount;
}
int getCount()
{
return count;
}
};
Then, in a function that will be called from main, I am trying to find out whether the word already exists and add to the count:
void removeSeparators(string input, bintree<word> &tree, int &count)
{
removeDot(input);
word * pword;
const word * currentWord;
int currCount = 0;
<use tokenizer to separate each word>
// if the tree find the word
if(tree.find(*pword) != NULL) {
//get the current word
currentWord = tree.find(*pword);
//get the current count of the word
currCount = currentWord -> getCount(); <--- ERROR line 175
pword -> addCount(currCount);
//erase the old node
tree.erase(*currentWord);
//insert new node
tree.insert(*pword);
//this is the total count of words
count++; }
if(tree.find(*pword) == NULL) { tree.insert(*pword); count++; }
<bit more code for resetting tokenizer>
}
This is the error I have : countWords.cpp: In function ‘void removeSeparators(std::string, bintree<word>&, int&)’:
countWords.cpp:175: error: passing ‘const word’ as ‘this’ argument of ‘int word::getCount()’ discards qualifiers
My problem is that the find method in tree is like below and I can't change it:
const dataType* find(const dataType &findData) const
{
// this function looks for findData in the tree.
// If it finds the data it will return the address of the data
// in the tree. otherwise it will return NULL
if (root == NULL) return NULL;
else return root->find(findData);
}
How can I access the 'old' count of the word and increase it by one? Am I on the right track at least?
Thank you for your help!
Your getCount method should be declared const:
int getCount() const
{
...
}
This allows it to be called on const objects (such as the one currentWord points to). If a method does not alter a class's data, you should generally declare it const. This gives you more flexibility to use the const qualifier appropriately throughout your program.
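A short sketch of the effect, using a cut-down version of the word class (the countOf helper is hypothetical and only stands in for code that holds a const word*, like the pointer returned by find):

```cpp
#include <cassert>
#include <string>
#include <utility>

class word {
private:
    std::string myWord;
    int count;
public:
    explicit word(std::string input) : myWord(std::move(input)), count(1) {}
    void addCount(int oldCount) { count += oldCount; }  // modifies: not const
    int getCount() const { return count; }              // read-only: const
};

// Hypothetical helper: compiles only because getCount() is const.
int countOf(const word* w) {
    assert(w != nullptr);
    return w->getCount();
}
```

Calling addCount through the same const pointer would still be rejected by the compiler, which is why the question's approach of updating a separate, non-const word object is needed for the increment.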