Trie data structure using class C++ - c++

I am trying to implement a trie data structure in C++ using a class. In the TrieNode class I have a TrieNode *children[26]; array and an isEndOfWord boolean that marks the end of a word. The same class also has getters and setters, plus insert and search functions.
Whenever I add a new word, insert sets isEndOfWord to true on the node for the word's last letter. But the search function does not detect the end of the word. Please guide me, as I am new to this data structure, and please comment on the way I write the code and on the appropriate (professional) way to write it, if you are interested. Thanks!
#include<cstdio>
#include<iostream>
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
using namespace std;
class TrieNode{
private:
TrieNode *children[26];
bool isEndOfWord;
public:
TrieNode(){
for(int i = 0; i < 26; i++){
children[i] = NULL;
}
isEndOfWord = false;
}
bool checkNull(char temp){
cout<<"\nIncheckNULL "<<temp<<" "<<(temp - 'a')<<" \n";
if(children[temp - 'a'] == NULL){
return true;
}
else{
return false;
}
}
void setNode(char temp){
cout<<"Setting node \n";
children[temp - 'a'] = new TrieNode();
}
TrieNode *getNode(char temp){
return children[temp - 'a'];
}
void setEndWord(){
this->isEndOfWord = true;
}
bool getEndWord(){
return this->isEndOfWord;
}
void insert(TrieNode*, string);
bool search(TrieNode*, string);
};
void TrieNode::insert(TrieNode *root, string key){
TrieNode *crawl = root;
//cout<<"key is "<<key<<endl;
int length = sizeof(key)/sizeof(key[0]);
//cout<<"find length\n";
for(int i = 0; key[i] != '\0'; i++){
cout<<"TEST null check key is "<<key[i]<<endl;
if(crawl->checkNull(key[i])){
cout<<"null check key is "<<key[i]<<endl;
crawl->setNode(key[i]);
crawl = crawl->getNode(key[i]);
if(key[i + 1] == '\0'){
cout<<"In setting end word\n";
if(crawl->getEndWord()){
cout<<"Word already exists";
}
else{
crawl->setEndWord();
cout<<"End word setted "<<crawl->getEndWord()<<endl;
}
}
}
else{
if(key[i + 1] == '\0'){
cout<<"In setting end word\n";
if(crawl->getEndWord()){
cout<<"Word already exists";
}
else{
crawl->setEndWord();
cout<<"End word setted\n";
}
}
else{
crawl = crawl->getNode(key[i]);
}
}
}
}
bool TrieNode::search(TrieNode *root, string key){
TrieNode *crawl = root;
cout<<"key is "<<key<<endl;
cout<<"\n In search\n";
int length = sizeof(key)/sizeof(key[0]);
for(int i = 0; key[i] != '\0'; i++){
if(crawl->checkNull(key[i])){
cout<<"INside search checknull"<<endl;
cout<<"Word does not exists"<<"sorry"<<endl;
break;
}
else{
cout<<"IN each character getting getEndWord "<<crawl->getEndWord()<<endl;
if(key[i + 1] == '\0'){
if(crawl->getEndWord()){
cout<<"Word Exists";
}
else{
cout<<"Word does not exists"<<"sorry"<<endl;
break;
}
}
else{
crawl = crawl->getNode(key[i]);
}
}
}
}
int main(){
TrieNode *root = new TrieNode();
cout<<"starting"<<endl;
root->insert(root, "hello");
cout<<"first added"<<endl;
root->insert(root, "anna");
root->insert(root, "anni");
cout<<"words added"<<endl;
root->search(root, "hello");
root->search(root, "anny");
}

Your insert and search functions can be simplified a bit.
Consider this. (Read the comments in the below code, they illustrate what the code does)
void TrieNode::insert(TrieNode *root, string key){
TrieNode *crawl = root;
if (!crawl) {
// No trie to insert into; a node created here would never reach the caller
return;
}
cout << "Adding " << key << " to the trie" << endl;
for (size_t index = 0; index < key.length(); ++index) {
char key_char = key[index];
if(crawl -> checkNull(key_char)){
// If a node representing the char does not exist then make it
crawl -> setNode(key_char);
}
crawl = crawl -> getNode(key_char);
if (index == key.length() - 1) {
// We are at the last character, time to mark an end of word
crawl -> setEndWord();
}
}
}
bool TrieNode::search(TrieNode *root, string key){
TrieNode *crawl = root;
if (!crawl) {
cout << "Trie is empty!" << endl;
return false;
}
cout << "Searching for " << key << " in the trie" << endl;
for (size_t index = 0; index < key.length(); ++index) {
char key_char = key[index];
if(crawl -> checkNull(key_char)){
cout << "Key is not in the trie" << endl;
return false;
}
crawl = crawl -> getNode(key_char);
if (index == key.length() - 1) {
if (!(crawl -> getEndWord())) {
cout << "Word is physically present in trie, but not present as a distinct word" << endl;
return false;
} else {
return true;
}
}
}
// Only reached when key is empty, since the loop body never runs
cout << "Code should not reach here" << endl; // IMO throw an exception I guess
return false;
}
Take advantage of the power of C++ std::string.
Also, your whole temp - 'a' logic is a bit iffy to me. I wouldn't muck around with ASCII values unless I needed to.
Why are you including a whole bunch of C headers? iostream (plus string) covers everything this code uses.
if(!ptr) is a much more natural way to check for NULL.
In production, don't use using namespace std; instead, just prefix things like cout and endl with std::. The reason is to avoid dumping every name from std into the global namespace, where it can clash with your own names.
Read a good C++ OOP book :). It will help you a lot.
Also I lol'd at anna and anni. Your anna and anni must be proud to be in your trie :D
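One way to sidestep the temp - 'a' arithmetic is to key the children by the character itself. The following is only a minimal sketch under that assumption: MapTrie and its nested Node are hypothetical names, not part of the question's code, and std::map is just one possible choice of container.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical trie whose children are keyed by character, so no
// "c - 'a'" index arithmetic is needed and any char value is accepted.
class MapTrie {
    struct Node {
        std::map<char, std::unique_ptr<Node>> children;
        bool isEndOfWord = false;
    };
    Node root;

public:
    void insert(const std::string& key) {
        Node* crawl = &root;
        for (char c : key) {
            // operator[] creates an empty (null) slot if the char is new
            std::unique_ptr<Node>& child = crawl->children[c];
            if (!child) {
                child = std::make_unique<Node>();
            }
            crawl = child.get();
        }
        crawl->isEndOfWord = true;  // mark the node of the last character
    }

    bool search(const std::string& key) const {
        const Node* crawl = &root;
        for (char c : key) {
            auto it = crawl->children.find(c);
            if (it == crawl->children.end()) {
                return false;  // path breaks off before the key ends
            }
            crawl = it->second.get();
        }
        return crawl->isEndOfWord;  // a mere prefix is not a stored word
    }
};
```

The trade-off is a tree lookup per character instead of a direct array index, which is usually acceptable for a small dictionary but is worth measuring if speed matters.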

There are many things I'd give you feedback on, but this isn't a code review site, it's for specific questions. I'll point out briefly a few things I notice though:
1) don't include C headers; use c++ ones instead.
2) what type is string?
3) you compute length (incorrectly, assuming answer to question 2 is "the standard c++ string class"), but you don't use it.
4) search() returns a bool but you don't return anything. When you find the end of a word, you should return from the function.
5) search() calls crawl->checkNull() at the top of the for loop without ensuring that crawl itself is not null. After crawl = crawl->getNode(key[i]); it could be null, but then you loop and go through the pointer without testing it.
6) setNode is a public function, and unconditionally overwrites whatever is in the slot for the given character. You can clobber an existing child if someone calls it with the same character twice, which leaks the old node (and probably loses data in your tree).
7) search doesn't need to be a member of TrieNode. In fact, it doesn't access any data through "this". You probably don't want the TrieNode to be public at all, but an internal implementation detail of Trie, which is where the search function should live, and where the root should be stored and managed.
8) in c++ use nullptr instead of NULL
9) Looks like you need to debug search(), because it is not on the last letter when you check for end of word.
10) you need a destructor and need to deallocate your nodes. Or store them in unique_ptr<> for automatic deletion when your object goes out of scope.
11) don't "using namespace std;" in headers. It makes your headers toxic to include in my code.
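Several of the points above (7, 8, 10 and the clobbering issue in 6) can be seen together in one place. This is a rough sketch rather than a definitive design: the Trie class, its private Node, and the indexOf helper are hypothetical names, and rejecting anything outside 'a'..'z' is an assumed policy that the question does not specify.

```cpp
#include <array>
#include <memory>
#include <string>

// Hypothetical Trie wrapper: the node type is a private detail, the root is
// owned by the Trie, and children are released automatically.
class Trie {
    struct Node {
        std::array<std::unique_ptr<Node>, 26> children;  // all start as nullptr
        bool isEndOfWord = false;
    };
    Node root;

    // Maps 'a'..'z' to 0..25; returns -1 for anything else (assumed policy).
    static int indexOf(char c) {
        return (c >= 'a' && c <= 'z') ? c - 'a' : -1;
    }

public:
    // Returns false, leaving the trie unchanged, if the key contains a
    // character outside 'a'..'z'.
    bool insert(const std::string& key) {
        for (char c : key) {
            if (indexOf(c) < 0) return false;
        }
        Node* crawl = &root;
        for (char c : key) {
            std::unique_ptr<Node>& child = crawl->children[indexOf(c)];
            if (!child) child = std::make_unique<Node>();  // never clobbers an existing child
            crawl = child.get();
        }
        crawl->isEndOfWord = true;
        return true;
    }

    bool search(const std::string& key) const {
        const Node* crawl = &root;
        for (char c : key) {
            int i = indexOf(c);
            if (i < 0 || !crawl->children[i]) return false;
            crawl = crawl->children[i].get();
        }
        return crawl->isEndOfWord;
    }
};
```

Because every child is held in a std::unique_ptr, no hand-written destructor is needed; destroying the Trie releases the whole tree.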

The insert and search functions are a mess.
They use rather contrived ways to check the end of the string, duplicated unnecessarily and with a bug in one of the branches.
Here are simpler versions.
They use string size for the loop bounds, and the actions needed at the end of the loop are made after the loop, which is more natural.
void TrieNode::insert(TrieNode *root, string key){
TrieNode *crawl = root;
for(int i = 0; i < (int) (key.size()); i++){
if(crawl->checkNull(key[i])){
crawl->setNode(key[i]);
}
crawl = crawl->getNode(key[i]);
}
crawl->setEndWord();
}
bool TrieNode::search(TrieNode *root, string key){
TrieNode *crawl = root;
for(int i = 0; i < (int) (key.size()); i++){
if(crawl->checkNull(key[i])){
return false;
}
crawl = crawl->getNode(key[i]);
}
return crawl->getEndWord();
}
I used the same style, but omitted the debug outputs for readability.
Also, the code did not actually use search as a function, it didn't return a value.
Instead, it relied on debug output to show the result.
This is now corrected.
A main function complementing these is as follows.
int main(){
TrieNode *root = new TrieNode();
cout<<"starting"<<endl;
root->insert(root, "hello");
cout<<"first added"<<endl;
root->insert(root, "anna");
root->insert(root, "anni");
cout<<"words added"<<endl;
cout << root->search(root, "hello") << endl; // 1
cout << root->search(root, "anny") << endl; // 0
}

Related

Inserting a basic singly linked list node seems to break my c++ code?

Singly Linked List and Node classes and the start of the main function, where I wrote a brief outline of the code functionality. The issue is toward the end of the main function. I wrote '...' in place of what I believe to be irrelevant code because it simply parses strings and assigns them to the string temp_hold[3] array.
#include <bits/stdc++.h>
using namespace std;
class Node {
public:
string value;
string attr;
string tagname;
Node *next;
Node(string c_tagname, string c_attr, string c_value) {
this->attr = c_attr;
this->value = c_value;
this->tagname = c_tagname;
this->next = nullptr;
}
};
class SinglyLinkedList {
public:
Node *head;
Node *tail;
SinglyLinkedList() {
this->head = nullptr;
this->tail = nullptr;
}
void insert_node(string c_tagname, string c_attr,string c_value) {
Node *node = new Node(c_tagname,c_attr, c_value);
if (!this->head) {
this->head = node;
} else {
this->tail->next = node;
}
this->tail = node;
}
};
int main(int argc, char **argv) {
/* storage is a vector holding pointers to the linked lists
linked lists are created and the linked list iterator sll_itr is incremented when
previous line begins with '</' and the currentline begins with '<'
linked lists have nodes, which have strings corresponding to tagname, value, and attribute
*/
SinglyLinkedList *llist = new SinglyLinkedList();
vector<SinglyLinkedList*> sllVect;
sllVect.push_back(llist);
auto sll_itr = sllVect.begin();
string temp_hold[3];
// to determine new sll creation
bool prev = false;
bool now = false;
//input
int num1, num2;
cin >> num1; cin >> num2;
//read input in
for (int i = 0; i <= num1; ++i) {
string line1, test1;
getline(cin, line1);
test1 = line1.substr(line1.find("<") + 1);
//determine to create a new linked list or wait
if (test1[0] == '/') {
prev = now;
now = true;
} else {
//make a node for the data and add to current linked list
if (i > 0) {
prev = now;
now = false;
//if last statement starts with '</' and current statment starts with '<'
// then start a new sll and increment pointer to vector<SinglyLinkedList*>
if (prev && !now) {
SinglyLinkedList *llisttemp = new SinglyLinkedList();
sllVect.push_back(llisttemp);
sll_itr++;
}
}
//parse strings from line
int j = 0;
vector<string> datastr;
vector<char> data;
char test = test1[j];
while (test) {
if (isspace(test) || test == '>') {
string temp_for_vect(data.begin(),data.end());
if (!temp_for_vect.empty()) {
datastr.push_back(temp_for_vect);
}
data.clear();
} else
if (!isalnum(test)) {
} else {
data.push_back(test);
}
j++;
test = test1[j];
}
//each node has 3 strings to fill
int count = 0;
for (auto itrs = datastr.begin(); itrs!=datastr.end(); ++itrs) {
switch (count) {
case 0:
temp_hold[count]=(*itrs);
break;
case 1:
temp_hold[count]=(*itrs);
break;
case 2:
temp_hold[count]=(*itrs);
break;
default:
break;
}
count++;
}
}
cout << "before storing node" << endl;
(*sll_itr)->insert_node(temp_hold[0], temp_hold[1], temp_hold[2]);
cout << "after" << endl;
}
cout << "AFTER ELSE" << endl;
return 0;
}
And here is the line that breaks the code. The auto sll_itr is dereferenced which means *sll_itr is now a SinglyLinkedList* and we can call the insert_node(string, string, string) to add a node to the current linked list. However when I keep the line, anything after the else statement brace does not run, which means the cout<<"AFTER ELSE"<< endl; does not fire. If I remove the insert_node line, then the program runs the cout<<"AFTER ELSE"<< endl; I am unsure what the issue is.
(*sll_itr)->insert_node(temp_hold[0],temp_hold[1],temp_hold[2]);
cout << "after" << endl;
} //NOT HANGING. This closes an else statement.
cout << "AFTER ELSE" << endl;
return 0;
}
Compiled with g++ -o myll mylinkedlist.cpp and run as myll.exe < input.txt, where input.txt contains:
8 3
<tag1 value = "HelloWorld">
<tag2 name = "Name2">
</tag2>
</tag1>
<tag5 name = "Name5">
</tag5>
<tag6 name = "Name6">
</tag6>
Your linked list isn't the problem, at least not the problem here.
A recipe for disaster in the making: retaining, referencing, and potentially manipulating an iterator on a dynamic collection that can invalidate its iterators whenever the container is modified. Your code does just that. Tossing out all the cruft in between:
vector<SinglyLinkedList*> sllVect;
sllVect.push_back(llist);
auto sll_itr = sllVect.begin();
....
SinglyLinkedList *llisttemp = new SinglyLinkedList();
sllVect.push_back(llisttemp); // HERE: INVALIDATES sll_itr on internal resize
sll_itr++; // HERE: NO LONGER GUARANTEED VALID; operator++ CAN INVOKE UB
To address this, you have two choices:
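Use a container that doesn't invalidate iterators on insertion. Among the sequence containers, std::list fits that description directly; std::forward_list also keeps its iterators valid, but it has no push_back, so you would insert with push_front or insert_after instead.
Alter your algorithm to reference by index, not by iterator. I.e. keep the position of the current list as an index into the vector, which stays meaningful after a reallocation, and stop once it reaches the end of the container.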
An excellent discussion about containers that do/do-not invalidate pointers and iterators can be found here. It's worth a read.
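The index-based option can be sketched as follows. This is a simplified illustration, not a drop-in replacement for the question's program: Group and splitIntoGroups are hypothetical names, each group merely collects the opening-tag lines, and the parsing of tag names and attributes is left out.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the question's list type: each group just
// collects strings.
using Group = std::vector<std::string>;

// Splits `lines` into groups, starting a new group whenever a line that opens
// a tag ("<x") follows a line that closes one ("</x"). The current group is
// tracked by index, which stays meaningful when `groups` reallocates.
std::vector<Group> splitIntoGroups(const std::vector<std::string>& lines) {
    std::vector<Group> groups(1);
    std::size_t current = 0;       // index, not iterator
    bool prevWasClosing = false;
    for (const std::string& line : lines) {
        bool closing = line.rfind("</", 0) == 0;  // true if line starts with "</"
        if (!closing) {
            if (prevWasClosing) {
                groups.emplace_back();  // may reallocate; `current` is unaffected
                ++current;
            }
            groups[current].push_back(line);
        }
        prevWasClosing = closing;
    }
    return groups;
}
```

An index survives reallocation because it is re-applied to the vector on every access, whereas an iterator points into storage that push_back may have already released.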

C++ binary search tree creates segmentation fault

I'm trying to make a program that identifies AVR assembly instructions by opcode, since those are just a list of 1's and 0's I thought it would be a good project to make a binary search tree for.
Sadly I keep getting segmentation faults when trying to search through the tree. As I understand it a seg fault is usually the result of trying to do stuff with a pointer that doesn't point to anything, but since I have a Boolean that I check first that should never happen.
I'm pretty sure it has something to do with the way I use pointers, as I'm not very experienced with those. But I can't seem to figure out what's going wrong.
Below is the code involved (SearchTree is only a global variable in this minimal example, not in the real program.):
The code:
#include <iostream>
void ADD(short &code) {std::cout << code << "\n";}
void LDI(short &code) {std::cout << code << "\n";}
void SBRC(short &code){std::cout << code << "\n";}
struct node
{
void(* instruct)(short &code);
bool hasInst = false;
struct node *zero;
bool hasZero = false;
struct node *one;
bool hasOne = false;
};
node SearchTree;
auto parseOpcode(short code, node *currentRoot)
{
std::cout << "Looking for a: " << ((code >> 15) & 0b01 == 1) << std::endl;
std::cout << "Current node 1: " << (*currentRoot).hasOne << std::endl;
std::cout << "Current node 0: " << (*currentRoot).hasZero << std::endl;
// Return instruction if we've found it.
if ((*currentRoot).hasInst) return (*currentRoot).instruct;
// Case current bit == 1.
else if ((code >> 15) & 0b01 == 1)
{
if ((*currentRoot).hasOne) return parseOpcode((code << 1), (*currentRoot).one);
else throw "this instruction does not exist";
}
// Case current bit == 0.
else {
if ((*currentRoot).hasZero) return parseOpcode((code << 1), (*currentRoot).zero);
else throw "this instruction does not exist";
}
}
void addopcode(void(& instruct)(short &code), int opcode, int codeLength)
{
node *latest;
latest = &SearchTree;
for (int i = 0; i <= codeLength; i++)
{
// Add function pointer to struct if we hit the bottom.
if (i == codeLength)
{
if ((*latest).hasInst == false)
{
(*latest).instruct = &instruct;
(*latest).hasInst = true;
}
}
// Case 1
else if (opcode >> (codeLength - 1 - i) & 0b01)
{
if ((*latest).hasOne)
{
latest = (*latest).one;
}
else{
node newNode;
(*latest).one = &newNode;
(*latest).hasOne = true;
latest = &newNode;
}
}
// Case 0
else {
if ((*latest).hasZero)
{
latest = (*latest).zero;
}
else{
node newNode;
(*latest).zero = &newNode;
(*latest).hasZero = true;
latest = &newNode;
}
}
}
}
int main()
{
addopcode(ADD, 0b000011, 6);
addopcode(LDI, 0b1110, 4);
addopcode(SBRC, 0b1111110, 7);
short firstOpcode = 0b1110000000010011;
void(* instruction)(short &code) = parseOpcode(firstOpcode, &SearchTree);
instruction(firstOpcode);
return 0;
}
EDIT: I still had some #includes at the top of my file that linked to code I didn't put on StackOverflow.
The error happened because I forgot to use the new keyword and was therefore populating my search tree with pointers to local variables (which were obviously no longer around by the time I started searching through the tree).
Fixed by using:
node *newNode = new node();
(*latest).one = newNode;
(*latest).hasOne = true;
latest = newNode;
Instead of:
node newNode;
(*latest).one = &newNode;
(*latest).hasOne = true;
latest = &newNode;
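The same fix can also be written with std::unique_ptr, so that the heap nodes are freed automatically and a null pointer replaces the separate hasZero/hasOne flags. This is a sketch under assumptions, not the asker's final code: BitNode, addOpcode and parseOpcode are illustrative names here, and an integer instructionId stands in for the function pointers to keep the example short.

```cpp
#include <cstdint>
#include <memory>

// Hypothetical node: a null child pointer means "no child", so the separate
// hasZero/hasOne flags are not needed.
struct BitNode {
    std::unique_ptr<BitNode> zero;
    std::unique_ptr<BitNode> one;
    int instructionId = -1;  // -1 means "no instruction ends here"
};

// Inserts the lowest `codeLength` bits of `prefix`, most significant first.
void addOpcode(BitNode& root, unsigned prefix, int codeLength, int instructionId) {
    BitNode* latest = &root;
    for (int i = codeLength - 1; i >= 0; --i) {
        std::unique_ptr<BitNode>& next =
            ((prefix >> i) & 1u) ? latest->one : latest->zero;
        if (!next) next = std::make_unique<BitNode>();  // heap node outlives this call
        latest = next.get();
    }
    latest->instructionId = instructionId;
}

// Walks a 16-bit opcode from its most significant bit; returns -1 if unknown.
int parseOpcode(const BitNode& root, std::uint16_t code) {
    const BitNode* current = &root;
    for (int bit = 15; bit >= 0; --bit) {
        if (current->instructionId >= 0) return current->instructionId;
        const BitNode* next =
            ((code >> bit) & 1u) ? current->one.get() : current->zero.get();
        if (!next) return -1;
        current = next;
    }
    return current->instructionId;
}
```

Returning -1 for an unknown opcode is a choice made for this sketch; throwing, as the question does, would work equally well.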

Returning name of lowest node

First of all, this is part of a university course, so whilst a copy-paste solution would do, I'm looking for a bit more depth. I'll be seeing my supervisor tomorrow anyways though.
Now onto the problem. I am implementing Dijkstra's algorithm for 5 linked nodes, A-E, which have their associated costs and links stored in a vector;
struct Node
{
char nodeLink; //adjacent link
int cost; //cost of a link
}; //to use in Dijkstra algorithm
class HeadNode
{
public:
char Name;
bool Visited;
vector<Node> nodes;
HeadNode(char x) { Name = x; Visited = false; }
};
class Graph
{
char Start = 'A';
char StartNode;
char CurrentNode;
char Destination = 'E';
int TotalCost = 0;
vector<HeadNode> hnode;
vector<char> path;
vector<int> weight;
public:
Graph();
void createHeadNode(char X);
void createAdjMatrix();
char LeastDistance(char node);
void printAdjMatrix();
void Dijkstra(char StartNode);
char GetStartNode();
};
int main()
{
Graph graph;
graph.createHeadNode('A');
graph.createHeadNode('B');
graph.createHeadNode('C');
graph.createHeadNode('D');
graph.createHeadNode('E');
graph.createAdjMatrix();
//graph.printAdjMatrix();
graph.Dijkstra(graph.GetStartNode());
system("pause");
return 0;
}
Graph::Graph()
{
}
void Graph::createHeadNode(char x)
{
hnode.push_back(x);
}
In order to properly implement the algorithm, I have created a precursor function, LeastDistance(), within the class graph. I also have a function to get the start node, but that isn't particularly important here;
char Graph::LeastDistance(char node)
{
int smallest = 9999;
char smallestNode;
for (int i = 0; i < hnode.size(); i++)
{
for (int j = 0; j < hnode[i].nodes.size(); ++j)
{
if ((node == hnode[i].Name) && (hnode[i].nodes[j].cost <= smallest) && (hnode[i].Visited == false))
{
smallest = hnode[i].nodes[j].cost;
smallestNode = hnode[i].nodes[j].nodeLink;
}
else
{
hnode[i].Visited = true;
break;
}
}
}
TotalCost = TotalCost + smallest;
return(smallestNode);
}
void Graph::Dijkstra(char StartNode)
{
CurrentNode = StartNode;
if (CurrentNode == Destination)
{
cout << "the start is the destination, therefore the cost will be 0." << endl;
}
else
{
while(true)
{
if (CurrentNode != Destination)
{
CurrentNode = LeastDistance(StartNode);
cout << CurrentNode << "<-";
}
else if (CurrentNode == Destination)
{
cout << endl;
cout << "The total cost of this path is:" << TotalCost;
TotalCost = 0;//reset cost
break;
}
}
}
}
My problem is that the LeastDistance function always appears to return node C, so C is printed over and over until it fills the console. So far I have tried to debug using Visual Studio 2017, but I can't make much sense of the watches. I have also moved the breaks around and tried to make sure the Visited flag is being set to true. I am not sure whether operator precedence is affecting this.
Thanks in advance.
I would contend that there are multiple problems with the way you implement this... but I think the one that's causing you the problem you describe is the statement right here:
if (CurrentNode != Destination)
{
CurrentNode = LeastDistance(StartNode);
cout << CurrentNode << "<-";
}
Think about what this does. Say your first node isn't the one you're looking for: you call LeastDistance and find the next smallest node, then you print it. On the next pass through the while loop, CurrentNode still isn't the destination, so you call LeastDistance(StartNode) again, which returns exactly the same value. Thus you keep printing the same result, which apparently is C.
Assuming everything else is correct, I think you want:
CurrentNode = LeastDistance(CurrentNode);
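For comparison, here is a sketch of the textbook form of Dijkstra's algorithm using a min-priority queue. It is a different structure from the code in the question (it does not follow one greedy path, but tracks the best known cost to every node), and the AdjList alias and the dijkstra function are names assumed for this example only.

```cpp
#include <functional>
#include <map>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical adjacency list: node name -> list of (neighbour, cost).
using AdjList = std::map<char, std::vector<std::pair<char, int>>>;

// Textbook Dijkstra with a min-priority queue. Returns the cheapest cost from
// `start` to every reachable node. Assumes all edge costs are non-negative.
std::map<char, int> dijkstra(const AdjList& graph, char start) {
    std::map<char, int> dist;
    using Entry = std::pair<int, char>;  // (cost so far, node)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> frontier;
    dist[start] = 0;
    frontier.push({0, start});
    while (!frontier.empty()) {
        Entry top = frontier.top();
        frontier.pop();
        int cost = top.first;
        char node = top.second;
        if (cost > dist[node]) continue;  // stale entry, a cheaper path was found
        auto it = graph.find(node);
        if (it == graph.end()) continue;  // node has no outgoing edges
        for (const auto& edge : it->second) {
            int candidate = cost + edge.second;
            auto known = dist.find(edge.first);
            if (known == dist.end() || candidate < known->second) {
                dist[edge.first] = candidate;
                frontier.push({candidate, edge.first});
            }
        }
    }
    return dist;
}
```

The key difference from the loop in the question is that each step continues from whichever unsettled node is currently cheapest, rather than always expanding from the start node.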

Why does my recursion not return to previous pointer?

I am working on an assignment in which we must create a 20-questions type game from a binary search tree. We read the tree in from a text file that is formatted like this:
Does it walk on 4 legs?
Does it fly?
*centipede?
Is it an insect?
*bird?
*butterfly?
Does it purr?
Does it howl?
*mouse?
*dog?
*cat?
Later, I am going to allow the user to add to this list. At the moment, however, I am unable to accurately read the list into a binary search tree. I have set it up so that (I think) it will use recursion and return to the previous "current" node pointer when it ends a loop of the function. Currently, however, the current node pointer remains the same.
The below function is passed a vector of the strings from the text file.
string line;
string guess;
bool start = true;
void buildTree(vector<string> gameData, Node* current, int &counter)
{
//fill node with question or answer
//recursive:
// add to the left until we encounter an asterisk
// add to the right
line = gameData[counter];
//if a question
if (line[0] != '*')
{
if (current->getData().empty())
{
current->setData(line);
cout << current->getData() << endl;
}
if (!start)
{
//if noChild is empty AND current isn't a guess, go to noChild
if ((current->getNo()->getData().empty())
&& (current->isGuess() == false))
{
current = current->getNo();
}
//otherwise, go to yes
else {
current = current->getYes();
}
}
while (counter < gameData.size())
{
if (!start) { counter++; }
start = false;
buildTree(gameData, current, counter);
}
}
//if a guess
else
{
//if data is full, go to no
if (current->getData().empty() == false)
{
current = current->getNo();
}
//otherwise, go to yes
else
{
//current = current->getYes();
for (int i = 1; i < line.size(); i++)
{
guess.push_back(line[i]);
}
current->setData(guess);
guess.clear();
cout << current->getData() << endl;
counter++;
current->setGuess(true);
}
}
}

Count word in trie implementation

I'm implementing a trie to implement a spelling dictionary. The basic element of a trie is a trie node, which consists of a letter (char), a flag (whether this char is the last char of a word), and an array of 26 pointers.
Private part of the TrieNode class include:
ItemType item;//char
bool isEnd;//flag
typedef TrieNode* TrieNodePtr;
TrieNodePtr myNode;
TrieNodePtr array[26];//array of pointers
This is part of the test call:
Trie t4 = Trie();
t4.insert("for");
t4.insert("fork");
t4.insert("top");
t4.insert("tops");
t4.insert("topsy");
t4.insert("toss");
t4.print();
cout << t4.wordCount() << endl;
Right now I'm trying to traverse the trie to count how many words there are (how many flags are set to true).
size_t TrieNode::wordCount() const{
for (size_t i = 0; i < 26; i++){
if (array[i] == nullptr){
return 0;
}
if (array[i]->isEnd && array[i] != nullptr){
cout << "I'm here" << endl;
return 1 + array[i]->wordCount();
}
else if(!array[i]->isEnd && array[i]!=nullptr){
cout << "I'm there" << endl;
return 0 + array[i]->wordCount();
}
else{
// do nothing
}
}
}
Every time, the function returns 0. I know it's because when the first element in the array is null, the function exits, so the count is always 0. But I don't know how to avoid this, since every time I have to start from the first pointer. I also get a warning: not all control paths return a value. I'm not sure where this comes from. How do I make the function continue to the next pointer in the array if the current pointer is null? Is there a more efficient way to count words? Thank you!
Here is a simple and clear way to do it (using depth-first search):
size_t TrieNode::wordCount() const {
// Count this node if it ends a word, then add the counts of all subtrees
size_t result = isEnd ? 1 : 0;
for (size_t i = 0; i < 26; i++){
if (array[i] != nullptr)
result += array[i]->wordCount();
}
return result;
}
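To try the depth-first count in isolation, here is a small self-contained sketch. CountNode is a hypothetical stand-in for the asker's TrieNode (whose full definition is not shown), and its insert assumes lowercase 'a'..'z' input only.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical minimal node (not the asker's TrieNode) with just enough
// structure to exercise the count.
struct CountNode {
    std::array<std::unique_ptr<CountNode>, 26> next;  // all start as nullptr
    bool isEnd = false;

    // Assumes `word` contains only 'a'..'z'.
    void insert(const std::string& word) {
        CountNode* node = this;
        for (char c : word) {
            std::unique_ptr<CountNode>& child = node->next[c - 'a'];
            if (!child) child = std::make_unique<CountNode>();
            node = child.get();
        }
        node->isEnd = true;
    }

    // Depth-first count: this node contributes 1 if it ends a word, then
    // every non-null child adds the count of its own subtree.
    std::size_t wordCount() const {
        std::size_t result = isEnd ? 1 : 0;
        for (const auto& child : next) {
            if (child) result += child->wordCount();
        }
        return result;
    }
};
```

With the six words from the question's test call inserted (for, fork, top, tops, topsy, toss), wordCount() on the root returns 6, and inserting a word twice does not change the count.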