Trie only inserting first letter of a word, not the whole word - c++

I am currently working on a program that inserts words into a trie. Currently my insert function only adds the first letter of the word and then stops. From everything I have looked up, my code looks correct, so I don't understand what the issue is.
I have tried moving temp->wordEnd = true outside the for loop and to different locations in the function. I believe this is where the problem lies, since everything else in my insert function looks correct.
Here is my insert function:
bool Trie::insert(string word)
{
TrieNode *temp = root;
temp->prefixAmount++;
for (int i = 0; i < word.length(); ++i)
{
int currentLetter = (int)word[i] - (int)'a';
if (temp->child[currentLetter] == NULL)
{
temp->child[currentLetter] = new TrieNode();
temp->child[currentLetter]->prefixAmount++;
temp = temp->child[currentLetter];
}
temp->wordEnd = true;
return true;
}
}
Also to help everyone follow my code a little bit better
Here is my TrieNode struct:
struct TrieNode
{
int prefixAmount;
struct TrieNode *child[ALPHA_SIZE];
bool wordEnd;
};
And here is my Trie constructor:
Trie::Trie()
{
root = new TrieNode();
root->wordEnd = false;
root->prefixAmount = 0;
}
The expected result is that the whole word gets inserted.
What actually happens is that only the first letter of the word gets added.

I've reformatted the code for you, and now you should hopefully see the main issue.
You are returning at the end of the block within the for loop. This means that it runs the first iteration of the for loop and then returns without considering the rest of the letters.
An easy fix would be to move the return outside the for loop, but there is another issue: you don't properly update the trie when the current letter is already in it. Your NULL check is correct, but only the new TrieNode() allocation should depend on it; the subsequent lines need to run even when the child is not NULL. The fixed code looks like this:
bool Trie::insert(string word)
{
TrieNode *temp = root;
temp->prefixAmount++;
for (int i = 0; i < word.length(); ++i)
{
int currentLetter = (int)word[i] - (int)'a';
if (temp->child[currentLetter] == NULL)
{
temp->child[currentLetter] = new TrieNode();
}
temp->child[currentLetter]->prefixAmount++;
temp = temp->child[currentLetter];
}
temp->wordEnd = true;
return true;
}
(Other minor issues in the code, outside the scope of the question: prefer nullptr to NULL; why return a bool if it's always true; if your string contains anything outside of a-z then you'll read outside the array bounds; prefer unique_ptr and make_unique to raw new/delete.)
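The side notes above (bounds checking, smart pointers) could be combined roughly as follows. This is only a sketch: the names SafeTrie, SafeTrieNode, kAlphaSize and contains are made up for illustration, ALPHA_SIZE is assumed to be 26, and the bool return is given a meaning (false for input outside a-z) that the original code does not have.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <string>

constexpr std::size_t kAlphaSize = 26;  // assumed value of ALPHA_SIZE

// Hypothetical node type: children are owned by unique_ptr, so no manual delete.
struct SafeTrieNode {
    int prefixAmount = 0;
    bool wordEnd = false;
    std::array<std::unique_ptr<SafeTrieNode>, kAlphaSize> child{};
};

class SafeTrie {
public:
    // Returns false (and leaves the trie untouched) if the word contains
    // anything outside 'a'..'z'; otherwise inserts it and returns true.
    bool insert(const std::string& word) {
        for (char c : word) {
            if (c < 'a' || c > 'z') return false;
        }
        SafeTrieNode* node = root_.get();
        ++node->prefixAmount;
        for (char c : word) {
            auto& slot = node->child[static_cast<std::size_t>(c - 'a')];
            if (!slot) slot = std::make_unique<SafeTrieNode>();  // allocate only when missing
            ++slot->prefixAmount;                                // always count the prefix
            node = slot.get();                                   // always descend
        }
        node->wordEnd = true;  // set once, after the last letter
        return true;
    }

    // True only if exactly this word was inserted earlier.
    bool contains(const std::string& word) const {
        const SafeTrieNode* node = root_.get();
        for (char c : word) {
            if (c < 'a' || c > 'z') return false;
            node = node->child[static_cast<std::size_t>(c - 'a')].get();
            if (!node) return false;
        }
        return node->wordEnd;
    }

private:
    std::unique_ptr<SafeTrieNode> root_ = std::make_unique<SafeTrieNode>();
};
```

With this version, inserting "car" and then "cart" reuses the c-a-r path, and contains("ca") stays false because wordEnd is only set on the final node of each inserted word.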

Related

C++ Hash Table and Linked List Issues

So I'm trying to work on a project for my C++ class where I read a .txt file that has 53 lines of cities, states, and superfluous information afterwards.
(example: Port Jervis,NY,New York,36071,Orange,36071,41.3782,-74.6909,16410.0,1317)
After reading the file, I separate out the city name (example: Port Jervis) and state code (example: NY) and use the value of the two letters in the state code as the key for a hash table of 13 elements. So N=13 + Y=24 gives a key of 37, and since the hash table has 13 elements, 37 % 13 gives a hash key of 11.
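For reference, the key arithmetic described above can be written as a few small functions. This is a sketch under the assumption that letters are numbered from A=0 (which matches N=13 and Y=24); the names letterValue, stateKey and bucketFor are hypothetical and do not appear in the program below.

```cpp
#include <string>

// Hypothetical helper: position of an uppercase letter in the alphabet,
// counting from 0 (A=0, ..., N=13, ..., Y=24). Returns -1 for anything else.
int letterValue(char c) {
    return (c >= 'A' && c <= 'Z') ? c - 'A' : -1;
}

// Sum of the two letter values of a state code, or -1 if the code is malformed.
int stateKey(const std::string& state) {
    if (state.size() != 2) return -1;
    int first = letterValue(state[0]);
    int second = letterValue(state[1]);
    if (first < 0 || second < 0) return -1;
    return first + second;
}

// Bucket index for a non-negative key in a table with bucketCount slots.
int bucketFor(int key, int bucketCount) {
    return key % bucketCount;
}
```

For "NY" this gives stateKey("NY") == 37 and bucketFor(37, 13) == 11, the same numbers as in the example.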
So far so good and I'm able to get all that done correctly, however when it comes to displaying the results is where I'm running into an issue as each element of the hash-table is missing one link in the linked list. So it only displays 40 outputs of the 53, with 1 missing per element and I'm really not sure why.
So I e-mailed my professor my code and he said that my insert method is not correct which he believes is causing this error. My current insert method looks like
void insert(int key, string city, string state)//insert value
{
int hash = KeyModFunction(key); //function that's %13 for hash-key
Node* tmpInsert = new Node(key, city, state); //create node to work with
if(table[hash]==NULL)//checks if table is empty
{
table[hash] = tmpInsert; //if empty, make new node with key/city/state values
}
else//if not empty
{
Node *runner = table[hash]; //made node to run through the list
while(runner->next != NULL)//make it to the end
{
runner=runner->next; // go go go
}
runner->next = tmpInsert; //and point the end at the new node to be inserted
}
} //end insert
And my professor suggested it should look something more like
if(table[hash]->next == NULL)
{
table[hash]->next = tmpInsert;
table[hash]->myCity = city;
table[hash]->myState = state;
}
else
{
// You can figure out the else code based on the above
However, whenever I put that into my code, it no longer runs properly and reports a segmentation fault. But when I run it through a debugger it says "[Inferior 1 (process 5453) exited normally]", and I'm not going to lie, I'm not sure what that means and have been unable to find a concrete answer for it online. I'm assuming "exited normally" is a good thing; however, nothing is displayed.
I've been beating my head against this all week trying to figure out a solution and it's finally come to the point where I know I'm just getting too in my own head about it, so I've come here hoping to find some guidance, advice, or at the very least someone to point me in the right direction. If more of my code is needed on here, let me know, I just didn't want to dump my whole project on here cause I legitimately want to figure it out instead of having someone just do it for me, but yeah, I'm stuck. Thanks in advance for any help!
****2:12PST - 5/17/2020 UPDATE****
So in all fairness, the insert code was plucked and modified from other people's code I found online while looking into how to do this, so that might be why it looks better than my professor's (also, I'm pretty sure he mentioned C++ isn't his most familiar language). And yes, we are supposed to implement the hash table ourselves.
So here is the full program:
class Node{
public:
int key;
string myCity;
string myState;
Node *next;
Node(int key, string myCity, string myState)//constructor
{
this->key = key;
this->myCity = myCity;
this->myState = myState;
this->next = NULL;
}
};//end Node
class Hash{
private:
int BUCKET; //number of over all values
Node** table;
public:
//Constructor
Hash(int V)
{
this->BUCKET = V; //setting the BUCKET size to max number of enteries
table = new Node*[BUCKET]; //create table with size of BUCKET
for(int i = 0; i < BUCKET; i++) //fill table with NULL values
{
table[i] = NULL;
}
} //end constructor
//KeyModFunction
int KeyModFunction(int x) //getting the hash key value
{
return (x % BUCKET);
} //end KeyModFunction
//Insert Function
void insert(int key, string city, string state)//insert value
{
int hash = KeyModFunction(key); //function that's %13 for hash-key
Node* tmpInsert = new Node(key, city, state); //create node to work with
if(table[hash]==NULL)//checks if table is empty
{
table[hash] = tmpInsert; //if empty, make new node with key/city/state values
}
else//if not empty
{
Node *runner = table[hash]; //made node to run through the list
while(runner->next != NULL)//make it to the end
{
runner=runner->next; // go go go
}
runner->next = tmpInsert; //and point the end at the new node to be inserted
}
} //end insert
//Display function
void displayHash()
{
for(int loop = 0; loop < BUCKET; loop++)
{
cout<<loop;
if(table[loop]->next != NULL)
{
Node* tmp;
tmp = table[loop]->next;
do
{
cout<<" -->"<<tmp->myCity<<"/"<<tmp->myState;
tmp = tmp->next;
}while(tmp!=NULL);
}
cout<<endl;
}
}//end displayHash
}; //end Hash Class
int main() {
cout << "CSP 31B - Read and Process Assignment\n\n";
char myAlpha[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; //the key array for all the letter values
Hash myTbl(13); //create hashmap with BUCKET size of 13
string fCity, fState, fExtra; //string variables to hold info
int key = 0; //hash value of the state code (two letters added together)
ifstream myfile("CityOut.txt");
while ( getline(myfile, fCity, ',') && getline(myfile, fState, ','))
{
getline(myfile, fExtra);
for(int i = 0; i < sizeof(myAlpha)/sizeof(myAlpha[0]); i++)
{
if(fState.at(0) == myAlpha[i])
{
key += i;
}
else if(fState.at(1) == myAlpha[i])
{
key += i;
}
}
int checkNum = 1;
cout << "DEBUGGER: City name: "<<fCity <<" State code: " << fState.at(0) << fState.at(1) <<" key = "<<key<<endl; //temporary statement for debugging purposes
myTbl.insert(key, fCity, fState);
key = 0; //reset hash number to zero for next line of CityOut.txt
}
cout<<endl<<endl<<endl;
myTbl.displayHash();
return 0;
}//end main
Then the output should look something like:
but each table element should have 1 more output
Your print code skips the first element of your hash table.
This code:
cout<<loop;
if(table[loop]->next != NULL)
{
Node* tmp;
tmp = table[loop]->next;
do
{
cout<<" -->"<<tmp->myCity<<"/"<<tmp->myState;
tmp = tmp->next;
}while(tmp!=NULL);
}
Should be:
cout<<loop;
if(table[loop] != NULL)
{
Node* tmp;
tmp = table[loop];
do
{
cout<<" -->"<<tmp->myCity<<"/"<<tmp->myState;
tmp = tmp->next;
}while(tmp!=NULL);
}
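Note that the original check, table[loop]->next != NULL, also dereferences a null pointer whenever a bucket is empty. Starting the walk at the bucket head with a plain for loop avoids both problems. The sketch below uses a hypothetical CityNode stand-in for the question's Node and returns the text instead of printing it, purely so the behavior is easy to check.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for the question's Node class.
struct CityNode {
    std::string myCity;
    std::string myState;
    CityNode* next;
    CityNode(const std::string& city, const std::string& state, CityNode* nextNode = nullptr)
        : myCity(city), myState(state), next(nextNode) {}
};

// Formats one bucket as " -->city/state -->city/state ...".
// The walk starts at the head itself, so the first entry is included,
// and an empty bucket (head == nullptr) simply yields an empty string.
std::string formatBucket(const CityNode* head) {
    std::ostringstream out;
    for (const CityNode* tmp = head; tmp != nullptr; tmp = tmp->next) {
        out << " -->" << tmp->myCity << "/" << tmp->myState;
    }
    return out.str();
}
```

Inside displayHash, the body of the outer loop would then reduce to something like cout << loop << formatBucket(table[loop]) << endl;.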

C++ Palindrome checking solution is tripped by one test case

Given a string s, check if it is possible to make it a palindrome by deleting AT MOST one character (meaning zero deletions is acceptable). String s will contain <50,000 lowercase alphabetical characters.
The code I wrote below passed 458/460 test cases, and it got stuck on one in particular with no obvious reason, returning false instead of true. The logic of the algorithm is simple, and I've tried moving conditionals around but nothing seems to change.
class Solution {
public:
bool ispalindrome; //holds result
bool validPalindrome(string s) {
bool candelete = true; //allows one delete
ispalindrome = true; //initial condition
int lcursor = 0;
int rcursor = s.length() - 1;
while(lcursor < rcursor && ispalindrome){
//if cursor points at different letters
if(s[lcursor] != s[rcursor]){
// if delete is still allowed and delete works
if(s[lcursor + 1] == s[rcursor] && candelete){
lcursor++;
candelete = false;
} else if (s[lcursor] == s[rcursor - 1] && candelete){
rcursor--;
candelete = false;
} else {
ispalindrome = false;
}
}
lcursor++;
rcursor--;
}
return ispalindrome;
}
};
The test case that trips this solution is as follows:
aguokepatgbnvfqmgmlcupuufxoohdfpgjdmysgvhmvffcnqxjjxqncffvmhvgsymdjgpfdhooxfuupuculmgmqfvnbgtapekouga
Code testing with this testcase:
#include <iostream>
using std::string;
// class Solution { ... etc., from above
int main() {
string s = "aguokepatgbnvfqmgmlcupuufxoohdfpgjdmysgvhmvffcnqxjjxqncffvmhvgsymdjgpfdhooxfuupuculmgmqfvnbgtapekouga";
std::cout << Solution().validPalindrome(s) << std::endl;
};
When the cursors point at different letters and a character could be deleted at either the left or the right cursor, your algorithm only tries the delete from the left. If a palindrome is formed by deleting from the right instead, your code will miss it.
So when a delete from the left looks possible, you also need to check whether a delete from the right is possible, and try it if deleting from the left does not produce a palindrome.
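One way to try both deletions is to stop at the first mismatch and test the two remaining substrings directly. The following is a sketch rather than the only possible fix; isPalindromeRange and validPalindromeSketch are names chosen here for illustration.

```cpp
#include <string>

// True if s[left..right] (inclusive) reads the same in both directions.
bool isPalindromeRange(const std::string& s, int left, int right) {
    while (left < right) {
        if (s[left] != s[right]) return false;
        ++left;
        --right;
    }
    return true;
}

// True if s is a palindrome after deleting at most one character.
bool validPalindromeSketch(const std::string& s) {
    int left = 0;
    int right = static_cast<int>(s.size()) - 1;
    while (left < right) {
        if (s[left] != s[right]) {
            // The single allowed deletion must happen here: try skipping the
            // left character, and if that fails, try skipping the right one.
            return isPalindromeRange(s, left + 1, right) ||
                   isPalindromeRange(s, left, right - 1);
        }
        ++left;
        --right;
    }
    return true;  // no mismatch at all, so zero deletions were needed
}
```

A short input with the same shape as the failing test case is "cupxpucu": at the first mismatch both s[left + 1] == s[right] and s[left] == s[right - 1] hold, but only removing the trailing 'u' leaves a palindrome ("cupxpuc").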

search function causes program to crash

I have been going through the debugger but can't seem to pinpoint exactly what is going wrong. I have come to the conclusion that I must be missing a nullptr check somewhere. If anyone can provide some help, it would be greatly appreciated.
The debugger's error message suggests that the program crashes on this line:
if (node->children_[index] == nullptr) {
search function
Node* search(const string& word, Node* node, int index) const {
Node* temp;
//same as recursive lookup; the only difference is that it returns the node whether terminal or not
if (index < word.length()) {
index = node->getIndex(word[index]);
if (node->children_[index] == nullptr) {
return nullptr;
}
else {
temp = search(word, node->children_[index], index++);
}
}
return temp; // this would give you ending node of partialWord
}
Node struct for reference
struct Node {
bool isTerminal_;
char ch_;
Node* children_[26];
Node(char c = '\0') {
isTerminal_ = false;
ch_ = c;
for (int i = 0; i < 26; i++) {
children_[i] = nullptr;
}
}
//given lower case alphabetic charachters ch, returns
//the associated index 'a' --> 0, 'b' --> 1...'z' --> 25
int getIndex(char ch) {
return ch - 'a';
}
};
Node* root_;
int suggest(const string& partialWord, string suggestions[]) const {
Node* temp;
temp = search(partialWord, root_, 0);
int count = 0;
suggest(partialWord, temp, suggestions, count);
return count;
}
Might be a very simple thing. Without digging, I am not sure about the precedence of the -> operator versus the == operator. I would take a second and try putting parentheses around the "node->children_[index] == nullptr" part like this:
(node->children_[index]) == nullptr
just to make sure that the logic runs like you seem to intend.
Dr t
I believe the root cause is that you're using index for two distinct purposes: as an index into the word you're looking for, and as an index into the node's children.
When you get to the recursion, index has changed meaning, and it's all downhill from there.
You're also passing index++ to the recursion, but the value of index++ is the value it had before the increment.
You should pass index + 1.
[An issue in a different program would be that the order of evaluation of function parameters is unspecified, and you should never both modify a variable and use it in the same parameter list. (I would go so far as to say that you should never modify anything in a parameter list, but many disagree.)
But you shouldn't use the same variable here at all, so...]
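The difference between passing index++ and index + 1 can be seen in isolation. The helper names below are hypothetical and exist only to show which value the callee receives.

```cpp
// Hypothetical helper that simply reports the argument it was given.
int received(int value) {
    return value;
}

// Post-increment yields the OLD value, so the callee sees index unchanged;
// the increment only affects the caller's local copy afterwards.
int passedWithPostIncrement(int index) {
    return received(index++);
}

// index + 1 passes the next position and leaves the local variable alone.
int passedWithPlusOne(int index) {
    return received(index + 1);
}
```

Starting from 3, the first function hands 3 to the callee and the second hands 4, which is why the recursive call with index++ never advances through the word.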
I would personally restructure the code a little, something like this:
Node* search(const string& word, Node* node, int index) const {
// Base case: every character of the word has been matched,
// so node is the ending node of the word.
if (index >= word.length())
{
return node;
}
int child_index = node->getIndex(word[index]);
// The two interesting cases: we either have this child or we don't.
if (node->children_[child_index] == nullptr) {
return nullptr;
}
else {
return search(word, node->children_[child_index], index + 1);
}
}
(Side note: returning a pointer to a non-const internal Node from a const function is questionable.)

Properly exiting out of recursions?

TrieNode and Trie Object:
struct TrieNode {
char nodeChar = NULL;
map<char, TrieNode> children;
TrieNode() {}
TrieNode(char c) { nodeChar = c; }
};
struct Trie {
TrieNode *root = new TrieNode();
typedef pair<char, TrieNode> letter;
typedef map<char, TrieNode>::iterator it;
Trie(vector<string> dictionary) {
for (int i = 0; i < dictionary.size(); i++) {
insert(dictionary[i]);
}
}
void insert(string toInsert) {
TrieNode * curr = root;
int increment = 0;
// while letters still exist within the trie traverse through the trie
while (curr->children.find(toInsert[increment]) != curr->children.end()) { //letter found
curr = &(curr->children.find(toInsert[increment])->second);
increment++;
}
//when it doesn't exist we know that this will be a new branch
for (int i = increment; i < toInsert.length(); i++) {
TrieNode temp(toInsert[i]);
curr->children.insert(letter(toInsert[i], temp));
curr = &(curr->children.find(toInsert[i])->second);
if (i == toInsert.length() - 1) {
temp.nodeChar = NULL;
curr->children.insert(letter(NULL, temp));
}
}
}
vector<string> findPre(string pre) {
vector<string> list;
TrieNode * curr = root;
/*First find if the pre actually exist*/
for (int i = 0; i < pre.length(); i++) {
if (curr->children.find(pre[i]) == curr->children.end()) { //DNE
return list;
}
else {
curr = &(curr->children.find(pre[i])->second);
}
}
/*Now curr is at the end of the prefix, now we will perform a DFS*/
pre = pre.substr(0, pre.length() - 1);
findPre(list, curr, pre);
}
void findPre(vector<string> &list, TrieNode *curr, string prefix) {
if (curr->nodeChar == NULL) {
list.push_back(prefix);
return;
}
else {
prefix += curr->nodeChar;
for (it i = curr->children.begin(); i != curr->children.end(); i++) {
findPre(list, &i->second, prefix);
}
}
}
};
The problem is this function:
void findPre(vector<string> &list, TrieNode *curr, string prefix) {
/*if children of TrieNode contains NULL char, it means this branch up to this point is a complete word*/
if (curr->nodeChar == NULL) {
list.push_back(prefix);
}
else {
prefix += curr->nodeChar;
for (it i = curr->children.begin(); i != curr->children.end(); i++) {
findPre(list, &i->second, prefix);
}
}
}
The purpose is to return all words with the same prefix from a trie using DFS. I manage to retrieve all the necessary strings but I can't exit out of the recursion.
The code completes the last iteration of the if statement and breaks. Visual Studio doesn't return any error code.
The typical end to a recursion is just as you said: return all words. A standard recursion looks something like this:
returnType function(params...){
//Do stuff
if(need to recurse){
return function(next params...);
}else{ //This should be your defined base-case
return base-case;
}
}
The issue is that your recursive function never returns a value: it either executes the push_back or calls itself again. Neither path hands a result back to the caller, so it will either end quietly (returning nothing) or keep recursing.
In your situation, you likely need to store the results from the recursion in an intermediate structure such as a list, and then return that list after the iteration (since it's a tree search, it ought to check all the children, not return only the first one).
On that note, you seem to be missing part of the point of recursion. It exists to fill a purpose: break a problem down into pieces until those pieces are trivial to solve, then return that case and build back up to a full solution. Any tree search must follow this basic structure, or you may miss something, such as forgetting to return your results.
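A depth-first search that accumulates its results in a list passed by reference, and then returns that list from the outer function, might look like the sketch below. It deliberately differs from the question's code in two labeled ways: it marks word ends with a bool flag (isWord) instead of a '\0' child, and it stores children behind unique_ptr. All names (WordNode, addWord, collectWords, wordsWithPrefix) are hypothetical.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node type; isWord replaces the '\0' sentinel child.
struct WordNode {
    bool isWord = false;
    std::map<char, std::unique_ptr<WordNode>> children;
};

void addWord(WordNode& root, const std::string& word) {
    WordNode* node = &root;
    for (char c : word) {
        std::unique_ptr<WordNode>& child = node->children[c];
        if (!child) child = std::make_unique<WordNode>();
        node = child.get();
    }
    node->isWord = true;
}

// The recursion ends naturally when a node has no children: the loop body
// never runs and the function returns. Results accumulate in `out`.
void collectWords(const WordNode& node, std::string& current,
                  std::vector<std::string>& out) {
    if (node.isWord) out.push_back(current);
    for (const auto& entry : node.children) {
        current.push_back(entry.first);
        collectWords(*entry.second, current, out);
        current.pop_back();  // undo before visiting the next sibling
    }
}

std::vector<std::string> wordsWithPrefix(const WordNode& root,
                                         const std::string& prefix) {
    std::vector<std::string> out;
    const WordNode* node = &root;
    for (char c : prefix) {
        auto it = node->children.find(c);
        if (it == node->children.end()) return out;  // prefix not present
        node = it->second.get();
    }
    std::string current = prefix;
    collectWords(*node, current, out);
    return out;  // every path through the function returns the list
}
```

Because std::map iterates its keys in sorted order, the words come back in alphabetical order for a given prefix.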
Check the integrity of your Trie structure. The function appears to be correct. It would fail to terminate only if one or more of your leaf nodes does not have curr->nodeChar == NULL.
Another case is that any node (leaf or non-leaf) has a garbage child node. This will cause the recursion to break into reading garbage values and no reason to stop. Running in debug mode should break the execution with segmentation fault.
Write another function to test if all leaf-nodes have NULL termination.
EDIT:
After posting the code, the original poster has already pointed out that the problem was that he/she was not returning the list of strings.
Apart from that, there are a few more suggestions I would like to provide based on the code:
How does this while loop terminate if the toInsert string is already in the Trie?
You will overrun the toInsert string and read a garbage character.
It will exit after that, but reading beyond the end of your string is bad practice.
// while letters still exist within the trie traverse through the trie
while (curr->children.find(toInsert[increment]) != curr->children.end())
{ //letter found
curr = &(curr->children.find(toInsert[increment])->second);
increment++;
}
This can be written as follows:
while (increment < toInsert.length() &&
curr->children.find(toInsert[increment]) != curr->children.end())
Also,
Trie( vector<string> dictionary)
should be
Trie( const vector<string>& dictionary )
because dictionary can be a large object. If you don't pass by reference, it will create a second copy. This is not efficient.
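Both suggestions (check the index before using it, and take read-only arguments by const reference) can be illustrated with a small traversal helper. This is a sketch with hypothetical names (PathNode, matchedLength); it also calls find only once per character instead of twice.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical node type holding only the child links.
struct PathNode {
    std::map<char, std::unique_ptr<PathNode>> children;
};

// Walks down from root while the characters of word match existing children
// and returns how many characters matched. The bound is tested before
// word[i] is read, so the loop never indexes past the end of the string.
// Both parameters are const references, so nothing is copied.
std::size_t matchedLength(const PathNode& root, const std::string& word) {
    const PathNode* node = &root;
    std::size_t i = 0;
    while (i < word.size()) {
        auto it = node->children.find(word[i]);
        if (it == node->children.end()) break;  // no such child: stop here
        node = it->second.get();
        ++i;
    }
    return i;
}
```

An insert function could use the returned count as the starting index for creating the remaining nodes.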
I am an idiot. I forgot to return list in the first findPre() function.
vector<string> findPre(string pre) {
vector<string> list;
TrieNode * curr = root;
/*First find if the pre actually exist*/
for (int i = 0; i < pre.length(); i++) {
if (curr->children.find(pre[i]) == curr->children.end()) { //DNE
return list;
}
else {
curr = &(curr->children.find(pre[i])->second);
}
}
/*Now curr is at the end of the prefix, now we will perform a DFS*/
pre = pre.substr(0, pre.length() - 1);
findPre(list, curr, pre);
return list; //<----- this thing
}

C++ Binary search tree compare data of nodes and remove duplicates

I have created a binary search tree in C++ and have loaded it with two types of data, strings and ints. I am reading a text file and loading the tree alphabetically with the words I pull out, along with the number of the line each word is found on. I am able to print the words and the numbers just fine. What I want to do now is check whether a word has already been printed, and if it has, print only the number of the line on which the word was found. The way I am thinking about doing this is by comparing against the previous data as the tree is traversed and printed. This is my print function.
void inOrderPrint(Node *rootPtr ) {
if ( rootPtr != NULL ) {
for (int i =0; rootPtr->data[i]; i++){
while(ispunct(rootPtr->data[i]))
rootPtr->data.erase(i,1);
}
rootPtr->data = rootPtr->data.substr(0,10);
inOrderPrint( rootPtr->left );
cout << (rootPtr->data)<<rootPtr->lineNum <<endl;
inOrderPrint( rootPtr->right );
}
}
This is what I was thinking:
if (rootPtr->data == previous rootPtr->data)
cout<<setw(10)<<theCurrentNode lineNum;
else
do normal printing
I think that if this function were to run on the first node and it compares it to the non existent previous node, it would automatically try to compare it to NULL, the if statement would return false and it would move on to the else.
Any suggestions on how to go about doing this with actual c++ syntax? Or does anyone see a flaw in my logic?
Thanks in advance!
This answer will describe how to make the program print unique entries and the line number of the first occurrence in the file. If there are duplicate occurrences it will print only the line number of the first occurrence for each duplicate occurrence. The approach is to make sure that there are no duplicate nodes in the tree and to count redundant occurrences.
To do this we might modify the node structure as follows:
struct Node{
string data;
int lineNum;
int count =1;
Node* left;
Node* right;
};
The function Insert might be edited to count duplicates like this:
Node* Insert(Node* rootPtr,string data,int lineNum){
if(rootPtr == NULL){
rootPtr = GetNewNode(data,lineNum);
for (int i =0; rootPtr->data[i]; i++){
while(ispunct(rootPtr->data[i]))
rootPtr->data.erase(i,1);
}
rootPtr->data = rootPtr->data.substr(0,10);
return rootPtr;
}
else if(data< rootPtr->data){
rootPtr->left = Insert(rootPtr->left,data,lineNum);
for (int i =0; rootPtr->data[i]; i++){
while(ispunct(rootPtr->data[i]))
rootPtr->data.erase(i,1);
}
rootPtr->data = rootPtr->data.substr(0,10);
}
else if(data > rootPtr->data) {
rootPtr->right = Insert(rootPtr->right,data,lineNum);
for (int i =0; rootPtr->data[i]; i++){
while(ispunct(rootPtr->data[i]))
rootPtr->data.erase(i,1);
}
rootPtr->data = rootPtr->data.substr(0,10);
}
else if(data == rootPtr->data)
++rootPtr->count;
return rootPtr;
}
Finally the print function can be modified:
void inOrderPrint(Node *rootPtr ) {
//ofstream outputFile;
//outputFile.open("Output.txt");
if ( rootPtr != NULL ) {
inOrderPrint( rootPtr->left );
cout << (rootPtr->data)<<" " << rootPtr->lineNum <<endl;
int j =rootPtr->count;
while( --j )
cout << rootPtr->lineNum <<endl;
//outputFile << (rootPtr->data)<<rootPtr->lineNum <<endl;
inOrderPrint( rootPtr->right );
}
}
Now this should be much closer to what you want. It would also be a good idea to separate the text processing from the node processing. (This answer sort of assumes that you will take care of that.) Otherwise duplicate nodes will be created if the preprocessed text does not match the processed text.
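Separating the text processing could be as simple as cleaning each word once, before it is compared or inserted. The helper below is a sketch: normalizeWord is a hypothetical name, and the 10-character limit is taken from the substr(0,10) calls in the code above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: drops punctuation and keeps at most maxLen characters,
// mirroring the erase/substr steps that are repeated in the code above.
std::string normalizeWord(const std::string& raw, std::size_t maxLen = 10) {
    std::string cleaned;
    for (unsigned char c : raw) {  // unsigned char keeps std::ispunct well defined
        if (!std::ispunct(c)) cleaned.push_back(static_cast<char>(c));
    }
    return cleaned.substr(0, maxLen);
}
```

Calling Insert(root, normalizeWord(word), lineNum) would then let "hello," and "hello" compare equal, so the second occurrence increments count instead of creating a new node, and the cleanup loops inside Insert become unnecessary.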
Good luck!