C++ Hash Table and Linked List Issues - c++

So I'm trying to work on a project for my C++ class where I read a .txt file that has 53 lines of cities, states, and superfluous information afterwards.
(example: Port Jervis,NY,New York,36071,Orange,36071,41.3782,-74.6909,16410.0,1317)
After reading the file, I separate out the city name (example: Port Jervis) and state code (example: NY) and uses the value of the two letters in the state code as the key for a hash table of 13 elements. So N=13 + Y=24 = key of 37, and since the hash has 13 elements it's 37 % 13 = hash-key of 11.
So far so good and I'm able to get all that done correctly, however when it comes to displaying the results is where I'm running into an issue as each element of the hash-table is missing one link in the linked list. So it only displays 40 outputs of the 53, with 1 missing per element and I'm really not sure why.
So I e-mailed my professor my code and he said that my insert method is not correct which he believes is causing this error. My current insert method looks like
void insert(int key, string city, string state)//insert value
{
int hash = KeyModFunction(key); //function that's %13 for hash-key
Node* tmpInsert = new Node(key, city, state); //create node to work with
if(table[hash]==NULL)//checks if table is empty
{
table[hash] = tmpInsert; //if empty, make new node with key/city/state values
}
else//if not empty
{
Node *runner = table[hash]; //made node to run through the list
while(runner->next != NULL)//make it to the end
{
runner=runner->next; // go go go
}
runner->next = tmpInsert; //and point the end at the new node to be inserted
}
} //end insert
And my professor suggested it should look something more like
if(table[hash]->next == NULL)
{
table[hash]->next = tmpInsert;
table[hash]->myCity = city;
table[hash]->myState = state;
}
else
{
// You can figure out the else code based on the above
However, whenever I put that into my code, it no longer compiles and says there is a segment fault. But when I run it through a debugger it says "[Inferior 1 (process 5453) exited normally]" which I'm not going to lie, I'm not sure what the means and have been unable to find a concrete answer online for. But I'm assuming the exited normally is a good thing, however, nothing is displayed.
I've been beating my head against this all week trying to figure out a solution and it's finally come to the point where I know I'm just getting too in my own head about it, so I've come here hoping to find some guidance, advice, or at the very least someone to point me in the right direction. If more of my code is needed on here, let me know, I just didn't want to dump my whole project on here cause I legitimately want to figure it out instead of having someone just do it for me, but yeah, I'm stuck. Thanks in advance for any help!
****2:12PST - 5/17/2020 UPDATE****
So in all fairness the insert code was plucked and modified from other peoples code I've found online looking into how to do this, so that might be why it looks better than my professor (also I'm pretty sure he mention C++ isn't his most familiar language). And yes, we are supposed to implement the hash table ourselves.
So here is the full program:
class Node{
public:
int key;
string myCity;
string myState;
Node *next;
Node(int key, string myCity, string myState)//constructor
{
this->key = key;
this->myCity = myCity;
this->myState = myState;
this->next = NULL;
}
};//end Node
class Hash{
private:
int BUCKET; //number of over all values
Node** table;
public:
//Constructor
Hash(int V)
{
this->BUCKET = V; //setting the BUCKET size to max number of enteries
table = new Node*[BUCKET]; //create table with size of BUCKET
for(int i = 0; i < BUCKET; i++) //fill table with NULL values
{
table[i] = NULL;
}
} //end constructor
//KeyModFunction
int KeyModFunction(int x) //getting the hash key value
{
return (x % BUCKET);
} //end KeyModFunction
//Insert Function
void insert(int key, string city, string state)//insert value
{
int hash = KeyModFunction(key); //function that's %13 for hash-key
Node* tmpInsert = new Node(key, city, state); //create node to work with
if(table[hash]==NULL)//checks if table is empty
{
table[hash] = tmpInsert; //if empty, make new node with key/city/state values
}
else//if not empty
{
Node *runner = table[hash]; //made node to run through the list
while(runner->next != NULL)//make it to the end
{
runner=runner->next; // go go go
}
runner->next = tmpInsert; //and point the end at the new node to be inserted
}
} //end insert
//Display function
void displayHash()
{
for(int loop = 0; loop < BUCKET; loop++)
{
cout<<loop;
if(table[loop]->next != NULL)
{
Node* tmp;
tmp = table[loop]->next;
do
{
cout<<" -->"<<tmp->myCity<<"/"<<tmp->myState;
tmp = tmp->next;
}while(tmp!=NULL);
}
cout<<endl;
}
}//end displayHash
}; //end Hash Class
int main() {
cout << "CSP 31B - Read and Process Assignment\n\n";
char myAlpha[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; //the key array for all the letter values
Hash myTbl(13); //create hashmap with BUCKET size of 13
string fCity, fState, fExtra; //string variables to hold info
int key = 0; //hash value of the state code (two letters added together)
ifstream myfile("CityOut.txt");
while ( getline(myfile, fCity, ',') && getline(myfile, fState, ','))
{
getline(myfile, fExtra);
for(int i = 0; i < sizeof(myAlpha)/sizeof(myAlpha[0]); i++)
{
if(fState.at(0) == myAlpha[i])
{
key += i;
}
else if(fState.at(1) == myAlpha[i])
{
key += i;
}
}
int checkNum = 1;
cout << "DEBUGGER: City name: "<<fCity <<" State code: " << fState.at(0) << fState.at(1) <<" key = "<<key<<endl; //temporary statement for debugging purposes
myTbl.insert(key, fCity, fState);
key = 0; //reset hash number to zero for next line of CityOut.txt
}
cout<<endl<<endl<<endl;
myTbl.displayHash();
return 0;
}//end main
Then the output should look something like:
but each table element should have 1 more output

Your print code skips the first element of your hash table.
This code:
cout<<loop;
if(table[loop]->next != NULL)
{
Node* tmp;
tmp = table[loop]->next;
do
{
cout<<" -->"<<tmp->myCity<<"/"<<tmp->myState;
tmp = tmp->next;
}while(tmp!=NULL);
}
Should be :
cout<<loop;
if(table[loop] != NULL)
{
Node* tmp;
tmp = table[loop];
do
{
cout<<" -->"<<tmp->myCity<<"/"<<tmp->myState;
tmp = tmp->next;
}while(tmp!=NULL);
}

Related

Getting a floating point exception error while doing text frequency analysis?

So for a school project, we are being asked to do a word frequency analysis of a text file using dictionaries and bucket hashing. The output should be something like this:
$ ./stats < jabberwocky.txt
READING text from STDIN. Hit ctrl-d when done entering text.
DONE.
HERE are the word statistics of that text:
There are 94 distinct words used in that text.
The top 10 ranked words (with their frequencies) are:
1. the:19, 2. and:14, 3. !:11, 4. he:7, 5. in:6, 6. .:5, 7.
through:3, 8. my:3, 9. jabberwock:3, 10. went:2
Among its 94 words, 57 of them appear exactly once.
Most of the code has been written for us, but there are four functions we need to complete to get this working:
increment(dict D, std::str w) which will increment the count of a word or add a new entry in the dictionary if it isn't there,
getCount(dict D, std::str w) which fetches the count of a word or returns 0,
dumpAndDestroy(dict D) which dumps the words and counts of those words into a new array by decreasing order of count and deletes D's buckets off the heap, and returns the pointer to that array,
rehash(dict D, std::str w) which rehashes the function when needed.
The structs used are here for reference:
// entry
//
// A linked list node for word/count entries in the dictionary.
//
struct entry {
std::string word; // The word that serves as the key for this entry.
int count; // The integer count associated with that word.
struct entry* next;
};
// bucket
//
// A bucket serving as the collection of entries that map to a
// certain location within a bucket hash table.
//
struct bucket {
entry* first; // It's just a pointer to the first entry in the
// bucket list.
};
// dict
//
// The unordered dictionary of word/count entries, organized as a
// bucket hash table.
//
struct dict {
bucket* buckets; // An array of buckets, indexed by the hash function.
int numIncrements; // Total count over all entries. Number of `increment` calls.
int numBuckets; // The array is indexed from 0 to numBuckets.
int numEntries; // The total number of entries in the whole
// dictionary, distributed amongst its buckets.
int loadFactor; // The threshold maximum average size of the
// buckets. When numEntries/numBuckets exceeds
// this loadFactor, the table gets rehashed.
};
I've written these functions, but when I try to run it with a text file, I get a Floating point exception error. I've emailed my professor for help, but he hasn't replied. This project is due very soon, so help would be much appreciated! My written functions for these are as below:
int getCount(dict* D, std::string w) {
int stringCount;
int countHash = hashValue(w, numKeys(D));
bucket correctList = D->buckets[countHash];
entry* current = correctList.first;
while (current != nullptr && current->word < w) {
if (current->word == w) {
stringCount = current->count;
}
current = current->next;
}
std::cout << "getCount working" << std::endl;
return stringCount;
}
void rehash(dict* D) {
// UNIMPLEMENTED
int newSize = (D->numBuckets * 2) + 1;
bucket** newArray = new bucket*[newSize];
for (int i = 0; i < D->numBuckets; i++) {
entry *n = D->buckets->first;
while (n != nullptr) {
entry *tmp = n;
n = n->next;
int newHashValue = hashValue(tmp->word, newSize);
newArray[newHashValue]->first = tmp;
}
}
delete [] D->buckets;
D->buckets = *newArray;
std::cout << "rehash working" << std::endl;
return;
void increment(dict* D, std::string w) {
// UNIMPLEMENTED
int incrementHash = hashValue(w, numKeys(D));
entry* current = D->buckets[incrementHash].first;
if (current == nullptr) {
int originalLF = D->loadFactor;
if ((D->numEntries + 1)/(D->numBuckets) > originalLF) {
rehash(D);
int incrementHash = hashValue(w, numKeys(D));
}
D->buckets[incrementHash].first->word = w;
D->buckets[incrementHash].first->count++;
}
while (current != nullptr && current->word < w) {
entry* follow = current;
current = current->next;
if (current->word == w) {
current->count++;
}
}
std::cout << "increment working" << std::endl;
D->numIncrements++;
}
entry* dumpAndDestroy(dict* D) {
// UNIMPLEMENTED
entry* es = new entry[D->numEntries];
for (int i = 0; i < D->numEntries; i++) {
es[i].word = "foo";
es[i].count = 0;
}
for (int j = 0; j < D->numBuckets; j++) {
entry* current = D->buckets[j].first;
while (current != nullptr) {
es[j].word = current->word;
es[j].count = current->count;
current = current->next;
}
}
delete [] D->buckets;
std::cout << "dumpAndDestroy working" << std::endl;
return es;
A floating-point exception is usually caused by the code attempting to divide-by-zero (or attempting to modulo-by-zero, which implicitly causes a divide-by-zero). With that in mind, I suspect this line is the locus of your problem:
if ((D->numEntries + 1)/(D->numBuckets) > originalLF) {
Note that if D->numBuckets is equal to zero, this line will do a divide-by-zero. I suggest temporarily inserting a line like like
std::cout << "about to divide by " << D->numBuckets << std::endl;
just before that line, and then re-running your program; that will make the problem apparent, assuming it is the problem. The solution, of course, is to make sure your code doesn't divide-by-zero (i.e. by setting D->numBuckets to the appropriate value, or alternatively by checking to see if it is zero before trying to use it is a divisor)

Inserting a basic singly linked list node seems to break my c++ code?

Singly Linked List and Node classes and the start of the main function, where I wrote a brief outline of the code functionality. The issue is toward the end of the main function. I wrote '...' in place of what I believe to be irrelevant code because it simply parses strings and assigns them to the string temp_hold[3] array.
#include <bits/stdc++.h>
using namespace std;
class Node {
public:
string value;
string attr;
string tagname;
Node *next;
Node(string c_tagname, string c_attr, string c_value) {
this->attr = c_attr;
this->value = c_value;
this->tagname = c_tagname;
this->next = nullptr;
}
};
class SinglyLinkedList {
public:
Node *head;
Node *tail;
SinglyLinkedList() {
this->head = nullptr;
this->tail = nullptr;
}
void insert_node(string c_tagname, string c_attr,string c_value) {
Node *node = new Node(c_tagname,c_attr, c_value);
if (!this->head) {
this->head = node;
} else {
this->tail->next = node;
}
this->tail = node;
}
};
int main(int argc, char **argv) {
/* storage is a vector holding pointers to the linked lists
linked lists are created and the linked list iterator sll_itr is incremented when
previous line begins with '</' and the currentline begins with '<'
linked lists have nodes, which have strings corresponding to tagname, value, and attribute
*/
SinglyLinkedList *llist = new SinglyLinkedList();
vector<SinglyLinkedList*> sllVect;
sllVect.push_back(llist);
auto sll_itr = sllVect.begin();
string temp_hold[3];
// to determine new sll creation
bool prev = false;
bool now = false;
//input
int num1, num2;
cin >> num1; cin >> num2;
//read input in
for (int i = 0; i <= num1; ++i) {
string line1, test1;
getline(cin, line1);
test1 = line1.substr(line1.find("<") + 1);
//determine to create a new linked list or wait
if (test1[0] == '/') {
prev = now;
now = true;
} else {
//make a node for the data and add to current linked list
if (i > 0) {
prev = now;
now = false;
//if last statement starts with '</' and current statment starts with '<'
// then start a new sll and increment pointer to vector<SinglyLinkedList*>
if (prev && !now) {
SinglyLinkedList *llisttemp = new SinglyLinkedList();
sllVect.push_back(llisttemp);
sll_itr++;
}
}
//parse strings from line
int j = 0;
vector<string> datastr;
vector<char> data;
char test = test1[j];
while (test) {
if (isspace(test) || test == '>') {
string temp_for_vect(data.begin(),data.end());
if (!temp_for_vect.empty()) {
datastr.push_back(temp_for_vect);
}
data.clear();
} else
if (!isalnum(test)) {
} else {
data.push_back(test);
}
j++;
test = test1[j];
}
//each node has 3 strings to fill
int count = 0;
for (auto itrs = datastr.begin(); itrs!=datastr.end(); ++itrs) {
switch (count) {
case 0:
temp_hold[count]=(*itrs);
break;
case 1:
temp_hold[count]=(*itrs);
break;
case 2:
temp_hold[count]=(*itrs);
break;
default:
break;
}
count++;
}
}
cout << "before storing node" << endl;
(*sll_itr)->insert_node(temp_hold[0], temp_hold[1], temp_hold[2]);
cout << "after" << endl;
}
cout << "AFTER ELSE" << endl;
return 0;
}
And here is the line that breaks the code. The auto sll_itr is dereferenced which means *sll_itr is now a SinglyLinkedList* and we can call the insert_node(string, string, string) to add a node to the current linked list. However when I keep the line, anything after the else statement brace does not run, which means the cout<<"AFTER ELSE"<< endl; does not fire. If I remove the insert_node line, then the program runs the cout<<"AFTER ELSE"<< endl; I am unsure what the issue is.
(*sll_itr)->insert_node(temp_hold[0],temp_hold[1],temp_hold[2]);
cout << "after" << endl;
} //NOT HANGING. This closes an else statement.
cout << "AFTER ELSE" << endl;
return 0;
}
Compiled as g++ -o myll mylinkedlist.cpp and then myll.exe < input.txt And input.txt contains
8 3
<tag1 value = "HelloWorld">
<tag2 name = "Name2">
</tag2>
</tag1>
<tag5 name = "Name5">
</tag5>
<tag6 name = "Name6">
</tag6>
Your linked list isn't the problem, at least not the problem here.
A recipe for disaster in the making: retaining, referencing, and potentially manipulating, an iterator on a dynamic collection that potentially invalidates iterators on container-modification. Your code does just that. tossing out all the cruft between:
vector<SinglyLinkedList*> sllVect;
sllVect.push_back(llist);
auto sll_itr = sllVect.begin();
....
SinglyLinkedList *llisttemp = new SinglyLinkedList();
sllVect.push_back(llisttemp); // HERE: INVALIDATES sll_iter on internal resize
sll_itr++; // HERE: NO LONGER GUARANTEED VALID; operator++ CAN INVOKE UB
To address this, you have two choices:
Use a container that doesn't invalidate iterators on push_back. There are really only two sequence containers that fit that description: std::forward_list and std::list.
Alter your algorithm to reference by index`, not by iterator. I.e. man your loop to iterate until the indexed element reaches end-of-container, then break.
An excellent discussion about containers that do/do-not invalidate pointers and iterators can be found here. It's worth a read.

Properly exiting out of recursions?

TrieNode and Trie Object:
struct TrieNode {
char nodeChar = NULL;
map<char, TrieNode> children;
TrieNode() {}
TrieNode(char c) { nodeChar = c; }
};
struct Trie {
TrieNode *root = new TrieNode();
typedef pair<char, TrieNode> letter;
typedef map<char, TrieNode>::iterator it;
Trie(vector<string> dictionary) {
for (int i = 0; i < dictionary.size(); i++) {
insert(dictionary[i]);
}
}
void insert(string toInsert) {
TrieNode * curr = root;
int increment = 0;
// while letters still exist within the trie traverse through the trie
while (curr->children.find(toInsert[increment]) != curr->children.end()) { //letter found
curr = &(curr->children.find(toInsert[increment])->second);
increment++;
}
//when it doesn't exist we know that this will be a new branch
for (int i = increment; i < toInsert.length(); i++) {
TrieNode temp(toInsert[i]);
curr->children.insert(letter(toInsert[i], temp));
curr = &(curr->children.find(toInsert[i])->second);
if (i == toInsert.length() - 1) {
temp.nodeChar = NULL;
curr->children.insert(letter(NULL, temp));
}
}
}
vector<string> findPre(string pre) {
vector<string> list;
TrieNode * curr = root;
/*First find if the pre actually exist*/
for (int i = 0; i < pre.length(); i++) {
if (curr->children.find(pre[i]) == curr->children.end()) { //DNE
return list;
}
else {
curr = &(curr->children.find(pre[i])->second);
}
}
/*Now curr is at the end of the prefix, now we will perform a DFS*/
pre = pre.substr(0, pre.length() - 1);
findPre(list, curr, pre);
}
void findPre(vector<string> &list, TrieNode *curr, string prefix) {
if (curr->nodeChar == NULL) {
list.push_back(prefix);
return;
}
else {
prefix += curr->nodeChar;
for (it i = curr->children.begin(); i != curr->children.end(); i++) {
findPre(list, &i->second, prefix);
}
}
}
};
The problem is this function:
void findPre(vector<string> &list, TrieNode *curr, string prefix) {
/*if children of TrieNode contains NULL char, it means this branch up to this point is a complete word*/
if (curr->nodeChar == NULL) {
list.push_back(prefix);
}
else {
prefix += curr->nodeChar;
for (it i = curr->children.begin(); i != curr->children.end(); i++) {
findPre(list, &i->second, prefix);
}
}
}
The purpose is to return all words with the same prefix from a trie using DFS. I manage to retrieve all the necessary strings but I can't exit out of the recursion.
The code completes the last iteration of the if statement and breaks. Visual Studio doesn't return any error code.
The typical end to a recursion is just as you said- return all words. A standard recursion looks something like this:
returnType function(params...){
//Do stuff
if(need to recurse){
return function(next params...);
}else{ //This should be your defined base-case
return base-case;
}
The issue arises in that your recursive function can never return- it can either execute the push_back, or it can call itself again. Neither of these seems to properly exit, so it'll either end quietly (with an inferred return of nothing), or it'll keep recursing.
In your situation, you likely need to store the results from recursion in an intermediate structure like a list or such, and then return that list after iteration (since it's a tree search and ought to check all the children, not return the first one only)
On that note, you seem to be missing part of the point of recursions- they exist to fill a purpose: break down a problem into pieces until those pieces are trivial to solve. Then return that case and build back to a full solution. Any tree-searching must come from this base structure, or you may miss something- like forgetting to return your results.
Check the integrity of your Trie structure. The function appears to be correct. The reason why it wouldn't terminate is if one or more of your leaf nodes doesn't have curr->nodeChar == NULL.
Another case is that any node (leaf or non-leaf) has a garbage child node. This will cause the recursion to break into reading garbage values and no reason to stop. Running in debug mode should break the execution with segmentation fault.
Write another function to test if all leaf-nodes have NULL termination.
EDIT:
After posting the code, the original poster has already pointed out that the problem was that he/she was not returning the list of strings.
Apart from that, there are a few more suggestions I would like to provide based on the code:
How does this while loop terminate if toInsert string is already in the Trie.
You will overrun the toInsert string and read a garbage character.
It will exit after that, but reading beyond your string is a bad way to program.
// while letters still exist within the trie traverse through the trie
while (curr->children.find(toInsert[increment]) != curr->children.end())
{ //letter found
curr = &(curr->children.find(toInsert[increment])->second);
increment++;
}
This can be written as follows:
while (increment < toInsert.length() &&
curr->children.find(toInsert[increment]) != curr->children.end())
Also,
Trie( vector<string> dictionary)
should be
Trie( const vector<string>& dictionary )
because dictionary can be a large object. If you don't pass by reference, it will create a second copy. This is not efficient.
I am a idiot. I forgot to return list on the first findPre() function.
vector<string> findPre(string pre) {
vector<string> list;
TrieNode * curr = root;
/*First find if the pre actually exist*/
for (int i = 0; i < pre.length(); i++) {
if (curr->children.find(pre[i]) == curr->children.end()) { //DNE
return list;
}
else {
curr = &(curr->children.find(pre[i])->second);
}
}
/*Now curr is at the end of the prefix, now we will perform a DFS*/
pre = pre.substr(0, pre.length() - 1);
findPre(list, curr, pre);
return list; //<----- this thing
}

Solving leaky memory and syntax issues in a simple hash table

I'm implementing a basic hashtable. My logic for the table makes sense (at least to me), but I'm a bit rusty with my C++. My program returns a free memory error when I run it, but I can't seem to figure out where my problem is. I think is has to do with how I call the pointers in the various class functions.
#include <iostream>
#include <unordered_map>
#include <string>
#include <cmath>
#include <exception>
using namespace std;
int hashU(string in/*, int M*/){ //hThe hash function that utilizes a smal pseusorandom number
char *v = new char[in.size() + 1]; //generator to return an number between 0 and 50. (I arbitrarily chose 50 as the upper limit)
copy(in.begin(), in.end(), v); //First the input string is turned into a char* for use in the the function.
v[in.size()] = '\0';
int h, a = 31415, b = 27183;
for(h=0;*v!=0;v++,a=a*b%(49-1))
h = (a*h + *v)%50;
delete[] v; //Delete the char* to prevent leaky memory.
return (h<0) ? (h+50) : h; //Return number
}
struct hashNode{ //The node that will store the key and the values
string key;
float val;
struct hashNode *next;
};
struct hashLink{ //The linked list that will store additional keys and values should there be a collision.
public:
struct hashNode *start; //Start pointer
struct hashNode *tail; //Tail pointer
hashLink(){ //hashLink constructor
start=NULL;
tail=NULL;
}
void push(string key, float val); //Function to push values to stack. Used if there is a collision.
};
void hashLink::push(string key, float val){
struct hashNode *ptr;
ptr = new hashNode;
ptr->key = key;
ptr->val = val;
ptr->next = NULL;
if(start != NULL){
ptr->next = tail;
}
tail = ptr;
return;
}
struct hashTable{ //The "hash table." Creates an array of Linked Lists that are indexed by the values returned by the hash function.
public:
hashLink hash[50];
hashTable(){ //Constructor
}
void emplace(string in, float val); //Function to insert a new key and value into the table.
float fetch(string in); //Function to retrieve a stored key.
};
void hashTable::emplace(string in, float val){
int i = hashU(in); //Retrieve index of key from hash function.
hashNode *trav; //Create node traveler
trav = hash[i].start; //Set the traveler to the start of the desired linked list
while(trav!=hash[i].tail){ //Traverse the list searching to see if the input key already exists
if(trav->key.compare(in)==0){ //If the input key already exists, its associated value is updated, and the function returns.
trav->val = val;
return;
}
else //Travler moves to next node if the input key in not found.
trav = trav->next;
}
hash[i].push(in,val); //If the traveler does not see the input key, the request key must not exist and must be created by pushing the input key and associated value to the stack.
return;
}
float hashTable::fetch(string in){
int i = hashU(in); //Retrieve index of key
hashNode *trav; //Create node traveler and set it to the start of the appropriate list.
trav = hash[i].start;
while(trav!=hash[i].tail){ //Traverse the linked list searching for the requested key.
if(trav->key.compare(in)==0){ //If the the requested key is found, return the associated value.
return trav->val;
}
else
trav = trav->next; //If not found in the current node, move to the next.
}
return false; //If the requested key is not found, return false.
}
int main(){
hashTable vars; //initialize the hash table
float num = 5.23; //create test variable
vars.emplace("KILO",num);
cout<<vars.fetch("KILO")<<endl;
return 0;
}
The problem is that when you call delete[] v, you have advanced v such that it is pointing to the 0 at the end of the string, which is the wrong address to delete.
Also, you're wasting a lot of code unnecessarily copying the string out of where it is already available as a c-string.
unsigned int hashU(string in/*, int M*/) {
const char* v = in.c_str();
unsigned int h, a = 31415, b = 27183;
for(h=0;*v!=0;v++,a=a*b%(49-1))
h = (a*h + *v);
return h % 50;
}
for(h=0;*v!=0;v++,a=a*b%(49-1))
h = (a*h + *v)%50;
delete[] v; //Delete the char* to prevent leaky
You are incrementing v, then deleting an invalid memory location.

C++ Binary search tree compare data of nodes and remove duplicates

I have created a binary search tree in c++ and have loaded it up with two types of data, strings and ints. I am reading a text file and loading the tree up alphabetically with the words I am pulling, and also the number of the line the word is found on. I am able to print the words and the numbers just fine. What I am wanting to do now is check to see if a word has already been printed, and if it has then I will only print out the number of the line from which the word is found on. The way I am thinking about doing this is by comparing previous data as the tree is traversed and printed. This is my print function.
void inOrderPrint(Node *rootPtr ) {
if ( rootPtr != NULL ) {
for (int i =0; rootPtr->data[i]; i++){
while(ispunct(rootPtr->data[i]))
rootPtr->data.erase(i,1);
}
rootPtr->data = rootPtr->data.substr(0,10);
inOrderPrint( rootPtr->left );
cout << (rootPtr->data)<<rootPtr->lineNum <<endl;
inOrderPrint( rootPtr->right );
}
}
This is what I was thinking:
if (rootPtr->data == previous rootPtr->data)
cout<<setw(10)<<theCurrentNode lineNum;
else
do normal printing
I think that if this function were to run on the first node and it compares it to the non existent previous node, it would automatically try to compare it to NULL, the if statement would return false and it would move on to the else.
Any suggestions on how to go about doing this with actual c++ syntax? Or does anyone see a flaw in my logic?
Thanks in advance!
This answer will describe how to make the program print unique entries and the line number of the first occurrence in the file. If there are duplicate occurrences it will print only the line number of the first occurrence for each duplicate occurrence. The approach is to make sure that there are no duplicate nodes in the tree and to count redundant occurrences.
To do this we might modify the node structure as follows:
struct Node{
string data;
int lineNum;
int count =1;
Node* left;
Node* right;
};
The function Insert might be edited to count duplicates like this:
Node* Insert(Node* rootPtr,string data,int lineNum){
if(rootPtr == NULL){
rootPtr = GetNewNode(data,lineNum);
for (int i =0; rootPtr->data[i]; i++){
while(ispunct(rootPtr->data[i]))
rootPtr->data.erase(i,1);
}
rootPtr->data = rootPtr->data.substr(0,10);
return rootPtr;
}
else if(data< rootPtr->data){
rootPtr->left = Insert(rootPtr->left,data,lineNum);
for (int i =0; rootPtr->data[i]; i++){
while(ispunct(rootPtr->data[i]))
rootPtr->data.erase(i,1);
}
rootPtr->data = rootPtr->data.substr(0,10);
}
else if(data > rootPtr->data) {
rootPtr->right = Insert(rootPtr->right,data,lineNum);
for (int i =0; rootPtr->data[i]; i++){
while(ispunct(rootPtr->data[i]))
rootPtr->data.erase(i,1);
}
rootPtr->data = rootPtr->data.substr(0,10);
}
else if(data == rootPtr->data)
++rootPtr->count;
return rootPtr;
}
Finally the print function can be modified:
void inOrderPrint(Node *rootPtr ) {
//ofstream outputFile;
//outputFile.open("Output.txt");
if ( rootPtr != NULL ) {
inOrderPrint( rootPtr->left );
cout << (rootPtr->data)<<" " << rootPtr->lineNum <<endl;
int j =rootPtr->count;
while( --j )
cout << rootPtr->lineNum <<endl;
//outputFile << (rootPtr->data)<<rootPtr->lineNum <<endl;
inOrderPrint( rootPtr->right );
}
}
Now this should be much closer to what you want. It would also be a good idea to separate the text processing from the node processing. (This answer sort of assumes that you will take care of that.) Otherwise duplicate nodes will be created if the preprocessed text does not match the processed text.
Good luck!