Counting number of occurrences of a string in a Hash Table - c++

I am writing my own HashTable class in C++ and need to output to the user the number of occurrences of each string in the table. For example, if this is the input: testing, 1, 2, testing, and this is the hash table (done with chaining, and node pointers):
[0]->testing, testing
[1]->2
[2]->1
this would be the output to the user (the count, followed by the word):
2 testing
1 2
1 1
The problem I'm having is how to keep track of how many of each word is in the Hash Table, or how to find it. I started with this question but was unable to implement another array in my code.
I also tried the solution in this question, but it didn't work because of my use of pointers/chained hashing.
My question is, do I need to use a separate array of strings to keep track of what's already been used, or is there an easy way to recursively go through each index of the Hash Table and print out the number of occurrences of each string? I think I need to accomplish this in either my insert function or my printData function.
For reference, here is my code:
HashTable.h:
#include <string>
#include <iostream>
using namespace std;
struct Entry {
string word;
Entry* next;
};
class HashTable {
public:
HashTable();
HashTable(int);
int hash(string);
void insert(string);
void printData();
int getCapacity() const;
private:
//Member variables
int CAPACITY; // The initial capacity of the HashTable
Entry **data; // The array to store the data of strings (Entries)
};
HashTable.cpp:
#include "HashTable.h"
HashTable::HashTable()
{
CAPACITY = 0;
data = new Entry*[0];
}
HashTable::HashTable(int _cap)
{
CAPACITY = _cap;
data = new Entry*[_cap];
for (int i = 0; i < CAPACITY; i++) {
data[i] = new Entry;
data[i]->word = "empty";
data[i]->next = nullptr;
}
}
int HashTable::hash(string key)
{
int hash = 0;
for (unsigned int i = 0; i < key.length(); i++) {
hash = hash + (int)key[i];
}
return hash % CAPACITY;
}
void HashTable::insert(string entry)
{
int index = hash(entry);
if (data[index]->word == "empty") {
data[index]->word = entry;
} else {
Entry* temp = data[index];
Entry* e = new Entry;
e->word = entry;
e->next = nullptr;
while (temp->next != nullptr) {
temp = temp->next;
}
temp->next = e;
}
}
void HashTable::printData()
{
for (int i = 0; i < CAPACITY; i++) {
if (data[i]->next != nullptr) {
while(data[i]->next != nullptr) {
cout << data[i]->word << " -> ";
data[i] = data[i]->next;
}
cout << data[i]->word << endl;
} else {
cout << data[i]->word << endl;
}
}
}
int HashTable::getCapacity() const
{
return CAPACITY;
}
NOTE: I can't use any function/data structure from the standard C++ Library.

I only see two options here:
Traverse the entire linked list to count occurrences.
Use a map<string, int> to count occurrences for each string.
You should keep your linked list sorted, so that each new node is inserted in its exact place. You can use strcmp (or std::string's own comparison operators) for the comparison. This way equal words sit next to each other, and you can count every word in exactly one traversal using just one integer variable, at the cost of a slower and more complex insert.
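The sorted-chain suggestion above could be sketched roughly as follows. This is only an illustration under some assumptions: insertSorted and printCounts are hypothetical helper names (not part of the question's HashTable class), empty buckets are represented by a null pointer instead of an "empty" sentinel node, and std::string's comparison operators stand in for strcmp.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical node type mirroring the question's Entry.
struct Entry {
    std::string word;
    Entry* next;
};

// Insert `word` into the chain starting at `head`, keeping the chain sorted
// so that equal words end up next to each other.
void insertSorted(Entry*& head, const std::string& word) {
    Entry** link = &head;
    while (*link != nullptr && (*link)->word < word) {
        link = &(*link)->next;
    }
    *link = new Entry{word, *link};
}

// Walk every chain once and print "<count> <word>" for each distinct word.
// Returns the number of distinct words found in the whole table.
int printCounts(Entry* const* table, int capacity, std::ostream& out) {
    assert(capacity >= 0);
    int distinct = 0;
    for (int i = 0; i < capacity; ++i) {
        const Entry* cur = table[i];
        while (cur != nullptr) {
            // Count the run of nodes that hold the same word as `cur`.
            int count = 0;
            const Entry* run = cur;
            while (run != nullptr && run->word == cur->word) {
                ++count;
                run = run->next;
            }
            out << count << ' ' << cur->word << '\n';
            ++distinct;
            cur = run;  // jump to the first node holding a different word
        }
    }
    return distinct;
}
```

Because equal words are adjacent in a sorted chain, one pass with a single counter is enough; the trade-off is that each insert has to walk the chain to find its position.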

Related

Can't modify a string in C++ array

Trying to learn data structures, I made this class for a stack. It works just fine with integers, but it throws a mysterious error with strings.
The class List is the API for my stack. It's meant to resize automatically when it reaches the limit. The whole code is just for the sake of learning, but the error I get doesn't make any sense and it happens somewhere in some assembly code.
#include <iostream>
#include<string>
using namespace std;
class List {
private:
int N = 0;
string* list = new string[1];
void resize(int sz) {
max = sz;
string* oldlist = list;
string* list = new string[max];
for (int i = 0; i < N; i++) {
list[i] = oldlist[i];
}
}
int max = 1;
public:
void push(string str) {
if (N == max) {
resize(2 * N);
}
cout << max << endl;
list[N] = str;
N++;
}
void pop() {
cout << list[--N] << endl;
}
};
int main()
{
string in;
List list;
while (true) {
cin >> in;
if (in == "-") {
list.pop();
}
else {
list.push(in);
}
}
}
string* list = new string[max]; in the resize method defines a new local variable named list that shadows (hides) the member variable list. The member list is left unchanged, and the local variable list goes out of scope at the end of the function, losing all of the work and leaking the new array.
To fix: Change
string* list = new string[max];
to
list = new string[max];
so that the function will use the member variable.
Don't forget to delete[] oldlist; when you're done with it to free up the storage it points at.
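For illustration, here is a minimal sketch of a stack with the shadowing fixed and the old array released. The class name StringStack and its member names are made up for this example; it is not the asker's exact class.

```cpp
#include <cassert>
#include <string>

// Hypothetical resizable stack of strings.
class StringStack {
public:
    StringStack() : size_(0), capacity_(1), items_(new std::string[1]) {}
    ~StringStack() { delete[] items_; }

    // Copying is disabled to keep the sketch short and avoid double deletes.
    StringStack(const StringStack&) = delete;
    StringStack& operator=(const StringStack&) = delete;

    void push(const std::string& str) {
        if (size_ == capacity_) {
            resize(2 * capacity_);
        }
        items_[size_++] = str;
    }

    std::string pop() {
        assert(size_ > 0);  // popping an empty stack is a caller error
        return items_[--size_];
    }

    int size() const { return size_; }
    int capacity() const { return capacity_; }

private:
    void resize(int newCapacity) {
        std::string* old = items_;
        // Assign to the member; writing "std::string* items_ = ..." here
        // would declare a new local variable and shadow the member.
        items_ = new std::string[newCapacity];
        for (int i = 0; i < size_; ++i) {
            items_[i] = old[i];
        }
        delete[] old;  // release the storage the old array pointed at
        capacity_ = newCapacity;
    }

    int size_;
    int capacity_;
    std::string* items_;
};
```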

Getting a floating point exception error while doing text frequency analysis?

So for a school project, we are being asked to do a word frequency analysis of a text file using dictionaries and bucket hashing. The output should be something like this:
$ ./stats < jabberwocky.txt
READING text from STDIN. Hit ctrl-d when done entering text.
DONE.
HERE are the word statistics of that text:
There are 94 distinct words used in that text.
The top 10 ranked words (with their frequencies) are:
1. the:19, 2. and:14, 3. !:11, 4. he:7, 5. in:6, 6. .:5, 7.
through:3, 8. my:3, 9. jabberwock:3, 10. went:2
Among its 94 words, 57 of them appear exactly once.
Most of the code has been written for us, but there are four functions we need to complete to get this working:
increment(dict D, std::string w), which increments the count of a word, or adds a new entry to the dictionary if it isn't there,
getCount(dict D, std::string w), which fetches the count of a word or returns 0,
dumpAndDestroy(dict D), which dumps the words and their counts into a new array in decreasing order of count, deletes D's buckets off the heap, and returns the pointer to that array,
rehash(dict D, std::string w), which rehashes the table when needed.
The structs used are here for reference:
// entry
//
// A linked list node for word/count entries in the dictionary.
//
struct entry {
std::string word; // The word that serves as the key for this entry.
int count; // The integer count associated with that word.
struct entry* next;
};
// bucket
//
// A bucket serving as the collection of entries that map to a
// certain location within a bucket hash table.
//
struct bucket {
entry* first; // It's just a pointer to the first entry in the
// bucket list.
};
// dict
//
// The unordered dictionary of word/count entries, organized as a
// bucket hash table.
//
struct dict {
bucket* buckets; // An array of buckets, indexed by the hash function.
int numIncrements; // Total count over all entries. Number of `increment` calls.
int numBuckets; // The array is indexed from 0 to numBuckets.
int numEntries; // The total number of entries in the whole
// dictionary, distributed amongst its buckets.
int loadFactor; // The threshold maximum average size of the
// buckets. When numEntries/numBuckets exceeds
// this loadFactor, the table gets rehashed.
};
I've written these functions, but when I try to run it with a text file, I get a Floating point exception error. I've emailed my professor for help, but he hasn't replied. This project is due very soon, so help would be much appreciated! My written functions for these are as below:
int getCount(dict* D, std::string w) {
int stringCount;
int countHash = hashValue(w, numKeys(D));
bucket correctList = D->buckets[countHash];
entry* current = correctList.first;
while (current != nullptr && current->word < w) {
if (current->word == w) {
stringCount = current->count;
}
current = current->next;
}
std::cout << "getCount working" << std::endl;
return stringCount;
}
void rehash(dict* D) {
// UNIMPLEMENTED
int newSize = (D->numBuckets * 2) + 1;
bucket** newArray = new bucket*[newSize];
for (int i = 0; i < D->numBuckets; i++) {
entry *n = D->buckets->first;
while (n != nullptr) {
entry *tmp = n;
n = n->next;
int newHashValue = hashValue(tmp->word, newSize);
newArray[newHashValue]->first = tmp;
}
}
delete [] D->buckets;
D->buckets = *newArray;
std::cout << "rehash working" << std::endl;
return;
}
void increment(dict* D, std::string w) {
// UNIMPLEMENTED
int incrementHash = hashValue(w, numKeys(D));
entry* current = D->buckets[incrementHash].first;
if (current == nullptr) {
int originalLF = D->loadFactor;
if ((D->numEntries + 1)/(D->numBuckets) > originalLF) {
rehash(D);
int incrementHash = hashValue(w, numKeys(D));
}
D->buckets[incrementHash].first->word = w;
D->buckets[incrementHash].first->count++;
}
while (current != nullptr && current->word < w) {
entry* follow = current;
current = current->next;
if (current->word == w) {
current->count++;
}
}
std::cout << "increment working" << std::endl;
D->numIncrements++;
}
entry* dumpAndDestroy(dict* D) {
// UNIMPLEMENTED
entry* es = new entry[D->numEntries];
for (int i = 0; i < D->numEntries; i++) {
es[i].word = "foo";
es[i].count = 0;
}
for (int j = 0; j < D->numBuckets; j++) {
entry* current = D->buckets[j].first;
while (current != nullptr) {
es[j].word = current->word;
es[j].count = current->count;
current = current->next;
}
}
delete [] D->buckets;
std::cout << "dumpAndDestroy working" << std::endl;
return es;
}
A floating-point exception is usually caused by the code attempting to divide-by-zero (or attempting to modulo-by-zero, which implicitly causes a divide-by-zero). With that in mind, I suspect this line is the locus of your problem:
if ((D->numEntries + 1)/(D->numBuckets) > originalLF) {
Note that if D->numBuckets is equal to zero, this line will divide by zero. I suggest temporarily inserting a line like
std::cout << "about to divide by " << D->numBuckets << std::endl;
just before that line, and then re-running your program; that will make the problem apparent, assuming it is the problem. The solution, of course, is to make sure your code doesn't divide by zero (i.e. by setting D->numBuckets to the appropriate value, or alternatively by checking whether it is zero before trying to use it as a divisor).
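One way to add such a check is sketched below. needsRehash is a hypothetical helper, not part of the assignment's code, and treating a table with zero buckets as "must grow first" is an assumption about what the appropriate behavior is.

```cpp
#include <cassert>

// Hypothetical helper: reports whether adding one more entry would push the
// average bucket size above loadFactor, without ever dividing by zero.
bool needsRehash(int numEntries, int numBuckets, int loadFactor) {
    assert(numEntries >= 0);
    if (numBuckets <= 0) {
        // Assumption: a table with no buckets cannot hold anything,
        // so it has to be grown before the first insert.
        return true;
    }
    // Integer division, same expression as in the question's increment().
    return (numEntries + 1) / numBuckets > loadFactor;
}
```

In increment, the condition could then read if (needsRehash(D->numEntries, D->numBuckets, D->loadFactor)). If hashValue reduces its result modulo the table size, it would presumably need the same protection, since a modulo by zero raises the same exception.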

C++ memory leak, where?

I'm having a problem with the code attached below. Essentially it generates a huge memory leak but I can't see where it happens.
The code receives an array of strings, called prints, each containing numbers (nodes) separated by ',' (ordered by descending number of nodes). It finds other compatible prints (compatible means that the other string has no overlapping nodes, 0 excluded because every print contains it) and, when all nodes are covered, it calculates a risk function on the basis of a weighted graph. In the end it keeps the solution with the lowest risk.
The problem is that leak you see in the picture. I really can't get where it comes from.
Here's the code:
#include "Analyzer.h"
#define INFINITY 999999999
// functions prototypes
bool areFullyCompatible(int *, int, string);
bool contains(int *, int, int);
bool selectionComplete(int , int);
void extractNodes(string , int *, int &, int);
void addNodes(int *, int &, string);
Analyzer::Analyzer(Graph *graph, string *prints, int printsLen) {
this->graph = graph;
this->prints = prints;
this->printsLen = printsLen;
this->actualResult = new string[graph->nodesNum];
this->bestResult = new string[graph->nodesNum];
this->bestReSize = INFINITY;
this->bestRisk = INFINITY;
this-> actualSize = -1;
}
void Analyzer::getBestResult(int &size) {
for (int i = 0; i < bestReSize; i++)
cout << bestResult[i] << endl;
}
void Analyzer::analyze() {
// the number of selected paths is at most equal to the number of nodes
int maxSize = this->graph->nodesNum;
float totRisk;
int *actualNodes = new int[maxSize];
int nodesNum;
bool newCycle = true;
for (int i = 0; i < printsLen - 1; i++) {
for (int j = i + 1; j < printsLen; j++) {
// initializing the current selection
if (newCycle) {
newCycle = false;
nodesNum = 0;
extractNodes(prints[i], actualNodes, nodesNum, maxSize);
this->actualResult[0] = prints[i];
this->actualSize = 1;
}
// adding just fully compatible prints
if (areFullyCompatible(actualNodes, nodesNum, prints[j])) {
this->actualResult[actualSize] = prints[j];
actualSize++;
addNodes(actualNodes, nodesNum, prints[j]);
}
if (selectionComplete(nodesNum, maxSize)) {
// it means it's no more a possible best solution with the minimum number of paths
if (actualSize > bestReSize) {
break;
}
// calculating the risk associated to the current selection of prints
totRisk = calculateRisk();
// saving the best result
if (actualSize <= bestReSize && totRisk < bestRisk) {
bestReSize = actualSize;
bestRisk = totRisk;
for(int k=0;k<actualSize; k++)
bestResult[k] = actualResult[k];
}
}
}
newCycle = true;
}
}
float Analyzer::calculateRisk() {
float totRisk = 0;
int maxSize = graph->nodesNum;
int *nodes = new int[maxSize];
int nodesNum = 0;
for (int i = 0; i < actualSize; i++) {
extractNodes(this->actualResult[i], nodes, nodesNum, maxSize);
// now nodes containt all the nodes from the print but 0, so I add it (it's already counted but misses)
nodes[nodesNum-1] = 0;
// at this point I use the graph to calculate the risk
for (int i = 0; i < nodesNum - 1; i++) {
float add = this->graph->nodes[nodes[i]].edges[nodes[i+1]]->risk;
totRisk += this->graph->nodes[nodes[i]].edges[nodes[i+1]]->risk;
//cout << "connecting " << nodes[i] << " to " << nodes[i + 1] << " with risk " << add << endl;
}
}
delete nodes;
return totRisk;
}
// -------------- HELP FUNCTIONS--------------
bool areFullyCompatible(int *nodes, int nodesNum, string print) {
char *node;
char *dup;
int tmp;
bool flag = false;
dup = strdup(print.c_str());
node = strtok(dup, ",");
while (node != NULL && !flag)
{
tmp = atoi(node);
if (contains(nodes, nodesNum, tmp))
flag = true;
node = strtok(NULL, ",");
}
// flag signals whether an element in the print is already contained. If it is, there's no full compatibility
if (flag)
return false;
delete dup;
delete node;
return true;
}
// adds the new nodes to the list
void addNodes(int *nodes, int &nodesNum, string print) {
char *node;
char *dup;
int tmp;
// in this case I must add the new nodes to the list
dup = strdup(print.c_str());
node = strtok(dup, ",");
while (node != NULL)
{
tmp = atoi(node);
if (tmp != 0) {
nodes[nodesNum] = tmp;
nodesNum++;
}
node = strtok(NULL, ",");
}
delete dup;
delete node;
}
// verifies whether a node is already contained in the nodes list
bool contains(int *nodes, int nodesNum, int node) {
for (int i = 0; i < nodesNum; i++)
if (nodes[i] == node)
return true;
return false;
}
// verifies if there are no more nodes to be added to the list (0 excluded)
bool selectionComplete(int nodesNum, int maxSize) {
return nodesNum == (maxSize-1);
}
// extracts nodes from a print add adds them to the nodes list
void extractNodes(string print, int *nodes, int &nodesNum, int maxSize) {
char *node;
char *dup;
int idx = 0;
int tmp;
dup = strdup(print.c_str());
node = strtok(dup, ",");
while (node != NULL)
{
tmp = atoi(node);
// not adding 0 because every prints contains it
if (tmp != 0) {
nodes[idx] = tmp;
idx++;
}
node = strtok(NULL, ",");
}
delete dup;
delete node;
nodesNum = idx;
}
You have forgotten to delete several things and used the wrong form of delete for arrays where you have remembered, e.g.
float Analyzer::calculateRisk() {
float totRisk = 0;
int maxSize = graph->nodesNum;
int *nodes = new int[maxSize];
//...
delete [] nodes; //<------- DO THIS not delete nodes
The simplest solution is to avoid using raw pointers and use smart ones instead. Or a std::vector if you just want to store stuff somewhere to index into.
You have new without corresponding delete
this->actualResult = new string[graph->nodesNum];
this->bestResult = new string[graph->nodesNum];
These should be deleted somewhere using delete [] ...
You allocate actualNodes in analyze() but you don't release the memory anywhere:
int *actualNodes = new int[maxSize];
In addition, Analyzer::bestResult and Analyzer::actualResult are allocated in the constructor of Analyzer but not deallocated anywhere.
this->actualResult = new string[graph->nodesNum];
this->bestResult = new string[graph->nodesNum];
If you must use pointers, I strongly suggest using smart pointers, e.g. std::unique_ptr and/or std::shared_ptr when using C++11 or later, or a Boost equivalent when using C++03 or earlier. Otherwise, using containers, e.g. std::vector, is preferred.
PS: Your code also has a lot of mismatches between allocation and deallocation. If memory is allocated using malloc/calloc/strdup..., it must be freed using free. If memory is allocated using operator new, it must be deallocated with operator delete. If memory is allocated using operator new[], it must be deallocated with operator delete[]. And you certainly should not delete the return value of strtok, which points into the buffer being tokenized rather than to a separate allocation.
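As a rough sketch of the container-based approach suggested above, the parsing helpers could be written without strdup, strtok, or any manual delete. The signatures below differ from the question's (they return or take a std::vector<int> instead of filling a raw array), so treat them as an assumption about how the code might be restructured, not as a drop-in replacement.

```cpp
#include <algorithm>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Parse a comma-separated print such as "0,3,7" into its node numbers,
// skipping node 0 because every print contains it.
std::vector<int> extractNodes(const std::string& print) {
    std::vector<int> nodes;
    std::istringstream in(print);
    std::string token;
    while (std::getline(in, token, ',')) {
        // std::atoi keeps the question's behavior: non-numeric text yields 0.
        int node = std::atoi(token.c_str());
        if (node != 0) {
            nodes.push_back(node);
        }
    }
    return nodes;  // the vector owns its memory; nothing to free by hand
}

// A print is fully compatible when none of its non-zero nodes is already
// present in `nodes`.
bool areFullyCompatible(const std::vector<int>& nodes, const std::string& print) {
    for (int node : extractNodes(print)) {
        if (std::find(nodes.begin(), nodes.end(), node) != nodes.end()) {
            return false;
        }
    }
    return true;
}
```

With this shape there is no early return that can skip a cleanup step, which is what happens in the question's areFullyCompatible when flag is true.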

Hash table implementation in C++

I am trying the following code for a hash table implementation in C++. The program compiles and accepts input, and then a popup appears saying "the project has stopped working and Windows is checking for a solution to the problem". I feel the program is going into an infinite loop somewhere. Can anyone spot the mistake? Please help!
#include <iostream>
#include <stdlib.h>
#include <string>
#include <sstream>
using namespace std;
/* Definitions as shown */
typedef struct CellType* Position;
typedef int ElementType;
struct CellType{
ElementType value;
Position next;
};
/* *** Implements a List ADT with necessary functions.
You may make use of these functions (need not use all) to implement your HashTable ADT */
class List{
private:
Position listHead;
int count;
public:
//Initializes the number of nodes in the list
void setCount(int num){
count = num;
}
//Creates an empty list
void makeEmptyList(){
listHead = new CellType;
listHead->next = NULL;
}
//Inserts an element after Position p
int insertList(ElementType data, Position p){
Position temp;
temp = p->next;
p->next = new CellType;
p->next->next = temp;
p->next->value = data;
return ++count;
}
//Returns pointer to the last node
Position end(){
Position p;
p = listHead;
while (p->next != NULL){
p = p->next;
}
return p;
}
//Returns number of elements in the list
int getCount(){
return count;
}
};
class HashTable{
private:
List bucket[10];
int bucketIndex;
int numElemBucket;
Position posInsert;
string collision;
bool reportCol; //Helps to print a NO for no collisions
public:
HashTable(){ //constructor
int i;
for (i=0;i<10;i++){
bucket[i].setCount(0);
}
collision = "";
reportCol = false;
}
int insert(int data){
bucketIndex=data%10;
int col;
if(posInsert->next==NULL)
bucket[bucketIndex].insertList(data,posInsert);
else { while(posInsert->next != NULL){
posInsert=posInsert->next;
}
bucket[bucketIndex].insertList(data,posInsert);
reportCol=true;}
if (reportCol==true) col=1;
else col=0;
numElemBucket++;
return col ;
/*code to insert data into
hash table and report collision*/
}
void listCollision(int pos){
cout<< "("<< pos<< "," << bucketIndex << "," << numElemBucket << ")"; /*codeto generate a properly formatted
string to report multiple collisions*/
}
void printCollision();
};
int main(){
HashTable ht;
int i, data;
for (i=0;i<10;i++){
cin>>data;
int abc= ht.insert(data);
if(abc==1){
ht.listCollision(i);/* code to call insert function of HashTable ADT and if there is a collision, use listCollision to generate the list of collisions*/
}
//Prints the concatenated collision list
ht.printCollision();
}}
void HashTable::printCollision(){
if (reportCol == false)
cout <<"NO";
else
cout<<collision;
}
The output of the program is the point where there is a collision in the hash table, the corresponding bucket number and the number of elements in that bucket.
After some debugging, I found that the HashTable constructor never calls makeEmptyList() on the buckets, so each bucket's listHead is left uninitialized.
So your HashTable constructor should be as follows:
HashTable(){ //constructor
int i;
for (i=0;i<10;i++){
bucket[i].setCount(0);
bucket[i].makeEmptyList(); //here we clear for first use
}
collision = "";
reportCol = false;
}
//Creates an empty list
void makeEmptyList(){
listHead = new CellType;
listHead->next = NULL;
}
Also, posInsert is never assigned before it is dereferenced in insert(). You can get it using
bucket[bucketIndex].end()
so that posInsert->next is defined. There is then no need for the loop
while(posInsert->next != NULL){
posInsert=posInsert->next;
}
because the end() function already walks to the last node.
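Putting the two fixes together, a trimmed-down version might look like the sketch below. It makes some assumptions that are not in the original code: the list head is created in a List constructor rather than by a separate makeEmptyList() call, a collision is reported per insert (whenever the target bucket already holds an element), and bucketSize is a hypothetical accessor added for inspection.

```cpp
#include <cassert>

struct CellType {
    int value;
    CellType* next;
};
typedef CellType* Position;

class List {
public:
    // Allocate the dummy head node up front so end() never sees a wild pointer.
    List() : listHead(new CellType{0, nullptr}), count(0) {}
    ~List() {
        while (listHead != nullptr) {
            Position next = listHead->next;
            delete listHead;
            listHead = next;
        }
    }
    List(const List&) = delete;
    List& operator=(const List&) = delete;

    // Returns a pointer to the last node (the dummy head if the list is empty).
    Position end() const {
        Position p = listHead;
        while (p->next != nullptr) {
            p = p->next;
        }
        return p;
    }

    // Inserts an element after Position p and returns the new element count.
    int insertList(int data, Position p) {
        p->next = new CellType{data, p->next};
        return ++count;
    }

    int getCount() const { return count; }

private:
    Position listHead;
    int count;
};

class HashTable {
public:
    // Returns 1 if the bucket already held an element (a collision), else 0.
    int insert(int data) {
        assert(data >= 0);  // a negative key would give a negative index
        List& target = bucket[data % 10];
        bool collided = target.getCount() > 0;
        target.insertList(data, target.end());
        return collided ? 1 : 0;
    }

    int bucketSize(int index) const { return bucket[index].getCount(); }

private:
    List bucket[10];
};
```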

Array of Linked Lists C++

So I thought I understood how to implement an array of pointers but my compiler says otherwise =(. Any help would be appreciated, I feel like I'm close but am missing something crucial.
1.) I have a struct called node declared:
struct node {
int num;
node *next;
}
2.) I've declared a pointer to an array of pointers like so:
node **arrayOfPointers;
3.) I've then dynamically created the array of pointers by doing this:
arrayOfPointers = new node*[arraySize];
My understanding is at this point, arrayOfPointers is now pointing to an array of x node type, with x being = to arraySize.
4.) But when I want to access the fifth element in arrayOfPointers to check if its next pointer is null, I'm getting a segmentation fault error. Using this:
if (arrayOfPointers[5]->next == NULL)
{
cout << "I'm null" << endl;
}
Does anyone know why this is happening? I was able to assign a value to num by doing: arrayOfPointers[5]->num = 77;
But I'm confused as to why checking the pointer in the struct is causing an error. Also, while we're at it, what would be the proper prototype for passing arrayOfPointers into a function? Is it still (node **arrayOfPointers) or is it some other thing like (node * &arrayOfPointers)?
Thanks in advance for any tips or pointers (haha) you may have!
Full code (Updated):
/*
* Functions related to separate chain hashing
*/
struct chainNode
{
int value;
chainNode *next;
};
chainNode* CreateNewChainNode (int keyValue)
{
chainNode *newNode;
newNode = new (nothrow) chainNode;
newNode->value = keyValue;
newNode->next = NULL;
return newNode;
}
void InitDynamicArrayList (int tableSize, chainNode **chainListArray)
{
// create dynamic array of pointers
chainListArray = new (nothrow) chainNode*[tableSize];
// allocate each pointer in array
for (int i=0; i < tableSize; i++)
{
chainListArray[i]= CreateNewChainNode(0);
}
return;
}
bool SeparateChainInsert (int keyValue, int hashAddress, chainNode **chainListArray)
{
bool isInserted = false;
chainNode *newNode;
newNode = CreateNewChainNode(keyValue); // create new node
// if memory allocation did not fail, insert new node into hash table
if (newNode != NULL)
{
//if array cell at hash address is empty
if (chainListArray[hashAddress]->next == NULL)
{
// insert new node to front of list, keeping next pointer still set to NULL
chainListArray[hashAddress]->next = newNode;
}
else //else cell is pointing to a list of nodes already
{
// new node's next pointer will point to former front of linked list
newNode->next = chainListArray[hashAddress]->next;
// insert new node to front of list
chainListArray[hashAddress]->next = newNode;
}
isInserted = true;
cout << keyValue << " inserted into chainListArray at index " << hashAddress << endl;
}
return isInserted;
}
/*
* Functions to fill array with random numbers for hashing
*/
void FillNumArray (int randomArray[])
{
int i = 0; // counter for for loop
int randomNum = 0; // randomly generated number
for (i = 0; i < ARRAY_SIZE; i++) // do this for entire array
{
randomNum = GenerateRandomNum(); // get a random number
while(!IsUniqueNum(randomNum, randomArray)) // loops until random number is unique
{
randomNum = GenerateRandomNum();
}
randomArray[i] = randomNum; // insert random number into array
}
return;
}
int GenerateRandomNum ()
{
int num = 0; // randomly generated number
// generate random number between start and end ranges
num = (rand() % END_RANGE) + START_RANGE;
return num;
}
bool IsUniqueNum (int num, int randomArray[])
{
bool isUnique = true; // indicates if number is unique and NOT in array
int index = 0; // array index
//loop until end of array or a zero is found
//(since array elements were initialized to zero)
while ((index < ARRAY_SIZE) && (!randomArray[index] == 0))
{
// if a value in the array matches the num passed in, num is not unique
if (randomArray[index] == num)
{
isUnique = false;
}
index++; // increment index counter
} // end while
return isUnique;
}
/*
*main
*/
int main (int argc, char* argv[])
{
int randomNums[ARRAY_SIZE] = {0}; // initialize array elements to 0
int hashTableSize = 0; // size of hash table to use
chainNode **chainListArray;
bool chainEntry = true; //testing chain hashing
//initialize random seed
srand((unsigned)time(NULL));
FillNumArray(randomNums); // fill randomNums array with random numbers
//test print array
for(int i = 0; i < ARRAY_SIZE; i++)
{
cout << randomNums[i] << endl;
}
//test chain hashing insert
hashTableSize = 19;
int hashAddress = 0;
InitDynamicArrayList(hashTableSize, chainListArray);
//try to hash into hash table
for (int i = 0; i < ARRAY_SIZE; i++)
{
hashAddress = randomNums[i] % hashTableSize;
chainEntry = SeparateChainInsert(randomNums[i], hashAddress, chainListArray);
}
system("pause");
return 0;
}
arrayOfPointers = new node*[arraySize];
That returns a bunch of unallocated pointers. Your top level array is fine, but its elements are still uninitialized pointers, so when you do this:
->next
You invoke undefined behavior. You're dereferencing an uninitialized pointer.
You allocated the array properly, now you need to allocate each pointer, i.e.,
for(int i = 0; i < arraySize; ++i) {
arrayOfPointers[i] = new node;
}
As an aside, I realize that you're learning, but you should realize that you're essentially writing C here. In C++ you have a myriad of wonderful data structures that will handle memory allocation (and, more importantly, deallocation) for you.
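The body of your InitDynamicArrayList is fine; the problem is how it is declared. chainListArray is passed by value, so the array you allocate is assigned to the function's local copy, and the pointer in main is still uninitialized when SeparateChainInsert dereferences it. One way to fix this is to take a ***chainListArray; the more C++-like way is to take the pointer by reference, like this: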
void InitDynamicArrayList (int tableSize, chainNode **&chainListArray)
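A minimal sketch of the reference version, assuming the rest of the program stays the same. Returning bool and the FreeDynamicArrayList helper are additions for this example, not part of the original code.

```cpp
#include <cassert>
#include <new>

struct chainNode {
    int value;
    chainNode* next;
};

// Hypothetical cleanup helper: frees every chain, then the array itself,
// and resets the caller's pointer.
void FreeDynamicArrayList(int tableSize, chainNode**& chainListArray) {
    if (chainListArray == nullptr) {
        return;
    }
    for (int i = 0; i < tableSize; ++i) {
        chainNode* node = chainListArray[i];
        while (node != nullptr) {
            chainNode* next = node->next;
            delete node;
            node = next;
        }
    }
    delete[] chainListArray;
    chainListArray = nullptr;
}

// Takes the pointer by reference, so the allocation is visible to the caller.
// Returns false if any allocation fails.
bool InitDynamicArrayList(int tableSize, chainNode**& chainListArray) {
    assert(tableSize > 0);
    // The trailing () zero-initializes every slot to a null pointer.
    chainListArray = new (std::nothrow) chainNode*[tableSize]();
    if (chainListArray == nullptr) {
        return false;
    }
    for (int i = 0; i < tableSize; ++i) {
        chainNode* head = new (std::nothrow) chainNode;
        if (head == nullptr) {
            FreeDynamicArrayList(tableSize, chainListArray);
            return false;
        }
        head->value = 0;
        head->next = nullptr;
        chainListArray[i] = head;
    }
    return true;
}
```

The call in main stays InitDynamicArrayList(hashTableSize, chainListArray); only the parameter type changes.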