Creating a hashtable using vectors of vectors? - c++

I'm currently trying to write a program that creates a hash table, using vectors of vectors for my collision resolution method.
The problem I am facing is that during runtime, a vector of vectors is created, but all of the Entry vectors inside remain of size 0. I know my put functions are faulty but I don't know where/why.
This is my first time creating a hash table and I'd appreciate any assistance in what the problem might be. My goal is to create a vector of Entry vectors, and each Entry has its associated key and value. After finding the hash value for a new Entry key, it should check the Entry vectors' key values to see if the key already exists. If it does, it updates that key's value.
This is a segment of table.cpp:
Table::Table(unsigned int maximumEntries) : maxEntries(100){
this->maxEntries = maximumEntries;
this->Tsize = 2*maxEntries;
}
Table::Table(unsigned int entries, std::istream& input){ //do not input more than the specified number of entries.
this->maxEntries = entries;
this->Tsize = 2*maxEntries;
std::string line = "";
int numEntries = 0;
getline(input, line);
while(numEntries<maxEntries || input.eof()){ // reads to entries or end of file
int key;
std::string strData = "";
convertToValues(key, strData, line);
put(key, strData); // adds each of the values to the table
numEntries++;
getline(input,line);
}
}
void Table::put(unsigned int key, std::string data){
Entry newEntryObj(key,data); //create a new Entry obj
put(newEntryObj);
}
void Table::put(Entry e){ // creating the hash table
assert(currNumEntries < maxEntries);
int hash = (e.get_key() % Tsize);
Entry newEntry = Entry(e.get_key(), e.get_data());
for(int i = 0; i < hashtable[hash].size(); i++){
if (e.get_key() == hashtable[hash][i].get_key()){
hashtable[hash][i].set_data(e.get_data());
}
else{
hashtable[hash].push_back(newEntry); // IF KEY DOESNT EXIST, ADD TO THE VECTOR
}
}
}
This is Table.h
#ifndef table_h
#define table_h
#include "entry.h"
#include <string>
#include <istream>
#include <fstream>
#include <iostream>
#include <vector>
class Table{
public:
Table(unsigned int max_entries = 100); //Builds empty table with maxEntry value
Table(unsigned int entries, std::istream& input); //Builds table designed to hold number of entries
void put(unsigned int key, std::string data); //creates a new Entry to put in
void put(Entry e); //puts COPY of entry into the table
std::string get(unsigned int key) const; //returns string associated w/ param, "" if no entry exists
bool remove(unsigned int key); //removes Entry containing the given key
friend std::ostream& operator<< (std::ostream& out, const Table& t); //overloads << operator to PRINT the table.
int getSize();
std::vector<std::vector<Entry>> getHashtable();
private:
std::vector<std::vector<Entry>> hashtable; //vector of vectors
int Tsize; //size of table equal to twice the max number of entries
int maxEntries;
int currNumEntries;
};
#endif /* table_h */
and Entry.h:
#include <string>
#include <iosfwd>
class Entry {
public:
// constructor
Entry(unsigned int key = 0, std::string data = "");
// access and mutator functions
unsigned int get_key() const;
std::string get_data() const;
static unsigned int access_count();
void set_key(unsigned int k);
void set_data(std::string d);
// operator conversion function simplifies comparisons
operator unsigned int () const;
// input and output friends
friend std::istream& operator>>
(std::istream& inp, Entry &e);
friend std::ostream& operator<<
(std::ostream& out, Entry &e);
private:
unsigned int key;
std::string data;
static unsigned int accesses;
};

There are various problems with your code, but the answer for your question would be this:
In
void Table::put(Entry e){ // creating the hash table
Have a look at the loop.
for(int i = 0; i < hashtable[hash].size(); i++){
Now, hashtable[hash] is a vector. But initially it doesn't have any elements. So hashtable[hash].size() is 0. So you don't enter the loop.
On top of this, trying to access hashtable[hash] in the first place results in undefined behaviour due to hashtable not being properly resized to Tsize. Try this in your constructor(s):
this->maxEntries = maximumEntries;
this->Tsize = 2*maxEntries;
this->hashtable.resize(this->Tsize);
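A minimal sketch of the sizing fix, assuming a stripped-down stand-in for Table (the MiniTable name, its accessors, and the pair-based entry type are hypothetical, not the asker's class). Sizing the outer vector in the constructor's initializer list gives every bucket a valid, empty inner vector before put is ever called:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the asker's Table; only the sizing logic is shown.
class MiniTable {
public:
    explicit MiniTable(unsigned int maximumEntries)
        : maxEntries(maximumEntries),
          tsize(2 * maximumEntries),
          buckets(tsize) {}  // creates tsize empty inner vectors

    // Number of buckets in the outer vector.
    std::size_t bucketCount() const { return buckets.size(); }

    // Number of entries in bucket i; at() throws if i is out of range.
    std::size_t bucketSize(std::size_t i) const { return buckets.at(i).size(); }

private:
    // Declared in the same order as the initializer list above.
    unsigned int maxEntries;
    unsigned int tsize;
    std::vector<std::vector<std::pair<unsigned int, std::string>>> buckets;
};
```

With this in place, indexing any bucket in the range [0, tsize) is well defined, although each bucket still starts out empty.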
EDIT:
It would be easier to see the problem if you used std::vector::at instead of std::vector::operator[], because at performs bounds checking. For example:
void Table::put(Entry e){ // creating the hash table
assert(currNumEntries < maxEntries);
int hash = (e.get_key() % Tsize);
Entry newEntry = Entry(e.get_key(), e.get_data());
for(int i = 0; i < hashtable.at(hash).size(); i++){
if (e.get_key() == hashtable.at(hash).at(i).get_key()){
hashtable.at(hash).at(i).set_data(e.get_data());
}
else{
hashtable.at(hash).push_back(newEntry); // IF KEY DOESNT EXIST, ADD TO THE VECTOR
}
}
}
Without resizing hashtable, this code would throw an out_of_range exception when you try to do hashtable.at(hash) the first time.
P.S. None of this is tested.
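Since the loop is never entered for an empty bucket, the push_back inside it can never run on a fresh table. A sketch of one way to restructure put, assuming a simplified entry type (KV and ChainedTable are hypothetical names, and the bucket count is assumed to be greater than zero): search the whole bucket first, and append only after the loop has found no matching key.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplified Entry: just a key and its data.
struct KV {
    unsigned int key;
    std::string data;
};

class ChainedTable {
public:
    // Assumes bucketCount > 0; every bucket starts as an empty vector.
    explicit ChainedTable(std::size_t bucketCount) : buckets(bucketCount) {}

    void put(unsigned int key, const std::string& data) {
        std::vector<KV>& bucket = buckets.at(key % buckets.size());
        for (KV& e : bucket) {
            if (e.key == key) {   // key already present: update in place
                e.data = data;
                return;
            }
        }
        // Reached only when no entry in the bucket had this key.
        bucket.push_back(KV{key, data});
        ++count;
    }

    // Returns the stored data, or "" if the key is absent.
    std::string get(unsigned int key) const {
        const std::vector<KV>& bucket = buckets.at(key % buckets.size());
        for (const KV& e : bucket) {
            if (e.key == key) return e.data;
        }
        return "";
    }

    std::size_t size() const { return count; }

private:
    std::vector<std::vector<KV>> buckets;
    std::size_t count = 0;
};
```

Keeping the push_back outside the loop also avoids appending one copy per non-matching entry, which is what the else branch inside the loop would do.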

Related

Resizing and copying elements in a Hashtable Array

Right now I have struct IndexLocation that defines a page number pageNum and a word number wordNum on a page, and a struct IndexRecord that consists of a specific word and its locations that is a vector of IndexLocations.
In IndexRecord.h:
struct IndexLocation {
int pageNum; //1 = first page
int wordNum; //1 = first word on page
IndexLocation(int pageNumber, int wordNumber);
};
struct IndexRecord {
//indexed word
std::string word;
//list of locations it appears
std::vector<IndexLocation> locations;
IndexRecord();
//Constructor - make a new index record with no locations
explicit IndexRecord(const std::string& wordVal);
//Add an IndexLocation to the record
// Does NOT check for duplicate records
void addLocation(const IndexLocation& loc);
//Returns true if the record contains the indicated location
bool hasLocation(const IndexLocation& loc) const;
};
Then, I have a Hash Map IndexMap which stores values of IndexRecords using the word as the key. Within one, an IndexRecord may be stored at bucket 3, have a word apple, and have locations be 1,2 and 2,5.
#include "IndexRecord.h"
class IndexMap
{
private:
int numBuckets;
int keyCount;
IndexRecord* buckets;
//handle resizing the hash table into a new array with twice as many buckets
void grow();
//Get the location this key should be placed at.
// Will either contain IndexRecord with that key or an empty IndexRecord
unsigned int getLocationFor(const std::string& key) const;
public:
//Construct HashMap with given number of buckets
IndexMap(int startingBuckets = 10);
//Destructor
~IndexMap();
//Copy constructor and assignment operators
IndexMap(const IndexMap &other);
IndexMap& operator=(const IndexMap& other);
//Returns true if indicated key is in the map
bool contains(const std::string& key) const;
//Add indicated location to the map.
// If the key does not exist in the map, add an IndexRecord for it
// If the key does exist, add a Location to its IndexRecord
void add(const std::string& key, int pageNumber, int wordNumber);
void add2(const std::string &key, IndexLocation location);
};
Furthermore, in IndexMap.cpp, I have the add function, the add2 function, and grow function.
void IndexMap::add(const std::string &key, int pageNumber, int wordNumber) {
if (keyCount == numBuckets)
grow();
int bucketNumber = getLocationFor(key);
if (this->contains(key) == true)
buckets[bucketNumber].addLocation(IndexLocation(pageNumber, wordNumber));
else if (this->contains(key) == false) {
while (buckets[bucketNumber].word != "?") {
if (bucketNumber < numBuckets)
bucketNumber++;
else if (bucketNumber == numBuckets)
bucketNumber = 0;
}
string foo = key;
buckets[bucketNumber].word = key;
buckets[bucketNumber].addLocation(IndexLocation(pageNumber, wordNumber));
keyCount++;
}
return;
}
void IndexMap::add2(const std::string &key, IndexLocation location) {
if (keyCount > 0.7 * numBuckets)
grow();
int bucketNumber = getLocationFor(key);
if (this->contains(key) == true)
buckets[bucketNumber].addLocation(location);
else if (this->contains(key) == false) {
while (buckets[bucketNumber].word != "?") {
if (bucketNumber < numBuckets)
bucketNumber++;
else if (bucketNumber == numBuckets)
bucketNumber = 0;
}
string foo = key;
buckets[bucketNumber].word = key;
buckets[bucketNumber].addLocation(location);
keyCount++;
}
return;
}
void IndexMap::grow() {
IndexRecord* oldTable = buckets;
int oldSize = numBuckets;
numBuckets = numBuckets * 2 + 1;
IndexRecord* newArray = new IndexRecord[numBuckets];
keyCount = 0;
for (int i = 0; i < oldSize; i++) {
if (oldTable[i].word != "?") {
this->add2(oldTable[i].word, oldTable[i].locations[i]); // having trouble here
}
}
buckets = newArray;
delete [] oldTable;
}
My issue begins here. I believe my basic logic is sound: keep the old array around with a pointer, make a new, larger one and reset the size of the hash table, iterate through the old array and add anything it contains back into the hash table with the add function, and then delete the old array. However, this just results in a segmentation fault (SIGSEGV) once keyCount hits numBuckets. (The reason I have an add2 function that is almost identical to my add function, and use it in grow, is that I didn't know how to get a pageNumber and a wordNumber for the this->add2 line within grow; the assignment specifications say we cannot modify the original add function's header.)
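In grow, you do not assign to buckets until after the re-insertion loop, so while add2 runs, buckets still points to the old array and the newly enlarged array is not accessible to your other functions. Note also that oldTable[i].locations[i] indexes the locations vector with the bucket index i, which can run past the end of that vector.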
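A sketch of a grow that rehashes into the new array before swapping pointers. This is not the asker's IndexMap: Record, ProbingMap and slotFor are hypothetical names, locations are reduced to page numbers, std::hash stands in for the unseen hash function, and the starting bucket count is assumed to be positive. Instead of calling add from grow, it uses a probing helper that takes the table as a parameter, so the same code can target either array.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

struct Record {               // hypothetical simplified IndexRecord
    std::string word = "?";   // "?" marks an empty slot, as in the question
    std::vector<int> pages;   // stands in for the IndexLocation list
};

class ProbingMap {
public:
    explicit ProbingMap(int startingBuckets = 10)
        : numBuckets(startingBuckets), keyCount(0),
          buckets(new Record[startingBuckets]) {}
    ~ProbingMap() { delete[] buckets; }
    ProbingMap(const ProbingMap&) = delete;
    ProbingMap& operator=(const ProbingMap&) = delete;

    void add(const std::string& key, int page) {
        // Growing before the table fills guarantees an empty slot exists.
        if (keyCount + 1 > 0.7 * numBuckets) grow();
        Record& slot = buckets[slotFor(buckets, numBuckets, key)];
        if (slot.word == "?") { slot.word = key; ++keyCount; }
        slot.pages.push_back(page);
    }
    bool contains(const std::string& key) const {
        return buckets[slotFor(buckets, numBuckets, key)].word == key;
    }
    int size() const { return keyCount; }
    int capacity() const { return numBuckets; }

private:
    // Linear probing: first slot that either holds key or is empty.
    static int slotFor(const Record* table, int n, const std::string& key) {
        int i = static_cast<int>(std::hash<std::string>{}(key) % n);
        while (table[i].word != "?" && table[i].word != key) i = (i + 1) % n;
        return i;
    }
    void grow() {
        int newSize = numBuckets * 2 + 1;
        Record* newTable = new Record[newSize];
        for (int i = 0; i < numBuckets; ++i) {
            if (buckets[i].word == "?") continue;
            int j = slotFor(newTable, newSize, buckets[i].word);
            newTable[j] = buckets[i];  // copies the word and all its locations
        }
        delete[] buckets;      // old array is released only after the copy
        buckets = newTable;    // from here on, every function sees the new array
        numBuckets = newSize;
    }
    int numBuckets;
    int keyCount;
    Record* buckets;
};
```

Copying the whole record in one assignment carries every location across, so no second add-style function is needed during the rehash.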

Reading in from CSV to template vector

I've been having difficulty all week trying to get one of my projects up and running. I'm required to read in a 10,000-line CSV file from a meteorological database and output certain fields with a few demonstrations (Max blah blah).
I'm meant to design this using a self-made template vector and am not allowed access to the STL libraries.
As I'm just learning and this has been a few weeks in the making, I think I've overcomplicated it for myself, and now I'm stuck not knowing how to progress.
The main issue is my confusion about how to read into a struct, parse the information so that I only read in what I need, and then move that data into the template vector.
Anyway, without further ado, here is my source code:
#include <iostream>
#include <fstream>
#include "Date.h"
#include "Time.h"
#include "Vector.h"
typedef struct {
Date d;
Time t;
float speed;
} WindLogType;
int main()
{
Vector<WindLogType> windlog;
std::string temp;
std::ifstream inputFile("MetData-31-3.csv");
int timeIndex, windSpeedIndex;
//18 Elements per line
//Need the elements at index 0 & 10
while(!inputFile.eof())
{
getline(inputFile, WindLogType.d,' ');
getline(inputFile, WindLogType.t,',');
for(int i = 0; i < 9; i++)
{
getline(inputFile, temp, ',');
}
getline(inputFile, WindLogType.speed);
windlog.push_back(WindLogType);
}
return 0;
}
Vector.h
#ifndef VECTOR_H
#define VECTOR_H
template <class elemType>
class Vector
{
public:
bool isEmpty() const;
bool isFull() const;
int getLength() const;
int getMaxSize() const;
void sort();
// T* WindLogType;
Vector(int nMaxSize = 64); //Default constructor, array size of 64.
Vector(const Vector&); //Copy constructor
~Vector(); //Destructor
void push_back(int);
int operator[](int);
int at(int i);
private:
int maxSize, length;
elemType* anArray;
void alloc_new();
};
template <class elemType>
bool Vector<elemType>::isEmpty() const
{
return (length == 0);
}
template <class elemType>
bool Vector<elemType>::isFull() const
{
return (length == maxSize);
}
template <class elemType>
int Vector<elemType>::getLength() const
{
return length;
}
template <class elemType>
int Vector<elemType>::getMaxSize() const
{
return maxSize;
}
//Constructor that takes the max size of vector
template <class elemType>
Vector<elemType>::Vector(int nMaxSize)
{
maxSize = nMaxSize;
length = 0;
anArray = new elemType[maxSize];
}
//Destructor
template <class elemType>
Vector<elemType>::~Vector()
{
delete[] anArray;
}
//Sort function
template <class elemType>
void Vector<elemType>::sort()
{
int i, j;
int min;
elemType temp;
for(i = 0; i < length; i++)
{
min = i;
for(j = i+1; j<length; ++j)
{
if(anArray[j] < anArray[min])
min = j;
}
temp = anArray[i];
anArray[i] = anArray[min];
anArray[min] = temp;
}
}
//Check if vector is full, if not add the item to the vector
template <class elemType>
void Vector<elemType>::push_back(int i)
{
if(length+1 > maxSize)
alloc_new();
anArray[length]=i;
length++;
}
template <class elemType>
int Vector<elemType>::operator[](int i)
{
return anArray[i];
}
//Return the vector at position 'i'
template <class elemType>
int Vector<elemType>::at(int i)
{
if(i < length)
return anArray[i];
throw 10;
}
//If the vector is about to get full, create a new temporary
//vector of double size and copy the contents across.
template <class elemType>
void Vector<elemType>::alloc_new()
{
maxSize = length*2;
int* tmp=new int[maxSize];
for(int i = 0; i < length; i++)
tmp[i]= anArray[i];
delete[] anArray;
anArray = tmp;
}
/**
//Copy Constructor, takes a reference to a vector and copies
//the values across to a new vector.
Vector::Vector(const Vector& v)
{
maxSize= v.maxSize;
length = v.length;
anArray = new int[maxSize];
for(int i=0; i<v.length; i++)
{
anArray[i] = v.anArray[i];
}
}**/
#endif
There are some things in the vector class that are completely unnecessary, they were just from a bit of practice.
Here is a sample of the CSV file:
WAST,DP,Dta,Dts,EV,QFE,QFF,QNH,RF,RH,S,SR,ST1,ST2,ST3,ST4,Sx,T
31/03/2016 9:00,14.6,175,17,0,1013.4,1016.9,1017,0,68.2,6,512,22.7,24.1,25.5,26.1,8,20.74
31/03/2016 9:10,14.6,194,22,0.1,1013.4,1016.9,1017,0,67.2,5,565,22.7,24.1,25.5,26.1,8,20.97
31/03/2016 9:20,14.8,198,30,0.1,1013.4,1016.9,1017,0,68.2,5,574,22.7,24,25.5,26.1,8,20.92
31/03/2016 9:30,15.1,215,27,0,1013.4,1016.8,1017,0,66.6,5,623,22.6,24,25.5,26.1,8,21.63
I require the elements in the WAST column and the S column, as WAST contains the date and S contains windspeed.
By no means do I want people to give me just the solution; I need to understand how I would read in and parse this data using the struct and the template vector.
There's no real "error" per se, I just lack the fundamental understanding of where to go next.
Any help would be greatly appreciated!
Thank you
One easy and efficient way would be to have a vector per column, aka column-oriented storage. Column-oriented storage minimizes space requirements and allows you to easily apply linear algebra algorithms (including SIMD-optimized ones), without having to pick out individual struct members (as would be the case with row-oriented storage).
You can then parse each line using fscanf, each value into a separate variable. And then push_back the variables into the corresponding columns.
As fscanf does not parse dates, you would need to extract the date string into a char[64] and then parse that into struct tm which then can be converted to time_t.
The above assumes that you know the layout of the CSV and the types of the columns.
Pseudo-code:
vector<time_t> timestamps;
vector<double> wind_speeds;
for(;;) {
// Parse the CSV line into variables.
char date_str[64 + 1];
double wind_speed;
fscanf(file, "%64[^,], ..., %lf,...", date_str, ..., &wind_speed, ...);
time_t timestamp = parse_date(date_str);
// Store the parsed variables into the vectors.
timestamps.push_back(timestamp);
wind_speeds.push_back(wind_speed);
}
double average_wind_speed = std::accumulate(wind_speeds.begin(), wind_speeds.end(), 0.) / wind_speeds.size();
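A runnable sketch of the column-oriented idea, under some stated assumptions: it swaps fscanf for std::getline with a ',' delimiter, keeps the timestamp as a plain string instead of converting it to time_t, and hard-codes the field positions 0 (WAST) and 10 (S) from the question. WindColumns and appendLine are hypothetical names, and std::vector is used only to mirror the pseudo-code; the asker's own Vector would take its place.

```cpp
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// One vector per column of interest (column-oriented storage).
struct WindColumns {
    std::vector<std::string> timestamps;  // WAST column, kept as text
    std::vector<float> speeds;            // S column
};

// Parses one CSV line and appends fields 0 and 10 to the columns.
// Returns false, leaving cols unchanged, if the line is too short
// or the speed field is not a number (e.g. the header line).
bool appendLine(const std::string& line, WindColumns& cols) {
    std::istringstream fields(line);
    std::string field, stamp, speedText;
    for (int index = 0; std::getline(fields, field, ','); ++index) {
        if (index == 0) stamp = field;
        if (index == 10) { speedText = field; break; }
    }
    if (stamp.empty() || speedText.empty()) return false;

    float speed = 0.0f;
    try {
        speed = std::stof(speedText);
    } catch (const std::exception&) {
        return false;  // std::stof throws when no conversion is possible
    }
    cols.timestamps.push_back(stamp);
    cols.speeds.push_back(speed);
    return true;
}
```

Calling appendLine once per line read from the file, and skipping lines for which it returns false, fills both columns in step so that index i in each vector refers to the same row.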
.csv files are a representation of a table, delimited by "," (comma) to change cell and ";" (semicolon) for the end of the line.
EDIT: In case the ";" does not work, the usual "\n" does. The algorithm below can easily be applied with "\n".
In fact, there is no need to create a complicated program; if and while are enough. Here is an idea of how to proceed. I hope it helps you understand a method, as that is what you are requesting.
1- Read every character (store it in a char) and add it to a string (the string += the char).
1.1- If the character is a ",", increase a counter and then compare the string to the desired value (here WAST).
1.1.1- If the string equals the desired value, save the counter in an integer (this records the position of the column you want).
1.1.2- If not, continue until the end of the line ";" (which means in your case the desired column does not exist) or until you have a match (your string == "WAST").
NB: You can do it with different counters so that you know the WAST position, the S position, etc.
Then:
Initialise a new counter.
2- Compare the new counter to the value saved in 1.1.1.
2.1.1- If the values match, store the characters in a string until you reach the next comma.
2.1.2- If not, read every char until you find the next comma. Then increase your counter and restart from 2.
3- Continue to read the characters until you find a semicolon ";", and restart at step 2, until you finish reading the file.
To summarise, the first step is to read every column name until you find the one you want or arrive at the end of the line. Store its position (counted by the commas) in counter1.
Then read every other line and store the string found at the desired column position (again counted by the commas), comparing counter1 to a new counter.
It may not be the most powerful algorithm by far, but it works and is easy to understand.
I tried to avoid writing it in C so that you can understand the steps without seeing the programmed solution. I hope it suits you.
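For readers who do want to see it in code, here is a rough C++ sketch of the same two-pass idea. It is a simplification rather than a literal transcription: std::getline with a ',' delimiter does the character-by-character splitting, each line is assumed to have already been read up to its "\n", and findColumn and fieldAt are hypothetical helper names.

```cpp
#include <sstream>
#include <string>

// Step 1: returns the zero-based position of name in a comma-separated
// header line, or -1 if the header has no such column.
int findColumn(const std::string& header, const std::string& name) {
    std::istringstream fields(header);
    std::string field;
    for (int index = 0; std::getline(fields, field, ','); ++index) {
        if (field == name) return index;  // counter matches: remember position
    }
    return -1;
}

// Steps 2 and 3: returns the field at position column in a data line,
// or "" if the line has fewer fields than that.
std::string fieldAt(const std::string& line, int column) {
    std::istringstream fields(line);
    std::string field;
    for (int index = 0; std::getline(fields, field, ','); ++index) {
        if (index == column) return field;
    }
    return "";
}
```

The header line is passed to findColumn once for each wanted name (WAST, S), and the saved positions are then reused with fieldAt on every following line.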

Seg fault at the specified line: Hash table insert/search functions

I am getting an EXC_BAD_ACCESS error. I'm trying to insert words into a hash table and am using separate chaining. Here is my class Hash.h that has, within it, class wordData to store the word and the page numbers the word appears on:
class Hash
{
private:
class wordData
{
public:
string word;
vector < int >pageNum;
wordData *nextWord;
// Initializing the next pointer to null in the constructor
wordData()
{
nextWord = nullptr;
}
// Constructor that accepts a word and pointer to next word
wordData(string word, wordData * nextWord)
{
this->word = word;
this->nextWord = nextWord;
}
// Getting and setting the next linked word
wordData *getNext()
{
return nextWord; //-------------------> BAD_ACCESS ERROR
}
void setNext(wordData * newInfo)
{
nextWord = newInfo;
}
// Setting info for the word node.
void setInfo(string & w, int pNum)
{
this->word = w;
this->pageNum.push_back(pNum);
}
// ******************* Gives a thread-bad access error************************
string getWord()
{
return word;
}
void addPageNums(int x)
{
this->pageNum.push_back(x);
}
};
private:
// Head to point to the head node of the linked list for a particular word
wordData ** head;
int size;
int *bucketSize;
int totalElements;
public:
// Class hash function functions
Hash();
// Function to calculate bucket number based on string passed
int hashFunction(string key);
// search if word is present
bool Search(string);
// Insert word
void Insert(string, int);
int bucketNumberOfElements(int index);
};
#endif /* Hash_h */
After running the debugger I found the value of nextWord to be 0x00000000000 which I understand is not the same as nullptr but is due to a NULL assignment although I can't seem to figure out where and why. I haven't included the Hash.cpp file because I think there is an obvious pointer manipulation that I'm doing wrong in the .h file.
Any help will be appreciated. Thanks.

Converting String to int using stoi inside a function

I've tried multiple methods to make the stoi (or other functions) work for my needs (converting a given string to an integer for the purpose of making a key for a hash table). Specifically, I would like to ask why stoi does not like this conversion. I get the error "std::invalid_argument at memory location 0x0034F41C." I have looked around and couldn't find what I am doing wrong.
#include <iostream>
#include <cstdlib>
#include <string>
using namespace std;
enum EntryType { Legitimate, Empty, Deleted }; //create new data type that holds node status
struct HashNode //create new struct with (hash node)
{
string element; // item inside node
enum EntryType info; //status
};
struct HashTable //Creates a new struct (hash table)
{
int size; //defines size of table
HashNode *table; //creates pointer to table
};
int HashFun1(std::string skey, int size) //first hash function
{
std::string key2 = skey;
int key = stoi(key2);
return key % size; // will return the inputed value
}
//int HashFunc2(string key, int size) //second hash function
//{
// return(key * size - 1) % size; //needs to convert between string and int
//}
int main()
{
cout<<HashFun1("t", 2);
}
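For context, std::stoi throws std::invalid_argument when the string does not begin with something it can parse as an integer, which is the case for "t". A small sketch that shows this, followed by one possible alternative for turning arbitrary text into a table index; it assumes the keys are meant to be words rather than digit strings, and parsesAsInt and hashText are hypothetical names, not part of the code above.

```cpp
#include <stdexcept>
#include <string>

// Returns true if std::stoi can convert the start of text to an int.
bool parsesAsInt(const std::string& text) {
    try {
        (void)std::stoi(text);
        return true;
    } catch (const std::invalid_argument&) {
        return false;  // no leading digits to convert, e.g. "t"
    } catch (const std::out_of_range&) {
        return false;  // digits present, but the value does not fit in int
    }
}

// Hypothetical replacement for HashFun1: sums the character codes
// instead of parsing digits, so any string yields an index.
// Assumes size > 0.
int hashText(const std::string& key, int size) {
    unsigned int sum = 0;
    for (unsigned char c : key) sum += c;
    return static_cast<int>(sum % static_cast<unsigned int>(size));
}
```

A plain character sum distributes keys poorly (anagrams collide), so it is only meant to show the shape of a string hash, not to recommend this particular function.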

Hash table(linear probing)

I am making a hash table using linear probing, and I have to resize the array whenever the load factor, i.e. (number of elements entered in the hash table)/(size of the hash table), becomes greater than 0.5. I do the resizing through a pointer held in a class that contains the functions related to the hash table. The pointer initially points to an array of 100 structs (each struct only contains a string). Every time the load factor becomes greater than 0.5, I resize by making a new array of double the previous size and pointing the pointer to the new array. I also have an int that stores the current size of the array and is updated every time the resize function is used. The number of elements inserted is incremented with every call to the insert function. Am I doing this correctly? Below is my code:
#include <cstring>
#include <vector>
#include <math.h>
#include <iomanip>
using namespace std;
int power(int a,int b)
{
for (int i=0;i<b;i++)
{
a*=a;
}
return a;
};
struct Bucket
{
string word;
};
const int size=100;
class LProbing
{
private:
int a; //a constant which is used in hashing
int cursize; //current size of hash table
Bucket *Table; //pointer to array of struct
int loadfactor; //ratio of number of elements entered over size of hashtable
int n; //number of elements entered
Bucket table[size]; //array of structs
public:
LProbing(int A); //constant is decided by user
void resize();
void insert(string word);
void Lookup(string word);
};
LProbing::LProbing(int A)
{
cursize=size;
a=A;
Table=table;
loadfactor=0; //initially loadfactor is 0 as number of elements entered are 0
n=0;
}
void LProbing::resize()
{
cout<<"resize"<<endl;
loadfactor=n/cursize; //ensuring if resize needs to be done
if (loadfactor<=0.5)
{
return;
}
const int s=2*cursize;
Bucket PTable[s];
for (int i=0;i<cursize;i++)
{
if (Table[i].word.empty())
continue;
//rehashing the word onto the new array
string w=Table[i].word;
int key=0;
for (int j=0;j<w.size();j++)
{
unsigned char b=(unsigned char)w[j];
key+=(int)power(a,i)*b;
}
key=key%(2*cursize);
PTable[key].word=w; //entering the word in the new array
}
Table=PTable; //putting pointer equal to new array
cursize=2*cursize; //doubling the current size of array
}
void LProbing::insert(string word)
{
cout<<"1"<<endl;
n++; //incrementing the number of elements entered with every call to insert
//if loadfactor is greater than 0.5, resize array
loadfactor=n/cursize;
if (loadfactor>0.5)
{
resize();
}
//hashing the word
int k=0;
for (int i=0;i<word.size();i++)
{
unsigned char b=(unsigned char)word[i];
int c=(int)((power(a,i))*b);
k+=c;
cout<<c<<endl;
}
int key=0;
key=k%cursize;
cout<<key<<endl;
//if the respective key index is empty enter the word in that slot
if (Table[key].word.empty()==1)
{
cout<<"initial empty slot"<<endl;
Table[key].word=word;
}
else //otherwise enter in the next slot
{
//searching array for empty slot
while (Table[key].word.empty()==0)
{
k++;
key=k%cursize;
}
//when empty slot found,entering the word in that bucket
Table[key].word=word;
cout<<"word entered"<<endl;
}
}
#include "Linear Probing.cpp"
#include <fstream>
using namespace std;
int main()
{
LProbing H(35);
ifstream fin;
fin.open("dict.txt");
vector<string> D;
string d;
while (getline(fin,d))
{
if (!d.empty())
{
D.push_back(d);
}
}
fin.close();
for (int i=0;i<D.size();i++)
{
H.insert(D[i]);
}
system("PAUSE");
return 0;
}
You may find this helpful:
http://www.cs.rmit.edu.au/online/blackboard/chapter/05/documents/contribute/chapter/05/linear-probing.html
You are dealing with big numbers and variable "key" is overflowing in:
key += (int)power(a,i)*b
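One common way to keep the intermediate values small is to reduce modulo the table size at every step instead of once at the end. The sketch below does this with Horner's rule; note that this weights the first character with the highest power of a, which is a different ordering from the sum in the question, and polyHash is a hypothetical name. It also sidesteps the power helper entirely, which as written squares a on each iteration rather than multiplying by the original base.

```cpp
#include <string>

// Polynomial string hash, reduced modulo tableSize after every character
// so the running value never exceeds tableSize - 1 before the next step.
// Assumes tableSize > 0.
unsigned int polyHash(const std::string& word, unsigned int a,
                      unsigned int tableSize) {
    unsigned long long h = 0;  // wide enough for h * a + c without overflow
    for (unsigned char c : word) {
        h = (h * a + c) % tableSize;
    }
    return static_cast<unsigned int>(h);
}
```

Because the result is always in the range [0, tableSize), it can be used directly as the starting index for probing.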
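It looks like loadfactor is calculated as int/int, so it will stay 0 until n reaches cursize. Try casting the operands of the division to float. Since loadfactor itself is declared as int, it would also need to become a floating-point type to hold the result.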
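A minimal sketch of the floating-point check (needsResize is a hypothetical helper, and capacity is assumed to be greater than zero):

```cpp
// Returns true when count / capacity exceeds maxLoad.
// The cast forces floating-point division; with two ints, 51 / 100 is 0.
bool needsResize(int count, int capacity, double maxLoad = 0.5) {
    return static_cast<double>(count) / capacity > maxLoad;
}
```

Computing the ratio on demand like this also removes the need to store loadfactor as a member at all.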