I am trying to implement a hash table with linear probing for a project, but I am running into a few issues, and I think one of them is the main culprit.
For starters, if I compile the code and run the program 10 times in a row, it crashes with "Segmentation fault: 11" roughly two thirds of the time.
When the program does run, it seems to "mostly" work. Indices 9500-10000 are perfect, with all slots filled. Further down (9000-9500), however, more than 10 slots are NULL and some slots hold bogus values, i.e. values > 100,000.
I am using a dataset of 10,000 integers from a CSV file, all with values < 100,000. I was going to debug this with GDB and a core dump, but my computer isn't cooperating with my installing it at the moment.
#ifndef HASHLINEAR_HPP
#define HASHLINEAR_HPP
struct node{
int key;
};
class HashLinear{
struct node** table;
int tableSize;
int numCollisions = 0;
public:
HashLinear(int bsize);
void insert(int key);
unsigned int hashFunction(int key);
int search(int key);
int getCollisions();
void printTable();
};
#endif
#include "hashlinear.hpp"
#include <iostream>
using namespace std;
HashLinear::HashLinear(int bsize){
this->tableSize = bsize;
table = new node*[tableSize];
for(int i = 0; i < tableSize; i++){
table[i] = NULL;
}
}
int HashLinear::getCollisions(){
return numCollisions;
}
unsigned int HashLinear::hashFunction(int key){
return key % tableSize;
}
void HashLinear::insert(int key){
node* newNode = new node;
newNode->key = key;
int index = hashFunction(key);
while(table[index] != NULL && table[index]->key != key){
numCollisions++;
index = (index + 1) % tableSize;
}
table[index] = newNode;
}
int HashLinear::search(int key){
int value = hashFunction(key);
int num = 0;
while(table[value] != NULL){
if(num++ >= tableSize){ // every slot has been probed once, so the key is absent
break;
}
if(table[value]->key == key){
return value;
}
value++;
value %= tableSize;
}
return -1;
}
void HashLinear::printTable(){
for(int i = 0; i < tableSize; i++){
cout << i << " || ";
if(table[i] == NULL){
cout << "NULL" << endl;
}
else{
cout << table[i]->key << endl;
}
}
}
#include "hashlinear.hpp"
#include <iostream>
#include <fstream>
#include <sstream>
#include <time.h>
#include <stdlib.h>
#include <chrono>
using namespace std;
int main(){
//******Read in data******//
int testData[10000];
float insertTime[100];
float searchTime[100];
int index = 0;
string line, temp, word;
ifstream inputFile;
inputFile.open("dataSetA-updatedhashlinear.csv");
if(inputFile.fail()){
cout << "Could not open data." << endl;
return -1;
}
else{
while(inputFile >> temp){
getline(inputFile, temp);
stringstream inStream(temp);
while(getline(inStream, word, ',')){
testData[index] = stoi(word);
index++;
}
}
inputFile.close();
}
//******Read in data******//
//cout << "Printing random data in range of 0 ~ 10: " << testData[rand() % 10 + 0] << endl;
//******Insert/Search data in Linked List******//
HashLinear table(10009);
int hashIndex = 0;
int insertTimeIndex = 0;
int searchTimeIndex = 0;
int num = 0;
int upperIndex = 99;
while(hashIndex < 10009){
//Block for 100 insertions
auto insertionStart = chrono::steady_clock::now();//Insert time start
for(int i = hashIndex; i < upperIndex; i++){ //Keep track of current index as well as an upper index to control amount of inserts
table.insert(testData[i]);
hashIndex++;
}
auto insertionEnd = chrono::steady_clock::now();
insertTime[insertTimeIndex] = chrono::duration_cast<chrono::microseconds>(insertionEnd - insertionStart).count() / 100.0;//Insert time end
insertTimeIndex++;
//Block for 100 insertions
//Block for 100 searches
num = 0;
auto searchStart = chrono::steady_clock::now();//Search time start
while(num < 100){ //Do 100 random searches from 0 index to upperindex
srand((unsigned)time(0));
int searchNode = table.search(testData[rand() % upperIndex + 0]);
num++;
}
auto searchEnd = chrono::steady_clock::now();
searchTime[searchTimeIndex] = chrono::duration_cast<chrono::microseconds>(searchEnd - searchStart).count() / 100.0;//Search time end
searchTimeIndex++;
//Block for 100 searches
upperIndex += 100;
}
//******Insert/Search data in Linked List******//
//******TESTING******//
table.printTable();
cout << "Search time: " << searchTime[20] << endl;
cout << "Insert time: " << insertTime[20] << endl;
cout << "Collisons: " << table.getCollisions() << endl;
int testIndex = table.search(34262);
cout << "Index of 34262: " << testIndex << endl;
//******TESTING******//
}
After some more debugging I found the problem, and it was my own mistake: the insertion loop ran about 100 elements past the end of the testData array. Reading out of bounds is undefined behavior; it produced the bogus values and kept the hash table from filling correctly.
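For illustration, here is a minimal sketch of one way to keep such batched loops inside the array. The helper makeBatches is hypothetical (it is not part of the code above); it assumes the goal is 100 batches of 100 insertions over the 10,000 values, which matches the sizes of insertTime and searchTime.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not in the original program): split `count` items
// into consecutive half-open ranges [begin, end) of at most `batchSize`
// items each, so that no range ever reaches past `count`.
std::vector<std::pair<std::size_t, std::size_t>>
makeBatches(std::size_t count, std::size_t batchSize) {
    assert(batchSize > 0);  // a zero batch size would never make progress
    std::vector<std::pair<std::size_t, std::size_t>> batches;
    for (std::size_t begin = 0; begin < count; begin += batchSize) {
        // Clamp the end of the last batch to the number of items.
        const std::size_t end = std::min(begin + batchSize, count);
        batches.emplace_back(begin, end);
    }
    return batches;
}
```

With makeBatches(10000, 100) the outer loop runs exactly 100 times and the last batch ends at index 10000, so neither testData nor the two timing arrays are indexed out of range.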
I'm trying to implement the Boyer-Moore string search algorithm. The search itself seems to work fine up to a point: it prints all occurrences until it reaches roughly character 3300 of the text, then it finds nothing further.
I am unsure whether this is because the text file is too big to fit into my string or something else entirely. When I print the string holding the text file, the first 185122 characters are cut off as well. For reference, the text file is Lord of the Rings: Fellowship of the Ring, which is 1016844 characters long.
Here is my code for reference:
#include <fstream>
#include <iostream>
#include <algorithm>
#include <vector>
#include <chrono>
using namespace std;
# define number_chars 256
typedef std::chrono::steady_clock clocktime;
void boyer_moore(string text, string pattern, int textlength, int patlength) {
clocktime::time_point start = clocktime::now();
vector<int> indexes;
int chars[number_chars];
for (int i = 0; i < number_chars; i++) {
chars[i] = -1;
}
for (int i = 0; i < patlength; i++) {
chars[(int)pattern[i]] = i;
}
int shift = 0;
while (shift <= (textlength - patlength)) {
int j = patlength - 1;
while (j >= 0 && pattern[j] == text[shift + j]) {
j--;
}
if (j < 0) {
indexes.push_back(shift);
if (shift + patlength < textlength) {
shift += patlength - chars[text[shift + patlength]];
}
else {
shift += 1;
}
}
else {
shift += max(1, j - chars[text[shift + j]]);
}
}
clocktime::time_point end = clocktime::now();
auto time_taken = chrono::duration_cast<chrono::milliseconds>(end - start).count();
for (int in : indexes) {
cout << in << endl;
}
}
int main() {
ifstream myFile;
//https://www.kaggle.com/ashishsinhaiitr/lord-of-the-rings-text/version/1#01%20-%20The%20Fellowship%20Of%20The%20Ring.txt
myFile.open("lotr.txt");
if (!myFile) {
cout << "no text file found";
}
string text((istreambuf_iterator<char>(myFile)), (istreambuf_iterator<char>()));
cout << text;
string pattern;
cin >> pattern;
int n = text.size();
int m = pattern.size();
boyer_moore(text, pattern, n, m);
}
I have tried researching what could be the cause but couldn't find anyone with this particular issue. I would appreciate any nudges in the right direction.
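One thing in the code above that may be worth ruling out, offered as an assumption rather than a confirmed diagnosis: if char is signed on the platform, any byte above 127 in the text (for example an accented letter) becomes a negative index into chars, which is undefined behavior. The sketch below is the same bad-character search with every table index converted through unsigned char; the names findAll and byteIndex are illustrative, not from the original.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Map a char to a table index in 0..255, whether char is signed or not.
inline std::size_t byteIndex(char c) {
    return static_cast<unsigned char>(c);
}

// Bad-character variant of Boyer-Moore: returns the start index of every
// occurrence of `pattern` in `text`.
std::vector<std::size_t> findAll(const std::string& text, const std::string& pattern) {
    std::vector<std::size_t> hits;
    const std::size_t n = text.size();
    const int m = static_cast<int>(pattern.size());
    if (m == 0 || pattern.size() > n) return hits;

    // last[b] is the last position of byte b in the pattern, or -1 if absent.
    std::array<int, 256> last;
    last.fill(-1);
    for (int i = 0; i < m; ++i) last[byteIndex(pattern[i])] = i;

    std::size_t shift = 0;
    while (shift + m <= n) {
        // Compare the pattern against the current window from right to left.
        int j = m - 1;
        while (j >= 0 && pattern[j] == text[shift + j]) --j;

        int step = 1;
        if (j < 0) {
            hits.push_back(shift);
            // Align the byte just past the window with its last occurrence.
            if (shift + m < n) step = m - last[byteIndex(text[shift + m])];
        } else {
            step = std::max(1, j - last[byteIndex(text[shift + j])]);
        }
        assert(step >= 1);  // the window must always move forward
        shift += static_cast<std::size_t>(step);
    }
    return hits;
}
```

It also takes both strings by const reference, so the roughly 1 MB text is not copied on each call.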
#include "stdafx.h"
#include <iostream>
#include <fstream>
#include <string>
#include <iomanip>
using namespace std;
int main()
{
string sentence = "some random sentence";
int i = 0; //runs through the bigger string
int j = 0; //runs through the smaller string
int k = 0; //variable to mark the position where the string starts being equal in order to delete it using substring
string remove = "random";
int a = sentence.size();
int b = remove.size();
while (i < a)
{
if (sentence[i] == remove[j])
{
if (b == j - 1)
{
cout << sentence.substr(0, k) << sentence.substr(i, (a - 1));
break;
}
else
{
i++;
j++;
}
}
else
{
i++;
j = 0;
k++;
}
}
return 1;
}
I want to remove the word "random" from the bigger string and print the result, but when I run the code it does not print anything. What's missing?
I already tried putting a break right below the cout, but it does not work.
Thank you :)
As b == 6, j would have to be 7 in order for b == j-1 to become true. But remove[6] is the terminating \0 of the random string, so j can never grow beyond 6.
Here is the edited condition:
if (b-1 == j)
{
cout << sentence.substr(0, k) << sentence.substr(i+2, (a - 1));
break;
}
This assumes that the words are separated by spaces.
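If hand-written matching is not a requirement, the same result can be sketched with std::string::find and std::string::erase. The helper name removeFirst is hypothetical, and like the answer above it assumes the words are separated by single spaces.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: return `sentence` with the first occurrence of
// `word` removed, together with one directly following space if present.
std::string removeFirst(std::string sentence, const std::string& word) {
    assert(!word.empty());  // an empty word would match at position 0
    const std::string::size_type pos = sentence.find(word);
    if (pos == std::string::npos) return sentence;  // nothing to remove

    std::string::size_type len = word.size();
    // Swallow the separating space so no double space is left behind.
    if (pos + len < sentence.size() && sentence[pos + len] == ' ') ++len;
    sentence.erase(pos, len);
    return sentence;
}
```

Calling removeFirst("some random sentence", "random") yields "some sentence"; a sentence that does not contain the word is returned unchanged.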
I have a vector full of monster objects which are initialized onto a 10x10 map, and that works. I am now playing with some code to prevent monsters from being spawned on the same map coordinate. When I run the code it aborts with "vector subscript out of range" and I have no idea why. Any help would be great.
main function
#include "stdafx.h"
#include "character.h"
#include "monster.h"
#include "player.h"
#include <iostream>
#include <ctime>
#include <cstdlib>
#include <vector>
using namespace std;
vector<monster*> monVec;
vector<int> monx;
vector<int> mony;
player player1;
bool collision();
void initialize();
int main(){
initialize();
player1.moveChar(3, 6);
bool temp;
temp = collision();
system("pause");
return 0;
}
initialize function
void initialize()
{
srand(time(NULL));
for (int n = 0; n < 10; n++)
{
int inx = rand() % 9;
int iny = rand() % 9;
if (n == 0){
monx.push_back(inx);
mony.push_back(iny);
}
for (int i = 0; i < n; i++)
{
-------->if (inx != monx[i] && iny != mony[i]){
monx.push_back(inx);
mony.push_back(iny);
}
else n--;
}
monVec.push_back(new monster());
monVec[n]->moveChar(inx, iny);
cout << endl << inx << ", " << iny << endl;
}
}
The cout is just there to check that it works once it runs, and the arrow marks the problem line.
Thanks.
In your initialize function you do the following:
<for 10 times>
<when first time, add one item to the x,y vectors>
<access up to the 10th element of the x,y vectors> //But the vectors are only guaranteed to have at least one element each
<maybe add one item to the x,y vectors>
The problem is that there is a path on which the vectors do not hold enough elements for the index you access. There is also the assignment-versus-comparison mistake in your if, as @Michael Waltz already mentioned.
void initialize()
{
srand(time(NULL));
for (int n = 0; n < 10; n++)
{
int inx = rand() % 9;
int iny = rand() % 9;
if (n = 0){ //<<--------------------------- replace (n = 0) by (n == 0)
monx.push_back(inx);
mony.push_back(iny);
}
for (int i = 0; i < 10; i++)
{
// <<<< here monx and mony may contain only
// one element so monx[1] and mony[1] are invalid
if (inx != monx[i] && iny != mony[i]){
monx.push_back(inx);
mony.push_back(iny);
}
else n--;
}
monVec.push_back(new monster());
monVec[n]->moveChar(inx, iny);
cout << endl << inx << ", " << iny << endl;
}
}
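A minimal sketch of one way to avoid both problems, assuming the goal is simply ten monsters on ten distinct cells: keep the cells already taken in a std::set and retry until an unused one comes up. The helper uniqueCells and its parameters are hypothetical, and it uses <random> instead of rand() for brevity.

```cpp
#include <cassert>
#include <random>
#include <set>
#include <utility>
#include <vector>

// Hypothetical helper: pick `count` distinct cells (x, y) on a
// width x height grid. `seed` makes the sequence reproducible.
std::vector<std::pair<int, int>> uniqueCells(int count, int width, int height, unsigned seed) {
    // More cells than the grid holds could never be placed.
    assert(count >= 0 && width > 0 && height > 0 && count <= width * height);

    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> xs(0, width - 1);
    std::uniform_int_distribution<int> ys(0, height - 1);

    std::set<std::pair<int, int>> taken;     // cells already used
    std::vector<std::pair<int, int>> cells;  // result, in placement order
    while (static_cast<int>(cells.size()) < count) {
        const std::pair<int, int> cell(xs(rng), ys(rng));
        // insert() reports whether the cell was new; duplicates are retried.
        if (taken.insert(cell).second) cells.push_back(cell);
    }
    return cells;
}
```

Each returned pair could then be passed to moveChar for the monster with the same index. A cell counts as taken only when both coordinates match, so two monsters may still share a row or a column.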
I cannot figure out where the problem in my heap sort is.
The program takes a filename from the command line and imports the words into a vector. That vector is then turned into a vector of pairs, vector<pair<string,int> >, where string is the word and int is the count of how many instances of that word are in the file.
The vector<PAIR> is then sorted by either the string (value or v) or by int (key or k). My sorting by Key works fine however sort by value is off. I suspect I'm missing an if statement in max_heapify when sorting by value. Here's my code:
main.cpp
#include <fstream>
#include <iostream>
#include <stdlib.h>
#include <vector>
#include <string>
#include <string.h>
#include <stdio.h>
#include <map>
#include <time.h>
#include "readwords.h"
using namespace std;
readwords wordsinfile;
vector<string> allwords;
bool times;
char *filename;
timespec timestart,timeend;
vector< pair<string,int> > allwords_vp;
timespec diffclock(timespec start, timespec end);
int main ( int argc, char *argv[] ) {
filename = argv[1];
//Lets open the file
ifstream ourfile2(filename);
//Lets get all the words using our requirements
allwords = wordsinfile.getwords(ourfile2);
//Convert all the words from file and count how many times they
//appear. We will store them in a vector<string,int> string
//being the word and int being how many time the word appeared
allwords_vp = wordsinfile.count_vector(allwords);
cout << "HeapSort by Values" << endl;
if (times) {
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &timestart);
wordsinfile.heapsort(const_cast<char *>("v"));
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &timeend);
cout << "HeapSort by Values ran in "
<< diffclock(timestart,timeend).tv_nsec << " nanosecond or "
<< diffclock(timestart,timeend).tv_nsec/1000 << " microsecond"
<< endl;
} else {
wordsinfile.heapsort(const_cast<char *>("v"));
}
cout << "HeapSort by Keys" << endl;
if (times) {
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &timestart);
wordsinfile.heapsort(const_cast<char *>("k"));
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &timeend);
cout << "HeapSort by Keys ran in "
<< diffclock(timestart,timeend).tv_nsec << " nanosecond or "
<< diffclock(timestart,timeend).tv_nsec/1000 << " microsecond"
<< endl;
} else {
wordsinfile.heapsort(const_cast<char *>("k"));
}
}
timespec diffclock(timespec start, timespec end) {
timespec temp;
if ((end.tv_nsec-start.tv_nsec)<0) {
temp.tv_sec = end.tv_sec-start.tv_sec-1;
temp.tv_nsec = 1000000000+end.tv_nsec-start.tv_nsec;
} else {
temp.tv_sec = end.tv_sec-start.tv_sec;
temp.tv_nsec = end.tv_nsec-start.tv_nsec;
}
return temp;
}
readwords.h
#ifndef READWORDS_H
#define READWORDS_H
#include <vector>
#include <map>
#include <utility>
#include <time.h>
typedef std::pair<std::string, int> PAIR;
bool isasciifile(std::istream& file);
class readwords {
private:
std::vector<PAIR> vp;
public:
std::vector<std::string> getwords(std::istream& file);
std::vector<PAIR> count_vector(std::vector<std::string> sv);
void print_vectorpair(std::vector<PAIR> vp);
void print_vector(std::vector<std::string> sv);
void heapsort(char how[]);
void buildmaxheap(std::vector<PAIR> &vp, int heapsize, char how[]);
void max_heapify(std::vector<PAIR> &vp, int i, int heapsize, char how[]);
void swap_pair(PAIR &p1, PAIR &p2);
};
readwords.cpp
#include <fstream>
#include <iostream>
#include <map>
#include "readwords.h"
#include <vector>
#include <string>
#include <utility>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
//using std::vector;
using namespace std;
typedef pair<string, int> PAIR;
// Do we have a ASCII file?
// Lets test the second 10 chars to make sure
// This method is flawed if the file is less than 10 chars
bool isasciifile(std::istream& file) {
int c = 0;
bool foundbin = false;
for(c=0; c < 10;c++) {
if(!isprint(file.get())){
// Looks like we found a non ASCII file, or its empty.
foundbin = true;
}
}
return foundbin;
}
// This is our workhorse as it splits up the words based on our criteria and
// passes them back as a vector of strings.
vector<string> readwords::getwords(std::istream& file) {
char c;
string aword;
vector<string> sv;
//Let go through the file till the end
while(file.good()) {
c = file.get();
if (isalnum(c)) {
//convert any uppercase to lowercase
if(isupper(c)) {
c = (tolower(c));
}
//if its a space lets go onto the next char
if(isspace(c)) { continue; }
//everything looks good lets add the char to our word
aword.insert(aword.end(),c);
} else {
//its not a alphnum or a space so lets skip it
if(!isspace(c)) { continue; }
//reset our string and increment
if (aword != "") {sv.push_back(aword);}
aword = "";
continue;
}
}
return sv;
}
vector<PAIR> readwords::count_vector(vector<string> sv) {
unsigned int i = 0;
int j = 0;
int match = 0;
// cout << "Working with these string: " << endl;
// print_vector(sv);
for (i=0; i < sv.size(); i++) {
// cout << "count of i: " << i << " word is: " << sv.at(i) << endl;
match = 0;
if(readwords::vp.size() == 0) {
readwords::vp.push_back(make_pair(sv.at(i),1)); continue;
}
for (j=readwords::vp.size() - 1; j >= 0; --j) {
if (sv.at(i) == readwords::vp.at(j).first) {
// cout << "Match found with: " << sv.at(i) << endl;;
readwords::vp.at(j).second = readwords::vp.at(j).second + 1;
match = 1;
}
// cout << "Value of j and match: " << j << match << endl;
if ( j == 0 && match == 0) {
// cout << "Match found at end with: " << sv.at(i) << endl;;
readwords::vp.push_back(make_pair(sv.at(i),1));
}
}
}
//Prob need to sort by first data type then second here, prior to sort functions.
//Might not be the best place as the sort functions would alter it, if not here
//then each sort requires to do secondary search
return readwords::vp;
}
void readwords::print_vectorpair(vector<PAIR> vp) {
unsigned int i = 0;
for (i=0; i < vp.size(); ++i) {
cout << vp.at(i).first << " " << vp.at(i).second << endl;
}
}
void readwords::print_vector(vector<string> sv) {
unsigned int i = 0;
for (i=0; i < sv.size(); ++i) {
cout << sv.at(i) << endl;
}
}
void readwords::heapsort(char how[]) {
int heapsize = (readwords::vp.size() - 1);
buildmaxheap(readwords::vp, heapsize, how);
for(int i=(readwords::vp.size() - 1); i >= 0; i--) {
swap(readwords::vp[0],readwords::vp[i]);
heapsize--;
max_heapify(readwords::vp, 0, heapsize, how);
}
print_vectorpair(readwords::vp);
}
void readwords::buildmaxheap(vector<PAIR> &vp, int heapsize, char how[]) {
for(int i=(heapsize/2); i >= 0 ; i--) {
max_heapify(vp, i, heapsize, how);
}
}
void readwords::max_heapify(vector<PAIR> &vp, int i, int heapsize, char how[]) {
int left = ( 2 * i ) + 1;
int right = left + 1;
int largest;
if(!strcmp(how,"v")) {
if(left <= heapsize && vp.at(left).second >= vp.at(i).second ) {
if( vp.at(left).first >= vp.at(i).first ) {
largest = left;
} else {
largest = i;
}
} else {
largest = i;
}
if(right <= heapsize && vp.at(right).second >= vp.at(largest).second) {
if( vp.at(right).first >= vp.at(largest).first) {
largest = right;
}
}
}
if(!strcmp(how,"k")) {
if(left <= heapsize && vp.at(left).first > vp.at(i).first) {
largest = left;
} else {
largest = i;
}
if(right <= heapsize && vp.at(right).first > vp.at(largest).first) {
largest = right;
}
}
if(largest != i) {
swap(vp[i], vp[largest]);
max_heapify(vp, largest, heapsize, how);
}
}
"The vector is then sorted by either the string (value or v) or by int (key or k)."
That description doesn't match the code. Sorting with a how parameter of "k" sorts by the first component only, which is the string, and sorting with "v" as the how parameter takes both components into account.
I think it's a rather bad idea to pass a char[] to determine the sorting criterion. It should be a comparator function, so that you need only one implementation in max_heapify.
"My sorting by Key works fine however sort by value is off. I suspect I'm missing an if statement in max_heapify when sorting by value."
The problem is that a heap sort needs a total ordering or it won't sort properly.
Your conditions
if(left <= heapsize && vp.at(left).second >= vp.at(i).second ) {
if( vp.at(left).first >= vp.at(i).first ) {
largest = left;
} else {
largest = i;
}
} else {
largest = i;
}
check whether both components of vp.at(left) (resp. vp.at(right)) are at least as large as the corresponding components of vp.at(i). That is the product partial ordering: two arbitrary pairs are in general not comparable under it, and in that case your max_heapify doesn't do anything.
For example, with <"a",3>, <"b",2> and <"c",1> in the positions i, left, right, in whichever order, your max_heapify sets largest to i.
If your sorting by "v" is meant to sort based on the int component first, and in case of a tie, take the string component into account, you'd need to distinguish the cases vp.at(left).second > vp.at(i).second and equality (for right too, of course). For example
if(left <= heapsize && vp.at(left).second >= vp.at(i).second ) {
if(vp.at(left).second > vp.at(i).second || vp.at(left).first >= vp.at(i).first ) {
largest = left;
} else {
largest = i;
}
} else {
largest = i;
}
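Putting the two suggestions together, a comparator instead of the char[] flag and a total ordering for the count-based sort, might look like the sketch below. The names maxHeapify, heapSort, byWord and byCountThenWord are illustrative, not from the original code, and heapsize here means the number of elements in the heap rather than the last valid index.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> PAIR;

// Sift vp[i] down within the first `heapsize` elements.
// `less(a, b)` must return true when a sorts before b.
template <typename Less>
void maxHeapify(std::vector<PAIR>& vp, std::size_t i, std::size_t heapsize, Less less) {
    assert(i < heapsize);
    for (;;) {
        const std::size_t left = 2 * i + 1, right = left + 1;
        std::size_t largest = i;
        if (left < heapsize && less(vp[largest], vp[left])) largest = left;
        if (right < heapsize && less(vp[largest], vp[right])) largest = right;
        if (largest == i) return;  // heap property holds at i
        std::swap(vp[i], vp[largest]);
        i = largest;
    }
}

// Sort vp in ascending order with respect to `less`.
template <typename Less>
void heapSort(std::vector<PAIR>& vp, Less less) {
    const std::size_t n = vp.size();
    // Build the max-heap from the last parent node down to the root.
    for (std::size_t i = n / 2; i-- > 0;) maxHeapify(vp, i, n, less);
    // Move the current maximum behind the shrinking heap.
    for (std::size_t end = n; end > 1; --end) {
        std::swap(vp[0], vp[end - 1]);
        maxHeapify(vp, 0, end - 1, less);
    }
}

// Order by the word only.
bool byWord(const PAIR& a, const PAIR& b) { return a.first < b.first; }

// Total order: the count decides, the word breaks ties.
bool byCountThenWord(const PAIR& a, const PAIR& b) {
    if (a.second != b.second) return a.second < b.second;
    return a.first < b.first;
}
```

With this shape, heapSort(vp, byWord) and heapSort(vp, byCountThenWord) share one max_heapify implementation, and any two pairs are comparable under either ordering.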
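To sort a vector<pair<string, int> > by values, consider copying it into a vector<pair<int, string> > with the components swapped. std::sort (from <algorithm>) then orders by the int first and by the string on ties:
vector<pair<int, string> > v(original.size());
for (size_t i = 0; i < v.size(); ++i) v[i] = make_pair(original[i].second, original[i].first);
sort(v.begin(), v.end());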