Hash function for strings not working on some strings? - c++

Basically my program reads a text file with the following format:
3
chairs
tables
refrigerators
The number on the first line indicates the number of items in the file to read.
Here's my hash function:
int hash(string& item, int n) {
int hashVal = 0;
int len = item.length();
for(int i = 0; i < len; i++)
hashVal = hashVal*37 + item[i];
hashVal %= n;
if(hashVal < 0) hashVal += n;
return hashVal;
}
When my program read the text file above, it was successful. But when I tried another one:
5
sabel
ziyarah
moustache
math
pedobear
The program would freeze. Not a segmentation fault or anything but it would just stop.
Any ideas?
Edit:
int n, tableSize;
myFile >> n;
tableSize = generateTableSize(n);
string item, hashTable[tableSize];
for(int i = 0; i < tableSize; i++)
hashTable[i] = "--";
while(myFile >> item && n!=0) {
int index = hash(item,tableSize);
if(hashTable[index] == "--")
hashTable[index] = item;
else {
int newIndex = rehash(item,tableSize);
while(hashTable[newIndex] != "--") {
newIndex = rehash(item,tableSize);
}
hashTable[newIndex] = item;
}
n--;
}
int rehash(string item, int n) {
return hash(item,n+1);
}

The code freezes because it ends in an endless loop:
int index = hash(item,tableSize);
if(hashTable[index] == "--")
hashTable[index] = item;
else {
int newIndex = rehash(item,tableSize);
while(hashTable[newIndex] != "--") {
newIndex = rehash(item,tableSize);
}
hashTable[newIndex] = item;
}
You repeatedly recalculate the index without changing the input parameters, so the output stays the same and the loop recalculates it forever.
In the code above, newIndex is calculated from the same item as index, but with a different function (rehash hashes modulo tableSize + 1), so it will most likely differ from index. If that slot is also occupied, the loop calls rehash again with exactly the same inputs, which gives exactly the same output. You look up the same slot in the hash table, which still holds the same value as last time, so you recalculate again, and so on. Note also that rehash can return tableSize itself, which is one past the end of the array.
The reason you didn't see this with the first file is that there was no collision (or at most a single collision, so the newIndex calculated by rehash was free on the first try).
The solution is not to increase the table size (that will at best lower the chance of a collision, which is good in itself but won't solve your problem entirely). Instead, either alter the inputs to your functions so that you get a different output on each attempt, or change the hash table structure.
I always found Sedgewick's book on algorithms in C++ useful; it has a chapter on hashing.
Sadly I don't have my copy of Algorithms in C++ at hand, so I cannot tell you how Sedgewick solved it. For the simple educational purpose of solving your problem, I would suggest starting by incrementing the index by 1 until you find a free slot in the hash table.
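A minimal sketch of that "increment by 1" idea (linear probing) might look like the following. The names hashString, insertLinearProbing and kEmpty are made up for this illustration, and the table is a std::vector rather than the variable-length array from the question. The hash uses an unsigned accumulator so that overflow wraps around instead of going negative.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Marker for an unused slot, as in the question's code.
const std::string kEmpty = "--";

// Same polynomial hash as in the question, but accumulated in an unsigned
// type so overflow is well defined and the result is never negative.
std::size_t hashString(const std::string& item, std::size_t tableSize) {
    unsigned long long hashVal = 0;
    for (unsigned char c : item)
        hashVal = hashVal * 37 + c;
    return static_cast<std::size_t>(hashVal % tableSize);
}

// Inserts item using linear probing.
// Returns the slot that was used, or -1 if the table is full (or empty-sized).
int insertLinearProbing(std::vector<std::string>& table, const std::string& item) {
    const std::size_t n = table.size();
    if (n == 0) return -1;
    std::size_t index = hashString(item, n);
    for (std::size_t probes = 0; probes < n; ++probes) {
        if (table[index] == kEmpty) {
            table[index] = item;
            return static_cast<int>(index);
        }
        index = (index + 1) % n;  // the probed slot changes on every attempt
    }
    return -1;  // every slot was occupied
}
```

Because the loop gives up after probing every slot once, a full table produces a -1 result instead of a freeze.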

Related

How to find the power set of a given set without using left shift bit?

I'm trying to figure out how to implement an algorithm to find a power set given a set, but I'm having some trouble. The sets are actually vectors so for example I am given Set<char> set1{ 'a','b','c' };
I would do PowerSet(set1); and I would get all the sets
but if I do Set<char> set2{ 'a','b','c', 'd' };
I would do PowerSet(set2) and I would miss a few of those sets.
Set<Set<char>> PowerSet(const Set<char>& set1)
{
Set<Set<char>> result;
Set<char> temp;
result.insertElement({});
int card = set1.cardinality();
int powSize = pow(2, card);
for (int i = 0; i < powSize; ++i)
{
for (int j = 0; j < card; ++j)
{
if (i % static_cast<int> ((pow(2, j)) + 1))
{
temp.insertElement(set1[j]);
result.insertElement(temp);
}
}
temp.clear();
}
return result;
}
For reference:
cardinality() is a function in my .h where it returns the size of the set.
insertElement() inserts element into the set while duplicates are ignored.
Also the reason why I did temp.insertElement(set1[j]) then result.insertElement(temp) is because result is a set of sets, so I needed to create a temporary set to insert the elements into and then insert it into result.
clear() is a function that empties the set.
I also have removeElem() which removes that element specified if it exists, otherwise it'll ignore it.
Your if test is nonsense: it does not check whether bit j of i is set. It should be something like
if ((i / static_cast<int>(pow(2,j))) % 2)
You also need to move the insertion of temp into result to after the inner loop (just before the temp.clear()).
With those changes, this should work as long as pow(2, card) does not overflow an int -- that is, up to about card == 30 on most machines.
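As a rough illustration of the same division-and-modulo test, here is a sketch that uses std::vector in place of the asker's Set class (whose implementation is not shown). The function name powerSet is an assumption made for this example, and the powers of two are computed with integer arithmetic rather than pow.

```cpp
#include <cstddef>
#include <vector>

// Builds every subset of `items` using only integer division and modulo.
// Assumes items.size() is small (below 64) so that 2^size fits in the counter.
std::vector<std::vector<char>> powerSet(const std::vector<char>& items) {
    const std::size_t card = items.size();
    unsigned long long powSize = 1;
    for (std::size_t k = 0; k < card; ++k)
        powSize *= 2;  // integer 2^card, no floating-point pow involved

    std::vector<std::vector<char>> result;
    for (unsigned long long i = 0; i < powSize; ++i) {
        std::vector<char> subset;
        unsigned long long bits = i;
        for (std::size_t j = 0; j < card; ++j) {
            if (bits % 2 == 1)  // bit j of i is set: element j belongs to this subset
                subset.push_back(items[j]);
            bits /= 2;          // move on to the next bit
        }
        result.push_back(subset);  // insert once per i, after the inner loop
    }
    return result;
}
```

For four elements this yields 16 subsets, starting with the empty one for i == 0.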

C++ reading data from txt file and putting it into 2d array

I've got a file with numbers separated by a single space and I want to put them into the 2D array. There are 200 rows and 320 numbers each.
This is my code:
int data[200][320];
int i = 0;
int j = 0;
file.open("./../../../Data_PR2/data.txt", ios::in);
while (file>> data[i][j])
{
if (j == 319) {
j = 0;
i++;
} else
j++;
}
And it kinda works, because first rows are correctly inserted but not all rows.
So what's wrong?
A simpler method is to use / and % instead of the if statement:
unsigned int count = 0;
unsigned int row = 0;
unsigned int column = 0;
while (file >> data[row][column])
{
++count;
column = count % 320;
row = count / 320;
}
Maybe a more efficient method is to treat the array as a single dimension array, since all slots are contiguous:
int * p_slot = &data[0][0];
while (file >> *p_slot)
{
++p_slot;
}
There are other methods, such as input iterators.
The above examples do not check for overflow. Overflow checking is left as an exercise for the reader. :-)
Note: This is not an optimization, but a simplification. Format conversion, out of bounds checking and the process of inputting, make optimizing moot. The biggest optimization would be reading bigger blocks into memory, then reading from memory; but for this size, it's not worthwhile.
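For the overflow check left as an exercise above, one possible sketch is shown here. The names readGrid, readGridFromText, kRows and kCols are invented for this example; the function stops as soon as the array is full, the input ends, or a token is not an integer, and reports how many values were stored so the caller can see whether all 200 x 320 were read.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

const std::size_t kRows = 200;
const std::size_t kCols = 320;

// Reads at most kRows * kCols integers from `in` into `data`, row by row.
// Returns the number of values actually stored.
std::size_t readGrid(std::istream& in, int (&data)[kRows][kCols]) {
    std::size_t count = 0;
    int value = 0;
    // The bounds check comes first, so nothing is read once the array is full.
    while (count < kRows * kCols && in >> value) {
        data[count / kCols][count % kCols] = value;
        ++count;
    }
    return count;
}

// Convenience wrapper for trying the function out without a file.
std::size_t readGridFromText(const std::string& text, int (&data)[kRows][kCols]) {
    std::istringstream in(text);
    return readGrid(in, data);
}
```

If the returned count is smaller than kRows * kCols, the input was shorter than expected or contained something that did not parse as an integer. An array of this size is probably better declared static or global than as a local variable.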

Pointers, dynamic arrays, and memory leak

I'm trying to create a program which allows for a dynamically allocated array to store some integers, increase the maximum size if need be, and then display both the unsorted and sorted array in that order.
Link to my full code is at the bottom.
The first issue I have is the dynamically allocated array going haywire after the size needs to be increased the first time. The relevant code is below.
while (counter <= arraySize)
{
cout <<"Please enter an integer number. Use 999999 (six 9's) to stop\n";
if (counter == arraySize) //If the counter is equal to the size of the array
{ //the array must be resized
arraySize +=2;
int *temp = new int[arraySize];
for (counter = 0; counter < arraySize; counter++)
{
temp[counter] = arrayPtr[counter];
}
delete [] arrayPtr;
arrayPtr = temp;
counter ++; //the counter has to be reset to it's original position
} //which should be +1 of the end of the old array
cin >> arrayPtr[counter];
if (arrayPtr[counter] == sentinel)
{
cout << "Sentinel Value given, data entry ending.\n";
break;
}
counter ++;
}
This produces the unintended operation where instead of waiting for the sentinel value, it just begins to list the integers in memory past that point (because no bounds checking).
The next issue is that my sorting function refuses to run. I tried testing this on 5 values and the program just crashes upon reaching that particular part of code.
The function is called using
sorting (arrayPtr);
but the function itself looks like this:
void sorting (int *arr)
{
int count = 0, countTwo = 0, tempVal;
for (count = 0; arr[count] != 999999; count++) //I figured arr[count] != 999999 is easier and looks better
{ //A bunch of if statements
for (countTwo = 0; arr[countTwo] != 99999; countTwo++)
{
if (arr[countTwo] > arr[countTwo+1])
{
tempVal = arr[countTwo];
arr[countTwo] = arr[countTwo+1];
arr[countTwo+1] = tempVal;
}
}
}
}
Any help on this issue is appreciated.
Link to my source code:
http://www.mediafire.com/file/w08su2hap57fkwo/Lab1_2336.cpp
Due to community feedback, this link will remain active as long as possible.
The link below is to my corrected source code. It is annotated in order to better highlight the mistakes I made and the answers to fixing them.
http://www.mediafire.com/file/1z7hd4w8smnwn29/Lab1_2336_corrected.cpp
The first problem I can spot in your code is in the for loop that copies the old array: counter goes from 0 to arraySize-1, but arraySize has already been increased by 2, so the last two iterations of the loop access arrayPtr out of bounds.
Next, at the end of the if (counter == arraySize) block there is a counter++;. It is not required: after the copy loop, counter already equals the new arraySize, so at this moment it is already indexing the array out of bounds.
Finally, in your sorting function the inner loop looks for the wrong value (99999 instead of 999999), so it never stops and goes out of bounds. To prevent this kind of error, you should define your sentinel as a const in an unnamed namespace and use it throughout the code instead of typing 999999 (which is error-prone).
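Putting those three points together, a sketch could look like this. The helper names grow, readUntilSentinel, sortAscending and kSentinel are assumptions for this illustration, not the asker's code. The copy loop uses its own index and copies only the elements in use, and the sort takes an explicit count instead of searching for the sentinel.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>  // only needed to feed the reader from a std::istringstream

namespace {
const int kSentinel = 999999;  // defined once, used everywhere
}

// Allocates a larger array and copies only the `used` old elements into it.
int* grow(int* oldArray, std::size_t used, std::size_t newCapacity) {
    int* temp = new int[newCapacity];
    for (std::size_t i = 0; i < used; ++i)  // separate index, bounded by the old size
        temp[i] = oldArray[i];
    delete[] oldArray;
    return temp;
}

// Reads integers until the sentinel or end of input.
// Returns a new[]-allocated array (the caller must delete[] it) and sets `count`.
int* readUntilSentinel(std::istream& in, std::size_t& count) {
    std::size_t capacity = 2;
    int* values = new int[capacity];
    count = 0;
    int value = 0;
    while (in >> value && value != kSentinel) {
        if (count == capacity) {
            capacity += 2;
            values = grow(values, count, capacity);
        }
        values[count++] = value;
    }
    return values;
}

// Bubble sort over exactly `count` elements; never reads past the end.
void sortAscending(int* values, std::size_t count) {
    for (std::size_t pass = 0; pass + 1 < count; ++pass)
        for (std::size_t i = 0; i + 1 < count - pass; ++i)
            if (values[i] > values[i + 1]) {
                int tmp = values[i];
                values[i] = values[i + 1];
                values[i + 1] = tmp;
            }
}
```

Passing std::cin to readUntilSentinel gives the interactive behaviour from the question; a std::istringstream works the same way for testing.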

Getting from top left to bottom right element of array using dynamic programming

I'm having a hard time thinking of a way to make a solution for a dynamic programming problem. Basically, I have to get from the top left to bottom right element in a NxN array, being able to move only down or right, but I should move to the bigger element and sum it in a variable (get highest score by moving only right and down). F.e., if I have this matrix:
0 1 1
0 4 2
1 1 1
It should move 0-> 1 -> 4 -> 2 -> 1 and print out 8.
I've read about dynamic optimizing for a long time now and still can't get to solve this. Would appreciate if anybody could help me.
Thanks in advance!
Edit: Thanks @sestus! I've managed to solve the problem, however the solution is slow and I have to optimize it to perform faster. Here's my solution:
#include <iostream>
#include <algorithm>
using namespace std;
const int MAX = 100;
int arr[MAX][MAX];
int move(int row,int col, int n)
{
if(row >= n || col >= n)
{
return 0;
}
return max(arr[row][col] + move(row + 1, col, n),
arr[row][col] + move(row, col + 1, n));
}
int main()
{
int examples, result;
cin>>examples;
int n;
int results[examples];
for(int k =1; k <= examples; k++)
{
cin >> n;
int s = 0;
int i = 0, j = 0;
for(i = 0; i < n; i++)
{
for(j = 0; j < n; j++)
{
cin>> arr[i][j];
}
}
i = 0, j = 0;
s+=move(i,j, n);
results[k] = s;
}
for(int k = 1; k <= examples; k++)
{
cout<<results[k]<<endl;
}
return 0;
}
(The program actually has to take all the examples and output the answers for all of them at the end). Mind helping me with the optimizing?
I'm not going to paste ready-to-go code here; I'll just give you a high-level description of the solution.
So what are your possible choices when deciding where to move? You can move either down or right, adding the value of your current block. Your aim is to maximize the sum of the blocks that you have visited till you make it to the bottom-right block. That gives us:
move(row, column):
//handle the cases when you move out of bounds of the array here
return(max{array[row,column] + move(row + 1, column),
array[row,column] + move(row, column + 1)})
For the above to be a complete dynamic programming solution, you'll need to add some memoization, i.e. look up the results of the subproblems that you've already solved instead of computing them again. Check this on stackoverflow for more details: Dynamic programming and memoization: bottom-up vs top-down approaches
So for example, take this board:
Notice the two different routes. We've arrived at block[1][2] (the one with value 3, where the red and blue lines end) via two different routes. Following the blue route, we moved down-right-right, while following the red one we moved right-right-down. The pseudocode I pasted dictates that we take the blue route first, because we encounter the recursive call move(row + 1, column) before the recursive call move(row, column + 1).
So, when we reach block[1][2] via the red route, we don't actually need to compute this solution again. We've already done so, back when we were there via the blue route! If we kept this solution in an array (or a map / hash table), we'd be able to just pick up the solution without having to compute it again. That's memoization!
Based on the above, you can use a map, since you're using C++:
std::map<std::pair<int, int>, int> cache;
And before doing the recursive call, you'll want to check whether the pair exists in the map. If it doesn't, you compute the solution and add it to the map. So move becomes:
// Needs #include <map> and #include <utility> in addition to the question's includes.
// cache is keyed by position, so call cache.clear() before each new board.
int move(int row,int col, int n)
{
if(row >= n || col >= n)
{
return 0;
}
pair<int, int> rowplus_column = make_pair(row + 1,col);
pair<int, int> row_columnplus = make_pair(row, col + 1);
int solution_right = 0;
int solution_down = 0;
// The iterator type has to match the type of cache.
map<pair<int, int>, int>::iterator it;
it = cache.find(rowplus_column);
if (it == cache.end()) {
solution_down = move(row + 1, col, n);
// map::insert takes a single (key, value) pair.
cache.insert(make_pair(rowplus_column, solution_down));
}
else {
solution_down = it->second;
}
it = cache.find(row_columnplus);
if (it == cache.end()) {
solution_right = move(row, col + 1, n);
cache.insert(make_pair(row_columnplus, solution_right));
}
else {
solution_right = it->second;
}
return max(arr[row][col] + solution_down,
arr[row][col] + solution_right);
}
I am a little rusty in C++, but hopefully you got the idea:
Before actually computing the solution, check the map for the pair. If that pair exists, you've already solved that part of the problem, so get your solution from the map and avoid the recursive call.
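The same idea can also be written bottom-up, filling a table from the bottom-right corner so that every cell is computed exactly once. The sketch below is one possible way to do that. The function name maxPathSum is made up for this example, and it assumes a square board with non-negative values, as in the question's sample.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the largest sum collectable on a path from the top-left to the
// bottom-right cell when only moves to the right or down are allowed.
// Assumes a square grid with non-negative values.
int maxPathSum(const std::vector<std::vector<int>>& grid) {
    const std::size_t n = grid.size();
    if (n == 0) return 0;
    // best[row][col] is the best sum obtainable starting from (row, col).
    std::vector<std::vector<int>> best(n, std::vector<int>(n, 0));
    for (std::size_t row = n; row-- > 0;) {
        for (std::size_t col = n; col-- > 0;) {
            // Stepping off the board contributes 0, like the recursive base case.
            int down = (row + 1 < n) ? best[row + 1][col] : 0;
            int right = (col + 1 < n) ? best[row][col + 1] : 0;
            best[row][col] = grid[row][col] + std::max(down, right);
        }
    }
    return best[0][0];
}
```

On the 3x3 board from the question this returns 8. Each cell is visited once, so the running time grows with n * n rather than exponentially.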

C++ Sorting a struct vector alphabetically

I've got a task that I'm stuck on. I need to create a program that reads an input file, stores each word into a vector along with how many times that word was read (hence the struct). Those values then need to print out in alphabetical order.
I've come up with something that I think is along the right lines:
struct WordInfo {
string text;
int count;
} uwords, temp;
string word;
int count = 0; //ignore this. For a different part of the task
vector<WordInfo> uwords;
while (cin >> word) {
bool flag = false;
count += 1;
for (int i = 0; i < uwords.size(); i++) {
if (uwords[i].text == word) {
flag = true;
uwords[i].count += 1;
}
}
if (flag == false) {
if (count == 1) { //stores first word into vector
uwords.push_back(WordInfo());
uwords[0].count = 1;
uwords[0].text = word;
} else {
for (int i = 0; i < uwords.size(); i++) {
if (word < uwords[i].text) {
uwords.push_back(WordInfo());
WordInfo temp = {word, 1};
uwords.insert(uwords.begin() + i, temp);
}
}
}
}
}
Now the problem I'm having, is that when I run the program it appears to get stuck in an infinite loop and I can't see why. Although I've done enough testing to realise it's probably in that last if statement, but my attempts to fix it were no good. Any help is appreciated. Cheers.
EDIT: I forgot to mention, we must use vector class and we're limited in what we can use, and sort is not an option :(
Take a good look at this piece of code:
if (word < uwords[i].text) {
uwords.push_back(WordInfo());
WordInfo temp = {word, 1};
uwords.insert(uwords.begin() + i, temp);
}
First, it actually inserts two elements into your vector: an "empty" one with push_back, and the real one with insert. And it does that whenever the current word is smaller than the one at position i.
As soon as it has inserted, there are two new elements to walk over. One of them sits at the current position i and pushes the old element to i+1, so in the next iteration we compare against the same word again. Your loop gets stuck because i increases by only 1 per iteration, which merely steps over the element just inserted, while the vector grows by 2, so i never reaches uwords.size().
For a quick solution, you want to (1) search for the position where the previous word is "smaller" than the current one, but the next one is bigger. Something like
if (uwords[i-1].text < word && word < uwords[i].text) {
(2) and you want to get rid of the push_back call.
Furthermore, (3) you can break out of the loop once the if condition was true: you have already inserted by then, so there is no need to iterate further. And (4), with a bit of condition tweaking, the count == 1 case can be merged into the loop. Modified code (it replaces your whole if (flag == false) block; warning, not tested yet):
if (!flag) {
for (int i = 0; i <= uwords.size(); ++i) {
if ((i == 0 || uwords[i-1].text < word) &&
(i == uwords.size() || word < uwords[i].text)) {
WordInfo temp = {word, 1};
uwords.insert(uwords.begin() + i, temp);
break;
}
}
}
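For comparison, here is a sketch that folds the duplicate check and the sorted insertion into a single pass over the vector, without using sort. The function name addWord is an assumption for this example, and WordInfo is redeclared only so that the snippet is self-contained.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct WordInfo {
    std::string text;
    int count;
};

// Records one occurrence of `word`, keeping `words` sorted by text.
void addWord(std::vector<WordInfo>& words, const std::string& word) {
    std::size_t i = 0;
    while (i < words.size() && words[i].text < word)
        ++i;  // skip every entry that sorts before `word`
    if (i < words.size() && words[i].text == word) {
        ++words[i].count;  // already present: just count it
    } else {
        WordInfo entry = {word, 1};
        // Exactly one insertion, after which the function returns.
        words.insert(words.begin() + static_cast<std::ptrdiff_t>(i), entry);
    }
}
```

Calling addWord(uwords, word) inside the while (cin >> word) loop would then replace both the search loop and the insertion block.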
You should not store your words in a vector, but in a map:
std::map<std::string,int>
Since a map keeps its keys ordered, iterating over it automatically yields a sorted range, which can later be copied into a vector if needed.
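A small sketch of the map approach, assuming the task allowed it (the question's edit says it does not); the function name countWords is invented for this example:

```cpp
#include <istream>
#include <map>
#include <sstream>  // only needed to feed the function from a std::istringstream
#include <string>
#include <utility>
#include <vector>

// Counts whitespace-separated words from `in` and returns (word, count)
// pairs in alphabetical order.
std::vector<std::pair<std::string, int>> countWords(std::istream& in) {
    std::map<std::string, int> counts;
    std::string word;
    while (in >> word)
        ++counts[word];  // operator[] starts a missing count at 0
    // Iterating a std::map visits the keys in sorted order.
    return std::vector<std::pair<std::string, int>>(counts.begin(), counts.end());
}
```

Passing std::cin reads from standard input as in the question.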