How do I implement an erase function for a hash table? - c++

I have a hash table using linear probing. I've been given the task to write an erase(int key) function with the following guidelines.
void erase(int key);
Preconditions: key >= 0
Postconditions: If a record with the specified key exists in the table, then
that record has been removed; otherwise the table is unchanged.
I was also given some hints to accomplish the task
It is important to realize that the insert function will allow you to add a new entry to the table, or to update an existing entry in the table.
For the linear probing version, notice that the code to insert an
item has two searches. The insert() function calls function
findIndex() to search the table to see if the item is already in the
table. If the item is not in the table, a second search is done to
find the position in the table to insert the item. Adding the ability
to delete an entry will require that the insertion process be
modified. When searching for an existing item, be sure that the
search does not stop when it comes to a location that was occupied
but is now empty because the item was deleted. When searching for a
position to insert a new item, use the first empty position - it does
not matter if the position has ever been occupied or not.
So I've started writing erase(key) and I seem to have run into the problem the hints are referring to, but I'm not sure what it means. I'll provide code below. To test my code, I set up the hash table so that it has a collision, then erase that key and rehash the table, but the rehashed entry doesn't end up in the correct location.
For instance, I add a few elements into my hash table:
The hash table is:
Index Key Data
0 31 3100
1 1 100
2 2 200
3 -1
4 -1
5 -1
6 -1
7 -1
8 -1
9 -1
10 -1
11 -1
12 -1
13 -1
14 -1
15 -1
16 -1
17 -1
18 -1
19 -1
20 -1
21 -1
22 -1
23 -1
24 -1
25 -1
26 -1
27 -1
28 -1
29 -1
30 -1
So all of my values are empty except the first three indices. Key 31 should be going into index 1, but since key 1 is already there, it collides and settles for index 0. I then erase key 1 and rehash the table, but key 31 stays at index 0.
Here are the functions that may be worth looking at:
void Table::insert( const RecordType& entry )
{
bool alreadyThere;
int index;
assert( entry.key >= 0 );
findIndex( entry.key, alreadyThere, index );
if( alreadyThere )
table[index] = entry;
else
{
assert( size( ) < CAPACITY );
index = hash( entry.key );
while ( table[index].key != -1 )
index = ( index + 1 ) % CAPACITY;
table[index] = entry;
used++;
}
}
Since insert uses findIndex, I'll include that as well
void Table::findIndex( int key, bool& found, int& i ) const
{
int count = 0;
assert( key >=0 );
i = hash( key );
while ( count < CAPACITY && table[i].key != -1 && table[i].key != key )
{
count++;
i = (i + 1) % CAPACITY;
}
found = table[i].key == key;
}
And here is my current start on erase
void Table::erase(int key)
{
assert(key >= 0);
bool found, rehashFound;
int index, rehashIndex;
//check if key is in table
findIndex(key, found, index);
//if key is found, remove it
if(found)
{
//remove key at position
table[index].key = -1;
table[index].data = NULL;
cout << "Found key and removed it" << endl;
//reduce the number of used keys
used--;
//rehash the table
for(int i = 0; i < CAPACITY; i++)
{
if(table[i].key != -1)
{
cout << "Rehashing key : " << table[i].key << endl;
findIndex(table[i].key, rehashFound, rehashIndex);
cout << "Rehashed to index : " << rehashIndex << endl;
table[rehashIndex].key = table[i].key;
table[rehashIndex].data = table[i].data;
}
}
}
}
Can someone explain what I need to do to make it rehash properly? I understand the concept of a hash table, but I seem to be doing something wrong here.
EDIT
As per user's suggestion:
void Table::erase(int key)
{
assert(key >= 0);
bool found;
int index;
findIndex(key, found, index);
if(found)
{
table[index].key = -2;
table[index].data = NULL;
used--;
}
}
//modify insert(const RecordType & entry): a slot is free for insertion if it
//was never used (-1) or holds a deleted marker (-2)
while(table[index].key != -1 && table[index].key != -2)
//findIndex needs no extra test: a -2 slot fails both the "== -1" and the
//"== key" checks below, so the probe continues past deleted entries
while(count < CAPACITY && table[i].key != -1 && table[i].key != key)

When deleting an item from the table, don't move anything around. Just stick a "deleted" marker there. On an insert, treat deletion markers as empty and available for new items. On a lookup, treat them as occupied and keep probing if you hit one. When resizing the table, ignore the markers.
Note that this can cause problems if the table is never resized. If the table is never resized, after a while, your table will have no entries marked as never used, and lookup performance will go to hell. Since the hints mention keeping track of whether an empty position was ever used and treating once-used cells differently from never-used, I believe this is the intended solution. Presumably, resizing the table will be a later assignment.
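A minimal sketch of how that could look with the Table class from the question (NEVER_USED and DELETED are just named versions of the -1 and -2 markers; everything else assumes the members shown above):
static const int NEVER_USED = -1;   // slot has never held an entry
static const int DELETED    = -2;   // slot held an entry that was erased

void Table::erase(int key)
{
    assert(key >= 0);
    bool found;
    int index;
    findIndex(key, found, index);          // probes past DELETED slots
    if (found)
    {
        table[index].key = DELETED;        // leave a tombstone, move nothing
        used--;
    }
}

void Table::insert(const RecordType& entry)
{
    bool alreadyThere;
    int index;
    assert(entry.key >= 0);
    findIndex(entry.key, alreadyThere, index);
    if (alreadyThere)
        table[index] = entry;              // update in place
    else
    {
        assert(size() < CAPACITY);
        index = hash(entry.key);
        // any never-used OR deleted slot is fair game for insertion
        while (table[index].key != NEVER_USED && table[index].key != DELETED)
            index = (index + 1) % CAPACITY;
        table[index] = entry;
        used++;
    }
}

// findIndex() can stay exactly as posted: a DELETED slot matches neither the
// "== -1" test nor a real key, so the lookup keeps probing past it.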

It's not necessary to rehash the entire table every time a delete is done. If you want to minimise degradation in performance, you can compact the table instead: for each element after the deleted one (wrapping from end to front, stopping at the next -1), check whether it hashes to a bucket at or before the deleted slot. If so, move it into (or at least closer to) its home bucket, then repeat the compaction from the slot it just vacated.
Doing this kind of compaction will remove the biggest flaw in your current code, which is that after a little use every bucket will be marked as either in use or having been used, and performance for e.g. find of a non-existent value will degrade to O(CAPACITY).
Off the top of my head with no compiler/testing...
int Table::next(int index) const
{
return (index + 1) % CAPACITY;
}
int Table::distance(int from, int to) const
{
// "<=" so that distance(x, x) is 0 rather than CAPACITY
return from <= to ? to - from : to + CAPACITY - from;
}
void Table::erase(int key)
{
assert(key >= 0);
bool found;
int index;
findIndex(key, found, index);
if (found)
{
// compaction...
int limit = CAPACITY - 1;
for (int compact_from = next(index);
limit-- && table[compact_from].key >= 0;
compact_from = next(compact_from))
{
int ideal = hash(table[compact_from].key);
if (distance(ideal, index) <
distance(ideal, compact_from))
{
table[index] = table[compact_from];
index = compact_from;
}
}
// deletion
table[index].key = -1;
delete table[index].data; // or your "= NULL" if that's not a leak ;-)
--used;
}
}

Related

Inserting node in Hash table with open addressing [Optimizing the logic]

I am trying to understand a data structure: a hash table with open addressing.
I am currently reading the source code provided by GeeksforGeeks, but I have a few questions about the code.
Below is the pasted function for inserting a node from GeeksforGeeks.
//Function to add key value pair
void insertNode(K key, V value)
{
HashNode<K,V> *temp = new HashNode<K,V>(key, value);
// Apply hash function to find index for given key
int hashIndex = hashCode(key);
//find next free space
while(arr[hashIndex] != NULL && arr[hashIndex]->key != key //// LINE 9 //////
&& arr[hashIndex]->key != -1)
{
hashIndex++;
hashIndex %= capacity;
}
//if new node to be inserted increase the current size
if(arr[hashIndex] == NULL || arr[hashIndex]->key == -1) //// LINE 17 //////
size++;
arr[hashIndex] = temp;
}
Questions
In line 9, why would you check three conditions, namely:
whether the slot inside the hash table is null ===> arr[hashIndex] != NULL
AND whether the slot has the same key as the node that is going to be inserted ===> arr[hashIndex]->key != key
AND whether the slot has a key of -1, which marks a slot where a node was deleted before ===> arr[hashIndex]->key != -1
If I were to optimize this code, I believe checking whether the slot is NULL would already be enough.
In line 17, why would you increment the size property of HashMap before assigning the node to the slot? ===> if(arr[hashIndex] == NULL || arr[hashIndex]->key == -1)
size++;
To me, this logic seems to be messy.
I would rather do, arr[hashIndex] = temp; size++;
Assuming GeeksforGeeks's logic is well written, could you explain why the logic for inserting a new node into a hash table with open addressing is implemented as above, specifically on the two points I have raised?
The three conditions for a valid index are:
The object at the index is NULL
OR the object is not NULL, but its key is the same as the one we're inserting
OR the object is not NULL, but its key value is -1 (a deleted slot)
As long as none of the three holds, we don't have a valid index yet, and the loop rolls on.
In line 17: size is incremented only if the insertion doesn't overwrite an existing entry, i.e. the node is genuinely new (which means either condition 1 or 3 applies).
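To see those stop conditions and the size++ guard in action, here is a minimal, self-contained sketch (my own simplification with plain int keys, not the GeeksforGeeks code; CAP, arr, sz and deleteNode are names I made up for the illustration):
#include <iostream>

struct Node { int key; int value; };

const int CAP = 7;
Node* arr[CAP] = {};   // nullptr == never used
int   sz = 0;

int hashCode(int key) { return key % CAP; }

void insertNode(int key, int value)
{
    int i = hashCode(key);
    // keep probing while the slot is occupied by a *different, live* key
    while (arr[i] != nullptr && arr[i]->key != key && arr[i]->key != -1)
        i = (i + 1) % CAP;
    if (arr[i] == nullptr || arr[i]->key == -1)   // new entry, not an update
        ++sz;
    delete arr[i];                                // safe on nullptr
    arr[i] = new Node{key, value};
}

void deleteNode(int key)
{
    int i = hashCode(key);
    while (arr[i] != nullptr) {                   // skip past -1 tombstones
        if (arr[i]->key == key) { arr[i]->key = -1; --sz; return; }
        i = (i + 1) % CAP;
    }
}

int main()
{
    insertNode(1, 100);   // stop condition 1: never-used slot, sz -> 1
    insertNode(1, 111);   // stop condition 2: same key, overwrite, sz stays 1
    deleteNode(1);        // slot 1 now holds the -1 marker, sz -> 0
    insertNode(8, 800);   // 8 % 7 == 1: stop condition 3 reuses the deleted slot, sz -> 1
    std::cout << sz << " " << arr[1]->value << "\n";   // prints "1 800"
}
The last insert shows why the key != -1 test matters: without it the probe would treat the tombstone as occupied, walk past it to a fresh slot, and the table would slowly fill up with unusable deleted markers.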

Find minimum value different than zero given some conditions

I've started learning C++ Sets and Iterators and I can't figure out if I'm doing this correctly, since I'm relatively new to programming.
I've created a Set of a struct with a custom comparator that puts the items in decreasing order. Before receiving the input I don't know how many items my Set will contain. It can contain any number of items from 0 to 1000.
Here are the Set definitions:
typedef struct Pop {
int value_one; int node_value;
} Pop;
struct comparator {
bool operator() (const Pop& lhs, const Pop& rhs) const {
if (rhs.value_one == lhs.value_one) {
return lhs.node_value < rhs.node_value;
} else { return rhs.value_one < lhs.value_one;}
}
};
set<Pop, comparator> pop;
set<Pop, comparator>::iterator it;
And this is the algorithm. It should find a minimum value and print that value. If it does not find one (i.e. the function do_some_work(...) returns 0 every time), it should print "Zero work found!\n":
int minimum = (INT_MAX) / 2; int result;
int main(int argc, char** argv) {
//....
//After reading input and adding values to the SET gets to this part
Pop next;
Pop current;
for (it = pop.begin(); it != pop.end() && minimum != 1; it++) {
current = *it;
temp_it = it;
temp_it++;
if (temp_it != pop.end()) {
next = *temp_it;
// This function returns an integer value that can be any number from 0 to 5000.
// Besides this, it checks if the value found is less than the minimum (declared as a global) and different from 0, and if so
// updates the minimum value. Even if the set has 1000 items and at the first iteration the value
// found is 1, minimum is updated to 1 and we should break out of the for loop.
result = do_some_work(current.node_value);
if (result > 0 && next.value_one < current.value_one) {
break;
}
} else {
result = do_some_work(current.node_value);
}
}
if (minimum != (INT_MAX) / 2) {
printf("%d\n", minimum);
} else {
printf("Zero work found!\n");
}
return 0;
}
Here are some possible outcomes.
If the Set is empty it should print Zero work found!
If the Set has one item and do_some_work(current.node_value) returns a value bigger than 0, it should printf("%d\n", minimum);, or Zero work found! otherwise.
Imagine I have this Set (first position is value_one and second position is node_value):
4 2
3 6
3 7
3 8
3 10
2 34
If in the first iteration do_some_work(current.node_value) returns a value bigger than 0, since all the other items' value_one are smaller, it should break the loop, print the minimum and exit the program.
If in the first iteration do_some_work(current.node_value) returns 0, I advance in the Set, and since there are 4 items with value_one equal to 3 I must analyze these 4 items, because any of them can return a possible valid minimum value. If any of them updates the minimum value to 1, it should break the loop, print the minimum and exit the program.
In this case, the last item of the Set is only analysed if all the other items return 0 or the minimum value is set to 1.
For me this is both an algorithmic problem and a programming problem.
With this code, am I analysing all the possibilities, and breaking the loop when minimum is 1 (since if 1 is returned there's no need to check any other items)?

Recursive Function Error

I'm trying to create a recursive function that takes a vector of numbers and a key, which is the number we are looking for in the vector.
The function should return a count of how many times the key appears in the vector.
For some reason my recursive function is only returning the number 1 (disregard the 10, I was just testing something).
Here's my code:
int recursive_count(const vector<int>& vec, int key, size_t start){
if (start == vec.size())
return true;
return (vec[start] == key? 23 : key)
&& recursive_count(vec, key, (start+1));
}
int main() {
vector <int> coco;
for (int i = 0; i<10; i++) {
coco.push_back(i);
}
cout << coco.size() << endl;
int j = 6;
cout << recursive_count(coco, j, 0) << endl;
}
Not sure what you are trying to do, but as is, your function will return false (0) if and only if the input key is 0 and the vector contains at least one element that is not 0. Otherwise it will return 1.
This is because you are basically doing a boolean AND operation. The operands are true for all values that are not 0, and the only way to get a 0 operand is when the key is 0 and the current element does not equal it.
So, unless a false (0) shows up along the way, the whole boolean expression is true, which gives you the 1.
EDIT:
If you are trying to count how many times the key is in vec, do the same thing you would do in an iterative approach:
Start from 0 (make the stop condition return 0; instead of return true;)
Increase by 1 whenever the key is found: instead of using operator&&, use operator+.
(I did not give a direct full answer because it seems like HW, try to follow these hints, and ask if you have more questions).
To me it seems that a recursive function for that is nonsense, but anyway...
Think about the recursion concepts.
What is the break condition? That the current index is past the end of the vector. You got that right.
But the recursion case is wrong. You return some kind of bool (what's with the 23, by the way?).
The one recursion round needs to return 1 if the current element equals key, and 0 otherwise.
Then we only need to add up the recursion results, and we're there!
Here's the code
int recursive_count(const vector<int>& vec, int key, size_t start) {
if (start >= vec.size()) {
return 0;
} else {
return
((vec[start] == key) ? 1 : 0) +
recursive_count(vec, key, start+1);
}
}
This simple recursion is easy for good compilers to optimise, by the way: they will often remove the recursion for you and turn it into its iterative counterpart...
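For comparison, here is what that iterative counterpart looks like (my own sketch, not from the answer above):
#include <cstddef>
#include <vector>

int iterative_count(const std::vector<int>& vec, int key)
{
    int count = 0;                          // start from 0, as in the recursive base case
    for (std::size_t i = 0; i < vec.size(); ++i)
        if (vec[i] == key)
            ++count;                        // "+" instead of "&&"
    return count;
}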
Your recursive_count function always evaluates to a bool
You are either explicitly returning true
if (start == vec.size())
return true;
or returning the result of a boolean && expression
return (vec[start] == key? 23 : key) // this term gets evaluated
&& // the term above and below get 'anded', which returns true or false.
recursive_count(vec, key, (start+1)) // this term gets evaluated
It then gets cast to your return type ( int ), meaning you will only ever get 0 or 1 returned.
As per integral promotion rules on cppreference.com
The type bool can be converted to int with the value false becoming
​0​ and true becoming 1.
With,
if (start == vec.size())
return true;
your function with return type int returns 1
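A tiny standalone check of that promotion (my own snippet, not part of the original post):
#include <iostream>

int returns_bool() { return true; }        // bool converts to int 1 on return

int main()
{
    std::cout << returns_bool() << "\n";   // prints 1
    std::cout << (7 && 42) << "\n";        // operator&& yields bool, printed as 1
}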

How do I make my hash table with linear probing more efficient?

I'm trying to implement an efficient hash table where collisions are resolved using linear probing with a step. It has to be as efficient as possible: no needless = or == operations. My code is working, but not efficient enough. The efficiency is evaluated by an internal company system. It needs to be better.
There are two classes representing a key/value pair: CKey and CValue. These classes each have a standard constructor, copy constructor, and overridden operators = and ==. Both of them contain a getValue() method returning the value of an internal private variable. There is also the method getHashLPS() inside CKey, which returns the hashed position in the hash table.
int getHashLPS(int tableSize, int step, int collision) const
{
return ((value + (collision * step)) % tableSize);
}
Hash table.
class CTable
{
struct CItem {
CKey key;
CValue value;
};
CItem **table;
int valueCounter;
}
Methods
// return collisions count
int insert(const CKey& key, const CValue& val)
{
int position, collision = 0;
while(true)
{
position = key.getHashLPS(tableSize, step, collision); // get position
if(table[position] == NULL) // free space
{
table[position] = new CItem; // save item
table[position]->key = CKey(key);
table[position]->value = CValue(val);
valueCounter++;
break;
}
if(table[position]->key == key) // same keys => overwrite value
{
table[position]->value = val;
break;
}
collision++; // current position is full, try another
if(collision >= tableSize) // full table
return -1;
}
return collision;
}
// return collisions count
int remove(const CKey& key)
{
int position, collision = 0;
while(true)
{
position = key.getHashLPS(tableSize, step, collision);
if(table[position] == NULL) // free position - key isn't in table or is unreachable because of wrong rehashing
return -1;
if(table[position]->key == key) // found
{
table[position] = NULL; // remove it
valueCounter--;
int newPosition, collisionRehash = 0;
for(int i = 0; i < tableSize; i++, collisionRehash = 0) // rehash table
{
if(table[i] != NULL) // if there is a item, rehash it
{
while(true)
{
newPosition = table[i]->key.getHashLPS(tableSize, step, collisionRehash++);
if(newPosition == i) // same position like before
break;
if(table[newPosition] == NULL) // new position and there is a free space
{
table[newPosition] = table[i]; // copy from old, insert to new
table[i] = NULL; // remove from old
break;
}
}
}
}
break;
}
collision++; // there is some item on newPosition, let's count another
if(collision >= valueCounter) // item isn't in table
return -1;
}
return collision;
}
Both functions return the collision count (for my own purposes), and they return -1 when the searched CKey isn't in the table or the table is full.
Tombstones are forbidden. Rehashing after removing is a must.
The biggest change for improvement I see is in the removal function. You shouldn't need to rehash the entire table. You only need to rehash starting from the removal point until you reach an empty bucket. Also, when re-hashing, remove and store all of the items that need to be re-hashed before doing the re-hashing so that they don't get in the way when placing them back in.
Another thing: with all hash tables, the quickest way to increase efficiency is to decrease the load factor (the ratio of elements to backing-array size). This reduces the number of collisions, which means less iterating while looking for an open spot and less rehashing on removal. In the limit, as the load factor approaches 0, the collision probability approaches 0 and the table behaves more and more like an array. Though of course memory use goes up.
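If growing the backing array is allowed here, a minimal sketch of a load-factor-triggered resize could look like the following (grow() and GROW_AT are my own names, and it assumes the poster's members table, tableSize, step and valueCounter, plus a step that remains usable for the doubled size):
static const double GROW_AT = 0.5;            // keep the table at most half full

void CTable::grow()
{
    int oldSize = tableSize;
    CItem **oldTable = table;

    tableSize = oldSize * 2;                  // assumes step still works for this size
    table = new CItem*[tableSize]();          // value-initialised: all slots NULL
    valueCounter = 0;

    for (int i = 0; i < oldSize; i++)
        if (oldTable[i] != NULL)
        {
            insert(oldTable[i]->key, oldTable[i]->value);   // re-hash into the new array
            delete oldTable[i];
        }
    delete [] oldTable;
}

// called at the top of insert():
//     if (valueCounter + 1 > tableSize * GROW_AT)
//         grow();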
Update
You only need to rehash starting from the removal point and moving forward by your step size until you reach a null. The reason for this is that those are the only objects that could possibly change their location due to the removal. All other objects would wind up hashing to the exact same place, since they don't belong to the same "collision run".
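A sketch of what that narrower rehash could look like with the poster's members (table, tableSize, step, valueCounter) and CKey::getHashLPS(); each item in the run is lifted out before being re-probed so it cannot block itself, and unlike the original code the removed item is deleted rather than just NULLed:
// return collisions count, or -1 if the key isn't present
int remove(const CKey& key)
{
    int position, collision = 0;
    while (true)
    {
        position = key.getHashLPS(tableSize, step, collision);
        if (table[position] == NULL)               // hit a hole: key isn't in the table
            return -1;
        if (table[position]->key == key)           // found it
            break;
        if (++collision >= tableSize)              // wrapped all the way around
            return -1;
    }
    delete table[position];                        // free the removed item
    table[position] = NULL;
    valueCounter--;

    // Re-insert only the items in the same collision run: the slots reached by
    // continuing with the same step until the next NULL.
    int probe = (position + step) % tableSize;
    while (table[probe] != NULL)
    {
        CItem *item = table[probe];
        table[probe] = NULL;                       // lift it out first
        int c = 0, dest;
        do {
            dest = item->key.getHashLPS(tableSize, step, c++);
        } while (table[dest] != NULL);             // first free slot on its probe path
        table[dest] = item;                        // either the old hole or its own slot
        probe = (probe + step) % tableSize;
    }
    return collision;
}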
A possible improvement would be to pre-allocate an array of CItems; that would avoid the news and deletes (malloc()s and free()s), and you would need the table member to be changed to "CItem *table;".
But again: what you want is basically a smooth ride in a car with square wheels.

hashkey collision when removing C++

To search for each "symbol" I want to remove from my hashTable, I have chosen to regenerate the hash key it was inserted at. However, the problem I'm seeing in my remove function is that when I need to remove a symbol that was previously placed after a collision, my while loop condition tests false where I do not want it to.
bool hashmap::get(char const * const symbol, stock& s) const
{
int hash = this->hashStr( symbol );
while ( hashTable[hash].m_symbol != NULL )
{ // try to find a match for the stock associated with the symbol.
if ( strcmp( hashTable[hash].m_symbol , symbol ) == 0 )
{
s = &hashTable[hash];
return true;
}
++hash %= maxSize;
}
return false;
}
bool hashmap::put(const stock& s, int& usedIndex, int& hashIndex, int& symbolHash)
{
hashIndex = this->hashStr( s.m_symbol ); // Get remainder, Insert at that index.
symbolHash = (int&)s.m_symbol;
usedIndex = hashIndex;
while ( hashTable[hashIndex].m_symbol != NULL ) // collision found
{
++usedIndex %= maxSize; // if necessary wrap index around
if ( hashTable[usedIndex].m_symbol == NULL )
{
hashTable[usedIndex] = s;
return true;
}
else if ( strcmp( hashTable[usedIndex].m_symbol , s.m_symbol ) == 0 )
{
return false; // prevent duplicate entry
}
}
hashTable[hashIndex] = s; // insert if no collision
return true;
}
// What if I need to remove an index i generate?
bool hashmap::remove(char const * const symbol)
{
int hashVal = this->hashStr( symbol );
while ( hashTable[hashVal].m_symbol != NULL )
{
if ( strcmp( hashTable[hashVal].m_symbol, symbol ) == 0 )
{
stock temp = hashTable[hashVal]; // we can save it
hashTable[hashVal].m_symbol = NULL;
return true;
}
++hashVal %= maxSize; // wrap around if needed
} // go to the next cell, meaning there was a previous collision
return false;
}
int hashmap::hashStr(char const * const str)
{
size_t length = strlen( str );
int hash = 0;
for ( unsigned i = 0; i < length; i++ )
{
hash = 31 * hash + str[i];
}
return hash % maxSize;
}
What would I need to do to remove a "symbol" from my hashTable from a previous collision?
I am hoping the problem is not the Java-style hash equation directly above.
It looks like you are implementing a hash table with open addressing, is that right? Deleting is a little tricky in that scheme. See http://www.maths.lse.ac.uk/Courses/MA407/del-hash.pdf:
"Deletion of keys is problematic with open addressing: If there are two colliding keys x and y with h(x) = h(y), and key x is inserted before key y, and one wants to delete key x, this cannot simply be done by marking T[h(x)] as FREE, since then y would no longer be found. One possibility would be to mark T[h(x)] as DELETED (another special entry), which is skipped when searching for a key. A table place marked as DELETED may also be re-used for storing another key z that one wants to insert if one is sure that this key z is not already in the table (i.e., by reaching the end of the probe sequence for key z and not finding it). Such re-use complicates the insertion method. Moreover, places with DELETED keys fill the table."
What you need to do is create a dummy sentinel value that represents a "deleted" item. When you insert a new value into the table, you need to check to see if an element is NULL or "deleted". If a slot contains this sentinel "deleted" value or the slot is NULL, then the slot is a valid slot for insertion.
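A minimal sketch of that sentinel idea applied to the poster's hashmap (it assumes the members shown above: hashTable, maxSize, hashStr, and a stock with a char const* m_symbol; DELETED is my own dummy pointer that no real symbol will ever equal):
static char const * const DELETED = "<deleted>";   // sentinel for removed slots

bool hashmap::remove(char const * const symbol)
{
    int hash = this->hashStr(symbol);
    int probes = 0;
    // keep probing past DELETED slots; only a never-used (NULL) slot ends the search
    while (hashTable[hash].m_symbol != NULL && probes++ < maxSize)
    {
        if (hashTable[hash].m_symbol != DELETED &&
            strcmp(hashTable[hash].m_symbol, symbol) == 0)
        {
            hashTable[hash].m_symbol = DELETED;    // leave a tombstone, move nothing
            return true;
        }
        hash = (hash + 1) % maxSize;               // wrap around if needed
    }
    return false;
}
get() needs the same two adjustments: skip DELETED slots instead of strcmp-ing them, and keep probing when it meets one. put() may treat a DELETED slot as free, but it should keep probing to the end of the run first to make sure the key isn't already stored further along.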
That said, if you are writing this code for production, you should consider using the boost::unordered_map, instead of rolling your own hash map implementation. If this is for schoolwork,... well, good luck.