Inserting node in Hash table with open addressing [Optimizing the logic] - c++

I am trying to understand hash tables with open addressing.
I am reading the source code provided by GeeksforGeeks, and I have a few questions about it.
Below is the node-insertion function from GeeksforGeeks.
//Function to add key value pair
void insertNode(K key, V value)
{
HashNode<K,V> *temp = new HashNode<K,V>(key, value);
// Apply hash function to find index for given key
int hashIndex = hashCode(key);
//find next free space
while(arr[hashIndex] != NULL && arr[hashIndex]->key != key //// LINE 9 //////
&& arr[hashIndex]->key != -1)
{
hashIndex++;
hashIndex %= capacity;
}
//if new node to be inserted increase the current size
if(arr[hashIndex] == NULL || arr[hashIndex]->key == -1) //// LINE 17 //////
size++;
arr[hashIndex] = temp;
}
Questions
In line 9, why would you check three conditions, namely:
the slot in the hash table is not null ===> arr[hashIndex] != NULL
AND the slot holds a different key from the node being inserted ===> arr[hashIndex]->key != key
AND the slot's key is not -1, the marker for a slot whose node was deleted earlier ===> arr[hashIndex]->key != -1
If I were to optimize this code, I believe checking whether the slot is NULL would be enough.
In line 17, why would you increment the size property of the HashMap before assigning the node to the slot? ===> if(arr[hashIndex] == NULL || arr[hashIndex]->key == -1)
size++;
To me, this logic seems messy.
I would rather write arr[hashIndex] = temp; size++;
Assuming the GeeksforGeeks logic is sound, could you explain why insertion into a hash table with open addressing is implemented this way, specifically on the two points I have raised?

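An index is valid for insertion when any one of these three conditions holds:
The object at the index is NULL
OR the object is not NULL, but its key is the same as the one we're inserting
OR the object is not NULL, but its key is -1 (a deleted slot)
The loop condition is the negation of all three: while none of them holds, the index is not valid and the loop moves on to the next slot. Checking only for NULL would skip over reusable deleted slots and would store a second node for a key that is already present.
In line 17: size is incremented only if the insertion doesn't overwrite an existing key, so the node is new (which means either condition 1 or 3 applies). The check has to run before the assignment, because afterwards the slot already holds the new node and its previous state is lost.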
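The probing rule and the size bookkeeping described above can be sketched as follows. This is a minimal illustration, not the GeeksforGeeks code: Slot, ProbeTable and the explicit State enum (used here instead of a -1 key) are hypothetical names, keys are assumed to be non-negative ints, and the table is assumed never to be completely full.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical slot type: an explicit state replaces the NULL / -1 conventions.
struct Slot {
    enum State { Empty, Deleted, Used };
    State state = Empty;
    int key = 0;
    int value = 0;
};

struct ProbeTable {
    std::vector<Slot> slots;
    std::size_t size = 0;

    explicit ProbeTable(std::size_t capacity) : slots(capacity) {}

    // Assumes at least one slot is Empty, Deleted, or already holds this key.
    void insert(int key, int value) {
        std::size_t i = static_cast<std::size_t>(key) % slots.size();
        // Keep probing only while the slot holds a live entry with a different key.
        while (slots[i].state == Slot::Used && slots[i].key != key) {
            i = (i + 1) % slots.size();
        }
        // Decide whether this is a new entry BEFORE overwriting the slot.
        if (slots[i].state != Slot::Used) {
            ++size;
        }
        slots[i].state = Slot::Used;
        slots[i].key = key;
        slots[i].value = value;
    }
};
```

Inserting keys 1 and 5 into a table of capacity 4 puts them in slots 1 and 2; inserting key 1 again overwrites slot 1 and leaves size at 2.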

Related

BST search key in C++

I was trying to implement BST search method for finding a key in BST. Below is the code.
node* search_key(node **root,int key)
{
if (*root == NULL || (*root)->data == key ){
return (*root);
}
if ( key < (*root)->data ) {
search_key(&(*root)->left, key);
}
else{
search_key(&(*root)->right, key);
}
}
Above code was always returning null except for searching root node. I modified the code to the following and it is working perfectly.
Can anyone please explain the recursion involved here?
node* search_key(node **root,int key)
{
if (*root == NULL || (*root)->data == key ){
return (*root);
}
if ( key < (*root)->data ) {
return search_key(&(*root)->left, key); // note return
}
else{
return search_key(&(*root)->right, key);
}
}
In the first code snippet you have a function which is supposed to return something, but doesn't do that in some cases. That will lead to undefined behavior.
In the second snippet you actually return something in all paths of the function.
When a function calls itself or another function, the callee runs to completion and control then comes back to the caller. In the first snippet control does come back, but without a result, because the value produced by the recursive call is never returned.
Once every call returns the value it received from the call it made, the correct result reaches the original caller.
Follow this image of factorial function for better understanding
In a binary tree each node has two children, left and right, and each child is the root of its own subtree. The first condition says: if you have gone past the last node (*root == NULL) or you have found the key ((*root)->data matches key), return that node pointer. No recursive call is made here.
if (*root == NULL || (*root)->data == key ){
return (*root);
The second condition says that if the key is less than the value stored in (*root)->data, the search continues in the left child. So the line return search_key(&(*root)->left, key) means: repeat the same operation, starting from the left child.
if ( key < (*root)->data ) {
return search_key(&(*root)->left, key);
}
Otherwise the key is greater than the value stored in (*root)->data,
so the line return search_key(&(*root)->right, key) means: repeat the operation, starting from the right child.
else{
return search_key(&(*root)->right, key);
}
The recursion ends only when the key is found or when a NULL child is reached, which means the key is not in the tree. So remember, recursion means repeating the same operation on a smaller part of the problem.
When implementing recursion, make sure that every branch that makes a recursive call also has a "return" statement, so that the immediate caller receives the result and the final value propagates back to the original caller.
What happens if the return statement is omitted:
Without a return statement, the value computed by the recursive call is discarded, the immediate caller receives nothing, and using the function's result is undefined behaviour.
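To make the point about returning on every path concrete, here is a minimal sketch of the same search. It takes a plain pointer instead of the node** used in the question; the Node type and the name search are assumptions made for this illustration.

```cpp
// Hypothetical node type mirroring the one in the question.
struct Node {
    int data;
    Node* left;
    Node* right;
};

// Returns the node holding key, or nullptr if the key is not in the tree.
// Every path ends in a return, so the result of the deepest call
// travels back through each caller unchanged.
const Node* search(const Node* root, int key) {
    if (root == nullptr || root->data == key) {
        return root;
    }
    if (key < root->data) {
        return search(root->left, key);
    }
    return search(root->right, key);
}
```

Compiling with warnings enabled (for example -Wall with GCC or Clang) usually reports a missing return in a non-void function, which catches the bug in the first snippet early.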

Determining length for an array allocation

This is the snippet of code that I'm puzzled about. I'm checking for how long an incoming string is. I've appended * in order to have a sentinel value to stop the while loop. Yet, I'm consistently getting a length value that is inclusive of the * and I don't understand why, since the while loop with the nested if ought to stop prior to the *. Can someone point out what I'm doing wrong and why I'm having this issue?
void conversion(string romanIn)
{
length=0;
romanIn.append("*");
while(item!="*")
{
if(item != "*")
{
item = romanIn[length];
length++;
}
cout<<item;
}
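You are naturally going to get a +1: the check on item runs before item is read, so the if tests the previous character and length is still incremented on the iteration that reads the *. You also aren't initializing the variable "item" before the first pass through the loop. Make it a do-while loop instead of a while loop.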
Try this:
do
{
// This line moves out of the if statement
item = romanIn[length];
if(item != "*")
{
length++;
}
cout<<item;
}while(item!="*");
What is the initial value of item?
Let's assume it's 0. You enter the loop
item == 0 != marker, so you enter the if as well, and you say
item = romanIn[0], length++
If romanIn[0] == "*" you will exit the loop, but your length now says 1 which includes the marker
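The off-by-one goes away if the character is tested before the counter is incremented. A minimal sketch, assuming the input is a std::string and using a hypothetical helper name lengthBeforeSentinel:

```cpp
#include <cstddef>
#include <string>

// Counts the characters that precede the first '*'.
// If there is no '*', the whole string is counted.
std::size_t lengthBeforeSentinel(const std::string& text) {
    std::size_t length = 0;
    // Test the current character first, then advance.
    while (length < text.size() && text[length] != '*') {
        ++length;
    }
    return length;
}
```

If the only goal is the length of the incoming string, romanIn.size() already gives it and no sentinel is needed.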

Weird pointer issue...constructing BST using array of pointers

So I am trying to create a BST using an array of pointers. My algorithm is correct (tested a version that doesn't use pointers), but when using the below code, the following occurs:
If I add the first element, it is added to position 1 of the array.
If I add a second element, for some reason, position 1 of the array is overwritten to this element, and then the program continues (else part) and attempts to insert it again.
EG. (traced the program with a bunch of couts)
1. Call add(5, 1)
inserting 5 into position 1
position 1 is now 5
2. Call add(4, 1)
position 1 is now 4
moving right
inserting 4 into position 3
position 1 is now 4
...
template <typename Item> void ABTree<Item>::add(Item input, int index){
if (array[1]==0){
array[1] = &input;
size++;
}else{
if (input < *array[index]){
if (array[2*index] == 0){
array[2*index] = &input;
size++;
}else
add(input, 2*index);
}else{
if (array[(2*index)+1] == 0){
array[(2*index)+1] = &input;
size++;
}else
add(input, (2*index)+1);
}
}
There are quite a few problems in the code you've provided.
1. Storing the address of a by-value parameter is not admissible. The lifetime of the "input" variable ends as soon as the add function returns. Storing its address is therefore meaningless and leads to an almost inevitable crash. In particular, when you add the second element to your array, these condition checks:
if (array[1]==0)
if (input < *array[index])
yield undefined results, and therefore undefined control flow.
2. I don't know the type of the container you're using, but I assume it is a kind of vector or even a plain C array. If my guess is right, you should perform a bounds check before accessing the array's elements with the index.
3. No reallocation of the memory occupied by the array is performed. Or, alternatively, if you're using a constant-size array, no check is performed (see 2).
So, based on the output, you get the following sequence when adding the second element:
On the check that the head is empty - false;
On the check if (input < *array[index]) - false, since array[index] contains garbage; this is why you see "inserting 4 into position 3" even though 4 should go to position 2;
On the check if (array[(2*index)+1] == 0) - again false, since this element contains garbage too (am I right that you didn't initialize your array with zeros?);
So a recursive call to the add routine happens...
I altered the code to accept a pointer to the item to be inserted instead.
if (array[1]==0){
array[1] = input;
size++;
}else{
if (*input < *array[index]){
if (array[2*index] == 0){
array[2*index] = input;
size++;
}else
add(input, 2*index);
}else{
if (array[(2*index)+1] == 0){
array[(2*index)+1] = input;
size++;
}else
add(input, (2*index)+1);
}
}
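An alternative to passing pointers is to store copies of the values, which sidesteps the lifetime problem entirely. The sketch below is only an illustration under stated assumptions: ArrayTree is a hypothetical int-only class (the original is a template), occupancy is tracked in a separate vector, and the storage grows on demand so every index is bounds-checked.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical array-backed BST: index 0 is unused,
// the children of index i sit at 2*i and 2*i+1.
class ArrayTree {
public:
    void add(int value) {
        std::size_t index = 1;
        // Walk down until an unoccupied (or not yet allocated) position is found.
        while (index < used_.size() && used_[index]) {
            index = (value < values_[index]) ? 2 * index : 2 * index + 1;
        }
        // Grow the storage so the chosen index is always in bounds.
        if (index >= used_.size()) {
            used_.resize(index + 1, false);
            values_.resize(index + 1, 0);
        }
        values_[index] = value;  // store a copy, not the address of a parameter
        used_[index] = true;
        ++size_;
    }

    bool occupied(std::size_t index) const {
        return index < used_.size() && used_[index];
    }
    int at(std::size_t index) const { return values_[index]; }
    std::size_t size() const { return size_; }

private:
    std::vector<int> values_;
    std::vector<bool> used_;
    std::size_t size_ = 0;
};
```

Note that this layout grows exponentially for degenerate trees (for example, values inserted in sorted order), so it is only practical when the tree stays reasonably balanced.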

c++ program crashes linked list?

if(tmpPtr->number<tmpPtr->next_number->number)
{
int tmpV1=tmpPtr->next_number->number;
int tmpV2=tmpPtr->number;
tmpPtr->next_number->number=tmpV2;
tmpV2=tmpPtr->number=tmpV1;
}
This is what I have tried so far. It is supposed to keep the linked list sorted as members are added. But the program crashes when I try to add the second node. The debugger breaks at the if statement, if(tmpPtr->number<tmpPtr->next_number->number). I tried really hard to figure out what the problem was, but couldn't.
Your problem is that on the second run tmpPtr points to your first element, which has a next_number value of NULL. So as soon as you evaluate tmpPtr->next_number->number, you are dereferencing a NULL pointer, which leads to a SIGSEGV.
after the first run
n->number = input
n->next_number = NULL
h = n
t = n
counter2 = 1
so starting with the second input
n->number
n->next_number = NULL
tmpPtr = h // which is the previous n and therefore h->next_number = NULL
tmpPtr->next_number == NULL // this is your problem since you do not check if next_number is a valid pointer
UPDATE:
I've uploaded a (hackish) version of a solution at https://gist.github.com/sahne/c36e835e7c7dbb855076
For the second add, h->next_number is NULL, so on the first iteration of the inner while loop you dereference NULL (via h->next_number->number).
Edit
When you're inserting the 2nd item:
head == tail, so head->next == NULL.
you start the inner loop:
head->number == first inserted item.
head->next == NULL.
head->next->number == dereferenced NULL.
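One way to avoid the NULL dereference is to test next_number before reading through it. The sketch below is not the asker's code: ListNode and insertSorted are hypothetical names, and ascending order is an assumption chosen for the illustration.

```cpp
// Hypothetical node type using the field names from the question.
struct ListNode {
    int number;
    ListNode* next_number;
};

// Inserts value into an ascending list and returns the (possibly new) head.
ListNode* insertSorted(ListNode* head, int value) {
    ListNode* node = new ListNode{value, nullptr};
    // Empty list, or the new value belongs in front of the current head.
    if (head == nullptr || value < head->number) {
        node->next_number = head;
        return node;
    }
    ListNode* current = head;
    // Check next_number for NULL BEFORE reading next_number->number.
    while (current->next_number != nullptr &&
           current->next_number->number <= value) {
        current = current->next_number;
    }
    node->next_number = current->next_number;
    current->next_number = node;
    return head;
}
```

Because && short-circuits, the right-hand comparison is never evaluated when next_number is NULL, which is exactly the case that crashes on the second insertion.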

Pointer comparison issue

I'm having a problem with a pointer and can't get around it.
In a hash table implementation, I have a list of ordered nodes in each bucket. The problem is in the insert function, in the comparison that checks whether the next node is greater than the value being inserted (so that the new node goes in at that position) in order to keep the order.
You might find this hash implementation strange, but I need to be able to do tons of lookups (though sometimes very few) and count the number of repetitions when a value is already inserted. So I need fast lookups, hence the hash table. I thought about self-balancing trees such as AVL or red-black trees, but I don't know them, so I went with the solution I knew how to implement. Are they faster for this type of problem? I also need to retrieve the values in order when I've finished.
Previously I had a simple list: I'd retrieve the array and then run a quicksort, but I think I can improve things by keeping the lists ordered.
What I have to map is a 27-bit unsigned int (more exactly, three 9-bit numbers that I combine into a 27-bit number with (Sr << 18 | Sg << 9 | Sb)), and that value also serves as the hash_value. If you know a good function to map that 27-bit int to a 12-, 13- or 14-bit table, let me know; I currently just use the typical mod-prime solution.
This is my hash_node struct:
class hash_node {
public:
unsigned int hash_value;
int repetitions;
hash_node *next;
hash_node( unsigned int hash_val,
hash_node *nxt);
~hash_node();
};
And this is the source of the problem
void hash_table::insert(unsigned int hash_value) {
unsigned int p = hash_value % tableSize;
if (table[p]!=0) { //The bucket has some elements already
hash_node *pred; //node to keep the last valid position on the list
for (hash_node *aux=table[p]; aux!=0; aux=aux->next) {
pred = aux; //last valid position
if (aux->hash_value == hash_value ) {
//It's already inserted, so we increment it repetition counter
aux->repetitions++;
} else if (hash_value < (aux->next->hash_value) ) { //The problem
//If the next one is greater than the one to insert, we
//create a node in the middle of both.
aux->next = new hash_node(hash_value,aux->next);
colisions++;
numElem++;
}
}//We have arrived at the end of the list without luck, so we insert it after
//the last valid position
pred->next = new hash_node(hash_value,0);
colisions++;
numElem++;
}else { //bucket is empty, insert it right away.
table[p] = new hash_node(hash_value, 0);
numElem++;
}
}
This is what gdb shows:
Program received signal SIGSEGV, Segmentation fault.
0x08050b4b in hash_table::insert (this=0x806a310, hash_value=3163181) at ht.cc:132
132 } else if (hash_value < (aux->next->hash_value) ) {
Which effectively indicates I'm comparing a memory address with a value, right?
I hope that was clear. Thanks again!
aux->next->hash_value
There's no check whether "next" is NULL.
aux->next might be NULL at that point? I can't see where you have checked whether aux->next is NULL.
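A sketch of an ordered bucket insertion that never reads through a NULL next pointer is shown below. It is an illustration under assumptions, not a drop-in fix: BucketNode and insertIntoBucket are hypothetical names, a new node is assumed to start with repetitions equal to 1, and the collision counters from the question are left out.

```cpp
// Hypothetical plain-struct version of the hash_node from the question.
struct BucketNode {
    unsigned int hash_value;
    int repetitions;
    BucketNode* next;
};

// Inserts hash_value into one ascending bucket list.
// Returns true if a new node was created, false if an existing
// node's repetition counter was incremented instead.
bool insertIntoBucket(BucketNode*& head, unsigned int hash_value) {
    // link points at the pointer that may need rewriting: first the bucket
    // head, then the next field of each node that is passed.
    BucketNode** link = &head;
    while (*link != nullptr && (*link)->hash_value < hash_value) {
        link = &(*link)->next;
    }
    if (*link != nullptr && (*link)->hash_value == hash_value) {
        ++(*link)->repetitions;
        return false;
    }
    BucketNode* rest = *link;  // may be nullptr when appending at the end
    *link = new BucketNode{hash_value, 1, rest};
    return true;
}
```

Using a pointer to the link removes the special cases for an empty bucket, insertion at the front, and insertion at the end, and every dereference is guarded by a NULL check.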