Hashing with linear probing in C++

Can anyone explain what the DeletedNode class is doing, and what the purpose of DeletedNode() : HashNode(-1, -1) {} is?
Please also explain the pointer concept here:
HashNode ** htable = new HashNode*[Tablesize];
const int Tablesize=10;
class HashNode {
public:
int key;
int value;
HashNode(int key, int value) {
this->key=key;
this->value=value;
}
};
class DeletedNode : public HashNode {
private:
static DeletedNode * entry;
DeletedNode() : HashNode(-1, -1) {} // Please explain this
public:
static DeletedNode * getNode() {
if (entry == NULL)
entry = new DeletedNode();
return entry;
}
};
DeletedNode * DeletedNode::entry = NULL; //Why this?
class HashMap {
public:
HashNode ** htable;
HashMap() {
htable = new HashNode*[Tablesize]; // Please explain the pointer concept here
for (int i = 0; i < Tablesize; i++)
htable[i] = NULL;
}
int HashFunc(int key) { return key % Tablesize; }
};

Consider what happens when you delete an entry from the hash table which is part of a "collision cluster", a contiguous block of elements that happen to have the same hash value.
Let's say elements A, B, and C all hash to the same value h. In this case, they will be inserted into the table at positions h, h + 1 and h + 2, respectively:
--------
A h
--------
B h + 1
--------
C h + 2
--------
Now what happens if you delete B? If we do the deletion naively, then there will be a hole between A and C:
--------
A h
--------
h + 1
--------
C h + 2
--------
Now if you try to look up C in the hash table, its hash value will be h, so the search for it will begin at position h. However, the next entry at position h + 1 is now empty, hence the linear probing search will terminate prematurely, and you will get the wrong result that C isn't in the table.
In order to prevent the premature termination of the search, a special "dummy" node needs to be inserted in the empty place, which says "there was something here some day which has now been deleted, but I'm part of a collision cluster anyway, so keep searching".
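The idea above can be sketched as follows. This is a minimal illustration, not the code from the question: the names Slot, Cell and ProbingMap are made up for the example, keys are assumed to be non-negative, and a per-cell state flag plays the role that the shared DeletedNode object plays in the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical slot states for this sketch (not the asker's classes).
enum class Slot { Empty, Deleted, Full };

struct Cell {
    Slot state = Slot::Empty;
    int key = 0;
    int value = 0;
};

class ProbingMap {
public:
    explicit ProbingMap(std::size_t capacity) : cells_(capacity) {
        assert(capacity > 0);
    }

    // Inserts or updates a key; returns false if no free cell is left.
    bool put(int key, int value) {
        const std::size_t none = cells_.size();
        std::size_t firstFree = none;  // first Empty/Deleted cell seen
        std::size_t i = home(key);
        for (std::size_t step = 0; step < cells_.size(); ++step) {
            Cell& c = cells_[i];
            if (c.state == Slot::Full && c.key == key) {
                c.value = value;
                return true;
            }
            if (c.state != Slot::Full && firstFree == none) firstFree = i;
            if (c.state == Slot::Empty) break;  // key cannot be further along
            i = (i + 1) % cells_.size();
        }
        if (firstFree == none) return false;
        cells_[firstFree].state = Slot::Full;
        cells_[firstFree].key = key;
        cells_[firstFree].value = value;
        return true;
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const int* get(int key) const {
        std::size_t i = home(key);
        for (std::size_t step = 0; step < cells_.size(); ++step) {
            const Cell& c = cells_[i];
            if (c.state == Slot::Empty) return nullptr;  // real end of cluster
            if (c.state == Slot::Full && c.key == key) return &c.value;
            i = (i + 1) % cells_.size();  // Deleted or other key: keep probing
        }
        return nullptr;
    }

    // Marks the cell as Deleted instead of Empty so later lookups keep probing.
    bool erase(int key) {
        std::size_t i = home(key);
        for (std::size_t step = 0; step < cells_.size(); ++step) {
            Cell& c = cells_[i];
            if (c.state == Slot::Empty) return false;
            if (c.state == Slot::Full && c.key == key) {
                c.state = Slot::Deleted;
                return true;
            }
            i = (i + 1) % cells_.size();
        }
        return false;
    }

private:
    std::size_t home(int key) const {
        assert(key >= 0);  // assumption of this sketch
        return static_cast<std::size_t>(key) % cells_.size();
    }

    std::vector<Cell> cells_;
};
```

With a table of size 10, keys 3, 13 and 23 land in cells 3, 4 and 5. After erase(13), get(23) still succeeds because cell 4 is marked Deleted rather than Empty.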

C++ permutation tree

I have tasks and I want to calculate the most profitable order to arrange them.
Instead of checking every permutation and doing n*n! calculations, I want to build a tree of permutations, that is, the number of children at each level decreases by 1, and at each node the sub-permutation that has already been calculated will be saved and not recalculated.
For example, if I have 4 tasks, the tree will look like this:
My code below is incomplete. I don't know how to build the tree and give the nodes the indexes as in the figure. I know how to deal with a binary tree, but not with a tree where the number of children is different at each level.
(The value of each task depends on its location.
I know how to do that, so I didn't include it in the question).
int n = 4;
struct node
{
int task_index = -1;
double value;
struct node **next;
};
void build_tree(node *current_node, int current_level = 0)
{
if (current_level < 1 || current_level >= n)
return;
// current_node->task_index = ? ;
current_node->next = new node *[n - current_level];
for (int i = 0; i < n - current_level; i++)
{
build_tree(current_node->next[i], current_level + 1);
}
}
void print_tree(node *current_node, int current_level = 0)
{
// print indexes
}
void delete_tree(node *current_node, int current_level = 0)
{
// delete nodes
}
int main()
{
struct node *root = new node;
build_tree(root);
print_tree(root);
delete_tree(root);
delete root;
return 0;
}
void build_tree(node *current_node, int current_level = 0)
{
if (current_level < 1 || current_level >= n)
return;
// current_node->task_index = ? ;
current_node->next = new node *[n - current_level];
for (int i = 0; i < n - current_level; i++)
{
build_tree(current_node->next[i], current_level + 1);
}
}
When called with the default parameter of current_level = 0, as you illustrate in your code below, this function exits on the first line without doing anything. You need to decide whether you are indexing starting from 0 or from 1.
Other than that, the general outline of the algorithm looks okay, although I did not explicitly check for correctness.
Now, more broadly: is this an exercise to see if you can write a tree structure, or are you trying to get the job done? In the latter case you probably want to use a prebuilt data structure like that in the boost graph library.
If it's an exercise in building a tree structure, is it specifically an exercise to see if you can write code dealing with raw pointers-to-pointers? If not, you should work with the correct C++ containers for the job. For instance you probably want to store the list of child nodes in a std::vector rather than have a pointer-to-pointer with the only way to tell how many child nodes exist being the depth of the node in the tree. (There may be some use case for such an extremely specialized structure if you are hyper-optimizing something for a very specific reason, but it doesn't look like that's what's going on here.)
From your explanation what you are trying to build is a data structure that reuses sub-trees for common permutations:
012 -> X
210 -> X
such that X is only instantiated once. This, of course, is recursive, seeing as
01 -> Y
10 -> Y
Y2 -> X
If you look at it closely, there are 2^n such subtrees, because any prefix can have any one of the n input tasks used or not. This means you can represent the subtree as an index into an array of size 2^n, with a total footprint O(n*2^n), which improves on the vastly larger >n! tree:
struct Edge {
std::size_t task;
std::size_t sub;
};
struct Node {
std::vector<Edge> successor; // size in [0,n]
};
std::vector<Node> permutations; // size exactly 2^n
This will have this structure:
permutations: 0 1 2 3 4 ...
|-^
|---^
|-------^
|---^
|-^
Where the node at, e.g., location 3 has both task 0 and 1 already used and "points" to all (n-2) subtrees.
Of course, building this is not entirely trivial, but it compresses the search space and allows you to reuse results for specific sub-trees.
You can build the table like this:
permutations.resize(1<<n);
for (std::size_t i = 0; i < size(permutations); ++i) {
permutations[i].successor.reserve(n); // maybe better heuristic?
for (std::size_t j = 0; j < n; ++j) {
if (((1<<j) & i) == 0) {
permutations[i].successor.push_back({j,(1<<j)|i});
}
}
}
Here is a live demo for n=4.
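To show how the 2^n table lets results be reused, here is a rough sketch of a dynamic program over the same bitmask indices. It assumes a hypothetical table value[t][p] giving the profit of running task t at position p (the question says the value depends on position but does not show it), and the function name best_total_value is made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// value[t][p] is the (hypothetical) profit of running task t at position p.
// Returns the best total over all orderings, visiting each subset once.
double best_total_value(const std::vector<std::vector<double>>& value) {
    const std::size_t n = value.size();
    assert(n < 8 * sizeof(std::size_t));  // the mask must fit in a size_t
    const std::size_t full = std::size_t{1} << n;

    // best[mask] = best value of placing the tasks in mask
    // into the first popcount(mask) positions.
    std::vector<double> best(full, -std::numeric_limits<double>::infinity());
    best[0] = 0.0;

    for (std::size_t mask = 0; mask < full; ++mask) {
        // The number of tasks already used is the next free position.
        std::size_t pos = 0;
        for (std::size_t m = mask; m != 0; m &= m - 1) ++pos;

        for (std::size_t j = 0; j < n; ++j) {
            const std::size_t bit = std::size_t{1} << j;
            if (mask & bit) continue;  // task j already used
            best[mask | bit] =
                std::max(best[mask | bit], best[mask] + value[j][pos]);
        }
    }
    return best[full - 1];
}
```

Because mask | bit is always larger than mask, a single pass in increasing order is enough. This only returns the best value; recovering the actual order would need an extra array recording which task produced each best[mask].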
The recursive way to generate permutations is if you have n items then all of the permutations of the items are each of the n items concatenated with the permutations of the n-1 remaining items. In code this is easier to do if you pass around the collection of items.
Below I do it with an std::vector<int>. Once using a vector it makes more sense to just follow the "rule of zero" pattern and let the nodes have vectors of children and then not need to dynamically allocate anything manually:
#include <vector>
#include <algorithm>
#include <iostream>
struct node
{
int task_index = -1;
double value;
std::vector<node> next;
};
std::vector<int> remove_item(int item, const std::vector<int>& items) {
std::vector<int> output(items.size() - 1);
std::copy_if(items.begin(), items.end(), output.begin(),
[item](auto v) {return v != item; }
);
return output;
}
void build_tree(node& current_node, const std::vector<int>& tasks)
{
auto n = static_cast<int>(tasks.size());
for (auto curr_task : tasks) {
node child{ curr_task, 0.0, {} };
if (n > 1) {
build_tree(child, remove_item(curr_task, tasks));
}
current_node.next.emplace_back(std::move(child));
}
}
void print_tree(const node& current_node)
{
std::cout << "( " << current_node.task_index << " ";
for (const auto& child : current_node.next) {
print_tree(child);
}
std::cout << " )";
}
int main()
{
node root{ -1, 0.0, {} };
build_tree(root, { 1, 2, 3 });
print_tree(root);
return 0;
}

Getting a floating point exception error while doing text frequency analysis?

So for a school project, we are being asked to do a word frequency analysis of a text file using dictionaries and bucket hashing. The output should be something like this:
$ ./stats < jabberwocky.txt
READING text from STDIN. Hit ctrl-d when done entering text.
DONE.
HERE are the word statistics of that text:
There are 94 distinct words used in that text.
The top 10 ranked words (with their frequencies) are:
1. the:19, 2. and:14, 3. !:11, 4. he:7, 5. in:6, 6. .:5, 7.
through:3, 8. my:3, 9. jabberwock:3, 10. went:2
Among its 94 words, 57 of them appear exactly once.
Most of the code has been written for us, but there are four functions we need to complete to get this working:
increment(dict D, std::string w), which increments the count of a word or adds a new entry to the dictionary if it isn't there,
getCount(dict D, std::string w), which fetches the count of a word or returns 0,
dumpAndDestroy(dict D), which dumps the words and their counts into a new array in decreasing order of count, deletes D's buckets off the heap, and returns the pointer to that array,
rehash(dict D, std::string w), which rehashes the table when needed.
The structs used are here for reference:
// entry
//
// A linked list node for word/count entries in the dictionary.
//
struct entry {
std::string word; // The word that serves as the key for this entry.
int count; // The integer count associated with that word.
struct entry* next;
};
// bucket
//
// A bucket serving as the collection of entries that map to a
// certain location within a bucket hash table.
//
struct bucket {
entry* first; // It's just a pointer to the first entry in the
// bucket list.
};
// dict
//
// The unordered dictionary of word/count entries, organized as a
// bucket hash table.
//
struct dict {
bucket* buckets; // An array of buckets, indexed by the hash function.
int numIncrements; // Total count over all entries. Number of `increment` calls.
int numBuckets; // The array is indexed from 0 to numBuckets.
int numEntries; // The total number of entries in the whole
// dictionary, distributed amongst its buckets.
int loadFactor; // The threshold maximum average size of the
// buckets. When numEntries/numBuckets exceeds
// this loadFactor, the table gets rehashed.
};
I've written these functions, but when I try to run it with a text file, I get a Floating point exception error. I've emailed my professor for help, but he hasn't replied. This project is due very soon, so help would be much appreciated! My written functions for these are as below:
int getCount(dict* D, std::string w) {
int stringCount;
int countHash = hashValue(w, numKeys(D));
bucket correctList = D->buckets[countHash];
entry* current = correctList.first;
while (current != nullptr && current->word < w) {
if (current->word == w) {
stringCount = current->count;
}
current = current->next;
}
std::cout << "getCount working" << std::endl;
return stringCount;
}
void rehash(dict* D) {
// UNIMPLEMENTED
int newSize = (D->numBuckets * 2) + 1;
bucket** newArray = new bucket*[newSize];
for (int i = 0; i < D->numBuckets; i++) {
entry *n = D->buckets->first;
while (n != nullptr) {
entry *tmp = n;
n = n->next;
int newHashValue = hashValue(tmp->word, newSize);
newArray[newHashValue]->first = tmp;
}
}
delete [] D->buckets;
D->buckets = *newArray;
std::cout << "rehash working" << std::endl;
return;
void increment(dict* D, std::string w) {
// UNIMPLEMENTED
int incrementHash = hashValue(w, numKeys(D));
entry* current = D->buckets[incrementHash].first;
if (current == nullptr) {
int originalLF = D->loadFactor;
if ((D->numEntries + 1)/(D->numBuckets) > originalLF) {
rehash(D);
int incrementHash = hashValue(w, numKeys(D));
}
D->buckets[incrementHash].first->word = w;
D->buckets[incrementHash].first->count++;
}
while (current != nullptr && current->word < w) {
entry* follow = current;
current = current->next;
if (current->word == w) {
current->count++;
}
}
std::cout << "increment working" << std::endl;
D->numIncrements++;
}
entry* dumpAndDestroy(dict* D) {
// UNIMPLEMENTED
entry* es = new entry[D->numEntries];
for (int i = 0; i < D->numEntries; i++) {
es[i].word = "foo";
es[i].count = 0;
}
for (int j = 0; j < D->numBuckets; j++) {
entry* current = D->buckets[j].first;
while (current != nullptr) {
es[j].word = current->word;
es[j].count = current->count;
current = current->next;
}
}
delete [] D->buckets;
std::cout << "dumpAndDestroy working" << std::endl;
return es;
A floating-point exception is usually caused by the code attempting to divide-by-zero (or attempting to modulo-by-zero, which implicitly causes a divide-by-zero). With that in mind, I suspect this line is the locus of your problem:
if ((D->numEntries + 1)/(D->numBuckets) > originalLF) {
Note that if D->numBuckets is equal to zero, this line will do a divide-by-zero. I suggest temporarily inserting a line like
std::cout << "about to divide by " << D->numBuckets << std::endl;
just before that line, and then re-running your program; that will make the problem apparent, assuming it is the problem. The solution, of course, is to make sure your code doesn't divide by zero (i.e. by setting D->numBuckets to the appropriate value, or alternatively by checking whether it is zero before trying to use it as a divisor).
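One way to sidestep the division entirely is to compare with a multiplication instead. The helper below is only a sketch: the name needsRehash is made up, and it compares without integer truncation, so it can trigger slightly earlier than the (numEntries + 1)/numBuckets form in the question.

```cpp
#include <cassert>

// Returns true when adding one more entry would push the average bucket
// size above loadFactor. Uses a multiplication, so no division (and
// therefore no divide-by-zero) can occur.
bool needsRehash(int numEntries, int numBuckets, int loadFactor) {
    assert(numEntries >= 0 && numBuckets >= 0 && loadFactor >= 0);
    if (numBuckets == 0) return true;  // an empty table must grow before use
    return static_cast<long long>(numEntries) + 1 >
           static_cast<long long>(loadFactor) * numBuckets;
}
```

In increment this would be called as needsRehash(D->numEntries, D->numBuckets, D->loadFactor). If hashValue takes its result modulo the table size, it also has to be guarded against a size of zero.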

Solving leaky memory and syntax issues in a simple hash table

I'm implementing a basic hashtable. My logic for the table makes sense (at least to me), but I'm a bit rusty with my C++. My program returns a free memory error when I run it, but I can't seem to figure out where my problem is. I think it has to do with how I call the pointers in the various class functions.
#include <iostream>
#include <unordered_map>
#include <string>
#include <cmath>
#include <exception>
using namespace std;
int hashU(string in/*, int M*/){ //The hash function that utilizes a small pseudorandom number
char *v = new char[in.size() + 1]; //generator to return a number between 0 and 50. (I arbitrarily chose 50 as the upper limit)
copy(in.begin(), in.end(), v); //First the input string is turned into a char* for use in the function.
v[in.size()] = '\0';
int h, a = 31415, b = 27183;
for(h=0;*v!=0;v++,a=a*b%(49-1))
h = (a*h + *v)%50;
delete[] v; //Delete the char* to prevent leaky memory.
return (h<0) ? (h+50) : h; //Return number
}
struct hashNode{ //The node that will store the key and the values
string key;
float val;
struct hashNode *next;
};
struct hashLink{ //The linked list that will store additional keys and values should there be a collision.
public:
struct hashNode *start; //Start pointer
struct hashNode *tail; //Tail pointer
hashLink(){ //hashLink constructor
start=NULL;
tail=NULL;
}
void push(string key, float val); //Function to push values to stack. Used if there is a collision.
};
void hashLink::push(string key, float val){
struct hashNode *ptr;
ptr = new hashNode;
ptr->key = key;
ptr->val = val;
ptr->next = NULL;
if(start != NULL){
ptr->next = tail;
}
tail = ptr;
return;
}
struct hashTable{ //The "hash table." Creates an array of Linked Lists that are indexed by the values returned by the hash function.
public:
hashLink hash[50];
hashTable(){ //Constructor
}
void emplace(string in, float val); //Function to insert a new key and value into the table.
float fetch(string in); //Function to retrieve a stored key.
};
void hashTable::emplace(string in, float val){
int i = hashU(in); //Retrieve index of key from hash function.
hashNode *trav; //Create node traveler
trav = hash[i].start; //Set the traveler to the start of the desired linked list
while(trav!=hash[i].tail){ //Traverse the list searching to see if the input key already exists
if(trav->key.compare(in)==0){ //If the input key already exists, its associated value is updated, and the function returns.
trav->val = val;
return;
}
else //Travler moves to next node if the input key in not found.
trav = trav->next;
}
hash[i].push(in,val); //If the traveler does not see the input key, the request key must not exist and must be created by pushing the input key and associated value to the stack.
return;
}
float hashTable::fetch(string in){
int i = hashU(in); //Retrieve index of key
hashNode *trav; //Create node traveler and set it to the start of the appropriate list.
trav = hash[i].start;
while(trav!=hash[i].tail){ //Traverse the linked list searching for the requested key.
if(trav->key.compare(in)==0){ //If the the requested key is found, return the associated value.
return trav->val;
}
else
trav = trav->next; //If not found in the current node, move to the next.
}
return false; //If the requested key is not found, return false.
}
int main(){
hashTable vars; //initialize the hash table
float num = 5.23; //create test variable
vars.emplace("KILO",num);
cout<<vars.fetch("KILO")<<endl;
return 0;
}
The problem is that when you call delete[] v, you have advanced v such that it is pointing to the 0 at the end of the string, which is the wrong address to delete.
Also, you're wasting a lot of code unnecessarily copying the string out of where it is already available as a c-string.
unsigned int hashU(string in/*, int M*/) {
const char* v = in.c_str();
unsigned int h, a = 31415, b = 27183;
for(h=0;*v!=0;v++,a=a*b%(49-1))
h = (a*h + *v);
return h % 50;
}
for(h=0;*v!=0;v++,a=a*b%(49-1))
h = (a*h + *v)%50;
delete[] v; //Delete the char* to prevent leaky
You are incrementing v, then deleting an invalid memory location.
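Separately from the delete[] problem, as far as I can tell the posted push never assigns start and links new nodes backwards (ptr->next = tail). After the first push, start is still NULL while tail is not, so the while(trav != hash[i].tail) loop in fetch dereferences a null trav. Below is a minimal sketch of a chain that appends at the tail and is walked until NULL; the names Chain and ChainNode are made up for the example.

```cpp
#include <cassert>
#include <string>

struct ChainNode {
    std::string key;
    float val;
    ChainNode* next;
};

struct Chain {
    ChainNode* start = nullptr;
    ChainNode* tail = nullptr;

    Chain() = default;
    Chain(const Chain&) = delete;             // the chain owns its nodes
    Chain& operator=(const Chain&) = delete;

    ~Chain() {
        while (start != nullptr) {
            ChainNode* next = start->next;
            delete start;
            start = next;
        }
    }

    // Appends a new node at the end of the list.
    void push(const std::string& key, float val) {
        ChainNode* node = new ChainNode{key, val, nullptr};
        if (start == nullptr) {
            start = node;            // first node is both start and tail
        } else {
            assert(tail != nullptr);
            tail->next = node;       // link forwards from the old tail
        }
        tail = node;
    }

    // Walks the list until the end; returns nullptr if the key is absent.
    ChainNode* find(const std::string& key) const {
        for (ChainNode* p = start; p != nullptr; p = p->next) {
            if (p->key == key) return p;
        }
        return nullptr;
    }
};
```

emplace and fetch could then both call find and either update or return the value, instead of comparing the traveler against tail.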

How do i use doubly(**) pointer in C++ for general tree data structure?

I have a structure s :
struct s{
int x;
/********************************************************************
* NoOfchild doesn't represent maximum no of children for node s .
* This represent no of children node s have at any given instance .
*********************************************************************/
int NoOfChild;
s **child;
}
I would like to use ** to declare an array of pointers dynamically. Nodes of type s are added to the array one by one. Is there any way to achieve this? This tree is going to be used for the FP-Growth algorithm.
*(0)
|
_____________________________________________________________
| | |
[* (1) *(2) *(3)]
| | |
_______________ _________________ __________________________
| | | | | | | | | | | | |
[* * * *] [* * *] [* * * * * *]
* represents a node s. I don't want to allocate all children of a node at the same time; I would like to add child nodes one by one whenever required. For example, 0 is added as the root, then node 1 is added as a child of the root, then node 2 if required, and so on. [* * * *] represents the children of a node x.
Edit:
People are assuming NoOfChild is the maximum number of children for a given node. That's not true: here NoOfChild represents how many children a node has at a given instant, and it may vary over time.
Explanation:
Initially node 0 is initialized, so it has zero (0) children.
Then node 1 is added as a child of node 0, so 0->NoOfChild = 1 and 1->NoOfChild = 0;
then node [*] is added as a child of node 1, so 0->NoOfChild = 1 and 1->NoOfChild = 1;
then 2 is added as a child of node 0, so 0->NoOfChild = 2 and 1->NoOfChild = 1;
and so on.
Edit:
Finally used vector<s*> child .
For general tree data structure you can use :-
struct tree{
int element;
struct tree *firstchild;
struct tree *nextsibling;
};
element contains the data stored at the node.
firstchild points to the first child of the node.
nextsibling points to the next child of the same parent node.
Example :-
A
B C D
EF G H
then
A->firstchild = B;
B->nextsibling=C;
C->nextsibling=D;
B->firstchild=E;
E->nextsibling=F;
C->firstchild=G;
D->firstchild=H;
Any pointers not specified above can be taken as NULL.
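Adding children one by one with this representation might look like the sketch below. The helper names make_node, add_child, child_count and destroy are made up for the example; the struct is the one from above.

```cpp
#include <cassert>
#include <cstddef>

struct tree {
    int element;
    tree* firstchild;
    tree* nextsibling;
};

tree* make_node(int element) {
    return new tree{element, nullptr, nullptr};
}

// Appends child as the last child of parent and returns it.
tree* add_child(tree* parent, tree* child) {
    assert(parent != nullptr && child != nullptr);
    if (parent->firstchild == nullptr) {
        parent->firstchild = child;
        return child;
    }
    tree* last = parent->firstchild;
    while (last->nextsibling != nullptr) last = last->nextsibling;
    last->nextsibling = child;
    return child;
}

// Counts the children by walking the sibling list; no counter is stored.
std::size_t child_count(const tree* parent) {
    std::size_t count = 0;
    for (const tree* c = parent->firstchild; c != nullptr; c = c->nextsibling) {
        ++count;
    }
    return count;
}

// Frees a node, its siblings and all of their descendants.
void destroy(tree* node) {
    while (node != nullptr) {
        tree* next = node->nextsibling;
        destroy(node->firstchild);
        delete node;
        node = next;
    }
}
```

Appending is linear in the number of existing children. If that matters, a lastchild pointer could be kept per node, or new children could be pushed at the front instead.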
The first answer is: you don't. Use a container class like everyone in the comments indicated.
The second answer is: dynamic allocation:
void addNewChild(s *into, s* new_child)
{
s **tmp_s = new s*[into->NoOfChild+1]; ///< will throw if allocation fails
for(int i=0; i<into->NoOfChild; i++) tmp_s[i] = into->child[i]; ///< use a memcpy instead
tmp_s[into->NoOfChild++] = new_child;
s **del_s = into->child;
into->child = tmp_s;
delete[] del_s;
}
And finally: don't do this. Use std::vector<s> or std::vector<s*> depending on how many parents a child can have.
A plain c version:
#include <stdlib.h> /* malloc */
struct node
{
int key;
int noOfChild;
struct node** childrenArray;
};
struct node* newNode(int key, int noOfChild)
{
int i;
struct node* node = (struct node*) malloc(sizeof(struct node));
node->key = key;
node->noOfChild = noOfChild;
node->childrenArray = (struct node**) malloc(noOfChild * sizeof(struct node*));
for(i=0;i<noOfChild;i++)
{
node->childrenArray[i] = NULL;
}
return(node);
}
Since you tagged c++:
#include <vector>
#include <memory>
#include <algorithm>
#include <iostream>
template <class ValueType>
class VariadicTree
{
public:
VariadicTree(const ValueType& value) : m_value(value), m_size(0)
{
}
VariadicTree<ValueType>& addNode(const ValueType& value)
{
m_children.emplace_back(new VariadicTree<ValueType>(value));
++m_size;
return *m_children.back();
}
bool leaf()
{
return std::all_of(m_children.begin(), m_children.end(),
[&](const std::unique_ptr<VariadicTree<ValueType>>& ptr)
{return ptr == nullptr;});
}
size_t size()
{
return m_size;
}
const ValueType& value()
{
return m_value;
}
private:
ValueType m_value; // stored by value: a reference member would dangle when constructed from a temporary such as root(5)
size_t m_size;
std::vector<std::unique_ptr<VariadicTree<ValueType>>> m_children;
};
int main()
{
VariadicTree<int> root(5);
auto& c1 = root.addNode(4);
auto& c2 = root.addNode(6);
auto& c3 = root.addNode(2);
auto& c11 = c1.addNode(2);
std::cout << root.leaf() << "\n";
std::cout << c11.leaf() << "\n";
std::cout << root.size() << "\n";
std::cout << c1.size() << "\n";
std::cout << c11.size() << "\n";
return 0;
}
Ownership can probably be handled more elegantly but this should do for demonstration purposes.
I solved using
struct s{
int x;
vector<s*> child;
};
This helps more, since the storage for the child pointers is managed by the STL (the nodes themselves still have to be deleted manually).

Array Based Binary Search Tree C++

I'm trying to build an array-based binary search tree by following the algorithm at:
http://highered.mcgraw-hill.com/olcweb/cgi/pluginpop.cgi?it=gif%3A:600%3A:388%3A%3A/sites/dl/free/0070131511/25327/tree%5Finsert.gif%3A%3ATREE-INSERT
Up until the point where I need to reallocate, my tree resembles:
R
/
A
\
F
\
L
/
B
\
C
\
T
Recursively. However, I've noticed that I need to get back to the root, "R". I'm trying to do that now.
void BST::insert(const data& aData)
{
item *y = NULL; // Algorithm calls for NULL assignment..
item *x = new item();
// How do i Init LEFT and RIGHT?
// With no nested copy ctor for struct item?
if ( items->empty )
{
items = new item();
items->empty = false;
items->theData = aData; // Get the data.
++size;
}
else if ( size == maxSize ) this->reallocate();
else
{
if ( aData < items->theData )
{
x[x->LEFT].theData = aData;
x->LEFT = x->LEFT + 1;
this->insert(items->theData);
}
else if ( items->theData < aData )
{
x[x->RIGHT].theData = aData;
x->RIGHT = x->RIGHT + 1;
this->insert(items->theData);
}
else this->insert(items->theData);
}
Here is my struct for the items array in the private section of the BST class object file:
...
private:
int size; // size of the ever growing/expanding tree :)
int maxSize;
struct item
{
bool empty;
int LEFT;
int RIGHT;
data theData;
};
item *items; // The tree array
item *oldRoot;
int root_index; // index for the root(s)
Hmm. I'm also overloading the assignment operator. I don't know what to say. It's hard. I've looked at so many examples and lectures online, as well as algorithms.
The reallocation method, as requested:
void BST::reallocate()
{
item *new_array = new item[size*2];
for ( int array_index = 0; array_index < size; array_index++ )
{
new_array[array_index].theData = items[array_index].theData;
new_array[array_index].empty = false;
}
size *= 2;
delete [] items;
items = NULL;
items = new_array;
}
There is a well-established way to implement a binary tree as an array: the root sits at index 0, the LEFT child of the element at index i is found at 2i + 1, and the RIGHT child at 2i + 2. You do not need to have LEFT and RIGHT as part of your structure.
It has to be:
struct item
{
int index;
data theData;
};
Do NOT store the left and right indexes in your structure. You also don't need to keep track of the root index: it is always 0. Check out the Wikipedia article on binary trees and search for the string "Methods for storing binary trees".
This implementation allows easy traversal down and up the tree if needed.
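A rough sketch of insertion and lookup with this implicit indexing is shown below. It is not the asker's BST class: the name ArrayBST is made up, it stores plain int keys, and it uses std::optional (C++17) to mark empty slots instead of an empty flag.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

class ArrayBST {
public:
    // Inserts key; returns false if it is already present.
    bool insert(int key) {
        std::size_t i = 0;  // the root is always at index 0
        while (i < slots_.size() && slots_[i].has_value()) {
            if (key == *slots_[i]) return false;
            i = (key < *slots_[i]) ? 2 * i + 1   // go LEFT
                                   : 2 * i + 2;  // go RIGHT
        }
        if (i >= slots_.size()) slots_.resize(i + 1);  // grow on demand
        slots_[i] = key;
        return true;
    }

    bool contains(int key) const {
        std::size_t i = 0;
        while (i < slots_.size() && slots_[i].has_value()) {
            if (key == *slots_[i]) return true;
            i = (key < *slots_[i]) ? 2 * i + 1 : 2 * i + 2;
        }
        return false;
    }

    // Index of the parent of slot i; this is what makes walking up easy.
    static std::size_t parent(std::size_t i) {
        assert(i > 0);  // the root has no parent
        return (i - 1) / 2;
    }

private:
    std::vector<std::optional<int>> slots_;
};
```

One caveat: the index roughly doubles with every level, so a badly unbalanced tree (such as the long chain drawn in the question) needs an array far larger than the number of keys it holds.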