Multiple Hash Tables for the Word Count Project - C++

I already wrote a working project, but my problem is that it is much slower than I originally aimed for, so I have some ideas about how to improve it, but I don't know how to implement them, or whether I should implement them at all.
The topic of my project is reading a CSV (Excel) file full of tweets, counting every single word in it, and then displaying the most used words.
(Every row of the file contains information about the tweet plus the tweet text itself; I only care about the tweet text.)
Instead of sharing the whole code, I will simply describe what I did so far and only ask about the part I am struggling with.
First of all, I want to apologize because this will be a long question.
Important note: The only thing I should focus on is speed; storage or size is not a problem.
All the steps:
Read a new line from the CSV file.
Find the "tweet" part of the whole line and store it as a string.
Split the tweet string into words and store them in an array.
For every word stored in the array, calculate its ASCII value.
(To find the ASCII value of a word, I simply sum the ASCII values of each letter it has.)
Put the word into the hash table with the ASCII value as the key.
(Example: the word "hello" has an ASCII value of 104+101+108+108+111 = 532, so it is stored with key 532 in the hash table.)
In the hash table, only the word (as a string) and the key value (as an int) are stored; the count of each word (how many times it is used) is stored in a separate array.
I will share the Insert function (for inserting something into the hash table) because I believe it would be confusing if I tried to explain this part without code.
void Insert(int key, string value) // Key (where we want to add), Value (what we want to add)
{
    if (key < 0) key = 0; // If the key is somehow less than 0, make it 0 so we don't get an error.
    if (table[key] != NULL) // If there is already something in the hash table
    {
        if (table[key]->value == value) // If the existing value is the same as the value we want to add
        {
            countArray[key][0]++;
        }
        else // If the value is different,
        {
            Insert(key + 100, value); // Call this function again with a key 100 greater than before.
        }
    }
    else // There is nothing saved in this place, so save this value
    {
        table[key] = new HashEntry(key, value);
        countArray[key][1] = key;
        countArray[key][0]++;
    }
}
So "Insert" function has three-part.
Add the value to hash table if hast table with the given key is empty.
If hast table with the given key is not empty that means we already put a word with this ascii value.
Because different words can have exact same ascii value.
The program first checks if this is the same word.
If it is, count increase (In the count array).
If not, Insert function is again called with the key value of (same key value + 100) until empty space or same value is found.
After whole lines are read and every word is stored in Hashtable ->
Sort the Count array
Print the first 10 element
This is the end of the program, so what is the problem?
Now my biggest problem is I am reading a very huge CSV file with thousands of rows, so every unnecessary thing increases the time noticeably.
My second problem is there is a lot of values with the same ASCII value, my method of checking hundred more than normal ascii value methods work, but ? for finding the empty space or the same word, Insert function call itself hundred times per word.
(Which caused the most problem).
So I thought about using multiple hash tables.
For example, I can check the first letter of the word and if it is
Between A and E, store in the first hash table
Between F and J, store in the second hash table
...
Between V and Z, store in the last hash table.
Important note again: The only thing I should focus on is speed; storage or size is not a problem.
So collisions should mostly be minimized this way.
I could even create an absurd number of hash tables and use a different hash table for every different first letter.
But I am not sure whether this is the logical thing to do, or whether there are much simpler methods I could use.
If it is okay to use multiple hash tables, instead of creating them one by one, is it possible to create an array that stores a hash table in every location?
(The same as an array of arrays, but this time the array stores hash tables.)
If it is possible and logical, can someone show how to do it?
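For example, something like this is what I have in mind (just a rough sketch of my idea, assuming Insert becomes a member function of HashMap; I don't know if this is the right approach):
const int LETTER_GROUPS = 26;                 // e.g. one table per starting letter
HashMap *tables = new HashMap[LETTER_GROUPS]; // an array where every element is a hash table

void InsertWord(int key, const string &word)
{
    int group = 0;
    if (!word.empty() && word[0] >= 'a' && word[0] <= 'z')
        group = word[0] - 'a';       // choose the table based on the first letter
    tables[group].Insert(key, word); // insert only into that table
}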
This is the hash table I have:
class HashEntry
{
public:
    int key;
    string value;
    HashEntry(int key, string value)
    {
        this->key = key;
        this->value = value;
    }
};

class HashMap
{
private:
    HashEntry **table;
public:
    HashMap()
    {
        table = new HashEntry *[TABLE_SIZE];
        for (int i = 0; i < TABLE_SIZE; i++)
        {
            table[i] = NULL;
        }
    }
    //Functions
};
I am very sorry for asking such a long question, and again very sorry if I couldn't explain every part clearly enough; English is not my mother tongue.
Also, one last note: I am doing this for a school project, so I am not allowed to use any third-party software or include any external libraries.

You are using a very bad hash function (adding up all the characters); that's why you get so many collisions, and why your Insert method calls itself so many times as a result.
For a detailed overview of different hash functions see the answer to this question. I suggest you try DJB2 or FNV-1a (which is used in some implementations of std::unordered_map).
You should also use more localized "probes" for the empty place to improve cache locality, and use a loop instead of recursion in your Insert method.
But first I suggest you tweak your HashEntry a little:
class HashEntry
{
public:
    string key;   // the word is actually the key; no need to store the hash value
    size_t value; // the word count is the value
    HashEntry(string key)
        : key(std::move(key)), value(1) // move the string to avoid unnecessary copying
    { }
};
Then let's try to use a better hash function:
// DJB2 hash function
size_t Hash(const string &key)
{
    size_t hash = 5381;
    for (auto &&c : key)
        hash = ((hash << 5) + hash) + c;
    return hash;
}
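For reference, a 32-bit FNV-1a variant (the other function mentioned above) is just as short; either one spreads the words far better than summing the characters. This is only a sketch and needs <cstdint> for uint32_t:
// FNV-1a hash function (32-bit variant)
size_t HashFNV1a(const string &key)
{
    uint32_t hash = 2166136261u;   // FNV offset basis
    for (unsigned char c : key)
    {
        hash ^= c;                 // mix in the next byte
        hash *= 16777619u;         // multiply by the FNV prime
    }
    return hash;
}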
Then rewrite the Insert function:
void Insert(string key)
{
    size_t index = Hash(key) % TABLE_SIZE;
    while (table[index] != nullptr) {
        if (table[index]->key == key) {
            ++table[index]->value;
            return;
        }
        ++index;
        if (index == TABLE_SIZE) // "wrap around" if we've reached the end of the hash table
            index = 0;
    }
    table[index] = new HashEntry(std::move(key));
}
To find the hash table entry by key you can use a similar approach:
HashEntry *Find(const string &key)
{
    size_t index = Hash(key) % TABLE_SIZE;
    while (table[index] != nullptr) {
        if (table[index]->key == key) {
            return table[index];
        }
        ++index;
        if (index == TABLE_SIZE)
            index = 0;
    }
    return nullptr;
}
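To print the 10 most used words at the end, one option (just a sketch, assuming the table array and TABLE_SIZE are accessible and that <vector>, <algorithm> and using namespace std are available) is to collect the non-empty entries and sort them by count:
void PrintTop10()
{
    vector<HashEntry*> entries;
    for (size_t i = 0; i < TABLE_SIZE; i++)
        if (table[i] != nullptr)
            entries.push_back(table[i]); // gather every stored word

    // sort by count, highest first (std::sort from <algorithm>)
    sort(entries.begin(), entries.end(),
         [](const HashEntry *a, const HashEntry *b) { return a->value > b->value; });

    for (size_t i = 0; i < entries.size() && i < 10; i++)
        cout << entries[i]->key << " : " << entries[i]->value << "\n";
}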

Related

Map arbitrary set of symbols to consecutive integers

Given a set of byte-representable symbols (e.g. characters, short strings, etc), is there a way to find a mapping from that set to a set of consecutive natural numbers that includes 0? For example, suppose there is the following unordered set of characters (not necessarily in any particular character set).
'a', '(', '🍌'
Is there a way to find a "hash" function of sorts that would map each symbol (e.g. by means of its byte representation) uniquely to one of the integers 0, 1, and 2, in any order? For example, 'a'=0, '('=1, '🍌'=2 is just as valid as 'a'=2, '('=0, '🍌'=1.
Why?
Because I am developing something for a memory-constrained (think on the order of kiB) embedded target that has a lot of fixed reverse-lookup tables, so something like std::unordered_map would be out of the question. The ETL equivalent etl::unordered_map would be getting there, but there's quite a bit of size overhead, and collisions can happen, so lookup timings could differ. A sparse lookup table would work, where the byte representation of the symbol would be the index, but that would be a lot of wasted space, and there are many different tables.
There's also the chance that the "hash" function may end up costing more than the above alternatives, but my curiosity is a strong driving force. Also, although both C and C++ are tagged, this question is specific to neither of them. I just happen to be using C/C++.
The normal way to do things like this, for example when coding a font for a custom display, is to map everything to a sorted, read-only look-up table array with indices 0 to 127 or 0 to 255, where symbols corresponding to the old ASCII table are mapped to their respective indices. Other things, like your banana symbol, can be mapped beyond index 127.
So when you use FONT[97] or FONT['a'], you end up with the symbol corresponding to 'a'. That way you can translate from ASCII strings to your custom table, or from your source editor font to the custom table.
Using any other data type such as a hash table sounds like muddy program design to me. Embedded systems should by their nature be deterministic, so overly complex data structures don't make sense most of the time. If you for some reason unknown must have the data unordered, then you should describe the reason why in detail, or otherwise you are surely asking an "XY question".
Yes, there is such a map. Just put all of them in an array of strings... then sort it, and make a function that searches for the word in the array and returns its index in the array.
static const char *strings[] = {
    "word1", "word2", "hello", "world", NULL, /* NULL to end the array */
};

int word_number(const char *word)
{
    for (int i = 0; strings[i] != NULL; i++) {
        if (strcmp(strings[i], word) == 0)
            return i;
    }
    return -1; /* not found */
}
The cost of this (in space terms) is very low, since the compiler can optimize the string allocation based on common suffixes (making one string overlap another when it is a common suffix of it). And if you give the compiler an already sorted array of literals, you can use the bsearch() algorithm, which is O(log(n)) in the number of elements in the table.
static const char *strings[] = { /* this time sorted */
    "Hello",
    "rella",    /* this literal and the next may be merged into the same storage,
                 * since one is a suffix of the other; this can be controlled with
                 * a compiler option */
    "umbrella",
    "world"
};
const int strings_len = 4;

int string_cmp(const void *p1, const void *p2)
{
    /* bsearch passes pointers to the array elements, i.e. pointers to char* */
    const char *s1 = *(const char * const *)p1;
    const char *s2 = *(const char * const *)p2;
    return strcmp(s1, s2);
}

int word_number(const char *word)
{
    const char **result = bsearch(&word, strings, strings_len,
                                  sizeof *strings, string_cmp);
    return result ? (int)(result - strings) : -1; /* index in the array, or -1 if not found */
}
If you want a function that gives you a number for any string, and maps that string bijectively to that number... it's even easier. Start with zero. For each byte in the string, multiply your number by 256 (the number of byte values) and add the next byte to the result; return that result once you have done this with every char in the string. You will get a different number for each possible string, covering all possible strings and all possible numbers. But I think this is not what you want.
super_long_integer char2number(const unsigned char *s)
{
    super_long_integer result = 0;
    int c;
    while ((c = *s++) != 0) {
        result *= 256;
        result += c;
    }
    return result;
}
But that integer must be capable of holding numbers in the range [0...256^(maximum length of accepted string)], which is a very large number.

How do I implement linear probing in C++?

I'm new to hash maps and I have an assignment due tomorrow. I implemented everything and it all worked out fine, except for when I get a collision. I can't quite understand the idea of linear probing; I did try to implement it based on what I understood, but the program stopped working for table sizes < 157, for some reason.
void hashEntry(string key, string value, entry HashTable[], int p)
{
    key_de = key;
    val_en = value;
    for (int i = 0; i < sizeof(HashTable); i++)
    {
        HashTable[Hash(key, p) + i].key_de = value;
    }
}
I thought that by adding a number each time to the hash value, two entries would never get the same index. But that didn't work.
A hash table with linear probing requires you to:
Initiate a linear search starting at the hashed-to location for an empty slot in which to store your key+value.
If the slot encountered is empty, store your key+value; you're done.
Otherwise, if the keys match, replace the value; you're done.
Otherwise, move to the next slot, hunting for any empty or key-matching slot, at which point (2) or (3) transpires.
To prevent overrun, the loop doing all of this wraps modulo the table size.
If you run all the way back to the original hashed-to location and still have no empty slot or matching-key overwrite, your table is completely populated (100% load) and you cannot insert more key+value pairs.
That's it. In practice it looks something like this:
bool hashEntry(string key, string value, entry HashTable[], int p)
{
    bool inserted = false;
    int hval = Hash(key, p);
    for (int i = 0; !inserted && i < p; i++)
    {
        if (HashTable[(hval + i) % p].key_de.empty())
        {
            // Empty slot: claim it by storing the key; the next check
            // will then match and store the value.
            HashTable[(hval + i) % p].key_de = key;
        }
        if (HashTable[(hval + i) % p].key_de == key)
        {
            HashTable[(hval + i) % p].val_en = value;
            inserted = true;
        }
    }
    return inserted;
}
Note that expanding the table in a linear-probing hash algorithm is tedious. I suspect that will be forthcoming in your studies. Eventually you need to track how many slots are taken so when the table exceeds a specified load factor (say, 80%), you expand the table, rehashing all entries on the new p size, which will change where they all end up residing.
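A rough sketch of such a rehash step (assuming entry has string members key_de and val_en, that an empty key_de marks a free slot, and that hashEntry is the insert function above) might look like this:
// Grow the table to newP slots and re-insert every existing entry.
bool rehash(entry *&HashTable, int &p, int newP)
{
    entry *oldTable = HashTable;
    int oldP = p;

    HashTable = new entry[newP]; // fresh, empty table
    p = newP;

    for (int i = 0; i < oldP; i++)
    {
        if (!oldTable[i].key_de.empty())
            hashEntry(oldTable[i].key_de, oldTable[i].val_en, HashTable, p);
    }
    delete[] oldTable;
    return true;
}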
Anyway, hope it makes sense.

Hash table for strings in C++

I've done a small exercise about hash tables in the past, but the user was giving the array size, and the struct was like this (so the user was giving a number and a word as input each time):
struct data
{
    int key;
    char c[20];
};
So it was quite simple, since I knew the array size and the user said how many items they would give as input. The way I did it was:
Hash the keys the user gave me.
Find the position array[hashed(key)] in the array.
If it was empty, I would put the data there.
If it wasn't, I would put it in the next free position I could find.
But now I have to make an inverted index, and I am researching how to build a hash table for it. The words will be collected from around 30 text files, and there will be a lot of them.
So in this case, how long should the array be? How can I hash words? Should I use hashing with open addressing or with chaining? The exercise says that we could use an existing hash table if we find one online, but I prefer to understand it and create it on my own. Any clues will help me :)
In this exercise (an inverted index using a hash table) the structs look like this.
The data type is the type of the hash table I will create.
struct posting
{
    string word;
    posting *next;
};

struct data
{
    string word;
    posting *ptrpostings;
    data *next;
};
Hashing can be done any way you choose. Suppose that the string is ABC. You can employ a hash such as A=1, B=2, C=3, Hash = (1+2+3)/length = 6/3 = 2. But this is very primitive.
The size of the array will depend on the hash algorithm that you deploy, but it is better to choose an algorithm that returns a hash of a fixed length for every string. For example, if you choose to go with SHA1, you can safely allocate 40 bytes per hash. Refer to "Storing SHA1 hash values in MySQL" and read up on the algorithm at http://en.wikipedia.org/wiki/SHA-1. I believe that it can be safely used.
On the other hand, if it is just for a simple exercise, you can also use an MD5 hash. I wouldn't recommend it for practical purposes, as its rainbow tables are easily available :)
---------EDIT-------
You can try to implement it like this:
#include <iostream>
#include <string>
#include <stdlib.h>
#include <stdio.h>

#define MAX_LEN 30

using namespace std;

typedef struct
{
    string name; // for the filename
    // ... change this to your specification
} hashd;

hashd hashArray[MAX_LEN]; // tentative

int returnHash(string s)
{
    // A simple hash, no collision handling
    int sum = 0, index = 0;
    for (string::size_type i = 0; i < s.length(); i++)
    {
        sum += s[i];
    }
    index = sum % MAX_LEN;
    return index;
}

int main()
{
    string fileName;
    int index;
    cout << "Enter filename ::\t";
    cin >> fileName;
    cout << "Entered filename is ::\t" + fileName << "\n";
    index = returnHash(fileName);
    cout << "Generated index is ::\t" << index << "\n";
    hashArray[index].name = fileName;
    cout << "Filename in array ::\t" << hashArray[index].name;
    return 0;
}
Then, to achieve O(1), anytime you want to fetch the filename's contents, just run the returnHash(filename) function. It will directly return the index of the array :)
A hash table can be implemented as a simple 2-dimensional array. The question is how to compute the unique key for each item to be stored. Some things have keys built into the data, and for other things you'll have to compute one: MD5 as suggested above is probably just fine for your needs.
The next problem you need to solve is how to lay out, or size, your hash table. That's something that you'll ultimately need to tune to your own needs through some testing. You might start by setting up the 1st dimension of your array with 256 entries -- one for each combination of the first 2 hex digits of the MD5 hash. Whenever you have a collision, you add another entry along the 2nd dimension of your array at that 1st-dimension index. This means that you'll statically define a 1-dimensional array while dynamically allocating the 2nd-dimension entries as needed. Hopefully that makes as much sense to you as it does to me.
When doing lookups, you can immediately find the right 1st-dimension index using the first 2 digits of the MD5 hash. Then a relatively short linear search along the 2nd dimension will quickly bring you to the item you seek.
You might find from experimentation that it's more efficient to use a larger 1st dimension (use the first 3 digits of the MD5 hash) if your data set is sufficiently large. Depending on the size of the texts involved and the distribution of their use of the lexicon, your results will probably dictate some of your architecture.
On the other hand, you might just start small and build in some intelligence to automatically resize and layout your table. If your table gets too long in either direction, performance will suffer.
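As a rough illustration of that layout (a sketch only, using std::hash here instead of MD5 and taking the top byte of the hash as the first-dimension index; the Posting struct is just a placeholder):
#include <string>
#include <vector>
#include <functional>

struct Posting { std::string word; /* plus whatever else the inverted index needs */ };

// 256 statically defined buckets; each bucket grows dynamically on collisions.
std::vector<Posting> table2d[256];

size_t bucketOf(const std::string &word)
{
    size_t h = std::hash<std::string>{}(word);
    return (h >> (sizeof(size_t) * 8 - 8)) & 0xFF; // 1st dimension = top 8 bits of the hash
}

Posting *find(const std::string &word)
{
    for (Posting &p : table2d[bucketOf(word)])     // short linear search along the 2nd dimension
        if (p.word == word)
            return &p;
    return nullptr;
}

void insert(const std::string &word)
{
    if (find(word) == nullptr)
        table2d[bucketOf(word)].push_back(Posting{word});
}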

Optimizing my code simulating a database

I have been working on a program simulating a small database where I can make queries. After writing the code I executed it, but the performance is quite bad: it runs really slowly. I have tried to improve it, but I started with C++ on my own a few months ago, so my knowledge is still very limited. I would like to find a way to improve the performance.
Let me explain how my code works; here I have attached a summarized example of it.
First of all, I have a .txt file simulating a database table with random strings separated by "|". Here you have an example of a table (with 5 rows and 5 columns).
Table.txt
0|42sKuG^uM|24465\lHXP|2996fQo\kN|293cvByiV
1|14772cjZ`SN|28704HxDYjzC|6869xXj\nIe|27530EymcTU
2|9041ByZM]I|24371fZKbNk|24085cLKeIW|16945TuuU\Nc
3|16542M[Uz\|13978qMdbyF|6271ait^h|13291_rBZS
4|4032aFqa|13967r^\\`T|27754k]dOTdh|24947]v_uzg
My program reads this information from the .txt file and stores it in memory. Then, when making queries, I access this information in memory. Loading the data into memory can be a slow process, but accessing the data later is faster, and that is what really matters to me.
Here is the part of the code that reads this information from the file and stores it in memory.
Code that reads data from the Table.txt file and stores it in memory
string ruta_base("C:\\a\\Table.txt"); // Path where my "Table.txt" is found
string temp;                          // Variable where every row read from the Table.txt file is first stored
vector<string> buffer;                // Variable where the elements of a row are stored after splitting it into tokens
vector<ElementSet> RowsCols;          // Variable with a class I created that simulates a vector; every element is a row of my table
ifstream ifs(ruta_base.c_str());
while (getline(ifs, temp)) // We read and store line by line until the end of the ".txt" file
{
    size_t tokenPosition = temp.find("|"); // When we find the symbol "|" we identify a new element, so we split temp into tokens stored in buffer
    while (tokenPosition != string::npos)
    {
        string element;
        tokenPosition = temp.find("|");
        element = temp.substr(0, tokenPosition);
        buffer.push_back(element);
        temp.erase(0, tokenPosition + 1);
    }
    ElementSet ss(0, buffer);
    buffer.clear();
    RowsCols.push_back(ss); // We store all the elements of every row (stored as vector<string> buffer) in a different position of "RowsCols"
}
vector<Table> TablesDescriptor;
Table TablesStorage(RowsCols);
TablesDescriptor.push_back(TablesStorage);
DataBase database(1, TablesDescriptor);
After this comes the IMPORTANT PART. Let's suppose that I want to make a query, and I ask for input: the row "n", the number of consecutive tuples "numTuples", and the columns "y". (The columns are defined by a decimal number "y" that is transformed into binary and tells us which columns to query; for example, if I ask for y = 54 (00110110 in binary), I am asking for columns 2, 3, 5 and 6.) Then I access the required information in memory and store it in a vector shownVector. Here is that part of the code.
Code that accesses the required information based on my input
int n, numTuples;
unsigned long long int y;
clock_t t1, t2;
cout << "Write the ID of the row you want to get more information: ";
cin >> n;          // We get the row to be queried -> "n"
cout << "Write the number of consecutive tuples to be queried: ";
cin >> numTuples;  // We get the number of consecutive tuples to be queried -> "numTuples"
cout << "Write the ID of the 'columns' you want to get more information: ";
cin >> y;          // We get the "columns" to be queried -> "y"
unsigned int r;    // Auxiliary variable for the columns loop
int t = 0;         // Auxiliary variable for the tuples loop
int idTable;
vector<int> columnsToBeQueried; // Here we store the columns to be queried, taken from the bitset<500> binaryNumber after comparing it with a mask
vector<string> shownVector;     // Vector to store the final information from the query
bitset<500> mask;
mask = 0x1;
t1 = clock(); // Start of the query time
bitset<500> binaryNumber = Utilities().getDecToBin(y); // We get the columns -> change the number from decimal to binary. Max number of columns: 500
// We see which columns will be queried
for (r = 0; r < binaryNumber.size(); r++)
{
    if (binaryNumber.test(r) & mask.test(r)) // if both bits are "1"
    {
        columnsToBeQueried.push_back(r);
    }
    mask = mask << 1;
}
do
{
    for (int z = 0; z < columnsToBeQueried.size(); z++)
    {
        int i;
        i = columnsToBeQueried.at(z);
        vector<int> colTab;
        colTab.push_back(1); // Don't really worry about this
        //idTable = colTab.at(i); // We identify in which table (by id) column_i is found
        // In this simple example we only have one table, so don't worry about this
        const Table& selectedTable = database.getPointer().at(0); // This simulates a vector of pointers to the different tables that compose the database, but our example database only has one table
        ElementSet selectedElementSet;
        selectedElementSet = selectedTable.getRowsCols().at(n);
        shownVector.push_back(selectedElementSet.getElements().at(i)); // We save in shownVector the element "i" of the row "n"
    }
    n = n + 1;
    t++;
} while (t < numTuples);
t2 = clock(); // End of the query time
float diff((float)t2 - (float)t1);
float microseconds = diff / CLOCKS_PER_SEC * 1000000;
cout << "The query time is: " << microseconds << " microseconds." << endl;
Class definitions
Here I have attached some of the class definitions so that you can compile the code and understand better how it works:
class ElementSet
{
private:
    int id;
    vector<string> elements;
public:
    ElementSet();
    ElementSet(int, vector<string>);
    const int& getId();
    void setId(int);
    const vector<string>& getElements();
    void setElements(vector<string>);
};

class Table
{
private:
    vector<ElementSet> RowsCols;
public:
    Table();
    Table(vector<ElementSet>);
    const vector<ElementSet>& getRowsCols();
    void setRowsCols(vector<ElementSet>);
};

class DataBase
{
private:
    int id;
    vector<Table> pointer;
public:
    DataBase();
    DataBase(int, vector<Table>);
    const int& getId();
    void setId(int);
    const vector<Table>& getPointer();
    void setPointer(vector<Table>);
};

class Utilities
{
public:
    Utilities();
    static bitset<500> getDecToBin(unsigned long long int);
};
The problem I have is that my query time is very different depending on the table size (a table with 100 rows and 100 columns behaves nothing like a table with 10000 rows and 1000 columns). This means my code's performance is very low for big tables, which is what really matters to me... Do you have any idea how I could optimize my code?
Thank you very much for all your help!!! :)
Whenever you have performance problems, the first thing you want to do is to profile your code. Here is a list of free tools that can do that on Windows, and here for Linux. Profile your code, identify the bottlenecks, and then come back and ask a specific question.
Also, like I said in my comment, can't you just use SQLite? It supports in-memory databases, making it suitable for testing, and it is lightweight and fast.
One obvious issue is that your get-functions return vectors by value. Do you need to have a fresh copy each time? Probably not.
If you try to return a const reference instead, you can avoid a lot of copies:
const vector<Table>& getPointer();
and similar for the nested get's.
I have not done the work for you, but you should analyse the complexity of your algorithm.
The reference says that accessing an item takes constant time, but when you create nested loops, the complexity of your program increases:
for (i = 0; i < 1000; ++i)         // O(i)
    for (j = 0; j < 1000; ++j)     // O(j)
        myAction();                // constant in your case
The program complexity is O(i*j), so how big can i and j be?
What if myAction is not constant in time?
No need to reinvent the wheel; use the FirebirdSQL embedded database instead. That, combined with the IBPP C++ interface, gives you a great foundation for any future needs.
http://www.firebirdsql.org/
http://www.ibpp.org/
Though I advise you to please use a profiler to find out which parts of your code are worth optimizing, here is how I would write your program:
Read the entire text file into one string (or better, memory-map the file.) Scan the string once to find all | and \n (newline) characters. The result of this scan is an array of byte offsets into the string.
When the user then queries item M of row N, retrieve it with code something like this:
char* begin = text+offset[N*items+M]+1;
char* end = text+offset[N*items+M+1];
If you know the number of records and fields before the data is read, the array of byte offsets can be a std::vector. If you don't know and must infer from the data, it should be a std::deque. This is to minimize costly memory allocation and deallocation, which I imagine is the bottleneck in such a program.
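A sketch of that idea (assuming the whole file is already in a std::string named text, that every row has the same number of fields, and that the file ends with a newline):
#include <string>
#include <vector>

// Build the offset index: one entry per '|' or '\n' separator.
// offsets[k] is the position of the k-th separator in text (with a sentinel before the first field).
std::vector<size_t> buildOffsets(const std::string &text)
{
    std::vector<size_t> offsets;
    offsets.push_back(static_cast<size_t>(-1)); // sentinel: a "separator" just before position 0
    for (size_t i = 0; i < text.size(); ++i)
        if (text[i] == '|' || text[i] == '\n')
            offsets.push_back(i);
    return offsets;
}

// Field m of row n, with `fields` columns per row (counting the leading id column).
std::string getField(const std::string &text, const std::vector<size_t> &offsets,
                     size_t n, size_t m, size_t fields)
{
    size_t begin = offsets[n * fields + m] + 1; // one past the previous separator
    size_t end   = offsets[n * fields + m + 1]; // position of the next separator
    return text.substr(begin, end - begin);
}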

Pointer comparison issue

I'm having a problem with a pointer and can't get around it.
In a hash table implementation, I have a list of ordered nodes in each bucket. The problem is in the insert function, in the comparison that checks whether the next node is greater than the current node (in order to insert in that position if it is) and keeps the order.
You might find this hash implementation strange, but I need to be able to do tons of lookups (though sometimes also very few) and count the number of repetitions if a value is already inserted (so I need fast lookups, hence the hash; I've thought about self-balancing trees such as AVL or red-black trees, but I don't know them, so I went with the solution I knew how to implement... are they faster for this type of problem?), but I also need to retrieve the values in order when I've finished.
Before, I had a simple list; I'd retrieve the array and then do a quicksort, but I think I might be able to improve things by keeping the lists ordered.
What I have to map is a 27-bit unsigned int (more exactly, three 9-bit numbers, but I convert them into one 27-bit number with (Sr << 18 | Sg << 9 | Sb)), making that value the hash_value at the same time. If you know a good function to map that 27-bit int to a 12-, 13- or 14-bit table, let me know; I currently just use the typical mod-prime solution.
This is my hash_node struct:
class hash_node {
public:
    unsigned int hash_value;
    int repetitions;
    hash_node *next;

    hash_node(unsigned int hash_val, hash_node *nxt);
    ~hash_node();
};
And this is the source of the problem
void hash_table::insert(unsigned int hash_value) {
    unsigned int p = hash_value % tableSize;
    if (table[p] != 0) { // The bucket already has some elements
        hash_node *pred; // node to keep the last valid position in the list
        for (hash_node *aux = table[p]; aux != 0; aux = aux->next) {
            pred = aux; // last valid position
            if (aux->hash_value == hash_value) {
                // It's already inserted, so we increment its repetition counter
                aux->repetitions++;
            } else if (hash_value < (aux->next->hash_value) ) { // The problem
                // If the next one is greater than the one to insert, we
                // create a node in the middle of both.
                aux->next = new hash_node(hash_value, aux->next);
                colisions++;
                numElem++;
            }
        } // We have arrived at the end of the list without luck, so we insert it after
          // the last valid position
        pred->next = new hash_node(hash_value, 0);
        colisions++;
        numElem++;
    } else { // The bucket is empty, insert it right away
        table[p] = new hash_node(hash_value, 0);
        numElem++;
    }
}
This is what gdb shows:
Program received signal SIGSEGV, Segmentation fault.
0x08050b4b in hash_table::insert (this=0x806a310, hash_value=3163181) at ht.cc:132
132 } else if (hash_value < (aux->next->hash_value) ) {
Which effectively indicates I'm comparing a memory address with a value, right?
Hope it was clear. Thanks again!
aux->next->hash_value
There's no check whether "next" is NULL.
aux->next might be NULL at that point? I can't see where you have checked whether aux->next is NULL.
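In other words, the list walk has to handle the end of the list before dereferencing aux->next. A minimal sketch of a corrected insert (keeping your ordered-list idea, but walking via a pointer to the link being rewritten; the collision counting is left out):
void hash_table::insert(unsigned int hash_value) {
    unsigned int p = hash_value % tableSize;
    hash_node **link = &table[p];                 // pointer to the "next" field we may rewrite
    while (*link != 0 && (*link)->hash_value < hash_value)
        link = &(*link)->next;                    // advance while the current node is smaller
    if (*link != 0 && (*link)->hash_value == hash_value) {
        (*link)->repetitions++;                   // already present: just count the repetition
    } else {
        *link = new hash_node(hash_value, *link); // insert before the first greater node (or at the end)
        numElem++;
    }
}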