HashSet C++ clarification

I'm lost on a topic I have been studying. In my class we are implementing our own hash set class. We have an underlying data structure, like a vector or array, and use a hash function to quickly determine whether an element is in the set or not. That is the part I do not follow. How would a hash function be used for this determination?

Imagine you have an underlying array of size 100, and you can only insert values from 0 to 99.
Something like this:
#include <array>
class UselessHashMap
{
public:
    void insert(int value) {
        _arr[hash(value)] = value; // the hash is used directly as the array index
    }
private:
    int hash(int i) { return i; } // identity: only valid for values 0 to 99
    std::array<int, 100> _arr;
};
Now imagine you want to store values outside the range 0 to 99, and you can't have an array with one slot for every possible int (std::numeric_limits<int>::max() entries).
In this case, your hash function will have to return a value between 0 and 99, and your UselessHashMap class will also need to take care of collisions, because that function can return the same value for different inputs.
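A minimal sketch of that idea, assuming separate chaining (one small list per bucket) as the collision strategy. The class name TinyHashSet, the default bucket count and the modulo hash are illustrative choices and not part of the answer above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration: a set of ints backed by a fixed number of buckets.
class TinyHashSet
{
public:
    explicit TinyHashSet(std::size_t bucket_count = 100)
        : _buckets(bucket_count)
    {
        assert(bucket_count > 0);
    }

    // Returns true if the value was inserted, false if it was already present.
    bool insert(int value)
    {
        std::vector<int>& bucket = _buckets[hash(value)];
        if (std::find(bucket.begin(), bucket.end(), value) != bucket.end())
            return false;
        bucket.push_back(value);
        return true;
    }

    // Only the single bucket selected by the hash has to be searched.
    bool contains(int value) const
    {
        const std::vector<int>& bucket = _buckets[hash(value)];
        return std::find(bucket.begin(), bucket.end(), value) != bucket.end();
    }

private:
    // Maps any int (including negatives) to an index in [0, bucket count).
    std::size_t hash(int value) const
    {
        return static_cast<std::size_t>(static_cast<unsigned int>(value)) % _buckets.size();
    }

    std::vector<std::vector<int>> _buckets;
};
```

With 10 buckets, the values 7 and 17 land in the same bucket, yet contains() still tells them apart because it compares the stored values inside that bucket.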

Related

C++ Hash Table - How is collision for unordered_map with custom data type as keys resolved?

I have defined a class called Point which is to be used as a key inside an unordered_map. So, I have provided an operator== function inside the class and I have also provided a template specialization for std::hash. Based on my research, these are the two things I found necessary. The relevant code is as shown:
class Point
{
    int x_cord = {0};
    int y_cord = {0};
public:
    Point()
    {
    }
    Point(int x, int y):x_cord{x}, y_cord{y}
    {
    }
    int x() const
    {
        return x_cord;
    }
    int y() const
    {
        return y_cord;
    }
    bool operator==(const Point& pt) const
    {
        return (x_cord == pt.x() && y_cord == pt.y());
    }
};
namespace std
{
    template<>
    class hash<Point>
    {
    public:
        size_t operator()(const Point& pt) const
        {
            return (std::hash<int>{}(pt.x()) ^ std::hash<int>{}(pt.y()));
        }
    };
}
// Inside some function
std::unordered_map<Point, bool> visited;
The program compiled and gave the correct results in the cases that I tested. However, I am not convinced if this is enough when using a user-defined class as key. How does the unordered_map know how to resolve collision in this case? Do I need to add anything to resolve collision?
That's a terrible hash function. But it is legal, so your implementation will work.
The rule (and really the only rule) for Hash and Equals is:
if a == b, then std::hash<value_type>{}(a) == std::hash<value_type>{}(b).
(It's also important that both Hash and Equals always produce the same value for the same arguments. I used to think that went without saying, but I've seen several SO questions where unordered_map produced unexpected results precisely because one or both of these functions depended on some external value.)
That would be satisfied by a hash function which always returned 42, in which case the map would get pretty slow as it filled up. But other than the speed issue, the code would work.
std::unordered_map uses a chained hash, not an open-addressed hash. All entries with the same hash value are placed in the same bucket, which is a linked list. So low-quality hashes do not distribute entries very well among the buckets.
It's clear that your hash gives {x, y} and {y, x} the same hash value. More seriously, any collection of points in a small rectangle will share the same small number of different hash values, because the high-order bits of the hash values will all be the same.
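One possible way to reduce both weaknesses is to mix the two coordinates in an order-dependent way. The sketch below uses a mixing step in the style of Boost's hash_combine. The hasher name PointHash, the alias VisitedMap and the simplified Point with public members are assumptions made for illustration, and this is not claimed to be the best hash for this data.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>

// Minimal stand-in for the Point class in the question.
struct Point
{
    int x;
    int y;
    bool operator==(const Point& other) const { return x == other.x && y == other.y; }
};

// Hypothetical hasher, passed as a template argument instead of specializing std::hash.
struct PointHash
{
    std::size_t operator()(const Point& pt) const
    {
        std::size_t seed = std::hash<int>{}(pt.x);
        // Order-dependent mixing step in the style of boost::hash_combine.
        seed ^= std::hash<int>{}(pt.y) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        return seed;
    }
};

using VisitedMap = std::unordered_map<Point, bool, PointHash>;
```

Collisions can still happen with any hash; the map resolves them by comparing keys with operator==, so correctness does not depend on the quality of the hash, only speed does.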
Knowing that Point is intended to store coordinates within an image, the best hash function here is:
pt.x() + pt.y() * width
where width is the width of the image.
Considering that x is a value in the range [0, width-1], the above hash function produces a unique number for any valid value of pt. No collisions are possible.
Note that this hash value corresponds to the linear index for the point pt if you store the image as a single memory block. That is, given y is also in a limited range ([0, height-1]), all hash values generated are within the range [0, width* height-1], and all integers in that range can be generated. Thus, consider replacing your hash table with a simple array (i.e. an image). An image is the best data structure to map a pixel location to a value.
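A rough sketch of the array-based alternative, assuming the image dimensions are known up front. The class name VisitedGrid and its member names are hypothetical; it stores one flag per pixel at the linear index x + y * width.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: one visited flag per pixel, stored row by row.
class VisitedGrid
{
public:
    VisitedGrid(std::size_t width, std::size_t height)
        : _width(width), _height(height), _flags(width * height, 0)
    {
    }

    void mark(std::size_t x, std::size_t y) { _flags[index(x, y)] = 1; }
    bool visited(std::size_t x, std::size_t y) const { return _flags[index(x, y)] != 0; }

private:
    // Linear index x + y * width; unique for every in-range (x, y).
    std::size_t index(std::size_t x, std::size_t y) const
    {
        assert(x < _width && y < _height);
        return x + y * _width;
    }

    std::size_t _width;
    std::size_t _height;
    std::vector<char> _flags;
};
```

Compared with std::unordered_map<Point, bool>, this needs no hashing or equality comparison at all, at the cost of allocating width * height flags even if only a few pixels are ever visited.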

Counting Sort for a set

Hello, I know how counting sort works and how to implement it. But is it possible to apply it to a class that has 3 attributes, when I need to counting-sort the whole DisjointSet on one specific attribute?
If so, let's say I have this class:
class myStructure {
public:
    int m_id = -1;
    myStructure* m_parent = NULL;
    int m_sortie = -1;
    int m_echeance = -1;
    myStructure() {}
    myStructure(int id, myStructure* parent, int sortie, int echeance)
        : m_id(id), m_parent(parent), m_sortie(sortie), m_echeance(echeance)
    { }
};
How can I implement counting sort on m_echeance?
Thanks
Surely you can apply counting sort.
It is applicable to any field that can be mapped to integers. In general, counting sort should be used if the range of values (in your case m_echeance) is small.
Below is a high-level approach:
Let's say your objects are stored in an array A[] and the range of m_echeance is [0, R-1].
Make a count array of size R+1.
Loop through the array A to count the frequencies of the objects with different m_echeance values,
something like count[A[i]->m_echeance + 1]++;
Compute the cumulative frequencies in the count array.
Copy the objects into an auxiliary array at the positions given by the cumulative frequencies.
Copy the objects back from the auxiliary array to the original array.
Hope it helps!
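The steps above could look roughly like this, assuming the objects are held as pointers in a std::vector and every m_echeance lies in [0, R-1]. The function name countingSortByEcheance is hypothetical, and myStructure is reduced to the two fields the sketch needs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reduced stand-in for the class in the question; only the sort key matters here.
struct myStructure
{
    int m_id = -1;
    int m_echeance = -1;
};

// Stable counting sort by m_echeance, assuming every key lies in [0, R-1].
void countingSortByEcheance(std::vector<myStructure*>& A, int R)
{
    std::vector<std::size_t> count(static_cast<std::size_t>(R) + 1, 0);

    // Count frequencies, shifted by one slot so the prefix sums give start positions.
    for (const myStructure* item : A) {
        assert(item->m_echeance >= 0 && item->m_echeance < R);
        count[item->m_echeance + 1]++;
    }

    // Cumulative frequencies: count[k] becomes the first output index for key k.
    for (int k = 0; k < R; ++k)
        count[k + 1] += count[k];

    // Place each object at its position in the auxiliary array.
    std::vector<myStructure*> aux(A.size());
    for (myStructure* item : A)
        aux[count[item->m_echeance]++] = item;

    // Copy back into the original array.
    A = aux;
}
```

Because elements with equal keys are placed in the order they are encountered, the sort is stable. The running time is O(n + R), which is why a small range R matters.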

From array to priority queue

I have items in array created from this struct:
struct ks{
int cap;
int val;
};
Array is named items and contains quantity of items.
items = new ks[quantity];
I want to put them in priority queue - which basically means sort them.
This is my compare function:
struct itemsCompare{
    bool operator () (const ks &item1, const ks &item2){
        if (item1.val/item1.cap > item2.val/item2.cap) return true;
        return false;
    }
};
What should the creation of this queue look like?
priority_queue <ks, What should I put here?, itemsCompare> comparedItems;
for(int i=0; i<quantity; i++) comparedItems.push(items[i]);
I know that the template requires a vector as the container. How should I modify the code to make it work? I know that I can put the items into a vector just before declaring the priority queue, but I'm curious whether there's a way to do it with just the array.
To create a std::priority_queue from the array you can use
std::priority_queue <ks, std::vector<ks>, itemsCompare> comparedItems(items, items + quantity);
Answering the question as asked:
std::priority_queue <ks, std::vector<ks>, itemsCompare> comparedItems;
However, the question has some issues not directly asked. First, it sports division on uncontrolled substances :). What is going to happen if you divide by 0?
Second. You divide integer by integer. This result is always integer, and somehow I doubt this is what you want.
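One way to sidestep both issues is to compare the ratios by cross-multiplication instead of dividing. The sketch below assumes every cap is strictly positive (otherwise the inequality would flip or be meaningless). The alias ItemQueue and the helper makeQueue are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <queue>
#include <vector>

struct ks
{
    int cap;
    int val;
};

// Same ordering as comparing val/cap exactly, but done by cross-multiplication,
// which avoids both integer truncation and division by zero.
// Assumes cap > 0 for every item.
struct itemsCompare
{
    bool operator()(const ks& item1, const ks& item2) const
    {
        assert(item1.cap > 0 && item2.cap > 0);
        return static_cast<long long>(item1.val) * item2.cap >
               static_cast<long long>(item2.val) * item1.cap;
    }
};

using ItemQueue = std::priority_queue<ks, std::vector<ks>, itemsCompare>;

// Builds the queue straight from a plain array via the iterator-range constructor.
ItemQueue makeQueue(const ks* items, int quantity)
{
    return ItemQueue(items, items + quantity);
}
```

Note that std::priority_queue puts the element that compares last at the top, so with this "greater than" comparator top() is the item with the smallest val/cap ratio. Swap the comparison to `<` if the largest ratio should come out first.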

implementing hash table using vector c++

I've tried to implement a hash table using a vector. My table size will be defined in the constructor; for example, let's say the table size is 31. To create the hash table I do the following:
vector<string> entries; // it is filled with entries that I'll put into the hash table
vector<string> hashtable;
hashtable.resize(31);
for(int i=0;i<entries.size();i++){
    int index=hashFunction(entries[i]);
    // now I need to know whether I've already put an entry into hashtable[index] or not
}
Can anyone help me with how I could do that?
Each cell in your hashtable comes with a bit of extra packaging.
If your hash allows deletions you need a state such that a cell can be marked as "deleted". This enables your search to continue looking even if it encounters this cell which has no actual value in it.
So a cell can have 3 states, occupied, empty and deleted.
You might also wish to store the hash value in the cell. This is useful when you come to resize the table, as you don't need to recompute the hash of every entry.
In addition, it can serve as a cheap first comparison, because comparing two numbers is likely to be quicker than comparing two objects.
These are considerations if this is an exercise, or if you find that std::unordered_map / std::unordered_set is not adequate for your purpose, or if those are not available to you.
For practical purposes, at least try using those first.
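A minimal sketch of such cells, assuming open addressing with linear probing (the answer above does not name a probing scheme). The class name ProbingStringSet and all member names are hypothetical. Each cell carries its state and the stored hash value alongside the key.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical open-addressing table with linear probing and three cell states.
class ProbingStringSet
{
public:
    explicit ProbingStringSet(std::size_t table_size = 31) : _cells(table_size)
    {
        assert(table_size > 0);
    }

    bool contains(const std::string& key) const { return findIndex(key) != npos; }

    // Returns false if the key is already present or no free cell is left.
    bool insert(const std::string& key)
    {
        if (contains(key))
            return false;
        const std::size_t hash = std::hash<std::string>{}(key);
        const std::size_t start = hash % _cells.size();
        for (std::size_t step = 0; step < _cells.size(); ++step) {
            Cell& cell = _cells[(start + step) % _cells.size()];
            if (cell.state != State::Occupied) { // Empty or Deleted cells can be reused
                cell.state = State::Occupied;
                cell.hash = hash;
                cell.key = key;
                return true;
            }
        }
        return false;
    }

    bool erase(const std::string& key)
    {
        const std::size_t i = findIndex(key);
        if (i == npos)
            return false;
        _cells[i].state = State::Deleted; // tombstone keeps later probes working
        _cells[i].key.clear();
        return true;
    }

private:
    enum class State { Empty, Occupied, Deleted };
    struct Cell
    {
        State state = State::Empty;
        std::size_t hash = 0; // stored hash: cheap first comparison
        std::string key;
    };
    static const std::size_t npos = static_cast<std::size_t>(-1);

    std::size_t findIndex(const std::string& key) const
    {
        const std::size_t hash = std::hash<std::string>{}(key);
        const std::size_t start = hash % _cells.size();
        for (std::size_t step = 0; step < _cells.size(); ++step) {
            const std::size_t i = (start + step) % _cells.size();
            const Cell& cell = _cells[i];
            if (cell.state == State::Empty)
                return npos; // a truly empty cell ends the search
            if (cell.state == State::Occupied && cell.hash == hash && cell.key == key)
                return i;
            // Deleted cell or a different key: keep probing
        }
        return npos;
    }

    std::vector<Cell> _cells;
};
```

The Deleted state is what lets a lookup walk past a removed entry instead of stopping early, and the stored hash is compared before the strings so that most mismatches are rejected without a string comparison.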
It is possible to have several items for the same hash value.
You just need to define your hash table like this:
vector<vector<string>> hashtable;
hashtable.resize(32); //0-31
for(int i=0;i<entries.size();i++){
    int index=hashFunction(entries[i]);
    hashtable[index].push_back(entries[i]);
}
A simple implementation of a hash table uses a vector of pointers to the actual entries:
class hash_map {
public:
    iterator find(const key_type& key);
    //...
private:
    struct Entry { // representation
        key_type key;
        mapped_type val;
        bool erased; // set when the entry has been removed
        Entry* next; // hash overflow link
        Entry(key_type k, mapped_type v, Entry* n)
            : key(k), val(v), erased(false), next(n) {}
    };
    vector<Entry> v; // the actual entries
    vector<Entry*> b; // the hash table, pointers into v
};
To find a value, operator[] uses a hash function to find an index in the hash table for the key:
mapped_type& hash_map::operator[](const key_type& k) {
    size_type i = hash(k)%b.size(); // hash
    for (Entry* p=b[i];p;p=p->next) // search among entries hashed to i
        if (eq(k,p->key)) { // found
            if (p->erased) { // re-insert
                p->erased=false;
                no_of_erased--;
                return p->val=default_value;
            }
            return p->val;
        }
    // not found, resize if needed
    if (b.size()*max_load <= v.size()) { // table too full; max_load and resize() are assumed members
        resize(b.size()*2); // grow the table
        return operator[](k); // retry in the resized table
    }
    v.push_back(Entry(k,default_value,b[i])); // add Entry
    b[i]=&v.back(); // point to new element
    return b[i]->val;
}

Order a container by member with STL

Suppose I have some data stored in a container of unique_ptrs:
struct MyData {
int id; // a unique id for this particular instance
data some_data; // arbitrary additional data
};
// ...
std::vector<std::unique_ptr<MyData>> my_data_vec;
The ordering of my_data_vec is important. Suppose now I have another vector of IDs of MyDatas:
std::vector<int> my_data_ids;
I now want to rearrange my_data_vec such that the elements are in the sequence specified by my_data_ids. (Don't forget moving a unique_ptr requires move-semantics with std::move().)
What's the most algorithmically efficient way to achieve this, and do any of the STL algorithms lend themselves well to achieving this? I can't see that std::sort would be any help.
Edit: I can use O(n) memory space (not too worried about memory), but the IDs are arbitrary (in my specific case they are actually randomly generated).
Create a map that maps ids to their index in my_data_ids.
Create a function object that compares std::unique_ptr<MyData> based on their ID's index in that map.
Use std::sort to sort the my_data_vec using that function object.
Here's a sketch of this:
// Beware, brain-compiled code ahead!
typedef std::vector<int> my_data_ids_type;
typedef std::map<int,my_data_ids_type::size_type> my_data_ids_map_type;
// No std::binary_function base: it is not needed by std::sort and was removed in C++17.
class my_id_comparator {
public:
    my_id_comparator(const my_data_ids_map_type& my_data_ids_map)
        : my_data_ids_map_(my_data_ids_map) {}
    bool operator()( const std::unique_ptr<MyData>& lhs
                   , const std::unique_ptr<MyData>& rhs ) const
    {
        my_data_ids_map_type::const_iterator it_lhs = my_data_ids_map_.find(lhs->id);
        my_data_ids_map_type::const_iterator it_rhs = my_data_ids_map_.find(rhs->id);
        if( it_lhs == my_data_ids_map_.end() || it_rhs == my_data_ids_map_.end() )
            throw "dammit!"; // whatever
        return it_lhs->second < it_rhs->second;
    }
private:
    const my_data_ids_map_type& my_data_ids_map_;
};
//...
my_data_ids_map_type my_data_ids_map;
// ...
// populate my_data_ids_map with the IDs and their indexes from my_data_ids
// ...
std::sort( my_data_vec.begin(), my_data_vec.end(), my_id_comparator(my_data_ids_map) );
If memory is scarce, but time doesn't matter, you could do away with the map and search the IDs in the my_data_ids vector for each comparison. However, you would have to be really desperate for memory to do that, since two linearly complex operations per comparison are going to be quite expensive.
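For comparison, the map-based steps can be condensed into a short, self-contained sketch. It swaps in std::unordered_map and a lambda in place of std::map and the comparator class, which is a substitution made here and not part of the answer. The function name orderByIds is hypothetical, and MyData is reduced to its id.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

struct MyData
{
    int id;
    // arbitrary additional data omitted in this sketch
};

// Reorders my_data_vec so that its ids follow the sequence in my_data_ids.
// Assumes every id in my_data_vec also appears in my_data_ids (at() throws otherwise).
void orderByIds(std::vector<std::unique_ptr<MyData>>& my_data_vec,
                const std::vector<int>& my_data_ids)
{
    // Step 1: map each id to its position in my_data_ids.
    std::unordered_map<int, std::size_t> position;
    for (std::size_t i = 0; i < my_data_ids.size(); ++i)
        position[my_data_ids[i]] = i;

    // Steps 2 and 3: sort with a comparator that looks up those positions.
    std::sort(my_data_vec.begin(), my_data_vec.end(),
              [&position](const std::unique_ptr<MyData>& lhs,
                          const std::unique_ptr<MyData>& rhs) {
                  return position.at(lhs->id) < position.at(rhs->id);
              });
}
```

std::sort moves the unique_ptrs internally, so no explicit std::move is needed. The overall cost is O(n log n) comparisons with an average O(1) lookup each.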
Why don't you try moving the data into an STL set? You only need to implement the comparison function, and you will end up with a perfectly ordered set of data very quickly.
Why don't you just use a map<int, unique_ptr<MyData>> (or multimap)?