I've tried to implement hash table using vector. My table size will be defined in the constructor, for example lets say table size is 31, to create hash table I do followings:
vector<string> entires; // it is filled with entries that I'll put into hash table;
vector<string> hashtable;
hashtable.resize(31);
for(int i=0;i<entries.size();i++){
int index=hashFunction(entries[i]);
// now I need to know whether I've already put an entry into hashtable[index] or not
}
Is there anyone to help me how could I do that ?
Each cell in your hashtable comes with a bit of extra packaging.
If your hash allows deletions you need a state such that a cell can be marked as "deleted". This enables your search to continue looking even if it encounters this cell which has no actual value in it.
So a cell can have 3 states, occupied, empty and deleted.
You might also wish to store the hash-value in the cell. This is useful when you come to resize the table as you don't need to rehash all the entries.
In addition it can be an optimal first-comparison because comparing two numbers is likely to be quicker than comparing two objects.
These are considerations if this is an exercise, or if you find that std::unordered_map / std::unordered_set is not adequate for your purpose or if those are not available to you.
For practical purpose, at least try using those first.
It is possible to have several items for the same hash value
You just need to define your hash-table like this:
vector<vector<string>> hashtable;
hashtable.resize(32); //0-31
for(int i=0;i<entries.size();i++){
int index=hashFunction(entries[i]);
hashtable[index].push_back(entries[i]);
}
the simple implementation of hash table uses vector of pointers to actual entries:
class hash_map {
public:
iterator find(const key_type& key);
//...
private:
struct Entry { // representation
key_type key;
mepped_type val;
Entry* next; // hash overflow link
};
vector<Entry> v; // the actual entries
vector<Entry*> b; // the hash table, pointers into v
};
to find a value operator uses a hash function to find an index in the hash table for the key:
mapped_type& hash_map::operator[](const key_type& k) {
size_type i = hash(k)%b.size(); // hash
for (Entry* p=b[i];p;p=p->next) // search among entries hashed to i
if (eq(k,p->key)) { // found
if (p->erased) { // re-insert
p->erased=false;
no_of_erased--;
return p->val=default_value;
}
// not found, resize if needed
return operator[](k);
v.push_back(Entry(k,default_value,b[i])); // add Entry
b[i]=&v.back(); // point to new element
return b[i]->val;
}
Related
I need to use a data structure which supports constant time lookups on average. I think that using a std::unordered_map is a good way to do it. My data is a "collection" of numbers.
|115|190|380|265|
These numbers do not have to be in a particular order. I need to have about O(1) time to determine whether or not a given number exists in this data structure. I have the idea of using a std::unordered_map, which is actually a hash table (am I correct?). So the numbers will be keys, and then I would just have dummy values.
So basically I first need to determine if the key matching a given number exists in the data structure, and I run some algorithm based on that condition. And independently of that condition I also want to update a particular key. Let's say 190, and I want to add 20 to it, so now the key would be 210.
And now the data structure would look like this:
|115|210|380|265|
The reason I want to do this is because I have a recursive algorithm which traverses a binary search tree. Each node has an int value, and two pointers to the left and right nodes. When a leaf node is reached, I need to create a new field in the "hash table" data structure holding the current_node->value. Then when I go back up the tree in the recursion, I need to successively add each of the node's value to the previous sum stored in the key. And the reason why my data structure (which I suggest should be a std::unordered_map) has multiple fields of numbers is because each one of them represents a unique path going from a leaf node up the tree to a certain node in the middle. I check if the sum of all the values of the nodes on the path from the leaf going up to a given node is equal to the value of that node. So basically into each key is added the current value of the node, storing the sum of all the nodes on that path. I need to scan that data structure to determine if any one of the fields or keys is equal to the value of the current node. Also I want to insert new values into the data structure in near constant time. This is for competitive programming, and I would hesitate to use a std::vector because looking up an element and inserting an element takes linear time, I think. That would screw up my time complexity. Maybe I should use another data structure other than a std::unordered_map?
You can use unordered_map::erase and unordered_map::insert to update a key. The average time complexity is O(1)(BTW, the worst is O(n)). If you are using C++17, you can also use unordered_map::extract to update a key. The time complexity is the same.
However, since you only need a set of number, I think unordered_set is more suitable for your algorithm.
#include <unordered_map>
#include <iostream>
int main()
{
std::unordered_map<int, int> m;
m[42]; // add
m[69]; // some
m[90]; // keys
int value = 90; // value to check for
auto it = m.find(90);
if (it != m.end()) {
m.erase(it); // remove it
m[value + 20]; // add an altered value
}
}
#include <unordered_map>
#include <string>
int main() {
// replace same key but other instance
std::unordered_map<std::string, int> eden;
std::string k1("existed key");
std::string k2("existed key");
const auto &[it, first] = eden.try_emplace(k1, 1);
if (!first) {
eden.erase(it);
eden.emplace_hint(it, k2, 123);
}
}
Since C++17, you can also use its extract function as follows:
std::unordered_map<int, int> map = make_map();
auto node = map.extract(some_key);
node.key() = new_key;
map.insert(std::move(node));
What is an idiomatic way in C++ to have an object store that can be searched with respect to two keys? Essentially what I would like is to store things of type A in a binary search tree (BST) with the BST constructed using the order relation on A.key. However, each A also has a unique A.otherval and I essentially need to delete keys based on this value.
In C I would typically just have a BST with parent pointers and a hash table based with the other values as a key storing pointers to nodes of the BST. I can delete keys through the hash table by getting the node and calling tree delete on that node.
I'm looking for how to do this correctly using STL containers.
If I got the question correctly, all you need is a map in a map, so
std::map<first_key_type, std::map<second_key_type, value_type>> map;
map[key1][key2] = something;
Edit:
I assume that all the values that have the same first key are the same, and the second key is only used as an additional search/remove criteria. In that case, to get the value by the first key only you can use something like
map.at(key).cbegin()->second;
I would recommend two maps, one to map the key to the instance of A (the primary map), and another to map the otherVal to the key:
typedef ... Key;
typedef ... OtherVal;
struct A { Key key; OtherVal otherVal; ... };
typedef std::map<Key,A> KeyToAMap;
typedef std::map<OtherVal,Key> OtherValToKeyMap;
KeyToAMap keyToAMap;
OtherValToKeyMap otherValToKeyMap;
This way you can work with keyToAMap without any additional complexity, but when it comes time to delete, you just need an additional lookup.
To ease the usage, I would also recommend writing functions for wrapping insertion and deletion in both maps:
void insertNewA(const A& a) {
keyToAMap.insert(std::make_pair(a.key, a ));
otherValToKeyMap.insert(std::make_pair(a.otherVal, a.key ));
}
void deleteByOtherVal(const OtherVal& otherVal) {
OtherValToKeyMap::iterator it1 = otherValToKeyMap.find(otherVal);
if (it1 == otherValToKeyMap.end()) { /* error */ }
Key& key = it1->second;
KeyToAMap::iterator it2 = keyToAMap.find(key);
if (it2 == keyToAMap.end()) { /* error */ }
keyToAMap.erase(it2);
otherValToKeyMap.erase(it1);
}
An advantage of this solution is it only requires two maps, as opposed to a multi-level map solution, which requires 1+N maps, where N is the number of entries in the primary map.
In a C++ std::map, is there any way to search for the key given the mapped value? Example:
I have this map:
map<int,string> myMap;
myMap[0] = "foo";
Is there any way that I can find the corresponding int, given the value "foo"?
cout << myMap.some_function("foo") <<endl;
Output: 0
std::map doesn't provide a (fast) way to find the key of a given value.
What you want is often called a "bijective map", or short "bimap". Boost has such a data structure. This is typically implemented by using two index trees "glued" together (where std::map has only one for the keys). Boost also provides the more general multi index with similar use cases.
If you don't want to use Boost, if storage is not a big problem, and if you can affort the extra code effort, you can simply use two maps and glue them together manually:
std::map<int, string> myMapForward;
std::map<string, int> myMapBackward; // maybe even std::set
// insertion becomes:
myMapForward.insert(std::make_pair(0, "foo"));
myMapBackward.insert(std::make_pair("foo", 0));
// forward lookup becomes:
myMapForwar[0];
// backward lookup becomes:
myMapBackward["foo"];
Of course you can wrap those two maps in a class and provide some useful interface, but this might be a bit overkill, and using two maps with the same content is not an optional solution anyways. As commented below, exception safety is also a problem of this solution. But in many applications it's already enough to simply add another reverse map.
Please note that since std::map stores unique keys, this approach will support backward lookup only for unique values, as collisions in the value space of the forward map correspond to collisions in the key space of the backward map.
No, not directly.
One option is to examine each value in the map until you find what you are looking for. This, obviously, will be O(n).
In order to do this you could just write a for() loop, or you could use std::find_if(). In order to use find_if(), you'll need to create a predicate. In C++11, this might be a lambda:
typedef std::map <unsigned, Student> MyMap;
MyMap myMap;
// ...
const string targetName = "Jones";
find_if (myMap.begin(), myMap.end(), [&targetName] (const MyMap::value_type& test)
{
if (test.second.mName == targetName)
return true;
});
If you're using C++03, then this could be a functor:
struct MatchName
: public std::unary_function <bool, MyMap::value_type>
{
MatchName (const std::string& target) : mTarget (target) {}
bool operator() (const MyMap::value_type& test) const
{
if (test.second.mName == mTarget)
return true;
return false;
}
private:
const std::string mTarget;
};
// ...
find_if (myMap.begin(), myMap.end(), MatchName (target));
Another option is to build an index. The index would likely be another map, where the key is whatever values you want to find and the value is some kind of index back to the main map.
Suppose your main map contains Student objects which consist of a name and some other stuff, and the key in this map is the Student ID, an integer. If you want to find the student with a particular last name, you could build an indexing map where the key is a last name (probably want to use multimap here), and the value is the student ID. You can then index back in to the main map to get the remainder of the Student's attributes.
There are challenges with the second approach. You must keep the main map and the index (or indicies) synchronized when you add or remove elements. You must make sure the index you choose as the value in the index is not something that may change, like a pointer. If you are multithreading, then you have to give a think to how both the map and index will be protected without introducing deadlocks or race conditions.
The only way to accomplish this that I can think of is to iterate through it. This is most likely not what you want, but it's the best shot I can think of. Good luck!
No, You can not do this. You simply have to iterate over map and match each value with the item to be matched and return the corresponding key and it will cost you high time complexity equal to O(n).
You can achieve this by iterating which will take O(n) time. Or you can store the reverse map which will take O(n) space.
By iterating:
std::map<int, string> fmap;
for (std::map<int,string>::iterator it=fmap.begin(); it!=fmap.end(); ++it)
if (strcmp(it->second,"foo"))
break;
By storing reverse map:
std::map<int, string> fmap;
std::map<string, int> bmap;
fmap.insert(std::make_pair(0, "foo"));
bmap.insert(std::make_pair("foo", 0));
fmap[0]; // original map lookup
bmap["foo"]; //reverse map lookup
I need to create a data structure that can access elements by a string key, or by their ordinal.
the class currently uses an array of nodes that contain the string key and a pointer to whatever element. This allows for O(n) looping through, or O(1) getting an element by ordinal, however the only way I've found to find an element by key is doing an O(n) loop and comparing keys until I find what I want, which is SLOW when there are 1000+ elements. is there a way to use the key to reference the pointer, or am I out of luck?
EDIT: the by ordinal is not so much important as the O(n) looping. This is going to be used as a base structure that will be inherited for use in other ways, for instance, if it was a structure of draw able objects, i'd want to be able to draw all of them in a single loop
You can use std::map for O(log n) searching speed. View this branch for more details. In this branch exactly your situation (fast retrieving values by string or/and ordinal key) is discussed.
Small example (ordinal keys are used, you can do similiar things with strings):
#include <map>
#include <string>
using std::map;
using std::string;
struct dummy {
unsigned ordinal_key;
string dummy_body;
};
int main()
{
map<unsigned, dummy> lookup_map;
dummy d1;
d1.ordinal_key = 10;
lookup_map[d1.ordinal_key] = d1;
// ...
unsigned some_key = 20;
//determing if element with desired key is presented in map
if (lookup_map.find(some_key) != lookup_map.end())
//do stuff
}
If you seldom modify your array you can just keep it sorted and use binary_search on it to find the element by key in O(logn) time (technically O(klogn) since you're using strings [where k is the average length of a key string]).
Of course this (just like using a map or unordered_map) will mess up your ordinal retrieval since the elements are going to be stored in sorted order not insertion order.
Use vector and map:
std::vector<your_struct> elements;
std::map<std::string, int> index;
Map allows you to retrieve the key's index in O(lg n) time, whereas the vector allows O(1) element access by index.
Use a hashmap
I'm currently working on a DNA database class and I currently associate each row in the database with both a match score (based on edit distance) and the actual DNA sequence itself, is it safe to modify first this way within an iteration loop?
typedef std::pair<int, DnaDatabaseRow> DnaPairT;
typedef std::vector<DnaPairT> DnaDatabaseT;
// ....
for(DnaDatabaseT::iterator it = database.begin();
it != database.end(); it++)
{
int score = it->second.query(query);
it->first = score;
}
The reason I am doing this is so that I can sort them by score later. I have tried maps and received a compilation error about modifying first, but is there perhaps a better way than this to store all the information for sorting later?
To answer your first question, yes. It is perfectly safe to modify the members of your pair, since the actual data in the pair does not affect the vector itself.
edit: I have a feeling that you were getting an error when using a map because you tried to modify the first value of the map's internal pair. That would not be allowed because that value is part of the map's inner workings.
As stated by dribeas:
In maps you cannot change first as it would break the invariant of the map being a sorted balanced tree
edit: To answer your second question, I see nothing at all wrong with the way you are structuring the data, but I would have the database hold pointers to DnaPairT objects, instead of the objects themselves. This would dramatically reduce the amount of memory that gets copied around during the sort procedure.
#include <vector>
#include <utility>
#include <algorithm>
typedef std::pair<int, DnaDatabaseRow> DnaPairT;
typedef std::vector<DnaPairT *> DnaDatabaseT;
// ...
// your scoring code, modified to use pointers
void calculateScoresForQuery(DnaDatabaseT& database, queryT& query)
{
for(DnaDatabaseT::iterator it = database.begin(); it != database.end(); it++)
{
int score = (*it)->second.query(query);
(*it)->first = score;
}
}
// custom sorting function to handle DnaPairT pointers
bool sortByScore(DnaPairT * A, DnaPairT * B) { return (A->first < B->first); }
// function to sort the database
void sortDatabaseByScore(DnaDatabaseT& database)
{
sort(database.begin(), database.end(), sortByScore);
}
// main
int main()
{
DnaDatabaseT database;
// code to load the database with DnaPairT pointers ...
calculateScoresForQuery(database, query);
sortDatabaseByScore(database);
// code that uses the sorted database ...
}
The only reason you might need to look into more efficient methods is if your database is so enormous that the sorting loop takes too long to complete. If that is the case, though, I would imagine that your query function would be the one taking up most of the processing time.
You can't modify since the variable first of std::pair is defined const