I am trying to represent a hash table as a vector of pair < string, int>. I am using a hash function to return the value of the index of the vector where I wish to place the pair. I have been able to successfully create a pair and index the pair's string with the hash function. Now that I know where I want to place my pair in my vector I try to put it there but my program has a segmentation fault at this point.
My hash function:
size_t hashfunction(const string& ident){
unsigned hash = 0;
for(int i = 0; i < ident.size(); ++i) {
char c = ident[i];
hash ^= c + 0x9e3779b9 + (hash<<6) + (hash>>2);
}
return hash;
}
My main function:
int main(){
vector < pair < string, int > > hashtable;
pair <string, int> testone ("bartering", 5);
size_t testoneindex = hashfunction(testone.first);
hashtable[testoneindex] = testone;
return 0;
}
This section of code compiles but produces a segmentation fault at the line
hashtable[testoneindex] = testone;
What am I doing wrong?
You cannot realistically have your container done this way because of the memory required. Instead you'd want the container and insertion code to be closer to classic hash container design, something like this:
typedef pair <string, int> value_t;
value_t val;
vector<list<value_t>> buckets;
buckets.resize(current_size);
auto& bucket = buckets[hashfunc(val.first) % buckets.size()];
auto itr = find_if(bucket.begin(), bucket.end(), [&](value_t const& other) {
return other.first == val.first;
});
if (itr == bucket.end()) bucket.push_back(val);
You need to modulo your hash index down to the range of indices in your vector. For example, initialize your vector to have 1000 buckets, and use hashfunction(..) % 1000.
The std::vector<...> you created is empty. Placing an object anywhere in this object won't work. You need to resize the hashtable object to a suitable size, i.e., you need to give that object the number of buckets, e.g., using
std::size_t number_of_buckets = ...;
std::vector<std::pair<std::string, int> > hashtable(number_of_buckets);
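Put together with the modulo step, the fix might look like the sketch below. It reuses the hash function from the question; the bucket count `kBuckets` and the helper `place` are hypothetical names chosen for illustration, and collisions are deliberately not handled here.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hash function from the question, unchanged apart from the loop index type.
std::size_t hashfunction(const std::string& ident) {
    unsigned hash = 0;
    for (std::size_t i = 0; i < ident.size(); ++i) {
        char c = ident[i];
        hash ^= c + 0x9e3779b9 + (hash << 6) + (hash >> 2);
    }
    return hash;
}

// Hypothetical bucket count, chosen only for illustration.
const std::size_t kBuckets = 1000;

// Stores the entry in a pre-sized (non-empty) table and returns the bucket
// index used. A collision simply overwrites the previous entry in this sketch.
std::size_t place(std::vector<std::pair<std::string, int> >& table,
                  const std::pair<std::string, int>& entry) {
    std::size_t index = hashfunction(entry.first) % table.size();
    table[index] = entry;
    return index;
}
```

The table has to be constructed with its size up front, e.g. `std::vector<std::pair<std::string, int> > hashtable(kBuckets);`, before `place` is called.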
Note that the approach you take for hashing is a bit too simplistic, though: especially for a smaller number of buckets, there is a chance that two different hashes are keyed to the same bucket. That is, you'll need to deal with collisions. The two approaches for dealing with collisions I'm aware of are:
Determine a new bucket with a key if the first bucket found is already used (and keep searching for new buckets until an empty bucket is found). The main issue with this approach is that you can't really remove objects, because other rebucketed objects that were placed past the removed one could then no longer be found.
Use a list in each bucket for all the keys which use the same bucket. This approach has the additional advantage that you don't need to create any key or value until a bucket is actually used (you'd have the lists, though, but this can be made fairly cheap).
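The first approach can be sketched as linear probing. This is only one possible probing scheme, and `Slot`, `probe_insert` and `probe_find` are hypothetical names; it assumes C++11 for `std::hash` and leaves out removal entirely.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One slot of an open-addressing table; `used` marks whether it holds an entry.
struct Slot {
    bool used;
    std::string key;
    int value;
    Slot() : used(false), value(0) {}
};

// Inserts or updates a key using linear probing.
// Returns false if the table is empty or every slot is taken by other keys.
bool probe_insert(std::vector<Slot>& table, const std::string& key, int value) {
    if (table.empty()) return false;
    std::size_t start = std::hash<std::string>()(key) % table.size();
    for (std::size_t step = 0; step < table.size(); ++step) {
        Slot& slot = table[(start + step) % table.size()];
        if (!slot.used || slot.key == key) {
            slot.used = true;
            slot.key = key;
            slot.value = value;
            return true;
        }
    }
    return false;
}

// Returns a pointer to the stored value, or a null pointer if the key is absent.
// The search stops at the first unused slot, which is why erasing an entry by
// just clearing its slot would hide entries that were placed after it.
const int* probe_find(const std::vector<Slot>& table, const std::string& key) {
    if (table.empty()) return nullptr;
    std::size_t start = std::hash<std::string>()(key) % table.size();
    for (std::size_t step = 0; step < table.size(); ++step) {
        const Slot& slot = table[(start + step) % table.size()];
        if (!slot.used) return nullptr;
        if (slot.key == key) return &slot.value;
    }
    return nullptr;
}
```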
Related
I am trying to use an unordered_map with another unordered_map as a key (custom hash function). I've also added a custom equal function, even though it's probably not needed.
The code does not do what I expect, but I can't make heads or tails of what's going on. For some reason, the equal function is not called when doing find(), which is what I'd expect.
unsigned long hashing_func(const unordered_map<char,int>& m) {
string str;
for (auto& e : m)
str += e.first;
return hash<string>()(str);
}
bool equal_func(const unordered_map<char,int>& m1, const unordered_map<char,int>& m2) {
return m1 == m2;
}
int main() {
unordered_map<
unordered_map<char,int>,
string,
function<unsigned long(const unordered_map<char,int>&)>,
function<bool(const unordered_map<char,int>&, const unordered_map<char,int>&)>
> mapResults(10, hashing_func, equal_func);
unordered_map<char,int> t1 = getMap(str1);
unordered_map<char,int> t2 = getMap(str2);
cout<<(t1 == t2)<<endl; // returns TRUE
mapResults[t1] = "asd";
cout<<(mapResults.find(t2) != mapResults.end()); // returns FALSE
return 0;
}
First of all, the equality operator is certainly required, so you should keep it.
Let's look at your unordered map's hash function:
string str;
for (auto& e : m)
str += e.first;
return hash<string>()(str);
Since it's an unordered map, by definition, the iterator can iterate over the unordered map's keys in any order. However, since the hash function must produce the same hash value for the same key, this hash function will obviously fail in that regard.
Additionally, I would expect the hash function to include the values of the unordered map key, in addition to the keys themselves. I suppose you might want it this way, so that two unordered maps are considered the same key as long as their keys are the same, ignoring their values. It's not clear from the question what your expectation is, but you may want to think it over.
Comparing two std::unordered_map objects using == compares whether the maps contain the same elements (the same keys mapped to the same values). It does nothing to tell whether they contain them in the same order (it's an unordered map, after all). However, your hashing_func depends on the order of items in the map: hash<string>()("ab") is in general different from hash<string>()("ba").
A good place to start is with what hashing_func returns for each map, or more easily what the string construction in hashing_func generates.
A more obviously correct hash function for such a type could be:
unsigned long hashing_func(const unordered_map<char,int>& m) {
unsigned long res = 0;
for (auto& e : m)
res ^= hash<char>()(e.first) ^ hash<int>()(e.second);
return res;
}
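A variation on the same idea, as a sketch: hash each entry on its own and add the per-entry hashes together, so the result does not depend on iteration order. The names `CharCount`, `CharCountHash` and `ResultMap` are hypothetical, and the multiplier 31 is an arbitrary choice rather than anything from the question.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

typedef std::unordered_map<char, int> CharCount;

// Order-independent hash: each entry is hashed on its own and the per-entry
// hashes are summed, so iteration order does not affect the result.
struct CharCountHash {
    std::size_t operator()(const CharCount& m) const {
        std::size_t total = 0;
        for (CharCount::const_iterator it = m.begin(); it != m.end(); ++it) {
            std::size_t h = std::hash<char>()(it->first);
            // Multiply before adding the value hash so that key and value
            // play different roles in the per-entry hash.
            h = h * 31 + std::hash<int>()(it->second);
            total += h;
        }
        return total;
    }
};

// Key equality falls back to operator== on the inner maps.
typedef std::unordered_map<CharCount, std::string, CharCountHash> ResultMap;
```

With this hash, two inner maps that compare equal always land on the same hash value, so `find` reaches the equality check.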
I have a vector of structs which have about 100 members within the struct. The vector itself can grow to be as large as 1000 elements. I am trying to find a simple way to search the list based on a set of 3 elements every struct contains amongst its many members:
std::string firstName;
std::string lastName;
size_t age;
I'm trying to find a way to search the vector based on a key derived from these three values, rather than iterating through the list and doing something like:
for ( size_t i = 0; i < list.size(); i++ )
{
    if (list[i].lastName == lastNameToFind &&
        list[i].firstName == firstNameToFind &&
        list[i].age == ageToFind)
    {
        // found the element
    }
}
I am looking for faster methods that take advantage of the underlying logic in std::vector to operate more efficiently, and if I want to search by different key tuples, I just change a couple lines of code rather than writing another search function. Is such an approach possible?
You could use std::find_if and provide a lambda as a predicate. It will be simpler and more flexible but I'm not sure it will necessarily be faster.
auto findByNameAndAge = [&lastNameToFind, &firstNameToFind, &ageToFind]
(const MyStruct& s) {
return s.lastName == lastNameToFind &&
s.firstName == firstNameToFind &&
s.age == ageToFind;
};
auto result = std::find_if(list.begin(), list.end(), findByNameAndAge);
Alternatively, you could create a comparison operator with a key tuple or struct
using MyKey = std::tuple<std::string, std::string, int>;
bool operator==(const MyStruct& s, const MyKey& key){
return std::tie(s.lastName, s.firstName, s.age) == key;
}
and use std::find:
auto key = MyKey{"Smith", "John", 10};
auto result = std::find(list.begin(), list.end(), key);
If you want faster search you might need to reconsider how you are storing the structs. Perhaps maintain indexes or keep the vector sorted but this may impact the performance of insertions.
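As a sketch of the "keep the vector sorted" option: if the vector is sorted by (lastName, firstName, age), std::lower_bound can find a match in O(log n). `Person` is a hypothetical stand-in for the question's struct, and the helper names are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Stand-in for the question's large struct; only the three key members are shown.
struct Person {
    std::string firstName;
    std::string lastName;
    std::size_t age;
};

typedef std::tuple<std::string, std::string, std::size_t> PersonKey;

// Orders people by (lastName, firstName, age); use this to sort the vector.
bool lessByKey(const Person& a, const Person& b) {
    return std::tie(a.lastName, a.firstName, a.age) <
           std::tie(b.lastName, b.firstName, b.age);
}

// Compares a person against a bare key, in the form std::lower_bound expects.
bool lessThanKey(const Person& p, const PersonKey& key) {
    return std::tie(p.lastName, p.firstName, p.age) < key;
}

// Binary search in a vector already sorted with lessByKey.
// Returns a pointer to the match, or a null pointer if there is none.
const Person* findSorted(const std::vector<Person>& people, const PersonKey& key) {
    std::vector<Person>::const_iterator it =
        std::lower_bound(people.begin(), people.end(), key, lessThanKey);
    if (it != people.end() &&
        std::tie(it->lastName, it->firstName, it->age) == key) {
        return &*it;
    }
    return nullptr;
}
```

Searching by a different key tuple means changing the two comparison functions and re-sorting, which is the insertion cost mentioned above.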
First off, why put it in a vector? I believe an unordered_map might be better off with the hash:
[&last_name, &first_name, age]()
{
return std::hash<std::string>()(last_name+","+first_name) ^ age;
};
I think ^ is a good way of merging two hashes into one. Maybe google that part?
If you insist on a vector, maybe make a smart_ptr and store that in your vector then an unordered_map with the smart_ptr as a value.
PS:
OK, xor is a crappy way to combine hashes. Use boost::hash_combine or this answer.
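A sketch of the unordered_map idea with a combining step in place of plain xor. The `combine` helper below is a hand-written function modelled on the widely quoted boost::hash_combine formula, not the Boost function itself, and `NameAgeKey`, `NameAgeHash` and `NameAgeIndex` are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <tuple>
#include <unordered_map>

typedef std::tuple<std::string, std::string, std::size_t> NameAgeKey;

// Folds the hash of `v` into `seed`. Unlike plain xor, the result depends on
// the order in which the fields are combined.
template <typename T>
void combine(std::size_t& seed, const T& v) {
    seed ^= std::hash<T>()(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hashes (lastName, firstName, age) field by field.
struct NameAgeHash {
    std::size_t operator()(const NameAgeKey& k) const {
        std::size_t seed = 0;
        combine(seed, std::get<0>(k));
        combine(seed, std::get<1>(k));
        combine(seed, std::get<2>(k));
        return seed;
    }
};

// Maps (lastName, firstName, age) to an index into the vector of structs.
typedef std::unordered_map<NameAgeKey, std::size_t, NameAgeHash> NameAgeIndex;
```

The vector keeps the structs, and the index gives average O(1) lookup by key, provided it is updated whenever the vector changes.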
I currently have a std::map<std::string,int> that stores an integer value to a unique string identifier, and I do look up with the string. It does mostly what I want, except that it does not keep track of the insertion order. So when I iterate the map to print out the values, they are sorted according to the string; but I want them to be sorted according to the order of (first) insertion.
I thought about using a vector<pair<string,int>> instead, but I need to look up the string and increment the integer values about 10,000,000 times, so I don't know whether a std::vector will be significantly slower.
Is there a way to use std::map or is there another std container that better suits my need?
I'm on GCC 3.4, and I have probably no more than 50 pairs of values in my std::map.
If you have only 50 values in std::map you could copy them to std::vector before printing out and sort via std::sort using appropriate functor.
Or you could use boost::multi_index. It lets you use several indexes.
In your case it could look like the following:
struct value_t {
string s;
int i;
};
struct string_tag {};
typedef multi_index_container<
value_t,
indexed_by<
random_access<>, // this index represents insertion order
hashed_unique< tag<string_tag>, member<value_t, string, &value_t::s> >
>
> values_t;
You might combine a std::vector with a std::tr1::unordered_map (a hash table). Here's a link to Boost's documentation for unordered_map. You can use the vector to keep track of the insertion order and the hash table to do the frequent lookups. If you're doing hundreds of thousands of lookups, the difference between O(log n) lookup for std::map and O(1) for a hash table might be significant.
std::vector<std::string> insertOrder;
std::tr1::unordered_map<std::string, long> myTable;
// Initialize the hash table and record insert order.
myTable["foo"] = 0;
insertOrder.push_back("foo");
myTable["bar"] = 0;
insertOrder.push_back("bar");
myTable["baz"] = 0;
insertOrder.push_back("baz");
/* Increment things in myTable 100000 times */
// Print the final results.
for (int i = 0; i < insertOrder.size(); ++i)
{
const std::string &s = insertOrder[i];
std::cout << s << ' ' << myTable[s] << '\n';
}
Tessil has a very nice implementation of an ordered map (and set), released under the MIT license. You can find it here: ordered-map
Map example
#include <iostream>
#include <string>
#include <cstdlib>
#include "ordered_map.h"
int main() {
tsl::ordered_map<char, int> map = {{'d', 1}, {'a', 2}, {'g', 3}};
map.insert({'b', 4});
map['h'] = 5;
map['e'] = 6;
map.erase('a');
// {d, 1} {g, 3} {b, 4} {h, 5} {e, 6}
for(const auto& key_value : map) {
std::cout << "{" << key_value.first << ", " << key_value.second << "}" << std::endl;
}
map.unordered_erase('b');
// Break order: {d, 1} {g, 3} {e, 6} {h, 5}
for(const auto& key_value : map) {
std::cout << "{" << key_value.first << ", " << key_value.second << "}" << std::endl;
}
}
Keep a parallel list<string> insertionOrder.
When it is time to print, iterate on the list and do lookups into the map.
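// `counts` is the map<string,int>; walks in insertionOrder..
for (list<string>::const_iterator it = insertionOrder.begin(); it != insertionOrder.end(); ++it)
    cout << *it << ' ' << counts[*it] << '\n'; // but lookup is in the map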
If you need both lookup strategies, you will end up with two containers. You may use a vector with your actual values (ints), and put a map< string, vector< T >::difference_type> next to it, returning the index into the vector.
To complete all that, you may encapsulate both in one class.
But I believe boost has a container with multiple indices.
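Without Boost, the encapsulation could look roughly like this sketch. `InsertionOrderedCounter` is a hypothetical class name; it sticks to C++98 features since the question mentions GCC 3.4.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical wrapper: values live in a vector in insertion order,
// and a map translates each key to its position in that vector.
class InsertionOrderedCounter {
public:
    // Adds `amount` to the counter for `key`, creating it at the back if new.
    void add(const std::string& key, int amount) {
        std::map<std::string, std::size_t>::iterator it = index_.find(key);
        if (it == index_.end()) {
            index_[key] = entries_.size();
            entries_.push_back(std::make_pair(key, amount));
        } else {
            entries_[it->second].second += amount;
        }
    }

    // Entries in order of first insertion.
    const std::vector<std::pair<std::string, int> >& entries() const {
        return entries_;
    }

private:
    std::vector<std::pair<std::string, int> > entries_;
    std::map<std::string, std::size_t> index_;
};
```

Printing in insertion order is then a plain loop over `entries()`, with no lookups at all.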
What you want (without resorting to Boost) is what I call an "ordered hash", which is essentially a mashup of a hash and a linked list with string or integer keys (or both at the same time). An ordered hash maintains the order of the elements during iteration with the absolute performance of a hash.
I've been putting together a relatively new C++ snippet library that fills in what I view as holes in the C++ language for C++ library developers. Go here:
https://github.com/cubiclesoft/cross-platform-cpp
Grab:
templates/detachable_ordered_hash.cpp
templates/detachable_ordered_hash.h
templates/detachable_ordered_hash_util.h
If user-controlled data will be placed into the hash, you might also want:
security/security_csprng.cpp
security/security_csprng.h
Invoke it:
#include "templates/detachable_ordered_hash.h"
...
// The 47 is the nearest prime to a power of two
// that is close to your data size.
//
// If your brain hurts, just use the lookup table
// in 'detachable_ordered_hash.cpp'.
//
// If you don't care about some minimal memory thrashing,
// just use a value of 3. It'll auto-resize itself.
int y;
CubicleSoft::OrderedHash<int> TempHash(47);
// If you need a secure hash (many hashes are vulnerable
// to DoS attacks), pass in two randomly selected 64-bit
// integer keys. Construct with CSPRNG.
// CubicleSoft::OrderedHash<int> TempHash(47, Key1, Key2);
CubicleSoft::OrderedHashNode<int> *Node;
...
// Push() for string keys takes a pointer to the string,
// its length, and the value to store. The new node is
// pushed onto the end of the linked list and wherever it
// goes in the hash.
y = 80;
TempHash.Push("key1", 5, y++);
TempHash.Push("key22", 6, y++);
TempHash.Push("key3", 5, y++);
// Adding an integer key into the same hash just for kicks.
TempHash.Push(12345, y++);
...
// Finding a node and modifying its value.
Node = TempHash.Find("key1", 5);
Node->Value = y++;
...
Node = TempHash.FirstList();
while (Node != NULL)
{
if (Node->GetStrKey()) printf("%s => %d\n", Node->GetStrKey(), Node->Value);
else printf("%d => %d\n", (int)Node->GetIntKey(), Node->Value);
Node = Node->NextList();
}
I ran into this SO thread during my research phase to see if anything like OrderedHash already existed without requiring me to drop in a massive library. I was disappointed. So I wrote my own. And now I've shared it.
Here is a solution that requires only the standard template library, without using Boost's multi_index:
You could use a std::map<std::string,int> together with a std::vector<data>: the map stores, for each string, the index of its data in the vector, and the vector stores the data in insertion order. Here access to data has O(log n) complexity, displaying data in insertion order has O(n) complexity, and insertion of data has O(log n) complexity.
For Example:
#include <iostream>
#include <map>
#include <string>
#include <vector>
struct data{
    int value;
    std::string s;
};
typedef std::map<std::string,int> MapIndex;//this map stores the index of data stored
                                           //in VectorData mapped to a string
typedef std::vector<data> VectorData;//stores the data in insertion order
void display_data_according_insertion_order(const VectorData& vectorData){
    for(VectorData::const_iterator it=vectorData.begin();it!=vectorData.end();++it){
        std::cout<<it->value<<it->s<<std::endl;
    }
}
int lookup_string(const std::string& s,const MapIndex& mapIndex){
    MapIndex::const_iterator pt=mapIndex.find(s);
    if (pt!=mapIndex.end())return pt->second;
    else return -1;//it signifies that key does not exist in map
}
//the containers are passed by reference so that the insertion is visible to the caller
int insert_value(const data& d,MapIndex& mapIndex,VectorData& vectorData){
    if(mapIndex.find(d.s)==mapIndex.end()){
        //as the data is to be inserted at back
        //therefore index is size of vector before insertion
        mapIndex.insert(std::make_pair(d.s,static_cast<int>(vectorData.size())));
        vectorData.push_back(d);
        return 1;
    }
    else return 0;//it signifies that insertion of data is failed due to the presence
                  //string in the map and map stores unique keys
}
You cannot do that with a map, but you could use two separate structures - the map and the vector and keep them synchronized - that is when you delete from the map, find and delete the element from the vector. Or you could create a map<string, pair<int,int>> - and in your pair store the size() of the map upon insertion to record position, along with the value of the int, and then when you print, use the position member to sort.
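The map<string, pair<int,int>> variant could be sketched as follows. The names `PositionedMap`, `increment` and `keysInInsertionOrder` are hypothetical, and using size() as the position assumes that entries are never erased.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// first = insertion position, second = counter value
typedef std::map<std::string, std::pair<int, int> > PositionedMap;

// Increments the counter for `key`, recording the position on first insertion.
// Using size() as the position is only valid if entries are never erased.
void increment(PositionedMap& m, const std::string& key) {
    PositionedMap::iterator it = m.find(key);
    if (it == m.end()) {
        int position = static_cast<int>(m.size());
        m[key] = std::make_pair(position, 1);
    } else {
        ++it->second.second;
    }
}

// Returns the keys ordered by recorded position. The (position, key) pairs
// sort by position first because that is how pair's operator< works.
std::vector<std::string> keysInInsertionOrder(const PositionedMap& m) {
    std::vector<std::pair<int, std::string> > byPosition;
    for (PositionedMap::const_iterator it = m.begin(); it != m.end(); ++it) {
        byPosition.push_back(std::make_pair(it->second.first, it->first));
    }
    std::sort(byPosition.begin(), byPosition.end());
    std::vector<std::string> keys;
    for (std::size_t i = 0; i < byPosition.size(); ++i) {
        keys.push_back(byPosition[i].second);
    }
    return keys;
}
```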
One thing you need to consider is the small number of data elements you are using. It is possible that it will be faster to use just the vector. There is some overhead in the map that can cause it to be more expensive to do lookups in small data sets than the simpler vector. So, if you know that you will always be using around the same number of elements, do some benchmarking and see if the performance of the map and vector is what you really think it is. You may find the lookup in a vector with only 50 elements is near the same as the map.
Another way to implement this is with a map instead of a vector. I will show you this approach and discuss the differences:
Just create a class that has two maps behind the scenes.
#include <map>
#include <string>
using namespace std;
class SpecialMap {
// usual stuff...
private:
int counter_;
map<int, string> insertion_order_;
map<string, int> data_;
};
You can then expose an iterator to iterate over data_ in the proper order. The way you do that is iterate through insertion_order_, and for each element you get from that iteration, do a lookup in data_ with the value from insertion_order_.
You can use the more efficient hash_map for data_ since you don't care about directly iterating through data_.
To do inserts, you can have a method like this:
void SpecialMap::Insert(const string& key, int value) {
// This may be an over simplification... You ought to check
// if you are overwriting a value in data_ so that you can update
// insertion_order_ accordingly
insertion_order_[counter_++] = key;
data_[key] = value;
}
There are a lot of ways you can make the design better and worry about performance, but this is a good skeleton to get you started on implementing this functionality on your own. You can make it templated, and you might actually store pairs as values in data_ so that you can easily reference the entry in insertion_order_. But I leave these design issues as an exercise :-).
Update: I suppose I should say something about efficiency of using map vs. vector for insertion_order_
lookups directly into data_ are the same in both cases: O(logn) with a std::map, or O(1) on average if data_ is a hash map
inserts in the vector approach are O(1), inserts in the map approach are O(logn)
deletes in the vector approach are O(n) because you have to scan for the item to remove. With the map approach they are O(logn).
Maybe if you are not going to use deletes as much, you should use the vector approach. The map approach would be better if you were supporting a different ordering (like priority) instead of insertion order.
This is somewhat related to Faisal's answer. You can just create a wrapper class around a map and vector and easily keep them synchronized. Proper encapsulation will let you control the access method and hence which container to use... the vector or the map. This avoids using Boost or anything like that.
// It should look like this: a third map gives a reverse lookup for fast deletes.
// This keeps the complexity of insertion at O(logN), and deletion is also O(logN).
class SpecialMap {
private:
int counter_;
map<int, string> insertion_order_;
map<string, int> insertion_order_reverse_look_up; // <- for fast delete
map<string, Data> data_;
};
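A sketch of how the three maps might cooperate on insert and erase. `OrderedMap` and its member names are hypothetical, and `int` stands in for the `Data` type, which is not defined in the answer.

```cpp
#include <map>
#include <string>
#include <vector>

// Sketch of the three-map layout; `int` stands in for the answer's Data type.
class OrderedMap {
public:
    OrderedMap() : counter_(0) {}

    // Inserts or overwrites; a key keeps its original position when overwritten.
    void insert(const std::string& key, int value) {
        if (data_.find(key) == data_.end()) {
            insertion_order_[counter_] = key;
            reverse_[key] = counter_;
            ++counter_;
        }
        data_[key] = value;
    }

    // Removes a key in O(log n) by looking up its sequence number first.
    void erase(const std::string& key) {
        std::map<std::string, int>::iterator it = reverse_.find(key);
        if (it == reverse_.end()) return;
        insertion_order_.erase(it->second);
        reverse_.erase(it);
        data_.erase(key);
    }

    // Keys in insertion order.
    std::vector<std::string> keys() const {
        std::vector<std::string> result;
        for (std::map<int, std::string>::const_iterator it = insertion_order_.begin();
             it != insertion_order_.end(); ++it) {
            result.push_back(it->second);
        }
        return result;
    }

private:
    int counter_;
    std::map<int, std::string> insertion_order_;
    std::map<std::string, int> reverse_;  // key -> sequence number
    std::map<std::string, int> data_;
};
```

Because the sequence numbers only ever grow, an erased key that is inserted again moves to the end of the order.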
There is no need to use a separate std::vector or any other container for keeping track of the insertion order. You can do what you want as shown below.
If you want to keep the insertion order then you can use the following program(version 1):
Version 1: For counting unique strings using std::map<std::string,int> in insertion order
#include <iostream>
#include <map>
#include <sstream>
int findExactMatchIndex(const std::string &totalString, const std::string &toBeSearched)
{
std::istringstream ss(totalString);
std::string word;
std::size_t index = 0;
while(ss >> word)
{
if(word == toBeSearched)
{
return index;
}
++index;
}
return -1;//return -1 when the string to be searched is not inside the inputString
}
int main() {
std::string inputString = "this is a string containing my name again and again and again ", word;
//this map maps the std::string to their respective count
std::map<std::string, int> wordCount;
std::istringstream ss(inputString);
while(ss >> word)
{
//std::cout<<"word:"<<word<<std::endl;
wordCount[word]++;
}
std::cout<<"Total unique words are: "<<wordCount.size()<<std::endl;
std::size_t i = 0;
std::istringstream gothroughStream(inputString);
//just go through the inputString(stream) instead of map
while( gothroughStream >> word)
{
int index = findExactMatchIndex(inputString, word);
if(index != -1 && (index == i)){
std::cout << word <<"-" << wordCount.at(word)<<std::endl;
}
++i;
}
return 0;
}
The output of the above program is as follows:
Total unique words are: 9
this-1
is-1
a-1
string-1
containing-1
my-1
name-1
again-3
and-2
Note that in the above program, if you have a comma or any other delimiter then it is counted as part of the word. So for example, let's say you have the string "this is, my name is": then the string "is," has a count of 1 and the string "is" has a count of 1. That is, "is," and "is" are different. This is because the computer doesn't know our definition of a word.
Note
The above program is a modification of my answer to How do i make the char in an array output in order in this nested for loop? which is given as version 2 below:
Version 2: For counting unique characters using std::map<char, int> in insertion order
#include <iostream>
#include <map>
int main() {
std::string inputString;
std::cout<<"Enter a string: ";
std::getline(std::cin,inputString);
//this map maps the char to their respective count
std::map<char, int> charCount;
for(char &c: inputString)
{
charCount[c]++;
}
std::size_t i = 0;
//just go through the inputString instead of map
for(char &c: inputString)
{
std::size_t index = inputString.find(c);
if(index != inputString.npos && (index == i)){
std::cout << c <<"-" << charCount.at(c)<<std::endl;
}
++i;
}
return 0;
}
In both cases/versions there is no need to use a separate std::vector or any other container to keep track of the insertion order.
Use boost::multi_index with map and list indices.
Use a map of (str, int) pairs together with a static int that is incremented on every insert call and so indexes the pairs of data. Perhaps put it in a struct that can return the static int value through an index() member?
I have a vector of some data type (let's say int) and I need to push back only unique values from a file. I am new to the STL, so I don't know how to do it using a map, as I read that a map only takes unique values. If I simply push back, the vector will take all the values irrespective of their uniqueness.
The correct container to use for unique values is either std::set or std::unordered_set:
std::set<int> s;
s.insert(4); // s has size 1
s.insert(5); // s has size 2
s.insert(4); // s still has size 2
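If the vector itself is still needed, for example to keep the values in the order they were read, the set can act as a filter. A minimal sketch, with `push_back_unique` as a hypothetical helper name:

```cpp
#include <set>
#include <vector>

// Appends `value` to `out` only if it has not been seen before.
// `seen` remembers every value encountered so far.
// Returns true when the value was new.
bool push_back_unique(std::vector<int>& out, std::set<int>& seen, int value) {
    // insert() returns a pair whose second member is true for a new element.
    if (seen.insert(value).second) {
        out.push_back(value);
        return true;
    }
    return false;
}
```

Calling it once per value read from the file leaves `out` with each distinct value exactly once, in order of first appearance.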
If you want to use vector, you'd have to maintain it sorted, which is a lot more code and work, and doesn't have the nice characteristic of set that everybody knows the contents are unique:
void add_value(std::vector<int>& v, int value) {
// do a binary search to find value
std::vector<int>::iterator it = std::lower_bound(v.begin(), v.end(), value);
if (it != v.end() && *it == value) {
// duplicate - do nothing
}
else {
// insert our value here
v.insert(it, value);
}
}
... or I guess you could delete the duplicates at the end using a rarely-used algorithm (std::unique) that will probably raise some eyebrows:
void uniqify(std::vector<int>& v) {
std::sort(v.begin(), v.end());
v.erase(std::unique(v.begin(), v.end()), v.end());
}
[UPDATE] It has been pointed out to me that I completely misunderstood your question - and that you may have been looking for just which values occur exactly once - not a list of which values occur without duplicate. For that, the correct container to use is either a std::map or std::unordered_map - so you can associate a count with a particular key:
std::map<int, int> keyCounts;
int value;
while (fileStream >> value) { // or whatever
++keyCounts[value]; // operator[] gives us a reference to the value
// if it wasn't present before, it'll insert a default
// one - which for int is zero - so this handles
// both cases correctly
}
// Now, any key with value 1 is a unique key
// what you want to do with them is up to you
// e.g., let's put it in a vector
std::vector<int> uniq;
uniq.reserve(keyCounts.size());
for (std::map<int, int>::iterator it = keyCounts.begin(); it != keyCounts.end(); ++it)
{
if (it->second == 1) {
uniq.push_back(it->first);
}
}
A std::map will let you handle a mapping of unique keys to some values (which may or may not be unique). Math-wise, you may see it as a surjective function from the set of keys to the set of values of your dataset.
If your goal is to keep unique indices (or keys), then std::map is what you need. Otherwise, use std::set to store unique values.
Now, to keep only unique values from your dataset, you basically want to remove values which appear more than once. The simplest algorithm is to add values from the file as keys in a map, with its corresponding value being a counter for the number of occurrences of that entry in the file. Initialize a counter to 1 the first time the value is met in the file, and increment it each time it is met again. After having parsed the whole file, simply keep the keys whose values are exactly 1.
Counting the values:
template <typename Key>
void count(std::istream &is, std::map<Key,int> &map){
    Key value;
    while (is >> value){ // stops at end of file or on a read error
        auto it = map.find(value);
        if (it == map.end())
            map[value] = 1;
        else ++(it->second);
    }
}
The above assumes that operator>> has been overloaded to extract values of the key type from the stream sequentially. You will have to adapt the algorithm to fit your own way of parsing the data.
Filtering the resulting map to keep unique values cannot be done with std::remove_if, because a map's elements cannot be reassigned; instead, erase every entry for which a function returns true when the counter is above 1:
The function:
bool duplicate (const std::pair<const int,int> &entry){ return entry.second > 1;}
The map filtering:
for (auto it = map.begin(); it != map.end(); )
    if (duplicate(*it)) it = map.erase(it); else ++it;
I have this small program that reads a line of input & prints the words in it, with their respective number of occurrences. I want to sort the elements in the map that stores these values according to their occurrences. I mean, the words that only appear once will be ordered to be at the beginning, then the words that appeared twice & so on. I know that the predicate should return a bool value, but I don't know what the parameters should be. Should it be two iterators to the map? If someone could explain this, it would be greatly appreciated. Thank you in advance.
#include<iostream>
#include<map>
using std::cout;
using std::cin;
using std::endl;
using std::string;
using std::map;
int main()
{
string s;
map<string,int> counters; //store each word & an associated counter
//read the input, keeping track of each word & how often we see it
while(cin>>s)
{
++counters[s];
}
//write the words & associated counts
for(map<string,int>::const_iterator iter = counters.begin();iter != counters.end();iter++)
{
cout<<iter->first<<"\t"<<iter->second<<endl;
}
return 0;
}
std::map is always sorted according to its key. You cannot sort the elements by their value.
You need to copy the contents to another data structure (for example std::vector<std::pair<string, int> >) which can be sorted.
Here is a predicate that can be used to sort such a vector. Note that sorting algorithms in C++ standard library need a "less than" predicate which basically says "is a smaller than b".
bool cmp(std::pair<string, int> const &a, std::pair<string, int> const &b) {
return a.second < b.second;
}
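Combined with the copy step, this might look like the sketch below. `sortedByCount` is a hypothetical helper, and std::stable_sort is used here (an assumption beyond the answer) so that words with equal counts stay in the map's alphabetical order.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// "Less than" predicate on the count; same idea as the cmp above.
bool byCount(const std::pair<std::string, int>& a,
             const std::pair<std::string, int>& b) {
    return a.second < b.second;
}

// Copies the map into a vector and sorts the copy by ascending count.
std::vector<std::pair<std::string, int> >
sortedByCount(const std::map<std::string, int>& counters) {
    std::vector<std::pair<std::string, int> > result(counters.begin(), counters.end());
    std::stable_sort(result.begin(), result.end(), byCount);
    return result;
}
```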
You can't re-sort a map; its order is predefined (by default, from std::less on the key type). The easiest solution for your problem would be to create a std::multimap<int, string> and insert your values there, then just loop over the multimap, which will be ordered on the key type (int, the number of occurrences), which will give you the order that you want, without having to define a predicate.
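A minimal sketch of the multimap approach, with `invert` as a hypothetical helper name:

```cpp
#include <map>
#include <string>
#include <utility>

// Builds a count -> word multimap from a word -> count map.
// Iterating the result visits the words in ascending order of count.
std::multimap<int, std::string> invert(const std::map<std::string, int>& counters) {
    std::multimap<int, std::string> byCount;
    for (std::map<std::string, int>::const_iterator it = counters.begin();
         it != counters.end(); ++it) {
        byCount.insert(std::make_pair(it->second, it->first));
    }
    return byCount;
}
```

A multimap is needed rather than a map because several words can share the same count.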
You are not going to be able to do this with one pass with an std::map. It can only be sorted on one thing at a time, and you cannot change the key in-place. What I would recommend is to use the code you have now to maintain the counters map, then use std::max_element with a comparison function that compares the second field of each std::pair<string, int> in the map.
A map has its keys sorted, not its values. That's what makes the map efficient. You cannot sort it by occurrences without using another data structure (maybe a reversed index!)
As stated, it simply won't work -- a map always remains sorted by its key value, which would be the strings.
As others have noted, you can copy the data to some other structure, and sort by the value. Another possibility would be to use a Boost bimap instead. I've posted a demo of the basic idea previously.
You probably want to transform map<string,int> to vector<pair<string, int> > (not pair<const string, int>, which cannot be assigned and therefore cannot be sorted), then sort the vector on the int member.
You could do
struct PairLessSecond
{
template< typename P >
bool operator()( const P& pairLeft, const P& pairRight ) const
{
return pairLeft.second < pairRight.second;
}
};
You can probably also construct all this somehow using a lambda with a bind.
Now
std::vector< std::pair<std::string,int> > byCount( counters.begin(), counters.end() ); // copy the map's entries
std::sort( byCount.begin(), byCount.end(), PairLessSecond() );