Replacing std::map with std::set and search by index

Say we have a map with larger objects and an index value. The index value is also part of the larger object.
What I would like to know is whether it is possible to replace the map with a set, extracting the index value.
It is fairly easy to create a set that sorts on a functor comparing two larger objects by extracting the index value.
Which leaves searching by index value, which is not supported by default in a set, I think.
I was thinking of using std::find_if, but I believe that searches linearly, ignoring the fact we have set.
Then I thought of using std::binary_search with a functor comparing the larger object and the value, but I believe that doesn't work well here either: a set has no random access iterators, so the search would traverse the elements rather than make use of the tree structure. Is this correct? Or are there overloads which correctly handle this call on a set?
And then finally I was thinking of using a boost::container::flat_set, as this has an underlying vector and thus presumably should be able to work well with std::binary_search?
But maybe there is an altogether easier way to do this?
Before you answer "just use a map where a map ought to be used": I am actually using a vector that is manually kept sorted (well, with std::lower_bound) and was thinking of replacing it with boost::container::flat_set, but that doesn't seem to be easily possible, so I might just stick with the vector.
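For reference, the sorted-vector approach described in the question can be sketched roughly as below. The Object layout and the helper name find_by_index are assumptions made for illustration, not taken from the asker's code. The point is that std::lower_bound accepts a comparator taking (element, key), so the lookup needs no dummy object, and on a vector's random access iterators the whole search is O(log n).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the "larger object" from the question.
struct Object {
    long long data;
    std::size_t index;
};

// Binary search in a vector kept sorted by index; returns nullptr when absent.
// The comparator takes (element, key), so no dummy Object is constructed.
const Object* find_by_index(const std::vector<Object>& sorted, std::size_t index) {
    auto it = std::lower_bound(
        sorted.begin(), sorted.end(), index,
        [](const Object& o, std::size_t key) { return o.index < key; });
    return (it != sorted.end() && it->index == index) ? &*it : nullptr;
}
```

The same call compiles on a std::set, but there it performs O(log n) comparisons and O(n) iterator increments, which is why the member functions of the set are preferable when they fit.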

C++14 introduced the ability to look up by a key without constructing the entire stored object. This can be used as follows:
#include <cstddef>
#include <iostream>
#include <set>
#include <string>
struct StringRef {
StringRef(const std::string& s):x(&s[0]) { }
StringRef(const char *s):x(s) { std::cout << "works: " << s << std::endl; }
const char *x;
};
struct Object {
long long data;
std::size_t index;
};
struct ObjectIndexer {
ObjectIndexer(Object const& o) : index(o.index) {}
ObjectIndexer(std::size_t index) : index(index) {}
std::size_t index;
};
struct ObjComp {
bool operator()(ObjectIndexer a, ObjectIndexer b) const {
return a.index < b.index;
}
typedef void is_transparent; //Allows the comparison with non-Object types.
};
int main() {
std::set<Object, ObjComp> stuff;
stuff.insert(Object{135, 1});
std::cout << stuff.find(ObjectIndexer(1))->data << "\n";
}
More generally, these sorts of problems where there are multiple ways of indexing your data can be solved using Boost.MultiIndex.

Use boost::intrusive::set which can utilize the object's index value directly. It has a find(const KeyType & key, KeyValueCompare comp) function with logarithmic complexity. There are also other set types based on splay trees, AVL trees, scapegoat trees etc. which may perform better depending on your requirements.

If you add the following to your contained object type:
- a less-than operator that only compares the object indices
- an equality operator that only compares the object indices
- a constructor that takes your index type and initializes a dummy object with that value for the index
then you can pass your index type to find, lower_bound, equal_range, etc... and it will act the way you want. When you pass your index to the set's (or flat_set's) find methods it will construct a dummy object of the contained type to use for the comparisons.
Now if your object is really big, or expensive to construct, this might not be the way you want to go.
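A minimal sketch of that idea, assuming a hypothetical Item type whose names are invented for illustration: the non-explicit single-argument constructor lets an index convert to a temporary Item, and std::set::find then compares by index only.

```cpp
#include <cstddef>
#include <set>

// Hypothetical contained type; the names are illustrative only.
struct Item {
    std::size_t index;
    long long data;

    Item(std::size_t index, long long data) : index(index), data(data) {}
    // Non-explicit on purpose: lets an index convert to a dummy Item for lookups.
    Item(std::size_t index) : index(index), data(0) {}
};

// Ordering and equality look at the index only.
inline bool operator<(const Item& a, const Item& b) { return a.index < b.index; }
inline bool operator==(const Item& a, const Item& b) { return a.index == b.index; }

// Returns the payload stored under `index`, or `fallback` when it is absent.
long long data_for(const std::set<Item>& items, std::size_t index, long long fallback) {
    auto it = items.find(index);  // builds a temporary Item from the index
    return it != items.end() ? it->data : fallback;
}
```

std::set itself only needs operator<; the equality operator matters for algorithms such as std::find.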

Related

Erase by value in a vector of shared pointers

I want to erase by value from a vector of shared pointers to strings (i.e. vector<shared_ptr<string>>). Is there an efficient way of doing this instead of iterating over the complete vector and then erasing at the iterator position?
#include <bits/stdc++.h>
using namespace std;
int main()
{
vector<shared_ptr<string>> v;
v.push_back(make_shared<string>("aaa"));
int j = 0,ind;
for(auto i : v) {
if((*i)=="aaa"){
ind = j;
}
j++;
}
v.erase(v.begin()+ind);
}
Also, I don't want to spend memory on a map (value vs. address).

Try it like this (the erase-remove idiom):
string s = "aaa";
auto cmp = [s](const shared_ptr<string> &p) { return s == *p; };
v.erase(std::remove_if(v.begin(), v.end(), cmp), v.end());
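Wrapped into a self-contained function, the idiom could look like the sketch below. The name erase_by_value and the null check are my additions, not part of the snippet above.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Removes every non-null pointer whose pointee equals `value`;
// returns how many elements were erased.
std::size_t erase_by_value(std::vector<std::shared_ptr<std::string>>& v,
                           const std::string& value) {
    // remove_if shifts the kept elements to the front and returns the new logical end.
    auto new_end = std::remove_if(
        v.begin(), v.end(),
        [&value](const std::shared_ptr<std::string>& p) { return p && *p == value; });
    std::size_t removed = static_cast<std::size_t>(v.end() - new_end);
    v.erase(new_end, v.end());  // actually shrink the vector
    return removed;
}
```

This is still a single O(N) pass, but it removes all matches at once and keeps the remaining elements in their original order.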

There is no better way than O(N): you have to find the object in the vector, and without an extra structure that means iterating over the vector once. It does not really matter whether the elements are pointers or any other objects.
The only way to do better is to use a different data structure, which provides O(1) finding/removal. A set is the first thing that comes to mind, but that would indicate your pointers are unique. A second option would be a map, such that multiple pointers pointing to the same value exist at the same hash key.
If you do not want to use a different structure, then you are out of luck. You could have an additional structure hashing the pointers, if you want to retain the vector but also have O(1) access.
For example, if you do use a set, you need to define a proper hasher and key_equal. Both are required: the default equality for shared_ptr compares the stored addresses, so hashing *elementInSet alone would not let a freshly allocated string match an element already in the set. With the pair below, each pointer in the set must point to a distinct string:
struct myPtrHash {
    size_t operator()(const std::shared_ptr<std::string>& p) const {
        //Maybe we want to add checks/throw a more meaningful error if p is invalid?
        return std::hash<std::string>()(*p);
    }
};
struct myPtrEqual {
    bool operator()(const std::shared_ptr<std::string>& a,
                    const std::shared_ptr<std::string>& b) const {
        return *a == *b; // compare the strings, not the addresses
    }
};
such that your set is:
std::unordered_set<std::shared_ptr<std::string>, myPtrHash, myPtrEqual> pointerSet;
Then erasing would be O(1) on average, simply as:
std::shared_ptr<std::string> toErase = std::make_shared<std::string>("aaa");
pointerSet.erase(toErase);
That said, if you must use a vector, a more idiomatic way to do this is to use remove_if instead of iterating yourself. This will not improve the time complexity, though; it is just better practice.

Don't include bits/stdc++.h, and since you're iterating through the whole vector, you should be using std::for_each with a lambda.

Container with key and sorting criteria separate

I want to have a collection of items which are searchable based on a key (an unsigned value), but I want the elements to be sorted based on a different criterion, i.e. the last accessed time (which is part of the value).
How can I achieve this in C++? I can sort them separately on demand, but can I create the container itself such that sorting happens automatically?
Are there ready-made containers (in Boost) that have a similar feature built in?

You could probably implement something of this kind, using std::list and std::unordered_map pointing to each other.
#include <cassert>
#include <list>
#include <unordered_map>
template <typename A>
struct Cache {
    using key = unsigned;
    struct Composite {
        A *a;                        // non-owning; the caller keeps the object alive
        std::list<key>::iterator it; // position of the key in the recency list
    };
    std::unordered_map<key, Composite> map;
    std::list<key> list; // most recently accessed key at the front
    void insert(key k, A &a) { // Assuming inserting counts as accessing
        auto found = map.find(k);
        if (found != map.end()) list.erase(found->second.it); // drop the stale position
        list.push_front(k);
        map[k] = Composite{&a, list.begin()};
    }
    A &operator[](key k) {
        Composite &c = map.at(k); // throws std::out_of_range for an unknown key
        list.splice(list.begin(), list, c.it); // move to the front; the iterator stays valid
        return *c.a;
    }
    A &last_accessed() { // or whatever else you wish to implement
        assert(!list.empty());
        return *map.at(list.front()).a;
    }
};
This solution is optimized for keeping track of which element was accessed last. If you want to sort given a different attribute, you can follow a similar process but use an std::set to store the values with your comparison function, and then iterators to that from an std::unordered_map hashed with a key of your choice.
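The std::set variant mentioned above might look roughly like this. Entry, ByAccessTime and AccessIndex are hypothetical names, and the long timestamp is an assumption; treat it as a sketch rather than a finished container.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <unordered_map>

// Hypothetical record: looked up by `key`, kept ordered by `last_access`.
struct Entry {
    unsigned key;
    long last_access;
};

struct ByAccessTime {
    bool operator()(const Entry& a, const Entry& b) const {
        if (a.last_access != b.last_access) return a.last_access < b.last_access;
        return a.key < b.key;  // tie-break so equal timestamps are not dropped
    }
};

class AccessIndex {
    std::set<Entry, ByAccessTime> ordered_;
    std::unordered_map<unsigned, std::set<Entry, ByAccessTime>::iterator> by_key_;

public:
    // Inserts the key or refreshes its timestamp. The old node is erased and a
    // new one inserted because set elements cannot be re-sorted in place.
    void touch(unsigned key, long now) {
        auto found = by_key_.find(key);
        if (found != by_key_.end()) ordered_.erase(found->second);
        by_key_[key] = ordered_.insert(Entry{key, now}).first;
    }

    bool contains(unsigned key) const { return by_key_.count(key) != 0; }

    // Key with the oldest timestamp; the container must not be empty.
    unsigned oldest() const {
        assert(!ordered_.empty());
        return ordered_.begin()->key;
    }

    std::size_t size() const { return ordered_.size(); }
};
```

Lookup by key is O(1) on average, while every refresh costs O(log n) for the erase and reinsert in the set.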

Ordering a container on something else than the key

I am currently trying to implement an A* algorithm and I've run into a problem:
I want to keep a set of distinct objects, identified by a hash (I've used boost::hash and family, but can use anything else) and ordered by a public int member of those objects.
The goal is to be able to retrieve the smallest object based on the int value in O(1) and to guarantee uniqueness in the most efficient manner (a hash seemed a good way to achieve that, but I'm open to alternatives). I don't need to iterate over the container if those two conditions are met.
Is there an existing implementation that meets these specifications? Am I mistaken in my assumptions? Should I just extend an existing container?
EDIT:
Apparently it was unclear what "smaller based on int value" means. My object has a public attribute (let's say score). For two objects a and b, a < b if and only if a.score < b.score.
I want a and b to be in a container, ordered by score. And if I try to insert c with c.hash == a.hash, I want the insertion to fail.

Although std::priority_queue is an adapter, its Container template parameter has to satisfy SequenceContainer, so you can't build one backed by a std::set.
It looks like your best option is to maintain both a set and a priority queue, and use the former to control insertion into the latter. It may be a good idea to encapsulate that into a container-concept class, but you might get away with a couple of methods if your use of it is quite localised.
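One possible shape for that encapsulation is sketched below. It uses a std::unordered_set for the uniqueness check, since the question identifies objects by hash. OpenList and its members are hypothetical names, and the node is reduced to an (id, score) pair for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical A* open list: `id` stands in for the question's hash,
// `score` for the int the nodes are ordered by.
class OpenList {
    using Node = std::pair<int, std::size_t>;  // (score, id)
    // std::greater turns the default max-heap into a min-heap on score.
    std::priority_queue<Node, std::vector<Node>, std::greater<Node>> heap_;
    std::unordered_set<std::size_t> ids_;  // guards uniqueness

public:
    // Returns false (and inserts nothing) when the id is already present.
    bool push(std::size_t id, int score) {
        if (!ids_.insert(id).second) return false;
        heap_.emplace(score, id);
        return true;
    }

    bool empty() const { return heap_.empty(); }

    // Lowest score in O(1); the list must not be empty.
    int min_score() const {
        assert(!heap_.empty());
        return heap_.top().first;
    }

    // Removes and returns the id with the lowest score in O(log n).
    std::size_t pop() {
        assert(!heap_.empty());
        std::size_t id = heap_.top().second;
        heap_.pop();
        ids_.erase(id);
        return id;
    }
};
```

Reading the minimum is O(1) and the duplicate check is O(1) on average, while push and pop remain O(log n).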

Use a custom comparator and a std::set:
#include <set>
#include <string>
struct Object
{
int value;
long hash;
std::string data;
Object(int value, std::string data) :
value(value), data(data)
{
}
bool operator<(const Object& other) const
{
return data < other.data;
}
};
struct ObjComp1
{
bool operator()(const Object& lhs, const Object& rhs) const
{
return lhs.value < rhs.value;
}
};
struct ObjComp2
{
bool operator()(const Object& lhs, const Object& rhs) const
{
if (lhs.value != rhs.value)
{
return lhs.value < rhs.value;
}
return lhs < rhs;
}
};
int main()
{
Object o1(5, "a");
Object o2(1, "b");
Object o3(1, "c");
Object o4(1, "c");
std::set<Object, ObjComp1> set;
set.insert(o1);
set.insert(o2);
set.insert(o3);
set.insert(o4);
std::set<Object, ObjComp2> set2;
set2.insert(o1);
set2.insert(o2);
set2.insert(o3);
set2.insert(o4);
return 0;
}
The first variant will only allow you to insert o1 and o2; the second variant will allow you to insert o1, o2 and o3. I show both because it's not really clear which one you need. The only downside is that you need to write your own operator< for the Object type.
Alternatively, if you don't want to create a custom operator< for your data type, you can wrap a std::map, but this is less straightforward.

You could use the standard library type priority_queue. If your elements are integers then you could do:
priority_queue<int> q;
Priority queues are internally implemented with heaps: a complete binary tree whose root is always the extreme element of the set. With the default comparator (std::less) that is the largest element; pass std::greater as the comparator to get the smallest element at the root instead. Either way, you can read it in O(1) by invoking top().
However, as your algorithm progresses, you will need to extract items with pop(). Since the heap is a binary tree, extraction takes O(log N), which is not O(1) but is still very good, and it is a guaranteed bound, in contrast with the expected time you would get from an imperfect hash table.
I do not know a way of maintaining a set and extracting the minimum in O(1).
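Note that std::priority_queue is a max-heap by default, so an A*-style "smallest first" queue needs std::greater as the comparator. A small sketch (the function name drain_ascending is made up for the example):

```cpp
#include <functional>
#include <queue>
#include <vector>

// Drains a min-heap: values come out in ascending order.
std::vector<int> drain_ascending(const std::vector<int>& values) {
    // std::greater<int> makes top() the smallest element;
    // the default std::less<int> would make it the largest.
    std::priority_queue<int, std::vector<int>, std::greater<int>> q(values.begin(),
                                                                    values.end());
    std::vector<int> out;
    while (!q.empty()) {
        out.push_back(q.top());  // O(1) peek
        q.pop();                 // O(log n) removal
    }
    return out;
}
```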

Why does `std::unordered_map` "speak like the Yoda" - re-arrange elements?

When trying to write the std::string keys of an std::unordered_map in the following example, the keys get written in a different order than the one given by the initializer list:
#include <initializer_list>
#include <iostream>
#include <string>
#include <unordered_map>
class Data
{
typedef std::unordered_map<std::string, double> MapType;
typedef MapType::const_iterator const_iterator;
MapType map_;
public:
Data(const std::initializer_list<std::string>& i)
{
int counter = 0;
for (const auto& name : i)
{
map_[name] = counter;
}
}
const_iterator begin() const
{
return map_.begin();
}
const_iterator end() const
{
return map_.end();
}
};
std::ostream& operator<<(std::ostream& os, const Data& d)
{
for (const auto& pair : d)
{
os << pair.first << " ";
}
return os;
}
using namespace std;
int main(int argc, const char *argv[])
{
Data d = {"Why", "am", "I", "sorted"};
// The unordered_map speaks like Yoda.
cout << d << endl;
return 0;
}
I expected to see 'Why am I sorted', but I got a Yoda-like output:
sorted I am Why
Reading on the unordered_map here, I saw this:
Internally, the elements are not sorted in any particular order, but organized into buckets. Which bucket an element is placed into depends entirely on the hash of its key. This allows fast access to individual elements, since once hash is computed, it refers to the exact bucket the element is placed into.
Is this why the elements are not ordered in the same way as in the initializer list?
What data structure do I then use when I want the keys to be ordered in the same way as the initializer list? Should I internally keep a vector of strings to somehow save the argument order? Can the bucket organization be turned off somehow by choosing a specific hashing function?

What data structure do I then use when I want the keys to be ordered in the same way as the initializer list? Should I internally keep a vector of strings to somehow save the argument order?
Maybe all you want is actually a list/vector of (key, value) pairs?
If you want both O(1) lookup (hashmap) and iteration in the same order as insertion - then yes, using a vector together with an unordered_map sounds like a good idea. For example, Django's SortedDict (Python) does exactly that, here's the source for inspiration:
https://github.com/django/django/blob/master/django/utils/datastructures.py#L122
Python 2.7's OrderedDict is a bit more fancy (map values point to doubly-linked list links), see:
http://code.activestate.com/recipes/576693-ordered-dictionary-for-py24/
I'm not aware of an existing C++ implementation in standard libs, but this might get you somewhere. See also:
a C++ hash map that preserves the order of insertion
A std::map that keep track of the order of insertion?
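A rough C++ sketch of the vector-plus-unordered_map idea follows. OrderedMap and joined_keys are hypothetical names, the value type is fixed to double to match the question, and erasing is deliberately left out because it would require re-indexing the vector.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical insertion-ordered map: the vector keeps the order,
// the unordered_map gives average O(1) lookup of a key's position.
class OrderedMap {
    std::vector<std::pair<std::string, double>> items_;
    std::unordered_map<std::string, std::size_t> index_;

public:
    using const_iterator = std::vector<std::pair<std::string, double>>::const_iterator;

    // Returns the value for `key`, appending a zero-initialized entry if it is new.
    double& operator[](const std::string& key) {
        auto found = index_.find(key);
        if (found == index_.end()) {
            found = index_.emplace(key, items_.size()).first;
            items_.emplace_back(key, 0.0);
        }
        return items_[found->second].second;
    }

    // Iteration visits keys in insertion order.
    const_iterator begin() const { return items_.begin(); }
    const_iterator end() const { return items_.end(); }
};

// Joins the keys with spaces, in iteration order.
std::string joined_keys(const OrderedMap& m) {
    std::string out;
    for (const auto& kv : m) {
        if (!out.empty()) out += ' ';
        out += kv.first;
    }
    return out;
}
```

Assigning to an existing key updates its value without changing its position, which mirrors how the Python ordered dictionaries mentioned above behave.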

unordered_map is, by definition, unordered, so you should not expect any particular ordering when iterating over the map.
If you don't want elements sorted by key, just use a container that keeps your insertion order, be it a vector, deque, list or whatever, with pair<key, value> as the element type if you insist on pairs.
Then, if an element B is added after element A, it will always appear later. This holds true for initializer_list initialization as well.
You could probably use something like Boost.MultiIndex to keep it both sorted by insertion order and arbitrary key.

What is the best way to use a HashMap in C++?

I know that STL has a HashMap API, but I cannot find any good and thorough documentation with good examples regarding this.
Any good examples will be appreciated.

The standard library includes ordered and unordered map containers (std::map and std::unordered_map). In an ordered map (std::map) the elements are sorted by key, and insert and access are O(log n). Usually the standard library internally uses red-black trees for ordered maps, but this is just an implementation detail. In an unordered map (std::unordered_map) insert and access are O(1) on average. It is just another name for a hash table.
An example with (ordered) std::map:
#include <map>
#include <string>
#include <iostream>
#include <cassert>
int main(int argc, char **argv)
{
std::map<std::string, int> m;
m["hello"] = 23;
// check if key is present
if (m.find("world") != m.end())
std::cout << "map contains key world!\n";
// retrieve
std::cout << m["hello"] << '\n';
std::map<std::string, int>::iterator i = m.find("hello");
assert(i != m.end());
std::cout << "Key: " << i->first << " Value: " << i->second << '\n';
return 0;
}
Output:
23
Key: hello Value: 23
If you need ordering in your container and are fine with the O(log n) runtime then just use std::map.
Otherwise, if you really need a hash table (average O(1) insert/access), check out std::unordered_map, which has an API similar to std::map (e.g. in the above example you just have to search and replace map with unordered_map).
The unordered_map container was introduced with the C++11 standard revision. Thus, depending on your compiler, you have to enable C++11 features (e.g. when using GCC 4.8 you have to add -std=c++11 to the CXXFLAGS).
Even before the C++11 release GCC supported unordered_map - in the namespace std::tr1. Thus, for old GCC compilers you can try to use it like this:
#include <tr1/unordered_map>
std::tr1::unordered_map<std::string, int> m;
It is also part of boost, i.e. you can use the corresponding boost-header for better portability.
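As a small illustration of std::unordered_map in use, here is a word-count sketch. The function count_words and its space-only tokenization are assumptions made for the example.

```cpp
#include <string>
#include <unordered_map>

// Counts how often each space-separated word occurs;
// each insert or lookup is O(1) on average.
std::unordered_map<std::string, int> count_words(const std::string& text) {
    std::unordered_map<std::string, int> counts;
    std::string word;
    for (char c : text) {
        if (c == ' ') {
            if (!word.empty()) ++counts[word];  // operator[] inserts 0 for a new key
            word.clear();
        } else {
            word += c;
        }
    }
    if (!word.empty()) ++counts[word];  // flush the last word
    return counts;
}
```

Replacing unordered_map with map (and the header accordingly) compiles unchanged; only the iteration order and the complexity guarantees differ.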

A hash_map is an older, unstandardized version of what for standardization purposes is called an unordered_map (originally in TR1, and included in the standard since C++11). As the name implies, it's different from std::map primarily in being unordered: if, for example, you iterate through a map from begin() to end(), you get items in order by key [1], but if you iterate through an unordered_map from begin() to end(), you get items in a more or less arbitrary order.
An unordered_map is normally expected to have constant complexity. That is, an insertion, lookup, etc., typically takes essentially a fixed amount of time, regardless of how many items are in the table. A std::map has complexity that's logarithmic in the number of items being stored, which means the time to insert or retrieve an item grows, but quite slowly, as the map grows larger. For example, finding one of 1 million items takes about 20 comparisons, one of 2 million about 21, one of 4 million about 22, one of 8 million about 23, etc.: each doubling of the size adds roughly one comparison. So if a lookup among 1 million items takes 1 microsecond, you can expect roughly 1.05 microseconds among 2 million items.
From a practical viewpoint, that's not really the whole story though. By nature, a simple hash table has a fixed size. Adapting it to the variable-size requirements for a general purpose container is somewhat non-trivial. As a result, operations that (potentially) grow the table (e.g., insertion) are potentially relatively slow (that is, most are fairly fast, but periodically one will be much slower). Lookups, which cannot change the size of the table, are generally much faster. As a result, most hash-based tables tend to be at their best when you do a lot of lookups compared to the number of insertions. For situations where you insert a lot of data, then iterate through the table once to retrieve results (e.g., counting the number of unique words in a file) chances are that an std::map will be just as fast, and quite possibly even faster (but, again, the computational complexity is different, so that can also depend on the number of unique words in the file).
[1] Where the order is defined by the third template parameter when you create the map, std::less<Key> by default.

Here's a more complete and flexible example that includes the necessary headers, so it compiles without errors:
#include <iostream>
#include <unordered_map>
class Hashtable {
std::unordered_map<const void *, const void *> htmap;
public:
void put(const void *key, const void *value) {
htmap[key] = value;
}
const void *get(const void *key) {
return htmap[key];
}
};
int main() {
Hashtable ht;
ht.put("Bob", "Dylan");
int one = 1;
ht.put("one", &one);
std::cout << (char *)ht.get("Bob") << "; " << *(int *)ht.get("one");
}
This is still not particularly useful for keys unless they are predefined as pointers, because the table compares addresses: a key with a matching value but a different address won't be found! (However, since I normally use strings for keys, substituting "string" for "const void *" in the declaration of the key should resolve this problem.)

Evidence that std::unordered_map uses a hash map in GCC libstdc++ 6.4
This was mentioned at: https://stackoverflow.com/a/3578247/895245 but in the following answer: What data structure is inside std::map in C++? I have given further evidence of such for the GCC libstdc++ 6.4 implementation by:
GDB step debugging into the class
performance characteristic analysis
Here is a preview of the performance characteristic graph described in that answer:
How to use a custom class and hash function with unordered_map
This answer nails it: C++ unordered_map using a custom class type as the key
Excerpt: equality:
struct Key
{
std::string first;
std::string second;
int third;
bool operator==(const Key &other) const
{ return (first == other.first
&& second == other.second
&& third == other.third);
}
};
Hash function:
namespace std {
template <>
struct hash<Key>
{
std::size_t operator()(const Key& k) const
{
using std::size_t;
using std::hash;
using std::string;
// Compute individual hash values for first,
// second and third and combine them using XOR
// and bit shifting:
return ((hash<string>()(k.first)
^ (hash<string>()(k.second) << 1)) >> 1)
^ (hash<int>()(k.third) << 1);
}
};
}
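Put together with a map, the excerpt can be exercised as in the sketch below. The lookup helper is a hypothetical addition; Key and the hash specialization repeat the excerpt so the block compiles on its own.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

struct Key {
    std::string first;
    std::string second;
    int third;

    bool operator==(const Key& other) const {
        return first == other.first && second == other.second && third == other.third;
    }
};

namespace std {
template <>
struct hash<Key> {
    std::size_t operator()(const Key& k) const {
        // Same XOR-and-shift combination as in the excerpt.
        return ((hash<string>()(k.first) ^ (hash<string>()(k.second) << 1)) >> 1)
               ^ (hash<int>()(k.third) << 1);
    }
};
}  // namespace std

// Looks up a key; returns an empty string when it is absent.
std::string lookup(const std::unordered_map<Key, std::string>& m, const Key& k) {
    auto it = m.find(k);
    return it != m.end() ? it->second : std::string();
}
```

Because std::hash<Key> is specialized, std::unordered_map<Key, std::string> needs no extra template arguments.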

For those of us trying to figure out how to hash our own classes whilst still using the standard template, there is a simple solution:
1. In your class, define an overload of the equality operator ==. If you don't know how to do this, GeeksforGeeks has a great tutorial: https://www.geeksforgeeks.org/operator-overloading-c/
2. In the std namespace, declare a specialization of the template struct hash with your class name as the type (see below). I found a great blog post that also shows an example of calculating hashes using XOR and bit shifting. That is outside the scope of this question, but the post includes detailed instructions on using hash functions as well: https://prateekvjoshi.com/2014/06/05/using-hash-function-in-c-for-user-defined-classes/
namespace std {
template<>
struct hash<my_type> {
size_t operator()(const my_type& k) const {
// Do your hash function here
...
}
};
}
So then, to implement a hash table using your new hash function, you just have to create a std::unordered_map as you normally would and use my_type as the key. The standard library will automatically use the hash function you defined before (in step 2) to hash your keys. (A std::map does not hash at all; it orders its keys with operator< instead.)
#include <unordered_map>
int main() {
std::unordered_map<my_type, other_type> my_map;
}
