std::unordered_set::find - construct an instance only for find() - c++

I often find that my key is actually stored inside my value.
For example:
struct Elem {
    int key;
    // ... Other variables ...
};
That makes me want to use std::unordered_set instead of std::unordered_map, because I already have the key stored inside my value - no need to waste more space on std::unordered_map's .first (key).
Then I start implementing with std::unordered_set and get to the place I need to perform a find() over my std::unordered_set.
Then I realize I need to create an empty-shell Elem just to be able to call find(), because std::unordered_set::find takes a Key as input:
template < class Key, // unordered_set::key_type/value_type
class Hash = hash<Key>, // unordered_set::hasher
class Pred = equal_to<Key>, // unordered_set::key_equal
class Alloc = allocator<Key> // unordered_set::allocator_type
> class unordered_set;
Sometimes building an empty-shell Elem is hard / wasteful / maybe even not possible?
For example, when my key/value is
An iterator
A reference
A class with specific c'tor (not constructing the instance only with the key)
Q. Am I missing something?
Q. Is there a way to do find() that isn't wasteful? I mean that doesn't make me create an instance I didn't want to
It seems really strange to me that I should already have the element I'm looking for in order to find it, or at least an empty shell of it.

When choosing a data structure to hold your data you need to consider your use case.
If you want to look up data from a key you should use a map. If you just want to store unique values in a collection and you don't need to look them up use set.
I don't see why it's so much trouble to insert an element as map.emplace(elem.key, elem) vs set.emplace(elem) if it means that down the road you can just query the elem as map.at(key) or map[key] vs having to create an empty elem.
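To make the map suggestion concrete, here is a minimal sketch. The payload member and the helper names lookup and make_elems are made up for illustration; only Elem and its key come from the question.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical element type: the key also lives inside the value.
struct Elem {
    int key;
    std::string payload;  // stand-in for the "other variables"
};

// Returns a pointer to the stored element, or nullptr if the key is absent.
// No temporary Elem is constructed for the lookup.
const Elem* lookup(const std::unordered_map<int, Elem>& elems, int key) {
    auto it = elems.find(key);
    return it != elems.end() ? &it->second : nullptr;
}

// Builds a small map, storing each element under its own key.
std::unordered_map<int, Elem> make_elems() {
    std::unordered_map<int, Elem> elems;
    Elem a{1, "one"};
    Elem b{2, "two"};
    elems.emplace(a.key, a);
    elems.emplace(b.key, b);
    return elems;
}
```

The cost is one extra int per entry, which is usually small next to the per-node overhead the container already has.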
Besides, std::set does the whole key thingamajig (roughly) underwater anyway. (source: What is the difference between set vs map in C++?)
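If you would rather keep a set, one possible alternative is the ordered std::set with a transparent comparator (C++14), which lets find() take a bare int. Note that this swaps std::unordered_set for std::set; as far as I know the unordered containers only gained an equivalent heterogeneous lookup in C++20. The comparator name ByKey, the payload member and the helper contains_key are made up for this sketch.

```cpp
#include <cassert>
#include <set>
#include <string>

struct Elem {
    int key;
    std::string payload;  // stand-in for the "other variables"
};

// Orders Elems by key and can also compare an Elem against a bare int.
// The is_transparent member enables the heterogeneous find() overloads.
struct ByKey {
    using is_transparent = void;
    bool operator()(const Elem& a, const Elem& b) const { return a.key < b.key; }
    bool operator()(const Elem& a, int k) const { return a.key < k; }
    bool operator()(int k, const Elem& a) const { return k < a.key; }
};

using ElemSet = std::set<Elem, ByKey>;

// Looks up by key alone; no Elem is constructed.
bool contains_key(const ElemSet& s, int key) {
    return s.find(key) != s.end();
}
```

Lookups are O(log n) here rather than average O(1), so this trades some speed for not storing the key twice.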

Related

Implement Map in C++ with a predefined size

I want to implement my own version of a map data structure in C++.
I need its size to be predefined so I decided that when the user creates the map he will have to specify not only the size, but also a default value for the keys and for the values.
Before you ask - I need this in order to be able to use that data structure in an embedded system.
I have written this constructor:
template <typename T, typename K>
myMap<T, K>::myMap(int nSize, const T& keyInit, const K& valueInit) :
    size(nSize), defaultKey(keyInit), defaultValue(valueInit)
{
    for (int i = 0; i < nSize; i++)
    {
        container.insert(std::make_pair(keyInit, valueInit));
    }
}
where container is a member of type: std::map<T, K> (and basically the class is just a wrapper for the stl map)
Now I'm not sure how to implement the insert function. Aside from that, I have figured out how to delete (reassign default values) and how to search (use the STL map's find).
So my only problem right now is the insert method.
I thought about iterating through the map and looking for the first free cell, but it got me confused and I'm stuck.
Any other ideas would be great.
If I've understood well, you make a wrapper for a standard container, i.e. a map.
At construction, you intend to insert the same default key several times. This is not allowed in a std::map, where each key MUST be unique.
If you want to use the same key several times in a map, you have to use a multimap: you can use it almost like a map, but it allows duplicate keys.
I'm not sure why you create items with a default key, but you have to be aware that maps and multimaps keep their keys ordered. So if you intend to replace the default key with another one later, you can't just overwrite the key in place: you have to delete the entry and insert a new one.
Key in std::map is unique, so what you do at construction does not make sense.
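A quick way to see this for yourself; the helper name insert_same_key is made up for the illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// Inserts the same key `count` times and reports how many entries the map holds.
std::size_t insert_same_key(int key, int value, int count) {
    std::map<int, int> container;
    for (int i = 0; i < count; ++i) {
        // After the first iteration the key already exists, so insert() does nothing.
        container.insert(std::make_pair(key, value));
    }
    return container.size();
}
```

So the constructor loop in the question leaves the wrapped map with a single entry, not nSize of them.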
If you seek great performance (as this may be the case in embedded world), then I would advise to find some other hash table implementation.
You can also implement your own hash table to meet your needs.
For an embedded system what I think you actually want is custom memory allocators. You can define an allocation pool for each data structure you want to use. Then you pass it to the constructor for the map and it will allocate data from the pool. When the pool is empty it will send out bad_alloc exceptions.
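As a rough sketch of the pool idea, assuming a C++17 toolchain with <memory_resource> is available on your target (which is not a given on embedded platforms): std::pmr::monotonic_buffer_resource can hand out memory from a fixed buffer, and with std::pmr::null_memory_resource() as its upstream it throws std::bad_alloc once the buffer is used up. The function name and the 1024-byte buffer size are arbitrary choices for the illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <memory_resource>
#include <new>

// Inserts keys 0, 1, 2, ... into a map whose nodes come from a fixed buffer.
// Returns how many entries fit before the pool ran out.
std::size_t fill_until_exhausted() {
    std::array<std::byte, 1024> buffer;  // the whole "pool"
    // null_memory_resource() as upstream: no fallback to the heap, so an
    // exhausted buffer throws std::bad_alloc instead of allocating elsewhere.
    std::pmr::monotonic_buffer_resource pool(
        buffer.data(), buffer.size(), std::pmr::null_memory_resource());
    std::pmr::map<int, int> m(&pool);
    try {
        for (int i = 0;; ++i) m.emplace(i, i);
    } catch (const std::bad_alloc&) {
        // Pool exhausted; the map still holds everything inserted so far.
    }
    return m.size();
}
```

One caveat: a monotonic resource never reuses memory released by erase(), so a map with many deletions would need a different pool strategy.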

C++11 unordered_map time complexity

I'm trying to figure out the best way to do a cache for resources. I am mainly looking for native C/C++/C++11 solutions (i.e. I don't have boost and the likes as an option).
What I am doing when retrieving from the cache is something like this:
Object *ResourceManager::object_named(const char *name) {
if (_object_cache.find(name) == _object_cache.end()) {
_object_cache[name] = new Object();
}
return _object_cache[name];
}
Where _object_cache is defined something like: std::unordered_map <std::string, Object *> _object_cache;
What I am wondering is about the time complexity of doing this, does find trigger a linear-time search or is it done as some kind of a look-up operation?
I mean if I do _object_cache["something"]; on the given example it will either return the object or if it doesn't exist it will call the default constructor inserting an object which is not what I want. I find this a bit counter-intuitive, I would have expected it to report in some way (returning nullptr for example) that a value for the key couldn't be retrieved, not second-guess what I wanted.
But again, if I do a find on the key, does it trigger a big search which in fact will run in linear time (since the key will not be found it will look at every key)?
Is this a good way to do it, or does anyone have some suggestions, perhaps it's possible to use a look up or something to know if the key is available or not, I may access often and if it is the case that some time is spent searching I would like to eliminate it or at least do it as fast as possible.
Thankful for any input on this.
Value-initialization (triggered by _object_cache["something"]) is what you want; for a pointer type (e.g. Object *) it gives nullptr (8.5p6b1, footnote 103).
So:
auto &ptr = _object_cache[name];
if (!ptr) ptr = new Object;
return ptr;
You use a reference into the unordered map (auto &ptr) as your local variable so that you assign into the map and set your return value in the same operation. In C++03 or if you want to be explicit, write Object *&ptr (a reference to a pointer).
Note that you should probably be using unique_ptr rather than a raw pointer to ensure that your cache manages ownership.
By the way, find has the same performance as operator[]; average constant, worst-case linear (only if every key in the unordered map has the same hash).
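For the unique_ptr suggestion, a minimal sketch could look like this. The empty Object struct and the size() accessor are placeholders I added; the rest follows the names in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

struct Object {};  // placeholder for the cached resource type

class ResourceManager {
public:
    // Returns the cached object for `name`, creating it on first use.
    // The cache keeps ownership; callers get a non-owning pointer.
    Object* object_named(const std::string& name) {
        auto& slot = _object_cache[name];  // empty unique_ptr if the key is new
        if (!slot) slot = std::make_unique<Object>();
        return slot.get();
    }

    std::size_t size() const { return _object_cache.size(); }

private:
    std::unordered_map<std::string, std::unique_ptr<Object>> _object_cache;
};
```

This does a single hash lookup per call, and the objects are released automatically when the manager is destroyed.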
Here's how I'd write this:
auto it = _object_cache.find(name);
return it != _object_cache.end()
? it->second
: _object_cache.emplace(name, new Object).first->second;
The complexity of find on a std::unordered_map is O(1) on average (constant), especially with std::string keys, which hash well and lead to a very low rate of collisions. Even though the name of the method is find, it doesn't do a linear scan as you feared.
If you want to do some kind of caching, this container is definitely a good start.
Note that a cache typically is not just a fast O(1) access but also a bounded data structure. The std::unordered_map will dynamically increase its size when more and more elements are added. When resources are limited (e.g. reading huge files from disk into memory), you want a bounded and fast data structure to improve the responsiveness of your system.
In contrast, a cache will use an eviction strategy whenever size() reaches capacity(), by replacing the least valuable element.
You can implement a cache on top of a std::unordered_map. The eviction strategy can then be implemented by redefining the insert() member. If you want to go for an N-way (for small and fixed N) associative cache (i.e. one item can replace at most N other items), you could use the bucket() interface to replace one of the bucket's entries.
For a fully associative cache (i.e. any item can replace any other item), you could use a Least Recently Used eviction strategy by adding a std::list as a secondary data structure:
using key_tracker_type = std::list<K>;
using key_to_value_type = std::unordered_map<
K,std::pair<V,typename key_tracker_type::iterator>
>;
By wrapping these two structures inside your cache class, you can define the insert() to trigger a replace when your capacity is full. When that happens, you pop_front() the Least Recently Used item and push_back() the current item into the list.
On Tim Day's blog there is an extensive example with full source code that implements the above cache data structure. Its implementation can also be done efficiently using Boost.Bimap or Boost.MultiIndex.
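A compact sketch of the list-plus-map LRU idea described above. This is my own minimal version, not the code from the blog; the class name LruCache and its get/put interface are assumptions, and it expects a capacity of at least 1.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <unordered_map>
#include <utility>

template <typename K, typename V>
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    // Returns a pointer to the cached value and marks it most recently used,
    // or nullptr on a miss.
    V* get(const K& key) {
        auto it = map_.find(key);
        if (it == map_.end()) return nullptr;
        touch(it->second.second);
        return &it->second.first;
    }

    // Inserts or overwrites; evicts the least recently used entry when full.
    void put(const K& key, V value) {
        auto it = map_.find(key);
        if (it != map_.end()) {
            it->second.first = std::move(value);
            touch(it->second.second);
            return;
        }
        if (map_.size() >= capacity_ && !order_.empty()) {
            map_.erase(order_.front());  // front of the list = least recently used
            order_.pop_front();
        }
        order_.push_back(key);
        map_.emplace(key, std::make_pair(std::move(value), std::prev(order_.end())));
    }

    std::size_t size() const { return map_.size(); }

private:
    using key_tracker_type = std::list<K>;
    using key_to_value_type =
        std::unordered_map<K, std::pair<V, typename key_tracker_type::iterator>>;

    // Moves the list node to the back (most recently used); iterators stay valid.
    void touch(typename key_tracker_type::iterator pos) {
        order_.splice(order_.end(), order_, pos);
    }

    std::size_t capacity_;
    key_tracker_type order_;
    key_to_value_type map_;
};
```

Both get and put stay O(1) on average, because std::list::splice relinks a node without copying it and the stored iterator avoids searching the list.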
The insert/emplace interfaces to map/unordered_map are enough to do what you want: find the position, and insert if necessary. Since the mapped values here are pointers, ekatmur's response is ideal. If your values are fully-fledged objects in the map rather than pointers, you could use something like this:
Object& ResourceManager::object_named(const char *name, const Object& initialValue) {
return _object_cache.emplace(name, initialValue).first->second;
}
The values name and initialValue make up the arguments to the key-value pair that needs to be inserted if there is no key with the same value as name. The emplace returns a pair, with second indicating whether anything was inserted (the key in name is a new one) - we don't care about that here; and first being the iterator pointing to the (perhaps newly created) key-value pair entry with key equivalent to the value of name. So if the key was already there, dereferencing first gives the original Object for the key, which has not been overwritten with initialValue; otherwise, the key was newly inserted using the value of name and the entry's value portion copied from initialValue, and first points to that.
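A small self-contained check of that behaviour, using int as a stand-in for Object and a free function named value_named (both are assumptions for the sketch):

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Returns the value stored under `name`, inserting `initialValue` only if the
// key is absent. An existing value is left untouched.
int& value_named(std::unordered_map<std::string, int>& cache,
                 const char* name, int initialValue) {
    return cache.emplace(name, initialValue).first->second;
}
```

Calling it twice with the same name and different initial values returns the first value both times, and the map keeps a single entry.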
ekatmur's response is equivalent to this:
// Requires <tuple> for std::tie. Returns Object* because the map stores pointers.
Object *ResourceManager::object_named(const char *name) {
    bool res;
    auto iter = _object_cache.end();
    std::tie(iter, res) = _object_cache.emplace(name, nullptr);
    if (res) {
        iter->second = new Object(); // we inserted a null pointer - now replace it
    }
    return iter->second;
}
but profits from the fact that the default-constructed pointer value created by operator[] is null to decide whether a new Object needs to be allocated. It's more succinct and easier to read.

std::map<int, int> vs. vector of vector

I need a container to store a value (int) according to two attributes, source (int) and destination (int) i.e. when a source sends something to a destination, I need to store it as an element in a container. The source is identified by a unique int ID (an integer from 0-M), where M is in the tens to hundreds, and so is the destination (0-N). The container will be updated by iterations of another function.
I have been using a vector<vector<int>>, indexed in the order source, then destination, giving the value. A subsequent process needs to check this container to see if an element exists for a particular source and a particular destination - it will need to differentiate between an empty 'space' and a filled one. The container has the possibility of being very sparse.
The value to be stored CAN be 0 so I haven't had success trying to find out if the space is empty, since I can't seem to do something like container[M][N].empty().
I have no experience with maps, but I have seen another post that suggests a map might be useful, and an std::map<int, int> seems to be similar to a vector<vector<int>>.
To summarise:
Is there a way to check if a specific vector of vector 'space' is empty (since I can't compare it to 0)
Is a std::map<int, int> better for this purpose, and how do I use one?
"I need a container to store a value (int) according to two attributes, source (int) and destination (int)"
std::map<std::pair<int, int>, int>
"A subsequent process needs to check this container, to see if an element exists in for a particular source, and a particular destination - it will need to differentiate between an empty 'space' and a filled one."
std::map::find
http://www.cplusplus.com/reference/map/map/find/
"The container has the possibility of being very sparse."
Use a std::map. The "correct" choice of a container is based on how you need to find things and how you need to insert/delete things. If you want to find things fast, use a map.
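Putting those pieces together, a minimal sketch might look like this. The alias TrafficMap and the helpers record and lookup are names I made up for the example.

```cpp
#include <cassert>
#include <map>
#include <utility>

using TrafficMap = std::map<std::pair<int, int>, int>;

// Records a value for (source, destination), overwriting any previous one.
void record(TrafficMap& m, int source, int destination, int value) {
    m[std::make_pair(source, destination)] = value;
}

// Returns a pointer to the stored value, or nullptr if the slot is empty.
// A stored 0 is still distinguishable from "no entry".
const int* lookup(const TrafficMap& m, int source, int destination) {
    auto it = m.find(std::make_pair(source, destination));
    return it != m.end() ? &it->second : nullptr;
}
```

Only the pairs that were actually recorded take up memory, which suits a sparse container.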
First of all, assuming you want an equivalent structure of
vector<vector<int>>
you would want
std::map<int,std::vector<int>>
because for each key in a map, there is one unique value only.
If your sources are indexed almost sequentially as 0...N, and you will be doing a lot of look-ups and few deletions, you should use a vector of vectors.
If your sources have arbitrary IDs that do not closely follow a sequential order or if you are going to do a lot of insertions/deletions, you should use a map<int,vector<int>> - usually implemented by a binary tree.
To check the size of a vector, you use
myvec.size()
To check whether a key exists in a map, you use
mymap.count(ID) //this will return 0 or 1 (we cannot have more than 1 value to a key)
I have used maps for a while and even though I'm nowhere close to an expert, they've been very convenient for me to use for storing and modifying connections between data.
P.S. If there's only up to one destination matching a source, you can proceed with
map<int,int>
Just use the count() method to see whether a key exists before reading it
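A short sketch of the map<int,vector<int>> variant with the count() check; the alias Routes and both helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// For each source ID, the list of values it holds.
using Routes = std::map<int, std::vector<int>>;

// True if the source has an entry at all (count() is 0 or 1 for a map).
bool has_source(const Routes& r, int id) {
    return r.count(id) > 0;
}

// Number of values stored for a source; 0 if the source is unknown.
// Uses find() so that a missing key is not inserted as a side effect.
std::size_t value_count(const Routes& r, int id) {
    auto it = r.find(id);
    return it == r.end() ? 0 : it->second.size();
}
```

Checking with count() or find() first matters because operator[] would silently create an empty entry for a missing key.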
If you want to keep using a vector but want to add a check for whether the item contains a valid value, look at boost::optional. The type would now be std::vector<std::vector<boost::optional<int>>>.
You can also use a map, but the key into the map needs to be both IDs not just one.
std::map<std::pair<int,int>,int>
Edit: std::pair implements a comparison operator operator< that should be sufficient for use in a map, see http://en.cppreference.com/w/cpp/utility/pair/operator_cmp.
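For the optional-based vector, here is a sketch that swaps in std::optional (C++17) for boost::optional so that it needs no external library; the two behave much the same for this purpose. The alias Grid and the helper make_grid are assumptions for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

using Grid = std::vector<std::vector<std::optional<int>>>;

// Creates a sources x destinations grid with every slot empty.
Grid make_grid(std::size_t sources, std::size_t destinations) {
    return Grid(sources, std::vector<std::optional<int>>(destinations));
}
```

After g[1][2] = 0, the test g[1][2].has_value() is true even though the stored value is 0, which is exactly the distinction the question asks for.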

Address of map value

I have settings which are stored in a std::map. For example, there is a WorldTime key whose value is updated on each iteration of the main cycle. I don't want to look it up in the map every time I need it (it's also processed each frame); I think that's not fast at all. So, can I get a pointer to the map's value and access it through that? The code is:
std::map<std::string, int> mSettings;
// Somewhere in cycle:
mSettings["WorldTime"] += 10; // ms
// Somewhere in another place, also called in cycle
DrawText(mSettings["WorldTime"]); // Is slow to call each frame
So the idea is something like:
int *time = &mSettings["WorldTime"];
// In cycle:
DrawText(*time);
How wrong is it? Should I do something like that?
Best use a reference:
int & time = mSettings["WorldTime"];
If the key doesn't already exist, the []-access will create the element (and value-initialize the mapped value, i.e. 0 for an int). Alternatively (if the key already exists):
int & time = mSettings.find("WorldTime")->second;
As an aside: if you have hundreds of thousands of string keys or use lookup by string key a lot, you might find that an std::unordered_map<std::string, int> gives better results (but always profile before deciding). The two maps have virtually identical interfaces for your purpose.
According to this answer on StackOverflow, it's perfectly OK to store a pointer to a map element as it will not be invalidated until you delete the element (see note 3).
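A small check of that guarantee: std::map never moves its elements, so a reference taken once stays valid while other keys are inserted. The function name and the dummy keys are made up for the sketch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Takes a reference to one mapped value, then inserts many other keys and
// updates the value through the map. Returns what the old reference sees.
int read_through_reference() {
    std::map<std::string, int> mSettings;
    int& time = mSettings["WorldTime"];  // created and value-initialized to 0
    for (int i = 0; i < 1000; ++i) {
        mSettings["key" + std::to_string(i)] = i;  // unrelated insertions
    }
    mSettings["WorldTime"] += 10;
    return time;  // still refers to the same element
}
```

The reference only becomes dangling if the WorldTime entry itself is erased or the map is destroyed.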
If you're worried so much about performance then why are you using strings for keys? What if you had an enum? Like this:
enum Settings
{
WorldTime,
...
};
Then your map would be using ints for keys rather than strings. It has to do comparisons between the keys because I believe std::map is implemented as a balanced tree. Comparisons between ints are much faster than comparisons between strings.
Furthermore, if you're using an enum for keys, you can just use an array, because an enum IS essentially a map from some sort of symbol (ie. WorldTime) to an integer, starting at zero. So then do this:
enum Settings
{
WorldTime,
...
NumSettings
};
And then declare your mSettings as an array:
int mSettings[NumSettings];
Which has faster lookup time compared to a std::map. Reference like this then:
DrawText(mSettings[WorldTime]);
Since you're basically just accessing a value in an array rather than accessing a map this is going to be a lot faster and you don't have to worry about the pointer/reference hack you were trying to do in the first place.
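Put together, the enum-plus-array approach might look roughly like this. FrameCount and the SettingsStore wrapper are hypothetical additions, there only to make the sketch complete.

```cpp
#include <cassert>

enum Settings {
    WorldTime,
    FrameCount,   // hypothetical second setting, only for illustration
    NumSettings   // must stay last: doubles as the array size
};

struct SettingsStore {
    int values[NumSettings] = {};  // all settings start at 0

    void advance_time(int ms) { values[WorldTime] += ms; }
    int world_time() const { return values[WorldTime]; }
};
```

Each access is a plain array index, so there is no lookup cost worth caching a pointer for.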

C++: insert into std::map without knowing a key

I need to insert values into a std::map (or its equivalent) at any free position and then get the key (to remove/modify the value later). Something like:
std::map<int, std::string> myMap;
const int key = myMap.insert("hello");
Is it possible to do so with std::map, or is there some appropriate container for that?
Thank you.
In addition to using a set, you can keep a list of allocated (or free)
keys, and find a new key before inserting. For a map indexed by
int, you can simply take the last element, and increment its key. But
I rather think I'd go with a simple std::vector; if deletion isn't
supported, you can do something simple like:
int key = myVector.size();
myVector.push_back( newEntry );
If you need to support deletions, then using a vector of some sort of
"maybe" type (boost::optional, etc.; you probably already have
one in your toolbox, maybe under the name of Fallible or Maybe) might be
appropriate. Depending on use patterns (number of deletions compared to
total entries, etc.), you may want to search the vector in order to
reuse entries. If you're really ambitious, you could keep a bitmap of the
free entries, setting a bit each time you delete an entry, and
resetting it whenever you reuse the space.
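Here is one way the vector-with-deletions idea could be sketched. It uses std::optional (C++17) as the "maybe" type and a plain list of free indices in place of the bitmap, which is a simplification on my part; the class name SlotStore is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

class SlotStore {
public:
    // Stores `value` in a free slot (reusing a deleted one if available)
    // and returns the slot index, which serves as the key.
    std::size_t insert(std::string value) {
        if (!free_.empty()) {
            std::size_t key = free_.back();
            free_.pop_back();
            slots_[key] = std::move(value);
            return key;
        }
        slots_.push_back(std::move(value));
        return slots_.size() - 1;
    }

    // Empties the slot; the index becomes available for reuse.
    void erase(std::size_t key) {
        if (key < slots_.size() && slots_[key]) {
            slots_[key].reset();
            free_.push_back(key);
        }
    }

    // Returns nullptr for an empty or out-of-range slot.
    const std::string* find(std::size_t key) const {
        return key < slots_.size() && slots_[key] ? &*slots_[key] : nullptr;
    }

private:
    std::vector<std::optional<std::string>> slots_;
    std::vector<std::size_t> free_;  // indices of emptied slots
};
```

Keep in mind that a reused index means an old key can later refer to a different value, so callers should drop keys once they erase them.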
You can add object to an std::set, and then later put the whole set into a map. But no, you can't put a value into a map without a key.
The closest thing to what you're trying to do is probably
myMap[myMap.size()] = "some string";
The only advantage this has over std::set is that you can pass the integer indexes around to other modules without them needing to know the type of std::set<Foo>::iterator or similar.
It is impossible. Such an operation would require intricate knowledge of the key type to know which keys are available. For example, std::map would have to increment int values for int maps or append to strings for string maps.
You could use a std::set and drop keying altogether.
If you want to achieve something similar to automatically generated primary keys in SQL databases, then you can maintain a counter and use it to generate a unique key. But perhaps std::set is what you really need.
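The counter idea can be wrapped up in a few lines. The class name AutoKeyMap and its interface are made up for this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

class AutoKeyMap {
public:
    // Inserts `value` under a freshly generated key and returns that key.
    int insert(std::string value) {
        int key = next_key_++;
        items_.emplace(key, std::move(value));
        return key;
    }

    // Returns true if an entry was removed.
    bool erase(int key) { return items_.erase(key) > 0; }

    // Returns nullptr if the key is unknown.
    const std::string* find(int key) const {
        auto it = items_.find(key);
        return it != items_.end() ? &it->second : nullptr;
    }

private:
    int next_key_ = 0;  // never reused, so erased keys cannot collide with new ones
    std::map<int, std::string> items_;
};
```

Unlike using myMap.size() as the next key, the counter keeps producing fresh keys after deletions, so a new entry never overwrites an existing one.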