The difference between Python dict and tr1::unordered_map in C++

I have a question about how Python dictionaries work.
I remember reading somewhere that strings in Python are immutable to allow hashing, and that this is the same reason one cannot directly use lists as keys: lists are mutable (they support .append), so they cannot be used as dictionary keys.
I wanted to know how the implementation of unordered_map in C++ handles these cases, since strings in C++ are mutable.

Keys in all C++ map/set containers are const and thus immutable once they have been added to the container.
Note that C++ containers are not specific to string keys. You can use any type of object as a key, and the constness prevents modification after the key has been copied into the container.
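To make the constness concrete, here is a minimal sketch. The alias WordCount and the function key_survives_mutation are hypothetical names chosen for illustration. It checks at compile time that the stored element type has a const key, and shows that mutating the caller's string after insertion leaves the stored key unchanged.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <unordered_map>
#include <utility>

// Hypothetical alias used only for this illustration.
using WordCount = std::unordered_map<std::string, int>;

// The element type stores the key as const, so it cannot be changed in place.
static_assert(std::is_same<WordCount::value_type,
                           std::pair<const std::string, int>>::value,
              "keys are stored as const");

// Inserts a copy of `word`, then mutates the caller's string.
// Returns true if the stored key is unaffected by that mutation.
bool key_survives_mutation() {
    WordCount counts;
    std::string word = "circle";
    counts[word] = 1;       // the container stores its own copy of `word`
    word += "s";            // mutating the original string is fine...
    // counts.begin()->first += "s";  // ...but this would not compile: the key is const
    assert(counts.size() == 1);
    return counts.count("circle") == 1 && counts.count("circles") == 0;
}
```

So the C++ string type stays mutable in general; only the copy held by the container is frozen.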

Related

Does it risk if use c++ std::unordered_map like this?

std::unordered_map<std::string, std::unordered_set<std::shared_ptr<SomeType>>> _some_map;
...
void init() {
    auto item = std::make_shared<SomeType>();
    _some_map["circle"].insert(item);
}
_some_map is a member variable. init() is called only once, so the insert is thread-safe. After initialization, the map is only read from multiple threads, which I think is thread-safe too.
I'm not sure if I can use insert like this, because it creates an unordered_set and a pair when there is no value for the key "circle". The code works normally; I just want to make sure there is no potential risk.
(I am not good at English writing. )
Thank you.
I'm not sure if I can use insert like this.
Yes, you can, because operator[]
Returns a reference to the value that is mapped to a key equivalent to key, performing an insertion if such key does not already exist.
Your mapped_type is std::unordered_set, which is default constructible, so there is no problem.
After initialization, the map will be read only in multi-thread
Here also you are safe, because according to the containers documentation
All const member functions can be called concurrently by different threads on the same container.
So if you are only reading the values without modifying them, it is OK. You could even modify existing elements concurrently, provided you can guarantee that different threads access different elements and that no thread inserts or erases while others are running.
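The two phases described above can be sketched as follows. This is only an illustration under stated assumptions: SomeType is an empty placeholder for the question's type, and Registry, make_registry and count_circles_concurrently are hypothetical names. Setup happens on one thread with operator[], and the readers then touch the map only through a const reference, so only const member functions are reachable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <thread>
#include <unordered_map>
#include <unordered_set>
#include <vector>

struct SomeType {};  // placeholder standing in for the question's type

using Registry =
    std::unordered_map<std::string, std::unordered_set<std::shared_ptr<SomeType>>>;

// Single-threaded setup: operator[] default-constructs the set on first use.
Registry make_registry() {
    Registry registry;
    registry["circle"].insert(std::make_shared<SomeType>());
    return registry;
}

// Read phase: each thread calls only const member functions (find, end, size)
// and writes its result into its own slot of `results`.
std::size_t count_circles_concurrently(const Registry& registry,
                                       std::size_t thread_count) {
    std::vector<std::size_t> results(thread_count);
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < thread_count; ++i) {
        workers.emplace_back([&registry, &results, i] {
            auto it = registry.find("circle");  // const overload, never inserts
            if (it != registry.end()) results[i] = it->second.size();
        });
    }
    for (auto& worker : workers) worker.join();

    std::size_t total = 0;
    for (std::size_t r : results) total += r;
    return total;
}
```

Passing the map as const Registry& is a deliberate choice here: operator[] is not available on a const map, so a reader cannot insert by accident.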

Does Rust implicitly create map entries when indexing, like C++?

In C++ the indexing operator is defined for std::map and std::unordered_map so that if your reference to the container is non-const just the act of indexing without assigning is enough to have implicitly created a value inside the container. This sometimes creates subtle bugs, where you expect to be referencing a value inside a container but instead actually create one, e.g. config["stting"] instead of config["setting"].
I know Python addresses this by having __setitem__ and __getitem__ be separate methods but this requires cooperation with the parser.
Does Rust do anything to address this frequent source of bugs?
No, Rust doesn't have that problem. There's no implicit creation of items in Rust collections.
For example, you'd insert a key-value pair into a std::collections::HashMap with map.insert(key, value) and retrieve a value with let value = map.get(&key);.
Note that .get() will return an Option<&V>, so if the key didn't exist you will get None as a result.
Rust also offers an easy way to either retrieve a value or to insert some default value for the given key if the value doesn't exist with:
let value = map.entry(key).or_insert(0);
HashMap also implements the Index trait, which allows retrieval of a value with let value = map[&key];, which will panic if key doesn't exist in map.
Note that because HashMap does not implement IndexMut, this [ ] bracket syntax always returns an immutable reference to the value, so you cannot insert or modify values this way.
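For comparison, the C++ side can also avoid the implicit insertion by using find() or at() instead of operator[]. The sketch below is only illustrative: Config, get_or and at_throws_on_missing are hypothetical names, and the keys echo the config["stting"] typo from the question.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical alias used only for this illustration.
using Config = std::map<std::string, int>;

// Looks up a key without inserting; returns `fallback` when it is missing.
// Taking the map by const reference makes operator[] unavailable here.
int get_or(const Config& config, const std::string& key, int fallback) {
    auto it = config.find(key);
    return it == config.end() ? fallback : it->second;
}

// Returns true if at() reports the missing key by throwing std::out_of_range.
bool at_throws_on_missing(const Config& config, const std::string& key) {
    try {
        (void)config.at(key);  // at() never inserts
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

With these, a mistyped key yields a fallback or an exception rather than a silently created entry.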

Can I have a C++ map where multiple keys reference the value without using pointers?

From a C background I find myself falling back into C habits where there is generally a better way. In this case I can't think of a way to do this without pointers.
I would like
struct foo {
int i;
int j;
};
mymap['a'] = foo
mymap['b'] = bar
As long as only one key references a value, mymap.find returns an iterator through which I can modify the value, but if I do this:
mymap['c'] = mymap.find('a')->second; // problematic because foo is copied, right?
The goal is to be able to find 'a' or 'c' modify foo and then the next find of 'a' or 'c' will show the updated result.
No, you will need to use pointers for this. Each entry in the map maintains a copy of the value assigned, which means that you cannot have two keys referring to the same element. If you store pointers to the element instead, then two keys will refer to two separate pointers that both refer to the exact same element in memory.
For some implementation details: std::map is implemented as a balanced tree in which each node contains a std::pair<const Key,Value> object (plus extra information for the tree structure). When you do m[ key ], the node containing the key is looked up, or a new node is created in the tree, and a reference to the Value subobject of the pair is returned.
I would use std::shared_ptr here. You have an example of shared ownership, and shared_ptr is made for that. While pointers tend to be overused, there is nothing wrong with using them when necessary.
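A minimal sketch of the shared_ptr approach, reusing the foo struct from the question. The alias SharedMap, the function make_shared_map and the field values are assumptions made for the example.

```cpp
#include <cassert>
#include <map>
#include <memory>

struct foo {
    int i;
    int j;
};

// Hypothetical alias: each key maps to a shared pointer rather than a foo.
using SharedMap = std::map<char, std::shared_ptr<foo>>;

// Builds a map in which 'a' and 'c' share one foo, while 'b' owns another.
SharedMap make_shared_map() {
    SharedMap mymap;
    mymap['a'] = std::make_shared<foo>(foo{1, 2});
    mymap['b'] = std::make_shared<foo>(foo{3, 4});
    mymap['c'] = mymap.at('a');  // copies the pointer, not the foo
    return mymap;
}
```

A write through mymap.at('c') is then visible through mymap.at('a'), because both entries point at the same object. The object is destroyed only when the last entry referring to it is erased.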
Boost.Intrusive
Boost.Intrusive is a library presenting some intrusive containers to
the world of C++. Intrusive containers are special containers that
offer better performance and exception safety guarantees than
non-intrusive containers (like STL containers).
The performance benefits of intrusive containers makes them ideal as a
building block to efficiently construct complex containers like
multi-index containers or to design high performance code like memory
allocation algorithms.
While intrusive containers were and are widely used in C, they became
more and more forgotten in C++ due to the presence of the standard
containers which don't support intrusive techniques. Boost.Intrusive
not only reintroduces this technique to C++, but also encapsulates the
implementation in STL-like interfaces. Hence anyone familiar with
standard containers can easily use Boost.Intrusive.

Concurrent factory/flyweight with TBB

I have a flyweight pattern working in serial where the factory uses std::map to store and provide access to the created objects. The factory returns an iterator that points to the object in the map. The objects in the factory are constants, so they will not be updated once inserted, unless they are erased.
I would like to make the factory concurrent using tbb::concurrent_hash_map, but I am unsure what the return should be. I could use an iterator (should it be const_iterator?), but the documentation says that all iterators are invalidated when something does a find or insert in the concurrent_hash_map. So I could use a const_accessor since only read-only access is needed, but then this is different from the serial implementation (iterator vs accessor).
Which one is better to use? Should consistency in types (i.e. both being iterators) be important? Both the serial and the threaded compile-time options need to be there.
If you do not erase elements simultaneously with other threads accessing the map, you may use tbb::concurrent_unordered_map instead. This is also a hash-based associative container, but with simpler and more STL-like API. It does not invalidate iterators by insert and find, but as a tradeoff, it does not allow concurrent removal of elements.
If you do need to remove elements concurrently, the only choice with TBB is to use tbb::concurrent_hash_map with accessors.
I also suggest discussing your use case on the TBB forum.

C++ associative containers - why doesn't the standard define methods to exchange and replace keys?

I need to replace specific key values while the rest of the value_type is left untouched. What I actually have to do is copy the value, erase the entry, and insert it again with the changed key. This is really bad: I need to copy the whole value_type twice, and deallocate and allocate again.
Why doesn't the standard define methods like these:
// returns count of exchanged keys
size_type exchange_key(key_type const& x, key_type const& y);
// returns count of replaced keys
size_type replace_key(key_type const& old_key, key_type const& new_key);
Is there anything I'm missing?
I don't know why it was not added in the first place, and I understand that this is unfortunate. I guess they just added what they felt was absolutely necessary.
I think I have read somewhere that Boost.MultiIndex provides this ability.
Associative containers are implemented in a way that does not allow the key to be changed efficiently. To make this explicit, they do not provide convenience methods to replace a key. The associative container would also have to remove and insert again under the covers.
I think this is an abstraction problem. The standard doesn't say exactly how the containers are to be implemented, it only specifies the maximum complexity of some of the operations and leaves the rest to the implementation.
If the standard were to add a replace_key function, it would also have to specify that this should have a different complexity than the existing erase-insert combination. How can it do that without leaking implementation details? If the function isn't guaranteed to be faster on all implementations, it is pretty useless.
When you say that it would obviously be faster, you make assumptions about implementation details that the standard tries to avoid.
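Since C++17 you can, with .extract(key): it detaches the node from the container, so the key can be changed and the node reinserted without copying the value.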
https://en.cppreference.com/w/cpp/container/map/extract
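As a sketch of how the replace_key function proposed in the question could be built on extract (C++17), under some assumptions: the bool return type and the rule of refusing to overwrite an existing key are choices made for this example, and they differ from the size_type signature in the question.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Renames old_key to new_key without copying the mapped value.
// Returns false and leaves the map unchanged if old_key is missing
// or new_key is already present. Requires C++17 for extract().
template <class Map>
bool replace_key(Map& m, const typename Map::key_type& old_key,
                 const typename Map::key_type& new_key) {
    if (m.count(new_key) != 0) return false;  // never overwrite an existing entry
    auto node = m.extract(old_key);           // unlinks the node, no reallocation
    if (node.empty()) return false;           // old_key was not in the map
    node.key() = new_key;                     // the key is mutable while detached
    m.insert(std::move(node));                // relinks the very same node
    return true;
}
```

The container still has to unlink and relink the node, so the tree is rebalanced as usual. What is saved is the copy of the value and the deallocate/allocate pair.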
This is because changing a key could affect the structure of an associative container. Notably, in std::map, which is typically a red-black tree, modifying a key will usually change the tree structure (e.g., by rotating subtrees). Some data structures do not allow such dynamic changes at all. So it is challenging to expose such an operation as a standard part of an associative container.
Regarding the overhead you are concerned about: once the value you store is a pointer or reference, the overhead of deleting and inserting a pair isn't too bad.
Well, honestly, behind the scenes it would result in an insert and a delete operation anyhow, with the sole difference that the value part would not be copied. While this seems to be your biggest concern, unless your object is very expensive to copy, in a large container the update operation needed to re-stabilize the ordered container will be heavier anyhow.
This would, however, require some important changes that go beyond the associative containers. The two most important ones I can see are:
1. The std::pair class needs an update, because you must be able to update the key without creating a new pair (as this would also copy the value object).
2. You need an update function that removes a pair from the tree, calls the new logic from 1., and reinserts it.
I think the main problem lies with the first one, as std::pair is at the moment a very simple wrapper, and your proposal would break that principle and add some (unnecessary) complexity to it. Also note that point 2 does not actually add new functionality, but wraps something the developer could easily manage himself through references and the like. If all this wrapping were added to std, it would become a huge, bloated library.
If you want this behaviour, you should look at more complex libraries (Boost probably has some). Or you should simply use a reference/shared_ptr as your value_type.