In C++, the indexing operator of std::map and std::unordered_map is defined so that, when you hold a non-const reference to the container, merely indexing with a key that is not present (even without assigning) implicitly creates a value inside the container. This sometimes causes subtle bugs, where you expect to read an existing value but actually create a new one, e.g. config["stting"] instead of config["setting"].
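A minimal sketch of the pitfall described above, together with a lookup that never inserts. The helper names lookup_or and size_after_typo are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: looks up a key without ever inserting.
// Returns `fallback` when the key is absent.
inline int lookup_or(const std::map<std::string, int>& config,
                     const std::string& key, int fallback) {
    auto it = config.find(key);
    return it == config.end() ? fallback : it->second;
}

// Hypothetical helper: shows that reading a misspelled key through
// operator[] on a non-const map inserts a value-initialized entry.
inline std::size_t size_after_typo() {
    std::map<std::string, int> config{{"setting", 42}};
    int value = config["stting"];  // typo: silently inserts {"stting", 0}
    (void)value;
    return config.size();          // the map now holds two entries
}
```

Taking the map by const reference, as lookup_or does, also rules out the accidental insertion at compile time, because operator[] is not available on a const std::map.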
I know Python addresses this by having __setitem__ and __getitem__ be separate methods but this requires cooperation with the parser.
Does Rust do anything to address this frequent source of bugs?
No, Rust doesn't have that problem. There's no implicit creation of items in Rust collections.
For example, you'd insert a key-value pair into a std::collections::HashMap with map.insert(key, value) and retrieve a value with let value = map.get(&key);.
Note that .get() will return an Option<&V>, so if the key didn't exist you will get None as a result.
Rust also offers an easy way to either retrieve a value or to insert some default value for the given key if the value doesn't exist with:
let value = map.entry(key).or_insert(0);
HashMap also implements the Index trait, which allows retrieval of a value with let value = map[&key];, which will panic if key doesn't exist in map.
Note that because HashMap does not implement IndexMut, this [ ] bracket syntax only ever gives immutable access to the value, so you cannot insert or modify values this way.
I see little use of std::unordered_set or any hash table for that matter if the objects themselves are the keys: in order to search anything, you'd need to have the element in the first place. But if I already have the element, I don't need to search it (the exception being if I only need to check if it's a duplicate).
Much more interesting is the so-called "heterogeneous lookup": the case where your data is an object containing multiple pieces of data, one of which is treated as the key. For example, I want to get all the employee data of one employee named "Pete Johnson" by doing auto it = employees.find("Pete Johnson").
I read about transparent comparators and tried creating my own hash function and 'equal_to' operator to use a subset of the data. Creating the set and adding objects works, but I still cannot get the 'find' function to work the way I want.
The easy solution for what I want is to just use std::unordered_map, but I hoped there would be a better way than to keep two copies of each key. Another workaround is to create a dummy object, copy the key into the dummy, then use that to do the lookup. However, I think the ideal solution would be if I had the option to provide a 'getKey' function to the constructor, so that 'find' can accept a naked key and use 'getKey' to compare it to the object to see if it matches.
std::string getKey(const userData_t& object){
    return object.name;
}
std::size_t userData_t_hash(const userData_t& object){
    return std::hash<std::string>()(getKey(object));
}
bool userData_t_equals_to(const userData_t& object, const std::string& str){
    return getKey(object) == str;
}
bool userData_t_equals_to(const userData_t& object, const userData_t& other){
    return getKey(object) == getKey(other);
}
Why would I need to search and obtain (a reference to) the object from the hash table if I already had the object?
One reason is to find out if the object is in the hash table. Sometimes sets are used to eliminate duplicates from a collection. If sorting is not needed, then unordered_set might be the best choice for the task.
It would make more sense if I could provide a 'getKey' function [etc.]
You can in principle. The syntax is not exactly what you had in mind, but the functionality is available.
The hash of an object does not need to take into account all data in the object. The requirement is that if two objects should be considered the same, then they must have the same hash. If you define the hash of your object to be the hash of object.getKey() and define equality to be having the same key, then you could find a real object by constructing a dummy object with just the key set. If your objects represent employees and the key is the name (not the best choice, but sufficient for now), then you could construct an employee object with no data set other than the name, and use that to find the real employee object.
Admittedly, constructing a dummy object to set one field is a bit of a hassle. Hence, the standard provides find overloads that accept data that compares equivalent to elements of your set: C++14 introduced them for the ordered containers (std::set, std::map), and C++20 extended them to the unordered containers. For std::unordered_set, this requires that both the hash and the equality comparison types have a member type named is_transparent. For example, this member type exists in std::equal_to<void> (which is not the same as the default comparison, std::equal_to<Key>). With the appropriate hash and comparison definitions, you could use employees.find("Pete") to find the employee object whose name is "Pete".
(I'm going to leave further details as "out of scope" for the question that was asked. Some general information can be found in What are transparent comparators?.)
So one use case for unordered_set is a replacement for unordered_map when the values would necessarily contain the key and the values do not need to change.
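As a rough sketch of heterogeneous lookup, here is the ordered-container variant, chosen because its transparent find overloads are available from C++14 onwards; the unordered equivalent follows the same idea but needs C++20 and a transparent hash as well. Employee, ByName, EmployeeSet and salary_of are hypothetical names introduced for this illustration.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical record type; the name acts as the key.
struct Employee {
    std::string name;
    int salary;
};

// Transparent comparator: the is_transparent member type enables the
// find() overloads that accept something other than an Employee.
struct ByName {
    using is_transparent = void;
    bool operator()(const Employee& a, const Employee& b) const { return a.name < b.name; }
    bool operator()(const Employee& a, const std::string& b) const { return a.name < b; }
    bool operator()(const std::string& a, const Employee& b) const { return a < b.name; }
};

using EmployeeSet = std::set<Employee, ByName>;

// Returns the salary of the named employee, or -1 if there is none.
inline int salary_of(const EmployeeSet& employees, const std::string& name) {
    auto it = employees.find(name);  // no dummy Employee is constructed
    return it == employees.end() ? -1 : it->salary;
}
```

The comparator needs all three overloads because find compares the naked key against stored elements in both argument orders.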
Actually, I want to pass an object to a function based on input from the user. There are many objects, and I want the user to tell me which object to pass. One way I can think of is to use if/else-if statements (e.g. if the user enters the int 1, that means object-1). But is there any direct method by which I can take an object as input, so that I can pass it to the function without using if/else-if statements?
You cannot have a user input an object directly out-of-the-box, but you can certainly write code to obtain that result (for example by implementing deserialization and receiving a JSON representation of the object).
However, if I understood your question correctly, you have a predefined set of objects with known integer keys. In that case, the most straightforward way is to store these objects in a container, such as std::map<int, YourObject> (or an std::vector<YourObject> if your keys are easily mappable to [0;N)). Once you've had the user input the key, you can then look it up in the container and retrieve the corresponding object via the container's at() member function.
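A small sketch of that approach, assuming a hypothetical Shape type and a hypothetical describe function standing in for "the function you want to pass the object to". at() throws std::out_of_range for an unknown key, which the sketch turns into a fallback message.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical object type selected by the user.
struct Shape {
    std::string name;
};

// Hypothetical function that receives the selected object.
inline std::string describe(const Shape& s) { return "shape: " + s.name; }

// Looks up the object for the user-supplied key and passes it on.
// at() never inserts; it throws std::out_of_range if the key is unknown.
inline std::string describe_by_key(const std::map<int, Shape>& registry, int key) {
    try {
        return describe(registry.at(key));
    } catch (const std::out_of_range&) {
        return "unknown key";
    }
}
```

The key itself would typically come from something like std::cin >> key; the lookup then replaces the whole if/else-if chain.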
Consider the case where I have a user-defined type with, say, an id() member function which returns a unique std::string.
I want a container of these objects, where the id() uniquely identifies the elements, but I want to "use" the objects to do other things which may modify their members.
I am currently constructing the objects by calling std::set::emplace and capturing the returned iterator, bool pair.
But I am then not allowed to modify its value as the iterator is const.
Is there a good way to do what I want? The only two I can think of are:
1. Store unique_ptrs to the object in the set, this way the pointer value is what differentiates it rather than the name and the object pointed to can be modified.
2. Store a map using the id() as the Key, but this means I have duplicated the keys.
I am happy to use well adopted and modern libraries, such as boost, if they have the right container for my problem.
Is there a good way to do what I want?
No, not really. The granularity of std::set is at object level. There is no way to express that a portion of an object contributes to the key.
Some people recommend declaring all non-key members mutable. This is wrong, as mutable is meant for things that are hidden from the public interface of the object (e.g. a mutex).
The "official" way is to take the object out of the set, modify it and put it back in. C++17 has set::extract, which helps to improve the performance of this task a bit (though it of course remains inefficient if you never modify the key, since the tree still has to be checked/rebalanced).
I want to "use" the objects to do other things which may modify their members.
If you're absolutely sure you never modify the object key, just cast away constness. From a legal point of view it is OK to cast away constness from objects that were not born const. For extra safety you can wrap the key into another, const member:
struct Element {
    const Key key;
    Value value;
};
This won't help if you have a data cube with multiple sets each using its own "view" on the key.
1. Store unique_ptrs to the object in the set
This would be a pessimization due to extra indirection. Since the elements are on the heap, you will take an extra cache miss. And again end up with UB if you accidentally modify the key.
2. Store a map using the id() as the Key
Yes, different variations of this approach are possible, but you must still ensure that you never modify the key.
For example you could store a key + pointer to data. This approach is often combined with a dense_hash_set with linear probing for best performance. Since the value is accessed only once after the element is found, it doesn't really matter that it is located elsewhere.
I would suggest using Boost.MultiIndex as a drop-in replacement for std::set, as it adds the modify method which allows modification of an element, checking whether the position within the container has changed:
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
struct S { /* ... */ };
boost::multi_index_container<S> t; // default configuration emulates std::set<S>
auto [it, inserted] = t.emplace(...);
t.modify(it, [&](S& s) {
    // modify s here
    // if the key is unchanged, s does not move
    // the iterator `it` remains valid regardless
});
There is a small overhead in checking that the key is indeed unchanged, but this should be minimal compared to the rest of the program and should optimize and predict well.
std::set keeps its elements sorted, and the keys by which the elements are sorted are the elements themselves. As a result, the elements in the std::set are const qualified to prevent the user from modifying the elements (i.e., the keys) and thus breaking the std::set order.
Traditionally, if you wanted to modify an element of an std::set, you would have first to remove the element object you wish to modify from the std::set, modify it, and insert it into the std::set again. The problem is that this results in the allocation of an std::set internal node.
Since C++17 you can remove and reinsert an element into the std::set without allocating an std::set internal node thanks to std::set::extract(). This member function returns the node handle corresponding to the requested element. After modifying the element through this returned node, you can reinsert the node with the corresponding insert() overload. No node allocation takes place as you are reusing an already allocated node.
The drawback to these approaches – regardless of whether or not allocation occurs – is that reinserting the element into the std::set takes logarithmic time in the size of the set (unless you can take advantage of the hint to insert()).
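The extract-and-reinsert sequence can be sketched as follows (C++17). Item, ById, ItemSet and set_count are hypothetical names; the successor of the extracted element is passed as the insertion hint, on the assumption that the modification leaves the element's position unchanged.

```cpp
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <utility>

// Hypothetical element type, ordered by id only.
struct Item {
    std::string id;
    int count;
};

struct ById {
    bool operator()(const Item& a, const Item& b) const { return a.id < b.id; }
};

using ItemSet = std::set<Item, ById>;

// Updates the count of the element with the given id.
// Returns false if no such element exists.
inline bool set_count(ItemSet& items, const std::string& id, int new_count) {
    auto it = items.find(Item{id, 0});
    if (it == items.end()) return false;
    auto hint = std::next(it);            // stays valid across extract()
    auto node = items.extract(it);        // unlinks the node, frees nothing
    node.value().count = new_count;       // the node handle gives mutable access
    items.insert(hint, std::move(node));  // relinks the same node, no allocation
    return true;
}
```

If the modification could change the ordering, the hint may simply be wrong; insert still places the node correctly, it just loses the speed-up.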
Casting away constness and modifying std::set elements
You can still cast const away from an element of the std::set and modify its data members, as long as your std::set's comparison function doesn't take into account the data members you change. That is, if you only modify data members that the comparison function doesn't consider, the order won't break.
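A minimal sketch of that approach, assuming a hypothetical Account type whose ordering looks at id only. It relies entirely on the caller never touching id through the cast reference; nothing in the code enforces that.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical element type: `id` is the key, `balance` is payload.
struct Account {
    std::string id;
    int balance;
};

// The ordering considers only `id`, never `balance`.
struct ByAccountId {
    bool operator()(const Account& a, const Account& b) const { return a.id < b.id; }
};

using AccountSet = std::set<Account, ByAccountId>;

// Adds `amount` to the balance of the account with the given id.
// Returns false if no such account exists.
inline bool deposit(AccountSet& accounts, const std::string& id, int amount) {
    auto it = accounts.find(Account{id, 0});
    if (it == accounts.end()) return false;
    // Assumption: only `balance` is modified, so the set's order is unaffected.
    const_cast<Account&>(*it).balance += amount;
    return true;
}
```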
I am trying to use the following data collection in my program:
boost::unordered_set<boost::shared_ptr<Entity> > _entities;
I am using unordered_set because I want fast insertion and removal (by key, not iterator) of Entities.
My doubt is, if I implement the following two functions:
void addEntity(boost::shared_ptr<Entity> entity) {
    _entities.insert(entity);
}
void removeEntity(boost::shared_ptr<Entity> entity) {
    _entities.erase(entity); // unordered_set provides erase(), not remove()
}
When I try to remove the entity, will the unordered_set find it? Because the shared_ptr that is stored inside the unordered_set is a copy of the shared_ptr that I am trying to use to remove the entity from the unordered_set if I call removeEntity().
What do I need to do for the unordered_set to find the Entity? Do I need to create a comparison function that checks the values of the shared_ptr's? But then won't the unordered_set slow down because its hash function uses the shared_ptr as hash? Will I need to create a hash function that uses the Entity as hash also?
Yes, you can use boost::shared_ptr in boost::unordered_set (the same applies to the std versions of these classes).
boost::unordered_set uses the boost::hash function template to compute hash values for its elements. This function is specialised for boost::shared_ptr to hash the underlying pointer, and equality of shared_ptrs likewise compares the underlying pointers, so a copy of the shared_ptr will find the stored entry.
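A short sketch using the std equivalents (std::shared_ptr and std::unordered_set), which hash and compare by the stored pointer in the same way. Entity and World are hypothetical stand-ins for the types in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_set>
#include <utility>

// Hypothetical entity type.
struct Entity {
    int id;
};

// Hypothetical owner of the entity collection.
class World {
public:
    void addEntity(std::shared_ptr<Entity> entity) {
        entities_.insert(std::move(entity));
    }
    // Returns true if an entry pointing at the same object was removed.
    bool removeEntity(const std::shared_ptr<Entity>& entity) {
        return entities_.erase(entity) == 1;
    }
    std::size_t size() const { return entities_.size(); }

private:
    // Hashing and equality use the raw pointer, not the Entity contents.
    std::unordered_set<std::shared_ptr<Entity>> entities_;
};
```

Any copy of the original shared_ptr removes the entry, whereas a distinct Entity with identical contents does not, because it lives at a different address.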
Let me try to explain this, if I understood it correctly:
boost::shared_ptr is implemented using a reference-counting mechanism. That means whenever you pass a copy of it to, say, another function, you are just increasing the reference count, whereas when a copy is destroyed the reference count decreases. Only when the reference count reaches 0 is the object deleted from memory.
Always be cautious when using it. It can save you from memory leaks only as far as your design is good enough to accommodate it.
For example, I faced one issue.
I had a class with a map containing shared_ptrs. Later this class (without my knowledge) was also made responsible for passing these shared_ptrs to some other classes, which in turn used some containers to store them. As a result, this code blew up, luckily in the tester's face, in the form of a memory leak.
I hope you can figure out why.
I have a question related to understanding of how python dictionaries work.
I remember reading somewhere that strings in Python are immutable to allow hashing, and that this is the same reason why one cannot directly use lists as keys, i.e. lists are mutable (they support .append) and hence cannot be used as dictionary keys.
I wanted to know how the implementation of unordered_map in C++ handles these cases (since strings in C++ are mutable).
Keys in all C++ map/set containers are const and thus immutable (after being added to the container).
Notice that C++ containers are not specific to string keys, you can use any objects, but the constness will prevent modifications after the key is copied to the container.
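The constness of stored keys can be checked directly from the container's value_type, as in this small sketch. Counts and bump are hypothetical names used only here.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <unordered_map>
#include <utility>

using Counts = std::unordered_map<std::string, int>;

// The element type pairs a const key with a mutable mapped value.
static_assert(
    std::is_same<Counts::value_type, std::pair<const std::string, int>>::value,
    "keys are stored as const std::string");

// Increments the value stored under `key`; returns the new value,
// or -1 if the key is absent.
inline int bump(Counts& counts, const std::string& key) {
    auto it = counts.find(key);
    if (it == counts.end()) return -1;
    // it->first += "x";  // would not compile: the key is const
    it->second += 1;      // the mapped value remains mutable
    return it->second;
}
```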