Please help me figure out how to use unordered_set with custom structures.
Suppose I have the following class:
struct MyClass {
int id;
// other members
};
which is used through shared_ptr:
using CPtr = std::shared_ptr<MyClass>;
Because I want fast access by key, I intended to use an unordered_set with a custom hash and the MyClass::id member as the key:
template <class T> struct CHash;
template<> struct CHash<CPtr>
{
std::size_t operator() (const CPtr& c) const
{
return std::hash<decltype(c->id)> {} (c->id);
}
};
using CSet = std::unordered_set<CPtr, CHash<CPtr>>; // CSet is a placeholder alias name
So far, unordered_set still seems to be an appropriate container. However, the standard find() functions for sets only give const access to the elements, to ensure that keys are not changed. I intend to modify the objects while guaranteeing that their keys stay unchanged. So, the questions are:
1) How can I get easy access to an element of the set by its int key while keeping the possibility to modify the element, something like
auto element = my_set.find(5);
element->b = 3.3;
It is possible to add a converting constructor and use something like
auto element = my_set.find(MyClass (5));
But it doesn't solve the problem with constness, and what if the class is huge?
2) Am I going the wrong way? Should I use another container? For example unordered_map, which would store one more int key per entry and consume more memory.
A pointer doesn't project its constness to the object it points to. Meaning, if you have a constant reference to a std::shared_ptr (as in a set) you can still modify the object via this pointer. Whether or not that is something you should do is a different question, and it doesn't solve your lookup problem.
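To make the point about pointer constness concrete, here is a minimal sketch. It assumes a hypothetical non-key member `b`, and it adds an equality functor (`IdEqual`, a name made up for this example) next to the hash, because the default equality for std::shared_ptr compares addresses rather than ids. The lookup still needs a throwaway probe object, so this only illustrates the constness part, not a cheap lookup by int.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <unordered_set>

struct MyClass {
    int id;
    double b;  // hypothetical non-key member
};
using CPtr = std::shared_ptr<MyClass>;

// Hash and equality both look only at id, so id acts as the key.
struct IdHash {
    std::size_t operator()(const CPtr& c) const {
        assert(c && "null pointers are not supported in this sketch");
        return std::hash<int>{}(c->id);
    }
};
struct IdEqual {
    bool operator()(const CPtr& lhs, const CPtr& rhs) const {
        return lhs->id == rhs->id;
    }
};
using CSet = std::unordered_set<CPtr, IdHash, IdEqual>;

// Returns true if an element with this id was found and updated.
bool set_b(CSet& s, int id, double value) {
    const CPtr probe = std::make_shared<MyClass>(MyClass{id, 0.0});
    auto it = s.find(probe);
    if (it == s.end()) return false;
    // *it is a const CPtr&, but the pointee is a non-const MyClass.
    (*it)->b = value;
    return true;
}
```

Changing `id` through the pointer would silently corrupt the set, so that guarantee stays with the caller.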
Of course, if you want to look up a value by a key, then this is what std::unordered_map was designed for, so I'd have a closer look there. The main problem I see with this approach is not so much the memory overhead (unordered_set and unordered_map as well as shared_ptr have noticeable memory overhead anyway), but that you have to maintain redundant information (id in the object and id as a key).
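A minimal sketch of the map-based variant, assuming the same hypothetical member `b`; the helper names `add` and `find_by_id` are made up for this example. Reading the key from the object at insertion time keeps the two copies of the id in one place, although nothing stops later code from changing `id` inside the object.

```cpp
#include <cassert>
#include <memory>
#include <unordered_map>
#include <utility>

struct MyClass {
    int id;
    double b;  // hypothetical non-key member
};
using CPtr = std::shared_ptr<MyClass>;
using CMap = std::unordered_map<int, CPtr>;

// Inserts p under its own id; returns false if that id is already present.
bool add(CMap& m, CPtr p) {
    assert(p && "null pointers are not supported in this sketch");
    const int key = p->id;  // read the key before p is moved from
    return m.emplace(key, std::move(p)).second;
}

// Returns a mutable pointer to the element, or nullptr if the id is unknown.
MyClass* find_by_id(const CMap& m, int id) {
    auto it = m.find(id);
    return it == m.end() ? nullptr : it->second.get();
}
```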
If you don't have many insertions, you don't absolutely need the (on average) constant lookup time, and memory overhead is really important to you, you could consider a third solution (besides using a third-party or self-written data structure, of course): namely to write a thin wrapper around a sorted std::vector<std::shared_ptr<MyClass>> or - if appropriate - even better std::vector<std::unique_ptr<MyClass>> that uses std::upper_bound for lookups.
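One possible shape for such a wrapper is sketched below. The class name `SortedById` and the member `b` are made up for illustration, and the sketch uses std::lower_bound rather than the std::upper_bound named above, because lower_bound lands directly on a matching element if one exists.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct MyClass {
    int id;
    double b;  // hypothetical non-key member
};

class SortedById {
public:
    // Inserts in id order (linear time); returns false on a duplicate id.
    bool insert(std::unique_ptr<MyClass> p) {
        assert(p && "null pointers are not supported in this sketch");
        auto pos = lower(p->id);
        if (pos != items_.end() && (*pos)->id == p->id) return false;
        items_.insert(pos, std::move(p));
        return true;
    }

    // Binary search (logarithmic time); returns nullptr if the id is unknown.
    MyClass* find(int id) {
        auto pos = lower(id);
        return (pos != items_.end() && (*pos)->id == id) ? pos->get() : nullptr;
    }

private:
    using Storage = std::vector<std::unique_ptr<MyClass>>;

    // First position whose id is not less than the requested id.
    Storage::iterator lower(int id) {
        return std::lower_bound(
            items_.begin(), items_.end(), id,
            [](const std::unique_ptr<MyClass>& e, int key) { return e->id < key; });
    }

    Storage items_;
};
```

The per-element overhead is a single pointer, at the price of linear-time insertion.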
I think you are going the wrong way by using unordered_set, because its definition is very clear:
Keys are immutable, therefore, the elements in an unordered_set cannot be modified once in the container - they can be inserted and removed, though.
You can see the definition at:
http://www.cplusplus.com/reference/unordered_set/unordered_set/.
I hope this helps. Thanks.
Related
Consider the case where I have a user-defined type with, say, an id() member function which returns a unique std::string.
I want a container of these objects, where the id() uniquely identifies the elements, but I want to "use" the objects to do other things which may modify their members.
I am currently constructing the objects by calling std::set::emplace and capturing the returned iterator, bool pair.
But I am then not allowed to modify its value, as the iterator is const.
Is there a good way to do what I want? The only two I can think of are:
Store unique_ptrs to the object in the set, this way the pointer value is what differentiates it rather than the name and the object pointed to can be modified.
Store a map using the id() as the Key, but this means I have duplicated the keys.
I am happy to use well adopted and modern libraries, such as boost, if they have the right container for my problem.
Is there a good way to do what I want?
No not really. The granularity of std::set is at object level. There is no way to express that a portion of an object contributes to the key.
Some people recommend declaring all non-key members mutable. This is wrong, as mutable is meant for things that are hidden from the public interface of the object (e.g. a mutex).
The "official" way is to take the object out of the set, modify it and put it back in. C++17 has set::extract which helps to improve performance of this task a bit (which of course remains inefficient if you never modify the key, since the tree still has to be checked/rebalanced).
I want to "use" the objects to do other things which may modify their members.
If you're absolutely sure you never modify the object key, just cast away constness. From a legal point of view it is OK to cast away constness from objects that were not born const. For extra safety you can wrap the key into another, const member:
struct Element {
const Key key;
Value value;
};
This won't help if you have a data cube with multiple sets each using its own "view" on the key.
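A minimal sketch of the const-key wrapper combined with the cast, assuming `int` and `double` stand in for Key and Value; the comparator `ByKey` and the helper `set_value` are names made up for this example. The cast is only acceptable because the comparator never looks at `value`.

```cpp
#include <cassert>
#include <set>

struct Element {
    const int key;  // cannot be changed after construction
    double value;   // free to change; the comparator ignores it
};

struct ByKey {
    bool operator()(const Element& a, const Element& b) const {
        return a.key < b.key;
    }
};
using ElementSet = std::set<Element, ByKey>;

// Returns true if an element with this key was found and updated.
bool set_value(ElementSet& s, int key, double value) {
    auto it = s.find(Element{key, 0.0});
    if (it == s.end()) return false;
    // The element itself was not created const, so casting away the
    // iterator's constness and writing a non-key member is well defined.
    const_cast<Element&>(*it).value = value;
    assert(it->key == key);  // the key is untouched
    return true;
}
```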
1. Store unique_ptrs to the object in the set
This would be a pessimization due to extra indirection. Since the elements are on the heap, you will take an extra cache miss. And again end up with UB if you accidentally modify the key.
2. Store a map using the id() as the Key
Yes, different variations of this approach are possible, but you must still ensure to never modify the key.
For example you could store a key + pointer to data. This approach is often combined with a dense_hash_set with linear probing for best performance. Since the value is accessed only once after the element is found, it doesn't really matter that it is located elsewhere.
I would suggest using Boost.MultiIndex as a drop-in replacement for std::set, as it adds the modify method which allows modification of an element, checking whether the position within the container has changed:
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
struct S { /* ... */ };
boost::multi_index_container<S> t; // default configuration emulates std::set<S>
auto [it, inserted] = t.emplace(...);
t.modify(it, [&](S& s) {
// modify s here
// if the key is unchanged, s does not move
// the iterator `it` remains valid regardless
});
There is a small overhead in checking that the key is indeed unchanged, but this should be minimal compared to the rest of the program and should optimize and predict well.
std::set keeps its elements sorted, and the keys the elements are sorted by are the elements themselves. As a result, the elements in the std::set are const qualified to prevent the user from modifying the elements (i.e., the keys) and thus breaking the std::set order.
Traditionally, if you wanted to modify an element of an std::set, you would have first to remove the element object you wish to modify from the std::set, modify it, and insert it into the std::set again. The problem is that this results in the allocation of an std::set internal node.
Since C++17 you can remove and reinsert an element into the std::set without allocating an std::set internal node thanks to std::set::extract(). This member function returns the node handle corresponding to the requested element. After modifying the element through this returned node, you can reinsert the node with the corresponding insert() overload. No node allocation takes place as you are reusing an already allocated node.
The drawback to these approaches – regardless of whether or not allocation occurs – is that reinserting the element into the std::set takes logarithmic time in the size of the set (unless you can take advantage of the hint to insert()).
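The extract-and-reinsert cycle can be sketched as follows (C++17). The `Item` type, the `ById` comparator and the `bump` helper are made up for this example.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

struct Item {
    std::string id;  // the key
    int count;       // non-key data
};

struct ById {
    bool operator()(const Item& a, const Item& b) const { return a.id < b.id; }
};
using ItemSet = std::set<Item, ById>;

// Increments count for the element with the given id; returns false if absent.
bool bump(ItemSet& s, const std::string& id) {
    auto it = s.find(Item{id, 0});
    if (it == s.end()) return false;
    auto node = s.extract(it);  // unlinks the node; the element is not copied
    assert(!node.empty());
    node.value().count += 1;    // value() gives non-const access
    s.insert(std::move(node));  // relinks the same node, no allocation
    return true;
}
```

The reinsertion still walks the tree, which is the logarithmic cost mentioned above.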
Casting away constness and modifying std::set elements
You can still cast const away from an element of the std::set and modify its data members, as long as your std::set's comparison function doesn't take into account the data members you change. That is, if you only modify data members that the comparison function doesn't consider, the order won't break.
What I plan to do: I have my own container structure. It should not store all elements, but only some with special properties, while the others can be created dynamically from the information in the structure.
Thus I have an insert(ITEM* i) method that checks whether i needs to be stored explicitly, in which case it is stored in a map, or whether it can be reconstructed dynamically, in which case only the information that the item was added is stored.
The same goes for an ITEM* get(ITEMINDEX idx) method. It checks whether the ITEM belonging to idx is stored explicitly. If yes, it is read from the internal map and the pointer is returned. If it is registered but implicitly stored, the ITEM is created dynamically and returned.
In order to be compatible with other structures in the code, I planned to overload the [] operator, but I don't know how to approach this or if this is even possible for this more complex structure.
Is it possible and if yes, how?
Thanks in advance!
Update
Nim's code works. But I have now recognised a problem (although it was obvious from the beginning...): if get() finds an entry, the pointer is returned by []. If it is not stored, the ITEM is constructed and the pointer returned. But the RAM is never released, because the algorithm that uses the container cannot distinguish between stored and constructed ITEMs in order to delete the second kind.
If you would like to provide operator[] for your object, you have to ask yourself whether the following code makes sense:
MyObj obj;
/*1*/ obj[some_index] = new_object;
/*2*/ Obj& some_object = obj[some_index];
What would the semantics be for case 1? You say it should insert some value into your class, but you add new data with insert(ITEM* i) (no some_index is provided there), so you should prohibit using your class as in case 1.
Now for case 2: your example shows ITEM* get(ITEMINDEX idx), so your class's client code must already know what ITEMINDEX is, and case 2 looks OK.
The problem, IMO, is only with case 1, inserting new data with operator[]. The std::map container has no problem with either of the above cases. Users of your class will want to use it the same way as std::map's operator[]; if its functionality differs, that will cause confusion.
A C++ class operator is a method with a weird name. It can do whatever you wish, provided it has the expected weird name. In the case of operator [], it looks like
template <typename T>
T& operator[](std::size_t idx) { /* arbitrary code */ };
That's about all there is to it really.
This interface is quite constrained. You'll get an index and need to produce a reference to an instance of your class in response. You can return a reference to something with unusual ideas about assignment, e.g. on assignment it does some checking with a data structure accessed via an internal pointer. There may be a lot of incidental complexity down this path.
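Since operator[] is an ordinary member function, it may also return by value. The sketch below is one hypothetical way to sidestep the ownership problem raised in the update: it returns a std::shared_ptr, so stored and freshly constructed items are released the same way. The names `Item` and `SparseStore`, and the rule that items with payload 0 can be reconstructed, are all invented for this example.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>

struct Item {
    int index;
    int payload;
};

class SparseStore {
public:
    // Hypothetical rule: an item with payload 0 can be rebuilt from its
    // index alone, so only the fact that it was added is recorded.
    void insert(const Item& item) {
        if (item.payload != 0)
            stored_[item.index] = std::make_shared<Item>(item);
        else
            implicit_.insert(item.index);
    }

    // Read-only lookup: the stored item, a freshly built implicit item,
    // or nullptr if the index was never inserted.
    std::shared_ptr<Item> operator[](int index) const {
        auto it = stored_.find(index);
        if (it != stored_.end()) return it->second;
        if (implicit_.count(index) != 0)
            return std::make_shared<Item>(Item{index, 0});
        return nullptr;
    }

private:
    std::map<int, std::shared_ptr<Item>> stored_;
    std::set<int> implicit_;
};
```

A temporary built for an implicit item is destroyed when the caller drops the returned pointer, so the caller never has to know which kind it received.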
I am trying to use the following data collection in my program:
boost::unordered_set<boost::shared_ptr<Entity> > _entities;
I am using unordered_set because I want fast insertion and removal (by key, not iterator) of Entities.
My doubt is, if I implement the following two functions:
void addEntity(boost::shared_ptr<Entity> entity) {
_entities.insert(entity);
}
void removeEntity(boost::shared_ptr<Entity> entity) {
_entities.erase(entity); // unordered_set has erase(), not remove()
}
When I try to remove the entity, will the unordered_set find it? Because the shared_ptr that is stored inside the unordered_set is a copy of the shared_ptr that I am trying to use to remove the entity from the unordered_set if I call removeEntity().
What do I need to do for the unordered_set to find the Entity? Do I need to create a comparison function that checks the values of the shared_ptr's? But then won't the unordered_set slow down because its hash function uses the shared_ptr as hash? Will I need to create a hash function that uses the Entity as hash also?
Yes you can use boost::shared_ptr in boost::unordered_set (same applies to std version of these classes)
boost::unordered_set uses the boost::hash function template to compute hash values for its elements. This function is specialised for boost::shared_ptr to hash the underlying pointer.
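A small sketch of the behaviour asked about, written with the std versions of these classes; the `World` wrapper and the `hp` member are made up for this example. A copy of a shared_ptr holds the same address, so it hashes and compares equal to the stored one.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_set>
#include <utility>

struct Entity {
    int hp;  // hypothetical payload; it plays no role in hashing
};
using EntityPtr = std::shared_ptr<Entity>;

class World {
public:
    void addEntity(EntityPtr entity) {
        assert(entity && "this sketch does not store null pointers");
        entities_.insert(std::move(entity));
    }

    // Returns true if the entity was present and has been removed.
    bool removeEntity(const EntityPtr& entity) {
        return entities_.erase(entity) == 1;
    }

    std::size_t size() const { return entities_.size(); }

private:
    // Default hash and equality work on the stored address.
    std::unordered_set<EntityPtr> entities_;
};
```

No custom hash or comparison function is needed as long as identity, not value, is what distinguishes entities.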
Let me try to explain this, if I got it right:
boost::shared_ptr is implemented with reference counting. Whenever you pass a copy of it to, say, some other function, the reference count is incremented; whenever a copy is destroyed, the count is decremented. Only when the reference count reaches 0 is the object deleted from memory.
Always be cautious when using it. It can save you from memory leaks only as long as your design accommodates it.
For example, I faced one issue.
I had a class with a map containing shared_ptrs. Later this class (without my knowledge) was also made responsible for passing these shared_ptrs to some other classes, which in turn stored them in containers of their own. As a result, this code luckily blew up in a tester's face in the form of a memory leak.
I hope you can figure out why.
What is the best way (in C++) to set up a container allowing for double-indexing? Specifically, I have a list of objects, each indexed by a key (possibly multiple per key). This implies a multimap. The problem with this, however, is that it means a possibly worse-than-linear lookup to find the location of an object. I'd rather avoid duplication of data, so having each object maintain its own coordinate and have to move itself in the map would be bad (not to mention that moving your own object may indirectly call your destructor whilst in a member function!). I would rather have some container that maintains an index both by object pointer and coordinate, and in which the objects themselves are guaranteed stable references/pointers. Then each object could store an iterator to the index (including the coordinate), sufficiently abstracted, and know where it is. Boost.MultiIndex seems like the best idea, but it's very scary and I don't want my actual objects to need to be const.
What would you recommend?
EDIT: Boost Bimap seems nice, but does it provide stable indexing? That is, if I change the coordinate, references to other elements must remain valid. The reason I want to use pointers for indexing is because objects have otherwise no intrinsic ordering, and a pointer can remain constant while the object changes (allowing its use in a Boost MultiIndex, which, IIRC, does provide stable indexing).
I'm making several assumptions based on your writeup:
Keys are cheap to copy and compare
There should be only one copy of the object in the system
The same key may refer to many objects, but only one key corresponds to a given object (one-to-many)
You want to be able to efficiently look up which objects correspond to a given key, and which key corresponds to a given object
I'd suggest:
Use a linked list or some other container to maintain a global list of all objects in the system. The objects are allocated on the linked list.
Create one std::multimap<Key, Object *> that maps keys to object pointers, pointing to the single canonical location in the linked list.
Do one of:
Create one std::map<Object *, Key> that allows looking up the key attached to a particular object. Make sure your code updates this map when the key is changed. (This could also be a std::multimap if you need a many-to-many relationship.)
Add a member variable to the Object that contains the current Key (allowing O(1) lookups). Make sure your code updates this variable when the key is changed.
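The suggestion above, in its member-variable variant, can be sketched as follows. `Key` is assumed to be an `int` here, and the class name `Index` and its member functions are invented for this example. std::list never relocates its elements, so the pointers held by the multimap stay valid.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>

using Key = int;  // assumption: keys are cheap to copy and compare

struct Object {
    Key key;   // current key, kept in sync by Index::rekey
    int data;  // hypothetical payload
};

class Index {
public:
    // Creates the object inside the list and indexes it under k.
    Object* add(Key k, int data) {
        objects_.push_back(Object{k, data});
        Object* p = &objects_.back();
        by_key_.emplace(k, p);
        return p;
    }

    // Moves obj to a new key; the object itself stays where it is.
    void rekey(Object* obj, Key new_key) {
        assert(obj != nullptr);
        auto range = by_key_.equal_range(obj->key);
        for (auto it = range.first; it != range.second; ++it) {
            if (it->second == obj) {
                by_key_.erase(it);
                break;
            }
        }
        obj->key = new_key;
        by_key_.emplace(new_key, obj);
    }

    std::size_t count(Key k) const { return by_key_.count(k); }

private:
    std::list<Object> objects_;           // single canonical storage
    std::multimap<Key, Object*> by_key_;  // key -> objects with that key
};
```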
Since your writeup mentioned "coordinates" as the keys, you might also be interested in reading the suggestions at Fastest way to find if a 3D coordinate is already used.
It's difficult to understand what exactly you are doing with it, but it seems like boost bimap is what you want. It's basically boost multi-index specialised for one use case, and easier to use. It allows fast lookup based on the first element or the second element. Why are you looking up the location of an object in a map by its address? Use the abstraction and let it do all the work for you. Just a note: iteration over all elements in a map is O(N), so it would be guaranteed O(N) (not worse) to look up the way you are thinking of doing it.
One option would be to use two std::maps that referenced shared_ptrs. Something like this may get you going:
#include <map>
#include <stdexcept>
#include <utility>
#include <boost/shared_ptr.hpp>

template<typename T, typename K1, typename K2>
class MyBiMap
{
public:
    typedef boost::shared_ptr<T> ptr_type;
    // Registers the same object under both keys.
    void insert(const ptr_type& value, const K1& key1, const K2& key2)
    {
        _map1.insert(std::make_pair(key1, value));
        _map2.insert(std::make_pair(key2, value));
    }
    // Looks up by the first key; throws if the key is unknown.
    ptr_type find1(const K1& key)
    {
        // typename is required because the iterator type depends on K1
        typename std::map<K1, ptr_type>::const_iterator itr = _map1.find(key);
        if (itr == _map1.end())
            throw std::runtime_error("Unable to find key");
        return itr->second;
    }
    // Looks up by the second key; throws if the key is unknown.
    ptr_type find2(const K2& key)
    {
        typename std::map<K2, ptr_type>::const_iterator itr = _map2.find(key);
        if (itr == _map2.end())
            throw std::runtime_error("Unable to find key");
        return itr->second;
    }
private:
    std::map<K1, ptr_type> _map1;
    std::map<K2, ptr_type> _map2;
};
Edit: I just noticed the multimap requirement, this still expresses the idea so I'll leave it.
In C++, what alternatives do I have for exposing a collection, from the point of view of performance and data integrity?
My problem is that I want to return an internal list of data to the caller, but I don't want to generate a copy. That leaves me with either returning a reference to the list, or a pointer to the list. However, I'm not crazy about letting the caller change the data, I just want to let it read the data.
Do I have to choose between performance and data integrity?
If so, is it in general better to go one way, or does it depend on the case?
Are there other alternatives?
Many times the caller wants access just to iterate over the collection. Take a page out of Ruby's book and make the iteration a private aspect of your class.
#include <algorithm>
#include <functional> // std::function
#include <vector>
class Blah
{
public:
void for_each_data(const std::function<void(const mydata&)>& f) const
{
std::for_each(myPreciousData.begin(), myPreciousData.end(), f);
}
private:
typedef std::vector<mydata> mydata_collection;
mydata_collection myPreciousData;
};
With this approach you're not exposing anything about your internals, i.e. that you even have a collection.
RichQ's answer is a reasonable technique, if you're using an array, vector, etc.
If you're using a collection that isn't indexed by ordinal values... or think you might need to at some point in the near future... then you might want to consider exposing your own iterator type(s), and associated begin()/end() methods:
class Blah
{
public:
typedef std::vector<mydata> mydata_collection;
typedef mydata_collection::const_iterator mydata_const_iterator;
// ...
mydata_const_iterator data_begin() const
{ return myPreciousData.begin(); }
mydata_const_iterator data_end() const
{ return myPreciousData.end(); }
private:
mydata_collection myPreciousData;
};
...which you can then use in the normal fashion:
Blah blah;
for (Blah::mydata_const_iterator itr = blah.data_begin();
itr != blah.data_end();
++itr)
{
// ...
}
Maybe something like this?
const std::vector<mydata>& getData() const
{
return _myPrivateData;
}
The benefit here is that it's very, very simple, and as safe as you get in C++. You can cast this, like RobQ suggests, but there's nothing you can do that would prevent someone from doing that if you're not copying. Here, you would have to use const_cast, which is pretty easy to spot if you're looking for it.
Iterators, alternatively, might get you pretty much the same thing, but it's more complicated. The only added benefit of using iterators here (that I can think of) is that you can have better encapsulation.
Use of const reference or shared pointer will only help if the contents of underlying collection do not change over time.
Consider your design. Does the caller really need to see the internal array? Can you restructure the code so that the caller tells object what to do with the array? E.g., if the caller intends to search the array, could the owner object do it?
You could pass a reference to result vector to the function. On some compilers that may result in marginally faster code.
I would recommend trying to redesign first, going with a clean solution second, optimizing for performance third (if necessary).
One advantage of both @Shog9's and @RichQ's solutions is that they decouple the client from the collection implementation.
If you decide to change your collection type to something else, your clients will still work.
What you want is read-only access without copying the entire blob of data. You have a couple options.
Firstly, you could just return a const reference to whatever your data container is, as suggested above:
const std::vector<T>& getData() { return mData; }
This has the disadvantage of concreteness: you can't change how you store the data internally without changing the interface of your class.
Secondly, you can return const-ed pointers to the actual data:
const T* getDataAt(size_t index)
{
return &mData[index];
}
This is a bit nicer, but also requires that you provide a getNumItems call, and protect against out-of-bounds indices. Also, the const-ness of your pointers is easily cast away, and your data is now read-write.
Another option is to provide a pair of iterators, which is a bit more complex. This has the same advantages of pointers, as well as not (necessarily) needing to provide a getNumItems call, and there's considerably more work involved to strip the iterators of their const-ness.
Probably the easiest way to manage this is by using a Boost Range:
typedef typename std::vector<T>::const_iterator range_iterator_type;
// Return by value: a reference to the temporary range would dangle.
boost::iterator_range< range_iterator_type > getDataRange() const
{
return boost::make_iterator_range(mData.begin(), mData.end());
}
This has the advantages of ranges being composable, filterable, etc, as you can see on the website.
Using const is a reasonable choice.
You may also wish to check out the boost C++ library for their shared pointer implementation. It provides the advantages of pointers i.e. you may have the requirement to return a shared pointer to "null" which a reference would not allow.
http://www.boost.org/doc/libs/1_36_0/libs/smart_ptr/smart_ptr.htm
In your case you would make the shared pointer's type const to prohibit writes.
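A minimal sketch of this idea using std::shared_ptr, which behaves like the Boost one for this purpose. The class name `Blah` is borrowed from the other answers; the `int` element type and the `add` member are assumptions made for the example.

```cpp
#include <cassert>
#include <memory>
#include <vector>

class Blah {
public:
    Blah() : data_(std::make_shared<std::vector<int>>()) {}

    void add(int v) { data_->push_back(v); }

    // Read-only handle: callers cannot modify the vector through it.
    // The conversion to shared_ptr<const ...> is implicit and copies nothing.
    std::shared_ptr<const std::vector<int>> data() const {
        assert(data_);
        return data_;
    }

private:
    std::shared_ptr<std::vector<int>> data_;
};
```

The handle shares ownership with the object, so it stays valid even if the Blah instance is destroyed, but it also observes every later change made through add().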
If you have a std::list of plain old data (what .NET would call 'value types'), then returning a const reference to that list will be fine (ignoring evil things like const_cast)
If you have a std::list of pointers (or boost::shared_ptr's) then that will only stop you modifying the collection, not the items in the collection. My C++ is too rusty to be able to tell you the answer to that at this point :-(
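To illustrate the difference, here is a sketch with hypothetical names (`Item`, `Owner`, `raw`, `for_each`). Returning a const reference to a container of pointers protects only the container; passing each element to a callback as a const reference protects the elements too.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

struct Item {
    int value;
};

class Owner {
public:
    void add(int v) { items_.push_back(std::make_shared<Item>(Item{v})); }

    // Shallow const: the vector cannot be changed, the pointed-to Items can.
    const std::vector<std::shared_ptr<Item>>& raw() const { return items_; }

    // Deep read-only access: the callback only ever sees const Item&.
    void for_each(const std::function<void(const Item&)>& f) const {
        for (const auto& p : items_) {
            assert(p);
            f(*p);
        }
    }

private:
    std::vector<std::shared_ptr<Item>> items_;
};
```

With raw(), a statement such as `owner.raw()[0]->value = 42;` compiles; inside the for_each callback the same assignment is rejected by the compiler.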
I suggest using callbacks along the lines of EnumChildWindows. You will have to find some means to prevent the user from changing your data. Maybe use a const pointer/reference.
On the other hand, you could pass a copy of each element to the callback function overwriting the copy each time. (You do not want to generate a copy of your entire collection. I am only suggesting making a copy one element at a time. That shouldn't take much time/memory).
MyClass tmp;
for(int i = 0; i < n; i++){
tmp = elements[i];
callback(tmp);
}
The following two articles elaborate on some of the issues involved in, and the need for, encapsulating container classes. Although they do not provide a complete worked solution, they essentially lead to the same approach as given by Shog9.
Part 1: Encapsulation and Vampires
Part 2 (free registration is now required to read this): Train Wreck Spotting
by Kevlin Henney