Best container for double-indexing - c++

What is the best way (in C++) to set up a container allowing for double-indexing? Specifically, I have a list of objects, each indexed by a key (possibly multiple per key). This implies a multimap. The problem with this, however, is that it means a possibly worse-than-linear lookup to find the location of an object. I'd rather avoid duplication of data, so having each object maintain its own coordinate and have to move itself in the map would be bad (not to mention that moving your own object may indirectly call your destructor whilst in a member function!). I would rather have some container that maintains an index both by object pointer and by coordinate, and in which the objects themselves have stable references/pointers. Then each object could store an iterator into the index (including the coordinate), sufficiently abstracted, and know where it is. Boost.MultiIndex seems like the best idea, but it's very scary and I don't want my actual objects to need to be const.
What would you recommend?
EDIT: Boost Bimap seems nice, but does it provide stable indexing? That is, if I change the coordinate, references to other elements must remain valid. The reason I want to use pointers for indexing is because objects have otherwise no intrinsic ordering, and a pointer can remain constant while the object changes (allowing its use in a Boost MultiIndex, which, IIRC, does provide stable indexing).

I'm making several assumptions based on your writeup:
Keys are cheap to copy and compare
There should be only one copy of the object in the system
The same key may refer to many objects, but each object corresponds to only one key (one-to-many)
You want to be able to efficiently look up which objects correspond to a given key, and which key corresponds to a given object
I'd suggest:
Use a linked list or some other container to maintain a global list of all objects in the system. The objects live in the linked list, so each one has a single canonical location with a stable address.
Create one std::multimap<Key, Object *> that maps keys to object pointers, pointing to the single canonical location in the linked list.
Do one of:
Create one std::map<Object *, Key> that allows looking up the key attached to a particular object. Make sure your code updates this map when the key is changed. (This could also be a std::multimap if you need a many-to-many relationship.)
Add a member variable to the Object that contains the current Key (allowing O(1) lookups). Make sure your code updates this variable when the key is changed.
Since your writeup mentioned "coordinates" as the keys, you might also be interested in reading the suggestions at Fastest way to find if a 3D coordinate is already used.
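A minimal sketch of the suggested layout (the Object type, Key alias, and helper functions are made up for illustration; error handling is omitted):
#include <list>
#include <map>
#include <string>

struct Object { /* ... */ };
using Key = std::string;                    // assumed cheap to copy and compare

std::list<Object> objects;                  // canonical storage; std::list keeps addresses stable
std::multimap<Key, Object*> byKey;          // key -> all objects with that key
std::map<Object*, Key> keyOf;               // object -> its current key

void insert(const Key& k, Object obj)
{
    objects.push_back(std::move(obj));
    Object* p = &objects.back();
    byKey.emplace(k, p);
    keyOf.emplace(p, k);
}

void rekey(Object* p, const Key& newKey)
{
    // Both maps must be updated together whenever a key changes.
    const Key oldKey = keyOf.at(p);
    auto range = byKey.equal_range(oldKey);
    for (auto it = range.first; it != range.second; ++it)
        if (it->second == p) { byKey.erase(it); break; }
    byKey.emplace(newKey, p);
    keyOf[p] = newKey;
}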

It's difficult to understand exactly what you are doing with it, but it seems like Boost.Bimap is what you want. It's basically Boost.MultiIndex restricted to a specific use case, and easier to use. It allows fast lookup based on either the first or the second element. Why are you looking up the location of an object in a map by its address? Use the abstraction and let it do the work for you. Just a note: iterating over all elements of a map is O(N), so looking things up the way you are thinking of would be guaranteed O(N) (not worse).
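For reference, a tiny Boost.Bimap sketch (the values are made up); either side can be used for lookup:
#include <boost/bimap.hpp>
#include <string>

int main()
{
    typedef boost::bimap<int, std::string> Bimap;
    Bimap bm;
    bm.insert(Bimap::value_type(1, "one"));
    bm.insert(Bimap::value_type(2, "two"));

    // Logarithmic lookup from either side.
    auto l = bm.left.find(2);        // 2 -> "two"
    auto r = bm.right.find("one");   // "one" -> 1
    (void)l; (void)r;
}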

One option would be to use two std::maps that reference shared_ptrs. Something like this may get you going:
#include <map>
#include <stdexcept>
#include <boost/shared_ptr.hpp>

template<typename T, typename K1, typename K2>
class MyBiMap
{
public:
    typedef boost::shared_ptr<T> ptr_type;

    void insert(const ptr_type& value, const K1& key1, const K2& key2)
    {
        _map1.insert(std::make_pair(key1, value));
        _map2.insert(std::make_pair(key2, value));
    }

    ptr_type find1(const K1& key) const
    {
        // Dependent type, so 'typename' is required here.
        typename std::map<K1, ptr_type>::const_iterator itr = _map1.find(key);
        if (itr == _map1.end())
            throw std::runtime_error("Unable to find key");
        return itr->second;
    }

    ptr_type find2(const K2& key) const
    {
        typename std::map<K2, ptr_type>::const_iterator itr = _map2.find(key);
        if (itr == _map2.end())
            throw std::runtime_error("Unable to find key");
        return itr->second;
    }

private:
    std::map<K1, ptr_type> _map1;
    std::map<K2, ptr_type> _map2;
};
Edit: I just noticed the multimap requirement, this still expresses the idea so I'll leave it.
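For completeness, usage of the sketch above might look like this (the Widget type and the keys are made up):
#include <boost/make_shared.hpp>
#include <string>

struct Widget { /* ... */ };

int main()
{
    MyBiMap<Widget, int, std::string> index;
    index.insert(boost::make_shared<Widget>(), 42, "answer");

    boost::shared_ptr<Widget> byNumber = index.find1(42);       // look up via the first key
    boost::shared_ptr<Widget> byName   = index.find2("answer"); // look up via the second key
}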

Related

Modifying the value of a user defined type in `std::set`

Consider the case where I have a user-defined type with, say, an id() member function which returns a unique std::string.
I want a container of these objects, where id() uniquely identifies the elements, but I want to "use" the objects to do other things which may modify their members.
I am currently constructing the objects by calling std::set::emplace and capturing the returned iterator/bool pair.
But I am then not allowed to modify the element's value, as the iterator only gives const access.
Is there a good way to do what I want? The only two I can think of are:
Store unique_ptrs to the object in the set, this way the pointer value is what differentiates it rather than the name and the object pointed to can be modified.
Store a map using the id() as the Key, but this means I have duplicated the keys.
I am happy to use well adopted and modern libraries, such as boost, if they have the right container for my problem.
Is there a good way to do what I want?
No, not really. The granularity of std::set is at object level. There is no way to express that only a portion of an object contributes to the key.
Some people recommend declaring all non-key members mutable. This is wrong, as mutable is meant for things that are hidden from the public interface of the object (e.g. a mutex).
The "official" way is to take the object out the set, modify it and put it back in. C++17 has set::extract which helps to improve performance of this task a bit (which of course remains inefficient if you never modify the key, since the tree still has to be checked/rebalanced).
I want to "use" the objects to do other things which may modify their members.
If you're absolutely sure you never modify the object key, just cast away constness. From a legal point of view it is OK to cast away constness from objects that were not born const. For extra safety you can wrap the key into another, const member:
struct Element {
    const Key key;
    Value value;
};
This won't help if you have a data cube with multiple sets each using its own "view" on the key.
1. Store unique_ptrs to the object in the set
This would be a pessimization due to extra indirection. Since the elements are on the heap, you will take an extra cache miss. And again end up with UB if you accidentally modify the key.
2. Store a map using the id() as the Key
Yes, different variations of this approach are possible, but you must still ensure to never modify the key.
For example you could store a key + pointer to data. This approach is often combined with a dense_hash_set with linear probing for best performance. Since the value is accessed only once after the element is found, it doesn't really matter that it is located elsewhere.
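A rough illustration of that key-plus-pointer idea, using std::unordered_map in place of dense_hash_set (the Widget type and field names are made up):
#include <memory>
#include <string>
#include <unordered_map>

struct Widget {
    std::string id;     // duplicated in the map key; must never be changed independently
    int payload = 0;
};

int main()
{
    std::unordered_map<std::string, std::unique_ptr<Widget>> widgets;
    widgets.emplace("w1", std::make_unique<Widget>(Widget{"w1", 7}));

    auto it = widgets.find("w1");
    if (it != widgets.end())
        it->second->payload = 42;   // the value is freely modifiable; the key stays put
}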
I would suggest using Boost.MultiIndex as a drop-in replacement for std::set, as it adds the modify method which allows modification of an element, checking whether the position within the container has changed:
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>

struct S { /* ... */ };

boost::multi_index_container<S> t; // default configuration emulates std::set<S>

auto [it, inserted] = t.emplace(...);
t.modify(it, [&](S& s) {
    // modify s here
    // if the key is unchanged, s does not move
    // the iterator `it` remains valid regardless
});
Example.
There is a small overhead in checking that the key is indeed unchanged, but this should be minimal compared to the rest of the program and should optimize and predict well.
std::set maintains its elements sorted, and the keys the elements are sorted by are the elements themselves. As a result, the elements in the std::set are const-qualified to prevent the user from modifying the elements (i.e., the keys) and thus breaking the std::set order.
Traditionally, if you wanted to modify an element of an std::set, you would have first to remove the element object you wish to modify from the std::set, modify it, and insert it into the std::set again. The problem is that this results in the allocation of an std::set internal node.
Since C++17 you can remove and reinsert an element into the std::set without allocating an std::set internal node thanks to std::set::extract(). This member function returns the node handle corresponding to the requested element. After modifying the element through this returned node, you can reinsert the node with the corresponding insert() overload. No node allocation takes place as you are reusing an already allocated node.
The drawback to these approaches – regardless of whether or not allocation occurs – is that reinserting the element into the std::set takes logarithmic time in the size of the set (unless you can take advantage of the hint to insert()).
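A small sketch of the extract/modify/reinsert dance (C++17):
#include <set>
#include <string>

int main()
{
    std::set<std::string> s{"alpha", "beta"};

    // Take the node out, modify the element freely, then put the node back.
    auto node = s.extract("alpha");
    if (!node.empty()) {
        node.value() = "gamma";      // outside the set, so no const restriction applies
        s.insert(std::move(node));   // logarithmic reinsertion, but no new allocation
    }
}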
Casting away constness and modifying std::set elements
You can still cast const away from an element of the std::set and modify its data members, as long as your std::set's comparison function doesn't take into account the data members you change. That is, if you only modify data members that the comparison function doesn't consider, the order won't break.
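For example, assuming a hypothetical Item whose comparator only looks at id, a non-key member can be modified through const_cast without breaking the ordering:
#include <set>
#include <string>

struct Item {
    std::string id;     // the only member the comparator inspects
    int payload = 0;    // never part of the ordering
};

struct ById {
    bool operator()(const Item& a, const Item& b) const { return a.id < b.id; }
};

int main()
{
    std::set<Item, ById> items;
    auto it = items.emplace(Item{"a", 1}).first;

    // Legal only because 'payload' does not participate in the comparison.
    const_cast<Item&>(*it).payload = 42;
}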

When creating my own data structure, should I use iterators or indices to provide access from the outside?

Suppose I'm writing a project in a modern version of C++ (say 11 or 14) and use STL in that project. At a certain moment, I need to program a specific data structure that can be built using STL containers. The DS is encapsulated in a class (am I right that encapsulating the DS in a class is the only correct way to code it in C++?), thus I need to provide some sort of interface to provide read and/or write access to the data. Which leads us to the question:
Should I use (1a) iterators or (1b) simple "indices" (i.e. numbers of a certain type) for that? The DS that I'm working on right now is pretty much linear, but then when the elements are removed, of course simple integer indices are going to get invalidated. That's about the only argument against this approach that I can imagine.
Which approach is more idiomatic? What are the objective technical arguments for and against each one?
Also, if I choose to use iterators for my custom DS, should I (2a) publicly typedef the iterators of the container that is used internally or (2b) create my own iterator from scratch? In open libraries such as Boost, I've seen custom iterators written from scratch. On the other hand, I don't feel able to write a proper iterator yet (i.e. one that is as detailed and robust as the ones in the STL and/or Boost).
Edit as per #πάντα ῥεῖ request:
I've asked myself this question with a few DS in a few projects while studying at the Uni, but here's the last occurrence that made me come here and ask.
The DS is meant to represent a triangle array, or vertex array, or whatever one might call it. Point is, there are two arrays or lists, one storing the vertex coordinates, and another one storing triplets of indices from the first array, thus representing triangles. (This has been coded a gazillion times already, yet I want to write it on my own, once, for the purpose of learning.) Obviously, the two arrays should stay in sync, hence the encapsulation. The set of operations is meant to include adding (maybe also removing) a vertex, adding and removing a triangle (a vertex triplet) using the vertex data from the same array. How I see it is that the client adds vertices, writes down the indices/iterators, and then issues a call to add a triangle based on those indices/iterators, which in turn returns another index/iterator to the resulting triangle.
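To make that concrete, here is a rough sketch of the interface I have in mind (the names, index types, and return values are only illustrative, not a finished design):
#include <array>
#include <cstddef>
#include <vector>

struct Vertex { double x, y, z; };

class TriangleArray {
public:
    std::size_t addVertex(const Vertex& v) {
        vertices_.push_back(v);
        return vertices_.size() - 1;           // index (or should it be an iterator?) returned to the client
    }

    std::size_t addTriangle(std::size_t a, std::size_t b, std::size_t c) {
        triangles_.push_back({a, b, c});       // indices must refer to existing vertices
        return triangles_.size() - 1;
    }

private:
    std::vector<Vertex> vertices_;                        // vertex coordinates
    std::vector<std::array<std::size_t, 3>> triangles_;   // triplets of vertex indices
};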
I don't see why you couldn't get both, if this makes sense for your container.
std::vector has iterators and the at/operator[] methods to provide access with indexes.
The API of your container depends on the operations you want to make available to your clients.
Is the container iterable, i.e. is it possible to iterate over each element? Then you should provide an iterator.
Does it make sense to randomly access elements in your container, knowing their position? Then you can also provide the at(size_t)/operator[size_t] methods.
Does it make sense to randomly access elements in your container, knowing a special "key"? Then you should probably provide the at(key_type)/operator[key_type] methods.
As for your question regarding custom iterators or reuse of existing iterators:
If your container is basically a wrapper that adds some insertion/removal logic to an existing container, I think it is fine to publicly typedef the existing iterator; a custom iterator may miss some features of the existing iterator, may contain bugs, and will not add any significant feature over the existing one.
On the other hand, if you iterate in a non-standard fashion, you will need a custom iterator. For instance, I once implemented a recursive_unordered_map that accepted a parent recursive_unordered_map at construction and would iterate both over its own unordered_map and over its parent's (and its parent's parent's, and so on); I had to implement a custom iterator for that.
Which approach is more idiomatic?
Using iterators is definitely the way to go. Functions in <algorithm> don't work with indices. They work with iterators. If you want your container to be enabled for use by the functions in <algorithm>, using iterators is the only way to go.
In general, it is recommended that the class offer its own iterator. Under the hood, it could be an index or an STL iterator (preferred). But as far as external clients and public APIs are concerned, they only deal with the iterator offered by the class.
Example 1
class Dictionary {
private:
    typedef std::unordered_map<std::string, std::string> DictType;
public:
    typedef DictType::iterator DictionaryIterator;
};
Example 2
class Sequence {
private:
    typedef std::vector<std::string> SeqType;
public:
    struct SeqIterator {
        size_t index;
        SeqIterator operator++();
        std::string operator*();
    };
};
If the clients are operating solely on SeqIterator, then the above can later be modified to
class Sequence {
private:
    typedef std::deque<std::string> SeqType;
public:
    typedef SeqType::iterator SeqIterator;
};
without the clients getting affected.

Access to custom objects in unordered_set

Please help to figure out the logic of using unordered_set with custom structures.
Consider I have following class
struct MyClass {
    int id;
    // other members
};
used with shared_ptr
using CPtr = std::shared_ptr<MyClass>;
Because I want fast access by key, I planned to use an unordered_set with a custom hash, using the MyClass::id member as the key:
template <class T> struct CHash;

template<> struct CHash<CPtr>
{
    std::size_t operator() (const CPtr& c) const
    {
        return std::hash<decltype(c->id)> {} (c->id);
    }
};

using CSet = std::unordered_set<CPtr, CHash<CPtr>>;
Right now, unordered_set still seems to be an appropriate container. However, the standard find() functions for sets give only const access to the elements, to ensure keys won't be changed. I intend to change the objects while guaranteeing that the keys stay unchanged. So, the questions are:
1) How can I conveniently access an element of the set by its int key while keeping the ability to modify it, something like
auto element = my_set.find(5);
element->b = 3.3;
It is possible to add a converting constructor and use something like
auto element = my_set.find(MyClass(5));
But that doesn't solve the constness problem, and what if the class is huge?
2) Am I actually going the wrong way? Should I use another container? For example an unordered_map, which would store one more int key per entry and consume more memory.
A pointer doesn't project its constness onto the object it points to. Meaning, if you have a constant reference to a std::shared_ptr (as in a set), you can still modify the object via this pointer. Whether or not that is something you should do is a different question, and it doesn't solve your lookup problem.
Of course, if you want to look up a value by a key, then this is what std::unordered_map was designed for, so I'd have a closer look there. The main problem I see with this approach is not so much the memory overhead (unordered_set and unordered_map as well as shared_ptr have noticeable memory overhead anyway), but that you have to maintain redundant information (id in the object and id as the key).
If you don't have many insertions, don't absolutely need the (on average) constant lookup time, and memory overhead is really important to you, you could consider a third solution (besides using a third-party or self-written data structure, of course): namely a thin wrapper around a sorted std::vector<std::shared_ptr<MyClass>> or, if appropriate, even better a std::vector<std::unique_ptr<MyClass>>, using std::lower_bound (or std::equal_range) for lookups.
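A rough sketch of that third option, a thin wrapper around a sorted std::vector with binary-search lookup (the class and member names are made up):
#include <algorithm>
#include <memory>
#include <vector>

struct MyClass { int id; /* other members, as in the question */ };

class MyClassIndex {
public:
    void insert(std::shared_ptr<MyClass> p) {
        auto pos = std::lower_bound(data_.begin(), data_.end(), p->id, ById{});
        data_.insert(pos, std::move(p));   // keeps the vector sorted by id
    }

    MyClass* find(int id) {
        auto pos = std::lower_bound(data_.begin(), data_.end(), id, ById{});
        if (pos != data_.end() && (*pos)->id == id)
            return pos->get();             // caller may modify the object, but not its id
        return nullptr;
    }

private:
    struct ById {
        bool operator()(const std::shared_ptr<MyClass>& p, int id) const { return p->id < id; }
    };
    std::vector<std::shared_ptr<MyClass>> data_;
};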
I think you are going the wrong way using unordered_set, because unordered_set's documentation is very clear that:
Keys are immutable, therefore, the elements in an unordered_set cannot be modified once in the container - they can be inserted and removed, though.
You can see its description on this site:
http://www.cplusplus.com/reference/unordered_set/unordered_set/
I hope this is helpful for you. Thanks.

Design pattern to allow the efficient deletion of an element from multiple containers in C++

Say, for example, an element is referenced from multiple maps: a map from name to element, a map from address to element, and a map from age to element. Now one looks up the element, for example via its name, and wishes to delete it from all three maps.
Several solutions come to mind:
1) The most straightforward: look up the element in the name-to-element map, then search the other two maps to find the element in them, then remove the element's entry from all three.
2) Store weak pointers in all three maps. Store a shared pointer somewhere, at best maybe even in the element itself. After finding the element in one map, delete the element. When trying to access the element from the other maps and realizing the weak pointers can't be converted to shared pointers, remove the entry.
3) Use intrusive maps. This has the advantage that one does not need to search the remaining maps to find the element in those. However, as the object is stored in several maps, the element itself can't be made intrusive - rather the element might need to have a member implementing the hooks.
4) Others?
Is there a very clean nice solution to this? I have been bumping into this problem a few times...
A few thoughts. Solution 1 is typically the one that ends up being implemented naturally as a project grows. If the element itself has the key information of the other maps, and other containers are maps, this is probably quite acceptable. However, if the keys are missing, or if the container is e.g. a list, it can become very slow. Solution 2 depends on the implementation of weak pointers, and might also end up being quite slow. Solution 3 seems best, but maybe somewhat complicated?
boost::multi_index is designed specifically for such cases.
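A sketch of what that could look like for the name/address/age example from the question (the Person type is made up); erasing through any one index removes the element from all views:
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/member.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <string>

struct Person {
    std::string name;
    std::string address;
    int age;
};

namespace bmi = boost::multi_index;

typedef bmi::multi_index_container<
    Person,
    bmi::indexed_by<
        bmi::ordered_non_unique<bmi::member<Person, std::string, &Person::name>>,
        bmi::ordered_non_unique<bmi::member<Person, std::string, &Person::address>>,
        bmi::ordered_non_unique<bmi::member<Person, int, &Person::age>>
    >
> PersonSet;

int main()
{
    PersonSet people;
    people.insert(Person{"Alice", "1 Main St", 30});

    auto& byName = people.get<0>();      // index #0: name
    auto it = byName.find("Alice");
    if (it != byName.end())
        byName.erase(it);                // gone from the address and age views too
}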
Sounds like you haven't decided what is managing the lifetime of the object - that comes first. Once you know that, use the observer pattern. When the object is to be destroyed, the object managing its lifetime notifies all the objects that wrap the maps holding the pointers, then destroys the object.
The observers can either implement a common interface like this:
#include <list>

class Obj;   // the shared object whose lifetime is being managed

class IObserver
{
public:
    virtual ~IObserver() {}
    virtual void ObjectDestroyed( Obj* ) = 0;
};

class ObjectLifetimeMgr
{
public:
    void CauseObjDeletion()
    {
        /* notify all observers, then destroy the object */
    }
private:
    std::list<IObserver*> observers;
};

class ConcreteObserver : public IObserver
{
public:
    void ObjectDestroyed( Obj* ) override
    {
        /* remove Obj from this observer's map */
    }
};
Or, to do a really lovely job, you could implement a C++ delegate. This frees the observers from a common base class and simply allows them to register a callback using a member method.
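A minimal sketch of that delegate idea using std::function, so the containers just register callables (all names here are made up):
#include <functional>
#include <vector>

struct Obj { /* ... */ };

class ObjectLifetimeMgr
{
public:
    // Each map wrapper registers a callback that erases its own entry for the object.
    void Register(std::function<void(Obj*)> cb) { callbacks.push_back(std::move(cb)); }

    void CauseObjDeletion(Obj* obj)
    {
        for (auto& cb : callbacks)
            cb(obj);        // every container cleans up its reference first
        delete obj;         // then the object itself is destroyed (assuming it was new'ed)
    }

private:
    std::vector<std::function<void(Obj*)>> callbacks;
};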
Never found anything to replace solution 1. I ended up with shared_ptrs and delete flags in a delete function (e.g. DeleteFromMaps(bool map1, bool map2, bool map3)) in the object. The call from, e.g., map2 then becomes:
it->DeleteFromMaps(true,false,true);
erase(it);

Hook on object in std::list. Pointer or iterator?

I looked for the same type of question but I didn't find the answer to my question (existential one):
What type of hook should I choose to keep control over an object in a list?
I waver between pointer and iterator.
The container is filled at the beginning and shouldn't be resized after that. The hook is the means I use to switch between my objects at the user's whim, manipulating only one variable in my algorithm.
In all cases, I must go through an iterator to find the right object to hook. But which one is the best practice/use?
#include <iterator>
#include <list>

struct Object { int member; };

int main()
{
    // 10-object list
    std::list<Object> List(10);
    std::list<Object>::iterator it = List.begin();
    Object* pt = nullptr;

    // Select an object (here, the element at index 3)
    std::advance(it, 3);
    pt = &(*it);

    // Access an object member...
    it->member;
    pt->member;
}
Unlike iterators, pointers don't give access to neighbouring elements, but they may be less safe.
What's the best practice?
It depends on what you want to do with the "hook". If you use an iterator, it can be used as the starting point for moving forward or backward in the list. If you use a pointer, you can also point to objects outside of the list. In the end, it depends on how you expect your code to evolve.
Storing pointers or iterators into a container is quite risky because you might find they're invalid by the time you use them (i.e. if the container or the data changes).
A more generalised and robust approach might be to use a map instead of a list. Every value is identified by a key (of whatever type you like), and you can easily store the keys and check whether or not they're valid before you use them, e.g.:
std::map<int, std::string> data;

// Add stuff to the map
data[5] = "blah";
data[27] = "foo";

// Check if a key exists
if (data.find(31) == data.end()) {
    // Key 31 does NOT exist
} else {
    // Key 31 DOES exist
}
One thing to be aware of though is that maps are ordered by key value. That means if the sequence of elements is important then you'll need to choose your keys carefully.
In most cases use references:
Object& ref = *it;
ref.member;
It behaves like a pointer (so feel free to pass it around to functions), but you can't do pointer arithmetic on it (ref++ actually calls operator++() on Object). Also, you can't initialize it from null; trying to do so is reported as an error when you create the reference.
One thing to remember: the object still needs to be allocated somewhere. If, say, some function deletes the Object from the List, you shouldn't use ref anymore.