I don't quite understand how it is possible that the same elements can appear in different intrusive containers while preserving the performance and memory usage guarantees that the boost::intrusive documentation states.
The documentation says that:
an intrusive container does not store copies of passed objects, but it stores the objects themselves. The additional data needed to insert the object in the container must be provided by the object itself. For example, to insert MyClass in an intrusive container that implements a linked list, MyClass must contain the needed next and previous pointers:
class MyClass
{
MyClass *next;
MyClass *previous;
// ...
};
When highlighting the differences between STL and boost::intrusive containers, the documentation also says:
A non-intrusive container has some limitations:
An object can only belong to one container: If you want to share an object between two containers, you either have to store multiple copies of those objects or you need to use containers of pointers: std::list<Object*>.
That makes sense: an element can't be in two std::lists. Okay. But how can one instance of type MyClass be inserted in two different boost::intrusive::lists, for example, considering that such an element can only have one pointer to the next element and one to the previous element? If I am not mistaken, this only works if you assume that modifying one container might also modify the other one and vice versa.
The Boost.Intrusive library doesn't literally ask you to define prev and next pointers. In that part of the documentation, the prev and next pointers are merely a conceptual illustration of how intrusive containers work.
The actual way of defining intrusive containers is to include a hook, either through inheritance or as a member, that contains the prev and next pointers. By including several hooks (tagged with different static types), you can put the same object in several different intrusive containers (each using a differently tagged hook).
See http://www.boost.org/doc/libs/1_38_0/doc/html/intrusive/usage.html to see how the hooks work. See this StackOverflow answer for an example of how to do this with multiple intrusive containers.
This has some limitations: you can't include your objects in an arbitrary set of intrusive containers defined at runtime. You have to know which containers you want to use when you first write the code, and build knowledge of each one into your objects.
Related
I have a container, let's say a std::list<int>, which I would like to share between objects. One of the objects is known to live longer than the others, so it will hold the container. In order to be able to access the list, the other objects may have a pointer to the list.
Since the holder object might get moved, I'll need to wrap the list with a unique_ptr:
class LongLiveHolder { std::unique_ptr<std::list<int>> list; };
class ShortLiveObject { std::list<int>& list; };
However, I don't really need the unique_ptr wrapper. Since the list probably just contains a [unique_ptr] pointer to the first node (and a pointer to the last node), I could, theoretically, have those pointers at the other objects:
class LongLiveHolder { std::unique_ptr<NonExistentListNode<int>> back; };
class ShortLiveObject { NonExistentListNode<int>& back; };
This would save me a redundant dereference when accessing the list, except that I would no longer have the full std::list interface in the shorter-lived object: just the node pointers.
Can I somehow get rid of this extra layer of indirection, while still having the std::list interface in the shorter-lived object?
Preface
You may be overthinking the cost of the extra indirection from the std::unique_ptr (unless you have a lot of these lists and you know that usages of them will be frequent and intermixed with other procedures). In general, I'd first trust my compiler to do smart things. If you want to know the cost, do performance profiling.
The main purpose of the std::unique_ptr in your use-case is just to have shared data with a stable address when other data that reference it gets moved. If you use the list member of the long-lived object multiple times in a single procedure, you can possibly help your compiler to help you (and also get some nicer-to-read code) when you use the list through the long-lived object by making a variable in the scope of the procedure that stores a reference to the std::list pointed to by the std::unique_ptr like:
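void fn(LongLiveHolder& holder) {
    // Dereference the unique_ptr once; `list` is a std::list<int>& from here on.
    // (Assumes fn can access holder.list, e.g. it is public or fn is a friend.)
    auto& list = *holder.list;
    list.push_back(1);  // stand-ins for <some_operation_1..3>
    list.push_back(2);
    list.sort();
}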
But again, you should inspect the generated machine code and do performance profiling if you really want to know what kind of difference it makes.
If Context Permits, Write your own List
You said:
However, I don't really need the unique_ptr wrapper. Since the list probably just contains a [unique_ptr] pointer to the first node (and a pointer to the last node), I could, theoretically, have those pointers at the other objects: [...]
Considering Changes in what is the First Node
What if the first node of the list is allowed to be deleted? What if a new node is allowed to be inserted at the beginning of the list? You'd need a very specific context for those not to be requirements. What you want in your short-lived object is a view abstraction which supports the same interface as the actual list but doesn't manage the lifetime of the list contents. If you implement the view abstraction as a pointer to the list's first node, then how will the view object know about changes to what the "real" (lifetime-managing) list considers to be the first node? It can't, unless the lifetime-managing list keeps an internal list of all of its live views and updates those too (which is itself a performance and space overhead). And even then, what about the reverse? If the view abstraction was used to change which node is considered the first, how would the lifetime-managing list know about that change? The simplest sane solution is to have an extra level of indirection: make the view point to the list instead of to whatever was the list's first node when the view was created.
Considering Requirements on Time Complexity of getting the list size
I'm pretty sure a std::list can't just hold pointers to the front and back nodes. For one thing, since C++11 requires std::list::size() to be O(1), std::list probably has to keep track of its size at all times, either in a counter member stored in the list object itself or through some other implementation-specific mechanism. I'm pretty sure the simplest and most performant way to have multiple movable references (non-const pointers) to something that needs to do this kind of bookkeeping is to just add another level of indirection.
You could try to "skip" the indirection layer required by the bookkeeping for specific cases that don't require that information; that is the iterators/node-pointers approach, which I'll comment on later. I can't think of a better place or way to store that bookkeeping than with the collection itself. I.e., if the list interface has requirements that call for such bookkeeping, an extra layer of indirection for each user of the list implementation has a very strong design rationale.
If Context Permits
If you don't care about getting the size of your list in O(1), and you know that which node is considered the first will not change for the lifetime of the short-lived object, then you can write your own list or list-view class and make your own context-specific optimizations. That's one of the big selling points of languages like C++: you get a nice standard library that does commonly useful things, and when you have a specific scenario where some features of those tools aren't required and result in unnecessary overhead, you can build your own tool/abstraction (or possibly use someone else's library).
Commentary on std::unique_ptr + reference
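Your first snippet works, but you can probably get nicer implicitly-declared special member functions for ShortLiveObject by using std::reference_wrapper: a reference member causes the implicitly-declared copy-assignment operator to be deleted, whereas a std::reference_wrapper can be rebound, so assignment keeps working. (Neither version is default-constructible, since there is nothing sensible for it to refer to.)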
class LongLiveHolder { std::unique_ptr<std::list<int>> list; };
class ShortLiveObject { std::reference_wrapper<std::list<int>> list; };
Commentary on std::shared_ptr + std::weak_ptr
Like Adrian Maire suggested, using a std::shared_ptr in the longer-lived object (which might move while the shorter-lived object exists) and a std::weak_ptr in the shorter-lived object is a working approach, but it probably has more overhead (at least from the reference count) than using std::unique_ptr + a reference, and I can't think of any general advantages, so I wouldn't suggest it unless you already had some other reason to use a std::shared_ptr. In the scenario you gave, I'm pretty sure you do not.
Commentary on Storing iterators/node-pointers in the short-lived object
Daniel Langr already commented on this, but I'll try to expand.
Specifically for std::list, there is a possible standard-compliant solution (with several caveats) that doesn't have the extra indirection of the smart pointer. Caveats:
You must be okay with only having an iterator interface for the shorter-lived object (which you indicated that you are not).
The front and back iterators must be stable for the lifetime of the shorter-lived object (the elements they refer to must not be erased from the list, and the shorter-lived object won't see new entries that someone using the longer-lived object pushes to the front or back).
From cppreference.com's page for std::list's constructors:
After container move construction (overload (8)), references, pointers, and iterators (other than the end iterator) to other remain valid, but refer to elements that are now in *this. The current standard makes this guarantee via the blanket statement in [container.requirements.general]/12, and a more direct guarantee is under consideration via LWG 2321.
From cppreference.com's page for std::list:
Adding, removing and moving the elements within the list or across several lists does not invalidate the iterators or references. An iterator is invalidated only when the corresponding element is deleted.
But I am not a language lawyer. I could be missing something important.
Also, you replied to Daniel saying:
Some iterators become invalid when moving the container (e.g. insert_iterator). @DanielLangr
Yes, so if you want to use std::insert_iterators, use the std::unique_ptr + reference approach and construct short-lived std::insert_iterators when needed instead of trying to store long-lived ones. A std::insert_iterator holds a pointer to the container object itself, which is exactly what changes when the list is moved.
If the list owner will be moved, then you need some memory address to share somehow.
You already indicated the unique_ptr. It's a decent solution if the non-owners don't need to save it internally.
The std::shared_ptr is an obvious alternative.
Finally, you can have a std::shared_ptr in the owner object, and pass std::weak_ptr to non-owners.
Suppose I'm writing a project in a modern version of C++ (say 11 or 14) and use STL in that project. At a certain moment, I need to program a specific data structure that can be built using STL containers. The DS is encapsulated in a class (am I right that encapsulating the DS in a class is the only correct way to code it in C++?), thus I need to provide some sort of interface to provide read and/or write access to the data. Which leads us to the question:
Should I use (1a) iterators or (1b) simple "indices" (i.e. numbers of a certain type) for that? The DS that I'm working on right now is pretty much linear, but then when the elements are removed, of course simple integer indices are going to get invalidated. That's about the only argument against this approach that I can imagine.
Which approach is more idiomatic? What are the objective technical arguments for and against each one?
Also, when I choose to use iterators for my custom DS, should I (2a) publicly typedef the iterators of the container that is used internally or (2b) create my own iterator from scratch? In open-source libraries such as Boost, I've seen custom iterators being written from scratch. On the other hand, I feel I'm not yet able to write a proper iterator (i.e. one that is as detailed and complex as the ones in the STL and/or Boost).
Edit as per @πάντα ῥεῖ's request:
I've asked myself this question with a few DS in a few projects while studying at the Uni, but here's the last occurrence that made me come here and ask.
The DS is meant to represent a triangle array, or vertex array, or whatever one might call it. Point is, there are two arrays or lists, one storing the vertex coordinates, and another one storing triplets of indices from the first array, thus representing triangles. (This has been coded a gazillion times already, yet I want to write it on my own, once, for the purpose of learning.) Obviously, the two arrays should stay in sync, hence the encapsulation. The set of operations is meant to include adding (maybe also removing) a vertex, adding and removing a triangle (a vertex triplet) using the vertex data from the same array. How I see it is that the client adds vertices, writes down the indices/iterators, and then issues a call to add a triangle based on those indices/iterators, which in turn returns another index/iterator to the resulting triangle.
I don't see why you couldn't provide both, if this makes sense for your container.
std::vector has iterators as well as the at/operator[] member functions for access by index.
The API of your container depends on the operations you want to make available to your clients.
Is the container iterable, i.e. is it possible to iterate over each element? Then you should provide an iterator.
Does it make sense to randomly access elements in your container, knowing their position? Then you can also provide at(size_t)/operator[](size_t).
Does it make sense to randomly access elements in your container, knowing a special "key"? Then you should probably provide at(key_type)/operator[](key_type).
As for your question regarding custom iterators or reuse of existing iterators:
If your container is basically a wrapper that adds some insertion/removal logic to an existing container, I think it is fine to publicly typedef the existing iterator, as a custom iterator may miss some features of the existing iterator, may contain bugs, and will not add any significant feature over it.
On the other hand, if you iterate in a non-standard fashion, you will need a custom iterator. For instance, I once implemented a recursive_unordered_map that accepted a parent recursive_unordered_map at construction and would iterate both over its own unordered_map and over its parent's (and its parent's parent's...). I had to implement a custom iterator for this.
Which approach is more idiomatic?
Using iterators is definitely the way to go. Functions in <algorithm> don't work with indices. They work with iterators. If you want your container to be enabled for use by the functions in <algorithm>, using iterators is the only way to go.
In general, it is recommended that the class offers its own iterator type. Under the hood, it could be an index or an STL iterator (preferred). But as far as external clients and public APIs are concerned, they only deal with the iterator type offered by the class.
Example 1
class Dictionary {
private:
typedef std::unordered_map<std::string, std::string> DictType;
public:
typedef DictType::iterator DictionaryIterator;
};
Example 2
class Sequence {
private:
typedef std::vector<std::string> SeqType;
public:
struct SeqIterator {
std::size_t index;
SeqIterator& operator++();
std::string& operator*();
};
};
If the clients are operating solely on SeqIterator, then the above can later be modified to
class Sequence {
private:
typedef std::deque<std::string> SeqType;
public:
typedef SeqType::iterator SeqIterator;
};
without the clients getting affected.
What are the pros and cons of using the built-in std::list instead of your own linked-list implementation based on pointers, like in C?
Are there some special case where one is preferred over the other?
There are plenty of good reasons to use std::list instead of your own linked list implementation:
std::list is guaranteed (by the C++ standard library's conformance to the standard) to work as described on the tin (no bugs, and exception safety and thread safety as specified by the standard).
std::list does not require you to spend time developing and testing it.
std::list is well known, so anybody else ever working with the code (or yourself later in life) can understand what's going on without first having to get to grips with a custom linked-list implementation.
I cannot really think of any good reason to use your own custom linked list.
std::list is usually implemented as a doubly-linked list. If you only need a singly-linked list, you should consider std::forward_list.
Finally, if you're concerned with performance, you shouldn't use linked lists at all. Elements in a linked list are necessarily allocated individually (and often inserted at random places), so that processing a linked list generally results in many cache misses, each giving a performance hit.
Typically, you want to use std::list, as answered by Walter.
However, a list implemented by "intrusively" integrating the next (and prev, if any) pointer directly into the contained objects, can avoid several disadvantages of std::list and the other STL containers, which may or may not be relevant to you (quoted from Boost.Intrusive documentation):
An object can only belong to one container: If you want to share an object between two containers, you either have to store multiple copies of those objects or you need to use containers of pointers: std::list<Object*>.
The use of dynamic allocation to create copies of passed values can be a performance and size bottleneck in some applications. […]
Only copies of objects are stored in non-intrusive containers. Hence copy or move constructors and copy or move assignment operators are required. Non-copyable and non-movable objects can't be stored in non-intrusive containers.
It's not possible to store a derived object in a STL-container while retaining its original type.
The second point is probably not applicable for most typical usages of lists, where you would dynamically allocate the elements anyway.
If the last point is relevant to you, you may be interested in Boost.PointerContainer ‒ although a std::list<std::unique_ptr<Obj>> usually also does the job well enough.
Instead of completely implementing a list yourself, have a look at the aforementioned Boost.Intrusive library.
The answer provided by Walter covers the main reasons to prefer the STL implementation. The main reason to consider a classic C-style implementation is increased performance. The cost of this increased performance is primarily the potential for errors. This can be addressed with testing and the inclusion of some appropriate asserts (checks for null pointers...).
Contrary to the statements in Walter's answer, there are cases where a high-performance list is a good data structure choice.
If you need the performance of a custom list but want to avoid the work of constructing and testing your own, check out the Boost.Intrusive lists (singly and doubly linked) at:
http://www.boost.org/doc/libs/1_39_0/doc/html/intrusive.html
These will get you the same performance as a custom construction with (almost) the convenience of the stl versions.
Is there any simple/elegant way in C++11, via STL or boost, to make an element-type "smart" so that an instance of it always knows which container it belongs to and has member functions for a sort of "auto-removal" which also takes care of updating the container it is part of?
The real case is that I have a callback C-function (from a C library) being called after a given request has been completed. This function accepts a raw pointer to the element which has been processed. Now what I want is to remove this element from the list it belongs to and move it to another list.
I know I could store a pointer to the container in the element itself and when the callback is called I could iterate over that container until I find the element, then remove it and call newlist.push_back(object). Given that one element must live in one container (and only one), I wonder if there's something more elegant.
Boost's intrusive containers implement that functionality.
This requires specific containers as well as objects specifically designed to work with the containers, however.
From a C background I find myself falling back into C habits where there is generally a better way. In this case I can't think of a way to do this without pointers.
I would like
struct foo {
int i;
int j;
};
std::map<char, foo> mymap;
mymap['a'] = foo{1, 2};
mymap['b'] = foo{3, 4};
As long as only one key references a value, mymap.find returns an iterator through which I can modify the value, but if I do this:
mymap['c'] = mymap.find('a')->second; // problematic because foo is copied, right?
The goal is to be able to find 'a' or 'c' modify foo and then the next find of 'a' or 'c' will show the updated result.
No, you will need to use pointers for this. Each entry in the map holds its own copy of the value assigned, which means that you cannot have two keys referring to the same element. If you store pointers instead, then two keys will hold two separate pointers that both refer to the same element in memory.
As for implementation details, std::map is typically implemented as a balanced binary search tree (usually a red-black tree) in which each node contains a std::pair<const Key, Value> object (plus extra information for the tree structure). When you do m[key], the node containing the key is looked up, or a new node is created in the tree, and a reference to the Value subobject of the pair is returned.
I would use std::shared_ptr here. You have a case of shared ownership, and shared_ptr is made for that. While pointers tend to be overused, there is nothing wrong with using them when necessary.
Boost.Intrusive
Boost.Intrusive is a library presenting some intrusive containers to the world of C++. Intrusive containers are special containers that offer better performance and exception safety guarantees than non-intrusive containers (like STL containers).
The performance benefits of intrusive containers make them ideal as a building block to efficiently construct complex containers like multi-index containers or to design high performance code like memory allocation algorithms.
While intrusive containers were and are widely used in C, they became more and more forgotten in C++ due to the presence of the standard containers which don't support intrusive techniques. Boost.Intrusive not only reintroduces this technique to C++, but also encapsulates the implementation in STL-like interfaces. Hence anyone familiar with standard containers can easily use Boost.Intrusive.