Immutable C++ container class

Immutable C++ container class - c++

Say that I have a C++ class, Container, that contains some elements of type Element. For various reasons, it is inefficient, undesirable, unnecessary, impractical, and/or impossible (1) to modify or replace the contents after construction. Something along the lines of const std::list<const Element> (2).
Container can meet many requirements of the STL's "container" and "sequence" concepts. It can provide the various types like value_type, reference, etc. It can provide a default constructor, a copy constructor, a const_iterator type, begin() const, end() const, size, empty, all the comparison operators, and maybe some of rbegin() const, rend() const, front(), back(), operator[](), and at().
However, Container can't provide insert, erase, clear, push_front, push_back, non-const front, non-const back, non-const operator[], or non-const at with the expected semantics. So it appears that Container can't qualify as a "sequence". Further, Container can't provide operator=, and swap, and it can't provide an iterator type that points to a non-const element. So, it can't even qualify as a "container".
Is there some less-capable STL concept that Container meets? Is there a "read-only container" or an "immutable container"?
If Container doesn't meet any defined level of conformance, is there value in partial conformance? Is is misleading to make it look like a "container", when it doesn't qualify? Is there a concise, unambiguous way that I can document the conformance so that I don't have to explicitly document the conforming semantics? And similarly, a way to document it so that future users know they can take advantage of read-only generic code, but don't expect mutating algorithms to work?
What do I get if I relax the problem so Container is Assignable (but its elements are not)? At that point, operator= and swap are possible, but dereferencing iterator still returns a const Element. Does Container now qualify as a "container"?
const std::list<T> has approximately the same interface as Container. Does that mean it is neither a "container" nor a "sequence"?
Footnote (1) I have use cases that cover this whole spectrum. I have a would-be-container class that adapts some read-only data, so it has to be immutable. I have a would-be-container that generates its own contents as needed, so it's mutable but you can't replace elements the way the STL requires. I yet have another would-be-container that stores its elements in a way that would make insert() so slow that it would never be useful. And finally, I have a string that stores text in UTF-8 while exposing a code-point oriented interface; a mutable implementation is possible but completely unnecessary.
Footnote (2) This is just for illustration. I'm pretty sure std::list requires an assignable element type.

The STL doesn't define any lesser concepts; mostly because the idea of const is usually expressed on a per-iterator or per-reference level, not on a per-class level.
You shouldn't provide iterator with unexpected semantics, only provide const_iterator. This allows client code to fail in the most logical place (with the most readable error message) if they make a mistake.
Possibly the easiest way to do it would be to encapsulate it and prevent all non-const aliases.
class example {
std::list<sometype> stuff;
public:
void Process(...) { ... }
const std::list<sometype>& Results() { return stuff; }
};
Now any client code knows exactly what they can do with the return value of Results- nada that requires mutation.

As long as your object can provider a conforming const_iterator it doesn't have to have anything else. It should be pretty easy to implement this on your container class.
(If applicable, look at the Boost.Iterators library; it has iterator_facade and iterator_adaptor classes to help you with the nitty-gritty details)

Related

Is there assign_or_inserter/insert_or_assigner in C++20?

std::inserter suffers from problem that it calls insert that for map is noop if key already exists.
There is std::map::insert_or_assign, but I was unable to find an inserter that uses that.
Is there something like this in C++20?
note: I know I can c/p it from somewhere on the SO/internet, I am interested in STL/boost solutions, not c/p implementation from somewhere.

No, there is no corresponding inserter that use insert_or_assign. In fact, the wording explicitly says there are only 3 such functions:
back_inserter, front_inserter, and inserter are three functions making the insert iterators out of a container.

There is no insert iterator with insert_or_assign functionality in the C++20 Standard Library
Insert iterators are governed by [insert.iterators], and it is particularly described in [insert.iterators]/1 that the only functions to create iterators
[...] back_inserter, front_inserter, and inserter are three functions making the insert iterators out of a container. [...]
are back_inserter, front_inserter, and inserter, which, as covered by [back.inserter], [front.inserter] and [inserter], respectively, returns back_insert_iterator, front_insert_iterator and insert_iterator, respectively.
As governed by [back.insert.iter.ops], [front.insert.iter.ops] and [insert.iter.ops], the effect of copy assignment of a value to these iterators is as if calling
// back_insert_iterator
container->push_back(value);
// front _insert_iterator
container->push_front(value);
// insert_iterator
container->insert(iter, value);
respectively. I.e., none of these insert iterators offer the effect of the insert_or_assign member functions of std::map and std::unordered_map (which was introduced by N4713, which makes no mention whatsoever regarding insert iterators in the sense of [insert.iterators]).

There are a few hurdles that make it unlikely to be standardised.
map::insert_or_assign doesn't have a value_type (reference) parameter, which is common across each of insert, push_front and push_back. Remember that operator= only has one parameter, so you'll have to construct a pair to assign to your proxy.
It would be desirable to forward either or both of key and value, how do you distinguish passing const std::pair<const Key, Value> &, std::pair<Key &&, const Value &> etc.
There are only 2 standard containers that have insert_or_assign, map and unordered_map. Every container has insert, and all the sequence containers have one or both of push_front or push_back.

What could be a "least bad implementation" for an iterator over a proxied container?

Context
I was trying to implement a nD array like container. Something that would wrap an underlying sequence container and allow to process it as a container of containers (of...): arr[i][j][k] should be a (eventually const) reference for _arr[(((i * dim2) + j) * dim3) + k].
Ok until there, arr[i] has just to be a wrapper class over the subarray...
And when I tried to implement interators, I suddenly realized that dragons were everywhere around:
my container is not a standard compliant container because operator [] returns a proxy or wrapper instead of a true reference (When Is a Container Not a Container?)
this causes the iterator to be either a stashing iterator (which is known to be bad (Reference invalidation after applying reverse_iterator on a custom made iterator and its accepted answer)
... or a proxy iterator which is not necessarily better (To Be or Not to Be (an Iterator))
The real problem is that as soon as you have a proxied container, no iterator can respect the following requirement for a forward iterator:
Forward iterators [forward.iterators]
...6 If a and b are both dereferenceable, then a == b if and only if *a and *b are bound to the same object.
Examples come from the standard library itself:
vector<bool> is known not to respect all the requirements for containers because it returns proxies instead of references:
Class vector [vector.bool]...3 There is no requirement that the data be stored as a contiguous allocation of bool values. A space-optimized
representation of bits is recommended instead.
4 reference is a class that simulates the behavior of references of a single bit in vector.
filesystem path iterator is known to be a stashing iterator:
path iterators [fs.path.itr]...
2 A path::iterator is a constant iterator satisfying all the requirements of a bidirectional iterator (27.2.6)
except that, for dereferenceable iterators a and b of type path::iterator with a == b, there is no requirement
that*a and *b are bound to the same object.
and from cppreference:
Notes: std::reverse_iterator does not work with iterators that return a reference to a member object (so-called "stashing iterators"). An example of stashing iterator is std::filesystem::path::iterator.
Question
I have currently found plenty of references about why proxied containers are not true containers and why it would be nice if proxied containers and iterators were allowed by the standard. But I have still not understood what was the best that could be done and what were the real limitations.
So my question is why proxy iterators are really better that stashing ones, and what algorithms are allowed for either of them. If possible, I would really love to find a reference implementation for such an iterator
For references, a current implementation of my code has been submitted on Code Review. It contains a stashing iterator (that broke immediately when I try to use std::reverse_iterator)

OK, we have two similar but distinct concepts. So lets lay them out.
But first, I need to make a distinction between the named requirements of C++-pre-20, and the actual in-language concepts created for the Ranges TS and included in C++20. They're both called "concepts", but they're defined differently. As such, when I talk about concept-with-a-lowercase-c, I mean the pre-C++20 requirements. When I talk about Concept-with-a-captial-C, I mean the C++20 stuff.
Proxy Iterators
Proxy iterators are iterators where their reference is not a value_type&, but is instead some other type that behaves like a reference to value_type. In this case, *it returns a prvalue to this reference.
The InputIterator concept imposes no requirement on reference, other than that it is convertible to value_type. However, the ForwardIterator concept makes the explicit statement that "reference is a reference to T".
Therefore, a proxy iterator cannot fit the ForwardIterator concept. But it can still be an InputIterator. So you can safely pass a proxy iterator to any function that only requires InputIterators.
So, the problem with vector<bool>s iterators is not that they're proxy iterators. It's that they promise they fulfill the RandomAccessIterator concept (though the use of the appropriate tag), when they're really only InputIterators and OutputIterators.
The Ranges proposal (mostly) adopted into C++20 makes changes to the iterator Concepts which allow all iterators to be proxy iterators. So under Ranges, vector<bool>::iterator really fulfills the RandomAccessIterator Concept. Therefore, if you have code written against the Ranges concepts, then you can use proxy iterators of all kinds.
This is very useful for dealing with things like counting ranges. You can have reference and value_type be the same type, so you're just dealing with integers either way.
And of course, if you have control over the code consuming the iterator, you can make it do whatever you want, so long as you don't violate the concept your iterator is written against.
Stashing Iterators
Stashing iterators are iterators where reference is (directly or indirectly) a reference to an object stored in the iterator. Therefore, if you make a copy of an iterator, the copy will return a reference to a different object than the original, even though they refer to the same element. And when you increment the iterator, previous references are no longer valid.
Stashing iterators are usually implemented because computing the value you want to return is expensive. Maybe it would involve a memory allocation (such as path::iterator) or maybe it would involve a possibly-complex operation that should only be done once (such as regex_iterator). So you only want to do it when necessary.
One of the foundations of ForwardIterator as a concept (or Concept) is that a range of these iterators represents a range over values which exist independently of their iterators. This permits multipass operation, but it also makes doing other things useful. You can store references to items in the range, and then iterate elsewhere.
If you need an iterator to be a ForwardIterator or higher, you should never make it a stashing iterator. Of course, the C++ standard library is not always consistent with itself. But it usually calls out its inconsistencies.
path::iterator is a stashing iterator. The standard says that it is a BidirectionalIterator; however, it also gives this type an exception to the reference/pointer preservation rule. This means that you cannot pass path::iterator to any code that might rely on that preservation rule.
Now, this doesn't mean you can't pass it to anything. Any algorithm which requires only InputIterator will be able to take such an iterator, since such code cannot rely on that rule. And of course, any code which you write or which specifically states in its documentation that it doesn't rely on that rule can be used. But there's no guarantee that you can use reverse_iterator on it, even though it says that it is a BidirectionalIterator.
regex_iterators are even worse in this regard. They are said to be a ForwardIterators based on their tag, but the standard never says that they actually are ForwardIterators (unlike path::iterator). And the specification of them as having reference be an actual reference to a member object makes it impossible for them to be true ForwardIterators.
Note that I made no distinction between the pre-C++20 concept and the Ranges Concept. That's because the std::forward_iterator Concept still forbids stashing iterators. This is by design.
Usage
Now obviously, you can do whatever you want in your code. But code you don't control will be under the domain of its owners. They will be writing against the old concepts, the new Concepts, or some other c/Concept or requirement that they specify. So your iterators need to be able to be compatible with their needs.
The algorithms that the Ranges introduces uses the new Concepts, so you can always rely on them to work with proxy iterators. However, as I understand it, the Range Concepts are not back-ported into older algorithms.
Personally, I would suggest avoiding stashing iterator implementations entirely. By providing complete support for proxy iterators, most stashing iterators can be rewritten to return values rather than references to objects.
For example, if there were a path_view type, path::iterator could have returned that instead of a full-fledged path. That way, if you want to do the expensive copy operation, you can. Similarly, the regex_iterators could have returned copies of the match object. The new Concepts make it possible to work that way by supporting proxy iterators.
Now, stashing iterators handle caching in a useful way; iterators can cache their results so that repeated *it usage only does the expensive operation once. But remember the problem with stashing iterators: returning a reference to their contents. You don't need to do that just to get caching. You can cache the results in an optional<T> (which you invalidate when the iterator is in/decremented). So you can still return a value. It may involve an additional copy, but reference shouldn't be a complex type.
Of course, all of this means that auto &val = *it; isn't legal code anymore. However, auto &&val = *it; will always work. This is actually a big part of the Range TS version of iterators.

For a vector, why prefer an iterator over a pointer?

In Herb Sutter's When Is a Container Not a Container?, he shows an example of taking a pointer into a container:
// Example 1: Is this code valid? safe? good?
//
vector<char> v;
// ...
char* p = &v[0];
// ... do something with *p ...
Then follows it up with an "improvement":
// Example 1(b): An improvement
// (when it's possible)
//
vector<char> v;
// ...
vector<char>::iterator i = v.begin();
// ... do something with *i ...
But doesn't really provide a convincing argument:
In general, it's not a bad guideline to prefer using iterators instead
of pointers when you want to point at an object that's inside a
container. After all, iterators are invalidated at mostly the same
times and the same ways as pointers, and one reason that iterators
exist is to provide a way to "point" at a contained object. So, if you
have a choice, prefer to use iterators into containers.
Unfortunately, you can't always get the same effect with iterators
that you can with pointers into a container. There are two main
potential drawbacks to the iterator method, and when either applies we
have to continue to use pointers:
You can't always conveniently use an iterator where you can use a pointer. (See example below.)
Using iterators might incur extra space and performance overhead, in cases where the iterator is an object and not just a bald
pointer.
In the case of a vector, the iterator is just a RandomAccessIterator. For all intents and purposes this is a thin wrapper over a pointer. One implementation even acknowledges this:
// This iterator adapter is 'normal' in the sense that it does not
// change the semantics of any of the operators of its iterator
// parameter. Its primary purpose is to convert an iterator that is
// not a class, e.g. a pointer, into an iterator that is a class.
// The _Container parameter exists solely so that different containers
// using this template can instantiate different types, even if the
// _Iterator parameter is the same.
Furthermore, the implementation stores a member value of type _Iterator, which is pointer or T*. In other words, just a pointer. Furthermore, the difference_type for such a type is std::ptrdiff_t and the operations defined are just thin wrappers (i.e., operator++ is ++_pointer, operator* is *_pointer) and so on.
Following Sutter's argument, this iterator class provides no benefits over pointers, only drawbacks. Am I correct?

For vectors, in non-generic code, you're mostly correct.
The benefit is that you can pass a RandomAccessIterator to a whole bunch of algorithms no matter what container the iterator iterates, whether that container has contiguous storage (and thus pointer iterators) or not. It's an abstraction.
(This abstraction, among other things, allows implementations to swap out the basic pointer implementation for something a little more sexy, like range-checked iterators for debug use.)
It's generally considered to be a good habit to use iterators unless you really can't. After all, habit breeds consistency, and consistency leads to maintainability.
Iterators are also self-documenting in a way that pointers are not. What does a int* point to? No idea. What does an std::vector<int>::iterator point to? Aha…
Finally, they provide a measure a type safety — though such iterators may only be thin wrappers around pointers, they needn't be pointers: if an iterator is a distinct type rather than a type alias, then you won't be accidentally passing your iterator into places you didn't want it to go, or setting it to "NULL" accidentally.
I agree that Sutter's argument is about as convincing as most of his other arguments, i.e. not very.

You can't always conveniently use an iterator where you can use a pointer
That is not one of the disadvantages. Sometimes it is just too "convenient" to get the pointer passed to places where you really didn't want them to go. Having a separate type helps in validating parameters.
Some early implementations used T* for vector::iterator, but it caused various problems, like people accidentally passing an unrelated pointer to vector member functions. Or assigning NULL to the iterator.
Using iterators might incur extra space and performance overhead, in cases where the iterator is an object and not just a bald pointer.
This was written in 1999, when we also believed that code in <algorithm> should be optimized for different container types. Not much later everyone was surprised to see that the compilers figured that out themselves. The generic algorithms using iterators worked just fine!
For a std::vector there is absolutely no space of time overhead for using an iterator instead of a pointer. You found out that the iterator class is just a thin wrapper over a pointer. Compilers will also see that, and generate equivalent code.

One real-life reason to prefer iterators over pointers is that they can be implemented as checked iterators in debug builds and help you catch some nasty problems early. I.e:
vector<int>::iterator it; // uninitialized iterator
it++;
or
for (it = vec1.begin(); it != vec2.end(); ++it) // different containers

Can I create an empty range (iterator pair) without an underlying container object?

I have a class akin to the following:
struct Config
{
using BindingContainer = std::map<ID, std::vector<Binding>>;
using BindingIterator = BindingContainer::mapped_type::const_iterator;
boost::iterator_range<BindingIterator> bindings(ID id) const;
private:
BindingContainer m_bindings;
};
Since the ID passed to bindings() might not exist, I need to be able to represent a 'no bindings' value in the return type domain.
I don't need to differentiate an unknown ID from an ID mapped to an empty vector, so I was hoping to be able to achieve this with the interface as above and return an empty range with default-constructed iterators. Unfortunately, although a ForwardIterator is DefaultConstructible [C++11 24.2.5/1] the result of comparing a singular iterator is undefined [24.2.1/5], so without a container it seems this is not possible.
I could change the interface to e.g wrap the iterator_range in a boost::optional, or return a vector value instead; the former is a little more clunky for the caller though, and the latter has undesirable copy overheads.
Another option is to keep a statically-allocated empty vector and return its iterators. The overhead wouldn't be problematic in this instance, but I'd like to avoid it if I can.
Adapting the map iterator to yield comparable default-constructed iterators is a possibility, though seems over-complex...
Are there any other options here that would support returning an empty range when there is no underlying container?
(Incidentally I'm sure a while back I read a working paper or article about producing empty ranges for standard container type when there is no container object, but can't find anything now.)
(Note I am limited to C++11 features, though I'd be interested if there is any different approach requiring later features.)

Nope, there aren't. Your options are as you suggest. Personally, I would probably go with the idea of hijacking an iterator pair from a static empty vector; I can't imagine what notional "overhead" would be involved here, beyond a couple of extra bytes in your process image.
Is this a singular iterator and, if so, can I compare it to another one?
Comparing default-constructed iterators with operator==
And this hasn't changed in either C++14 or C++17 (so far).

You may use a default constructed boost::iterator_range
from (https://www.boost.org/doc/libs/1_55_0/libs/range/doc/html/range/reference/utilities/iterator_range.html):
However, if one creates a default constructed iterator_range, then one
can still call all its member functions. This design decision avoids
the iterator_range imposing limitations upon ranges of iterators that
are not singular.
Example here:
https://wandbox.org/permlink/zslaPwmk3lBI4Q9N

Is there an operational difference between std::set::iterator and std::set::const_iterator?

For most containers, the iterator type provides read-write access to values in the container, and the const_iterator type provides read-only access. However, for std::set<T>, the iterator type cannot provide read-write access, because modifying a value in the set (potentially) breaks the container invariants. Therefore, in std::set<T>, both iterator and const_iterator provide read-only access.
This leads me to my question: Is there any difference between the things you can do with a std::set<T>::iterator and the things you can do with a std::set<T>::const_iterator?
Note that in C++11, the manipulation methods of containers (e.g., erase) can take const_iterator arguments.

No, there's not much much functional difference between them. Of course, there used to be back in C++03, when set<T>::iterator didn't return a const T&. But once they changed it, they were stuck with two different kinds of iterators that both do the same thing.
Indeed, the standard is quite clear that they have identical functionality (to the point where they can be the same type, but aren't required to be). From 23.2.4, p. 6:
iterator of an associative container is of the bidirectional iterator category. For associative containers where the value type is the same as the key type, both iterator and const_iterator are constant iterators. It is unspeciﬁed whether or not iterator and const_iterator are the same type. [ Note: iterator and const_iterator have identical semantics in this case, and iterator is convertible to const_iterator. Users can avoid violating the One Definition Rule by always using const_iterator in their function parameter lists. —end note ]

When we (Err I) ported our large app to VC 10.0, this rule took affect. It broke all sorts of old code where folks manipulated the iterators by calling non const methods on them.
So that leads to the biggest difference I found: You can only call const methods on a const iterator. Where-as in the old standard you could willy-nilly call non-const methods and mess up your set. In some cases I ended up replacing some set's with map's where I found the code absolutely required modification to the items getting stored in the container.
Hope that helps.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js