Best practice on implementing object stream container in modern c++


I want to wrap a C library in a modern C++ way.
The library provides a way to deserialize objects from a binary string.
Its API can only move forward from the beginning of the string to the end, and the part that has already been processed is not kept, just like a stream.
However, it works differently from a standard stream: it will not support the "<<" operator to return chars, and for loops should not return chars either. It needs an iterator that can iterate over it and return the objects it generates.
At first, I wanted to implement something like the code below:
class Obj {
    c_ptr ptr;
    .....
};
class X {
public:
    class const_iterator : public std::iterator<std::forward_iterator_tag, Obj> {
        ......
    };
    class iterator : public const_iterator {
        .....
    };
    X::const_iterator cbegin();
    X::iterator begin();
    X::const_iterator cend();
    X::iterator end();
    ........
};
or merge the Obj class into the iterator.
There are problems in this case.
1. As the vector iterator example shows, begin() and end() should return index values. But here X is a stream: I can only get the beginning once, and after that, accessing the stream no longer yields the first byte. In iostream, the end seems to be a specific char, EOF.
2. I suppose that I cannot inherit X from istream, because istream seems to be designed for char stream operations, with lots of mechanisms such as overflow, etc., which are not needed for the wrapper.
3. Having iterator inherit from const_iterator was suggested by someone to reduce the duplicated code. But it seems a lot of code still has to differ, mainly in the declarations.
Is there any best practice for this kind of container or iterator implementation in modern C++?

The iterators actually don't return index values. They are pointers to typed objects I think.
In the case of vector the iterator fulfills the requirements (properties/traits/...) of a RandomAccessIterator and that's why you can access by subscript operator.
I suggest you read up on the iterator concepts first; you might need some of them for your design: InputIterator/OutputIterator, ForwardIterator (I believe this is the one you might want to consider), etc.
begin() usually points to the beginning of the container and end() usually points to the end of the container. I don't know of an exception to that in the STL. Some operations on the container can also invalidate an iterator (which I believe is likely in your use case). In your case, you first need to be clear about the use case/requirements for your stream design.
You're probably right. And it is not a good idea to extend a class and change the semantics of derived methods, etc.
It depends what you want to do with your stream. Simply iterate over elements in read-only mode? Or do you need to be able to write? Or in reverse? There are four kinds: iterator, const_iterator, reverse_iterator and const_reverse_iterator.
The best practice is the STL, though very difficult to digest. I recommend the book The C++ Programming Language where you can learn about the ideas, design, use cases, etc.
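To make the iterator concepts above concrete, here is a minimal sketch of a single-pass (input) iterator over a forward-only source. Decoder is a hypothetical stand-in for the C library, and its length-prefixed record format is an assumption for illustration only; ObjectStream and its members are likewise made-up names. The end iterator is a default-constructed sentinel, so the stream never has to know where its end is in advance.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for the C library: decodes length-prefixed records
// from a byte string, moving forward only.
struct Decoder {
    std::string bytes;
    std::size_t pos = 0;

    // Returns the next record, or std::nullopt once the input is exhausted.
    std::optional<std::string> next() {
        if (pos >= bytes.size()) return std::nullopt;
        const std::size_t len = static_cast<unsigned char>(bytes[pos++]);
        std::string rec = bytes.substr(pos, len);
        pos += len;
        return rec;
    }
};

class ObjectStream {
public:
    class iterator {
    public:
        using iterator_category = std::input_iterator_tag;
        using value_type = std::string;
        using difference_type = std::ptrdiff_t;
        using pointer = const std::string*;
        using reference = const std::string&;

        iterator() = default;  // the end sentinel
        explicit iterator(Decoder* d) : dec_(d) { advance(); }

        reference operator*() const { assert(current_.has_value()); return *current_; }
        pointer operator->() const { assert(current_.has_value()); return &*current_; }
        iterator& operator++() { advance(); return *this; }
        iterator operator++(int) { iterator old = *this; advance(); return old; }

        // Equal when both are exhausted or both read from the same decoder.
        friend bool operator==(const iterator& a, const iterator& b) { return a.dec_ == b.dec_; }
        friend bool operator!=(const iterator& a, const iterator& b) { return !(a == b); }

    private:
        void advance() {
            assert(dec_ != nullptr);        // must not advance past the end
            current_ = dec_->next();
            if (!current_.has_value()) dec_ = nullptr;  // now equal to the sentinel
        }
        Decoder* dec_ = nullptr;
        std::optional<std::string> current_;  // the object most recently decoded
    };

    explicit ObjectStream(std::string bytes) : dec_{std::move(bytes)} {}
    iterator begin() { return iterator(&dec_); }  // single pass: meant to be called once
    iterator end() { return iterator(); }

private:
    Decoder dec_;
};
```

With this shape, a range-based for loop such as `for (const std::string& rec : stream)` yields decoded objects rather than chars. Because the source is consumed as it is read, only the input iterator category is claimed here, not the forward one.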

Related

Abstracting containers in C++

I would like to have a virtual base class (interface) that contains a method that returns an iterable object (a container?). The implementing classes would each handle their containers themselves, for example, class A would implement the container as a vector, class B as a list, etc.
Essentially, I would like to do something like this:
class Base {
public:
    virtual Container getContainer() = 0;
};
class A : public Base {
    Vector v;
public:
    Container getContainer() { return v; }
};
A a;
Iterator begin = a.getContainer().begin();
Iterator end = a.getContainer().end();
As in, the caller will be responsible for handling the iterator (calling the begin and end functions for iteration for example)
I assume something like this is possible, but I can't figure out how to do it. Specifically, I assume classes like vector and list inherit from a common interface that defines methods begin() and end(), but I can't figure out what that interface is, and how to handle it.

Check out cppreference.com. This common interface is only a convention and some rules, not actual C++ types. So no, you can't use containers polymorphically. The simple reason is efficiency, runtime polymorphism costs performance. For a more detailed insight, there are many documents concerning the design ideas behind the STL, which is reflected there.
If you really want to implement this, not only would you need wrapper classes around the STL-style sequence containers (deque, vector, list, array), but also around the iterators. Iterators are usually hard-tied to the containers, hence this connection. The wrapper classes will also have to provide the various methods you'd find in the underlying containers, like e.g. begin(), end(), size() etc.
Note that not every container supports all methods. For example, some don't support push_front() but only push_back(). All of them have begin() and end(), but a singly-linked list (std::forward_list, since C++11) can't have rbegin() and rend(). You could sort those categories into separate pure virtual base classes.
Overall, I'd doubt the usefulness of this. If you want to swap implementations, design your code so that the container becomes a template parameter to it. All calls are then resolved at compile time, leading to less code, less memory requirements and more performance.
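A minimal sketch of the "container as a template parameter" suggestion, assuming a made-up class Holder and a made-up helper sum_items; neither name comes from the question. The container type is fixed at compile time, so no common base class or virtual call is involved.

```cpp
#include <cassert>
#include <list>
#include <utility>
#include <vector>

// Hypothetical class: the container type is a template parameter.
template <typename Container>
class Holder {
public:
    explicit Holder(Container c) : items_(std::move(c)) {}

    const Container& items() const { return items_; }

    // Precondition: the container is not empty.
    const typename Container::value_type& first() const {
        assert(items_.begin() != items_.end());
        return *items_.begin();
    }

private:
    Container items_;
};

// Works for any Holder whose container provides begin() and end().
template <typename Container>
long sum_items(const Holder<Container>& h) {
    long total = 0;
    for (const auto& x : h.items()) total += x;
    return total;
}
```

The same sum_items works for Holder<std::vector<int>> and Holder<std::list<int>>, and each instantiation is resolved at compile time.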

Containers are not polymorphic. There is no common base class. The same is true of iterators.
This is intentional, because iterating element by element turns out to be highly inefficient when done through a virtual method based interface. It can be done, but it is slow.
Boost has any ranges and any iterators, or you can roll your own via type erasure techniques. I advise against it.
The simplest and cheapest way to get iteration polymorphic is adding a foreach_element(std::function<void(Element const&)>)const method. You can batch up the iteration to reduce overhead with foreach_element(std::function<void(std::span<Element const>)>)const, allowing elements to be clumped by the container. That would be easy to write and faster than fully polymorphic iterators and containers.
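A rough sketch of the foreach_element idea, reusing the Base and A names from the question and assuming the element type is int (the question does not say what it is). B and sum_elements are hypothetical additions for illustration.

```cpp
#include <cassert>
#include <functional>
#include <list>
#include <utility>
#include <vector>

class Base {
public:
    virtual ~Base() = default;
    // One virtual call per traversal instead of one per element access.
    virtual void foreach_element(const std::function<void(const int&)>& visit) const = 0;
};

class A : public Base {
public:
    explicit A(std::vector<int> v) : v_(std::move(v)) {}
    void foreach_element(const std::function<void(const int&)>& visit) const override {
        assert(visit != nullptr);  // precondition: a callable was supplied
        for (const int& x : v_) visit(x);
    }
private:
    std::vector<int> v_;
};

class B : public Base {
public:
    explicit B(std::list<int> l) : l_(std::move(l)) {}
    void foreach_element(const std::function<void(const int&)>& visit) const override {
        assert(visit != nullptr);
        for (const int& x : l_) visit(x);
    }
private:
    std::list<int> l_;
};

// The caller depends only on Base and never sees the concrete container.
inline int sum_elements(const Base& b) {
    int total = 0;
    b.foreach_element([&total](const int& x) { total += x; });
    return total;
}
```

The trade-off is that the caller cannot stop early or hold a position in the sequence, which is what full polymorphic iterators would add at a higher cost.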

Does Using a Pointer as a Container Iterator Violate the Standard

Angew made a comment that a vector using a raw pointer as its iterator type was fine. That kinda threw me for a loop.
I started researching it and found that the requirement for vector iterators was only that they are "Random Access Iterators" for which it is explicitly stated that pointers qualify:
A pointer to an element of an array satisfies all requirements
Is debugging the only reason that implementations even provide iterator classes for vector, or is there actually a requirement on vector that I missed?

§ 24.2.1
Since iterators are an abstraction of pointers, their semantics is a generalization of most of the semantics
of pointers in C++. This ensures that every function template that takes iterators works as well with
regular pointers.
So yes, using a pointer satisfies all of the requirements for a Random Access Iterator.
std::vector likely provides iterators for a few reasons:
The standard says it should.
It would be odd if containers such as std::map or std::set provided iterators while std::vector provided only a value_type* pointer. Iterators provide consistency across the containers library.
It allows for specializations of the vector type, e.g. std::vector<bool>, where a value_type* pointer would not be a valid iterator.
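The std::vector<bool> point can be checked at compile time. The sketch below uses two hypothetical helpers, set_first and count_true, to show that generic code written against iterators works for both a plain vector and the bool specialization, even though the latter's iterator is not a bool*.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// vector<bool>::reference is a proxy class, so its iterator cannot be bool*.
static_assert(!std::is_same<std::vector<bool>::iterator, bool*>::value,
              "vector<bool>::iterator is not a raw pointer");
static_assert(!std::is_same<std::vector<bool>::reference, bool&>::value,
              "vector<bool>::reference is a proxy, not bool&");

// Writes through the first iterator; works for real references and proxies alike.
template <typename Container>
void set_first(Container& c, bool value) {
    assert(c.begin() != c.end());  // precondition: non-empty
    *c.begin() = value;
}

// Counts elements that convert to true, using only the iterator interface.
template <typename Container>
std::size_t count_true(const Container& c) {
    std::size_t n = 0;
    for (auto it = c.begin(); it != c.end(); ++it) {
        if (*it) ++n;
    }
    return n;
}
```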

My 50 cents:
Iterators are generic ways to access any STL container. What I feel you're saying is: Since pointers are OK as a replacement of iterators for vectors, why are there iterators for vectors?
Well, who said you can't have duplicates in C++? Actually it's a good thing to have different interfaces to the same functionality. That should not be a problem.
On the other hand, think about libraries that have algorithms that use iterators. If vectors didn't have iterators, it would just be an invitation to exceptions (exceptions in the linguistic sense, not the programming sense). Every time someone wrote an algorithm, they would have to do something different for vectors with pointers. But why? There is no reason for this hassle. Just interface everything the same way.

What those comments are saying is that
template <typename T, ...>
class vector
{
public:
    typedef T* iterator;
    typedef const T* const_iterator;
    ...
private:
    T* elems; // pointer to dynamic array
    size_t count;
    ...
};
is valid. Similarly a user defined container intended for use with std:: algorithms can do that. Then when a template asks for Container::iterator the type it gets back in that instantiation is T*, and that behaves properly.
So the standard requires that vector has a definition for vector::iterator, and you use that in your code. On one platform it is implemented as a pointer into an array, but on a different platform it is something else. Importantly these things behave the same way in all the aspects that the standard specifies.
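As an illustration of a user-defined container doing the same thing, here is a small sketch. FixedBuffer and sort_and_find are hypothetical names, not part of any library. The generic function only asks for Container::iterator and never needs to know that it is T* in this instantiation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity container whose iterator types are raw pointers.
template <typename T, std::size_t N>
class FixedBuffer {
public:
    using value_type = T;
    using iterator = T*;
    using const_iterator = const T*;

    iterator begin() { return elems_; }
    iterator end() { return elems_ + count_; }
    const_iterator begin() const { return elems_; }
    const_iterator end() const { return elems_ + count_; }
    std::size_t size() const { return count_; }

    void push_back(const T& value) {
        assert(count_ < N);  // precondition: capacity not exceeded
        elems_[count_++] = value;
    }

private:
    T elems_[N] = {};
    std::size_t count_ = 0;
};

// Sorts the container, then returns an iterator to v (or end() if absent).
template <typename Container>
typename Container::iterator sort_and_find(Container& c,
                                           const typename Container::value_type& v) {
    std::sort(c.begin(), c.end());
    return std::find(c.begin(), c.end(), v);
}
```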

For a vector, why prefer an iterator over a pointer?

In Herb Sutter's When Is a Container Not a Container?, he shows an example of taking a pointer into a container:
// Example 1: Is this code valid? safe? good?
//
vector<char> v;
// ...
char* p = &v[0];
// ... do something with *p ...
Then follows it up with an "improvement":
// Example 1(b): An improvement
// (when it's possible)
//
vector<char> v;
// ...
vector<char>::iterator i = v.begin();
// ... do something with *i ...
But doesn't really provide a convincing argument:
In general, it's not a bad guideline to prefer using iterators instead
of pointers when you want to point at an object that's inside a
container. After all, iterators are invalidated at mostly the same
times and the same ways as pointers, and one reason that iterators
exist is to provide a way to "point" at a contained object. So, if you
have a choice, prefer to use iterators into containers.
Unfortunately, you can't always get the same effect with iterators
that you can with pointers into a container. There are two main
potential drawbacks to the iterator method, and when either applies we
have to continue to use pointers:
You can't always conveniently use an iterator where you can use a pointer. (See example below.)
Using iterators might incur extra space and performance overhead, in cases where the iterator is an object and not just a bald
pointer.
In the case of a vector, the iterator is just a RandomAccessIterator. For all intents and purposes this is a thin wrapper over a pointer. One implementation even acknowledges this:
// This iterator adapter is 'normal' in the sense that it does not
// change the semantics of any of the operators of its iterator
// parameter. Its primary purpose is to convert an iterator that is
// not a class, e.g. a pointer, into an iterator that is a class.
// The _Container parameter exists solely so that different containers
// using this template can instantiate different types, even if the
// _Iterator parameter is the same.
Furthermore, the implementation stores a member value of type _Iterator, which is pointer or T*. In other words, just a pointer. Furthermore, the difference_type for such a type is std::ptrdiff_t and the operations defined are just thin wrappers (i.e., operator++ is ++_pointer, operator* is *_pointer) and so on.
Following Sutter's argument, this iterator class provides no benefits over pointers, only drawbacks. Am I correct?

For vectors, in non-generic code, you're mostly correct.
The benefit is that you can pass a RandomAccessIterator to a whole bunch of algorithms no matter what container the iterator iterates, whether that container has contiguous storage (and thus pointer iterators) or not. It's an abstraction.
(This abstraction, among other things, allows implementations to swap out the basic pointer implementation for something a little more sexy, like range-checked iterators for debug use.)
It's generally considered to be a good habit to use iterators unless you really can't. After all, habit breeds consistency, and consistency leads to maintainability.
Iterators are also self-documenting in a way that pointers are not. What does an int* point to? No idea. What does an std::vector<int>::iterator point to? Aha…
Finally, they provide a measure of type safety: though such iterators may only be thin wrappers around pointers, they needn't be pointers. If an iterator is a distinct type rather than a type alias, then you won't accidentally pass your iterator into places you didn't want it to go, or set it to "NULL" by mistake.
I agree that Sutter's argument is about as convincing as most of his other arguments, i.e. not very.

You can't always conveniently use an iterator where you can use a pointer
That is not one of the disadvantages. Sometimes it is just too "convenient" to get the pointer passed to places where you really didn't want them to go. Having a separate type helps in validating parameters.
Some early implementations used T* for vector::iterator, but it caused various problems, like people accidentally passing an unrelated pointer to vector member functions. Or assigning NULL to the iterator.
Using iterators might incur extra space and performance overhead, in cases where the iterator is an object and not just a bald pointer.
This was written in 1999, when we also believed that code in <algorithm> should be optimized for different container types. Not much later everyone was surprised to see that the compilers figured that out themselves. The generic algorithms using iterators worked just fine!
For a std::vector there is absolutely no space or time overhead for using an iterator instead of a pointer. You found out that the iterator class is just a thin wrapper over a pointer. Compilers will also see that, and generate equivalent code.

One real-life reason to prefer iterators over pointers is that they can be implemented as checked iterators in debug builds and help you catch some nasty problems early. For example:
vector<int>::iterator it; // uninitialized iterator
it++;
or
for (it = vec1.begin(); it != vec2.end(); ++it) // different containers

Why do C++ STL container begin and end functions return iterators by value rather than by constant reference?

As I look at the standard for different STL objects and functions, one thing that doesn't make sense to me is why would the begin() and end() functions for container objects return an iterator by value rather than by constant reference? It seems to me that iterators could be held by the container object internally and adjusted whenever the container is mutated. This would mitigate the cost of creating unnecessary temporaries in for loops like this:
for (std::vector<int>::iterator it=my_vec.begin(); it!=my_vec.end(); ++it){
//do things
}
Is this a valid concern? Is there something about using references to iterators that makes this a bad idea? Do most compiler implementations optimize this concern away anyway?

Iterators are designed to be light-weight and copyable (and assignable). For example, for a vector an iterator might literally just be a pointer. Moreover, the whole point of iterators is to decouple algorithms from containers, and so the container shouldn't have to care at all what kind of iterators anyone else is currently holding.

If the begin and end methods returned a reference, the container would be forced to have each of those iterators as members. The standards people try to leave as much flexibility to an implementation as possible.
For example you can create a simple wrapper for an array that behaves as a standard container and doesn't consume any extra memory. If this wrapper were required to contain the iterators it wouldn't be so simple or small anymore.
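A minimal sketch of such an array wrapper, under the assumption that a non-owning view is what is meant; ArrayRef is a hypothetical name. begin() and end() build their iterators on demand and return them by value, so the wrapper stores nothing except one pointer. The size check below holds on common implementations but is not something the standard guarantees.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning wrapper that gives a C array a container-like interface.
template <typename T, std::size_t N>
class ArrayRef {
public:
    using value_type = T;
    using iterator = T*;

    explicit ArrayRef(T (&arr)[N]) : data_(arr) {}

    // Computed on each call and returned by value; no iterator is stored.
    iterator begin() const { return data_; }
    iterator end() const { return data_ + N; }
    std::size_t size() const { return N; }

    T& operator[](std::size_t i) const {
        assert(i < N);  // precondition: index in range
        return data_[i];
    }

private:
    T* data_;  // the only data member
};

// Assumption: a class holding a single pointer is pointer-sized (true on mainstream ABIs).
static_assert(sizeof(ArrayRef<int, 8>) == sizeof(int*),
              "wrapper carries no iterator members");
```

If begin() and end() had to return references, the wrapper would need two stored iterators and code to keep them up to date.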

Well, if you choose the right iterator, the STL would :-)
for (
std::vector<int>::const_iterator it=my_vec.begin(),
end=my_vec.end();
it!=end;
++it)
{
//do things
}
Iterators in the STL have pointer semantics. A const_iterator has pointer-to-const semantics.

Immutable C++ container class

Say that I have a C++ class, Container, that contains some elements of type Element. For various reasons, it is inefficient, undesirable, unnecessary, impractical, and/or impossible (1) to modify or replace the contents after construction. Something along the lines of const std::list<const Element> (2).
Container can meet many requirements of the STL's "container" and "sequence" concepts. It can provide the various types like value_type, reference, etc. It can provide a default constructor, a copy constructor, a const_iterator type, begin() const, end() const, size, empty, all the comparison operators, and maybe some of rbegin() const, rend() const, front(), back(), operator[](), and at().
However, Container can't provide insert, erase, clear, push_front, push_back, non-const front, non-const back, non-const operator[], or non-const at with the expected semantics. So it appears that Container can't qualify as a "sequence". Further, Container can't provide operator=, and swap, and it can't provide an iterator type that points to a non-const element. So, it can't even qualify as a "container".
Is there some less-capable STL concept that Container meets? Is there a "read-only container" or an "immutable container"?
If Container doesn't meet any defined level of conformance, is there value in partial conformance? Is it misleading to make it look like a "container", when it doesn't qualify? Is there a concise, unambiguous way that I can document the conformance so that I don't have to explicitly document the conforming semantics? And similarly, a way to document it so that future users know they can take advantage of read-only generic code, but don't expect mutating algorithms to work?
What do I get if I relax the problem so Container is Assignable (but its elements are not)? At that point, operator= and swap are possible, but dereferencing iterator still returns a const Element. Does Container now qualify as a "container"?
const std::list<T> has approximately the same interface as Container. Does that mean it is neither a "container" nor a "sequence"?
Footnote (1) I have use cases that cover this whole spectrum. I have a would-be-container class that adapts some read-only data, so it has to be immutable. I have a would-be-container that generates its own contents as needed, so it's mutable but you can't replace elements the way the STL requires. I have yet another would-be-container that stores its elements in a way that would make insert() so slow that it would never be useful. And finally, I have a string that stores text in UTF-8 while exposing a code-point oriented interface; a mutable implementation is possible but completely unnecessary.
Footnote (2) This is just for illustration. I'm pretty sure std::list requires an assignable element type.

The STL doesn't define any lesser concepts; mostly because the idea of const is usually expressed on a per-iterator or per-reference level, not on a per-class level.
You shouldn't provide iterator with unexpected semantics, only provide const_iterator. This allows client code to fail in the most logical place (with the most readable error message) if they make a mistake.
Possibly the easiest way to do it would be to encapsulate it and prevent all non-const aliases.
class example {
    std::list<sometype> stuff;
public:
    void Process(...) { ... }
    // const member function, so it can also be called on a const example
    const std::list<sometype>& Results() const { return stuff; }
};
Now any client code knows exactly what they can do with the return value of Results- nada that requires mutation.

As long as your object can provide a conforming const_iterator, it doesn't have to have anything else. It should be pretty easy to implement this on your container class.
(If applicable, look at the Boost.Iterators library; it has iterator_facade and iterator_adaptor classes to help you with the nitty-gritty details)
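A hand-rolled sketch of a read-only sequence along these lines, without Boost. ImmutableSeq is a hypothetical name and std::vector is just one possible backing store. It exposes only const_iterator, so read-only generic code works while an attempt to write through an iterator fails at the call site.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical read-only sequence: contents are fixed at construction.
template <typename T>
class ImmutableSeq {
public:
    using value_type = T;
    using const_reference = const T&;
    using const_iterator = typename std::vector<T>::const_iterator;
    using size_type = std::size_t;

    ImmutableSeq(std::initializer_list<T> init) : items_(init) {}

    const_iterator begin() const { return items_.begin(); }
    const_iterator end() const { return items_.end(); }
    const_iterator cbegin() const { return items_.begin(); }
    const_iterator cend() const { return items_.end(); }

    size_type size() const { return items_.size(); }
    bool empty() const { return items_.empty(); }

    const_reference operator[](size_type i) const {
        assert(i < items_.size());  // precondition: index in range
        return items_[i];
    }

private:
    const std::vector<T> items_;  // const member: copyable, but not assignable
};

// Read-only generic code needs nothing beyond begin() and end().
template <typename Seq>
typename Seq::value_type sum_of(const Seq& s) {
    typename Seq::value_type total{};
    for (const auto& x : s) total += x;
    return total;
}
```

A statement such as `*seq.begin() = 5;` does not compile, because dereferencing a const_iterator yields a reference to const.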
