C++ create a char* iterator

I like to write container-agnostic code that uses standard functions such as std::distance() and std::advance(). This is useful for deserialization, where I can pass in buffer objects of different types (network stream, byte stream, ...).
How can I convert char* or uint8_t* pointers to an iterator? Copying the data to a buffer is not an option.
One option I have in mind is to use a custom allocator with std::string, but I'd prefer a more ready-made solution if available. Any suggestions?

There are several categories of iterators, distinguished by the properties (operations) they support; there is a nice overview here: http://www.cplusplus.com/reference/iterator/
Random access iterators must implement all of the iterator functionality listed in that table.
Raw pointers do in fact support all of these operations and are therefore random access iterators, so they can be used with all STL algorithms and containers. This is also discussed here: Can raw pointers be used instead of iterators with STL algorithms for containers with linear storage?
Although not necessary, it might still be useful to implement an iterator wrapper for your pointers; this is also discussed in the answers to the question above.
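As a minimal sketch of the point above: the hypothetical read_u32_le helper below is written once against a generic iterator and accepts a raw const std::uint8_t* just as well as a std::vector iterator, so std::distance and std::advance work on the pointer directly, with no wrapper and no copy. The little-endian encoding is an assumption chosen for illustration, and the helper does no bounds checking.

```cpp
#include <cstdint>
#include <iterator>
#include <type_traits>

// Raw pointers are random access iterators as far as the standard library is concerned.
static_assert(std::is_same<std::iterator_traits<const std::uint8_t*>::iterator_category,
                           std::random_access_iterator_tag>::value,
              "const std::uint8_t* is a random access iterator");

// Hypothetical deserialization helper: reads a little-endian 32-bit value from
// any iterator whose elements convert to std::uint8_t, and advances `it` past it.
// The caller must ensure at least four elements are available.
template <typename InputIt>
std::uint32_t read_u32_le(InputIt& it) {
    std::uint32_t value = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        value |= static_cast<std::uint32_t>(static_cast<std::uint8_t>(*it)) << shift;
        ++it;
    }
    return value;
}
```

With const std::uint8_t* p = raw; a call to read_u32_le(p) leaves p four elements further on, so std::distance(raw, p) is 4; the same function also accepts a char* or a std::vector<std::uint8_t>::const_iterator.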

Never mind. Those pointers work as iterators anyway, because they support the operations iterators require.

Related

Wrapping C API taking raw pointers by C++ function that takes iterators

What is the best way to wrap a C API that takes plain pointers by a function that takes iterators?
I am trying to wrap a legacy API that takes a pointer and a size describing contiguous memory. However, in my wrapper I would like to provide an iterator-based interface.
Here is an example to illustrate the problem:
// C API
bool socket_send(void* buffer, size_t size);
// C++ wrapper
class Socket {
public:
    template <typename Iterator>
    bool send(Iterator begin, Iterator end) {
        ...
    }
};
One solution that comes to my mind is to create a vector and copy the values:
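#include <iterator>
#include <vector>
class Socket {
public:
    template <typename Iterator>
    bool send(Iterator begin, Iterator end) {
        using value_type = typename std::iterator_traits<Iterator>::value_type;
        std::vector<value_type> buffer(begin, end);  // copies even if the range is already contiguous
        return socket_send(buffer.data(), buffer.size() * sizeof(value_type));
    }
};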
The problem I have with this solution is the unnecessary copy of the data, even when it is already in contiguous memory.
Is there a generic and efficient solution around this problem?
There is no general solution to this, because iterators do not know whether the range they represent is contiguous. Only pointers (okay, these are a kind of iterator too) know that. Furthermore, your C API necessarily assumes this.
So, there is already a very small domain of uses for your proposed Iterator wrapper. In every case where it would be useful, your source data (e.g. a vector) will already have a way to trivially obtain a pointer and a size. In every other case, you will have to copy the data into a new vector, which you've correctly said is not optimal.
Therefore, you are better off simply not doing this. The existing interface is already exactly as it should be.
If you already have vector iterators over bytes, you can map to the C interface like socket_send((void*)&*begin, std::distance(begin, end)), provided the range is non-empty (for wider element types, multiply the count by the element size). You could make a forwarding overload for these (specifically for vector iterators), but is there really much point? socket_send(vec.data(), vec.size()) is nice.
And if you do really need to accept arbitrary ranges then, fine, do your copy. But note that the standard typically doesn't provide "automatic" or "implicit" or "easy-to-call" features for techniques that cost a lot (e.g. there is no random access for lists, because that would be expensive), and I suggest you take the same approach.
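The forwarding-overload idea can be sketched as follows. This is a hypothetical design rather than the answer's code: it uses raw pointers as the contiguous case (instead of vector iterators specifically, whose type cannot be deduced as std::vector<T>::iterator in a template), falls back to the copy for every other iterator, and replaces the C socket_send with a stub that just records the bytes. The size is computed in bytes, which matters once the element type is wider than a char.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the question's C API: it records the bytes it is
// given so that the sketch can run without a real socket.
std::string g_last_sent;
bool socket_send(void* buffer, std::size_t size) {
    g_last_sent.assign(static_cast<const char*>(buffer), size);
    return true;
}

class Socket {
public:
    // Contiguous case: a pointer range maps straight onto the C API, no copy.
    // Partial ordering prefers this overload over the generic one for pointers.
    template <typename T>
    bool send(T* begin, T* end) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "only trivially copyable elements can be sent as raw bytes");
        const std::size_t bytes = static_cast<std::size_t>(end - begin) * sizeof(T);
        return socket_send(const_cast<void*>(static_cast<const void*>(begin)), bytes);
    }

    // Generic case: any other iterator range is copied into a temporary
    // contiguous buffer first (the copy the question wants to avoid).
    template <typename Iterator>
    bool send(Iterator begin, Iterator end) {
        using value_type = typename std::iterator_traits<Iterator>::value_type;
        std::vector<value_type> buffer(begin, end);
        return send(buffer.data(), buffer.data() + buffer.size());
    }
};
```

With this, s.send(v.data(), v.data() + v.size()) for a std::vector goes straight to socket_send, while s.send(l.begin(), l.end()) for a std::list pays for the copy, which keeps the expensive path visible at the call site.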
C++20 adds std::contiguous_iterator_tag (and the std::contiguous_iterator concept), which you could use to disable your generic interface for incompatible iterator ranges. Again, though, I'd question the utility of this, since in such cases you already have the information you need in the C API form.

why is begin() needed in std::vector erase?

Why do we have to write v.erase(v.begin(), v.begin()+3) ?
Why isn't it defined as erase(int, int) so you can write v.erase(0,2) and the implementation takes care of the begin()s?
The interface container.erase(iterator, iterator) is more general and works for containers that don't have indexing, like std::list. This is an advantage if you write templates and don't really know exactly which container the code is to work on.
The original design aimed at being as general as possible, and iterators are more general than indexes. The designers could have added extra index-based overloads for vector, but decided not to.
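To illustrate that an index-based form is easy to layer on top of the iterator interface, here is a hypothetical erase_indices helper (not part of the standard library). It works for any sequence container with erase(iterator, iterator), including std::list, where std::next has to walk the nodes. The end position is exclusive, so erase_indices(v, 0, 3) corresponds to v.erase(v.begin(), v.begin() + 3), i.e. three elements.

```cpp
#include <cstddef>
#include <iterator>

// Hypothetical index-based convenience wrapper built on the iterator interface:
// erases the elements at positions [first, last). Requires first <= last <= c.size().
// For std::list, std::next is linear in the distance walked.
template <typename Container>
typename Container::iterator erase_indices(Container& c, std::size_t first, std::size_t last) {
    using diff = typename Container::difference_type;
    auto begin = std::next(c.begin(), static_cast<diff>(first));
    auto end = std::next(begin, static_cast<diff>(last - first));
    return c.erase(begin, end);
}
```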
In STL, iterators are the only entity that provides general access to STL containers.
The array data structure can be accessed via pointers and indexes. Iterators are a generalization of these indexes/pointers.
A linked list can be accessed by moving a pointer along (à la ptr = ptr->next). Iterators generalize this too.
Trees and hash tables need a special iterator class that encapsulates the logic of iterating over these data structures.
As you can see, iterators are the general type that lets you perform common operations (such as iteration, deletion, etc.) on data structures, regardless of their underlying implementation.
This way, you can refactor your code to use std::list and container.erase(it0, it1) still works without modifying the code.
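A hypothetical minimal sketch of the linked-list case: NodeIterator below hides ptr = ptr->next behind operator++, and once it declares the standard member types, std::distance and std::find accept it like any other forward iterator. The Node type is an illustrative assumption, not a standard type.

```cpp
#include <cstddef>
#include <iterator>

// Minimal singly linked list node (hypothetical, for illustration only).
struct Node {
    int value;
    Node* next;
};

// Forward iterator that wraps the `ptr = ptr->next` idiom so that standard
// algorithms such as std::find and std::distance can walk the list.
class NodeIterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = int*;
    using reference = int&;

    explicit NodeIterator(Node* node = nullptr) : node_(node) {}  // nullptr acts as end()

    reference operator*() const { return node_->value; }
    pointer operator->() const { return &node_->value; }
    NodeIterator& operator++() { node_ = node_->next; return *this; }  // ptr = ptr->next
    NodeIterator operator++(int) { NodeIterator old = *this; ++*this; return old; }

    friend bool operator==(const NodeIterator& a, const NodeIterator& b) { return a.node_ == b.node_; }
    friend bool operator!=(const NodeIterator& a, const NodeIterator& b) { return !(a == b); }

private:
    Node* node_;
};
```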

When creating my own data structure, should I use iterators or indices to provide access from the outside?

Suppose I'm writing a project in a modern version of C++ (say 11 or 14) and use STL in that project. At a certain moment, I need to program a specific data structure that can be built using STL containers. The DS is encapsulated in a class (am I right that encapsulating the DS in a class is the only correct way to code it in C++?), thus I need to provide some sort of interface to provide read and/or write access to the data. Which leads us to the question:
Should I use (1a) iterators or (1b) simple "indices" (i.e. numbers of a certain type) for that? The DS that I'm working on right now is pretty much linear, but then when the elements are removed, of course simple integer indices are going to get invalidated. That's about the only argument against this approach that I can imagine.
Which approach is more idiomatic? What are the objective technical arguments for and against each one?
Also, when I choose to use iterators for my custom DS, should I (2a) publicly typedef the iterators of the container that is used internally, or (2b) create my own iterator from scratch? In open-source libraries such as Boost, I've seen custom iterators written from scratch. On the other hand, I feel I'm not yet able to write a proper iterator (i.e. one that is as detailed and complex as the ones in the STL and/or Boost).
Edit, as per πάντα ῥεῖ's request:
I've asked myself this question with a few DS in a few projects while studying at the Uni, but here's the last occurrence that made me come here and ask.
The DS is meant to represent a triangle array, or vertex array, or whatever one might call it. Point is, there are two arrays or lists, one storing the vertex coordinates, and another one storing triplets of indices from the first array, thus representing triangles. (This has been coded a gazillion times already, yet I want to write it on my own, once, for the purpose of learning.) Obviously, the two arrays should stay in sync, hence the encapsulation. The set of operations is meant to include adding (maybe also removing) a vertex, adding and removing a triangle (a vertex triplet) using the vertex data from the same array. How I see it is that the client adds vertices, writes down the indices/iterators, and then issues a call to add a triangle based on those indices/iterators, which in turn returns another index/iterator to the resulting triangle.
I don't see why you couldn't get both, if this makes sense for your container.
std::vector has iterators and the at/operator[] methods to provide access with indexes.
The API of your container depends on the operations you want to make available to your clients.
Is the container iterable, i.e. is it possible to iterate over each element? Then you should provide an iterator.
Does it make sense to randomly access elements in your container, knowing their position (index)? Then you can also provide the at(size_t)/operator[](size_t) methods.
Does it make sense to randomly access elements in your container, knowing a special "key"? Then you should probably provide the at(key_type)/operator[](key_type) methods.
As for your question regarding custom iterators or reuse of existing iterators:
If your container is basically a wrapper that adds some insertion/removal logic to an existing container, I think it is fine to publicly typedef the existing iterator, as a custom iterator may miss some features of the existing iterator, may contain bugs, and will not add any significant feature over the existing iterator.
On the other hand, if you iterate in a non-standard fashion, you will need a custom iterator. For instance, I once implemented a recursive_unordered_map that accepted a parent recursive_unordered_map at construction and would iterate both over its own unordered_map and over its parent's (and its parent's parent's, ...). I had to implement a custom iterator for this.
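For the vertex/triangle array described in the question, a hypothetical sketch combining both suggestions might look like this: vertices and triangles are returned and addressed by index, while iteration reuses the internal std::vector iterators through public typedefs (option 2a). All names here are illustrative assumptions. Indices stay valid as long as elements are only appended; supporting removal would invalidate them, which is exactly the trade-off the question raises.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical types for illustration.
struct Vertex { float x, y, z; };
using Triangle = std::array<std::size_t, 3>;  // three indices into the vertex array

class TriangleMesh {
public:
    // Option 2a: expose the internal containers' iterators via public typedefs.
    typedef std::vector<Vertex>::const_iterator vertex_iterator;
    typedef std::vector<Triangle>::const_iterator triangle_iterator;

    std::size_t add_vertex(const Vertex& v) {
        vertices_.push_back(v);
        return vertices_.size() - 1;  // index of the new vertex
    }

    // Adds a triangle from three vertex indices and returns the triangle's index.
    std::size_t add_triangle(std::size_t a, std::size_t b, std::size_t c) {
        if (a >= vertices_.size() || b >= vertices_.size() || c >= vertices_.size())
            throw std::out_of_range("TriangleMesh::add_triangle: bad vertex index");
        triangles_.push_back(Triangle{{a, b, c}});
        return triangles_.size() - 1;
    }

    // Index-based access, checked like std::vector::at().
    const Vertex& vertex(std::size_t i) const { return vertices_.at(i); }
    const Triangle& triangle(std::size_t i) const { return triangles_.at(i); }

    // Iterator-based access, so <algorithm> functions work on the mesh data.
    vertex_iterator vertices_begin() const { return vertices_.begin(); }
    vertex_iterator vertices_end() const { return vertices_.end(); }
    triangle_iterator triangles_begin() const { return triangles_.begin(); }
    triangle_iterator triangles_end() const { return triangles_.end(); }

private:
    std::vector<Vertex> vertices_;     // vertex coordinates
    std::vector<Triangle> triangles_;  // index triplets into vertices_
};
```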
Which approach is more idiomatic?
Using iterators is definitely the way to go. Functions in <algorithm> don't work with indices. They work with iterators. If you want your container to be enabled for use by the functions in <algorithm>, using iterators is the only way to go.
In general, it is recommended that the class offer its own iterator. Under the hood, it could be an index or an STL iterator (preferred). But as far as external clients and public APIs are concerned, they only deal with the iterator offered by the class.
Example 1
class Dictionary {
private:
    typedef std::unordered_map<std::string, std::string> DictType;
public:
    typedef DictType::iterator DictionaryIterator;
};
Example 2
class Sequence {
private:
    typedef std::vector<std::string> SeqType;
public:
    struct SeqIterator {
        std::size_t index;
        SeqIterator operator++();
        std::string operator*();
    };
};
If the clients are operating solely on SeqIterator, then the above can later be modified to
class Sequence {
private:
    typedef std::deque<std::string> SeqType;
public:
    typedef SeqType::iterator SeqIterator;
};
without the clients getting affected.
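One way to flesh out Example 2, as a sketch under the assumption that clients only use SeqIterator, begin() and end() (push_back is an added convenience for illustration): the iterator wraps an index into the private vector, and declaring the standard member types lets range-for and <algorithm> functions accept it. Client code written only against these names is the kind that can survive a later change of the underlying container.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical fleshed-out version of Example 2: the public iterator is an
// index into the private std::vector, so clients never see the container type.
class Sequence {
private:
    typedef std::vector<std::string> SeqType;
    SeqType items_;

public:
    class SeqIterator {
    public:
        using iterator_category = std::forward_iterator_tag;
        using value_type = std::string;
        using difference_type = std::ptrdiff_t;
        using pointer = const std::string*;
        using reference = const std::string&;

        SeqIterator() : seq_(nullptr), index_(0) {}
        SeqIterator(const SeqType* seq, std::size_t index) : seq_(seq), index_(index) {}

        reference operator*() const { return (*seq_)[index_]; }
        pointer operator->() const { return &(*seq_)[index_]; }
        SeqIterator& operator++() { ++index_; return *this; }
        SeqIterator operator++(int) { SeqIterator old = *this; ++index_; return old; }
        bool operator==(const SeqIterator& o) const { return seq_ == o.seq_ && index_ == o.index_; }
        bool operator!=(const SeqIterator& o) const { return !(*this == o); }

    private:
        const SeqType* seq_;  // container being iterated
        std::size_t index_;   // position within it
    };

    void push_back(const std::string& s) { items_.push_back(s); }
    SeqIterator begin() const { return SeqIterator(&items_, 0); }
    SeqIterator end() const { return SeqIterator(&items_, items_.size()); }
};
```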

Why is string not a (subclass of) vector?

C strings are char arrays.
Vectors are the new arrays in C++.
Why aren't strings vectors (of chars) then?
Most of the methods of vectors and strings seem duplicated too. Is there a reason for making strings a different thing in C++?
It's pretty much just historical. Strings and vectors were developed in parallel with little thought going to how they could be considered one and the same, for T==char.
That's also why standard containers are nice and generic, whereas std::basic_string is a total monolith of member function after member function.
Edge case optimisation opportunities since made it difficult or impossible to transform std::basic_string<T, Alloc> into std::vector<T, Alloc> in any sort of standard way. Take the small string optimisation, for example. Although, now that GCC's copy-on-write mechanism is officially dead, we're a little closer.
The ability to legally read s[s.size()] (and obtain '\0' for your trouble) is still problematic, though. A bunch of fairly stringent iterator invalidation rules for .c_str() basically prevent us from using std::vector<char> for this right from the start.
tl;dr: this is what happens when you create a camel
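A small sketch of the null-terminator guarantee mentioned above (the function names are hypothetical): since C++11, std::string must make s[s.size()] readable as '\0', and c_str() must point at the same buffer as data(), whereas a std::vector<char> only has a terminator if you add one yourself.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Checks guarantees std::string gives (C++11 and later) that std::vector<char> does not.
inline bool string_terminator_guarantees(const std::string& s) {
    return s[s.size()] == '\0'                  // reading one past the last char is allowed
        && s.c_str() == s.data()                // c_str() needs no copy: same buffer as data()
        && std::strlen(s.c_str()) <= s.size();  // strlen stops at the first '\0'
}

// For std::vector<char>, v[v.size()] would be undefined behaviour;
// a terminator exists only if you push one explicitly.
inline std::vector<char> to_c_compatible(const std::string& s) {
    std::vector<char> v(s.begin(), s.end());
    v.push_back('\0');  // now v.data() is a C string
    return v;
}
```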
The various answers in Vector vs string show several differences in interfaces between vector and string, and since the typical pattern in the standard is to use static polymorphism rather than dynamic, they were created as two different classes.
Since strings do have different characteristics from vectors, it doesn't seem that you would want to use public inheritance, but I don't think there's anything in the standard that would prohibit protected or private inheritance, or composition, to provide the underlying space management.
Additionally, I suspect that string may have been developed earlier than, and orthogonally to, vector, which would explain why it has member functions that would more likely have been free algorithms had it been developed in parallel with vector.
Yes, because vector is a container that can hold any T, whereas std::string is a wrapper around character data and can't hold int or other data types. It wouldn't make any sense to have it any other way.

What are the advantages and disadvantages of using std::stack instead of just deque, vector or list

I am writing a very simple std::stack using vector as its underlying container. I realized that I could replace all the push(), pop() and top() functions with push_back(), pop_back() and back() of the vector container.
My questions are: why use a container adaptor when controlled use of the underlying container is enough? Why not just use a deque, vector or list? Is there any waste of memory or processing time?
When your code says std::stack it's clear to the reader what operations they need on the container... it communicates and documents while enforcing that no other operations are used. It may help them quickly form an impression of the algorithmic logic in your code. It's then easy to substitute other implementations that honour the same interface.
It's a bit like using std::ifstream instead of std::fstream: you can read and write with std::fstream, but whoever reads your code will need to consider more possible uses you might put the stream to before realising that it's only being used for reading; you'd be wasting their mental effort.
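As a sketch of the point about communicating intent: the hypothetical drain_and_sum below can only use push/top/pop/empty, so nobody can quietly index into the container, and swapping std::vector for std::deque (the default) or std::list changes only the declaration. std::stack itself is a thin wrapper that forwards to push_back/back/pop_back, so it typically adds no memory or time overhead over calling those directly.

```cpp
#include <deque>
#include <list>
#include <stack>
#include <vector>

// Hypothetical helper written against std::stack's narrow interface
// (push/top/pop/empty), so it works unchanged with any underlying container.
template <typename T, typename Container>
T drain_and_sum(std::stack<T, Container>& s) {
    T total{};
    while (!s.empty()) {
        total += s.top();  // only the top element is reachable
        s.pop();
    }
    return total;
}
```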