C++ for each in on custom collections - c++

Ever since it was introduced, I have been loving the for each in keywords to iterate STL collections. (I'm a very, very big fan of syntactic sugar.)
My question is how can I write a custom collection that can be iterated using these keywords?
Essentially, what API do I need to expose for my collections to be iterable using these keywords?
I apologize if this sounds blunt, but please do not respond with "use boost", "don't write your own collections", or the like. Pursuit of knowledge, my friends. If it's not possible, hey, I can handle that.
I'd also very much prefer not to have to inject an STL iterator into my collections.
Thanks in advance!

Here is a good explanation of iterable data structures (Range-Based loops):
In order to make a data structure iterable, it must work similarly to the existing STL iterators.
There must be begin and end methods that operate on that structure,
either as members or as stand-alone functions, and that return iterators to
the beginning and end of the structure.
The iterator itself must support an operator* method, an operator != method, and an operator++ method, either as members or as stand-alone functions.
Note that C++11 has built-in support for range-based loops that does not depend on the STL, though the above conditions hold for it as well. You can read about it at the same link above.
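To make the requirements above concrete, here is a minimal sketch (C++11). IntBag, Iter and sum are made-up names for illustration only; the point is just that begin(), end(), operator*, operator++ and operator!= are enough for a range-based for loop, with no STL iterator involved.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity collection; every name here is illustrative.
class IntBag {
public:
    // Minimal iterator: only the three operations range-based for needs.
    class Iter {
    public:
        explicit Iter(const int* p) : p_(p) {}
        const int& operator*() const { return *p_; }
        Iter& operator++() { ++p_; return *this; }
        bool operator!=(const Iter& other) const { return p_ != other.p_; }
    private:
        const int* p_;
    };

    IntBag() : size_(0) {}

    void add(int v) {
        assert(size_ < kCapacity);  // no growth in this sketch
        data_[size_++] = v;
    }

    // begin()/end() are what the range-based for loop looks up.
    Iter begin() const { return Iter(data_); }
    Iter end() const { return Iter(data_ + size_); }

private:
    static constexpr std::size_t kCapacity = 8;
    int data_[kCapacity];
    std::size_t size_;
};

// Sums the elements using a range-based for loop over the custom collection.
int sum(const IntBag& bag) {
    int total = 0;
    for (int v : bag) total += v;
    return total;
}
```

Free functions begin(bag) and end(bag) found by argument-dependent lookup would work as well, if you'd rather not add members.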

It's not really clear from your question whether you're talking about std::for_each defined in the <algorithm> header, or the range-based for loop introduced in C++11.
However, the answer is similar for both.
Both operate on iterators, rather than the collection itself.
So you need to:
1. Define an iterator type which satisfies the requirements placed on it by the STL (the C++ standard, really). (The main things are that it must define operator++ and operator*, and a couple of other operations and typedefs.)
2. For std::for_each, there is no 2. You're done. You simply pass two such iterators to std::for_each. For the range-based for loop, you need to expose a pair of these iterators via the begin() and end() functions.
And... that's it.
The only tricky part is really creating an iterator which complies with the requirements. Boost (even though you said you didn't want to use it) has a library which aids in implementing custom iterators (Boost.Iterator). There is also the std::iterator class (deprecated since C++17), which is intended as a base class for custom iterator implementations. But neither of these is necessary. Both are just convenience tools to make it easier to create your own iterator.
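As a rough sketch of what "operations and typedefs" means in practice, here is a hand-written input iterator with no Boost and no std::iterator. CountingIter, IntRange and Adder are hypothetical names invented for this example. The same iterator type is passed directly to std::for_each and exposed through begin()/end() for the range-based for loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical iterator that yields consecutive integers.
class CountingIter {
public:
    // Nested typedefs picked up by std::iterator_traits.
    typedef std::input_iterator_tag iterator_category;
    typedef int value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const int* pointer;
    typedef const int& reference;

    explicit CountingIter(int v) : value_(v) {}

    reference operator*() const { return value_; }
    pointer operator->() const { return &value_; }
    CountingIter& operator++() { ++value_; return *this; }
    CountingIter operator++(int) { CountingIter old(*this); ++value_; return old; }
    bool operator==(const CountingIter& o) const { return value_ == o.value_; }
    bool operator!=(const CountingIter& o) const { return value_ != o.value_; }

private:
    int value_;
};

// Hypothetical "collection": the half-open integer range [first, last).
struct IntRange {
    int first;
    int last;
    CountingIter begin() const { return CountingIter(first); }
    CountingIter end() const { return CountingIter(last); }
};

// Function object that accumulates what std::for_each feeds it.
struct Adder {
    int total;
    Adder() : total(0) {}
    void operator()(int v) { total += v; }
};

int sum_with_for_each(int first, int last) {
    // std::for_each only needs the two iterators and returns the function object.
    return std::for_each(CountingIter(first), CountingIter(last), Adder()).total;
}

int sum_with_range_for(int first, int last) {
    IntRange r = {first, last};
    int total = 0;
    for (int v : r) total += v;  // uses r.begin() and r.end()
    return total;
}
```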

Related

STL container requirements

Does the standard require that some_container<T>::value_type be T?
I am asking because I am considering different approaches to implementing an STL-compliant 2d dynamic array. One of them is to have 2Darray<T>::value_type be 2Darray_row<T> or something like that, where the array would be iterated as a collection of rows (a little simplified. My actual implementation allows iteration in 3 directions)
The container requirements are a bit funky in the sense that they are actually not used by any generic algorithm. In that sense, it doesn't really matter much.
That said, the requirements are on the interface for containers not on how the container is actually instantiated. Even non-template classes can conform to the various requirements and, in fact, do. The requirement is that value_type is present; what it is defined to depends entirely on the container implementation.
Table 96 in §23.2.1 of the standard (C++11) requires a container class X containing objects of type T to return T for X::value_type.
So, if your some_container stores objects of type T, then value_type has to be T.
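A small compile-time check can illustrate this. The static_asserts on std::vector and std::map reflect the standard containers; Array2D is a hypothetical flat 2D array made up for this sketch, which stores T directly and therefore reports T as its value_type.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Standard containers: value_type is the type of the stored objects.
static_assert(std::is_same<std::vector<int>::value_type, int>::value,
              "vector<int> stores int");
static_assert(std::is_same<std::map<std::string, int>::value_type,
                           std::pair<const std::string, int> >::value,
              "map stores pair<const Key, T>");

// Hypothetical 2D array that stores T in one flat buffer.
template <typename T>
class Array2D {
public:
    typedef T value_type;  // matches what is actually stored

    Array2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}

    T& at(std::size_t row, std::size_t col) {
        assert(row < rows_ && col < cols_);
        return data_[row * cols_ + col];
    }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<T> data_;
};

static_assert(std::is_same<Array2D<double>::value_type, double>::value,
              "Array2D<double> stores double");
```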
Either have a nested container (so colArray<rowArray<T> >) or have a single wrapping (2dArray<T>), but don't try to mix them. The nested approach allows you to use STL all the way down (vector<vector<T> >), but can be confusing and doesn't allow you column iterators etc, which you seem to want.
This SO answer addresses using ublas, and another suggests using Boost multi-arrays.
Generally, go for the STL or Boost option if you can. You are unlikely to write something as good by yourself.

Is there a container facade in Boost?

I'm learning how to use iterator_facade to hide some boilerplate of iterator implementation. In my current use case I'm wrapping another container (from .NET code, actually) so I need the begin(), end(), typedefs, etc. At a minimum I want the resulting type to work with BOOST_FOREACH. Is there a convenient thing in boost to simplify that?
I would wrap a pair of iterators from the given container in a boost::iterator_range from the Boost.Range library.
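To show the idea without pulling in Boost, here is a hand-rolled stand-in. It is not boost::iterator_range and does not reproduce its interface; IterPairRange, make_range and sum_middle are hypothetical names. It only illustrates that a pair of iterators plus begin()/end() and the usual typedefs is all such a wrapper needs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical wrapper: holds an iterator pair and exposes it as a range.
template <typename Iterator>
class IterPairRange {
public:
    typedef Iterator iterator;
    typedef Iterator const_iterator;

    IterPairRange(Iterator first, Iterator last) : first_(first), last_(last) {}

    Iterator begin() const { return first_; }
    Iterator end() const { return last_; }

private:
    Iterator first_;
    Iterator last_;
};

// Helper so the iterator type is deduced at the call site.
template <typename Iterator>
IterPairRange<Iterator> make_range(Iterator first, Iterator last) {
    return IterPairRange<Iterator>(first, last);
}

// Sums everything except the first and last element of v.
int sum_middle(const std::vector<int>& v) {
    assert(v.size() >= 2);
    int total = 0;
    for (int x : make_range(v.begin() + 1, v.end() - 1)) total += x;
    return total;
}
```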

Why aren't there convenience functions for set_union, etc, which take container types instead of iterators?

std::set_union and its kin take two pairs of iterators for the sets to be operated on. That's great in that it's the most flexible thing to do. However, they could very easily have added convenience functions that would be more elegant for 80% of typical uses.
For instance:
template<typename ContainerType, typename OutputIterator>
OutputIterator set_union( const ContainerType & container1,
                          const ContainerType & container2,
                          OutputIterator result )  // by value, so a temporary such as students.begin() can be passed
{
    return std::set_union( container1.begin(), container1.end(),
                           container2.begin(), container2.end(),
                           result );
}
would turn:
std::set_union( mathStudents.begin(), mathStudents.end(),
physicsStudents.begin(), physicsStudents.end(),
students.begin() );
into:
std::set_union( mathStudents, physicsStudents, students.begin() );
So:
Are there convenience functions like this hiding somewhere that I just haven't found?
If not, can anyone think of a reason why it would be left out of STL?
Is there perhaps a more full featured set library in boost? (I can't find one)
I can of course always put my implementations in a utility library somewhere, but it's hard to keep such things organized so that they're used across all projects, but not conglomerated improperly.
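For what it's worth, a utility-library version might look like the sketch below. The namespace util and the function all_students are assumptions made for this example. The wrapper lives in its own namespace because user code may not add overloads to namespace std, and the output goes through std::inserter so the destination does not need to be pre-sized.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>

namespace util {  // hypothetical utility namespace

// Container-based convenience wrapper around std::set_union.
// Both inputs must already be sorted, as std::set_union requires.
template <typename Container1, typename Container2, typename OutputIterator>
OutputIterator set_union(const Container1& a, const Container2& b,
                         OutputIterator out) {
    return std::set_union(a.begin(), a.end(), b.begin(), b.end(), out);
}

}  // namespace util

// Example use: the union of two sets of student names.
std::set<std::string> all_students(const std::set<std::string>& math,
                                   const std::set<std::string>& physics) {
    std::set<std::string> result;
    util::set_union(math, physics, std::inserter(result, result.end()));
    return result;
}
```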
Are there convenience functions like this hiding somewhere that I just haven't found?
Not in the standard library.
If not, can anyone think of a reason why it would be left out of STL?
The general idea with algorithms is that they work with iterators, not containers. Containers can be modified, altered, and poked at; iterators cannot. Therefore, you know that, after executing an algorithm, it has not altered the container itself, only potentially the container's contents.
Is there perhaps a more full featured set library in boost?
Boost.Range does this. Granted, Boost.Range does more than this. Its algorithms don't take "containers"; they take iterator ranges, which STL containers happen to satisfy the conditions for. They also have lazy evaluation, which can be nice for performance.
One reason for working with iterators is of course that it is more general and works on ranges that are not containers, or just a part of a container.
Another reason is that the signatures would be mixed up. Many algorithms, like std::sort have more than one signature already:
sort(Begin, End);
sort(Begin, End, Compare);
Where the second one is for using a custom Compare when sorting on other than standard less-than.
If we now add a set of sort for containers, we get these new functions
sort(Container);
sort(Container, Compare);
Now we have the two signatures sort(Begin, End) and sort(Container, Compare) which both take two template parameters, and the compiler will have problems resolving the call.
If we change the name of one of the functions to resolve this (sort_range, sort_container?) it will not be as convenient anymore.
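A sketch of the renaming approach, using the hypothetical name sort_container mentioned above. Because the name differs from std::sort, the (Begin, End) and (Container, Compare) forms never compete in overload resolution, at the cost of having a second name to remember.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical container-based wrappers with a distinct name.
template <typename Container>
void sort_container(Container& c) {
    std::sort(c.begin(), c.end());
}

template <typename Container, typename Compare>
void sort_container(Container& c, Compare comp) {
    std::sort(c.begin(), c.end(), comp);
}

// Example use: returns a copy of v sorted from largest to smallest.
std::vector<int> sorted_descending(std::vector<int> v) {
    sort_container(v, std::greater<int>());
    return v;
}
```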
I agree, STL should take containers instead of iterator pairs, for the following reasons:
Simpler code
Algorithms could be overloaded for specific containers, i.e., could use the map::find algorithm instead of std::find -> More general code
A subrange could easily be wrapped into a container, as is done in boost::range
@Bo Persson has pointed to a problem with ambiguity, and I think that's quite valid.
I think there's a historical reason that probably prevented that from ever really even being considered though.
The STL was introduced into C++ relatively late in the standardization process. Shortly after it was accepted, the committee voted against even considering any more new features for addition into C++98 (maybe even at the same meeting). By the time most people had wrapped their head around the existing STL to the point of recognizing how much convenience you could get from something like ranges instead of individual iterators, it was too late to even be considered.
Even if the committee was still considering new features, and somebody had written a proposal to allow passing containers instead of discrete iterators, and had dealt acceptably with the potential for ambiguity, I suspect the proposal would have been rejected. Many (especially the C-oriented people) saw the STL as a huge addition to the standard library anyway. I'm reasonably certain quite a few people would have considered it completely unacceptable to add (lots) more functions/overloads/specializations just to allow passing one parameter in place of two.
Using the begin & end elements for iteration allows one to use non-container types as inputs. For example:
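ContainerType students[10];
vector<ContainerType> physicsStudents;
vector<ContainerType> allStudents;  // separate output: the result must not overlap either input
std::set_union(physicsStudents.begin(), physicsStudents.end(),
               students, students + 10,
               std::back_inserter(allStudents));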
Since they are so simple to implement, I think it makes sense not to add them to the standard library and to let authors write their own. They are templates, so adding convenience functions across std could increase the size of the compiled code and lead to code bloat.

Operator overload for [] operator

Why would you need to overload the [] operator? I have never come across a practical scenario where this was necessary. Can somebody tell me a practical use case for this.
Err.. std::vector<t>, std::basic_string<t>, std::map<k, v>, and std::deque<t> ?
I used this for a class representing a registry key, where operator[] returned an object representing a registry value with the string between []s.
See also, the Spirit Parser Framework, which uses [] for semantic actions.
Any indexable container can usefully define operator[] to become usable in any template that uses []-syntax indexing.
You don't need that syntax sugar if you're not doing generic programming -- it may look nice, but, cosmetics apart, you could always define specific named methods such as getAt, setAt, and the like, with similar and simpler-to-code functionality.
However, generic programming is at the core of modern C++... and it bears an eerie resemblance to "compile-time, type-safe duck typing" (I'm biased towards such peculiar terminology, of course, having had a part in shaping it -- cfr wikipedia;-).
Just as you should try to use, e.g., prefix-* to mean "dereferencing" for all kinds of iterators and other pointer-like types (so they can be duck-typingly substituted for pointers in a template!), so similarly you should strive to define operator[] in container types where it makes sense, just so they can be duck-typingly substituted for arrays in appropriate templates.
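A small sketch of that substitution. sum_first_n and Squares are hypothetical names for this example. The template only assumes that seq[i] is valid, so it accepts a built-in array, a std::vector, or a user-defined type that computes its elements on demand.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Works with anything indexable via []: built-in arrays, std::vector,
// or any user-defined type that provides operator[].
template <typename Indexable>
int sum_first_n(const Indexable& seq, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += seq[i];
    return total;
}

// Hypothetical "container" with no storage: element i is i squared.
class Squares {
public:
    int operator[](std::size_t i) const { return static_cast<int>(i * i); }
};
```

With named methods such as getAt instead of operator[], Squares could not be passed to the same template as an array.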
It is useful if you implement almost any type of container that provides random access (or at least some form of keyed access) to its elements (e.g., consider std::vector).
If you write a class that inherits from another class that implements operator[], such as std::vector or std::string, you might want to provide your own operator[]. If you don't, your class will implicitly inherit the parent's implementation of [], and it may not work as the user expects.
Well, several STL containers give some examples - vector<> overloads it to make it act like an array. map<> for example provides the operator[] overload to provide an 'associative array'.
While it is not strictly necessary, it is incredibly useful in making user-defined containers or strings behave like builtin arrays or C strings. This cuts down on verbosity a lot (for example, in Java, you would have to use x.getElementAt(i) while in C++ you can use x[i]; similarly, in Java you need x.compareTo(y)<0, while in C++ you can achieve the same thing using x < y). It is syntactic sugar... but it is very, very tasty.

Make my C++ Class iterable via BOOST_FOREACH

I have a class which I want to expose a list of structs (which just contain some integers).
I don't want outside code to modify this data, just iterate over it and read it.
Example:
struct TestData
{
    int x;
    int y;
    // other data as well
};
class IterableTest
{
public:
    // expose TestData here
};
Now in my code I want to use my class like this:
IterableTest test;
BOOST_FOREACH(const TestData& data, test.data())
{
// do something with data
}
I've already read this article http://accu.org/index.php/journals/1527 about memberspaces.
However, I don't want to (or can't) save all TestData in an internal vector or something.
This is because the class itself doesn't own the storage, i.e. there is actually no underlying container which can be accessed directly by the class. The class itself can query an external component to get the next, previous or ith element, though.
So basically I want my class to behave as if it had a collection, but in fact it doesn't have one.
Any ideas?
It sounds like you have to write your own iterators.
The Boost.Iterator library has a number of helpful templates. I've used their Iterator Facade base class a couple of times, and it's nice and easy to define your own iterators using it.
But even without it, iterators aren't rocket science. They just have to expose the right operators and typedefs. In your case, they're just going to be wrappers around the query function they have to call when they're incremented.
Once you have defined an iterator class, you just have to add begin() and end() member functions to your class.
It sounds like the basic idea is going to have to be to call your query function when the iterator is incremented, to get the next value.
And dereference should then return the value retrieved from the last query call.
It may help to take a look at the standard library stream_iterators for some of the semantics, since they also have to work around some fishy "we don't really have a container, and we can't create iterators pointing anywhere other than at the current stream position" issues.
For example, assuming you need to call a query() function which returns NULL when you've reached the end of the sequence, creating an "end-iterator" is going to be tricky. But really, all you need is to define equality so that "iterators are equal if they both store NULL as their cached value". So initialize the "end" iterator with NULL.
It may help to look up the required semantics for input iterators, or if you're reading the documentation for Boost.Iterator, for single-pass iterators specifically. You probably won't be able to create multipass iterators. So look up exactly what behavior is required for a single-pass iterator, and stick to that.
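Putting those pieces together, a sketch could look like this. ExternalSource and its query() function are assumptions standing in for your external component (here it just walks an array), and QueryIterator and sum_x are made-up names. The iterator caches the last query result, advances by querying again, and the end iterator simply holds NULL. It is single-pass, so each begin() call continues from wherever the source currently is.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

struct TestData { int x; int y; };

// Hypothetical external component: query() returns the next element,
// or NULL once the sequence is exhausted.
class ExternalSource {
public:
    ExternalSource(const TestData* items, std::size_t count)
        : items_(items), count_(count), next_(0) {}
    const TestData* query() { return next_ < count_ ? &items_[next_++] : NULL; }
private:
    const TestData* items_;
    std::size_t count_;
    std::size_t next_;
};

// Single-pass (input) iterator wrapping the query function.
class QueryIterator {
public:
    typedef std::input_iterator_tag iterator_category;
    typedef TestData value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const TestData* pointer;
    typedef const TestData& reference;

    QueryIterator() : source_(NULL), current_(NULL) {}  // end iterator
    explicit QueryIterator(ExternalSource& s) : source_(&s), current_(s.query()) {}

    reference operator*() const { return *current_; }   // last queried value
    pointer operator->() const { return current_; }
    QueryIterator& operator++() { current_ = source_->query(); return *this; }
    QueryIterator operator++(int) { QueryIterator old(*this); ++*this; return old; }

    // Compares cached pointers, so any exhausted iterator equals the end iterator.
    bool operator==(const QueryIterator& o) const { return current_ == o.current_; }
    bool operator!=(const QueryIterator& o) const { return !(*this == o); }

private:
    ExternalSource* source_;
    const TestData* current_;
};

// Behaves as if it had a collection, but only forwards to the source.
class IterableTest {
public:
    typedef QueryIterator iterator;
    typedef QueryIterator const_iterator;
    explicit IterableTest(ExternalSource& s) : source_(&s) {}
    QueryIterator begin() const { return QueryIterator(*source_); }
    QueryIterator end() const { return QueryIterator(); }
private:
    ExternalSource* source_;
};

// Read-only iteration: elements are only reachable as const TestData&.
int sum_x(const IterableTest& test) {
    int total = 0;
    for (const TestData& data : test) total += data.x;
    return total;
}
```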
If your collection type presents a standard container interface, you don't need to do anything to make BOOST_FOREACH work with your type. In other words, if your type has iterator and const_iterator nested typedefs, and begin() and end() member functions, BOOST_FOREACH already knows how to iterate over your type. No further action is required.
http://boost-sandbox.sourceforge.net/libs/foreach/doc/html/boost_foreach/extending_boost_foreach.html
From the BOOST_FOREACH documentation page:
BOOST_FOREACH iterates over sequences. But what qualifies as a sequence, exactly? Since BOOST_FOREACH is built on top of Boost.Range, it automatically supports those types which Boost.Range recognizes as sequences. Specifically, BOOST_FOREACH works with types that satisfy the Single Pass Range Concept. For example, we can use BOOST_FOREACH with: