Overloading std::begin() and std::end() for non-arrays

Let's assume I have the following Data class:
struct Data {
char foo[8];
char bar;
};
and the following function, my_algorithm, which takes a pair of char * (similar to an STL algorithm):
void my_algorithm(char *first, char *last);
For Data's foo data member, instead of calling my_algorithm() like this:
Data data;
my_algorithm(data.foo, data.foo + 8);
I can use the std::begin() and std::end() convenience function templates:
my_algorithm(std::begin(data.foo), std::end(data.foo));
I would like to achieve something similar to Data's bar data member. That is, instead of writing:
my_algorithm(&data.bar, &data.bar + 1);
I would like to write something like:
my_algorithm(begin(data.bar), end(data.bar));
Therefore, I've defined the two following ordinary (non-template) functions for this case:
char* begin(char& c) { return &c; }
char* end(char& c) { return &c + 1; }
So that I would be able to write code like the following:
Data data;
using std::begin;
using std::end;
my_algorithm(begin(data.foo), end(data.foo)); // ok - std::begin()/std::end()
my_algorithm(begin(data.bar), end(data.bar)); // Error!!!
With the using declarations above I would have expected std::begin()/std::end() and ::begin()/::end() to be in the same overload set, respectively. Since the functions ::begin() and ::end() are a perfect match for the latter call and they are not templates, I would have expected the last call to my_algorithm() to match them. However, the ordinary functions are not considered at all. As a result the compilation fails, because std::begin() and std::end() are not matches for the call.
Basically, the latter call acts as if I had written instead:
my_algorithm(begin<>(data.bar), end<>(data.bar));
That is, only the function templates (i.e., std::begin()/std::end()) are considered by the overload resolution process, not the ordinary functions (i.e., not ::begin()/::end()).
It only works as expected, if I fully qualify the calls to ::begin()/::end():
my_algorithm(::begin(data.bar), ::end(data.bar));
What am I missing here?

Let's get a complete, reproducible example:
#include <iterator>
char* begin(char& c) { return &c; }
char* end(char& c) { return &c + 1; }
namespace ns {
void my_algorithm(char *first, char *last);
void my_function() {
using std::begin;
using std::end;
char c = '0';
my_algorithm(begin(c), end(c));
}
}
When you make the unqualified call to begin(c) and end(c), the compiler goes through the process of unqualified name lookup (described on the Argument-dependent lookup page of cppreference).
For regular unqualified name lookup, the process is roughly to start at the innermost scope enclosing the call (here the body of my_function, inside ::ns) and to move out to the next enclosing scope only if the name is not found there.
If a function call is unqualified, as it is here with begin(c) and end(c), argument dependent lookup can occur, which finds free functions declared in the same namespace as the types of the functions' arguments, through the process of extending the overload set by finding "associated namespaces."
In this case, however, char is a fundamental type, so argument dependent lookup doesn't allow us to find the global ::begin and ::end functions.
For arguments of fundamental type, the associated set of namespaces and classes is empty
cppreference: argument dependent lookup
Instead, because of using std::begin; using std::end;, the compiler already finds candidates for begin(...) and end(...) in the function body, namely those declared in namespace ::std, without having to move outward from ::ns to ::. Lookup stops there, none of the std overloads accepts a char, and compilation fails.
It's worth noting that the using std::begin; using std::end; also block the compiler from finding the custom ::begin and ::end even if you were to place them inside ::ns.
What you can do instead is write your own begin and end:
#include <iterator>
namespace ns {
char* begin(char& c) { return &c; }
char* end(char& c) { return &c + 1; }
template <typename T>
auto begin(T&& t) {
using std::begin;
// Not unbounded recursion if there's no `std::begin(t)`
// or ADL `begin(t)`, for the same reason that our
// char* begin(char& c); overload isn't found with
// using std::begin; begin(c);
return begin(t);
}
template <typename T>
auto end(T&& t) {
using std::end;
return end(t);
}
void my_algorithm(char *first, char *last);
void my_function() {
char c = '0';
my_algorithm(ns::begin(c), ns::end(c));
}
}

The title of the question is "Overloading std::begin()". Overloading is possible only within the same scope; names declared in different scopes do not overload one another. From another scope, all we can do is help name lookup find the name. Essentially, the using std::begin declaration hides ::begin in the question's code. See S. Lippman for reference:
functions that are members of two distinct namespaces do not overload one another.
Scope of a using Declaration. Names introduced in a using declaration obey normal scope rules.
Entities with the same name defined in an outer scope are hidden.
Since the parameter is a char, and char is a fundamental type, argument-dependent lookup does not help here: as mentioned in the comments, fundamental types have no associated namespaces.
Again, the question was "What am I missing?", so this answer focuses only on the reasons; recommendations would be too broad.

Related

ADL and typedefs

In short, I am trying to understand the behavior of Argument-Dependent Lookup in C++. Some statements in ISO/IEC 14882:2017 (E) regarding ADL are not clear to me. I hope somebody can clarify them for me.
According to the standard,
Typedef names and using-declarations used to specify the types do not contribute to this set.
and
When considering an associated namespace, the lookup is the same as the lookup performed when the
associated namespace is used as a qualifier (6.4.3.2) except that:
Any using-directive s in the associated namespace are ignored...
For me, these statements imply that ADL should completely ignore any typedef or using occurrence. Probably, this is not the case. Consider the following example:
#include <iostream>
using namespace std;
namespace N2
{
struct B {};
template <typename T>
void func (const T&) {cout << __PRETTY_FUNCTION__ << endl;}
};
namespace N
{
typedef N2::B C;
}
void tfunc (N::C) {}
int main ()
{
func(tfunc);
}
It works, i.e., the compiler is able to find func. So, what do those quotes from the standard actually mean?

This answer is provided by @IgorTandetnik.
What the standard is saying is that N2 is among the associated namespaces of tfunc, but N is not. In other words, void tfunc (N::C) works exactly the same as void tfunc(N2::B). If, in your example, you move func to N, it won't be found, despite the fact that, on the surface, the declaration of tfunc mentions N.

Actually, if you just read a bit further down the standard, you can see why your code works. Specifically, N3337 [basic.lookup.argdep]/2 as shown below [emphasis mine]:
The sets of namespaces and classes are determined in the following way:
...
...
— If T is a function type, its associated namespaces and classes are those associated with the function
parameter types and those associated with the return type.
As you can see, if the argument passed is a function type (tfunc in your case), types associated with all the parameters (N2::B is the only type), as well as the return type (void - fundamental type, so not counted) are considered to create a set of associated namespaces.

Is it legit to define std::begin for const char*?

I have a function for case-insensitive comparison of strings which uses std::lexicographical_compare with a custom comparator.
However, I would like to be able to compare strings, string_views and const char* with each other, for maximum convenience and efficiency.
So I was thinking: what if I make a template? std::string has begin/end, std::string_view has begin/end, ... but const char* doesn't, not even in the form of a non-member function.
So is it OK to define my own begin/end overloads like this
namespace std {
const char * begin(const char* str) { return str; }
const char * end(const char* str) { return str + strlen(str); }
}
so that then i can compare everything with everything by
std::lexicographical_compare(std::begin(a), std::end(a), std::begin(b), std::end(b), icomp );
?
If not, how else could i solve my problem?

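No, this is not legal. These are new function overloads added to namespace std rather than template specializations, and const char * is not a user-defined type in any case.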
The behavior of a C++ program is undefined if it adds declarations or
definitions to namespace std or to a namespace within namespace std
unless otherwise specified. A program may add a template
specialization for any standard library template to namespace std only
if the declaration depends on a user-defined type and the
specialization meets the standard library requirements for the
original template and is not explicitly prohibited
[namespace.std/1]
You can instead declare those in some other namespace, such as ::
const char * begin(const char* str) { return str; }
const char * end(const char* str) { return str + strlen(str); }
And use them with unqualified calls
std::lexicographical_compare(begin(a), end(a), begin(b), end(b), icomp );
Additionally, in C++20, it will be even more restrictive, permitting only class template specialisations for program-defined types
Unless otherwise specified, the behavior of a C++ program is undefined
if it adds declarations or definitions to namespace std or to a
namespace within namespace std.
Unless explicitly prohibited, a program may add a template
specialization for any standard library class template to namespace
std provided that (a) the added declaration depends on at least one
program-defined type and (b) the specialization meets the standard
library requirements for the original template.
[namespace.std]

extending c++ math functions to non fundamental types

I want to extend (or overload? I am not sure what to call it) the sqrt() function used by double and other fundamental types so that it can be used by my own class, here called "myType". I will code the function sqrt() for when its argument is a myType. I want sqrt() to stay intact when it is used for the fundamental types, so that I can write one template that covers both cases.
For example, the key thing below is to be able to use bar() for both fundamental types and myType, rather than foo() for myType and bar() for the fundamental types. Can this be done cleanly? Thanks in advance for any help.
#include <math.h>
using namespace std;
class myType
{
public:
myType() {
}
double sqrt()
{
return 4;//just to return something
}
};
template<typename T> bool bar(T in)
{
if (sqrt(in) == 4) {// handles int and all sorts of other types but not my type
return true;
}
return false;
}
template<typename T> bool foo(T in)
{
if (in.sqrt() == 4) { //handles myType
return true;
}
return false;
}
int main() {
double y = 3;
bar(y);//This is great
myType x;
//bar(x);//this is the line I want to write
foo(x);//stuck doing this
}

Firstly, there is argument-dependent lookup (ADL, a.k.a. Koenig lookup), which searches for a function in the namespaces associated with its arguments. So, if you define your class in namespace my, you can define a function sqrt() in that namespace and it will automatically be found when called unqualified.
Secondly, in order to make your template functions generic, you need to enable ADL by calling sqrt unqualified. However, ADL won't look inside namespace std when the argument has a fundamental type. Therefore, give the compiler a hint that std::sqrt may be used in the function, with a using-declaration.
Example (sketch):
#include <cmath>
#include <iostream>
namespace my {
    struct number;
    number sqrt(number const&);
}
template<typename scalar>
scalar
sym_sqrt(scalar const& s) {
    // candidate for fundamental types; my::sqrt is found by ADL
    using std::sqrt;
    if (s < 0) {
        return -sqrt(-s);
    } else {
        return sqrt(s);
    }
}
int main() {
    float m = ...;
    std::cout << sym_sqrt(m);
    my::number n = ...;
    std::cout << sym_sqrt(n);
}
Notes
I did not write using namespace std;. It should not be necessary, and in particular your class should not require its users to write it. Get rid of this habit now!
Concerning the wording, you can't extend a function, but you add overloads to the existing overloads.
If you use an ancient compiler, you might have issues with sqrt being a macro. Just upgrade the compiler then.
Note that sqrt() in your attempt is defined as a member function, which is a whole different beast. Keep it as free function similar to the existing versions. In general, don't blindly put things into classes. There are good reasons to use plain functions. In particular the math functions that represent functions in the mathematical sense (no side-effects, no external dependencies) are actually well placed into functions.
std::sqrt() is already overloaded for non-fundamental types, namely for the different std::complex<T> types.
if (X) {return true;} else {return false;} can be written much more compactly as return X;.

I don't think you can do anything to make std::sqrt work with your user-defined types (unless they can be converted to a basic data type using a conversion operator). But you could write your own function that redirects to std::sqrt by default, and specialize it for your user-defined types. Then use that function in your code instead of std::sqrt.

Why is ADL not working with Boost.Range?

Considering:
#include <cassert>
#include <boost/range/irange.hpp>
#include <boost/range/algorithm.hpp>
int main() {
auto range = boost::irange(1, 4);
assert(boost::find(range, 4) == end(range));
}
this gives:
main.cpp:8:37: error: use of undeclared identifier 'end'
Considering that if you write using boost::end; it works just fine, which implies that boost::end is visible:
Why is ADL not working and finding boost::end in the expression end(range)? And if it's intentional, what's the rationale behind it?
To be clear, the expected result would be similar to what happens in this example using std::find_if and unqualified end(vec).

Historical background
The underlying reason is discussed in this closed Boost ticket
With the following code, compiler will complain that no begin/end is
found for "range_2" which is integer range. I guess that integer range
is missing ADL compatibility ?
#include <vector>
#include <boost/range/iterator_range.hpp>
#include <boost/range/irange.hpp>
int main() {
std::vector<int> v;
auto range_1 = boost::make_iterator_range(v);
auto range_2 = boost::irange(0, 1);
begin(range_1); // found by ADL
end(range_1); // found by ADL
begin(range_2); // not found by ADL
end(range_2); // not found by ADL
return 0;
}
boost::begin() and boost::end() are not meant to be found by ADL. In
fact, Boost.Range specifically takes precautions to prevent
boost::begin() and boost::end() from being found by ADL, by declaring
them in the namespace boost::range_adl_barrier and then exporting them
into the namespace boost from there. (This technique is called an "ADL
barrier").
In the case of your range_1, the reason unqualified begin() and end()
calls work is because ADL looks not only at the namespace a template
was declared in, but the namespaces the template arguments were
declared in as well. In this case, the type of range_1 is
boost::iterator_range<std::vector<int>::iterator>. The template
argument is in namespace std (on most implementations), so ADL finds
std::begin() and std::end() (which, unlike boost::begin() and
boost::end(), do not use an ADL barrier to prevent being found by
ADL).
To get your code to compile, simply add "using boost::begin;" and
"using boost::end;", or explicitly qualify your begin()/end() calls
with "boost::".
Extended code example illustrating the dangers of ADL
The danger of ADL from unqualified calls to begin and end is two-fold:
the set of associated namespaces can be much larger than one expects. E.g. in begin(x), if x has (possibly defaulted!) template parameters, or hidden base classes in its implementation, the associated namespaces of the template parameters and of its base classes are also considered by ADL. Each of those associated namespace can lead to many overloads of begin and end being pulled in during argument dependent lookup.
unconstrained templates cannot be distinguished during overload resolution. E.g. in namespace std, the begin and end function templates are not separately overloaded for each container, or otherwise constrained on the signature of the container being supplied. When another namespace (such as boost) also supplies similarly unconstrained function templates, overload resolution will consider both an equal match, and an error occurs.
The following code samples illustrate the above points.
A small container library
The first ingredient is to have a container class template, nicely wrapped in its own namespace, with an iterator that derives from std::iterator, and with generic and unconstrained function templates begin and end.
#include <iostream>
#include <iterator>
namespace C {
template<class T, int N>
struct Container
{
T data[N];
using value_type = T;
struct Iterator : public std::iterator<std::forward_iterator_tag, T>
{
T* value;
Iterator(T* v) : value{v} {}
operator T*() { return value; }
auto& operator++() { ++value; return *this; }
};
auto begin() { return Iterator{data}; }
auto end() { return Iterator{data+N}; }
};
template<class Cont>
auto begin(Cont& c) -> decltype(c.begin()) { return c.begin(); }
template<class Cont>
auto end(Cont& c) -> decltype(c.end()) { return c.end(); }
} // C
A small range library
The second ingredient is to have a range library, also wrapped in its own namespace, with another set of unconstrained function templates begin and end.
namespace R {
template<class It>
struct IteratorRange
{
It first, second;
auto begin() { return first; }
auto end() { return second; }
};
template<class It>
auto make_range(It first, It last)
-> IteratorRange<It>
{
return { first, last };
}
template<class Rng>
auto begin(Rng& rng) -> decltype(rng.begin()) { return rng.begin(); }
template<class Rng>
auto end(Rng& rng) -> decltype(rng.end()) { return rng.end(); }
} // R
Overload resolution ambiguity through ADL
Trouble begins when one tries to make an iterator range into a container, while iterating with unqualified begin and end:
int main()
{
C::Container<int, 4> arr = {{ 1, 2, 3, 4 }};
auto rng = R::make_range(arr.begin(), arr.end());
for (auto it = begin(rng), e = end(rng); it != e; ++it)
std::cout << *it;
}
Argument-dependent name lookup on rng will find 3 overloads for both begin and end: from namespace R (because rng lives there), from namespace C (because the rng template parameter Container<int, 4>::Iterator lives there), and from namespace std (because the iterator is derived from std::iterator). Overload resolution will then consider all 3 overloads an equal match and this results in a hard error.
Boost solves this by putting boost::begin and boost::end in an inner namespace and pulling them into the enclosing boost namespace by using directives. An alternative, and IMO more direct way, would be to ADL-protect the types (not the functions), so in this case, the Container and IteratorRange class templates.
Protecting your own code may not be enough
Funnily enough, ADL-protecting Container and IteratorRange would, in this particular case, be enough to let the above code run without error: std::begin and std::end would be called, because std::iterator is not ADL-protected. This is very surprising and fragile. E.g. if the implementation of C::Container::Iterator no longer derives from std::iterator, the code would stop compiling. It is therefore preferable to use qualified calls R::begin and R::end on any range from namespace R in order to be protected from such underhanded name-hijacking.
Note also that the range-for used to have the above semantics (doing ADL with at least std as an associated namespace). This was discussed in N3257 which led to semantic changes in range-for. The current range-for first looks for member functions begin and end, so that std::begin and std::end will not be considered, regardless of ADL-barriers and inheritance from std::iterator.
int main()
{
C::Container<int, 4> arr = {{ 1, 2, 3, 4 }};
auto rng = R::make_range(arr.begin(), arr.end());
for (auto e : rng)
std::cout << e;
}

In boost/range/end.hpp they explicitly block ADL by putting end in a range_adl_barrier namespace, then using namespace range_adl_barrier; to bring it into the boost namespace.
As end is not actually from ::boost, but rather from ::boost::range_adl_barrier, it is not found by ADL.
Their reasoning is described in boost/range/begin.hpp:
// Use a ADL namespace barrier to avoid ambiguity with other unqualified
// calls. This is particularly important with C++0x encouraging
// unqualified calls to begin/end.
No examples are given of where this causes a problem, so I can only theorize about what they are talking about.
Here is an example I have invented of how ADL can cause ambiguity:
namespace foo {
template<class T>
void begin(T const&) {}
}
namespace bar {
template<class T>
void begin(T const&) {}
struct bar_type {};
}
int main() {
using foo::begin;
begin( bar::bar_type{} );
}
Both foo::begin and bar::begin are equally valid functions to call for the begin( bar::bar_type{} ) in that context.
This could be what they are talking about. Their boost::begin and std::begin might be equally valid in a context where you have using std::begin on a type from boost. By putting it in a sub-namespace of boost, std::begin gets called (and works on ranges, naturally).
If the begin in the namespace boost had been less generic, it would be preferred, but that isn't how they wrote it.

That's because boost::end is inside an ADL barrier, which is then pulled into boost at the end of the file.
However, from cppreference's page on ADL (sorry, I don't have a C++ draft handy):
1) using-directives in the associated namespaces are ignored
That prevents it from being included in ADL.

Ambiguous call to templated function due to ADL

I've been bitten by this problem a couple of times and so have my colleagues. When compiling
#include <deque>
#include <boost/algorithm/string/find.hpp>
#include <boost/operators.hpp>
template< class Rng, class T >
typename boost::range_iterator<Rng>::type find( Rng& rng, T const& t ) {
return std::find( boost::begin(rng), boost::end(rng), t );
}
struct STest {
bool operator==(STest const& test) const { return true; }
};
struct STest2 : boost::equality_comparable<STest2> {
bool operator==(STest2 const& test) const { return true; }
};
int main() {
std::deque<STest> deq;
find( deq, STest() ); // works
find( deq, STest2() ); // C2668: 'find' : ambiguous call to overloaded function
}
...the VS9 compiler fails when compiling the second find. This is due to the fact that STest2 inherits from a type that is defined in boost namespace which triggers the compiler to try ADL which finds boost::algorithm::find(RangeT& Input, const FinderT& Finder).
An obvious solution is to prefix the call to find(…) with "::" but why is this necessary? There is a perfectly valid match in the global namespace, so why invoke Argument-Dependent Lookup? Can anybody explain the rationale here?

ADL isn't a fallback mechanism to use when "normal" overload resolution fails; functions found by ADL are just as viable as functions found by normal lookup.
If ADL were a fallback solution, you might easily fall into the trap where a function was used even when there was another function that was a better match but only visible via ADL. This would seem especially strange in the case of (for example) operator overloads. You wouldn't want two objects to be compared via an operator== for types that they could be implicitly converted to when there exists a perfectly good operator== in the appropriate namespace.

I'll add the obvious answer myself because I just did some research on this problem:
C++03 3.4.2
§2 For each argument type T in the function call, there is a set of zero or more associated namespaces [...] The sets of namespaces and classes are determined in the following way:
[...]
— If T is a class type (including unions), its associated classes are: the class itself; the class of which it is a
member, if any; and its direct and indirect base classes. Its associated namespaces are the namespaces
in which its associated classes are defined.
§ 2a If the ordinary unqualified lookup of the name finds the declaration of a class member function, the associated
namespaces and classes are not considered. Otherwise the set of declarations found by the lookup of
the function name is the union of the set of declarations found using ordinary unqualified lookup and the set
of declarations found in the namespaces and classes associated with the argument types.
At least it's standard conformant, but I still don't understand the rationale here.

Consider a mystream which inherits from std::ostream. You would want your type to support all the << operators that are defined for std::ostream, normally in the std namespace. So base classes are associated classes for ADL.
I think this also follows from the substitution principle - and functions in a class' namespace are considered part of its interface (see Herb Sutter's "What's in a class?"). So an interface that works on the base class should remain working on a derived class.
You can also work around this by disabling ADL:
(find)( deq, STest2() );

I think you stated the problem yourself:
in the global namespace
Functions in the global namespace are considered last. It's the outermost scope by definition. Any function with the same name (not necessarily applicable) that is found in a closer scope (from the call point of view) will be picked up first.
template <typename Rng, typename T>
typename Rng::iterator find( Rng& rng, T const& t );
namespace foo
{
bool find(std::vector<int> const& v, int);
void method()
{
std::deque<std::string> deque;
auto it = find(deque, "bar");
}
}
Here (unless vector or deque include algorithm, which is allowed), the only function that will be picked up during name lookup will be:
bool foo::find(std::vector<int> const&, int);
If algorithm is somehow included, there will also be:
template <typename FwdIt>
FwdIt std::find(FwdIt begin, FwdIt end,
typename std::iterator_traits<FwdIt>::value_type const& value);
And of course, overload resolution will fail stating that there is no match.
Note that name-lookup is extremely dumb: neither arity nor argument type are considered!
Therefore, there are only two kinds of free-functions that you should use in C++:
Those which are part of the interface of a class, declared in the same namespace, picked up by ADL
Those which are not, and which you should explicitly qualify to avoid issues of this type
If you step outside these rules, it might work or not, depending on what's included, and that's very awkward.
