What are customization point objects and how to use them? - c++

The latest draft of the C++ standard introduces the so-called "customization point objects" ([customization.point.object]),
which are widely used by the ranges library.
I seem to understand that they provide a way to write custom versions of begin, swap, data, and the like, which are
found by the standard library via ADL. Is that correct?
How is this different from previous practice, where a user defines an overload of, e.g., begin for her type in her own
namespace? In particular, why are they objects?

What are customization point objects?
They are function object instances in namespace std that fulfill two objectives: first, unconditionally enforce (concept-based) type requirements on the argument(s); then, dispatch to the correct function in namespace std or via ADL.
In particular, why are they objects?
That's necessary to circumvent a second lookup phase that would directly bring in the user-provided function via ADL (by design, that step should be postponed). See below for more details.
... and how to use them?
When developing an application, you mostly don't. This is a standard library feature: it adds concept checking to future customization points, which should result in, e.g., clearer error messages when you mess up template instantiations. However, with a qualified call to such a customization point, you can use it directly. Here's an example with an imaginary std::customization_point object that adheres to the design:
namespace a {
struct A {};
// Knows what to do with the argument, but doesn't check type requirements:
void customization_point(const A&);
}
// Does concept checking, then calls a::customization_point via ADL:
std::customization_point(a::A{});
This is currently not possible with e.g. std::swap, std::begin and the like.
Explanation (a summary of N4381)
Let me try to digest the proposal behind this section in the standard. There are two issues with "classical" customization points used by the standard library.
They are easy to get wrong. For example, swapping objects in generic code is supposed to look like this:
template<class T> void f(T& t1, T& t2)
{
using std::swap;
swap(t1, t2);
}
but it is all too easy to write a qualified call std::swap(t1, t2) instead, in which case the user-provided
swap is never called (see
N4381, Motivation and Scope).
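A small sketch of that pitfall, using a hypothetical user::Buffer type whose customized swap counts its own calls: only the two-step form reaches it, while the qualified call silently falls back to the generic std::swap.

```cpp
#include <cassert>
#include <utility>

namespace user {
struct Buffer {
    int id = 0;
    int custom_swaps = 0;  // counts calls to the customized swap (demo only)
};

// Customized swap, reachable only through ADL.
void swap(Buffer& x, Buffer& y) noexcept {
    std::swap(x.id, y.id);
    ++x.custom_swaps;
    ++y.custom_swaps;
}
}  // namespace user

// The idiomatic two-step: ADL may pick user::swap, std::swap is the fallback.
template <class T>
void two_step_swap(T& t1, T& t2) {
    using std::swap;
    swap(t1, t2);
}

// The easy mistake: a qualified call never considers user::swap.
template <class T>
void qualified_swap(T& t1, T& t2) {
    std::swap(t1, t2);
}
```

After qualified_swap the objects are still exchanged (member-wise, via moves), but the counters show that user::swap was skipped.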
More seriously, there is no way to centralize (concept-based) constraints on types passed to such user-provided functions (which is also why this topic gained importance with C++20). Again
from N4381:
Suppose that a future version of std::begin requires that its argument model a Range concept.
Adding such a constraint would have no effect on code that uses std::begin idiomatically:
using std::begin;
begin(a);
If the call to begin dispatches to a user-defined overload, then the constraint on std::begin
has been bypassed.
The solution that is described in the proposal mitigates both issues
by an approach like the following, imaginary implementation of std::begin.
namespace std {
namespace __detail {
/* Classical definitions of function templates "begin" for
raw arrays and ranges... */
struct __begin_fn {
/* Call operator template that performs concept checking and
* invokes begin(arg). This is the heart of the technique.
* Everything from above is already in the __detail scope, but
* ADL is triggered, too. */
};
}
/* Thanks to @cpplearner for pointing out that the global
function object will be an inline variable: */
inline constexpr __detail::__begin_fn begin{};
}
First, a qualified call to e.g. std::begin(someObject) always detours via std::__detail::__begin_fn,
which is desired. For what happens with an unqualified call, I again refer to the original paper:
In the case that begin is called unqualified after bringing std::begin into scope, the situation
is different. In the first phase of lookup, the name begin will resolve to the global object
std::begin. Since lookup has found an object and not a function, the second phase of lookup is not
performed. In other words, if std::begin is an object, then using std::begin; begin(a); is
equivalent to std::begin(a); which, as we’ve already seen, does argument-dependent lookup on the
users’ behalf.
This way, concept checking can be performed within the function object in the std namespace,
before the ADL call to a user provided function is performed. There is no way to circumvent this.

"Customization point object" is a bit of a misnomer. Many - probably a majority - aren't actually customization points.
Things like ranges::begin, ranges::end, and ranges::swap are "true" CPOs. Calling one of those causes some complex metaprogramming to take place to figure out if there is a valid customized begin or end or swap to call, or if the default implementation should be used, or if the call should instead be ill-formed (in a SFINAE-friendly manner). Because a number of library concepts are defined in terms of CPO calls being valid (like Range and Swappable), correctly constrained generic code must use such CPOs. Of course, if you know the concrete type and another way to get an iterator out of it, feel free.
Things like ranges::cbegin are CPOs without the "CP" part. They always do the default thing, so it's not much of a customization point. Similarly, range adaptor objects are CPOs but there's nothing customizable about them. Classifying them as CPOs is more of a matter of consistency (for cbegin) or specification convenience (adaptors).
Finally, things like ranges::all_of are quasi-CPOs or niebloids. They are specified as function templates with special magical ADL-blocking properties and weasel wording to allow them to be implemented as function objects instead. This is primarily to prevent ADL picking up the unconstrained overload in namespace std when a constrained algorithm in std::ranges is called unqualified. Because the std::ranges algorithm accepts iterator-sentinel pairs, it's usually less specialized than its std counterpart and loses overload resolution as a result.
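The ADL-blocking property can be illustrated without the standard library. In this sketch, lib::count_all stands in for an unconstrained algorithm living in the argument's namespace and algo::count_all for a niebloid; all names are hypothetical, and the niebloid's constraints are omitted to keep it short.

```cpp
#include <cassert>

namespace lib {
struct Widget {};

// Unconstrained generic "algorithm" in the argument's namespace.
template <class It>
int count_all(It first, It last) { return 100 + static_cast<int>(last - first); }
}  // namespace lib

namespace algo {
// A niebloid: an object rather than a function template.
struct count_all_fn {
    template <class It>
    int operator()(It first, It last) const { return static_cast<int>(last - first); }
};
inline constexpr count_all_fn count_all{};
}  // namespace algo

int count_widgets(lib::Widget* first, lib::Widget* last) {
    using algo::count_all;
    // Unqualified lookup finds an object, so ADL does not add lib::count_all.
    // If algo::count_all were a function template with the same signature,
    // this call would be ambiguous between the two.
    return count_all(first, last);
}
```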

Related

Can Niebloids be passed where Callables are required?

Generally speaking, unless explicitly allowed, the behavior of a C++ program that tries to take the address of a standard library function is unspecified. This means extra caution should be taken before passing such functions as a Callable; it is typically better to wrap them in a lambda.
More on the topic: Can I take the address of a function defined in standard library?
However, C++20 introduced constrained algorithms (also called ranged algorithms), based on the range-v3 library, in which function-like entities such as std::ranges::sort and std::ranges::transform are introduced as niebloids.
While the original library created a functor class for each function in the algorithm library, and each niebloid, such as ranges::sort, is simply a named object of the corresponding functor class, the standard does not specify how they should be implemented.
So the question is: is passing a niebloid as a Callable, such as std::invoke(std::ranges::sort, my_vec), specified or explicitly allowed?
All the spec says, in [algorithms.requirements], is:
The entities defined in the std::ranges namespace in this Clause are not found by argument-dependent name lookup ([basic.lookup.argdep]). When found by unqualified ([basic.lookup.unqual]) name lookup for the postfix-expression in a function call ([expr.call]), they inhibit argument-dependent name lookup.
The only way to implement that, today, is by making them objects. However, we don't specify any further behavior of those objects.
So this:
std::invoke(std::ranges::sort, my_vec)
will work simply because it evaluates as std::ranges::sort(my_vec) after taking a reference to the object, and there's no way to really prevent that from working.
But other uses might not. For instance, std::views::transform(r, std::ranges::distance) is not specified to work, because we don't say whether std::ranges::distance is copyable or not - std::ranges::size is a customization point object, and thus copyable, but std::ranges::distance is just an algorithm.
The MSVC implementation tries to adhere aggressively to the limited specification, and its implementation of std::ranges::distance is not copyable. libstdc++, on the other hand, just makes them empty objects, so views::transform(ranges::distance) works simply because nothing actively rejects it.
All of which is to say: once you get away from directly writing std::ranges::meow(r) (or otherwise writing meow(r) after a using or using namespace), you're kind of on your own.

Why does accumulate in C++ have two templates defined

Why does accumulate in C++ have two templates defined when the job can be done with just one template (the one with the binaryOperation and default value to sum)?
I am referring to the accumulate declaration from http://www.cplusplus.com/reference/numeric/accumulate/
Because that's how the standard has been specified.
It is often a matter of taste whether to use an overload or a default argument. In this case, overload was chosen (by committee, by Alexander Stepanov, or by whoever happened to be responsible for the choice).
Default values are more limited than overloads. For example, you can have a function pointer T (*)(InputIterator, InputIterator, T) pointing to the first overload, which would not be possible if there was only one function (template) with 4 arguments. This flexibility can be used as an argument for using overloads rather than default arguments when possible.
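The function-pointer argument can be checked with a pair of hypothetical overloads named total that mirror the shape of std::accumulate. They are user-defined on purpose, since taking the address of std::accumulate itself is not guaranteed to work.

```cpp
#include <cassert>

// Three-argument overload: sums with operator+.
template <class It, class T>
T total(It first, It last, T init) {
    for (; first != last; ++first) init = init + *first;
    return init;
}

// Four-argument overload: folds with a caller-supplied operation.
template <class It, class T, class Op>
T total(It first, It last, T init, Op op) {
    for (; first != last; ++first) init = op(init, *first);
    return init;
}

// A pointer with the three-argument signature selects the first overload.
// If total were a single four-parameter template with a defaulted op, no
// specialization would have this function type, and this would not compile.
int (*total3)(const int*, const int*, int) = &total<const int*, int>;
```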
It's true you would get mostly the same behavior from a single template like
template <class InputIt, class T, class BinaryOperation = std::plus<>>
T accumulate(InputIt first, InputIt last, T init, BinaryOperation op = {});
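For comparison, a sketch of such a single template, assuming C++14 or later; it is named accumulate_one (a hypothetical name) to avoid clashing with std::accumulate.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// When op is omitted, BinaryOperation cannot be deduced, so the default
// template argument std::plus<> is used and op is value-initialized.
template <class InputIt, class T, class BinaryOperation = std::plus<>>
T accumulate_one(InputIt first, InputIt last, T init, BinaryOperation op = {}) {
    for (; first != last; ++first)
        init = op(std::move(init), *first);  // moves the running value, as C++20 std::accumulate does
    return init;
}
```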
But note that in earlier versions of C++, this would be difficult or impossible:
Prior to C++11, a function template could not have default template arguments.
Prior to C++14, std::plus<> (which is the same as std::plus<void>) was not valid: the class template could only be instantiated with one specific argument type.
The accumulate template is even older than the first C++ Standard of 1998: it goes back to the SGI STL library. At that time, compiler support for templates was rather inconsistent, so it was advisable to keep templates as simple as possible.
So the original two declarations were kept. As noted in bobah's answer, combining them into one declaration could break existing code, since for example code might be using a function pointer to an instantiation of the three-argument version (and function pointers cannot represent a default function argument, whether the function is from a template or not).
Sometimes the Standard library will add additional overloads to an existing function, but usually only for a specific purpose that would improve the interface, and when possible without breaking old code. There hasn't been any such reason for std::accumulate.
(But note member functions in the standard library can change more often than non-member functions like std::accumulate. The Standard gives implementations permission to declare member functions with different overloads, default arguments, etc. than specified as long as the effects are as described. This means it's generally a bad idea to take pointers to member functions to standard library class members, or otherwise assume very specific declarations, in the first place.)
The motivation for the two functions is the same reason we have both copy and transform: to give the coder the flexibility to apply a function on a per-element basis. But perhaps some real-world code would help show where each would be used. I've used both of these snippets professionally:
The 1st instance of accumulate can be used to sum the elements of a range. For example, given const int input[] = { 13, 42 } I can do this to get the sum of all elements in input:
accumulate(cbegin(input), cend(input), 0) /* Returns 55 */
I personally most commonly use the 2nd instance to generate strings (because it's the closest thing C++ has to a join), but it can also be used when special preprocessing is needed before an element is added. For example:
accumulate(next(cbegin(input)), cend(input), to_string(*cbegin(input)), [](const auto& current_sum, const auto i){ return current_sum + ", " + to_string(i); }) /* Returns "13, 42"s */
It's worth noting P0616R0 when considering my use of the 2nd function. This proposal has been accepted into C++20: accumulate now moves rather than copies the running value it passes to the functor, which "can lead to massive improvements (particularly, it
means accumulating strings is linear rather than quadratic)."
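The two snippets above can be assembled into a self-contained sketch. join_ints is a hypothetical helper that assumes a non-empty range; unlike the one-liner, it takes the accumulator by value and appends to it, so the C++20 move of the running value is actually exploited instead of building a new string at each step.

```cpp
#include <cassert>
#include <iterator>
#include <numeric>
#include <string>

// Joins the elements of a non-empty int range with ", ".
template <class Range>
std::string join_ints(const Range& input) {
    using std::cbegin;
    using std::cend;
    return std::accumulate(std::next(cbegin(input)), cend(input),
                           std::to_string(*cbegin(input)),
                           [](std::string acc, int i) {
                               // acc receives the moved-in running value
                               acc += ", ";
                               acc += std::to_string(i);
                               return acc;
                           });
}
```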

ADL and container functions (begin, end, etc)

C++11 and later define free functions begin and end in namespace std (C++17 adds size, empty and data). For most containers these functions invoke the corresponding member function, but for some types (like valarray, which has no member begin()) the free functions are overloaded. So, to iterate over any container, the free functions should be used, and to find those functions for containers from namespaces other than std, ADL should be used:
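#include <iterator>  // std::begin, std::end, std::empty (C++17)

struct empty_container {};  // exception type used below

template<typename C>
void foo(C c)
{
using std::begin;
using std::end;
using std::empty;
if (empty(c)) throw empty_container();
for (auto i = begin(c); i != end(c); ++i) { /* do something */ }
}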
Question 1: Am I correct? Are begin and end expected to be found via ADL?
But the ADL rules specify that if the type of an argument is a class template specialization, ADL also searches the namespaces of all its template arguments. And this is where the Boost.Range library comes into play: it defines boost::begin, boost::end, etc. These functions are defined like this:
template< class T >
inline BOOST_DEDUCED_TYPENAME range_iterator<T>::type begin( T& r )
{
return range_begin( r );
}
If I use std::vector<boost::any> and Boost.Range, I run into trouble: the std::begin and boost::begin overloads are ambiguous. That is, I cannot write template code that finds a free begin via ADL. If I explicitly use std::begin, I expect every non-std:: container to have a member begin.
Question 2: What shall I do in this case?
Rely on the presence of a member function? That's the simplest way.
Ban Boost.Range? Well, algorithms that take a container instead of a pair of iterators are convenient. Boost.Range adaptors (containers that lazily apply an algorithm to a container) are also convenient. But even if I do not use Boost.Range in my code, it can still be used by a Boost library (other than Range). This makes template code really fragile.
Ban Boost?
A few years ago, I had a similar problem where my code suddenly started getting ambiguities between std::begin and boost::begin. I found it was due to using Boost.Operators to aid in defining a class, even though it was not even a public base class or apparent to the user of the types involved. A random change somewhere caused #include <boost/range/begin.hpp> to be present in the nested include files somewhere, thus making boost::begin visible to the compiler.
I complained to the mailing list about putting classes directly in the Boost namespace, rather than in a nested namespace and exposing them via using declarations; everything defined directly in the Boost namespace could potentially step on each other via accidental ADL.
I just tried to reproduce this today, and it seems quite resilient against such ambiguities now! Looking at the definition, boost::begin is itself in an inner namespace, so it can never be found via unqualified lookup unless you supply your own using boost::begin; in your own scope.
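The inner-namespace fix can be reproduced in a few lines. This sketch does not copy Boost's code; mylib and adl_barrier are hypothetical names. The rule it relies on is that argument-dependent lookup ignores using-directives in an associated namespace, while qualified lookup follows them.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

namespace mylib {
struct Thing { int value = 0; };

// The generic helper lives in an inner namespace...
namespace adl_barrier {
template <class C>
auto begin(C& c) -> decltype(c.begin()) { return c.begin(); }
}  // namespace adl_barrier

// ...and is exposed with a using-directive: mylib::begin(v) still works,
// but ADL on a mylib type (or a std::vector<mylib::Thing>) never finds it.
using namespace adl_barrier;
}  // namespace mylib

// Generic code in the style of the question. Without the barrier, ADL would
// also find a mylib::begin with the same signature as std::begin, and the
// call would be ambiguous for std::vector<mylib::Thing>.
template <class C>
int first_value(C& c) {
    using std::begin;
    return begin(c)->value;
}
```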
I don’t know how long ago this fix took place. (If you can still reproduce it, please post a complete program with version and platform details.)
So your answer is:
For Boost, don’t worry about it anymore (upgrade Boost if necessary).
For new code, never define a free function named begin in the same namespace as any of its defined types.

STL Extension/Modification Best Practice

I have been writing in C++ for a few months, and I am comfortable enough with it now to begin implementing my own library, consisting of things that I have found myself reusing again and again. One thing that nagged me was the fact that you always had to provide a beginning and end iterator for functions like std::accumulate, std::fill, etc.
The option to pass a whole container was completely absent, and it was simply an annoyance to write begin and end over and over. So I decided to add this functionality to my library, but I came across a problem: I couldn't figure out the best approach. Here were my general solutions:
1. Macros
- A macro that encapsulates an entire function call
ex. QUICK_STL(FCall)
- A macro that takes the container, function name, and optional args
ex. QUICK_STL(C,F,Args...)
2. Wrapper Function/Functor
- A class that takes the container, function name, and optional args
ex. quick_stl(F, C, Args...)
3. Overload Functions
- Overload every function in namespace std OR my library namespace
ex.
namespace std { // or my library root namespace 'cherry'
template <typename C, typename T>
decltype(auto) count(const C& container, const T& value);
}
I usually steer clear of macros, but in this case they could certainly save a lot of lines of code. With function overloading, I must overload every single function I want to use, which wouldn't really scale. The upside to that approach, though, is that you retain the names of the functions. With perfect forwarding and decltype(auto), overloading becomes a lot easier, but it will still take time to implement, and it would have to be modified whenever another function is added. As to whether I should overload in the std namespace, I am rather skeptical that it would be appropriate in this case.
What would be the most appropriate way of overloading functions in the std namespace (note that these functions will only serve as proxies for the original functions)?
You need to read this: Why do all functions take only ranges, not containers?
And This: STL algorithms: Why no additional interface for containers (additional to iterator pairs)?
I have been writing in c++ for a few months, and i am comfortable
enough with it now to begin implementing my own library...
Let me look on the brighter side and just say... Some of us have been there before.... :-)
One thing that nagged me was the fact that you always had to provide a
beginning and end iterator for functions like
std::accumulate,std::fill etc...
That's why you have Boost.Range and Eric Niebler's proposed Ranges, which it seems won't make it into C++17.
Macros
See Macros
Wrapper Function/Functor
Not too bad... Provided you do it correctly, you can do that; that's essentially what Ranges do for containers. See the aforementioned implementations.
Overload Functions
Overload every function in namespace std ...
Don't do that... The C++ standard makes it undefined behavior.
See what the standard has to say
§17.6.4.2.1 The behavior of a C++ program is undefined if it adds declarations or definitions to namespace std or to a namespace within
namespace std unless otherwise specified. A program may add a template
specialization for any standard library template to namespace std only
if the declaration depends on a user-defined type and the
specialization meets the standard library requirements for the
original template and is not explicitly prohibited.
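Following the answer's advice (a wrapper in your own namespace, never in std), a proxy for a single algorithm might look like the sketch below. The namespace cherry comes from the question; the choice of std::count and the two-step begin/end lookup are assumptions for illustration, not a complete design.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

namespace cherry {
// Container-taking proxy in the library's own namespace (not in std).
// The two-step lookup accepts standard containers, raw arrays, and types
// whose free begin/end are found by ADL.
template <class C, class T>
auto count(const C& container, const T& value) {
    using std::begin;
    using std::end;
    return std::count(begin(container), end(container), value);
}
}  // namespace cherry
```

In C++20, the constrained algorithms such as std::ranges::count(v, 2) accept a range directly, which removes most of the need for proxies like this.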

How does "using std::swap" enable ADL?

In "What is the copy-and-swap idiom?", this example is shown:
friend void swap(dumb_array& first, dumb_array& second) // nothrow
{
// enable ADL (not necessary in our case, but good practice)
using std::swap;
// by swapping the members of two classes,
// the two classes are effectively swapped
swap(first.mSize, second.mSize);
swap(first.mArray, second.mArray);
}
How exactly does using std::swap enable ADL? ADL only requires an unqualified name. The only benefit I see in using std::swap is that, since std::swap is a function template, you can use a template argument list in the call (swap<int, int>(..)).
If that is not the case then what is using std::swap for?
Just wanted to add why this idiom is used at all, which seemed like the spirit of the original question.
This idiom is used within many std library classes where swap is implemented. From http://www.cplusplus.com/reference/algorithm/swap/:
Many components of the standard library (within std) call swap in an
unqualified manner to allow custom overloads for non-fundamental types
to be called instead of this generic version: Custom overloads of swap
declared in the same namespace as the type for which they are provided
get selected through argument-dependent lookup over this generic
version.
So the purpose of using an unqualified "swap" to swap member variables in the function you described is so that ADL can find customized swap functions for those classes (if they exist elsewhere).
Since these customized classes don't exist within the class you're referencing (mSize and mArray are a std::size_t and an int*, respectively, in the original example), and std::swap works just fine, the author added a comment that this was not necessary in this case, but good practice. He would have gotten the same results had he explicitly called std::swap, as is pointed out in the previous answer.
Why is it good practice? Because if you have as members instances of classes for which a custom swap is defined, you want the behavior to be this: check for a customized swap function; if it exists, use it; if it does not exist, use the std library function. In the cases where there are no customized swap functions available, you want it to default to the simple std::swap implementation described in the link above. Hence the "using", to bring std::swap into scope for built-in types. It only acts as a fallback, though: a matching non-template custom swap is preferred over the std::swap template.
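A sketch of that situation, with hypothetical names: Sprite holds a gfx::Texture, which has a customized swap (instrumented with a counter for the demo), and a std::size_t, which does not. A single using std::swap; serves both members.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

namespace gfx {
struct Texture {
    int handle = 0;
};
int texture_swaps = 0;  // counts calls to the customized swap (demo only)

void swap(Texture& a, Texture& b) noexcept {
    std::swap(a.handle, b.handle);
    ++texture_swaps;
}
}  // namespace gfx

class Sprite {
public:
    Sprite(int handle, std::size_t frames) : tex_{handle}, frames_(frames) {}

    friend void swap(Sprite& first, Sprite& second) noexcept {
        using std::swap;                      // fallback for built-in types
        swap(first.tex_, second.tex_);        // ADL finds gfx::swap, which wins
        swap(first.frames_, second.frames_);  // no ADL candidates: std::swap
    }

    int handle() const { return tex_.handle; }
    std::size_t frames() const { return frames_; }

private:
    gfx::Texture tex_;
    std::size_t frames_;
};
```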
See also: https://stackoverflow.com/a/2684544/2012659
If for some reason you hate the "using std::swap", I suppose you could in theory resolve this manually by explicitly calling std::swap for everything you'd want to swap using std::swap and using the unqualified swap for every custom swap you know is defined (still found using ADL). But this is error prone ... if you didn't author those classes you may not know if a customized swap exists for it. And switching between std::swap and swap makes for confusing code. Better to let the compiler handle all of this.
The "enable ADL" comment applies to the transformation of
std::swap(first.mSize, second.mSize);
std::swap(first.mArray, second.mArray);
to
using std::swap;
swap(first.mSize, second.mSize);
swap(first.mArray, second.mArray);
You're right, ADL only requires an unqualified name, but this is how the code is re-worked to use an unqualified name.
Just plain
swap(first.mSize, second.mSize);
swap(first.mArray, second.mArray);
wouldn't work, because for many types, ADL won't find std::swap, and no other usable swap implementation is in scope.
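The last point can be checked directly. In this sketch (hypothetical helper names), ADL reaches std::swap on its own for std::string arguments because std is an associated namespace, while for int arguments there are no associated namespaces and the using-declaration is what makes the call compile.

```cpp
#include <cassert>
#include <string>
#include <utility>

// No using-declaration: ADL on std::string searches namespace std
// and finds the std::swap overload for basic_string.
void swap_strings(std::string& a, std::string& b) {
    swap(a, b);
}

// For ints, a bare swap(x, y) would find nothing; bring std::swap in first.
void swap_ints(int& x, int& y) {
    using std::swap;
    swap(x, y);
}
```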