Will specialization of function templates in std for program-defined types no longer be allowed in C++20? - c++

Quote from cppreference.com:
Adding template specializations
It is allowed to add template specializations for any standard library class template (since C++20) to the namespace std only if the declaration depends on at least one program-defined type and the specialization satisfies all requirements for the original template, except where such specializations are prohibited.
Does this mean that, starting from C++20, adding specializations of function templates to the std namespace for user-defined types will no longer be allowed? If so, it implies that many pieces of existing code could break, doesn't it? (It seems to me to be a rather "radical" change.) Moreover, it would inject undefined behavior into such code, which will not trigger compilation errors (warnings, hopefully, will).

As it stands now it definitely looks that way. Previously [namespace.std] contained
A program may add a template specialization for any standard library template to namespace std only if the declaration depends on a user-defined type and the specialization meets the standard library requirements for the original template and is not explicitly prohibited.
While the current draft states
Unless explicitly prohibited, a program may add a template specialization for any standard library class template to namespace std provided that (a) the added declaration depends on at least one program-defined type and (b) the specialization meets the standard library requirements for the original template.
emphasis mine
And it looks like the paper Thou Shalt Not Specialize std Function Templates! by Walter E. Brown is responsible for it. In it he details a number of reasons why this should be changed, such as:
Herb Sutter: “specializations don’t participate in overloading. [...] If you want to customize a function base template and want that customization to participate in overload resolution (or, to always be used in the case of exact match), make it a plain old function, not a specialization. And, if you do provide overloads, avoid also providing specializations.”
David Abrahams: “it’s wrong to use function template specialization [because] it interacts in bad ways with overloads. [...] For example, if you specialize the regular std::swap for std::vector<mytype>&, your specialization won’t get chosen over the standard’s vector specific swap, because specializations aren’t considered during overload resolution.”
Howard Hinnant: “this issue has been settled for a long time. ... Disregard Dave’s expert opinion/answer in this area at your own peril.”
Eric Niebler: “[because of] the decidedly wonky way C++ resolves function calls in templates..., [w]e make an unqualified call to swap in order to find an overload that might be defined in [...] associated namespaces [...], and we do using std::swap so that, on the off-chance that there is no such overload, we find the default version defined in the std namespace.”
High Integrity C++ Coding Standard: “Overload resolution does not take into account explicit specializations of function templates. Only after overload resolution has chosen a function template will any explicit specializations be considered.”

Not really that radical. This change is based on this paper from Walter E. Brown. The paper goes into rationale rather deeply, but ultimately it boils down to this:
1. Specialization of function templates is rather poor as a customization point. Overloading and ADL are much better in that regard (a sketch follows at the end of this answer). There are other customization points discussed in the paper as well.
2. The standard library doesn't rely much on this poor customization point anyway.
3. The wording change that's been put into place actually permits adding entire declarations to namespace std (not just specializations) where it's explicitly permitted. So now there are better customization points.
Given #1 and #2, it's rather unlikely existing code will break. Or at least, not enough for this to be a major problem. Code that used auto and register also "broke" in the past, but that minuscule amount of C++ code didn't stop progress.
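To make the preferred alternative concrete, here is a minimal sketch (the type lib::Widget and the function reset_pair are made up for illustration): instead of specializing a function template in namespace std, provide an overload in the type's own namespace and let ADL find it through the usual two-step swap idiom.

#include <utility>

namespace lib {
    struct Widget { int* data; };

    // Preferred customization: an overload in the type's own namespace, found by ADL.
    void swap(Widget& a, Widget& b) noexcept {
        std::swap(a.data, b.data);
    }
}

template<class T>
void reset_pair(T& a, T& b) {
    using std::swap;   // fall back to std::swap if no better overload exists
    swap(a, b);        // unqualified call: ADL finds lib::swap for lib::Widget
}

Unlike a specialization of std::swap, the overload participates in overload resolution and is preferred over the generic template for lib::Widget arguments.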

Related

Can niebloids be passed where a Callable is required?

Generally speaking, unless explicitly allowed, the behavior of a C++ program that takes the address of a standard library function is unspecified. This means extra caution should be taken before passing one as a Callable. Instead, it is typically better to wrap it in a lambda.
More on the topic: Can I take the address of a function defined in standard library?
However, C++20 introduced constrained algorithms (also known as ranges algorithms), based on the range-v3 library, where function-like entities such as std::ranges::sort and std::ranges::transform are introduced as niebloids.
While the original library creates a functor class for each function in the algorithm library, and each niebloid, such as ranges::sort, is simply a named object of the corresponding functor class, the standard does not specify how they should be implemented.
So the question is: is the behavior of passing a niebloid as a Callable, such as std::invoke(std::ranges::sort, my_vec), specified/explicitly allowed?
All the spec says, in [algorithms.requirements], is:
The entities defined in the std​::​ranges namespace in this Clause are not found by argument-dependent name lookup ([basic.lookup.argdep]). When found by unqualified ([basic.lookup.unqual]) name lookup for the postfix-expression in a function call ([expr.call]), they inhibit argument-dependent name lookup.
The only way to implement that, today, is by making them objects. However, we don't specify any further behavior of those objects.
So this:
std::invoke(std::ranges::sort, my_vec)
will work, simply because it will evaluate as std::ranges::sort(my_vec) after taking a reference to the object, and there's no real way to prevent that from working.
But other uses might not. For instance, std::views::transform(r, std::ranges::distance) is not specified to work, because we don't say whether std::ranges::distance is copyable or not - std::ranges::size is a customization point object, and thus copyable, but std::ranges::distance is just an algorithm.
The MSVC implementation tries to adhere aggressively to the limited specification, and its implementation of std::ranges::distance is not copyable. libstdc++, on the other hand, just makes them empty objects, so views::transform(ranges::distance) just works by way of being not actively rejected.
All of which is to say: once you get away from directly writing std::ranges::meow(r) (or otherwise writing meow(r) after a using or using namespace), you're kind of on your own.
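As a portable workaround (a sketch, not something mandated by the standard), wrap the algorithm in a lambda so that nothing depends on the unspecified properties of the niebloid object itself:

#include <ranges>
#include <string>
#include <vector>

void example() {
    std::vector<std::string> words{"a", "bc", "def"};

    // Not specified to work: copies the algorithm object std::ranges::distance.
    // auto lengths = words | std::views::transform(std::ranges::distance);

    // Specified to work: only a direct call to the algorithm is involved.
    auto lengths = words | std::views::transform(
        [](const auto& r) { return std::ranges::distance(r); });

    for (auto n : lengths) { (void)n; /* 1, 2, 3 */ }
}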

Does C++20 offer any new solutions to the problem of public member invisibility and source code bloat with inherited class templates?

Does C++20 offer any new solutions to the problem of public member invisibility and source code bloat/repetition with inherited class templates described in this question from over 2 years ago?
The "Problem"
The alleged "problem" is that, in a template, an unqualified name used in a way which isn't dependent on the specialization is truly independent of the specialization and refers to the entity with that name found at that point. The alleged source code "bloat" is using this-> to explicitly make the name dependent or qualifying the name. This is still the situation in C++20.
Just to be clear, the set of entities is not known at the point we refer to them. In the linked question, the base class depends on the template parameter, and we have only seen the primary base class template. The base class template may be specialized later and may have completely different member functions than the ones we've seen. So, any "solution" requires that a name without any obvious contextual dependence on the specialization find entities not yet declared which may be surprising.
Why It's Impossible
Any naive changes in this direction are either pretty big or have severe downsides, or both.
You could postpone all name lookup until instantiation time, discarding two-phase lookup. That invites ODR violations, which are silent UB, and that is a huge downside.
You could restrict specialization so that you cannot specialize the base class later such that a different entity is found. That is difficult to diagnose, so it would likely be a new rule introducing silent UB.
You could opt in with a using declaration: using X::* as you propose or using class X as someone else suggested in a different context. This has the benefit of explicitness. It moves the problem a level up: if X is not dependent, presumably it should be found now, but if it is dependent, what happens? We can't instantiate it prior to instantiating the template we're in now. Thus, we can't interpret any names we see until instantiation. It has similar downsides to discarding two phase lookup.
Any of these changes would add complexity to an already complex area and would also pose significant backwards compatibility hurdles. None of them is a clear win.
Note: In C++20 the rules were made more uniform by allowing ADL to find function templates when explicit template arguments are specified: f<int>(1).
Why It's Not a Problem
I doubt there would be consensus that this really is a problem. The linked question makes a poor argument. The derived class adds member function behavior to a certain base class, but a free function works better. These behaviors did not need member access, they aren't required by the language to be members, they can be found as non-members with ADL, and by using a free function they apply even when the static type you have is the base type. So, using inheritance for this is unnecessary coupling, and is a worse option.
Searching for 100s of places to add this-> and adding these 6 characters 100s of times strikes me as Code Bloat and Repetition when I have to templatize a base class
"Searching": The compiler will tell you when a name can't be found, which is better than silent bad behavior, such as making ODR violations easier to hit.
"templatize a base": Templatizing the base class doesn't trigger this. Templatizing the derived class and making the base dependent does. Yes, when templatizing the derived class, specifying that a bare name used in a way that is independent of the template parameter is in fact dependent may seem like boilerplate to some, but others might argue being explicit is clearer.
"100s of times": Seems hyperbolic.
These code patterns are used all the time in real world. Just look at CRTP. (comment on linked question)
Again, this only applies if the derived class is templated. I would dispute the commonality, but these idioms do exist and have a place.
Most importantly, though, is that CRTP is not a goal. CRTP is a hack. It's a C++ idiom because C++ lacks better facilities. CRTP allows a class to opt into certain behaviors that would otherwise be bothersome to write. Relevant C++ proposals do exist, but by and large, they have focused on making extension easier or removing boilerplate, and not on making CRTP, the hack, easier.
These are some that come to mind:
C++20: Comparisons
A very common use of CRTP is for things that require a lot of extra boilerplate. C++ required you to define operator== and operator!=, for instance. By opting into a CRTP base class, one could define only the primitive operation and have the other one generated.
C++20 fixed the underlying problem with comparisons. The typical member-wise comparison can be defaulted, and comparisons can be re-written so that != can invoke ==.
The problem is solved at the root, removing CRTP, not enhancing it.
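A minimal sketch of the C++20 approach (Point is a hypothetical type, not taken from the linked question):

#include <compare>

struct Point {
    int x;
    int y;
    // Defaulted three-way comparison; operator== is implicitly defaulted as well,
    // and !=, <, <=, >, >= are rewritten in terms of == and <=>.
    auto operator<=>(const Point&) const = default;
};

static_assert(Point{1, 2} != Point{2, 1});
static_assert(Point{1, 2} < Point{2, 1});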
C++20: Iterators to Ranges
Another common use of CRTP in the same vein as above is iterators. Writing custom iterators requires only a few fundamental operations: advance and dereference for forward iterators. But then there's a lot of extra seemingly unnecessary ceremony: pre-increment, post-increment, const iterators, typedefs, etc.
C++20 took a large step forward by introducing range concepts and the range library. The result is that it should be much less necessary to write a custom iterator. Ranges become a capable concept on their own, and there's a good suite of range combinators.
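For example, here is a minimal sketch (sum_of_even_squares is a made-up function) where a view pipeline replaces what might otherwise have been a handwritten iterator adaptor:

#include <ranges>
#include <vector>

int sum_of_even_squares(const std::vector<int>& v) {
    int sum = 0;
    // No custom iterator type needed: compose existing views instead.
    for (int x : v | std::views::filter([](int i) { return i % 2 == 0; })
                   | std::views::transform([](int i) { return i * i; }))
        sum += x;
    return sum;
}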
C++20: Concepts
C++ essentially has two systems for specifying an interface: virtual functions and concepts. They have trade-offs. Virtual functions are intrusive. But prior to C++20, concepts were implicit and emulated. One reason to use CRTP or inheritance generally would be to inject virtual functions. But in C++20, concepts are a language feature removing one big negative.
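A minimal sketch (Drawable and render are made-up names): the interface is stated non-intrusively and checked at compile time, with no virtual functions and no CRTP base.

#include <concepts>

template<class T>
concept Drawable = requires(const T& t) {
    { t.draw() } -> std::same_as<void>;
};

template<Drawable T>
void render(const T& shape) {
    shape.draw();   // checked against the concept, not dispatched through a vtable
}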
Future C++: Metaclasses
One value of CRTP, in addition to the boilerplate reduction, is satisfying an entire collection of type requirements at once. A comparable class defines all the comparison operators. A clonable base defines a virtual destructor and clone.
This is the topic of metaclasses, which is not yet in C++.
See Also
See also the work on customization points, which seems very interesting. And see the debates on unified function call syntax, which it seems we'll never get.
Summary
There's a very good question hiding in here about how C++20 makes it easier to reduce boilerplate, remove hacks like CRTP, and write better and clearer code. C++20 takes several steps in this regard, but they make the expression of intent easier rather than making any particular idiom easier.

What are customization point objects and how to use them?

The latest draft of the C++ standard introduces the so-called "customization point objects" ([customization.point.object]), which are widely used by the ranges library.
I seem to understand that they provide a way to write custom versions of begin, swap, data, and the like, which are found by the standard library by ADL. Is that correct?
How is this different from previous practice, where a user defines an overload for e.g. begin for her type in her own namespace? In particular, why are they objects?
What are customization point objects?
They are function object instances in namespace std that fulfill two objectives: first unconditionally trigger (conceptified) type requirements on the argument(s), then dispatch to the correct function in namespace std or via ADL.
In particular, why are they objects?
That's necessary to circumvent a second lookup phase that would directly bring in the user provided function via ADL (this should be postponed by design). See below for more details.
... and how to use them?
When developing an application: you mainly don't. This is a standard library feature, it will add concept checking to future customization points, hopefully resulting e.g. in clear error messages when you mess up template instantiations. However, with a qualified call to such a customization point, you can directly use it. Here's an example with an imaginary std::customization_point object that adheres to the design:
namespace a {
    struct A {};
    // Knows what to do with the argument, but doesn't check type requirements:
    void customization_point(const A&);
}

// Does concept checking, then calls a::customization_point via ADL:
std::customization_point(a::A{});
This is currently not possible with e.g. std::swap, std::begin and the like.
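(Since C++20, however, the ranges library does ship such objects: for example, std::ranges::swap is a customization point object, so a qualified call to it already behaves this way. A minimal sketch, using a made-up type a::A:)

#include <concepts>   // declares std::ranges::swap
#include <utility>

namespace a {
    struct A { int v; };
    // User-provided swap, found by ADL:
    void swap(A& x, A& y) noexcept { std::swap(x.v, y.v); }
}

void example() {
    a::A x{1}, y{2};
    // Qualified call to the CPO: checks the swappable requirements,
    // then dispatches to a::swap via ADL.
    std::ranges::swap(x, y);
}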
Explanation (a summary of N4381)
Let me try to digest the proposal behind this section in the standard. There are two issues with "classical" customization points used by the standard library.
They are easy to get wrong. As an example, swapping objects in generic code is supposed to look like this
template<class T> void f(T& t1, T& t2)
{
    using std::swap;
    swap(t1, t2);
}
but making a qualified call to std::swap(t1, t2) instead is too simple - the user-provided swap would never be called (see N4381, Motivation and Scope).
More severely, there is no way to centralize (conceptified) constraints on types passed to such user-provided functions (this is also why this topic gained importance with C++20). Again, from N4381:
Suppose that a future version of std::begin requires that its argument model a Range concept. Adding such a constraint would have no effect on code that uses std::begin idiomatically:

    using std::begin;
    begin(a);

If the call to begin dispatches to a user-defined overload, then the constraint on std::begin has been bypassed.
The solution described in the proposal mitigates both issues by an approach like the following imaginary implementation of std::begin.
namespace std {
    namespace __detail {
        /* Classical definitions of function templates "begin" for
           raw arrays and ranges... */

        struct __begin_fn {
            /* Call operator template that performs concept checking and
             * invokes begin(arg). This is the heart of the technique.
             * Everything from above is already in the __detail scope, but
             * ADL is triggered, too. */
        };
    }

    /* Thanks to #cpplearner for pointing out that the global
       function object will be an inline variable: */
    inline constexpr __detail::__begin_fn begin{};
}
First, a qualified call to e.g. std::begin(someObject) always detours via std::__detail::__begin_fn, which is desired. For what happens with an unqualified call, I again refer to the original paper:
In the case that begin is called unqualified after bringing std::begin into scope, the situation is different. In the first phase of lookup, the name begin will resolve to the global object std::begin. Since lookup has found an object and not a function, the second phase of lookup is not performed. In other words, if std::begin is an object, then using std::begin; begin(a); is equivalent to std::begin(a); which, as we’ve already seen, does argument-dependent lookup on the users’ behalf.
This way, concept checking can be performed within the function object in the std namespace, before the ADL call to a user-provided function is performed. There is no way to circumvent this.
"Customization point object" is a bit of a misnomer. Many - probably a majority - aren't actually customization points.
Things like ranges::begin, ranges::end, and ranges::swap are "true" CPOs. Calling one of those causes some complex metaprogramming to take place to figure out if there is a valid customized begin or end or swap to call, or if the default implementation should be used, or if the call should instead be ill-formed (in a SFINAE-friendly manner). Because a number of library concepts are defined in terms of CPO calls being valid (like Range and Swappable), correctly constrained generic code must use such CPOs. Of course, if you know the concrete type and another way to get an iterator out of it, feel free.
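For example, a minimal sketch of constrained generic code (first_element is a made-up function) that goes through the CPO rather than through an unqualified begin:

#include <ranges>

template<std::ranges::input_range R>
auto first_element(R&& r) {
    // The CPO performs the concept checks and then dispatches to a member
    // begin or an ADL begin, whichever applies.
    auto it = std::ranges::begin(r);
    return *it;   // precondition: r is not empty
}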
Things like ranges::cbegin are CPOs without the "CP" part. They always do the default thing, so it's not much of a customization point. Similarly, range adaptor objects are CPOs but there's nothing customizable about them. Classifying them as CPOs is more of a matter of consistency (for cbegin) or specification convenience (adaptors).
Finally, things like ranges::all_of are quasi-CPOs or niebloids. They are specified as function templates with special magical ADL-blocking properties and weasel wording to allow them to be implemented as function objects instead. This is primarily to prevent ADL picking up the unconstrained overload in namespace std when a constrained algorithm in std::ranges is called unqualified. Because the std::ranges algorithm accepts iterator-sentinel pairs, it's usually less specialized than its std counterpart and loses overload resolution as a result.

Has the C++ standard committee considered templated namespaces?

Namespaces are in many ways like classes with no constructors, no destructors, no inheritance, declared final, and with only static methods and members. After all, this kind of class can essentially be used only the way namespaces are used: as a named scope for declarations and definitions.
... except that the above is not true, since classes can be templated - and namespaces cannot. There have been a couple of questions here on the site similar to "can I template a namespace", but what I'd like to know is - has the C++ standard committee ever considered a proposal to make namespaces templatable? If it has, was the proposal rejected? If it was, what were the reasons?
The inability to have a template namespace is actually just one way in which they differ from classes. Others would be things like new namespace, and sizeof (namespace) - how could a compiler implement that, given that a namespace may extend over many compilation units?
Looking just at template namespaces... While it can at times be hard to keep up with all the proposals for new C++ features, I don't recall ever seeing one that attempted to add a feature such as you describe.
Would it ever be considered, assuming someone were to write a proposal? As Stroustrup indicates in this interview (http://www.stroustrup.com/devXinterview.html):
For C++ to remain viable for decades to come, it is essential that
Standard C++ isn't extended to support every academic and commercial
fad. Most language facilities that people ask for can be adequately
addressed through libraries using only current C++ facilities.
As you indicate yourself, what you are asking for is basically already there: just use a templated class with static members. This seems to disqualify it as a potential new feature, at least in the eyes of Stroustrup.
How would ADL work if namespaces can be templated? Are we supposed to create special template deduction rules for ADL then?
More importantly, can you justify the added complexity to the language by demonstrating a use case that can't be filled by just making a template struct with only static members? If a template namespace is just a gimped template struct, that doesn't seem very compelling.
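For reference, a minimal sketch of that workaround (the names math, zero and square are made up for illustration):

// Emulating a "templated namespace" with a class template that has
// only static members and cannot be instantiated.
template<class T>
struct math {
    math() = delete;                        // just a named scope, not an object type
    static constexpr T zero{};
    static constexpr T square(T x) { return x * x; }
};

static_assert(math<int>::square(3) == 9);   // used much like a qualified namespace name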
Also, I understand you weren't satisfied with the other questions about namespace / template hybrids, but one point in this answer seems to be relevant to your question:
Why can't namespaces be template parameters?
Possibly difficult: A namespace isn't a complete, self-contained entity. Different members of a namespace can be declared in different headers and even different compilation units.
If a namespace is a template, how will this even work? Can you still "reopen" the namespace like you can with a regular namespace? If that's allowed, then what is the point of instantiation of the namespace?
It sounds like it could potentially be extremely complicated.
Also: Will the language still be easily parsable after your proposed feature?
One of the most vexing things in C++ is the need to write the template keyword when defining templates that refer to other templates, in order to resolve an ambiguity in the grammar regarding whether < is a less-than operator or the start of a template argument list.
3.4.5 [basic.lookup.classref]
In a class member access expression (5.2.5), if the . or -> token is immediately followed by an identifier followed by a <, the identifier must be looked up to determine whether the < is the beginning of a template argument list (14.2) or a less-than operator. The identifier is first looked up in the class of the object expression. If the identifier is not found, it is then looked up in the context of the entire postfix-expression and shall name a class template. If the lookup in the class of the object expression finds a template, the name is also looked up in the context of the entire postfix-expression and
— if the name is not found, the name found in the class of the object expression is used, otherwise
— if the name is found in the context of the entire postfix-expression and does not name a class template, the name found in the class of the object expression is used, otherwise
— if the name found is a class template, it shall refer to the same entity as the one found in the class of the object expression, otherwise the program is ill-formed.
If namespaces can be templates, don't we have to write template for them too, whenever we refer to a template after a :: operator? For the same reason, foo::bar < 1 ... could be a namespace template bar inside of template foo with a non-type template parameter, or it could be a comparison of 1 with int foo::bar.
How do we disambiguate between that and the third possibility, where foo is a namespace and bar is a regular class template inside of it?
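For comparison, this is the analogous situation that already exists today with dependent class templates, where the template keyword is required to disambiguate (host, bar and probe are made-up names):

template<class T>
int probe() {
    // Without 'template', the '<' would be parsed as a less-than operator.
    return T::template bar<1>::value;
}

struct host {
    template<int N>
    struct bar { static constexpr int value = N; };
};

int x = probe<host>();   // x == 1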

Is std::vector<T> a `user-defined type`?

In 17.6.4.2.1/1 and 17.6.4.2.1/2 of the current draft standard restrictions are placed on specializations injected by users into namespace std.
The behavior of a C++ program is undefined if it adds declarations or definitions to namespace std or to a namespace within namespace std unless otherwise specified. A program may add a template specialization for any standard library template to namespace std only if the declaration depends on a user-defined type and the specialization meets the standard library requirements for the original template and is not explicitly prohibited.
I cannot find where in the standard the phrase user-defined type is defined.
One option I have heard claimed is that a type for which std::is_fundamental is false is a user-defined type, in which case std::vector<int> would be a user-defined type.
An alternative answer would be that a user-defined type is a type that a user defines. As users do not define std::vector<int>, and std::vector<int> is not dependent on any type a user defines, std::vector<int> is not a user-defined type.
A practical problem this impacts is: can you inject a specialization of std::hash for std::tuple<Ts...> into namespace std? Being able to do so is somewhat convenient -- the alternative is to create another namespace where we recursively build our hash for std::tuple (and possibly other types in std that do not have hash support), and if and only if we fail to find a hash in that namespace do we fall back on std.
However, if this is legal, then if and when the standard adds a hash specialization for std::tuple to namespace std, code that specialized it already would be broken, creating a reason not to add such specializations in the future.
While I am talking about std::vector<int> as a concrete example, I am trying to ask if types defined in std are ever user-defined types. A secondary question is, even if not, maybe std::tuple<int> becomes a user-defined type when used by a user (this gets slippery: what then happens if something inside std defines std::tuple<int>, and you partial-specialize hash for std::tuple<Ts...>?).
There is currently an open defect on this problem.
Prof. Stroustrup is very clear that any type that is not built-in is user-defined. See the second paragraph of section 9.1 in Programming Principles and Practice Using C++.
He even specifically calls out “standard library types” as an example of user-defined types. In other words, a user-defined type is any compound type.
Source
The article explicitly mentions that not everyone seems to agree, but this is IMHO mostly wishful thinking and not what the standard (and Prof. Stroustrup) are actually saying, only what some people want to read into it.
When Clause 17 says "user-defined" it means "a type not defined in the standard" so std::vector<int> is not user-defined, neither is std::string, so you cannot specialize std::vector<int> or std::vector<std::string>. On the other hand, struct MyClass is user-defined, because it's not a type defined in the standard, so you can specialize std::vector<MyClass>.
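To make that reading concrete, here is a sketch of what it allows and forbids, using a made-up MyClass:

#include <cstddef>
#include <functional>

struct MyClass { int id; };   // program-defined, outside the standard library

namespace std {
    // OK under this reading: the specialization depends on a type the user defined.
    template<>
    struct hash<MyClass> {
        size_t operator()(const MyClass& m) const noexcept {
            return hash<int>{}(m.id);
        }
    };
}

// Not OK under this reading: std::hash<std::vector<int>> would depend only on
// standard library types, so it may not be added to namespace std.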
This is not the same meaning of "user-defined" used in clauses 1-16, and that difference is confusing and silly. There is a defect report for this, with some discussion recorded that basically says "yes, the library uses the wrong term, but we don't have a better one".
So the answer to your question is "it depends". If you're talking to a C++ compiler implementor or a core language expert, std::vector<int> is definitely a user-defined type, but if you're talking to a standard library implementor, it is not. More precisely, it's not user-defined for the purposes of 17.6.4.2.1.
One way to look at it is that the standard library is "user code" as far as the core language is concerned. But the standard library has a different idea of "users" and considers itself to be part of the implementation, and only things that aren't part of the library are "user-defined".
Edit: I have proposed changing the library Clauses to use a new term, "program-defined", which means something defined in your program (as opposed to UDTs defined in the standard, such as std::string).
As users do not define std::vector<int>, and std::vector<int> is not dependent on any type a user defines, std::vector<int> is not a user-defined type.
The logical counter-argument is that users do define std::vector<int>. You see, std::vector is a class template, and as such has no direct representation in binary code.
In a sense it gets its binary representation through the instantiation of a type, so the very action of declaring a std::vector<int> object is what gives "soul" to the template (pardon the phrasing). In a program where no one uses a std::vector<int>, this data type does not exist.
On the other hand, following the same argument, std::vector<T> is not a user-defined type; it is not even a type, it does not exist. Only if we choose to instantiate a type will it mandate how a structure is laid out, but until then we can only argue about it in terms of structure, design, properties and so on.
Note
The above argument (about templates being not code but ... well, templates for code) may seem a bit superficial, but it draws its logic from Scott Meyers' introduction to A. Alexandrescu's book Modern C++ Design. The relevant quote there goes like this:
Eventually, Andrei turned his attention to the development of template-based implementations of popular language idioms and design patterns, especially the GoF[*] patterns. This led to a brief skirmish with the Patterns community, because one of their fundamental tenets is that patterns cannot be represented in code. Once it became clear that Andrei was automating the generation of pattern implementations rather than trying to encode patterns themselves, that objection was removed, and I was pleased to see Andrei and one of the GoF (John Vlissides) collaborate on two columns in the C++ Report focusing on Andrei's work.
The draft standard contrasts fundamental types with user-defined types in a couple of (non-normative) places.
The draft standard also uses the term "user-defined" in other contexts, referring to entities created by the programmer or defined in the standard library. Examples include user-defined constructor, user-defined operator and user-defined conversion.
These facts allow us, absent other evidence, to tentatively assume that the intent of the standard is that user-defined type should mean compound type, according to historical usage. Only an explicit clarification in a future standard document can definitely resolve the issue.
Note that the historical usage is not clear on types like int* or struct foo* or void(*)(struct foo****). They are compound, but should they (or some of them) be considered user-defined?