Template partial ordering - why does partial deduction succeed here - c++

Consider the following simple (to the extent that template questions ever are) example:
#include <iostream>
template <typename T>
struct identity;
template <>
struct identity<int> {
using type = int;
};
template<typename T> void bar(T, T ) { std::cout << "a\n"; }
template<typename T> void bar(T, typename identity<T>::type) { std::cout << "b\n"; }
int main ()
{
bar(0, 0);
}
Both clang and gcc print "a" there. According to the rules in [temp.deduct.partial] and [temp.func.order], to determine partial ordering, we need to synthesize some unique types. So we have two attempts at deduction:
+---+-------------------------------+-------------------------------------------+
| | Parameters | Arguments |
+---+-------------------------------+-------------------------------------------+
| a | T, typename identity<T>::type | UniqueA, UniqueA |
| b | T, T | UniqueB, typename identity<UniqueB>::type |
+---+-------------------------------+-------------------------------------------+
For deduction on "b", according to Richard Corden's answer, the expression typename identity<UniqueB>::type is treated as a type and is not evaluated. That is, this will be synthesized as if it were:
+---+-------------------------------+--------------------+
| | Parameters | Arguments |
+---+-------------------------------+--------------------+
| a | T, typename identity<T>::type | UniqueA, UniqueA |
| b | T, T | UniqueB, UniqueB_2 |
+---+-------------------------------+--------------------+
It's clear that deduction on "b" fails. Those are two different types so you cannot deduce T to both of them.
However, it seems to me that the deduction on "a" should fail too. For the first argument, you'd match T == UniqueA. The second argument is a non-deduced context, so wouldn't that deduction succeed iff UniqueA were convertible to identity<UniqueA>::type? The latter is a substitution failure, so I don't see how this deduction could succeed either.
How and why do gcc and clang prefer the "a" overload in this scenario?

As discussed in the comments, I believe there are several aspects of the function template partial ordering algorithm that are unclear or not specified at all in the standard, and this shows in your example.
To make things even more interesting, MSVC (I tested 12 and 14) rejects the call as ambiguous. I don't think there's anything in the standard to conclusively prove which compiler is right, but I think I might have a clue about where the difference comes from; there's a note about that below.
Your question (and this one) challenged me to do some more investigation into how things work. I decided to write this answer not because I consider it authoritative, but rather to organize the information I have found in one place (it wouldn't fit in comments). I hope it will be useful.
First, the proposed resolution for issue 1391. We discussed it extensively in comments and chat. I think that, while it does provide some clarification, it also introduces some issues. It changes [14.8.2.4p4] to the following (the new text is the second sentence):
Each type nominated above from the parameter template and the corresponding type from the argument template are used as the types of P and A. If a particular P contains no template-parameters that participate in template argument deduction, that P is not used to determine the ordering.
Not a good idea in my opinion, for several reasons:
If P is non-dependent, it doesn't contain any template parameters at all, so it doesn't contain any that participate in argument deduction either, which would make the new sentence apply to it. However, that would make template<class T> void f(T, int) and template<class T, class U> void f(T, U) unordered, which doesn't make sense. This is arguably a matter of interpretation of the wording, but it could cause confusion.
It messes with the notion of "used to determine the ordering", which affects [14.8.2.4p11]. This makes template<class T> void f(T) and template<class T> void f(typename A<T>::a) unordered (deduction succeeds from the first to the second, because T is not used in a type used for partial ordering according to the new rule, so it can remain without a value). Currently, all compilers I've tested report the second as more specialized.
It would make #2 more specialized than #1 in the following example:
#include <iostream>
template<class T> struct A { using a = T; };
struct D { };
template<class T> struct B { B() = default; B(D) { } };
template<class T> struct C { C() = default; C(D) { } };
template<class T> void f(T, B<T>) { std::cout << "#1\n"; } // #1
template<class T> void f(T, C<typename A<T>::a>) { std::cout << "#2\n"; } // #2
int main()
{
f<int>(1, D());
}
(#2's second parameter is not used for partial ordering, so deduction succeeds from #1 to #2 but not the other way around). Currently, the call is ambiguous, and should arguably remain so.
After looking at Clang's implementation of the partial ordering algorithm, here's how I think the standard text could be changed to reflect what actually happens.
Leave [p4] as it is and add the following between [p8] and [p9]:
For a P / A pair:
- If P is non-dependent, deduction is considered successful if and only if P and A are the same type.
- Substitution of deduced template parameters into the non-deduced contexts appearing in P is not performed and does not affect the outcome of the deduction process.
- If template argument values are successfully deduced for all template parameters of P except the ones that appear only in non-deduced contexts, then deduction is considered successful (even if some parameters used in P remain without a value at the end of the deduction process for that particular P / A pair).
Notes:
- About the second bullet point: [14.8.2.5p1] talks about finding template argument values that will make P, after substitution of the deduced values (call it the deduced A), compatible with A. This could cause confusion about what actually happens during partial ordering; there's no substitution going on.
- MSVC doesn't seem to implement the third bullet point in some cases. See the next section for details.
- The second and third bullet points are intended to also cover cases where P has forms like A<T, typename U::b>, which aren't covered by the wording in issue 1391.
Change the current [p10] to:
Function template F is at least as specialized as function template G if and only if:
- for each pair of types used to determine the ordering, the type from F is at least as specialized as the type from G, and,
- when performing deduction using the transformed F as the argument template and G as the parameter template, after deduction is done for all pairs of types, all template parameters used in the types from G that are used to determine the ordering have values, and those values are consistent across all pairs of types.
F is more specialized than G if F is at least as specialized as G and G is not at least as specialized as F.
Make the entire current [p11] a note.
(The note added by the resolution of 1391 to [14.8.2.5p4] needs to be adjusted as well - it's fine for [14.8.2.1], but not for [14.8.2.4].)
For MSVC, in some cases, it looks like all template parameters in P need to receive values during deduction for that specific P / A pair in order for deduction to succeed from A to P. I think this could be what causes implementation divergence in your example and others, but I've seen at least one case where the above doesn't seem to apply, so I'm not sure what to believe.
Another example where the statement above does seem to apply: changing template<typename T> void bar(T, T) to template<typename T, typename U> void bar(T, U) in your example swaps results around: the call is ambiguous in Clang and GCC, but resolves to b in MSVC.
One example where it doesn't:
#include <iostream>
template<class T> struct A { using a = T; };
template<class, class> struct B { };
template<class T, class U> void f(B<U, T>) { std::cout << "#1\n"; }
template<class T, class U> void f(B<U, typename A<T>::a>) { std::cout << "#2\n"; }
int main()
{
f<int>(B<int, int>());
}
This selects #2 in Clang and GCC, as expected, but MSVC rejects the call as ambiguous; no idea why.
The partial ordering algorithm as described in the standard speaks of synthesizing a unique type, value, or class template in order to generate the arguments. Clang manages that by... not synthesizing anything. It just uses the original forms of the dependent types (as declared) and matches them both ways. This makes sense, as substituting the synthesized types doesn't add any new information. It can't change the forms of the A types, since there's generally no way to tell what concrete types the substituted forms could resolve to. The synthesized types are unknown, which makes them pretty similar to template parameters.
When encountering a P that is a non-deduced context, Clang's template argument deduction algorithm simply skips it, by returning "success" for that particular step. This happens not only during partial ordering, but for all types of deductions, and not just at the top level in a function parameter list, but recursively whenever a non-deduced context is encountered in the form of a compound type. For some reason, I found that surprising the first time I saw it. Thinking about it, it does, of course, make sense, and is in accordance with the standard ([...] does not participate in type deduction [...] in [14.8.2.5p4]).
This is consistent with Richard Corden's comments to his answer, but I had to actually see the compiler code to understand all the implications (not a fault of his answer, but rather of my own - programmer thinking in code and all that).
I've included some more information about Clang's implementation in this answer.

I believe the key is with the following statement:
The second argument is a non-deduced context - so wouldn't that deduction succeed iff UniqueA were convertible to identity<UniqueA>::type?
Type deduction does not perform checking of "conversions". Those checks take place using the real explicit and deduced arguments as part of overload resolution.
This is my summary of the steps that are taken to select the function template to call (all references taken from N3937, roughly C++14):
1. Explicit arguments are substituted and the resulting function type is checked for validity. (14.8.2/2)
2. Type deduction is performed and the resulting deduced arguments are substituted. Again, the resulting type must be valid. (14.8.2/5)
3. The function templates that succeeded in steps 1 and 2 are specialized and included in the overload set for overload resolution. (14.8.3/1)
4. Conversion sequences are compared by overload resolution. (13.3.3)
5. If the conversion sequences of two function specializations are not 'better', the partial ordering algorithm is used to find the more specialized function template. (13.3.3)
6. The partial ordering algorithm checks only that type deduction succeeds. (14.5.6.2/2)
The compiler already knows by step 4 that both specializations can be called when the real arguments are used. Steps 5 and 6 are being used to determine which of the functions is more specialized.

Related

Ambiguous operator overload in Clang

Consider the following:
template<typename T>
struct C {};
template<typename T, typename U>
void operator +(C<T>&, U);
struct D: C<D> {};
struct E {};
template<typename T>
void operator +(C<T>&, E);
void F() { D d; E e; d + e; }
This code compiles fine on both GCC-7 and Clang-5. The selected overload for operator + is the one taking E as its second parameter.
Now, if the following change takes place:
/* Put `operator +` inside the class. */
template<typename T>
struct C {
template<typename U>
void operator +(U);
};
that is, if operator + is defined inside the class template, instead of outside, then Clang yields ambiguity between both operator +s present in the code. GCC still compiles fine.
Why does this happen? Is this a bug in either GCC or Clang?
This is a bug in gcc; specifically, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53499 .
The problem is that gcc is regarding the implicit object parameter of a class template member function as having a dependent type; that is, during function template partial ordering gcc transforms
C<D>::template<class U> void operator+(U); // #1
into
template<class T, class U> void operator+(C<T>&, U); // #1a (gcc, wrong)
when it should be transformed into
template<class U> void operator+(C<D>&, U); // #1b (clang, correct)
We can see that when compared to your
template<class T> void operator+(C<T>&, E); // #2
#2 is better than the erroneous #1a, but is ambiguous with #1b.
Observe that gcc incorrectly accepts even when C<D> is not a template at all - i.e., when C<D> is a class template full specialization:
template<class> struct C;
struct D;
template<> struct C<D> {
// ...
};
This is covered by [temp.func.order]/3, with clarification in the example. Note that, again, gcc miscompiles that example, incorrectly rejecting it, for the same reason.
Edit: The original version of this answer said that GCC was correct. I now believe that Clang is correct according to the wording of the standard, but I can see how GCC's interpretation could also be correct.
Let's look at your first example, where the two declarations are:
template<typename T, typename U>
void operator +(C<T>&, U);
template<typename T>
void operator +(C<T>&, E);
Both are viable, but it is obvious that the second template is more specialized than the first. So GCC and Clang both resolve the call to the second template. But let's walk through [temp.func.order] to see why, in the wording of the standard, the second template is more specialized.
The partial ordering rules tell us to replace each type template parameter with a unique synthesized type and then perform deduction against the other template. Under this scheme, the first overload type becomes
void(C<X1>&, X2)
and deduction against the second template fails since the latter only accepts E. The second overload type becomes
void(C<X3>&, E)
and deduction against the first template succeeds (with T = X3 and U = E). Since the deduction succeeded in only one direction, the template that accepted the other's transformed type (the first one) is considered less specialized, and thus, the second overload is chosen as the more specialized one.
When the second overload is moved into class C, both overloads are still found and the overload resolution process should apply in exactly the same way. First, the argument list is constructed for both overloads, and since the first overload is a non-static class member, an implied object parameter is inserted. According to [over.match.funcs], the type of that implied object parameter should be "lvalue reference to C<T>" since the function does not have a ref-qualifier. So the two argument lists are both (C<D>&, E). Since this fails to effect a choice between the two overloads, the partial ordering test kicks in again.
The partial ordering test, described in [temp.func.order], also inserts an implied object parameter:
If only one of the function templates M is a non-static member of some class A, M is considered to have a new first parameter inserted in its function parameter list. Given cv as the cv-qualifiers of M (if any), the new parameter is of type “rvalue reference to cv A” if the optional ref-qualifier of M is && or if M has no ref-qualifier and the first parameter of the other template has rvalue reference type. Otherwise, the new parameter is of type “lvalue reference to cv A”. [ Note: This allows a non-static member to be ordered with respect to a non-member function and for the results to be equivalent to the ordering of two equivalent non-members. — end note ]
This is the step where, presumably, GCC and Clang take different interpretations of the standard.
My take: The member operator+ has already been found in the class C<D>. The template parameter T for the class C is not being deduced; it is known because the name lookup process entered the concrete base class C<D> of D. The actual operator+ that is submitted to partial ordering therefore does not have a free T parameter; it is not void operator+(C<T>&, U), but rather, void operator+(C<D>&, U).
Thus, for the member overload, the transformed function type should not be void(C<X1>&, X2), but rather void(C<D>&, X2). For the non-member overload, the transformed function type is still void(C<X3>&, E) as before. But now we see that void(C<D>&, X2) is not a match for the non-member template void(C<T>&, E) nor is void(C<X3>&, E) a match for the member template void(C<D>&, U). So partial ordering fails, and overload resolution returns an ambiguous result.
GCC's decision to continue to select the non-member overload makes sense if you assume that it is constructing the transformed function type for the member lexically, making it still void(C<X1>&, X2), while Clang substitutes D into the template, leaving only U as a free parameter, before beginning the partial ordering test.

Should the following code compile according to C++ standard?

#include <type_traits>
template <typename T>
struct C;
template<typename T1, typename T2>
using first = T1;
template <typename T>
struct C<first<T, std::enable_if_t<std::is_same<T, int>::value>>>
{
};
int main ()
{
}
Results of compilation by different compilers:
MSVC:
error C2753: 'C': partial specialization cannot match argument list for primary template
gcc-4.9:
error: partial specialization 'C' does not specialize any template arguments
clang all versions:
error: class template partial specialization does not specialize any template argument; to define the primary template, remove the template argument list
gcc-5+:
successfully compiles
Additionally, I want to point out that a trivial specialization like:
template<typename T>
struct C<T>
{
};
is correctly rejected by gcc. So it seems that gcc figures out that the specialization in my original example is non-trivial. So my question is: is a pattern like this explicitly forbidden by the C++ standard or not?
The crucial paragraph is [temp.class.spec]/(8.2), which requires the partial specialization to be more specialized than the primary template. What Clang actually complains about is the argument list being identical to the primary template's: this has been removed from [temp.class.spec]/(8.3) by issue 2033 (which stated that the requirement was redundant) fairly recently, so hasn't been implemented in Clang yet. However, it apparently has been implemented in GCC, given that it accepts your snippet; it even compiles the following, perhaps for the same reason it compiles your code (it also only works from version 5 onwards):
template <typename T>
void f( C<T> ) {}
template <typename T>
void f( C<first<T, std::enable_if_t<std::is_same<T, int>::value>>> ) {}
I.e. it acknowledges that the declarations are distinct, so must have implemented some resolution of issue 1980. It does not find that the second overload is more specialized (see the Wandbox link), however, which is inconsistent, because it should've diagnosed your code according to the aforementioned constraint in (8.2).
Arguably, the current wording makes your example's partial ordering work as desired†: [temp.deduct.type]/1 mentions that in deduction from types,
Template arguments can be deduced in several different contexts, but in each case a type that is specified in terms of template parameters (call it P) is compared with an actual type (call it A), and an attempt is made to find template argument values […] that will make P, after substitution of the deduced values (call it the deduced A), compatible with A.
Now via [temp.alias]/3, this would mean that during the partial ordering step in which the partial specialization's function template is the parameter template, the substitution into is_same yields false (since common library implementations just use a partial specialization that must fail), and enable_if fails.‡ But this semantics is not satisfying in the general case, because we could construct a condition that generally succeeds, so a unique synthesized type meets it, and deduction succeeds both ways.
Presumably, the simplest and most robust solution is to ignore discarded arguments during partial ordering (making your example ill-formed). One can also look to implementations' behavior in this case (analogous to issue 1157):
#include <type_traits>
template <typename...> struct C {};
template <typename T>
void f( C<T, int> ) = delete;
template <typename T>
void f( C<T, std::enable_if_t<sizeof(T) == sizeof(int), int>> ) {}
int main() {f<int>({});}
Both Clang and GCC diagnose this as calling the deleted function, i.e. agree that the first overload is more specialized than the other. The critical property of #2 seems to be that the second template argument is dependent yet T appears solely in non-deduced contexts (if we change int to T in #1, nothing changes). So we could use the existence of discarded (and dependent?) template arguments as tie-breakers: this way we don't have to reason about the nature of synthesized values, which is the status quo, and also get reasonable behavior in your case, which would be well-formed.
† @T.C. mentioned that the templates generated through [temp.class.order] would currently be interpreted as one multiply declared entity; again, see issue 1980. That's not directly relevant to the standardese in this case, because the wording never mentions that these function templates are declared, let alone in the same program; it just specifies them and then falls back to the procedure for function templates.
‡ It isn't entirely clear with what depth implementations are required to perform this analysis. Issue 1157 demonstrates what level of detail is required to "correctly" determine whether a template's domain is a proper subset of the other's. It's neither practical nor reasonable to implement partial ordering to be this sophisticated. However, the footnoted section just goes to show that this topic isn't necessarily underspecified, but defective.
I think you could simplify your code; this has nothing to do with type_traits. You'll get the same results with the following:
template <typename T>
struct C;
template<typename T>
using first = T;
template <typename T>
struct C<first<T>> // OK only in 5.1
{
};
int main ()
{
}
Check in online compiler (compiles under 5.1 but not with 5.2 or 4.9 so it's probably a bug) - https://godbolt.org/g/iVCbdm
I think that in GCC 5 they reworked the template machinery, and it's even possible to create two specializations of the same type. It will compile until you try to use it.
template <typename T>
struct C;
template<typename T1, typename T2>
using first = T1;
template<typename T1, typename T2>
using second = T2;
template <typename T>
struct C<first<T, T>> // OK on 5.1+
{
};
template <typename T>
struct C<second<T, T>> // OK on 5.1+
{
};
int main ()
{
C<first<int, int>> dummy; // error: ambiguous template instantiation for 'struct C<int>'
}
https://godbolt.org/g/6oNGDP
It might be somehow related to added support for C++14 variable templates. https://isocpp.org/files/papers/N3651.pdf

Is this "if e is a pack, then get a template name, otherwise get a variable name" valid or not?

I have tried to construct a case that requires neither typename nor template, but still yields a variable or a template depending on whether a given name t is a function parameter pack or not:
template<typename T> struct A { template<int> static void f(int) { } };
template<typename...T> struct A<void(T...,...)> { static const int f = 0; };
template<typename> using type = int;
template<typename T> void f(T t) { A<void(type<decltype(t)>...)>::f<0>(1); }
int main() {
f(1);
}
The above will refer to the static const int and do a comparison. The following just changes T t to be a pack, which makes f refer to a template, but GCC rejects both:
template<typename ...T> void f(T ...t) { A<void(type<decltype(t)>...)>::f<0>(1); }
int main() {
f(1, 2, 3);
}
GCC complains for the first
main.cpp:5:68: error: incomplete type 'A<void(type<decltype (t)>, ...)>' used in nested name specifier
template<typename T> void f(T t) { A<void(type<decltype(t)>...)>::f<0>(1); }
And for the second
main.cpp:5:74: error: invalid operands of types '<unresolved overloaded function type>' and 'int' to binary 'operator<'
template<typename ...T> void f(T ...t) { A<void(type<decltype(t)>...)>::f<0>(1); }
I have multiple questions
Does the above code work according to the language, or is there an error?
Since Clang accepts both variants but GCC rejects, I wanted to ask what compiler is correct?
If I remove the body of the primary template, then for the f(1, 2, 3) case, Clang complains
main.cpp:5:42: error: implicit instantiation of undefined template 'A<void (int)>'
Please note that it says A<void (int)>, while I would expect A<void (int, int, int)>. How does this behavior occur? Is this a bug in my code (i.e., is it ill-formed), or is it a bug in Clang? I seem to remember a defect report about the order of pack expansion vs. the substitution of alias templates; is that relevant, and does it render my code ill-formed?
Expanding a parameter pack either should, or does, make an expression type dependent, regardless of whether the things expanded are type dependent.
If it did not, there would be a gaping hole in the type dependency rules of C++ and it would be a defect in the standard.
So A<void(type<decltype(t)>...)>::f when t is a pack, no matter what tricks you pull in the void( here ) parts to unpack the t, should be a dependent type, and template is required before the f if it is a template.
In the case where t is not a pack, it is intended that type<decltype(t)> not be dependent (See http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#1390), but the standard may or may not agree at this point (I think not?)
If compilers did "what the committee intended", then when t is not a pack:
A<void(type<decltype(t)>...)>::f<0>(1)
could mean
A<void(int...)>::f<0>(1)
which is
A<void(int, ...)>::f<0>(1)
and if f is a template (your code makes it an int, but I think swapping the two should work) this would be fine. But the standard apparently currently disagrees?
So if http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#1390 was implemented, then you could swap your two A specializations. The void(T...,...) specialization should have a template<int> void f(int), and the T specialization should have a static const int.
Now in the case where A<> is dependent (on the size of a pack), ::f is an int and does not need template. In the case where A<> is not dependent, ::f is a template but does not need disambiguation.
We can replace the type<decltype(t)>... with:
decltype(sizeof(decltype(t)*))...
and sizeof(decltype(t)*) is of non-dependent type (it is std::size_t), decltype gives us a std::size_t, and the ... is treated as an old-school ... arg. This means void(std::size_t...) becomes a non-dependent type, so A<void(std::size_t...)> is not dependent, so ::f being a template is not a template in a dependent context.
In the case where t is a parameter pack with one element
decltype(sizeof(decltype(t)*))...
becomes
std::size_t
but in a dependent context (one copy per element in t pack). So we get
A<void(std::size_t)>::f
which is presumed to be a scalar value, so
A<void(std::size_t)>::f<0>(1)
becomes an expression evaluating to false.
(Chain of logic generated in a discussion with Johannes in comments in original question).
Your second case is ill-formed; A<void(type<decltype(t)>...)>::f<0>(1) should be
A<void(type<decltype(t)>...)>::template f<0>(1)
// ~~~~~~~~~
For the first case, both compilers are behaving incorrectly; this was considered sufficiently confusing that CWG 1520 was raised to query the correct behavior; the conclusion was that pack expansion should be applied before alias substitution:
The latter interpretation (a list of specializations) is the correct interpretation; a parameter pack can't be substituted into anything, including an alias template specialization. CWG felt that this is clear enough in the current wording.
This is reminiscent of CWG 1558 (alias templates and SFINAE), which was fixed for C++14, but per the above even C++11 compilers are expected to get this correct, so it is disappointing that gcc and clang get it wrong (though in fairness they do behave correctly in simpler cases, including the motivating example in CWG 1520). Note that MSVC had a similar bug till recently; it is fixed in VS2015.
Your code (only in the first case) is correct; but as a workaround, you could alter your alias template to use and discard its template parameter, fixing your program for both compilers - of course that means that your CWG 1390 exploit will cease to be valid:
template<typename T> using type = decltype(((int(*)(T*))(0))(0)); // int
However, I don't think your CWG 1390 trick can work as presented, since even though the expansion-substitution of type<decltype(t)>... is not dependent on the types of t..., it is dependent on their number:
template<typename T> struct A { template<int> static void f(int) {} };
template<> struct A<void(int, int, int)> { static const int f = 0; };
As Yakk points out, it can be made to work if you swap the member function template and data member, since a data member is OK in dependent context.

Are the template partial ordering rules underspecified? [duplicate]

Consider the following simple (to the extent that template questions ever are) example:
#include <iostream>
template <typename T>
struct identity;
template <>
struct identity<int> {
using type = int;
};
template<typename T> void bar(T, T ) { std::cout << "a\n"; }
template<typename T> void bar(T, typename identity<T>::type) { std::cout << "b\n"; }
int main ()
{
bar(0, 0);
}
Both clang and gcc print "a" there. According to the rules in [temp.deduct.partial] and [temp.func.order], to determine partial ordering, we need to synthesize some unique types. So we have two attempts at deduction:
+---+-------------------------------+-------------------------------------------+
| | Parameters | Arguments |
+---+-------------------------------+-------------------------------------------+
| a | T, typename identity<T>::type | UniqueA, UniqueA |
| b | T, T | UniqueB, typename identity<UniqueB>::type |
+---+-------------------------------+-------------------------------------------+
For deduction on "b", according to Richard Corden's answer, the expression typename identity<UniqueB>::type is treated as a type and is not evaluated. That is, this will be synthesized as if it were:
+---+-------------------------------+--------------------+
| | Parameters | Arguments |
+---+-------------------------------+--------------------+
| a | T, typename identity<T>::type | UniqueA, UniqueA |
| b | T, T | UniqueB, UniqueB_2 |
+---+-------------------------------+--------------------+
It's clear that deduction on "b" fails. Those are two different types so you cannot deduce T to both of them.
However, it seems to me that the deduction on A should fail. For the first argument, you'd match T == UniqueA. The second argument is a non-deduced context - so wouldn't that deduction succeed iff UniqueA were convertible to identity<UniqueA>::type? The latter is a substitution failure, so I don't see how this deduction could succeed either.
How and why do gcc and clang prefer the "a" overload in this scenario?
As discussed in the comments, I believe there are several aspects of the function template partial ordering algorithm that are unclear or not specified at all in the standard, and this shows in your example.
To make things even more interesting, MSVC (I tested 12 and 14) rejects the call as ambiguous. I don't think there's anything in the standard to conclusively prove which compiler is right, but I think I might have a clue about where the difference comes from; there's a note about that below.
Your question (and this one) challenged me to do some more investigation into how things work. I decided to write this answer not because I consider it authoritative, but rather to organize the information I have found in one place (it wouldn't fit in comments). I hope it will be useful.
First, the proposed resolution for issue 1391. We discussed it extensively in comments and chat. I think that, while it does provide some clarification, it also introduces some issues. It changes [14.8.2.4p4] to the following (the new text is the second sentence):
Each type nominated above from the parameter template and the corresponding type from the argument template are used as the types of P and A. If a particular P contains no template-parameters that participate in template argument deduction, that P is not used to determine the ordering.
Not a good idea in my opinion, for several reasons:
If P is non-dependent, it doesn't contain any template parameters at all, so it doesn't contain any that participate in argument deduction either, which would make the new sentence apply to it. However, that would make template<class T> void f(T, int) and template<class T, class U> void f(T, U) unordered, which doesn't make sense. This is arguably a matter of interpretation of the wording, but it could cause confusion.
It changes the meaning of "used to determine the ordering", which affects [14.8.2.4p11]. This makes template<class T> void f(T) and template<class T> void f(typename A<T>::a) unordered (deduction succeeds from the first to the second, because under the new rule T is not used in any type used for partial ordering, so it can remain without a value). Currently, all compilers I've tested report the second as more specialized.
It would make #2 more specialized than #1 in the following example:
#include <iostream>
template<class T> struct A { using a = T; };
struct D { };
template<class T> struct B { B() = default; B(D) { } };
template<class T> struct C { C() = default; C(D) { } };
template<class T> void f(T, B<T>) { std::cout << "#1\n"; } // #1
template<class T> void f(T, C<typename A<T>::a>) { std::cout << "#2\n"; } // #2
int main()
{
    f<int>(1, D());
}
(#2's second parameter is not used for partial ordering, so deduction succeeds from #1 to #2 but not the other way around). Currently, the call is ambiguous, and should arguably remain so.
After looking at Clang's implementation of the partial ordering algorithm, here's how I think the standard text could be changed to reflect what actually happens.
Leave [p4] as it is and add the following between [p8] and [p9]:
For a P / A pair:
- If P is non-dependent, deduction is considered successful if and only if P and A are the same type.
- Substitution of deduced template parameters into the non-deduced contexts appearing in P is not performed and does not affect the outcome of the deduction process.
- If template argument values are successfully deduced for all template parameters of P except the ones that appear only in non-deduced contexts, then deduction is considered successful (even if some parameters used in P remain without a value at the end of the deduction process for that particular P / A pair).
Notes:
- About the second bullet point: [14.8.2.5p1] talks about finding template argument values that will make P, after substitution of the deduced values (call it the deduced A), compatible with A. This could cause confusion about what actually happens during partial ordering; there's no substitution going on.
- MSVC doesn't seem to implement the third bullet point in some cases. See the next section for details.
- The second and third bullet points are intended to also cover cases where P has forms like A<T, typename U::b>, which aren't covered by the wording in issue 1391.
Change the current [p10] to:
Function template F is at least as specialized as function template G if and only if:
- for each pair of types used to determine the ordering, the type from F is at least as specialized as the type from G, and,
- when performing deduction using the transformed F as the argument template and G as the parameter template, after deduction is done for all pairs of types, all template parameters used in the types from G that are used to determine the ordering have values, and those values are consistent across all pairs of types.
F is more specialized than G if F is at least as specialized as G and G is not at least as specialized as F.
Make the entire current [p11] a note.
(The note added by the resolution of 1391 to [14.8.2.5p4] needs to be adjusted as well - it's fine for [14.8.2.1], but not for [14.8.2.4].)
For MSVC, in some cases, it looks like all template parameters in P need to receive values during deduction for that specific P / A pair in order for deduction to succeed from A to P. I think this could be what causes implementation divergence in your example and others, but I've seen at least one case where the above doesn't seem to apply, so I'm not sure what to believe.
Another example where the statement above does seem to apply: changing template<typename T> void bar(T, T) to template<typename T, typename U> void bar(T, U) in your example swaps results around: the call is ambiguous in Clang and GCC, but resolves to b in MSVC.
One example where it doesn't:
#include <iostream>
template<class T> struct A { using a = T; };
template<class, class> struct B { };
template<class T, class U> void f(B<U, T>) { std::cout << "#1\n"; }
template<class T, class U> void f(B<U, typename A<T>::a>) { std::cout << "#2\n"; }
int main()
{
    f<int>(B<int, int>());
}
This selects #2 in Clang and GCC, as expected, but MSVC rejects the call as ambiguous; no idea why.
The partial ordering algorithm as described in the standard speaks of synthesizing a unique type, value, or class template in order to generate the arguments. Clang manages that by... not synthesizing anything. It just uses the original forms of the dependent types (as declared) and matches them both ways. This makes sense, as substituting the synthesized types doesn't add any new information. It can't change the forms of the A types, since there's generally no way to tell what concrete types the substituted forms could resolve to. The synthesized types are unknown, which makes them pretty similar to template parameters.
When encountering a P that is a non-deduced context, Clang's template argument deduction algorithm simply skips it, by returning "success" for that particular step. This happens not only during partial ordering, but for all types of deductions, and not just at the top level in a function parameter list, but recursively whenever a non-deduced context is encountered in the form of a compound type. For some reason, I found that surprising the first time I saw it. Thinking about it, it does, of course, make sense, and it is in line with the standard ([...] does not participate in type deduction [...] in [14.8.2.5p4]).
This is consistent with Richard Corden's comments to his answer, but I had to actually see the compiler code to understand all the implications (not a fault of his answer, but rather of my own - programmer thinking in code and all that).
I've included some more information about Clang's implementation in this answer.
I believe the key lies in the following statement:
The second argument is a non-deduced context - so wouldn't that deduction succeed iff UniqueA were convertible to identity<UniqueA>::type?
Type deduction does not perform checking of "conversions". Those checks take place using the real explicit and deduced arguments as part of overload resolution.
This is my summary of the steps that are taken to select the function template to call (all references taken from N3937, ~ C++ '14):
1. Explicit template arguments are substituted and the resulting function type is checked for validity. (14.8.2/2)
2. Type deduction is performed and the deduced arguments are substituted. Again, the resulting type must be valid. (14.8.2/5)
3. The function templates that succeeded in steps 1 and 2 are specialized and included in the overload set for overload resolution. (14.8.3/1)
4. Conversion sequences are compared by overload resolution. (13.3.3)
5. If neither function specialization has better conversion sequences than the other, the partial ordering algorithm is used to find the more specialized function template. (13.3.3)
6. The partial ordering algorithm checks only that type deduction succeeds. (14.5.6.2/2)
The compiler already knows by step 4 that both specializations can be called when the real arguments are used. Steps 5 and 6 are being used to determine which of the functions is more specialized.

Type deduction for non-viable function templates

In his answer to this question and in the comments, Johannes Schaub says there's a "match error" when trying to do template type deduction for a function template that requires more arguments than have been passed:
template<class T>
void foo(T, int);
foo(42); // the template specialization foo<int>(int, int) is not viable
In the context of the other question, what's relevant is whether or not type deduction for the function template succeeds (and substitution takes place):
template<class T>
struct has_no_nested_type {};
// I think you need some specialization for which the following class template
// `non_immediate_context` can be instantiated, otherwise the program is
// ill-formed, NDR
template<>
struct has_no_nested_type<double>
{ using type = double; };
// make the error appear NOT in the immediate context
template<class T>
struct non_immediate_context
{
    using type = typename has_no_nested_type<T>::type;
};
template<class T>
typename non_immediate_context<T>::type
foo(T, int) { return {}; }
template<class T>
bool foo(T) { return {}; }
int main()
{
    foo(42); // well-formed? clang++3.5 and g++4.8.2 accept it
    foo<int>(42); // well-formed? clang++3.5 accepts it, but not g++4.8.2
}
When instantiating the first function template foo for T == int, the substitution produces an invalid type not in the immediate context of foo. This leads to a hard error (this is what the related question is about).
However, when letting foo deduce its template-argument, g++ and clang++ agree that no instantiation takes place. As Johannes Schaub explains, this is because there is a "match error".
Question: What is a "match error", and where and how is it specified in the Standard?
Alternative question: Why is there a difference between foo(42) and foo<int>(42) for g++?
What I've found / tried so far:
[over.match.funcs]/7 and [temp.over] seem to describe the overload resolution specifics for function templates. The latter seems to mandate the substitution of template parameters for foo.
Interestingly, [over.match.funcs]/7 triggers the process described in [temp.over] before checking for viability of the function template (specialization).
Similarly, type deduction does not take into account, say, default function arguments (other than making them a non-deduced context). It seems not to be concerned with viability, as far as I can tell.
Another possibly important aspect is how type deduction is specified. It acts on single function parameters, but I don't see where the distinction is made between parameter types that contain / are dependent on template parameters (like T const&) and those which aren't (like int).
Yet g++ makes a difference between explicitly specifying the template arguments (hard error) and letting them be deduced (deduction failure / SFINAE). Why?
What I've summarized below is the process described in 14.8.2.1p1:
Template argument deduction is done by comparing each function template parameter type (call it P) with the type of the corresponding argument of the call (call it A) as described below.
In our case, P is (T, int) and A is (int). For the first P/A pair, T against int, we can deduce T as int (by the process described in 14.8.2.5). But for the second "pair", we have int with no counterpart, so deduction cannot be done for this "pair".
Therefore, by 14.8.2.5p2, "If type deduction cannot be done for any P/A pair, ..., template argument deduction fails."
You then won't ever come to the point where you substitute template arguments into the function template.
This could probably all be described more precisely in the Standard (IMO), but I believe this is how one could implement things to match the actual behavior of Clang and GCC, and it seems a reasonable interpretation of the Standardese.