Consider this example:
template <typename T> inline constexpr bool C1 = true;
template <typename T> inline constexpr bool C2 = true;
template <typename T> requires C1<T> && C2<T>
constexpr int foo() { return 0; }
template <typename T> requires C1<T>
constexpr int foo() { return 1; }
constexpr int bar() {
return foo<int>();
}
Is the call foo<int>() ambiguous, or does the constraint C1<T> && C2<T> subsume C1<T>?
The call is ambiguous: neither of the declarations is "at least as constrained as" the other, because only concepts can be subsumed.
If, however, C1 and C2 were both concepts instead of inline constexpr bools, then the declaration of the foo() that returns 0 would be at least as constrained as the declaration of the foo() that returns 1, and the call to foo<int> would be valid and return 0. This is one reason to prefer to use concepts as constraints over arbitrary boolean constant expressions.
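For reference, here is a sketch of the concept-based variant just described; only the definitions of C1 and C2 change, and the call becomes unambiguous:
template <typename T> concept C1 = true;
template <typename T> concept C2 = true;
template <typename T> requires C1<T> && C2<T>
constexpr int foo() { return 0; }
template <typename T> requires C1<T>
constexpr int foo() { return 1; }
static_assert(foo<int>() == 0); // unambiguous: the first overload is more constrained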
Background
The reason for this difference (concepts subsume, arbitrary expressions do not) is best expressed in Semantic constraint matching for concepts, which is worth reading in full (I will not reproduce all the arguments here). But taking an example from the paper:
namespace X {
template<C1 T> void foo(T);
template<typename T> concept Fooable = requires (T t) { foo(t); };
}
namespace Y {
template<C2 T> void foo(T);
template<typename T> concept Fooable = requires (T t) { foo(t); };
}
X::Fooable is equivalent to Y::Fooable despite the two meaning completely different things (by virtue of being defined in different namespaces). This kind of incidental equivalence is problematic: an overload set with functions constrained by these two concepts would be ambiguous.
That problem is exacerbated when one concept incidentally refines the others.
namespace Z {
template<C3 T> void foo(T);
template<C3 T> void bar(T);
template<typename T> concept Fooable = requires (T t) {
foo(t);
bar(t);
};
}
An overload set containing distinct viable candidates constrained by X::Fooable, Y::Fooable, and Z::Fooable respectively will always select the candidate constrained by Z::Fooable. This is almost certainly not what a programmer wants.
Standard References
The subsumption rule is in [temp.constr.order]/1.2:
an atomic constraint A subsumes another atomic constraint B if and only if A and B are identical using the rules described in [temp.constr.atomic].
Atomic constraints are defined in [temp.constr.atomic]:
An atomic constraint is formed from an expression E and a mapping from the template parameters that appear within E to template arguments involving the template parameters of the constrained entity, called the parameter mapping ([temp.constr.decl]). [ Note: Atomic constraints are formed by constraint normalization. E is never a logical AND expression nor a logical OR expression. — end note ]
Two atomic constraints are identical if they are formed from the same expression and the targets of the parameter mappings are equivalent according to the rules for expressions described in [temp.over.link].
The key here is how atomic constraints are formed, which is specified in [temp.constr.normal]:
The normal form of an expression E is a constraint that is defined as follows:
The normal form of an expression ( E ) is the normal form of E.
The normal form of an expression E1 || E2 is the disjunction of the normal forms of E1 and E2.
The normal form of an expression E1 && E2 is the conjunction of the normal forms of E1 and E2.
The normal form of an id-expression of the form C<A1, A2, ..., An>, where C names a concept, is the normal form of the constraint-expression of C, after substituting A1, A2, ..., An for C's respective template parameters in the parameter mappings in each atomic constraint. If any such substitution results in an invalid type or expression, the program is ill-formed; no diagnostic is required. [ ... ]
The normal form of any other expression E is the atomic constraint whose expression is E and whose parameter mapping is the identity mapping.
For the first overload of foo, the constraint is C1<T> && C2<T>, so to normalize it we take the conjunction of the normal forms of C1<T>₁ and C2<T>₁; since C1 and C2 are not concepts, each of those is already an atomic constraint (last bullet), and we're done. Likewise, for the second overload of foo, the constraint is C1<T>₂, which is its own normal form.
The rule for what makes atomic constraints identical is that they must be formed from the same expression (the same source-level construct). While both functions have an atomic constraint spelled with the token sequence C1<T>, those are not the same literal expression in the source code.
Hence the subscripts, indicating that these are, in fact, not the same atomic constraint: C1<T>₁ is not identical to C1<T>₂. The rule is not token equivalence! So the first foo's C1<T> does not subsume the second foo's C1<T>, and vice versa.
Hence, ambiguous.
On the other hand, if we had:
template <typename T> concept D1 = true;
template <typename T> concept D2 = true;
template <typename T> requires D1<T> && D2<T>
constexpr int quux() { return 0; }
template <typename T> requires D1<T>
constexpr int quux() { return 1; }
The constraint for the first function is D1<T> && D2<T>. The 3rd bullet gives us the conjunction of D1<T> and D2<T>. The 4th bullet then leads us to substitute into the concepts themselves, so the first one normalizes into true₁ and the second into true₂. Again, the subscripts indicate which true is being referred to.
The constraint for the second function is D1<T>, which normalizes (4th bullet) into true₁.
And now, true₁ is indeed the same expression as true₁, so these constraints are considered identical. As a result, D1<T> && D2<T> subsumes D1<T>, and quux<int>() is an unambiguous call that returns 0.
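A one-line check of that result (appended to the quux snippet above):
static_assert(quux<int>() == 0); // the more constrained first overload is selected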
Related
This works and outputs "1", because the functions' constraints are partially ordered and the more constrained overload wins:
#include <concepts>
#include <iostream>
template<class T>
struct B {
int f() requires std::same_as<T, int> {
return 0;
}
int f() requires (std::same_as<T, int> && !std::same_as<T, char>) {
return 1;
}
};
int main() {
std::cout << B<int>{}.f();
}
This works as well, because explicit instantiation doesn't instantiate member functions with unsatisfied constraints:
template<class T>
struct B {
int f() requires std::same_as<T, int> {
return 0;
}
int f() requires (!std::same_as<T, int>) {
return 1;
}
};
template struct B<int>;
So what should this do?
template<class T>
struct B {
int f() requires std::same_as<T, int> {
return 0;
}
int f() requires (std::same_as<T, int> && !std::same_as<T, char>) {
return 1;
}
};
template struct B<int>;
Currently this crashes trunk GCC, because it emits two functions with the same mangled name. I think it would make sense to compile only the second function, so that the behavior is consistent with regular overload resolution. Does anything in the standard handle this edge case?
C++20 recognizes that there can be different spellings of the same effective requirements. So the standard defines two notions: "equivalent" and "functionally equivalent".
True "equivalence" is based on satisfying the ODR (one-definition rule):
Two expressions involving template parameters are considered equivalent if two function definitions containing the expressions would satisfy the one-definition rule, except that the tokens used to name the template parameters may differ as long as a token used to name a template parameter in one expression is replaced by another token that names the same template parameter in the other expression.
There's more to it, but that's not an issue here.
Equivalence for template-heads requires that all constraint-expressions are equivalent (a template-head includes its constraints).
Functional equivalence is (usually) about the results of the expressions being equal. For template-heads, two template-heads that are not ODR-equivalent can still be functionally equivalent:
Two template-heads are functionally equivalent if they accept and are satisfied by ([temp.constr.constr]) the same set of template argument lists.
That's based in part on the validity of the constraint expressions.
The constraints on your two declarations of f (the code in your first and third snippets) are not ODR-equivalent, but they are functionally equivalent: they accept and are satisfied by the same set of template argument lists. And the behavior of that code differs from what it would be if they were equivalent. Therefore, this passage kicks in:
If the validity or meaning of the program depends on whether two constructs are equivalent, and they are functionally equivalent but not equivalent, the program is ill-formed, no diagnostic required.
As such, all of the compilers are equally right because your code is wrong. Obviously a compiler shouldn't straight-up crash (and that should be submitted as a bug), but "ill-formed, no diagnostic required" often carries with it unforeseen consequences.
Consider the following two overloads of a function template foo:
#include <concepts>
#include <iostream>
template <typename T>
void foo(T) requires std::integral<T> {
std::cout << "foo requires integral\n";
}
template <typename T>
int foo(T) requires std::integral<T> && true {
std::cout << "foo requires integral and true\n";
return 0;
}
Note the difference between the two constraints: the second has an extra && true.
Intuitively speaking, true is redundant in a conjunction (since X && true is just X). However, it looks like this makes a semantic difference, as foo(42) would call the second overload.
Why is this the case? Specifically, why is the second function template a better overload?
As per [temp.constr.order], particularly [temp.constr.order]/1 and [temp.constr.order]/3:
/1 A constraint P subsumes a constraint Q if and only if, [...] [ Example: Let A and B be atomic constraints. The constraint A ∧ B subsumes A, but A does not subsume A ∧ B. The constraint A subsumes A ∨ B, but A ∨ B does not subsume A. Also note that every constraint subsumes itself. — end example ]
/3 A declaration D1 is at least as constrained as a declaration D2 if
(3.1) D1 and D2 are both constrained declarations and D1's associated constraints subsume those of D2; or
(3.2) D2 has no associated constraints.
If we consider A to be std::integral<T> and B to be true, then:
A ∧ B, which is std::integral<T> && true, subsumes A, which is std::integral<T>,
meaning that for the following declarations:
// Denote D1
template <typename T>
void foo(T) requires std::integral<T> && true;
// Denote D2
template <typename T>
void foo(T) requires std::integral<T>;
the associated constraints of D1 subsume those of D2, and thus D1 is at least as constrained as D2. Meanwhile the reverse does not hold, and D2 is not at least as constrained as D1. This means, as per [temp.constr.order]/4
A declaration D1 is more constrained than another declaration D2 when D1 is at least as constrained as D2, and D2 is not at least as constrained as D1.
that the declaration D1 is more constrained than declaration D2, and D1 is subsequently chosen as the best match by overload resolution, as per [temp.func.order]/2:
Partial ordering selects which of two function templates is more specialized than the other by transforming each template in turn (see next paragraph) and performing template argument deduction using the function type. The deduction process determines whether one of the templates is more specialized than the other. If so, the more specialized template is the one chosen by the partial ordering process. If both deductions succeed, the partial ordering selects the more constrained template (if one exists) as determined below.
The constraint std::integral<T> && true subsumes std::integral<T> and therefore "wins" according to the partial ordering of constraints rules.
To check whether constraint A subsumes constraint B ([temp.constr.order]):
1. Both constraints are brought to a disjunctive normal form. This means all ||s are "expanded out" into separate disjunctive clauses.
2. Each disjunctive clause is then split into atomic clauses (the smallest &&-parts).
3. The meaning of the atomic clauses themselves isn't examined; they are only compared for identity.
If constraint A contains all the atomic clauses of B (and possibly more), then A subsumes B.
See Example 1:
[Example 1: Let A and B be atomic constraints. The constraint A ∧ B subsumes A, but A does not subsume A ∧ B. The constraint A subsumes A ∨ B, but A ∨ B does not subsume A. Also note that every constraint subsumes itself. — end example]
So it doesn't matter that the additional clause is a no-op: it's present in A but not in B, and that's all there is to it.
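Here is a minimal sketch of that rule in action (the concept names A, B and the function g are mine, mirroring Example 1 above):
template <typename T> concept A = true;
template <typename T> concept B = true;
template <typename T> requires A<T> && B<T>
constexpr int g() { return 0; } // contains every atomic clause of the other overload, plus one more
template <typename T> requires A<T>
constexpr int g() { return 1; }
static_assert(g<int>() == 0); // A ∧ B subsumes A, so the first overload is more constrained and wins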
Intuitively speaking, true is redundant in a conjunction
Indeed! In the interest of adopting standard-speak, let's say your specializations are functionally equivalent.
Why is the second function template a better overload?
Surprise! It isn't.
The standard explicitly references your && true constraint in an example of a program that is ill-formed, no diagnostic required.
[13.5.2.3]
a program is ill-formed, no diagnostic required, when the meaning of the program depends on whether two constructs are equivalent, and they are functionally equivalent but not equivalent.
[Example 2:
template <unsigned N> void f2()
requires Add1<2 * N>;
template <unsigned N> int f2()
requires Add1<N * 2> && true;
void h2() {
f2<0>(); // ill-formed, no diagnostic required:
// requires determination of subsumption between atomic constraints that are
// functionally equivalent but not equivalent
}
— end example]
The second overload is picked because C++ uses the partial-ordering algorithm to choose which function template specialization to call:
Partial ordering selects which of two function templates is more specialized than the other by transforming each template in turn (see next paragraph) and performing template argument deduction using the function type. The deduction process determines whether one of the templates is more specialized than the other. If so, the more specialized template is the one chosen by the partial ordering process. If both deductions succeed, the partial ordering selects the more constrained template (if one exists) as determined below.
See 13.5.4 (Partial ordering by constraints) and 13.7.6.2 (Partial ordering of function templates) of the C++20 final working draft (N4861), which can be found here:
http://open-std.org/jtc1/sc22/wg21/docs/papers/2020/n4861.pdf
This gives an error:
template <class T, T A, T B>
requires A > B // <-- error
class X{};
error: parentheses are required around this expression in a requires clause
    requires A > B
             ~~^~~
             (    )
Almost all operators give this error (requires A > B, requires A == B, requires A & B, requires !A)
However && and || seem to work:
template <class T, T A, T B>
requires A && B // <-- ok
class X{};
Tested with GCC trunk and Clang trunk (in May 2020) on Godbolt; both compilers give the same results.
Requires clauses use a special expression grammar to avoid certain pathological parsing ambiguities. As noted on cppreference, the allowed forms are:
a primary expression
a sequence of primary expressions joined with the operator &&
a sequence of aforementioned expressions joined with the operator ||
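For completeness, here is a sketch of the fix suggested by the diagnostic above: parenthesizing the comparison turns it into a primary expression, so the requires-clause parses.
template <class T, T A, T B>
requires (A > B) // OK: a parenthesized expression is a primary expression
class X{};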
Yes, && and || are treated specially here, because constraints are built out of conjunctions and disjunctions.
§ 13.5.1 Constraints [temp.constr.constr]
A constraint is a sequence of logical operations and operands that specifies requirements on template arguments. The operands of a logical operation are constraints. There are three different kinds of constraints:
(1.1) conjunctions (13.5.1.1),
(1.2) disjunctions (13.5.1.1), and
(1.3) atomic constraints (13.5.1.2).
They need to be, in order to define a partial ordering by constraints.
13.5.4 Partial ordering by constraints [temp.constr.order]
[Note: [...] This partial ordering is used to determine
(2.1) the best viable candidate of non-template functions (12.4.3),
(2.2) the address of a non-template function (12.5),
(2.3) the matching of template template arguments (13.4.3),
(2.4) the partial ordering of class template specializations (13.7.5.2), and
(2.5) the partial ordering of function templates (13.7.6.2).
— end note]
Which makes this code work:
template <class It>
concept bidirectional_iterator = requires(It it) { /* ... */ };
template <class It>
concept randomaccess_iterator = bidirectional_iterator<It> && requires(It it) { /* ... */ };
template <bidirectional_iterator It>
void sort(It begin, It end); // #1
template <randomaccess_iterator It>
void sort(It begin, It end); // #2
std::list<int> l{};
sort(l.begin(), l.end()); // #A -> calls #1
std::vector<int> v{};
sort(v.begin(), v.end()); // #B -> calls #2
But for call #B, even though both sorts are viable (both randomaccess_iterator and bidirectional_iterator are satisfied), sort #2 (the one with randomaccess_iterator) is correctly chosen, because it is more constrained than sort #1 (the one with bidirectional_iterator): randomaccess_iterator subsumes bidirectional_iterator.
See How is the best constrained function template selected with concepts?
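Here is a self-contained variant of the sketch above (the name my_sort and the use of the standard <iterator> concepts are mine, standing in for the placeholder definitions) that demonstrates the same subsumption-based selection:
#include <iostream>
#include <iterator>
#include <list>
#include <vector>
template <std::bidirectional_iterator It>
void my_sort(It, It) { std::cout << "bidirectional\n"; } // #1
template <std::random_access_iterator It>
void my_sort(It, It) { std::cout << "random access\n"; } // #2
int main() {
    std::list<int> l;
    my_sort(l.begin(), l.end()); // only #1 is viable
    std::vector<int> v;
    my_sort(v.begin(), v.end()); // both are viable; #2 is more constrained and is chosen
}
std::random_access_iterator is defined as a refinement of std::bidirectional_iterator, so it subsumes it, just like randomaccess_iterator subsumes bidirectional_iterator in the sketch above.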
Consider the following simple (to the extent that template questions ever are) example:
#include <iostream>
template <typename T>
struct identity;
template <>
struct identity<int> {
using type = int;
};
template<typename T> void bar(T, T ) { std::cout << "a\n"; }
template<typename T> void bar(T, typename identity<T>::type) { std::cout << "b\n"; }
int main ()
{
bar(0, 0);
}
Both clang and gcc print "a" there. According to the rules in [temp.deduct.partial] and [temp.func.order], to determine partial ordering, we need to synthesize some unique types. So we have two attempts at deduction:
+---+-------------------------------+-------------------------------------------+
| | Parameters | Arguments |
+---+-------------------------------+-------------------------------------------+
| a | T, typename identity<T>::type | UniqueA, UniqueA |
| b | T, T | UniqueB, typename identity<UniqueB>::type |
+---+-------------------------------+-------------------------------------------+
For deduction on "b", according to Richard Corden's answer, the expression typename identity<UniqueB>::type is treated as a type and is not evaluated. That is, this will be synthesized as if it were:
+---+-------------------------------+--------------------+
| | Parameters | Arguments |
+---+-------------------------------+--------------------+
| a | T, typename identity<T>::type | UniqueA, UniqueA |
| b | T, T | UniqueB, UniqueB_2 |
+---+-------------------------------+--------------------+
It's clear that deduction on "b" fails. Those are two different types so you cannot deduce T to both of them.
However, it seems to me that the deduction on A should fail. For the first argument, you'd match T == UniqueA. The second argument is a non-deduced context - so wouldn't that deduction succeed iff UniqueA were convertible to identity<UniqueA>::type? The latter is a substitution failure, so I don't see how this deduction could succeed either.
How and why do gcc and clang prefer the "a" overload in this scenario?
As discussed in the comments, I believe there are several aspects of the function template partial ordering algorithm that are unclear or not specified at all in the standard, and this shows in your example.
To make things even more interesting, MSVC (I tested 12 and 14) rejects the call as ambiguous. I don't think there's anything in the standard to conclusively prove which compiler is right, but I think I might have a clue about where the difference comes from; there's a note about that below.
Your question (and this one) challenged me to do some more investigation into how things work. I decided to write this answer not because I consider it authoritative, but rather to organize the information I have found in one place (it wouldn't fit in comments). I hope it will be useful.
First, the proposed resolution for issue 1391: we discussed it extensively in comments and chat. I think that, while it does provide some clarification, it also introduces some issues. It changes [14.8.2.4p4] to the following (the second sentence is the newly added text):
Each type nominated above from the parameter template and the corresponding type from the argument template are used as the types of P and A. If a particular P contains no template-parameters that participate in template argument deduction, that P is not used to determine the ordering.
Not a good idea in my opinion, for several reasons:
If P is non-dependent, it doesn't contain any template parameters at all, so it doesn't contain any that participate in argument deduction either, which would make the added sentence apply to it. However, that would make template<class T> void f(T, int) and template<class T, class U> void f(T, U) unordered, which doesn't make sense. This is arguably a matter of interpretation of the wording, but it could cause confusion.
It messes with the notion of used to determine the ordering, which affects [14.8.2.4p11]. This makes template<class T> void f(T) and template<class T> void f(typename A<T>::a) unordered (deduction succeeds from first to second, because T is not used in a type used for partial ordering according to the new rule, so it can remain without a value). Currently, all compilers I've tested report the second as more specialized.
It would make #2 more specialized than #1 in the following example:
#include <iostream>
template<class T> struct A { using a = T; };
struct D { };
template<class T> struct B { B() = default; B(D) { } };
template<class T> struct C { C() = default; C(D) { } };
template<class T> void f(T, B<T>) { std::cout << "#1\n"; } // #1
template<class T> void f(T, C<typename A<T>::a>) { std::cout << "#2\n"; } // #2
int main()
{
f<int>(1, D());
}
(#2's second parameter is not used for partial ordering, so deduction succeeds from #1 to #2 but not the other way around). Currently, the call is ambiguous, and should arguably remain so.
After looking at Clang's implementation of the partial ordering algorithm, here's how I think the standard text could be changed to reflect what actually happens.
Leave [p4] as it is and add the following between [p8] and [p9]:
For a P / A pair:
If P is non-dependent, deduction is considered successful if and only if P and A are the same type.
Substitution of deduced template parameters into the non-deduced contexts appearing in P is not performed and does not affect the outcome of the deduction process.
If template argument values are successfully deduced for all template parameters of P except the ones that appear only in non-deduced contexts, then deduction is considered successful (even if some parameters used in P remain without a value at the end of the deduction process for that particular P / A pair).
Notes:
About the second bullet point: [14.8.2.5p1] talks about finding template argument values that will make P, after substitution of the deduced values (call it the deduced A), compatible with A. This could cause confusion about what actually happens during partial ordering; there's no substitution going on.
MSVC doesn't seem to implement the third bullet point in some cases. See the next section for details.
The second and third bullet points are intended to also cover cases where P has forms like A<T, typename U::b>, which aren't covered by the wording in issue 1391.
Change the current [p10] to:
Function template F is at least as specialized as function template G if and only if:
for each pair of types used to determine the ordering, the type from F is at least as specialized as the type from G, and,
when performing deduction using the transformed F as the argument template and G as the parameter template, after deduction is done for all pairs of types, all template parameters used in the types from G that are used to determine the ordering have values, and those values are consistent across all pairs of types.
F is more specialized than G if F is at least as specialized as G and G is not at least as specialized as F.
Make the entire current [p11] a note.
(The note added by the resolution of 1391 to [14.8.2.5p4] needs to be adjusted as well - it's fine for [14.8.2.1], but not for [14.8.2.4].)
For MSVC, in some cases, it looks like all template parameters in P need to receive values during deduction for that specific P / A pair in order for deduction to succeed from A to P. I think this could be what causes implementation divergence in your example and others, but I've seen at least one case where the above doesn't seem to apply, so I'm not sure what to believe.
Another example where the statement above does seem to apply: changing template<typename T> void bar(T, T) to template<typename T, typename U> void bar(T, U) in your example swaps results around: the call is ambiguous in Clang and GCC, but resolves to b in MSVC.
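For concreteness, the modified pair from that variation looks like this (assembled from the code in the question; the identity specialization is unchanged):
#include <iostream>
template <typename T> struct identity;
template <> struct identity<int> { using type = int; };
template <typename T, typename U> void bar(T, U) { std::cout << "a\n"; } // was bar(T, T)
template <typename T> void bar(T, typename identity<T>::type) { std::cout << "b\n"; }
int main() {
    bar(0, 0); // ambiguous in Clang and GCC; reportedly resolves to "b" in MSVC
}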
One example where it doesn't:
#include <iostream>
template<class T> struct A { using a = T; };
template<class, class> struct B { };
template<class T, class U> void f(B<U, T>) { std::cout << "#1\n"; }
template<class T, class U> void f(B<U, typename A<T>::a>) { std::cout << "#2\n"; }
int main()
{
f<int>(B<int, int>());
}
This selects #2 in Clang and GCC, as expected, but MSVC rejects the call as ambiguous; no idea why.
The partial ordering algorithm as described in the standard speaks of synthesizing a unique type, value, or class template in order to generate the arguments. Clang manages that by... not synthesizing anything. It just uses the original forms of the dependent types (as declared) and matches them both ways. This makes sense, as substituting the synthesized types doesn't add any new information. It can't change the forms of the A types, since there's generally no way to tell what concrete types the substituted forms could resolve to. The synthesized types are unknown, which makes them pretty similar to template parameters.
When encountering a P that is a non-deduced context, Clang's template argument deduction algorithm simply skips it, by returning "success" for that particular step. This happens not only during partial ordering, but for all types of deductions, and not just at the top level in a function parameter list, but recursively whenever a non-deduced context is encountered in the form of a compound type. For some reason, I found that surprising the first time I saw it. Thinking about it, it does, of course, make sense, and is according to the standard ([...] does not participate in type deduction [...] in [14.8.2.5p4]).
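As a side illustration of that last point (a minimal example of my own, not from the question): a template parameter that appears only in non-deduced contexts is simply never deduced, so it has to be supplied explicitly.
template <typename T> struct wrap { using type = T; };
template <typename T>
void g(typename wrap<T>::type) {} // the parameter is a non-deduced context for T
int main() {
    // g(42);    // error: T cannot be deduced
    g<int>(42);  // OK: T is given explicitly, no deduction needed
}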
This is consistent with Richard Corden's comments to his answer, but I had to actually see the compiler code to understand all the implications (not a fault of his answer, but rather of my own - programmer thinking in code and all that).
I've included some more information about Clang's implementation in this answer.
I believe the key is with the following statement:
The second argument is a non-deduced context - so wouldn't that deduction succeed iff UniqueA were convertible to identity<UniqueA>::type?
Type deduction does not perform checking of "conversions". Those checks take place using the real explicit and deduced arguments as part of overload resolution.
This is my summary of the steps that are taken to select the function template to call (all references taken from N3937, ~ C++14):
1. Explicit arguments are substituted and the resulting function type is checked to be valid. (14.8.2/2)
2. Type deduction is performed and the resulting deduced arguments are substituted. Again, the resulting type must be valid. (14.8.2/5)
3. The function templates that succeeded in steps 1 and 2 are specialized and included in the overload set for overload resolution. (14.8.3/1)
4. Conversion sequences are compared by overload resolution. (13.3.3)
5. If the conversion sequences of two function specializations are not 'better', the partial ordering algorithm is used to find the more specialized function template. (13.3.3)
6. The partial ordering algorithm checks only that type deduction succeeds. (14.5.6.2/2)
The compiler already knows by step 4 that both specializations can be called when the real arguments are used. Steps 5 and 6 are used to determine which of the functions is more specialized.
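As a hypothetical illustration of steps 4 through 6 (my own example, not from the question): for the call below, both candidates are exact matches, so overload resolution cannot prefer either by conversion sequences and falls back to partial ordering of the templates, which picks the more specialized one.
#include <iostream>
template <typename T> void g(T) { std::cout << "#1\n"; }
template <typename T> void g(T*) { std::cout << "#2\n"; }
int main() {
    int x = 0;
    g(&x); // both deduce with identical (exact) conversions; g(T*) is more specialized, so #2 is chosen
}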