gcc4 template bug or more likely id10t error - c++

The following code compiles just fine under Visual Studio, but neither gcc 4.6.2 nor 4.7 can handle it. It seems to be valid, but gcc can't seem to resolve the difference between const and non-const parameters. Could this be a compiler bug?
struct CReadType{};
struct CWriteType{};
template<typename ReadWriteType, typename T>
struct AddPkgrConstByType {};
template<typename T>
struct AddPkgrConstByType<CReadType, T> {
typedef T type;
};
template<typename T>
struct AddPkgrConstByType<CReadType, const T> {
typedef T type;
};
template<typename T>
struct AddPkgrConstByType<CWriteType, T> {
typedef T const type;
};
template<typename Packager, typename T>
struct AddPkgrConst : public AddPkgrConstByType<typename Packager::CReadWriteType, T> {
};
template<typename Packager, typename T>
inline bool Package( Packager* ppkgr, T* pt )
{
return true;
}
template<typename Packager>
inline bool Package( Packager* ppkgr, typename AddPkgrConst<Packager,bool>::type* pb)
{
return false;
}
struct ReadPackager {
typedef CReadType CReadWriteType;
};
struct WritePackager {
typedef CWriteType CReadWriteType;
};
int main(int argc, char* argv[])
{
ReadPackager rp;
WritePackager wp;
bool b = true;
const bool cb = false;
Package( &rp, &b );
}
Compiler call:
g++ -fPIC -O -std=c++0x -Wno-deprecated -D_REENTRANT -D__STDC_LIMIT_MACROS -c test.cpp
test.cpp: In function ‘int main(int, char**)’:
test.cpp:58:22: error: call of overloaded ‘Package(ReadPackager*, bool*)’ is ambiguous
test.cpp:58:22: note: candidates are:
test.cpp:31:6: note: bool Package(Packager*, T*) [with Packager = ReadPackager, T = bool]
test.cpp:38:6: note: bool Package(Packager*, typename AddPkgrConst<Packager, bool>::type*) [with Packager = ReadPackager, typename AddPkgrConst<Packager, bool>::type = bool]

This looks like a compiler error to me. The issues involved here are overload resolution and partial ordering of function templates. Since both function templates can match the argument list (ReadPackager*, bool*), partial ordering of function templates should be used to choose the more specialized one.
Put simply, a template function is at least as specialized as another if the arguments to that function can always be used as arguments to the other.
It's clear that any two pointer arguments match the first Package() function, but for instance Package(ReadPackager*, const int*) can not match the second. This seems to imply that the second Package function is more specialized and ought to resolve any ambiguity.
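As a minimal sketch of what "more specialized" means, here is a simpler pair of overloads than the ones in the question. The names describe and demo_partial_ordering are made up for illustration and do not appear in the original code.

```cpp
#include <cassert>

// Overload 1: accepts an argument of any type.
template <typename T>
int describe(T) { return 1; }

// Overload 2: accepts only pointers, so it is the more specialized one.
template <typename T>
int describe(T*) { return 2; }

// Hypothetical driver: both overloads are viable for a pointer argument.
// Partial ordering picks overload 2 because every T* is also a valid T,
// but not every T is a valid T*.
inline void demo_partial_ordering() {
    int x = 0;
    assert(describe(x) == 1);   // only overload 1 is viable
    assert(describe(&x) == 2);  // both are viable, overload 2 wins
}
```

In this simple case every compiler agrees. The question's code differs in that the second parameter of P2 is a non-deduced context, which is where the compilers part ways.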
However, since there is disagreement among the compilers, there may be some subtleties involved that are overlooked by the simplified explanation. I will therefore follow the procedure for determining function template partial ordering from the standard to discern the correct behavior.
First, labeling the functions as P1 and P2 for easy reference.
P1:
template<typename Packager, typename T>
bool Package( Packager* ppkgr, T* pt );
P2:
template<typename Packager>
bool Package( Packager* ppkgr, typename AddPkgrConst<Packager,bool>::type* pb);
The standard says that for each function template (T1), we must generate a unique type for each of its template parameters, use those types to determine the function call parameter types, and then use those types to deduce the template arguments of the other template (T2). If this succeeds, the first template (T1) is at least as specialized as the second (T2).
First, P2->P1:
1. Synthesize unique type U for template parameter Packager of P2.
2. Perform type deduction against P1's parameter list. Packager is deduced to be U and T is deduced to be AddPkgrConst<U,bool>::type.
3. This succeeds, so P2 is at least as specialized as P1; in other words, P1 is judged to be no more specialized than P2.
Now P1->P2:
1. Synthesize unique types U1 and U2 for template parameters Packager and T of P1 to get the parameter list (U1*, U2*).
2. Perform type deduction against P2's parameter list. Packager is deduced to be U1.
3. No deduction is performed for the second parameter because, being a dependent type, it is considered a non-deduced context.
4. The second parameter is therefore AddPkgrConst<U1,bool>::type*, which evaluates to bool*. This does not match the second argument, U2*.
This procedure fails if we proceed to step 4. However, my suspicion is that the compilers that reject this code don't perform step 4 and therefore consider P2 no more specialized than P1 merely because type deduction succeeded. This seems counterintuitive, since P1 clearly accepts any input that P2 does and not vice versa. This part of the standard is somewhat convoluted, so it's not clear whether this final comparison is required to be made.
Let's try to address this question by applying §14.8.2.5, paragraph 1, Deducing template arguments from a type
Template arguments can be deduced in several different contexts, but in each case a type that is specified in terms of template parameters (call it P) is compared with an actual type (call it A), and an attempt is made to find template argument values (a type for a type parameter, a value for a non-type parameter, or a template for a template parameter) that will make P, after substitution of the deduced values (call it the deduced A), compatible with A.
In our type deduction, the deduced A is AddPkgrConst<U1,bool>::type=bool. This is not compatible with the original A, which is the unique type U2. This seems to support the position that the partial ordering resolves the ambiguity.
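The non-deduced context mentioned in the P1->P2 walk-through can be seen in isolation with a much smaller example. This is only a sketch: Identity, second_size and demo_non_deduced_context are hypothetical names, not part of the code in the question.

```cpp
#include <cassert>

template <typename T>
struct Identity { typedef T type; };

// T appears to the left of "::" in the second parameter, which makes that
// parameter a non-deduced context: T is deduced from the first argument only.
template <typename T>
int second_size(T, typename Identity<T>::type second) {
    return static_cast<int>(sizeof(second));
}

inline void demo_non_deduced_context() {
    // T = char comes from 'a' alone; 66 is then converted to char.
    assert(second_size('a', 66) == 1);
    // T = long comes from 1L alone; 'x' is then converted to long.
    assert(second_size(1L, 'x') == static_cast<int>(sizeof(long)));
}
```

Had the second parameter been a plain T, both calls would be rejected because the two arguments would deduce conflicting types.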

I don't know what's wrong with Visual Studio, but what gcc says seems right:
You instantiate AddPkgrConstByType<CReadType, T> because Packager::CReadWriteType resolves to CReadType. Therefore, AddPkgrConst<Packager,bool>::type will resolve according to the first implementation (which is not a specialisation) to bool. This means you have two separate function specialisations with the same parameter list, which C++ doesn't allow you.

Since function templates can't be partially specialized, what you have here is two function template overloads. Both of these overloads are capable of accepting a bool* as their second argument, so they appear to be properly detected as ambiguous by g++.
However classes can be partially specialized so that only one version will be picked, and you can use wrapper magic to attain your desired goal. I'm only pasting the code I added.
template <typename Packager, typename T>
struct Wrapper
{
static bool Package()
{
return true;
}
};
template <typename Packager>
struct Wrapper<Packager, typename AddPkgrConst<Packager,bool>::type>
{
static bool Package()
{
return false;
}
};
template <typename Packager, typename T>
Wrapper<Packager, T> make_wrapper(Packager* /*p*/, T* /*t*/)
{
return Wrapper<Packager, T>();
}
int main()
{
ReadPackager rp;
bool b = true;
std::cout << make_wrapper(&rp, &b).Package() << std::endl; // Prints out 0.
}

Related

Why sometimes calling template function needs angle bracket and full type specification and sometimes not? [duplicate]

// Function declaration.
template <typename T1,
typename T2,
typename RT> RT max (T1 a, T2 b);
// Function call.
max <int,double,double> (4,4.2)
// Function call.
max <int> (4,4.2)
One case may be when you need to specify the return type.
Is there any other situation which requires the argument types to be specified manually?
(1) When the function takes no arguments but is still a template, you have to specify the template arguments explicitly
template<typename T>
void foo ()
{}
Usage:
foo<int>();
foo<A>();
(2) You want to distinguish between value and reference.
template<typename T>
void foo(T obj)
{}
Usage:
int i = 2;
foo(i); // pass by value
foo<int&>(i); // pass by reference
(3) Need another type to be deduced instead of the natural type.
template<typename T>
void foo(T& obj)
{}
Usage:
foo<double>(d); // otherwise it would have been foo<int>
foo<Base&>(o); // otherwise it would have been foo<Derived&>
(4) Two different argument types are provided for a single template parameter
template<typename T>
void foo(T obj1, T obj2)
{}
Usage:
foo<double>(d,i); // Deduction finds both double and int for T
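Cases (2) and (4) have effects that are easy to observe at run time. The following is a small sketch with made-up names (bump, larger, demo_explicit_arguments), assuming nothing beyond the rules described above.

```cpp
#include <cassert>

// Case (2): T decides whether the argument is copied or bound by reference.
template <typename T>
void bump(T obj) { ++obj; }

// Case (4): both parameters must agree on a single T.
template <typename T>
T larger(T a, T b) { return a < b ? b : a; }

inline void demo_explicit_arguments() {
    int i = 2;
    bump(i);           // T deduced as int: only the copy is incremented
    assert(i == 2);
    bump<int&>(i);     // T forced to int&: the caller's object is incremented
    assert(i == 3);

    // larger(1.5, 2) would not compile: T is deduced as both double and int.
    assert(larger<double>(1.5, 2) == 2.0);
}
```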
If the function template parameter appears in the function parameter list, then you don't need to specify the template parameters. For example,
template<typename T>
void f(const T &t) {}
Here T is a template parameter, and it appears in the function parameter list, i.e const T &t. So you don't need to specify the template parameter when calling this function:
f(10); //ok
Since the type of 10 is int, the compiler can deduce the template parameter T from it, and T becomes int.
Note that since the type deduction is done using the information from the function arguments, it's called template argument deduction. Now read on.
If the template parameter doesn't appear in the function parameter list, then you have to provide the template parameter. Example:
template<typename T>
void g(const int &i) {}
Notice g() is different from f(). Now T doesn't appear in the function parameter list. So:
g(10); //error
g<double>(10); //ok
Note that if a function template is also parameterized on its return type, and the return type is different from the types appearing in the function parameter list, then you have to provide the return type:
template<typename T>
T h(const T &t) { return t; }
Since return type T is same as the function parameter, type deduction is possible from function argument:
h(10); //ok - too obvious now
But if you have this:
template<typename R, typename T>
R m(const T &t) { return R(); }
Then,
m(10); //error - only T can be deduced, not R
m<int>(10); //ok
Note that even though the function template m is parameterized on two types, R and T, we've provided only ONE type when calling it. That is, we've written m<int>(10) as opposed to m<int,int>(10). There is no harm in writing the latter, but it's okay if you don't. Sometimes, however, you have to specify both, even if one type, T, can be deduced. That happens when the order of the type parameters is different, as shown below:
template<typename T, typename R> //note the order: it's swapped now!
R n(const T &t) { return R(); }
Now, you have to provide both types:
n(10); //error - R cannot be deduced!
n<int>(10); //error - R still cannot be deduced, since it's the second parameter!
n<int,int>(10); //ok
The new thing here is : the order of type parameters is also important.
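A common way to apply this ordering rule is to put the parameter that cannot be deduced first, so the caller names only that one. Here is a sketch under that assumption; convert_to and demo_parameter_order are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <type_traits>

// R cannot be deduced, so it comes first; T is deduced from the argument.
template <typename R, typename T>
R convert_to(const T& value) { return static_cast<R>(value); }

// Only R is spelled out; T is deduced as int from the literal 3.
static_assert(std::is_same<decltype(convert_to<double>(3)), double>::value,
              "the explicitly given R is the return type");

inline void demo_parameter_order() {
    assert(convert_to<double>(3) == 3.0);
    assert(convert_to<int>(2.75) == 2);   // static_cast truncates toward zero
}
```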
Anyway, this covers only the elementary concepts. I would suggest reading a good book on templates to learn the more advanced aspects of type deduction.
In general, you need to explicitly specify the types when the compiler can't figure it out on its own. As you mentioned, this often happens when the return type is templatized, since the return type cannot be inferred from the function call.
Template classes have the same problem -- instantiating a std::vector offers no way for the compiler to determine what type your vector is storing, so you need to specify std::vector<int> and so forth.
The type resolution is only performed in the case of function arguments, so it may be easier to view that as the special case; ordinarily, the compiler is unable to guess what type(s) to use.
The simple answer is that you need to provide the types when the compiler cannot deduce the types by itself, or when you want the template to be instantiated with a particular type that is different from what the compiler will deduce.
There are different circumstances when the compiler cannot deduce a type. Because type deduction is only applied to the arguments (as is the case with overload resolution) if the return type does not appear as an argument that is deducible, then you will have to specify it. But there are other circumstances when type deduction will not work:
template <typename R> R f(); // Return type is never deduced by itself
template <typename T>
T min( T const & lhs, T const & rhs );
min( 1, 2 ); // Return type is deducible from arguments
min( 1.0, 2 ); // T is not deducible (no perfect match)
min<double>( 1.0, 2 ); // Now it is ok: forced to be double
min<double>( 1, 2 ); // Compiler will deduce int, but we want double
template <typename T>
void print_ptr( T* p );
print_ptr<void>( 0 ); // 0 is a valid T* for any T, select manually one
template <typename T>
T min( T lhs, T rhs );
int a = 5, b = 7;
min<int&>(a,b)++; // Type deduction will drop & by default and call
// min<int>(a,b), force the type to be a reference
template <typename C>
typename C::value_type
min_value( typename C::const_iterator begin, typename C::const_iterator end );
std::vector<int> v;
min_value<std::vector<int> >( v.begin(), v.end() );
// Argument type is not deducible, there are
// potentially infinite C that match the constraints
// and the compiler would be forced to instantiate
// all
There are probably more reasons why an argument type cannot be deduced; you can take a look at §14.8.2.1 in the standard for the specifics of deducing template arguments from a function call.
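For the min_value case, one way to avoid naming the container at the call site is to deduce the iterator type instead of the container type. This is a sketch of that alternative, not the signature used in the answer above; it assumes a non-empty range, and demo_min_value is a made-up driver.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// It is deduced directly from the arguments, so there is no
// "typename C::const_iterator" non-deduced context to work around.
template <typename It>
typename std::iterator_traits<It>::value_type min_value(It begin, It end) {
    return *std::min_element(begin, end);   // assumes begin != end
}

inline void demo_min_value() {
    std::vector<int> v;
    v.push_back(4);
    v.push_back(1);
    v.push_back(7);
    assert(min_value(v.begin(), v.end()) == 1);   // no <std::vector<int> > needed
}
```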

Why are variadic templates different than non-variadic, for only one argument?

This code compiles just fine:
template <typename T1>
struct Struct {
};
struct ConvertsToStruct {
operator Struct<int>() const;
};
template <typename T>
void NonVariadicFunc(Struct<T>);
int main() {
NonVariadicFunc<int>(ConvertsToStruct{});
return 0;
}
But an attempt to make it a little more generic, by using variadic templates, fails to compile:
template <typename T1>
struct Struct {
};
struct ConvertsToStruct {
operator Struct<int>() const;
};
template <typename... T>
void VariadicFunc(Struct<T...>);
int main() {
VariadicFunc<int>(ConvertsToStruct{});
return 0;
}
What's going wrong? Why isn't my attempt to explicitly specify VariadicFunc's template type succeeding?
Godbolt link => https://godbolt.org/g/kq9d7L
There are two reasons why this code doesn't compile.
The first is, the template parameter of a template function can be partially specified:
template<class U, class V> void foo(V v) {}
int main() {
foo<double>(12);
}
This code works because you specify the first template parameter U and let the compiler deduce the second parameter. For the same reason, your VariadicFunc<int>(ConvertsToStruct{}); also requires template argument deduction. Here is a similar example that compiles:
template<class... U> void bar(U... u) {}
int main() {
bar<int>(12.0, 13.4f);
}
Now that we know the compiler needs to perform deduction for your code, here is the second part: the compiler processes the different stages in a fixed order. Quoting cppreference:
Template argument deduction takes place after the function template name lookup (which may involve argument-dependent lookup) and before template argument substitution (which may involve SFINAE) and overload resolution.
Implicit conversion takes place at overload resolution, after template argument deduction. Thus, in your case, the existence of a user-defined conversion operator has no effect while the compiler is performing template argument deduction. ConvertsToStruct itself cannot match Struct<T...>, so deduction fails and the code doesn't compile.
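One way around this, sketched below, is to perform the conversion yourself before the call so that deduction sees the target type directly. Box, ConvertsToBox, count_types and demo_explicit_conversion are hypothetical stand-ins for the types in the question.

```cpp
#include <cassert>

template <typename... Ts>
struct Box {};

struct ConvertsToBox {
    operator Box<int>() const { return Box<int>(); }
};

// Returns how many types were deduced for the pack.
template <typename... Ts>
int count_types(Box<Ts...>) { return static_cast<int>(sizeof...(Ts)); }

inline void demo_explicit_conversion() {
    // count_types<int>(ConvertsToBox()) is rejected for the reason given
    // above: the pack can still grow, so deduction runs against
    // ConvertsToBox and fails before any conversion is considered.
    // Converting first lets deduction see a Box<int> and find Ts = {int}.
    assert(count_types(static_cast<Box<int> >(ConvertsToBox())) == 1);
}
```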
The problem is that with
VariadicFunc<int>(ConvertsToStruct{});
you fix only the first template parameter in the list T....
And the compiler doesn't know how to deduce the remaining ones.
Even weirder, I can take the address of the function, and then it works
It's because with (&VariadicFunc<int>) you ask for a pointer to the function (without asking the compiler to deduce the types from the argument), so the <int> part fixes all template parameters.
When you pass the ConvertsToStruct{} part
(&VariadicFunc<int>)(ConvertsToStruct{});
the compiler knows that T... is int, checks whether it can obtain a Struct<int> from a ConvertsToStruct, and finds the appropriate conversion operator.

SFINAE and the address of an overloaded function

I'm experimenting with resolving the address of an overloaded function (bar) in the context of another function's parameter (foo1/foo2).
struct Baz {};
int bar() { return 0; }
float bar(int) { return 0.0f; }
void bar(Baz *) {}
void foo1(void (&)(Baz *)) {}
template <class T, class D>
auto foo2(D *d) -> void_t<decltype(d(std::declval<T*>()))> {}
int main() {
foo1(bar); // Works
foo2<Baz>(bar); // Fails
}
There's no trouble with foo1, which specifies bar's type explicitly.
However, foo2, which disables itself via SFINAE for all but one version of bar, fails to compile with the following message:
main.cpp:19:5: fatal error: no matching function for call to 'foo2'
foo2<Baz>(bar); // Fails
^~~~~~~~~
main.cpp:15:6: note: candidate template ignored: couldn't infer template argument 'D'
auto foo2(D *d) -> void_t<decltype(d(std::declval<T*>()))> {}
^
1 error generated.
It is my understanding that C++ cannot resolve the overloaded function's address and perform template argument deduction at the same time.
Is that the cause? Is there a way to make foo2<Baz>(bar); (or something similar) compile?
As mentioned in the comments, [14.8.2.1/6] (working draft, deducing template arguments from a function call) rules in this case (emphasis mine):
When P is a function type, function pointer type, or pointer to member function type:
If the argument is an overload set containing one or more function templates, the parameter is treated as a non-deduced context.
If the argument is an overload set (not containing function templates), trial argument deduction is attempted using each of the members of the set. If deduction succeeds for only one of the overload set members, that member is used as the argument value for the deduction. If deduction succeeds for more than one member of the overload set the parameter is treated as a non-deduced context.
SFINAE only comes into play once deduction is over, so it doesn't help to work around the standard's rules.
For further details, you can see the examples at the end of the bullet linked above.
About your last question:
Is there a way to make foo2<Baz>(bar); (or something similar) compile?
Two possible alternatives:
If you don't want to modify the definition of foo2, you can invoke it as:
foo2<Baz>(static_cast<void(*)(Baz *)>(bar));
This way you explicitly pick a function out of the overload set.
If modifying foo2 is allowed, you can rewrite it as:
template <class T, class R>
auto foo2(R(*d)(T*)) {}
It's more or less what you had before, no decltype in this case and a return type you can freely ignore.
Actually you don't need to use any SFINAE'd function to do that, deduction is enough.
In this case foo2<Baz>(bar); is correctly resolved.
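Both alternatives can be checked with a small self-contained sketch. Here returns_void is a hypothetical stand-in for the rewritten foo2 that reports which return type was deduced, and demo_overload_set is a made-up driver.

```cpp
#include <cassert>
#include <type_traits>

struct Baz {};

int bar() { return 0; }
float bar(int) { return 0.0f; }
void bar(Baz*) {}

// T is given explicitly; R is deduced from the single member of the
// overload set whose type matches R(*)(T*).
template <class T, class R>
bool returns_void(R (*)(T*)) { return std::is_void<R>::value; }

inline void demo_overload_set() {
    // Trial deduction succeeds only for void bar(Baz*), so R = void.
    assert(returns_void<Baz>(bar));
    // Picking the overload by hand also works and needs no trial deduction.
    assert(returns_void<Baz>(static_cast<void (*)(Baz*)>(bar)));
}
```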
A more general answer is here: Expression SFINAE to overload on type of passed function pointer
For the practical case, there's no need to use type traits or decltype() - the good old overload resolution will select the most appropriate function for you and break it into 'arguments' and 'return type'. Just enumerate all possible calling conventions
// Common functions
template <class T, typename R> void foo2(R(*)(T*)) {}
// Different calling conventions
#ifdef _W64
template <class T, typename R> void foo2(R(__vectorcall *)(T*)) {}
#else
template <class T, typename R> void foo2(R(__stdcall *)(T*)) {}
#endif
// Lambdas
template <class T, class D>
auto foo2(const D &d) -> void_t<decltype(d(std::declval<T*>()))> {}
It could be useful to wrap them in a templated structure
template<typename... T>
struct Foo2 {
// Common functions
template <typename R> static void foo2(R(*)(T*...)) {}
...
};
Foo2<Baz>::foo2(bar);
Although, it will require more code for member functions as they have modifiers (const, volatile, &&)

implicit instantiation of undefined template 'class'

When trying to offer functions for const and non-const template arguments in my library I came across a strange problem. The following source code is a minimal example of the phenomenon:
#include <iostream>
template<typename some_type>
struct some_meta_class;
template<>
struct some_meta_class<int>
{
typedef void type;
};
template<typename some_type>
struct return_type
{
typedef typename some_meta_class< some_type >::type test;
typedef void type;
};
template<typename type>
typename return_type<type>::type foo( type & in )
{
std::cout << "non-const" << std::endl;
}
template<typename type>
void foo( type const & in )
{
std::cout << "const" << std::endl;
}
int main()
{
int i;
int const & ciref = i;
foo(ciref);
}
I tried to implement a non-const version and a const version for foo but unfortunately this code won't compile on CLANG 3.0 and gcc 4.6.3.
main.cpp:18:22: error: implicit instantiation of undefined template
'some_meta_class'
So for some reason the compiler wants to use the non-const version of foo for a const int reference. This obviously leads to the error above because there is no definition of some_meta_class for that type. The strange thing is that if you make one of the following changes, the code compiles and works:
comment out/remove the non-const version
comment out/remove the typedef of return_type::test
This example is of course minimalistic and purely academic. In my library I came across this problem because the const and non-const versions return different types. I worked around it by using a helper class which is partially specialized.
But why does the example above result in such strange behaviour? Why does the compiler insist on the non-const version when the const version is valid and matches better?
The reason is the way function call resolution is performed, together with template argument deduction and substitution.
Firstly, name lookup is performed. This gives you two functions with a matching name foo().
Secondly, type deduction is performed: for each of the template functions with a matching name, the compiler tries to deduce the function template arguments which would yield a viable match. The error you get happens in this phase.
Thirdly, overload resolution enters the game. This is only after type deduction has been performed and the signatures of the viable functions for resolving the call have been determined, which makes sense: the compiler can meaningfully resolve your function call only after it has found out the exact signature of all the candidates.
The fact that you get an error related to the non-const overload is not because the compiler chooses it as a most viable candidate for resolving the call (that would be step 3), but because the compiler produces an error while instantiating its return type to determine its signature, during step 2.
It is not entirely obvious why this results in an error though, because one might expect that SFINAE applies (Substitution Failure Is Not An Error). To clarify this, we might consider a simpler example:
template<typename T> struct X { };
template<typename T> typename X<T>::type f(T&) { } // 1
template<typename T> void f(T const&) { } // 2
int main()
{
int const i = 0;
f(i); // Selects overload 2
}
In this example, SFINAE applies: during step 2, the compiler will deduce T for each of the two overloads above, and try to determine their signatures. In case of overload 1, this results in a substitution failure: X<const int> does not define any type (no typedef in X). However, due to SFINAE, the compiler simply discards it and finds that overload 2 is a viable match. Thus, it picks it.
Let's now change the example slightly in a way that mirrors your example:
template<typename T> struct X { };
template<typename Y>
struct R { typedef typename X<Y>::type type; };
// Notice the small change from X<T> into R<T>!
template<typename T> typename R<T>::type f(T&) { } // 1
template<typename T> void f(T const&) { } // 2
int main()
{
int const i = 0;
f(i); // ERROR! Cannot instantiate R<int const>
}
What has changed is that overload 1 no longer returns X<T>::type, but rather R<T>::type. This is in turn the same as X<T>::type because of the typedef declaration in R, so one might expect it to yield the same result. However, in this case you get a compilation error. Why?
The Standard has the answer (Paragraph 14.8.2/8):
If a substitution results in an invalid type or expression, type deduction fails. An invalid type or expression is one that would be ill-formed if written using the substituted arguments. [...] Only invalid types and expressions in the immediate context of the function type and its template parameter types can result in a deduction failure.
Clearly, the second example (as well as yours) generates an error in a nested context, so SFINAE does not apply. I believe this answers your question.
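If the intermediate class R is needed, one possible way to keep SFINAE working is to make R itself tolerant of a missing X<Y>::type, so that the failure moves back into the immediate context of the function signature. The following is only a sketch of that idea; Voider and demo_sfinae_friendly are hypothetical helpers, and X<int> is specialized here purely to show both outcomes.

```cpp
#include <cassert>

template <typename T> struct X {};
template <> struct X<int> { typedef int type; };   // only X<int> has ::type

template <typename> struct Voider { typedef void type; };

// Primary template: has no nested ::type, and instantiating it never fails.
template <typename Y, typename = void>
struct R {};

// Used only when X<Y>::type exists. Otherwise the substitution failure
// happens while matching this partial specialization and is discarded.
template <typename Y>
struct R<Y, typename Voider<typename X<Y>::type>::type> {
    typedef typename X<Y>::type type;
};

template <typename T> typename R<T>::type f(T&) { return 1; }   // 1
template <typename T> int f(T const&) { return 2; }              // 2

inline void demo_sfinae_friendly() {
    int const ci = 0;
    assert(f(ci) == 2);   // R<int const> has no ::type, overload 1 drops out
    int i = 0;
    assert(f(i) == 1);    // R<int>::type is int, and T& binds better
}
```

Here the missing member is looked up in a complete, successfully instantiated class, which is the same situation as in the first simplified example.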
By the way, it is interesting to notice, that this has changed since C++03, which more generally stated (Paragraph 14.8.2/2):
[...] If a substitution in a template parameter or in the function type of the function template results in an invalid type, type deduction fails. [...]
If you are curious about the reasons why things have changed, this paper might give you an idea.

function resolution failed when return type is deduced from enclosed template class

I have been trying to implement a complex number class for fixed point types where the result type of the multiply operation will be a function of the input types. I need to have functions where I can do multiply complex by complex and also complex by real number.
This essentially is a simplified version of the code. Where A is my complex type.
template<typename T1, typename T2> struct rt {};
template<> struct rt<double, double> {
typedef double type;
};
//forward declaration
template<typename T> struct A;
template<typename T1, typename T2>
struct a_rt {
typedef A<typename rt<T1,T2>::type> type;
};
template <typename T>
struct A {
template<typename T2>
typename a_rt<T,T2>::type operator*(const T2& val) const {
typename a_rt<T,T2>::type ret;
cout << "T2& called" << endl;
return ret;
}
template<typename T2>
typename a_rt<T,T2>::type operator*(const A<T2>& val) const {
typename a_rt<T,T2>::type ret;
cout << "A<T2>& called" << endl;
return ret;
}
};
TEST(TmplClassFnOverload, Test) {
A<double> a;
A<double> b;
double c;
a * b;
a * c;
}
The code fails to compile because the compiler is trying to instantiate the a_rt template with double and A<double>. I don't know what is going on under the hood since I imagine the compiler should pick the more specialized operator*(A<double>&) so a_rt will only be instantiated with <double, double> as arguments.
Would you please explain to me why this would not work?
And if this is a limitation, how should I work around this.
Thanks a tonne!
unittest.cpp: In instantiation of 'a_rt<double, A<double> >':
unittest.cpp:198: instantiated from here
unittest.cpp:174: error: no type named 'type' in 'struct rt<double, A<double> >'
Update
The compiler appears to be happy with the following change. There is some subtlety I'm missing here. Appreciate someone who can walk me through what the compiler is doing in both cases.
template<typename T2>
A<typename rt<T,T2>::type> operator*(const T2& val) const {
A<typename rt<T,T2>::type> ret;
cout << "T2& called" << endl;
return ret;
}
template<typename T2>
A<typename rt<T,T2>::type> operator*(const A<T2>& val) const {
A<typename rt<T,T2>::type> ret;
cout << "A<T2>& called" << endl;
return ret;
}
Resolving function calls in C++ proceeds in five phases:
1. name lookup: this finds two versions of operator*
2. template argument deduction: this will be applied to all functions found in step 1)
3. overload resolution: the best match will be selected
4. access control: can the best match in fact be invoked (i.e. is it not a private member)
5. virtuality: if virtual functions are involved, a lookup in the vtable might be required
First note that the return type is never ever being deduced. You simply cannot overload on return type. The template arguments to operator* are being deduced and then substituted into the return type template.
So what happens at the call a * b;? First, both versions of operator* have their arguments deduced. For the first overload, T2 is deduced to be A<double>, and for the second overload T2 resolves to double. If there are multiple overloads, the Standard says:
14.7.1 Implicit instantiation [temp.inst] clause 9
If a function template or a member function template specialization is
used in a way that involves overload resolution, a declaration of the
specialization is implicitly instantiated (14.8.3).
So at the end of argument deduction, when the set of candidate functions is being generated (so before overload resolution), the template gets instantiated and you get an error because rt does not have a nested type. This is why the more specialized second template will not be selected: overload resolution does not take place. You might have expected that this substitution failure would not be an error. However, the Standard says:
14.8.2 Template argument deduction [temp.deduct] clause 8
If a substitution results in an invalid type or expression, type
deduction fails. An invalid type or expression is one that would be
ill-formed if written using the substituted arguments.
Only invalid types and expressions in the immediate context of the
function type and its template parameter types can result in a
deduction failure. [ Note: The evaluation of the substituted types and
expressions can result in side effects such as the instantiation of
class template specializations and/or function template
specializations, the generation of implicitly-defined functions, etc.
Such side effects are not in the “immediate context” and can result in
the program being ill-formed. — end note ]
In your original code, the typename a_rt<T,T2>::type return type is not an immediate context. Only during template instantiation does it get evaluated, and then the lack of the nested type in rt is an error.
In your updated code, the A<typename rt<T,T2>::type> return type is an immediate context and Substitution Failure Is Not An Error (SFINAE) applies: the function template for which substitution failed is simply removed from the overload resolution set and the remaining one is called.
With your updated code, output will be:
> A<T2>& called
> T2& called
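The selection can also be checked without printing. The sketch below is a reduced version of the updated code; the Overload enum, the via member and demo_operator_selection are additions made purely for illustration and are not part of the original class.

```cpp
#include <cassert>

template <typename T1, typename T2> struct rt {};
template <> struct rt<double, double> { typedef double type; };

enum Overload { None, Scalar, Complex };

template <typename T>
struct A {
    Overload via;   // records which operator* produced this object
    A() : via(None) {}

    // Drops out via SFINAE when rt<T, T2> has no nested type.
    template <typename T2>
    A<typename rt<T, T2>::type> operator*(const T2&) const {
        A<typename rt<T, T2>::type> ret;
        ret.via = Scalar;
        return ret;
    }

    template <typename T2>
    A<typename rt<T, T2>::type> operator*(const A<T2>&) const {
        A<typename rt<T, T2>::type> ret;
        ret.via = Complex;
        return ret;
    }
};

inline void demo_operator_selection() {
    A<double> a, b;
    double c = 2.0;
    assert((a * b).via == Complex);   // first overload removed by SFINAE
    assert((a * c).via == Scalar);    // second overload cannot deduce T2
}
```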
Your forward declaration uses class:
template<typename T> class A;
But your definition uses struct:
template <typename T>
struct A {
Other than that, I can't see any problems...