Why does the C++ linker allow undefined functions? - c++

This C++ code, perhaps surprisingly, prints out 1.
#include <iostream>
#include <string>
std::string x();
int main() {
std::cout << "x: " << x << std::endl;
return 0;
}
x is a function prototype, which seems to be viewed as a function pointer, and C++ Standard section 4.12 Boolean conversions says:
4.12 Boolean conversions [conv.bool]
1 A prvalue of arithmetic, unscoped enumeration, pointer, or pointer to member type can be converted to a prvalue of type bool. A zero value, null pointer value, or null member pointer value is converted to false; any other value is converted to true. For direct-initialization (8.5), a prvalue of type std::nullptr_t can be converted to a prvalue of type bool; the resulting value is false.
However, x is never bound to a function. As I would expect, the C linker doesn't allow this. However in C++ this isn't a problem at all. Can anyone explain this behavior?

What's happening here is that the function pointer is implicitly converted to bool. This is specified by [conv.bool]:
A zero value, null pointer value, or null member pointer value is converted to false;
any other value is converted to true
where "null pointer value" includes null function pointers. Since the function pointer obtained from decay of a function name cannot be null, this gives true. You can see this by including << std::boolalpha in the output command.
The following does cause a link error in g++: (int)x;
Regarding whether this behaviour is permitted or not, C++14 [basic.def.odr]/3 says:
A function whose name appears as a potentially-evaluated expression is odr-used if it is
the unique lookup result or the selected member of a set of overloaded functions [...]
which does cover this case, since name lookup for x in the output expression finds the declaration of x above, and that is the unique result. Then in /4 we have:
Every program shall contain exactly one definition of every non-inline function or variable that is odr-used in that program; no diagnostic required.
so the program is ill-formed but no diagnostic is required, meaning that the program's behaviour is completely undefined.
Incidentally, this clause implies that no link error is required for x(); either. However, from a quality-of-implementation angle, that would be silly. The course that g++ has chosen here seems reasonable to me.

x doesn't need to be "bound" to a function, because your code declares that such a function exists. The compiler can therefore safely assume that the address of this function is not NULL. For a null address to be possible, you would have to declare the function as a weak symbol, and you didn't. The linker did not complain because you never call the function (you never use its actual address), so it sees no problem.

[basic.def.odr]/2:
A function whose name appears
as a potentially-evaluated expression is odr-used if it is the unique lookup result or the selected member of a
set of overloaded functions (3.4, 13.3, 13.4), unless it is a pure virtual function and its name is not explicitly
qualified.
Hence, strictly speaking, the code odr-uses the function and therefore requires a definition.
But modern compilers will realize that the function's exact address is not actually relevant for the behavior of the program, and will thus elide the use and not require a definition.
Also note what [basic.def.odr]/3 specifies:
Every program shall contain exactly one definition of every non-inline
function or variable that is odr-used in that program; no diagnostic
required.
An implementation is not obliged to halt compilation and issue an error message (=diagnostic). It can do what it considers best. In other words, any action is allowed and we have UB.

Related

Is calling a "noexcept function" through a "function" lvalue undefined?

[expr.call]/6:
Calling a function through an expression whose function type is different from the function type of the called function's definition results in undefined behavior.
void f() noexcept {}; // function type is "noexcept function"
void (*pf)() = f; // variable type is "pointer to function"; initialized by result of [conv.fctptr]([conv.func](f))
int main()
{
(*pf)(); // `*pf`: lvalue expression's function type is "function" (without noexcept!)
}
Does the above call result in undefined behavior per the cited standardese?
C++14 had a weaker requirement, from [expr.call]/6:
[...] Calling a function through an expression whose function type has a language linkage that is different from the language linkage of the function type of the called function's definition is undefined ([dcl.link]). [...]
However [expr.reinterpret.cast]/6 contained a similar, but stronger requirement:
A function pointer can be explicitly converted to a function pointer of a different type. The effect of calling a function through a pointer to a function type ([dcl.fct]) that is not the same as the type used in the definition of the function is undefined.
P0012R1 made exception specifications part of the type system, and was implemented for C++17:
The exception specification of a function is now part of the function’s type: void f() noexcept(true); and void f() noexcept(false); are functions of two distinct types. Function pointers are convertible in the sensible direction. (But the two functions f may not form an overload set.) This change strengthens the type system, e.g. by allowing APIs to require non-throwing callbacks.
and moreover added [conv.fctptr]:
Add a new section after section 4.11 [conv.mem]:
4.12 [conv.fctptr] Function pointer conversions
A prvalue of type "pointer to noexcept function" can be converted to a prvalue of type
"pointer to function". [...]
but included no changes to [expr.reinterpret.cast]/6; arguably an unintentional omission.
CWG 2215 highlighted the duplicated information in [expr.call] compared to [expr.reinterpret.cast]/6, flagging the weaker requirement in the former as redundant. The following cplusplus / draft commit implemented CWG 2215, and removed the weaker (redundant) requirement, made [expr.reinterpret.cast]/6 into a non-normative note and moved its (stronger) normative requirement to [expr.call]; eventually this stronger requirement was broken out into its own paragraph.
This confusion arguably led to the unintentional (seemingly conflicting) rules that:
a prvalue of type “pointer to noexcept function” can be converted to a prvalue of type “pointer to function” ([conv.fctptr]/1), and
calling a function through an expression whose function type is different only by its exception specification is undefined behaviour.
As far as I can tell, there are no defect reports covering this issue, and a new one should arguably be submitted.

Why the sentence "The expression can be used only as the left-hand operand of a member function call" in [expr.ref]p(6.3.2)?

[expr.ref]p(6.3.2):
Otherwise, if E1.E2 refers to a non-static member function and the
type of E2 is “function of parameter-type-list cv
ref-qualifieropt returning T”, then E1.E2 is a prvalue. The expression designates a non-static member function. The
expression can be used only as the left-hand operand of a member
function call ([class.mfct]). [ Note: Any redundant set of
parentheses surrounding the expression is ignored ([expr.prim.paren]).
— end note ] The type of E1.E2 is “function of parameter-type-list
cv returning T”.
For example the second statement in main below doesn't compile, probably because of the highlighted sentence above. But why is the language set up to work this way?
#include <iostream>
#include <type_traits>
void g();
struct S { void f(); };
S s;
int main(){
std::cout << "decltype(g) == void() ? " << std::is_same<decltype(g), void()>::value << '\n'; // Ok
std::cout << "decltype(s.f) == void() ? " << std::is_same<decltype(s.f), void()>::value << '\n'; // Doesn't compile, probably because of the sentence highlighted above in [expr.ref]p(6.3.2).
}
When you do E1.E2, you are not talking about a general property of the type of thing that E1 is. You're asking to access a thing within the object designated by E1, where the name of the thing to be accessed is E2. If E2 is static, it accesses the class static thing; if E2 is non-static, then it accesses the member thing specific to that object. That's important.
Member variables become the subobject. If your class S had a non-static data member int i;, s.i is a reference to an int. That reference, from the standpoint of an int&, behaves no differently from any other int&.
Let me say that more clearly: any int* or int& can point to/reference an int which is a complete object or an int which is a subobject of some other object. The single construct int& can serve double-duty in this way.*
Given that understanding of s.i, what would be the presumed meaning of s.f? Well, it should be similar, right? s.f would be some kind of thing that, when called with params, will be the equivalent of doing s.f(params).
But that is not a thing which exists in C++.
There is no language construct in C++ which can represent that meaning of s.f. Such a construct would need to store a reference to s as well as the member S::f.
A function pointer can't do that. Function pointers need to be able to be pointer-interconvertible with void* (see footnote **). But such an s.f would need to store the member S::f as well as a reference to s itself. So by definition, it'll have to be bigger than a void*.
A member pointer can't do that either. Member pointers explicitly don't carry their this object along with them (that's kind of the point); you must provide it at call-time using the specific member pointer call syntax .* or ->*.
Oh, there are ways to encode this within the language: lambdas, std::bind, etc. But there is no language-level construct which has this precise meaning.
Because C++ is asymmetric in this way, where s.i has an encodable meaning but not s.f, C++ makes the unencodable one illegal.
You may ask why such a construct doesn't simply get built. It's not really that important. The language works perfectly fine as is, and due to the complexity of what an s.f would need to be, it's probably best to make you use a lambda (for which admittedly there should be ways to make it shorter to write such things) if that's what you want.
And if you want a naked s.f to be equivalent to S::f (ie: designates the member function), that doesn't really work either. First, S::f doesn't have a type either; the only thing you can do with such a prvalue is convert it to a pointer to a member. Second, a member function pointer doesn't know what object it came from, so in order to use one to call the member, you need to give it s. Therefore, in a call expression, s would have to appear twice. Which is really silly.
*: there are things you can do to complete objects that you cannot do to subobjects. But those provoke UB because they're not detectable by the compiler, because an int* doesn't say if it comes from a subobject or not. Which is the main point; nobody can tell the difference.
**: the standard does not require this, but the standard cannot do something which out-right makes such an implementation impossible. Most implementations provide this functionality, and basically any DLL/SO loading code relies on it. Oh, and it would also be completely incompatible with C, which makes it a non-starter.
The E1.E2 syntax was inherited from C, where it always has a value (more particularly, an lvalue). It is therefore syntactically an expression even in the case of a method call, but it has no value because it is evaluated in concert with its associated call expression.
It is (as you quoted) given the type of the member function so as to satisfy the requirement that a function be called via an expression of the correct type. It would, however, be misleading to define decltype for it since it cannot be a full-expression (as is every unevaluated operand) and common expression SFINAE with decltype would not prevent a hard error from the guarded instantiation.
Note that a non-static member function can be named in an unevaluated operand, but that allows S::f (or just f within the class, although that is arguably rewritten to be (*this).f in a member function), not s.f. That expression has the same type but is itself restricted (to appearing with & to form a pointer to member [function]) for the same reason: if it were to be used otherwise, it would be usable as a normal [function] pointer, which is impossible.

Can I take the address of a function defined in standard library?

Consider the following code:
#include <cctype>
#include <functional>
#include <iostream>
int main()
{
std::invoke(std::boolalpha, std::cout); // #1
using ctype_func = int(*)(int);
char c = std::invoke(static_cast<ctype_func>(std::tolower), 'A'); // #2
std::cout << c << "\n";
}
Here, the two calls to std::invoke are labeled for future reference.
The expected output is:
a
Is the expected output guaranteed in C++20?
(Note: there are two functions called tolower — one in <cctype> and the other in <locale>. The explicit cast is introduced to select the desired overload.)
Short answer
No.
Explanation
[namespace.std] says:
Let F denote a standard library function ([global.functions]), a standard library static member function, or an instantiation of a standard library function template.
Unless F is designated an addressable function, the behavior of a C++ program is unspecified (possibly ill-formed) if it explicitly or implicitly attempts to form a pointer to F.
[Note: Possible means of forming such pointers include application of the unary & operator ([expr.unary.op]), addressof ([specialized.addressof]), or a function-to-pointer standard conversion ([conv.func]).
— end note ]
Moreover, the behavior of a C++ program is unspecified (possibly ill-formed) if it attempts to form a reference to F or if it attempts to form a pointer-to-member designating either a standard library non-static member function ([member.functions]) or an instantiation of a standard library member function template.
With this in mind, let's check the two calls to std::invoke.
The first call
std::invoke(std::boolalpha, std::cout);
Here, we are attempting to form a pointer to std::boolalpha. Fortunately, [fmtflags.manip] saves the day:
Each function specified in this subclause is a designated addressable function ([namespace.std]).
And boolalpha is a function specified in this subclause.
Thus, this line is well-formed, and is equivalent to:
std::cout.setf(std::ios_base::boolalpha);
But why is that? Well, it is necessary for the following code:
std::cout << std::boolalpha;
The second call
std::cout << std::invoke(static_cast<ctype_func>(std::tolower), 'A') << "\n";
Unfortunately, [cctype.syn] says:
The contents and meaning of the header <cctype> are the same as the C standard library header <ctype.h>.
Nowhere is tolower explicitly designated an addressable function.
Therefore, the behavior of this C++ program is unspecified (possibly ill-formed), because it attempts to form a pointer to tolower, which is not designated an addressable function.
Conclusion
The expected output is not guaranteed.
In fact, the code is not even guaranteed to compile.
This also applies to member functions.
[namespace.std] doesn’t explicitly mention this, but it can be seen from [member.functions] that the behavior of a C++ program is unspecified (possibly ill-formed) if it attempts to take the address of a member function declared in the C++ standard library. Per [member.functions]/2:
For a non-virtual member function described in the C++ standard library, an implementation may declare a different set of member function signatures, provided that any call to the member function that would select an overload from the set of declarations described in this document behaves as if that overload were selected. [ Note: For instance, an implementation may add parameters with default values, or replace a member function with default arguments with two or more member functions with equivalent behavior, or add additional signatures for a member function name. — end note ]
And [expr.unary.op]/6:
The address of an overloaded function can be taken only in a context that uniquely determines which version of the overloaded function is referred to (see [over.over]). [ Note: Since the context might determine whether the operand is a static or non-static member function, the context can also affect whether the expression has type “pointer to function” or “pointer to member function”. — end note ]
Therefore, the behavior of a program is unspecified (possibly ill-formed) if it explicitly or implicitly attempts to form a pointer to a member function in the C++ library.
(Thanks for the comment for pointing this out!)

Are variables appearing in function expression taking arguments by reference but returning by value odr-used?

Given the code snippet:
struct S {
static const int var = 0;
};
int function(const int& rVar){
return rVar;
}
int main()
{
return function(S::var);
}
Compiled with gcc 5.4.0:
g++ -std=c++17 main.cpp -o test
results in the following linkage error:
/tmp/ccSeEuha.o: In function `main':
main.cpp:(.text+0x15): undefined reference to `S::var'
collect2: error: ld returned 1 exit status
§3.2/3 ([basic.def.odr]) from the ISO Standard C++17 draft n4296 states:
A variable x whose name appears as a potentially-evaluated expression ex is odr-used by ex unless applying the lvalue-to-rvalue conversion (4.1) to x yields a constant expression (5.20) that does not invoke any non-trivial functions and, if x is an object, ex is an element of the set of potential results of an expression e, where either the lvalue-to-rvalue conversion (4.1) is applied to e [bold-type formatting added], or e is a discarded-value expression (Clause 5).
Q: Why is a definition of the variable var required here? Doesn't var denote an integer object that appears in the potentially-evaluated expression S::var of an outer function call expression, which indeed takes its parameter by reference, but to which an lvalue-to-rvalue conversion is finally applied, and which thus isn't odr-used as stated in the paragraph?
but to which finally a lvalue-to-rvalue conversion is applied, and, thus isn't odr-used as stated in the paragraph?
The lvalue-to-rvalue conversion in the other expression is irrelevant I believe. There is no lvalue-to-rvalue conversion applied to subexpression S::var in the expression function(S::var), thus the exception does not apply.
Considering this from a common-sense point of view, rather than analysing the rule: function could be defined in another translation unit, so the compiler cannot necessarily know how the reference would be used. It cannot just send a copy of the value to the function and hope that the function definition won't use the object in a way that would require the definition of the referred object. Likewise, when compiling the function, the compiler cannot assume that all function calls would send anything other than a reference to an object that exists.
Technically, I suppose that there could be a yet more complicated exception for reference arguments of inline functions, but there is no such exception in the standard. And there shouldn't be, since it would make inline expansion mandatory in those cases. In practice, a compiler might behave exactly as such an exception would require if it happens to expand the function inline, since odr violations have undefined behaviour.
It is explicit in the example shown just above the paragraph you quoted: in function(S::var);, S::var is odr-used.
The reason is that as function takes its parameter by ref (and not by value), no lvalue to rvalue conversion occurs.
But if you change function to take its argument by value:
int function(const int rVar){
return rVar;
}
then the lvalue-to-rvalue conversion occurs, and S::var is no longer odr-used. And the program no longer exhibits the undefined reference...