Does C++ allow unused variables to have multiple definitions? [duplicate] - c++

The standard seems to imply that there is no restriction on the number of definitions of a variable if it is not odr-used (§3.2/3):
Every program shall contain exactly one definition of every non-inline function or variable that is odr-used in that program; no diagnostic required.
It does say that no variable may be defined more than once within a single translation unit (§3.2/1):
No translation unit shall contain more than one definition of any variable, function, class type, enumeration type, or template.
But I can't find a restriction for non-odr-used variables across the entire program. So why can't I compile something like the following:
// other.cpp
int x;
// main.cpp
int x;
int main() {}
Compiling and linking these files with g++ 4.6.3, I get a linker error for multiple definition of 'x'. To be honest, I expected this, but since x is not odr-used anywhere (as far as I can tell), I can't see how the standard forbids it. Or is it undefined behaviour?

Your program violates the linkage rules. C++11 §3.5[basic.link]/9 states:
Two names that are the same and that are declared in different scopes shall denote the same variable, function, type, enumerator, template or namespace if
- both names have external linkage or else both names have internal linkage and are declared in the same translation unit; and
- both names refer to members of the same namespace or to members, not by inheritance, of the same class; and
- when both names denote functions, the parameter-type-lists of the functions are identical; and
- when both names denote function templates, the signatures are the same.
(I've cited the complete paragraph for reference; the last two bullets do not apply here.)
In your program, there are two names x, which are the same. They are declared in different scopes (in this case, they are declared in different translation units). Both names have external linkage and both names refer to members of the same namespace (the global namespace).
These two names do not denote the same variable. The declaration int x; defines a variable. Because there are two such definitions in the program, there are two variables in the program. The name "x" in one translation unit denotes one of these variables; the name "x" in the other translation unit denotes the other. Therefore, the program is ill-formed.
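For reference, a minimal conforming arrangement (a sketch) keeps a single definition and turns the other declaration into a non-defining one with extern:
// other.cpp
int x; // the one and only definition
// main.cpp
extern int x; // declaration only; denotes the x defined in other.cpp
int main() {}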

You're correct that the standard is at fault in this regard. I have a feeling that this case falls into the gap between 3.2p1 (at most one definition per translation unit, as in your question) and 3.2p6 (which describes how classes, enumerations, inline functions, and various templates can have duplicate definitions across translation units).
For comparison, in C, 6.9p5 requires that (my emphasis):
An external definition is an external declaration that is also a definition of a function
(other than an inline definition) or an object. If an identifier declared with external linkage is used in an expression (other than as part of the operand of a sizeof or _Alignof operator whose result is an integer constant), somewhere in the entire program there shall be exactly one external definition for the identifier; otherwise, there shall be no more than one.

If the standard does not say anything about definitions of unused variables, then you cannot infer that multiple definitions are permitted:
Undefined behavior may also be expected when this International
Standard omits the description of any explicit definition of
behavior.
So it may compile and run nicely, or it may stop during translation with an error message, or it may crash at runtime, etc.
EDIT: See James McNellis's answer; the standard does in fact have rules about this.

There is no error in compiling that; the error is in its linkage. By default, your global variables and functions are visible to other files (they have external linkage), so when the linker comes to link your code it sees two definitions of x and cannot choose between them. If you do not use the x of main.cpp in other.cpp and vice versa, make them static (meaning each is visible only to the file that contains it):
// other.cpp
static int x;
// main.cpp
static int x;
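In C++, an unnamed namespace is the more idiomatic way to get the same internal linkage; a sketch equivalent to the static version above:
// other.cpp
namespace { int x; } // internal linkage: each TU gets its own x
// main.cpp
namespace { int x; }
int main() {}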

Related

Why do C and C++ treat redefinitions of uninitialised variables differently?

int a;
int a = 3; // error when compiled as C++ with clang++-7, but not as C with clang-7
int main() {
}
For C, the compiler seems to merge these symbols into one global symbol but for C++ it is an error.
Demo
file1:
int a = 2;
file2:
#include<stdio.h>
int a;
int main() {
    printf("%d", a); // prints 2
}
When compiled as C with clang-7, the linker does not produce an error, and I assume it converts the uninitialised global symbol 'a' to an extern symbol (treating it as if it had been compiled from an extern declaration). When compiled as C++ with clang++-7, the linker produces a multiple definition error.
Update: the linked question does answer the first example in my question, specifically 'In C, If an actual external definition is found earlier or later in the same translation unit, then the tentative definition just acts as a declaration.' and 'C++ does not have “tentative definitions”'.
As for the second scenario, if I printf a, then it does print 2, so obviously the linker has linked it correctly (but I previously would have assumed that a tentative definition would be initialised to 0 by the compiler as a global definition and would cause a link error).
It turns out that an int i[]; tentative definition in both files also gets linked to one definition. int i[5]; is also a tentative definition placed in .common, just with a different size expressed to the assembler. The former is a tentative definition with an incomplete type, whereas the latter is a tentative definition with a complete type.
What happens with the C compiler is that int a is made a strong-bound 'weak' global in .common and left uninitialised in the symbol table (.common implies a weak global, whereas extern int a would be an extern symbol). The linker then makes the necessary decision: it ignores all weak-bound globals (those defined using #pragma weak) if there is a strong-bound global with the same identifier in some translation unit. Two strong-bound globals would be a multiple definition error. If it finds no strong-bounds and one weak-bound, the output is that single weak-bound; if it finds no strong-bounds but two weak-bounds, it chooses the definition in the first file on the command line and outputs a single weak-bound. Two weak-bounds are two definitions to the linker (because they are initialised to 0 by the compiler), but this is not a multiple definition error, because both are weak-bound. Finally, the linker resolves all .common symbols to point to the chosen strong/weak-bound global. https://godbolt.org/z/Xu_8tY https://docs.oracle.com/cd/E19120-01/open.solaris/819-0690/chapter2-93321/index.html
As baz is declared with #pragma weak, it is weak-bound, gets zeroed by the compiler, and is put in .bss (even though it is a weak global, it doesn't go in .common, because it is weak-bound; all weak-bound variables go in .bss if uninitialised, where the compiler initialises them, or in .data if they are initialised). If it were not declared with #pragma weak, baz would go in .common and the linker would zero it if no weak/strong-bound strong global symbol is found.
The C++ compiler makes int a a strong-bound strong global in .bss and initialises it to 0: https://godbolt.org/z/aGT2-o. Therefore the linker treats it as a multiple definition.
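To summarize the above in code, here is a small sketch of where a typical ELF toolchain places such symbols when compiling as C with -fcommon (hypothetical variable names; section placement as described above):
/* demo.c: compile as C with -fcommon */
int tentative_var;        /* tentative definition: emitted as a common symbol (.common) */
int defined_var = 2;      /* external definition: strong symbol in .data */
static int internal_var;  /* internal linkage: local symbol in .bss */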
Update 2:
GCC 10.1 defaults to -fno-common. As a result, accesses to global variables are more efficient on various targets. In C, global variables with multiple tentative definitions now result in linker errors (as in C++). With -fcommon, such definitions are silently merged during linking.
I'll address the C end of the question, since I'm more familiar with that language and you seem to already be pretty clear on why the C++ side works as it does. Someone else is welcome to add a detailed C++ answer.
As you noted, in your first example, C treats the line int a; as a tentative definition (see 6.9.2 in N2176). The later int a = 3; is a declaration with an initializer, so it is an external definition. As such, the earlier tentative definition int a; is treated as merely a declaration. So, retroactively, you have first declared a variable at file scope and later defined it (with an initializer). No problem.
In your second example, file2 also has a tentative definition of a. There is no external definition in this translation unit, so
the behavior is exactly as if the translation
unit contains a file scope declaration of that identifier, with the composite type as of the end of the
translation unit, with an initializer equal to 0. [6.9.2 (1)]
That is, it is as if you had written int a = 0; in file2. Now you have two external definitions of a in your program, one in file1 and another in file2. This violates 6.9 (5):
If an identifier declared with external linkage is used in an expression
(other than as part of the operand of a sizeof or _Alignof operator whose result is an integer
constant), somewhere in the entire program there shall be exactly one external definition for the
identifier; otherwise, there shall be no more than one.
So under the C standard, the behavior of your program is undefined and the compiler is free to do as it likes. (But note that no diagnostic is required.) With your particular implementation, instead of summoning nasal demons, what your compiler chooses to do is what you described: use the common feature of your object file format, and have the linker merge the definitions into one. Although not required by the standard, this behavior is traditional at least on Unix, and is mentioned by the standard as a "common extension" (no pun intended) in J.5.11.
This feature is quite convenient, in my opinion, but since it's only possible if your object file format supports it, we couldn't really expect the C standard authors to mandate it.
clang doesn't document this behavior very clearly, as far as I can see, but gcc, which has the same behavior, describes it under the -fcommon option. On either compiler, you can disable it with -fno-common, and then your program should fail to link with a multiple definition error.
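A minimal two-file sketch of that behaviour (hypothetical file names; -fcommon and -fno-common are the documented gcc/clang flags):
/* tu1.c */
int a; /* tentative definition: a common symbol under -fcommon */
/* tu2.c */
int a = 2; /* the real (strong) definition */
int main(void) { return a; }
/* cc -fcommon tu1.c tu2.c    -> links: tu1's common symbol is merged into tu2's definition
   cc -fno-common tu1.c tu2.c -> multiple-definition link error, as in C++ */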

Different compilation results not using extern in C vs in C++

When I declare a global variable in two different source files and only define it in one of the source files, I get different results compiling for C++ than for C. See the following example:
main.c
#include <stdio.h>
#include "func.h" // only contains declaration of void print();
int def_var = 10;
int main() {
    printf("%d\n", def_var);
    return 0;
}
func.c
#include <stdio.h>
#include "func.h"
/* extern */ int def_var; // extern needed for C++ but not for C?
void print() {
    printf("%d\n", def_var);
}
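For completeness, func.h as described above contains just the declaration:
// func.h
void print();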
I compile with the following commands:
gcc/g++ -c main.c -o main.o
gcc/g++ -c func.c -o func.o
gcc/g++ main.o func.o -o main
g++/clang++ complain about multiple definition of def_var (this is the behaviour I expected, when not using extern).
gcc/clang compile just fine. (using gcc 7.3.1 and clang 5.0)
According to this link:
A tentative definition is a declaration that may or may not act as a definition. If an actual external definition is found earlier or later in the same translation unit, then the tentative definition just acts as a declaration.
So my variable def_var should be defined at the end of each translation unit and then result in multiple definitions (as it is done for C++). Why is that not the case when compiling with gcc/clang?
This isn't valid C either, strictly speaking. Says as much in
6.9 External definitions - p5
An external definition is an external declaration that is also a
definition of a function (other than an inline definition) or an
object. If an identifier declared with external linkage is used in an
expression (other than as part of the operand of a sizeof or _Alignof
operator whose result is an integer constant), somewhere in the entire
program there shall be exactly one external definition for the
identifier; otherwise, there shall be no more than one.
You have two definitions for an identifier with external linkage. You violate that requirement, the behavior is undefined. The program linking and working is not in opposition to that. It's not required to be diagnosed.
And it's worth noting that C++ is no different in that regard.
[basic.def.odr]/4
Every program shall contain exactly one definition of every non-inline
function or variable that is odr-used in that program outside of a
discarded statement; no diagnostic required. The definition can appear
explicitly in the program, it can be found in the standard or a
user-defined library, or (when appropriate) it is implicitly defined
(see [class.ctor], [class.dtor] and [class.copy]). An inline function
or variable shall be defined in every translation unit in which it is
odr-used outside of a discarded statement.
Again, a "shall" requirement, and it says explicitly that no diagnostic is required. As you may have noticed, there's quite a bit more machinery that this paragraph can apply to. So the front ends for GCC and Clang probably need to work harder, and as such are able to diagnose it, despite not being required to.
The program is ill-formed either way.
As M.M pointed out in a comment, the C standard has an informative section that mentions the very extension in zwol's answer.
J.5.11 Multiple external definitions
There may be more than one external definition for the identifier of
an object, with or without the explicit use of the keyword extern; if
the definitions disagree, or more than one is initialized, the
behavior is undefined (6.9.2).
I believe you are observing an extension to C known as "common symbols", implemented by most, but not all, Unix-lineage C compilers, originally (IIUC) for compatibility with FORTRAN. The extension generalizes the "tentative definitions" rule described in StoryTeller's answer to multiple translation units. All external object definitions with the same name and no initializer,
int foo; // at file scope
are collapsed into one, even if they appear in more than one TU, and if there exists an external definition with an initializer for that name,
int foo = 1; // different TU, also file scope
then all of the external definitions with no initializers are treated as external declarations. C++ compilers do not implement this extension, because (oversimplifying) nobody wanted to figure out what it should do in the presence of templates. For GCC and Clang, you can disable the extension with -fno-common, but other Unix C compilers may not have any way to turn it off.

Why does defining inline global function in 2 different cpp files cause a magic result?

Suppose I have two .cpp files file1.cpp and file2.cpp:
// file1.cpp
#include <iostream>
inline void foo()
{
    std::cout << "f1\n";
}
void f1()
{
    foo();
}
and
// file2.cpp
#include <iostream>
inline void foo()
{
    std::cout << "f2\n";
}
void f2()
{
    foo();
}
And in main.cpp I have forward declared the f1() and f2():
void f1();
void f2();
int main()
{
    f1();
    f2();
}
Result (doesn't depend on build, same result for debug/release builds):
f1
f1
Whoa: the compiler somehow picks only the definition from file1.cpp and uses it in f2() as well. What is the exact explanation of this behavior?
Note that changing inline to static solves this problem. Putting the inline definition inside an unnamed namespace also solves it, and the program then prints:
f1
f2
This is undefined behavior, because the two differing definitions of the same inline function with external linkage violate the C++ requirements for entities that may be defined in several places, known as the One Definition Rule:
3.2 One definition rule
...
There can be more than one definition of a class type (Clause 9), enumeration type (7.2), inline function with external linkage (7.1.2), class template (Clause 14),[...] in a program provided that each definition appears in a different translation unit, and provided the definitions satisfy the following requirements. Given such an entity named D defined in more than one translation unit, then
6.1 each definition of D shall consist of the same sequence of tokens; [...]
This is not an issue with static functions, because the one-definition rule does not apply to them: C++ considers static functions defined in different translation units to be independent of each other.
The compiler may assume that all definitions of the same inline function are identical across all translation units because the standard says so. So it can choose any definition it wants. In your case, that happened to be the one with f1.
Note that you cannot rely on the compiler always picking the same definition, violating the aforementioned rule makes the program ill-formed. The compiler could also diagnose that and error out.
If the function is static or in an anonymous namespace, you have two distinct functions called foo and the compiler must pick the one from the right file.
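For illustration, a sketch of the unnamed-namespace variant of file2.cpp (file1.cpp would be wrapped the same way); foo then has internal linkage, so each translation unit gets its own distinct function:
// file2.cpp
#include <iostream>
namespace {
void foo() // internal linkage: invisible to other TUs
{
    std::cout << "f2\n";
}
}
void f2()
{
    foo(); // unambiguously calls this TU's foo
}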
Relevant standardese for reference:
An inline function shall be defined in every translation unit in which it is odr-used and shall have exactly
the same definition in every case (3.2). [...]
7.1.2/4 in N4141, emphasis mine.
As others have noted, the compilers are in compliance with the C++ standard, because the One Definition Rule states that you shall have only one definition of a function, except that an inline function may be defined in each translation unit, in which case all the definitions must be the same.
In practice, what happens is that the function is flagged as inline, and at the linking stage, if the linker runs into multiple definitions of an inline-flagged symbol, it silently discards all but one. If it runs into multiple definitions of a symbol not flagged inline, it instead generates an error.
This property is called inline because, prior to LTO (link time optimization), taking the body of a function and "inlining" it at the call site required that the compiler have the body of the function. inline functions could be put in header files, and each cpp file could see the body and "inline" the code into the call site.
It doesn't mean that the code is actually going to be inlined; rather, it makes it easier for compilers to inline it.
However, I am unaware of a compiler that checks that the definitions are identical before discarding duplicates. This includes compilers that otherwise check definitions of function bodies for being identical, such as MSVC's COMDAT folding. This makes me sad, because it is a really subtle set of bugs.
The proper way around your problem is to place the function in an anonymous namespace. In general, you should consider putting everything in a source file in an anonymous namespace.
Another really nasty example of this:
// A.cpp
#include <vector>
struct Helper {
    std::vector<int> foo;
    Helper() {
        foo.reserve(100);
    }
};
// B.cpp
struct Helper {
    double x, y;
    Helper() : x(0), y(0) {}
};
Methods defined in the body of a class are implicitly inline, and the ODR applies. Here we have two different Helper::Helper(), both inline, and they differ.
The sizes of the two classes differ. In one case, we initialize two sizeof(double) fields with 0 (floating-point zero is all-zero bytes on most implementations).
In the other, we first initialize three sizeof(void*) fields with zero, then call .reserve(100) on those bytes, interpreting them as a vector.
At link time, one of these two implementations is discarded, and the survivor is used by both translation units. What's more, which one is discarded is likely to be fairly deterministic in a full build; in a partial build, it could change with build order.
So now you have code that might build and work "fine" in a full build, while a partial build causes memory corruption. And changing the order of files in makefiles could cause memory corruption, as could changing the order in which lib files are linked, or upgrading your compiler, etc.
If both cpp files had a namespace {} block containing everything except the stuff you are exporting (which can use fully qualified namespace names), this could not happen.
I've caught exactly this bug in production multiple times. Given how subtle it is, I do not know how many times it slipped through, waiting for its moment to pounce.
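A sketch of that namespace {} arrangement applied to A.cpp (B.cpp would be wrapped the same way):
// A.cpp
#include <vector>
namespace { // internal linkage: this Helper is private to A.cpp
struct Helper {
    std::vector<int> foo;
    Helper() { foo.reserve(100); }
};
} // unnamed namespace
// the functions A.cpp exports can use Helper freely without ODR clashes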
POINT OF CLARIFICATION:
Although the answer rooted in the C++ inline rule is correct, it only applies if both sources are compiled together. If they are compiled separately, then, as one commenter noted, each resulting object file will contain its own foo(). HOWEVER: if these two object files are then linked together, then because both foo()-s are non-static, the name foo appears in the exported symbol table of both object files. The linker then has to coalesce the two table entries, so all internal calls are re-bound to one of the two routines (presumably the one in the first object file processed, since it is already bound; i.e., the linker treats the second record as 'extern' regardless of binding).

What is the difference between the global variables in C and C++?

I have tested the following code:
in file a.c/a.cpp
int a;
in file b.c/b.cpp
int a;
int main() { return 0; }
When I compile the source files with gcc *.c -o test, it succeeds.
But when I compile the source files with g++ *.c -o test, it fails:
ccIJdJPe.o:b.cpp:(.bss+0x0): multiple definition of 'a'
ccOSsV4n.o:a.cpp:(.bss+0x0): first defined here
collect2.exe: error: ld returned 1 exit status
I'm really confused about this. Is there any difference between the global variables in C and C++?
Here are the relevant parts of the standard. See my explanation below the standard text:
§6.9.2/2 External object definitions
A declaration of an identifier for an object that has file scope without an initializer, and without a storage-class specifier or with the storage-class specifier static, constitutes a tentative definition. If a translation unit contains one or more tentative definitions for an identifier, and the translation unit contains no external definition for that identifier, then the behavior is exactly as if the translation unit contains a file scope declaration of that identifier, with the composite type as of the end of the translation unit, with an initializer equal to 0.
ISO C99 §6.9/5 External definitions
An external definition is an external declaration that is also a definition of a function (other than an inline definition) or an object. If an identifier declared with external linkage is used in an expression (other than as part of the operand of a sizeof operator whose result is an integer constant), somewhere in the entire program there shall be exactly one external definition for the identifier; otherwise, there shall be no more than one.
With the C version, the duplicate global variables are 'merged' into one, so you end up with only one variable, declared twice. This is accepted for historical and compatibility reasons, dating back to the days when extern was not needed (or perhaps did not exist), so that old code can still be built. It is a gcc extension for this legacy feature.
It basically makes gcc allocate memory for a variable with the name 'a', so there can be more than one declaration, but only one definition. That is why the code below will not work even with gcc.
This is also called a tentative definition. There is no such thing in C++, which is why the C version compiles and the C++ version does not: C++ has no concept of tentative definitions.
A tentative definition is any external data declaration that has no storage class specifier and no initializer. A tentative definition becomes a full definition if the end of the translation unit is reached and no definition has appeared with an initializer for the identifier. In this situation, the compiler reserves uninitialized space for the object defined.
Note, however, that the following code will not compile even with gcc, because these are no longer tentative definitions once initializers are assigned:
in file "a.c/a.cpp"
int a = 1;
in file "b.c/b.cpp"
int a = 2;
int main() { return 0; }
Let us go even further with more examples. The following statements show normal definitions and tentative definitions. Note that static makes a difference, since it gives the name internal linkage, so it is no longer external.
int i1 = 10; /* definition, external linkage */
static int i2 = 20; /* definition, internal linkage */
extern int i3 = 30; /* definition, external linkage */
int i4; /* tentative definition, external linkage */
static int i5; /* tentative definition, internal linkage */
int i1; /* valid tentative definition */
int i2; /* not legal, linkage disagreement with previous */
int i3; /* valid tentative definition */
int i4; /* valid tentative definition */
int i5; /* not legal, linkage disagreement with previous */
Further details can be found on the following page:
http://c0x.coding-guidelines.com/6.9.2.html
See also this blog post for further details:
http://ninjalj.blogspot.co.uk/2011/10/tentative-definitions-in-c.html
gcc implements a legacy feature where uninitialized global variables are placed in a common block.
Although in each translation unit the definitions are tentative, in ISO C, at the end of the translation unit, tentative definitions are "upgraded" to full definitions if they haven't already been merged into a non-tentative definition.
In standard C, it is always incorrect to have the same variable with external linkage defined in more than one translation unit, even if these definitions came from tentative definitions.
To get the same behaviour as C++, you can use the -fno-common switch with gcc and this will result in the same error. (If you are using the GNU linker and don't use -fno-common you might also want to consider using the --warn-common / -Wl,--warn-common option to highlight the link time behaviour on encountering multiple common and non-common symbols with the same name.)
From the gcc man page:
-fno-common
In C code, controls the placement of uninitialized global
variables. Unix C compilers have traditionally permitted multiple
definitions of such variables in different compilation units by
placing the variables in a common block. This is the behavior
specified by -fcommon, and is the default for GCC on most
targets. On the other hand, this behavior is not required by ISO
C, and on some targets may carry a speed or code size penalty on
variable references. The -fno-common option specifies that the
compiler should place uninitialized global variables in the data
section of the object file, rather than generating them as common
blocks. This has the effect that if the same variable is declared
(without extern) in two different compilations, you will get a
multiple-definition error when you link them. In this case, you
must compile with -fcommon instead. Compiling with
-fno-common is useful on targets for which it provides better
performance, or if you wish to verify that the program will work
on other systems which always treat uninitialized variable
declarations this way.
gcc's behaviour is a common one and it is described in Annex J of the standard (which is not normative) which describes commonly implemented extensions to the standard:
J.5.11 Multiple external definitions
There may be more than one external definition for the identifier of an object, with or
without the explicit use of the keyword extern; if the definitions disagree, or more than
one is initialized, the behavior is undefined (6.9.2).

Inline member functions in C++

ISO C++ says that the inline definition of a member function in C++ is the same as declaring it with inline. This means that the function will be defined in every compilation unit in which the member function is used. However, if the function call cannot be inlined for whatever reason, the function is to be instantiated "as usual". (http://msdn.microsoft.com/en-us/library/z8y1yy88%28VS.71%29.aspx) The problem I have with this definition is that it does not tell in which translation unit it will be instantiated.
The problem I encountered is that when facing two object files in a single static library, both of which have the reference to some inline member function which cannot be inlined, the linker might "pick" an arbitrary object file as a source for the definition. This particular choice might introduce unneeded dependencies. (among other things)
For instance:
In a static library
A.h:
class A {
public:
    virtual bool foo() { return true; }
};
U1.cpp:
A a1;
U2.cpp:
A a2;
and lots of dependencies
In another project
main.cpp:
#include "A.h"
int main() {
    A a;
    a.foo();
    return 0;
}
The second project references the first. How do I know which definition the compiler will use, and consequently which object files, with their dependencies, will be linked in? Is there anything the standard says on that matter? (I tried, but failed to find it.)
Thanks
Edit: since I've seen some people misunderstand what the question is, I'd like to emphasize: if the compiler decides to create a symbol for that function (and in this case it will, because of its virtualness), there will be several (externally visible) instantiations in different object files. Which definition (from which object file) will the linker choose?
Just my two cents. This is not about virtual functions in particular, but about inline and member functions generally. Maybe it is useful.
C++
As far as Standard C++ is concerned, an inline function must be defined in every translation unit in which it is used. And a non-static inline function will have the same static variables and the same address in every translation unit. The compiler/linker has to merge the multiple definitions into one function to achieve this. So, always place the definition of an inline function into the header; or, for a non-member function, place no declaration of it into the header if you define it only in the implementation file (".cpp"), because if you declared it and someone used it, you would get a linker error about an undefined function or something similar.
This is different from non-inline functions, which must be defined only once in an entire program (one-definition rule). For inline functions, multiple definitions as outlined above are rather the normal case. And this is independent of whether the call is actually inlined or not. The rules about inline functions still matter. Whether the Microsoft compiler adheres to those rules or not, I can't tell you. If it adheres to the Standard in that regard, then it will. However, I could imagine some combination involving virtual, DLLs and different TUs could be problematic. I've never tested it, but I believe there are no problems.
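A small sketch of the "same static variables, same address" guarantee for a non-static inline function (hypothetical header name):
// counter.h (hypothetical)
inline int next_id()
{
    static int id = 0; // one object for the entire program, not one per TU
    return ++id;
}
// Every TU that includes this header shares the same id object,
// and &next_id compares equal across TUs.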
For member-functions, if you define your function in the class, it is implicitly inline. And because it appears in the header, the rule that it has to be defined in every translation unit in which it is used is automatically satisfied. However, if you define the function out-of-class and in a header file (for example because there is a circular dependency with code in between), then that definition has to be inline if you include the corresponding file more than once, to avoid multiple-definition errors thrown by the linker. Example of a file f.h:
struct f {
    // inline required here or before the definition below
    inline void g();
};
void f::g() { ... }
This would have the same effect as placing the definition straight into the class definition.
C99
Note that the rules about inline functions are more complicated in C99 than in C++. Here, an inline function can be defined as an inline definition, of which more than one may exist in the entire program. But if such an (inline) definition is used (e.g., if it is called), then there must also be exactly one external definition in the entire program, contained in another translation unit. Rationale for this (quoting from a PDF explaining the rationale behind several C99 features):
Inlining in C99 does extend the C++ specification in two ways. First, if a function is declared inline in one translation unit, it need not be declared inline in every other translation unit. This allows, for example, a library function that is to be inlined within the library but available only through an external definition elsewhere. The alternative of using a wrapper function for the external function requires an additional name; and it may also adversely impact performance if a translator does not actually do inline substitution.
Second, the requirement that all definitions of an inline function be "exactly the same" is replaced by the requirement that the behavior of the program should not depend on whether a call is implemented with a visible inline definition, or the external definition, of a function. This allows an inline definition to be specialized for its use within a particular translation unit. For example, the external definition of a library function might include some argument validation that is not needed for calls made from other functions in the same library. These extensions do offer some advantages; and programmers who are concerned about compatibility can simply abide by the stricter C++ rules.
Why do I include C99 here? Because I know that the Microsoft compiler supports some C99 features, so on those MSDN pages some material may come from C99 too, though I haven't identified anything in particular. One should be careful when reading them and when applying those techniques to one's own C++ code that is intended to be portable C++, ideally noting which parts are C99-specific and which are not.
A good place to test small C++ snippets for Standard conformance is the Comeau online compiler. If it rejects a snippet, one can be pretty sure it is not strictly Standard-conforming.
When you have an inline method that is forced to be non-inlined by the compiler, it will really instantiate the method in every compilation unit that uses it. Today most compilers are smart enough to instantiate a method only if needed (if used), so merely including the header file will not force instantiation. The linker, as you said, will pick one of the instantiations to include in the executable file; but keep in mind that the record inside the object module is of a special kind (for instance, a COMDEF) in order to give the linker enough information to discard duplicated instances. These records will therefore not result in unwanted dependencies between modules, because the linker uses them with lower priority than "regular" records when resolving dependencies.
In the example you gave, you really don't know, but it doesn't matter. The linker won't ever resolve dependencies based on non-inlined instances alone. The result (in terms of modules included by the linker) will be as good as if the inline method didn't exist.
AFAIK, there is no standard definition of how and when a C++ compiler will inline a function call. These are usually "recommendations" that the compiler is in no way required to follow. In fact, different users may want different behaviors. One user may care about speed, while another may care about small generated object file size. In addition, compilers and platforms are different. Some compilers may apply smarter analysis, some may not. Some compilers may generate longer code from the inline, or work on a platform where calls are too expensive, etc.
When you have an inline function, the compiler should still generate a symbol for it and eventually resolve a single version of it, so that if it is in a static library, people can still call the function without it being inlined. In other words, it still acts as a normal function.
The only effect of inline is that in some cases the compiler will see the call, see the inline definition, and skip the call completely; but the function is still there, it just isn't called in that case.
If the compiler decided to create a symbol for that function (and in this case, it will, because of 'virtualness', there will be several (externally-seen) instantiations in different object file, which definition (from which object file?) will the linker choose?)
The definition that is present in the corresponding translation unit. And a translation unit can have, and I repeat, can have only one such definition. The standard is clear about that.
[...]the linker might "pick" an arbitrary object file as a source for the definition.
EDIT: To avoid any further misunderstanding, let me make my point clear: as per my reading of the standard, the ability to have multiple definitions across different TUs does not give us any practical leverage. By practical, I mean having even slightly varying implementations. Now, if all your TUs have the exact same definition, why bother about which TU the definition is picked up from?
If you browse through the standard you will find the One Definition Rule is applied everywhere. Even though it is allowed to have multiple definitions of an inline function:
3.2 One Definition Rule:
5 There can be more than one definition of a class type (Clause 9), concept (14.9), concept map (14.9.2), enumeration type (7.2), inline function with external linkage (7.1.2), [...]
Read it in conjunction with
3 [...] An inline function shall be defined in every translation unit in which it is used.
This means that the function will be defined in every compilation unit [...]
and
7.1.2 Function Specifiers
2 A function declaration (8.3.5, 9.3, 11.4) with an inline specifier declares an inline function. The inline specifier indicates to the implementation that inline substitution of the function body at the point of call is to be preferred to the usual function call mechanism. An implementation is not required to perform this inline substitution at the point of call; however, even if this inline substitution is omitted, the other rules
for inline functions defined by 7.1.2 shall still be respected.
3 A function defined within a class definition is an inline function. The inline specifier shall not appear on a block scope function declaration.[footnote: 82] If the inline specifier is used in a friend declaration, that declaration shall be a definition or the function shall have previously been declared inline.
and the footnote:
82) The inline keyword has no effect on the linkage of a function.
as well as:
4 An inline function shall be defined in every translation unit in which it is used and shall have exactly the same definition in every case (3.2). [ Note: a call to the inline function may be encountered before its definition appears in the translation unit. —end note ] If the definition of a function appears in a translation unit before its first declaration as inline, the program is ill-formed. If a function with external linkage is
declared inline in one translation unit, it shall be declared inline in all translation units in which it appears; no diagnostic is required. An inline function with external linkage shall have the same address in all translation units. A static local variable in an extern inline function always refers to the same object. A string literal in the body of an extern inline function is the same object in different translation units. [ Note: A string literal appearing in a default argument expression is not in the body of an inline function merely because the expression is used in a function call from that inline function. —end note ]
Distilled: it's OK to have multiple definitions, but they must have the same look and feel in every translation unit, and the same address; that doesn't really give you much to cheer about. Having differing definitions across translation units is therefore not defined (note: I am not saying you are invoking UB, yet).
As for the virtual thingy -- there won't be any inlining. Period.
The standard says:
- The same declaration must be available
- There must be one definition
From MSDN:
A given inline member function must be declared the same way in every compilation unit. This constraint causes inline functions to behave as if they were instantiated functions. Additionally, there must be exactly one definition of an inline function.
Your A.h contains the class definition and the member foo()'s definition.
U1.cpp and U2.cpp both define two different objects of class A.
You create another A object in main(). This is just fine.
So far, I have seen only one definition of A::foo(), which is inline. (Remember that a function defined within the class definition is implicitly inline, whether or not it is preceded by the inline keyword.)
Don't inline your functions if you want to ensure they get compiled into a specific library.
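A sketch of that arrangement, using the A example from the question: declare the member in the header, define it out-of-line in exactly one .cpp of the library, so the symbol lives in that library's object file:
// A.h
class A {
public:
    virtual bool foo(); // declared only: not implicitly inline
};
// A.cpp (in the library)
#include "A.h"
bool A::foo() { return true; } // the single out-of-line definition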