How does the linker decide which implementation to use? - C++

Let's say we have the following files:
//foo.h
class Foo
{
public:
void foo()
{
//Great code here
}
};
//foo1.cpp
#include "foo.h"
void Foo1()
{
Foo f1;
f1.foo();
}
//foo2.cpp
#include "foo.h"
void Foo2()
{
Foo f2;
f2.foo();
}
When I compile them separately, they generate two object files: Foo1.o and Foo2.o. When I link them together, they link perfectly.
Now if I dump the symbol tables for both, they both seem to implement the Foo::foo function:
_ZN3Foo3fooEv
Now, how does the linker distinguish which implementation to use?

Mats Petersson's answer is entirely correct, but I'll spin this in my own words with different coverage.
When you compile C++ code, you compile it a translation unit at a time... each translation unit typically consists of one implementation file (e.g. .cpp/.cc or whatever you've chosen to name it) and the header files it includes, and the compiler produces one .o file. When the compiler sees your foo.h and the definition for Foo::foo(), it will consider it a nominally inline function because the function body appears inside the class. As such, the compiler may or may not actually inline the function at points of call - that decision will depend upon the size/complexity of the function and the compiler's heuristics and options. So, Foo::foo may still end up as separate out-of-line functions in the .os for both translation units.
Because the function's nominally inline, the compiler needs to make sure that the symbol is marked as a "weak symbol" (exact terminology may differ by OS/toolchain - this is implementation detail below the level of the C++ Standard) - see http://en.wikipedia.org/wiki/Weak_symbol
When objects are linked that have the same weak symbols in them, the code from one copy is kept and the other copies discarded. Consequently, both .o files may have the function (despite it being nominally inline due to definition in class), but the executable linked from the .os only has one copy.
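A minimal single-file sketch of what that guarantee means for the program. It assumes a hypothetical return value added to Foo::foo so the result can be checked, and the helper names (address_seen_by_foo1, address_seen_by_foo2, check_single_foo) are made up for illustration. The two helpers stand in for foo1.cpp and foo2.cpp: however many object files carry a copy of the weak symbol, every caller should observe one and the same Foo::foo.

```cpp
#include <cassert>

// foo.h: the member function is defined in-class, so it is implicitly inline.
class Foo {
public:
    int foo() { return 42; }  // hypothetical body, only so the result can be checked
};

using FooMember = int (Foo::*)();

// Stand-in for foo1.cpp: the address of Foo::foo as this "unit" sees it.
FooMember address_seen_by_foo1() { return &Foo::foo; }

// Stand-in for foo2.cpp.
FooMember address_seen_by_foo2() { return &Foo::foo; }

// Both units must agree on which function Foo::foo is.
void check_single_foo() {
    assert(address_seen_by_foo1() == address_seen_by_foo2());
    Foo f;
    assert((f.*address_seen_by_foo1())() == 42);
}
```

In a real build the two helpers would live in separate .cpp files. The one-file version only illustrates the rule and does not exercise the linker itself.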

Since the code in Foo::foo() is identical [if it's not, you are breaking the "one definition rule" - that is, one function should have ONE definition, no matter how many times it is actually defined], the compiler/linker is perfectly allowed to merge the two identical functions into one when completing your executable file.
Note, however, that as it stands, Foo::foo() is implicitly an inline function (it is defined inside the class), which means that identical definitions in several translation units are permitted, and there should be no conflict.
If you were to "manually include" the class definition for Foo in both of your foo1.cpp and foo2.cpp, and make some subtle difference in the functions, you would find that the linker "picks" one of the functions, and discards the other one. Which it picks is not defined, and since the "one definition rule" has been broken, you are "outside the bounds of what you should do", so no point in complaining that the compiler "doesn't do the right thing". [Although you would have to make the function "non-inline" to make this a problem, and then you'd probably get a linker error for multiple definitions].

Both functions Foo1 and Foo2 are in their own respective translation units. When you include foo.h, both get a copy of the entire Foo class. Thus, the linker uses each unit's respective copy.
Now if you implemented Foo::foo() inside its own source file, then both Foo1 and Foo2 would use the same foo function during linking.

Related

Using functions inside class [duplicate]

I've done a simple experiment: a ".h" file with a class definition and a function definition, as below:
$cat testInline.h
#pragma once
class C{
public:
void f(){}
};
void g(){}
Then 2 users of this .h file:
$cat use01.cpp
#include"testInline.h"
void g01(){
g();
C obj1;
obj1.f();
}
$cat use02.cpp
#include"testInline.h"
int main(){
g();
C obj2;
obj2.f();
return 0;
}
I compile them together and get an error:
$g++ use01.cpp use02.cpp
duplicate symbol __Z1gv in:
/var/folders/zv/b953j0_55vldj97t0wz4qmkh0000gn/T/use01-34f300.o
/var/folders/zv/b953j0_55vldj97t0wz4qmkh0000gn/T/use02-838e05.o
ld: 1 duplicate symbol for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Looks really weird: I've used "#pragma once", still I cannot stop the compiler from reporting a duplicate definition of g() (__Z1gv after name mangling).
Then I modified testInline.h->g() definition to be like:
inline void g(){}
Well, it compiles. Isn't it in C++ that "inline" keyword is basically useless, because compilers will decide whether it'll inline a function or not?
And,why C::f() with code in .h file doesn't report duplication, while a C-style function g() does? And why class C doesn't "have to" add "inline" for its "f()" function, while g() has to use "inline"?
Hope I've stated my question clearly. Thanks for your help.
I've used "#pragma once",
Yes, you did. And each one of the two translation units effectively processed the header file exactly once. Each one would've done so even without the pragma, since each translation unit includes the header file just once.
#pragma once does not mean "include this header file in just one of the translation units being compiled". It means "include this header file once per translation unit, even if the translation unit directly, or indirectly, includes the header file two or more times". As such, each translation unit included the header file, and defined the functions/methods from the header file itself. Since the same function or method ended up being defined by both translation units you ended up with a duplicate at link time.
Isn't it in C++ that "inline" keyword is basically useless, because
compilers will decide whether it'll inline a function or not?
It is true that the compiler decides whether the function actually gets inlined, or not. However, the inline keyword specifies whether the function definition is processed as if it were logically inlined for every use of it, and not actually defined. As such, using the inline keyword does not result in duplicate definitions since, logically, the function is inserted inline at its every reference.
Whether this actually happens, or whether the compiler produces non-inlined code, is up to the compiler. However, C++ requires that the function gets compiled "as if" it was inlined; so even if the compiler decides not to inline the function, it must take whatever steps are necessary to ensure that the duplicate non-inlined copies of the function do not result in an ill-formed program.
And,why C::f() with code in .h file doesn't report duplication,
Because a class method defined inside the definition of the class is effectively an inline definition, even if the inline keyword is not explicitly specified.
The inline keyword is not useless. It's just that it doesn't necessarily control whether the function is actually inlined.
The inline keyword marks a function as possibly defined in multiple translation units. (The definition and meaning must be the same in all of them.) You should mark a function defined in a header file as inline.
A function defined within a class definition, like your C::f, is automatically considered "inline".
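A small sketch of the corrected header from the question, folded into one file for brevity. The return values, the include-guard name TEST_INLINE_H, and the helpers g02 and check_uses are assumptions added so the result can be checked; they are not in the original code.

```cpp
#include <cassert>

// --- testInline.h (sketch) ---
#ifndef TEST_INLINE_H
#define TEST_INLINE_H

class C {
public:
    int f() { return 1; }  // defined in the class body: implicitly inline
};

inline int g() { return 2; }  // free function in a header: needs explicit inline

#endif  // TEST_INLINE_H

// --- use01.cpp (sketch) ---
int g01() {
    C obj1;
    return g() + obj1.f();
}

// --- use02.cpp (sketch): includes the same header and uses the same functions ---
int g02() {
    C obj2;
    return g() + obj2.f();
}

// Both users see the same g() and the same C::f().
void check_uses() {
    assert(g01() == 3);
    assert(g02() == 3);
}
```

With the inline keyword removed from g(), splitting this into the real three files would be expected to bring back the duplicate symbol error shown in the question.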

Why does defining inline global function in 2 different cpp files cause a magic result?

Suppose I have two .cpp files file1.cpp and file2.cpp:
// file1.cpp
#include <iostream>
inline void foo()
{
std::cout << "f1\n";
}
void f1()
{
foo();
}
and
// file2.cpp
#include <iostream>
inline void foo()
{
std::cout << "f2\n";
}
void f2()
{
foo();
}
And in main.cpp I have forward declared the f1() and f2():
void f1();
void f2();
int main()
{
f1();
f2();
}
Result (doesn't depend on build, same result for debug/release builds):
f1
f1
Whoa: the compiler somehow picks only the definition from file1.cpp and uses it in f2() as well. What is the exact explanation of this behavior?
Note, that changing inline to static is a solution for this problem. Putting the inline definition inside an unnamed namespace also solves the problem and the program prints:
f1
f2
This is undefined behavior, because the two different definitions of the same inline function with external linkage break the C++ requirement for entities that can be defined in several places, known as the One Definition Rule:
3.2 One definition rule
...
There can be more than one definition of a class type (Clause 9), enumeration type (7.2), inline function with external linkage (7.1.2), class template (Clause 14),[...] in a program provided that each definition appears in a different translation unit, and provided the definitions satisfy the following requirements. Given such an entity named D defined in more than one translation unit, then
6.1 each definition of D shall consist of the same sequence of tokens; [...]
This is not an issue with static functions, because one definition rule does not apply to them: C++ considers static functions defined in different translation units to be independent of each other.
The compiler may assume that all definitions of the same inline function are identical across all translation units because the standard says so. So it can choose any definition it wants. In your case, that happened to be the one with f1.
Note that you cannot rely on the compiler always picking the same definition: violating the aforementioned rule makes the program ill-formed. The compiler could also diagnose that and error out.
If the function is static or in an anonymous namespace, you have two distinct functions called foo and the compiler must pick the one from the right file.
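A minimal sketch of the static variant for file1.cpp. It assumes foo is changed to return its text instead of printing it, purely so the result can be checked; check_f1 is a made-up helper. file2.cpp would look the same with "f2", and cannot be shown in the same file because the two static functions would then clash within one translation unit.

```cpp
#include <cassert>
#include <string>

// file1.cpp (sketch): static gives foo internal linkage, so this foo is
// private to the translation unit and cannot be merged with file2.cpp's foo.
static std::string foo() { return "f1"; }

// Exported function; it is bound to the foo defined just above.
std::string f1() { return foo(); }

void check_f1() {
    assert(f1() == "f1");
}
```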
Relevant standardese for reference:
An inline function shall be defined in every translation unit in which it is odr-used and shall have exactly
the same definition in every case (3.2). [...]
7.1.2/4 in N4141, emphasis mine.
As others have noted, the compilers are in compliance with the C++ standard because the One definition rule states that you shall have only one definition of a function, except if the function is inline then the definitions must be the same.
In practice, what happens is that the function is flagged as inline, and at linking stage if it runs into multiple definitions of an inline flagged token, the linker silently discards all but one. If it runs into multiple definitions of a token not flagged inline, it instead generates an error.
This property is called inline because, prior to LTO (link time optimization), taking the body of a function and "inlining" it at the call site required that the compiler have the body of the function. inline functions could be put in header files, and each cpp file could see the body and "inline" the code into the call site.
It doesn't mean that the code is actually going to be inlined; rather, it makes it easier for compilers to inline it.
However, I am unaware of a compiler that checks that the definitions are identical before discarding duplicates. This includes compilers that otherwise check definitions of function bodies for being identical, such as MSVC's COMDAT folding. This makes me sad, because it leads to a really subtle set of bugs.
The proper way around your problem is to place the function in an anonymous namespace. In general, you should consider putting everything in a source file in an anonymous namespace.
Another really nasty example of this:
// A.cpp
#include <vector>
struct Helper {
std::vector<int> foo;
Helper() {
foo.reserve(100);
}
};
// B.cpp
struct Helper {
double x, y;
Helper():x(0),y(0) {}
};
Methods defined in the body of a class are implicitly inline, so the ODR applies. Here we have two different Helper::Helper() constructors, both inline, and they differ.
The sizes of the two classes differ. In one case, we initialize two sizeof(double) with 0 (as the zero float is zero bytes in most situations).
In another, we first initialize three sizeof(void*) with zero, then call .reserve(100) on those bytes interpreting them as a vector.
At link time, one of these two implementations is discarded and the surviving one is used by both translation units. What's more, which one is discarded is likely to be pretty deterministic in a full build. In a partial build, it could change.
So now you have code that might build and work "fine" in a full build, but a partial build causes memory corruption. And changing the order of files in makefiles could cause memory corruption, or even changing the order lib files are linked, or upgrading your compiler, etc.
If both cpp files had a namespace {} block containing everything except the stuff you are exporting (which can use fully qualified namespace names), this could not happen.
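A minimal sketch of that layout for A.cpp. The exported function reserved_capacity and the helper check_helper are hypothetical names added for illustration; B.cpp would wrap its own Helper in the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A.cpp (sketch): Helper lives in an unnamed namespace, so the class and its
// inline constructor are local to this translation unit.
namespace {

struct Helper {
    std::vector<int> foo;
    Helper() { foo.reserve(100); }
};

}  // namespace

// The only name this file exports. It always uses the Helper defined above.
std::size_t reserved_capacity() {
    Helper h;
    return h.foo.capacity();
}

void check_helper() {
    // reserve(100) guarantees a capacity of at least 100.
    assert(reserved_capacity() >= 100);
}
```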
I've caught exactly this bug in production multiple times. Given how subtle it is, I do not know how many times it slipped through, waiting for its moment to pounce.
POINT OF CLARIFICATION:
Although the answer rooted in C++ inline rule is correct, it only applies if both sources are compiled together. If they are compiled separately, then, as one commentator noted, each resulting object file would contain its own 'foo()'. HOWEVER: If these two object files are then linked together, then because both 'foo()'-s are non-static, the name 'foo()' appears in the exported symbol table of both object files; then the linker has to coalesce the two table entries, hence all internal calls are re-bound to one of the two routines (presumably the one in the first object file processed, since it is already bound [i.e the linker would treat the second record as 'extern' regardless of binding]).

Inline functions in C++

Hi,
I am a novice in C++. I did read about inline functions and understood them right. But this site says that "We get an 'unresolved external' error if we write the definition of an inline function in one .cpp file and call it from another file". Why is that so?
This can be done for normal functions, right? Please correct me if I am wrong.
Thanks
It's a language requirement. inline means that you may have the function defined in more than one translation unit but the definitions must be identical and that you must have a definition in every translation unit that uses the function.
Those are the rules. The rules allow (but don't require) the compiler to expand the code for the inline function at each call site and omit emitting a callable function version.
This is different from non-inline functions which must only be defined once across all translation units. This is the usual "one definition rule" which applies to most entities in C++.
inline doesn't change the linkage of a function. inline functions have, by default, external linkage so if you use a static variable inside an inline function the implementation must ensure that there is only one copy of that variable in the program.
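A small sketch of that last point, with made-up names (next_id, first_caller, second_caller, check_shared_counter). The two callers stand in for functions in different translation units that both include the header defining next_id; they are shown in one file only to keep the example short.

```cpp
#include <cassert>

// Would normally live in a header included by several .cpp files.
inline int next_id() {
    static int counter = 0;  // one counter for the whole program
    return ++counter;
}

// Stand-ins for callers in two different translation units.
int first_caller() { return next_id(); }
int second_caller() { return next_id(); }

// The second call continues from where the first one left off.
void check_shared_counter() {
    int a = first_caller();
    int b = second_caller();
    assert(b == a + 1);
}
```

If next_id were declared static instead of inline, each translation unit would get its own function and its own counter.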
Keep in mind that the compiler operates on a file-by-file basis, i.e. it treats each .cpp file as its own discrete unit. There is no connection between each one of them (except of course references to other functions and variables that are glued together by the linker).
If you inline something, and if the compiler decides to take you at your word (remember that inline is a hint, which means the compiler can choose to ignore you), it will embed the function into the code of whichever other block is calling it, so there will be no function that the linker can point other .cpp files to.
As an example:
File a.cpp:
void func1() {
// code...
}
This will create an object file (like a.obj) which contains the code for func1 in a way that others can call it. The linker will be able to tell other .cpp files to go there.
File b.cpp:
void func1(); // declaration only; the definition lives in a.cpp
void func2() {
func1();
}
This will create b.obj which contains func2 with a function call to func1. The code has no idea what func1 does, it just has a branch here and asks the linker to put the right address in once everything has been compiled.
This is all nice and good, but if a.cpp only had an inlined version of func1, the linker would have nothing to give to func2().

Does defining a function inside a header always make the compiler treat it as inline?

I just learned that defining a C++ function inside a class's header file makes the function inline. But I know that putting the inline keyword next to a function is only a suggestion and the compiler won't necessarily follow it. Is this the same for header-defined C++ functions, and is there a difference in behavior between a standalone C++ function and a C++ function that is part of a class?
"defining a c++ function inside a class's header file make the function inline"
That's not true. Defining a function (that is to say, providing the body of the function instead of just a declaration) inside a class definition makes it inline. By "makes it inline", I mean it's the same as giving it the inline keyword. But class definitions don't have to be in headers, and headers can contain other things than class definitions.
So in this example, the function foo is implicitly inline. The function bar is not implicitly inline:
struct Foo {
void foo() {}
void bar();
};
void Foo::bar() {}
"putting the inline keyword next to a function is only a suggestion and the compiler wont necessarily follow it"
inline has two effects. One of them is a hint to the compiler which it can ignore. The other is not optional, and always has its effect. The "hint" is that the compiler is advised to replace calls to that function with a copy of the code for the function itself.
The guaranteed effect is that an inline function can be defined in multiple translation units, and those be linked together, without a multiple definition error, and all but one of the copies is removed by the linker. So, if the example above appears in a header file which is shared between multiple translation units, bar needs to be explicitly marked inline. Otherwise, the linker will discover multiple definitions of bar, which is not allowed.
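A sketch of the header-safe version of that example. The int return values and the helpers use_foo and check_foo are assumptions added so the behaviour can be checked; the original functions return void.

```cpp
#include <cassert>

// foo.h (sketch)
struct Foo {
    int foo() { return 1; }  // body inside the class: implicitly inline
    int bar();               // only declared inside the class
};

// Defined outside the class but still in the header, so it is marked
// inline explicitly to allow one definition per including translation unit.
inline int Foo::bar() { return 2; }

// Any .cpp file including the header can use both members.
int use_foo() {
    Foo f;
    return f.foo() + f.bar();
}

void check_foo() {
    assert(use_foo() == 3);
}
```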
Despite the name, inline in C++ is mostly about the second, compulsory effect, not the first, optional one. Modern optimising compilers have their own ideas about which calls should be inlined, and don't pay a whole lot of attention to inline when making that decision. For instance I've seen it have an effect in gcc at moderate optimisation levels, but at low levels approximately nothing is inlined, and at high levels approximately everything is (if the definition is available when the call is compiled) unless it makes the function too big.
Whether a function is defined in a header or in a cpp file has absolutely no effect on anything by itself. You can safely imagine that what #include does is copy and paste the header file into the cpp file in the preprocessor, before the compiler ever sees it. If a function is defined in the same translation unit as a call to it, then the function code is available to be inlined by the compiler. If they're in different translation units, then the code is not available and the call can only be inlined by the linker, with whole-program optimisation or similar. A "translation unit" more or less means, "a cpp file, after all the headers have been copy and pasted into it".
C++ compilers are free to choose what will be inline and what won't, no matter what hints you give them. It shouldn't matter if the function is part of a class or not, or whether it is in a header file or source file; the compiler doesn't pay attention to those things while making its decision.
No, not always. The compiler treats it as a hint, just like the inline keyword, but it mostly decides on its own, because it knows better than you what the costs and benefits may be. Inlining the code avoids the function call overhead, but makes the code bigger, which has negative performance impacts on the instruction cache.
These performance hints from the programmer are generally more and more often ignored by the compiler. What it does not ignore (or rather, what the linker does not ignore) is that a function declared inline may appear in several compilation units, and should be treated as multiple copies of the same function without resulting in linker errors.
If you place the definition of a free non-template function in a header file, you will end up with a function definition in each .cpp file that includes the header (directly or indirectly). This can lead to problems when linking.
However, if you declare the function to be inline, the linker will make sure you only use a single definition even if it is included at multiple places.

Does code in a header file increase binary size?

Consider this:
class Foo{
void func1(){
/*func1 code*/
}
void func2(){
/*func2 code*/
}
};
Case 1: class Foo in Foo.h
Case 2: class Foo nicely separated between Foo.h and Foo.cpp
Various other cpp files include Foo.h
My question is...Will Case 1 lead to a bigger binary?
Maybe it will, maybe it won't. It really has nothing to do with header files. What matters here is that your member functions are defined in the class definition. When member functions are defined like that, they are treated as inline functions. If the compiler decides not to actually inline any calls to these functions, there won't be any impact on code size. If the compiler decides to inline any (or all) of the calls, the answer would be "it depends". Inlining calls to small functions might result in increased code size as well as in decreased code size. This all depends on the function itself and on the compiler's capabilities (optimization capabilities specifically).
If the compiler decides not to inline those functions and generates a separate body for them, these bodies will appear in each object file that uses them, but with a special flag for the linker: 'weak symbol'. When the linker finds this flag, it combines all symbols with that name into a single resulting symbol (it may produce an error message if the bodies or sizes of such symbols differ).
RTTI info and vtables use the same scheme.
With dynamic libraries, weak symbol joining may happen at run time, if they use the same class.
If the functions in the header are declared as static, then yes, each module (source file) that includes that header file will store a copy of that function in the object file and the final executable will be bigger in size...
If you have the code definition in the header, the compiler might create redundant copies of each function whenever you include the .h. Those redundant copies might also trigger errors from the linker, so the practice is generally frowned upon except for inline functions.
If the code for functions is included inline in the headers, then the compiler can use that to define the functions in the object code for each separate source file, or embed the function code directly where the functions are called. Depending on your compiler and linker and the support for C++ generally, that may leave you with larger code than you would have with the functions all defined separately. If the inline functions are small enough, you may save space by avoiding function call overhead. However, such functions have to be very small.