Why linking to a LIB significantly increases binary's size


Let's say I have a module (DLL / EXE) which defines a certain flow with N objects; after compilation and linking, the module's size is X.
If I ever decide to break that module down into a main executable and a helper LIB file containing exactly the N objects I described earlier, will the overall size of the executable remain the same?
I know that during linking, the linker decides which parts of the LIB to copy into the executable, so I'd expect the overall size of the executable to be smaller than or equal to X.
I've configured the LIB project to favor size over speed and to minimize size (/O1).
To test this, I implemented a small HelloWorld function (a global function) in the LIB, removed every reference to the LIB's objects from the main executable, and built the following code:
#include "../LibObject/Function.h"
int main()
{
    HelloWorld();
}
Yet the executable's overall size stayed the same as when I called the original objects. How come?

Static libraries are in almost all regards just a collection of object modules (think of them as a .zip of .obj files). To the linker there is no real difference between passing all your object files separately and passing them together in a static library; dead-function elimination, where possible, is performed in the same way. So seeing the same executable size with or without the intermediate library step is completely expected.

You are forward declaring the class but not defining it, which doesn't really make sense. If it is defined in the header files, then you don't need to forward declare it. If it is a class that you are creating, then just forward declaring it is not enough: you need to define the class. You seem to have straddled the fence.
namespace Ramy{
namespace TEST {
namespace standard{
class StandardAnalyzer;
}
}
}
is the forward declaration. It just tells the compiler that the class exists, it doesn't tell the compiler anything about it. The compiler needs a class definition.
So, is it a class that is defined in the Ramy libraries, or is it a class you are creating yourself? What to do next depends on your answer.
This is why linking a program with a library increases its size: the library contains the functions and dependencies that the main program needs.

Linking against the lib file will generally increase the size of the executable, because the code the linker takes from the lib is copied into the executable. Including the .h file only makes the declarations visible to the compiler.

Related

Is there a standard way to ensure that a piece of code is executed at global scope?

I have some code I want to execute at global scope. So, I can use a global variable in a compilation unit like this:
int execute_global_code();
namespace {
int dummy = execute_global_code();
}
The thing is that if this compilation unit ends up in a static library (or a shared one with -fvisibility=hidden), the linker may decide to eliminate dummy, as it isn't used, and with it my global code execution.
I know that I can use concrete solutions based on the specific context: the specific compiler (pragma include), the compilation unit's location (attribute visibility default), or the surrounding code (say, make a dummy use of dummy in my code).
The question is: is there a standard way to ensure execute_global_code will be executed that fits in a single macro and works regardless of where the compilation unit is placed (executable or lib)? That is, only standard C++ and no user code outside of that macro (such as a dummy use of dummy in main()).

The issue is that the linker will use all object files for linking a binary given to it directly, but for static libraries it will only pull those object files which define a symbol that is currently undefined.
That means that if all the object files in a static library contain only such self-registering code (or other code that is not referenced from the binary being linked), nothing from the entire static library will be used!
This is true for all modern toolchains. There is no platform-independent solution.
A way to circumvent this with CMake, without touching the source code, can be found here (read more about it here). It will work if no precompiled headers are used. Example usage:
doctest_force_link_static_lib_in_target(exe_name lib_name)
There are some compiler-specific ways to do this as grek40 has already pointed out in a comment.
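To make the failure mode concrete, here is a minimal single-file sketch of the self-registration idiom. All names in it (registry, register_component, force_link_example_component) are hypothetical, and the file boundaries are only marked in comments; in a real project the second half would be its own object file inside the static library. It is not a standard-only solution, just an illustration of why a reference from the executable changes the outcome.

```cpp
#include <set>
#include <string>

// --- registry (hypothetical, part of the static library) ---
// Function-local static: constructed on first use, so other namespace-scope
// initializers can call it safely whatever their initialization order.
std::set<std::string>& registry() {
    static std::set<std::string> names;
    return names;
}

int register_component(const std::string& name) {
    registry().insert(name);
    return 0;
}

// --- example_component.cpp (hypothetical, also in the static library) ---
namespace {
// Runs during static initialization, but only if the linker keeps this
// object file in the final binary.
int dummy = register_component("example_component");
}

// External symbol the executable can reference. Calling it leaves an
// undefined symbol for the linker to resolve, which pulls this object file
// (and therefore `dummy`) out of the archive.
void force_link_example_component() {}
```

If the executable never mentions force_link_example_component (or any other symbol from that object file), the linker has no reason to extract the file from the archive and the registration silently disappears.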

Dynamic link vs. static link efficiency

I have an argument with another developer about dynamic linking vs. static linking that I'd like to settle here.
In Theory:
Say you have a library with 100 functions, each has significant amounts of code inside it:
int A()
int B()
int C()
..
..and so on...
And your application only calls or depends on one of them.
You have two methods at your disposal.
Build the library as a dynamic linked library
Build the library as a statically linked library
My colleague claims that when we link the static library into our application, the compiler/linker will not add the code of the 99 unused functions to our executable. I claim it will. I claim that in this scenario the only advantage is having a single executable and not having to distribute the library with our application, and that there will be no significant size difference compared with the dynamically linked library approach.
Who is correct?

It can depend on a combination of how the code is organized, and what compiler flags you use.
Following the classic, simple model of things, the linker would link in whatever object files in the library were needed to satisfy the symbol references. So if your A(), B() and C() were each defined in different object files, only the object file that contained the symbol you actually used would be linked into the program. If that file, in turn, depended upon one or more of the others, the linker would find object files to satisfy those references as well, recursively, until it either satisfied them all or found one it couldn't satisfy (at which time you'd get the standard "Unresolved external XXX" error message).
More recently, most compilers can "package" functions into separate "modules" without your having to put them into separate source files to create separate object files. Details vary, but can reduce (or eliminate) the necessity for having each source file as tiny as possible just to keep what ends up in the final executable to a minimum.
So, bottom line: at least for the most part, he's right and you're wrong.

It depends :-)
If you put each function in its own source file, or use the /Gy compile option, each function will be packaged in a separate section of the static library.
The linker will then be able to pick them up as needed, and only include the functions that are actually called.
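As a rough illustration of the second option, the sketch below keeps two unrelated functions in one source file. The file and function names are made up, and the build flags in the comments are the usual ones for each toolchain rather than anything taken from the question.

```cpp
// helpers.cpp (hypothetical): two unrelated functions in one translation unit.
//
// MSVC:      compile with  cl /c /Gy helpers.cpp
//            and link the program with  /OPT:REF
// GCC/Clang: compile with  g++ -c -ffunction-sections -fdata-sections helpers.cpp
//            and link the program with  -Wl,--gc-sections  (GNU ld)
//
// With these options each function is placed in its own section, so the
// linker can discard unused_helper() even though it shares an object file
// with used_helper().

int used_helper(int x) { return x * 2; }

int unused_helper(int x) { return x * x + 1; }
```

Without such options the whole object file is the unit of inclusion, which is why the one-function-per-source-file layout has the same effect.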

Will g++ link my programs with classes it doesn't use from a library?

I've created a simple static library, contained in a .a file. I might use it in a variety of projects, some of which simply will not need 90% of it. For example, if I want to use neural networks, which are a part of my library, on an AVR microcontroller, I probably won't need a ton of the other stuff. Will that still be linked into my code, potentially generating a rather large file?
I intend to compile programs like this:
g++ myProg.cpp myLib.a -o prog

G++ will pull in only the object files it needs from your library, but this means that if one symbol from a single object file is used, everything in that object file gets added to your executable.
One source file becomes one object file, so it makes sense to logically group things together only when they are sure to be needed together.
This practice varies by compiler (actually by linker). For example, the Microsoft linker will pick object files apart and only include those parts that actually are needed.

You could also try to break your library into independent smaller parts and only link the parts you are really going to need.

When you link to a static library the linker pulls in things that resolve names used in other parts of the code. In general, if the name isn't used it doesn't get linked in.

The GNU linker will pull in the stuff it needs from the libraries you have specified on an object file by object file basis. Object files are atomic units as far as the GNU linker is concerned. It doesn't split them apart. The linker will bring in an object file if that object file defines one or more unresolved external references. That object file may have external references. The linker will try to resolve these, but if it can't, the linker adds those to the set of references that need to be resolved.
There are a couple of gotchas that can make for a much larger than needed executable. By larger than needed, I mean an executable that contains functions that will never be called, global objects that will never be examined or modified, during the execution of the program. You will have binary code that is unreachable.
One of these gotchas results when an object file contains a large number of functions or global objects. Your program might only need one of these, but your executable gets all of them because object files are atomic units to the linker. Those extra functions will be unreachable because there's no call path from your main to these functions, but they're still in your executable. The only way to ensure that this doesn't happen is to use the "one function per source file" rule. I don't follow that rule myself, but I do understand the logic of it.
Another set of gotchas occur when you use polymorphic classes. A constructor contains auto-generated code as well as the body of the constructor itself. That auto-generated code calls the constructors for parent classes, inserts a pointer to the vtable for the class in the object, and initializes data members per the initializer list. These parent class constructors, the vtable, and the mechanisms to process the initializer list might be external references that the linker needs to resolve. If the parent class constructor is in a larger header file, you've just dragged all that stuff into your executable.
What about the vtable? The GNU compiler emits the vtable in the object file that defines the class's key function. That key function is the first virtual member function in the class that is neither pure nor defined inline. Even if you don't call that member function, you get the object file that contains it in your executable, and you get everything that that object file drags in.
Keeping your source files down to a small size once again helps with this "look what the cat dragged in!" problem. It's a good idea to pay special attention to the file that contains that key member function. Keep that source file small, at least in terms of stuff the cat will drag in. I tend to put small, self-contained member functions in that source file. Functions that will inevitably drag in a bunch of other stuff shouldn't go there.
Another issue with the vtable is that it contains pointers to all of the virtual functions for a class. Those pointers need to point to something real. Your executable will contain the object files that define each and every virtual function defined for a class, including the ones you never call. And you're going to get everything that those virtual functions drag in as well.
One solution to this problem is to avoid making big huge classes. They tend to drag in everything. God classes in particular are problematic in this regard. Another solution is to think hard about whether a function really does need to be virtual. Don't just make a function virtual because you think someday someone will need to override it. That's speculative generality, and with virtual functions, speculative generality comes with a high cost.
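The vtable effect can be sketched with a small class hierarchy. The class names and file split below are hypothetical, and the comments describe the GNU (Itanium C++ ABI) behaviour discussed above.

```cpp
#include <string>

// --- shape.h (hypothetical) ---
class Shape {
public:
    virtual ~Shape() = default;             // defaulted in class: inline, so not the key function
    virtual double area() const;            // first non-inline, non-pure virtual: the key function
    virtual std::string describe() const;   // never called through area_of() below
};

// --- shape.cpp (hypothetical) ---
// The vtable for Shape is emitted in the object file that defines area().
double Shape::area() const { return 0.0; }

// If this definition lived in a different object file, the vtable's slot
// for describe() would still be an undefined reference that pulls that
// file in, together with whatever it depends on.
std::string Shape::describe() const { return "generic shape"; }

// --- square.h (hypothetical) ---
class Square : public Shape {
public:
    explicit Square(double side) : side_(side) {}
    double area() const override { return side_ * side_; }
private:
    double side_;
};

// The program only ever asks for the area, yet constructing any Shape
// references the vtable and therefore every virtual function in it.
double area_of(const Shape& s) { return s.area(); }
```

Making describe() non-virtual here would remove it from the vtable, so an unused definition would no longer be forced into the executable.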

Static variable initialization over a library

I am working on a factory that will have types added to it. However, if a class is not explicitly instantiated in the .exe that is executed (at compile time), its type is not added to the factory. This is because the static call is somehow not being made. Does anyone have any suggestions on how to fix this? Below are five very small files that I am putting into a lib; an .exe will then call this lib. If there are any suggestions on how I can get this to work, or maybe a better design pattern, please let me know. Here is basically what I am looking for:
1) A factory that can take in types
2) Auto registration to go in the classes .cpp file, any and all registration code should go in the class .cpp (for the example below, RandomClass.cpp) and no other files.
BaseClass.h : http://codepad.org/zGRZvIZf
RandomClass.h : http://codepad.org/rqIZ1atp
RandomClass.cpp : http://codepad.org/WqnQDWQd
TemplateFactory.h : http://codepad.org/94YfusgC
TemplateFactory.cpp : http://codepad.org/Hc2tSfzZ

When you link with a static library, the linker extracts from it only the object files that provide symbols which are currently used but not defined. In the pattern you are using, the object file containing the static variable that triggers registration probably provides no symbol that is undefined elsewhere, so it is never extracted.
Solutions:
use explicit registration
have the compilation unit provide some symbol that is otherwise undefined:
  use the linker arguments to add your static variables as undefined symbols
  something useful, but this is often not natural
  a dummy one; this is not natural if it is referenced from the main program, but as a linker argument it may be easier than using the mangled name of the static variable
use a linker argument stating that all the objects of a library have to be included
use a dynamic library: dynamic libraries are fully imported, and thus don't have that problem
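The "dummy symbol as a linker argument" option could look roughly like the sketch below. Everything here is an assumption for illustration: the names type_names, Registrar and randomclass_link_anchor are invented, and the two sides are shown in one file only to keep the example compilable on its own. The anchor has C linkage so its name is predictable and no mangled C++ name has to be spelled out.

```cpp
#include <set>
#include <string>

// Stand-in for the factory: a set of registered type names (hypothetical).
std::set<std::string>& type_names() {
    static std::set<std::string> names;
    return names;
}

// --- RandomClass.cpp side (inside the static library) ---
namespace {
struct Registrar {
    Registrar() { type_names().insert("RandomClass"); }
};
Registrar registrar;  // the static whose initialization we want to keep
}

// Dummy anchor with C linkage, defined in the same object file as `registrar`.
extern "C" {
int randomclass_link_anchor = 0;
}

// --- application side ---
// MSVC: ask the linker to treat the anchor as an undefined symbol.
// 32-bit x86 builds decorate C names with a leading underscore.
#if defined(_MSC_VER)
#  if defined(_M_IX86)
#    pragma comment(linker, "/include:_randomclass_link_anchor")
#  else
#    pragma comment(linker, "/include:randomclass_link_anchor")
#  endif
#endif
// GNU ld: pass  -Wl,-u,randomclass_link_anchor  on the link line instead, or
// wrap the library in  -Wl,--whole-archive ... -Wl,--no-whole-archive  to
// include every object file it contains.
```

Either way the anchor only keeps the object file that defines it, so each self-registering file needs its own anchor unless the whole archive is forced in.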

As a general rule of thumb, an application does not include static or global variables from a library unless they are implicitly or explicitly used by the application.
There are a hundred different ways this can be refactored. One method is to place the static variable inside a function and make sure the function is called.
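A minimal sketch of that refactoring, assuming a hypothetical registry of type names and a hypothetical ensure_random_class_registered function that the application calls:

```cpp
#include <string>
#include <vector>

// Hypothetical registry; the function-local static is created on first use.
std::vector<std::string>& registered_types() {
    static std::vector<std::string> types;
    return types;
}

// The application calls this function, which gives the linker a reference
// to resolve. The static inside is initialized exactly once, on the first
// call, so calling it repeatedly is harmless.
bool ensure_random_class_registered() {
    static const bool done = [] {
        registered_types().push_back("RandomClass");
        return true;
    }();
    return done;
}
```

The cost is that registration is no longer fully automatic: something in the application has to call the function, but the call also guarantees the object file is linked in.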

To expand on one of @AProgrammer's excellent suggestions, here is a portable way to guarantee the calling program will reference at least one symbol from the library.
In the library code, define a global function that returns an int.
int make_sure_compilation_unit_referenced() { return 0; }
Then in the header for the library declare a static variable that is initialized by calling the global function:
extern int make_sure_compilation_unit_referenced();
static int never_actually_used = make_sure_compilation_unit_referenced();
Every compilation unit that includes the header will have a static variable that needs to be initialized by calling a (useless) function in the library.
This is a little cleaner if your library has its own namespace enclosing both of the definitions. Then there's less chance of name collisions between the bogus function in your library and other libraries, or between the static variable and other variables in the compilation unit(s) that include the header.
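A namespaced variant might look like the sketch below. The namespace name mylib is an assumption, and header and source are shown in one file (header part first) only so the example compiles on its own.

```cpp
// --- mylib.h (hypothetical header) ---
namespace mylib {
int make_sure_compilation_unit_referenced();

// One copy per including translation unit; its initializer is the reference
// that forces the linker to pull in the defining object file.
static int never_actually_used = make_sure_compilation_unit_referenced();
}  // namespace mylib

// --- mylib.cpp (hypothetical, inside the static library) ---
namespace mylib {
int make_sure_compilation_unit_referenced() { return 0; }
}  // namespace mylib
```

Note that this only keeps the object file that defines the function, so any self-registering statics need to live in that same source file (or be referenced from it).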

static variable initialisation code never gets called

I've got an application that's using a static library I made. One .cpp file in the library has a static variable declaration, whose ctor calls a function on a singleton that does something- e.g. adds a string.
Now when I use that library from the application, my singleton doesn't seem to contain any traces of the string that was supposed to be added.
I'm definitely missing something but I don't know what..

If you have an object in a static library that is not EXPLICITLY used in the application, then the linker will not pull that object from the lib into the application.
There is a big difference between static and dynamic libraries.
Dynamic Library:
At link time nothing is pulled from the dynamic library. Extra code is added to explicitly load and resolve the symbols at run time. At run time the whole library is loaded, and thus object initializers are called (though exactly when is an implementation detail).
Static libraries are handled very differently:
When you link against a static library, the linker pulls into the application all the items that are not defined in the application but are defined in the library. This is repeated until there are no more dependencies that the library can resolve. The side effect of this is that objects/functions not explicitly used are not pulled from the library (thus global variables that are not directly accessed will not be pulled).

My memory of this is a bit hazy, but you might be getting hit with an initialization order problem. There are no guarantees in which order static variable initializers in different files get called, so if your singleton isn't initialized yet when your static variable in the library is being initialized, that might produce the effect you're seeing.
The way I've gotten around these problems is to have some sort of an explicit init function that does this stuff and that I call at the start of main or something. You might be able to fiddle with the order in which you give the object file and library arguments to the compiler (or linker, actually) because that's also worked for me, but that solution is a bit fragile because it depends not only on using the specific linker but probably also the specific version.
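An explicit init function along those lines could be sketched as follows. StringStore and init_library are hypothetical names standing in for the singleton and the library's setup code, and the guard shown is not thread-safe, which is fine if it is only called at the start of main.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical singleton that collects strings.
class StringStore {
public:
    static StringStore& instance() {
        static StringStore store;  // constructed on first use
        return store;
    }
    void add(const std::string& s) { strings_.push_back(s); }
    std::size_t size() const { return strings_.size(); }

private:
    StringStore() = default;
    std::vector<std::string> strings_;
};

// Explicit initialization entry point exported by the library. The guard
// makes repeated calls harmless (single-threaded use assumed).
void init_library() {
    static bool initialized = false;
    if (initialized) return;
    initialized = true;
    StringStore::instance().add("added by the library");
}
```

Calling init_library() as the first statement in main sidesteps both problems at once: the call is a reference the linker must resolve, and the order of initialization is now under the program's control.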

Refactor the classes doing static initialization so they do not depend on any other such classes. That is, make each class's initialization independent and self-sufficient.
