Are all the object file symbols added to the output? - build

I have an embedded application. The platform on which the program runs is very limited in terms of resources (code memory included).
So I have a huge open-source library (.c and .h files), which is built along with my application (user) files.
Let's assume I have a simple program in main.c that never invokes any of the library functions. Example:
#include "main.h"
volatile int a;
int main()
{
while(1)
{
if(a)
{
a=0;
}
}
return 0;
}
In the example above, let's say main.h includes all the library header files.
The size of the .text or ROM section should (in my opinion) be very small, because the program does not need any of the library's functions at the current state of development (let's assume I include them in anticipation of future needs).
Will the code memory end up big enough to contain all the compiled symbols? Or is the linker (somehow) smart enough to know that unreferenced symbols have no place in the output?
If so, what is the mechanism that determines which symbols (from the pool of object files) are placed in the output?

I would like to add an answer to my own question, since so far nobody has done so beyond comments of the kind "because this is how the linker works" (which give no information). After a lot of searching I came across this SO question.
It explains the mechanism behind the linker's decision-making, namely what stays and what goes away :).
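In practice, on a GCC/binutils-style toolchain the usual recipe looks like the minimal sketch below (file and function names are made up for illustration; flag spellings differ on other toolchains, so check your compiler's manual and the generated map file):
/* lib.c -- stands in for the "huge library"; nothing here is called by main() */
int lib_helper(int x) { return x * 2; }   /* hypothetical, never referenced */

/* main.c -- same shape as the example above */
volatile int a;
int main(void)
{
    while (1) { if (a) { a = 0; } }
    return 0;
}

$ gcc -Os -ffunction-sections -fdata-sections -c lib.c main.c
$ gcc -Wl,--gc-sections main.o lib.o -o app
With each function placed in its own section and --gc-sections enabled, the section holding lib_helper() is never referenced from main() and is dropped, so it does not contribute to the .text size of the output. Many embedded toolchains enable an equivalent option by default; inspecting the linker map file is the reliable way to see what actually ended up in ROM.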

Related

c++ class without header

Ok, so I don't have a problem, but a question:
When using C++, you can move a class to another file and include it without creating a header, like this:
foo.cpp :
#include <iostream>
#include <string>
using namespace std;

class foo
{
public:
    string str;
    foo(string inStr)
    {
        str = inStr;
    }
    void print()
    {
        cout << str << endl;
    }
};
main.cpp :
#include "foo.cpp"
using namespace std;

int main()
{
    foo Foo("That's a string");
    Foo.print();
    return 0;
}
So the question is: is this method any worse than using header files? It's much easier and much more clean, but is it any slower, any more bug-inducing etc?
I've searched for this topic for a long time now but I haven't seen a single topic on the internet considering this even an option...
So the question is: is this method any worse than using header files?
You might consider reviewing the central idea of what the "C++ translation unit" is.
In your example, what the preprocessor does is as if it inserts a copy of foo.cpp into an internal copy of main.cpp. The preprocessor does this, not the compiler.
So ... the compiler never sees your code as separate files. It is this single, concatenated 'translation unit' that is submitted to the compiler. There is no magic in .hh or .cc, except that they fulfill your peers' (or boss's) expectations.
Now think about your question ... the translation unit is neither of your source files, nor any of your system include files, but it is one stream of text, one thing, put together by the preprocessor. So how would it be better or worse?
It's much easier and much more clean,
It can be. I often take this 'different' approach in my 'private' coding efforts.
When I did a quick eval of using gmpxx.h (mpz_class) in a factorial program, I did indeed take just these kinds of shortcuts, and did not need a .hpp file to properly create my compilation unit. FYI: the factorial of 12345 is more than 45,000 bytes long; it is pointless to read the characters anyway.
For a 'more formal' effort (a job, a cooperation, etc.), I always use headers, separate compilation, and building a library of functions useful to the app, as part of how things should be done, especially if I might share the code or contribute it to a company's archives. There are too many good reasons to describe here why I recommend you learn these practices.
but is it any slower, any more bug-inducing etc?
I think not, on both counts. There is one compilation unit, and concatenating the parts has to be done right, but I think it is no more difficult.
I've searched for this topic for a long time now but I haven't seen a single
topic on the internet considering this even an option...
I'm not sure I've ever seen it discussed either. In any case, separate compilation and library development are generally perceived to save development time. (Time is money, right?)
Also, a library, and header files, are how you package your success for others to use, how you can improve your value to a team.
There's no semantic difference between naming your files .cpp or .hpp (or .c / .h).
People will be surprised by the #include "foo.cpp"; the compiler doesn't care.
You've still created a "header file", but you've given it the ".cpp" extension. File extensions are for the programmer, the compiler doesn't care.
From the compiler's point of view, there is no difference between your example and
foo.h :
#include <iostream>
using namespace std;
class foo
{
//...
};
main.cpp :
#include "foo.h"
using namespace std;
int main()
{
// ...
}
A "header file" is just a file that you include at the beginning i.e. the head of another file (technically, headers don't need to be at the beginning and sometimes are not but typically they are, hence the name).
You've simply created a header file named foo.cpp.
Naming header files with an extension that is conventionally used for source files is not a good idea. Some IDEs and other tools may erroneously assume that your header is a source file, and therefore attempt to compile it as if it were one, wasting resources if nothing else.
Not to mention the confusion it may cause for your colleagues. Source files may have definitions that the C++ standard allows to be defined exactly once (see the One Definition Rule, ODR) because source files are not included in other files. If you name your header as if it were a source file, someone might assume that they can have ODR definitions there when they can't.
If you ever build some larger project, the two main differences will become clear to you:
If you deliver your code as a library to others, you have to give them all your code - all your IP - instead of only the headers of the exposed classes plus a compiled library.
If you change one letter in any file, you will need to recompile everything. Once compile times for a larger project hit minutes, you lose a lot of productivity.
Otherwise, of course it works, and the result is the same.
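For comparison, the conventional split that the answers above recommend would look roughly like this (a minimal sketch; the include-guard name is arbitrary):
// foo.h -- only the declaration, safe to include from many files
#ifndef FOO_H
#define FOO_H
#include <string>

class foo
{
public:
    foo(std::string inStr);
    void print() const;
private:
    std::string str;
};
#endif

// foo.cpp -- the definitions, compiled exactly once
#include "foo.h"
#include <iostream>

foo::foo(std::string inStr) : str(inStr) {}
void foo::print() const { std::cout << str << std::endl; }

// main.cpp
#include "foo.h"
int main()
{
    foo Foo("That's a string");
    Foo.print();
    return 0;
}
With this layout only foo.cpp has to be recompiled when the implementation changes; main.cpp is rebuilt only when foo.h itself changes, and you can ship foo.h plus a compiled library without handing over the implementation.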

Is it practical to use Header files without a partner Class/Cpp file in C++

I've recently picked up C++ as part of my course, and I'm trying to understand in more depth the partnership between headers and classes. Every example or tutorial I've looked up on header files uses a class file with a constructor, followed by its methods. However I'm wondering if it's fine to just use header files to hold a group of related functions, without the need to make an object of a class every time you want to use them.
//main file
#include <iostream>
#include "Example.h"
#include "Example2.h"

int main()
{
    //Example 1
    Example a;   //I have to create an object of the class first
    a.square(4); //Then I can call the function

    //Example 2
    square(4);   //I can call the function without the need of a constructor

    std::cin.get();
}
In the first example I create an object and then call the function; I use the two files 'Example.h' and 'Example.cpp'.
//Example1 cpp
#include <iostream>
#include "Example.h"

void Example::square(int i)
{
    i *= i;
    std::cout << i << std::endl;
}

//Example1 header
class Example
{
public:
    void square(int i);
};
In example2 I call the function directly from file 'Example2.h' below
//Example2 header
void square(int i)
{
    i *= i;
    std::cout << i;
}
Ultimately I guess what I'm asking is whether it's practical to use just a header file to hold a group of related functions without creating a related class file. And if the answer is no, what's the reason behind that? Either way I'm sure I've overlooked something, but as ever I appreciate any kind of insight from you guys on this!
Of course, it's just fine to have only headers (as long as you consider the One Definition Rule as already mentioned).
You can as well write C++ sources without any header files.
Strictly speaking, headers are nothing else than filed pieces of source code which might be #included (i.e. pasted) into multiple C++ source files (i.e. translation units). Remembering this basic fact was sometimes quite helpful for me.
I made the following contrived counter-example:
main.cc:
#include <iostream>

// define float
float aFloat = 123.0;
// make it extern
extern float aFloat;

/* This should be included from a header,
 * but instead I prevent the pre-processor usage
 * and simply do it by myself.
 */
extern void printADouble();

int main()
{
    std::cout << "printADouble(): ";
    printADouble();
    std::cout << "\n"
                 "Surprised? :-)\n";
    return 0;
}
printADouble.cc:
#include <iostream>

/* This should be included from a header,
 * but instead I prevent the pre-processor usage
 * and simply do it by myself.
 *
 * This is intentionally of the wrong type
 * (to show how it can be done wrong).
 */
// use extern aFloat
extern double aFloat;
// make it extern
extern void printADouble();

void printADouble()
{
    std::cout << aFloat;
}
Hopefully, you have noticed that I declared
extern float aFloat in main.cc
extern double aFloat in printADouble.cc
which is a disaster.
Problem when compiling main.cc? No. The translation unit is consistent syntactically and semantically (for the compiler).
Problem when compiling printADouble.cc? No. The translation unit is consistent syntactically and semantically (for the compiler).
Problem when linking this mess together? No. Linker can resolve every needed symbol.
Output:
printADouble(): 5.55042e-315
Surprised? :-)
as expected (assuming you, like me, expected nothing sensible).
Live Demo on wandbox
printADouble() accessed the defined float variable (4 bytes) as double variable (8 bytes). This is undefined behavior and goes wrong on multiple levels.
So, using headers doesn't enforce modular programming in C++; it merely enables (some kind of) it. (I didn't recognize the difference until I once had to use a C compiler which did not (yet) have a pre-processor. The issue sketched above hit me very hard back then, but it was really enlightening as well.)
IMHO, header files are a pragmatic replacement for an essential feature of modular programming (i.e. the explicit definition of interfaces and the separation of interfaces and implementations as a language feature). This seems to have annoyed other people as well. Have a look at A Few Words on C++ Modules to see what I mean.
C++ has a One Definition Rule (ODR). This rule states that functions and objects should be defined only once. Here's the problem: headers are often included more than once. Your square(int) function might therefore be defined twice.
The ODR is not an absolute rule. If you declare square as
//Example2 header
inline void square(int i)
// ^^^
{
    i *= i;
    std::cout << i;
}
then the compiler will inform the linker that there are multiple definitions possible. It's your job then to make sure all inline definitions are identical, so don't redefine square(int) elsewhere.
Templates and class definitions are exempt; they can appear in headers.
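To see the problem concretely, here is a minimal sketch (hypothetical file names) of what happens when the non-inline square(int) from Example2.h is included from two translation units:
// square.h -- NOT marked inline
#include <iostream>
void square(int i) { i *= i; std::cout << i; }

// a.cpp
#include "square.h"
void a() { square(2); }

// b.cpp
#include "square.h"
void b() { square(3); }
Linking a.o and b.o into the same program typically fails with something like "multiple definition of `square(int)'" (exact wording depends on the linker), because each object file carries its own copy of the definition. Marking square() inline, or turning it into a template, tells the linker that the duplicate definitions are intentional and identical.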
C++ is a multi paradigm programming language, it can be (at least):
procedural (driven by conditions and loops)
functional (driven by recursion and specialization)
object oriented
declarative (providing compile-time arithmetic)
See a few more details in this quora answer.
Object oriented paradigm (classes) is only one of the many that you can leverage programming in C++.
You can mix them all, or just stick to one or a few, depending on what's the best approach for the problem you have to solve with your software.
So, to answer your question:
yes, you can group a bunch of (preferably inter-related) functions in the same header file. This is more common in the "old" C programming language, or in more strictly procedural languages.
That said, as in MSalters' answer, just be conscious of the C++ One Definition Rule (ODR). Use the inline keyword if you put the definition of the function (its body) in the header and not only its declaration (templates exempted).
See this SO answer for description of what "declaration" and "definition" are.
Additional note
To enforce the answer, and extend it to also other programming paradigms in C++,
in the last few years there has been a trend of putting a whole library (functions and/or classes) in a single header file.
This can commonly be seen in open-source projects; just go to GitHub or GitLab and search for "header-only".
The common way is and always has been to put code in .cpp files (or whatever extension you like) and declarations in headers.
There is occasionally some merit to putting code in the header, this can allow more clever inlining by the compiler. But at the same time, it can destroy your compile times since all code has to be processed every time it is included by the compiler.
Finally, it is often annoying to have circular object relationships (sometimes desired) when all the code is the headers.
One exception is templates. Many newer "modern" libraries such as Boost make heavy use of templates and are often "header only". However, this should only be done when dealing with templates, since putting the definitions in the header is the only practical way to handle them.
Some downsides of writing header only code
If you search around, you will see quite a lot of people trying to find ways to reduce compile times when dealing with Boost. For example: How to reduce compilation times with Boost Asio, which describes a 14 s compile of a single 1K file with Boost included. 14 s may not seem to be "exploding", but it is certainly a lot longer than typical and can add up quite quickly when dealing with a large project. Header-only libraries do affect compile times in a quite measurable way; we just tolerate it because Boost is so useful.
Additionally, there are many things which cannot be done in headers only (even boost has libraries you need to link to for certain parts such as threads, filesystem, etc). A Primary example is that you cannot have simple global objects in header only libs (unless you resort to the abomination that is a singleton) as you will run into multiple definition errors. NOTE: C++17's inline variables will make this particular example doable in the future.
To be more specific about Boost: Boost is a library, not user-level code, so it doesn't change that often. In user code, if you put everything in headers, every little change will cause you to recompile the entire project. That's a monumental waste of time (and is not the case for libraries that don't change from compile to compile). When you split things between header/source and, better yet, use forward declarations to reduce includes (as sketched below), you can save hours of recompiling when added up across a day.
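A small sketch of that last point (class and file names are hypothetical): a header that only stores a pointer can forward-declare the type instead of including its header, so edits to widget.h don't force every client of manager.h to recompile.
// manager.h -- stores only a pointer, so a forward declaration is enough
class Widget;                  // forward declaration of a hypothetical class

class Manager
{
public:
    void attach(Widget* w);
private:
    Widget* widget_ = nullptr;
};

// manager.cpp -- only this file needs the full definition,
// so only this file includes the (hypothetical) widget.h
#include "manager.h"
#include "widget.h"

void Manager::attach(Widget* w) { widget_ = w; }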

Source code build confusion (preprocessing and linking)

Why is it that during the preprocessing step, the #includes in a main file are only replaced with the contents of the relevant header files (and not the function definitions as well (.cpp files))?
I would think that during this step it should first go into the header files and replace the #includes there with the contents of their associated .cpp files and then and only then go back to replace the #includes in the main file with everything, thus negating the need for any linking (one giant file with everything). Why does it not happen this way?
Why is it that during the preprocessing step, the #includes in a main file are only replaced with the contents of the relevant header files (and not the function definitions as well (.cpp files))?
Simply put, the header files are the only files you've told the preprocessor about. It can't assume the names of the source files, because there could be many source files for any given header. You may be thinking "Hey, why don't I just include the source files?" and I'm here to tell you No! Bad! Besides, who's to say that you have access to the source files in the first place?
The only way for the compiler to know about and compile all of your source files is for you to pass the compiler each source file, have it compile them into objects, and link together those objects into a library or executable.
There's a great difference between compiling and linking:
Pre-processor, Pre-compile-time and Compile-time:
The pre-processor checks for the # symbol and replaces the directive with the relevant content, e.g.:
#include <iostream> // the content will be replaced here and this line will be removed
So the content of iostream will be added above.
eg2:
#define PI 3.14 // wherever PI is used in your source file the macro will be expanded replacing each PI with the constant value 3.14
The compiler only checks for syntax errors, function prototypes, and so on; it doesn't care whether the bodies of called functions are present, and it produces an .obj file.
Link-time:
The linker links these .obj files with the relevant libraries, and it is at this point that every function called must have a definition; a missing definition results in a link-time error.
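A minimal illustration of that split (the function name is hypothetical): the compiler is satisfied by a prototype alone, and the missing body only surfaces at link time.
// main.cpp
void helper();          // prototype only -- this compiles fine

int main()
{
    helper();           // the call is legal as far as the compiler is concerned
    return 0;
}
Compiling main.cpp into main.o succeeds. Linking main.o into an executable fails with an "undefined reference to helper()"-style error (exact wording depends on the linker), because no object file or library supplied a definition.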
Why does it not happen this way?
History and the expectations of experienced programmers, and their experience with bigger programs (see last statements, below)
I would think that during this step it should first go into the header
files and replace the #includes there with the contents of their
associated .cpp files ...
If you accept a job coding in C++, your company will provide you with a coding standard detailing guidelines or rules which you will want to follow, or you will face defending your choices when you deviate from them.
You might take some time now to look at available coding standards. For example, try studying the Google C++ Style Guide (I don't particularly like or dislike this one, it's just easy to remember). A simple Google search can also find several coding standards. Adding 'why conform to a coding standard?' to your search might provide some info.
negating the need for any linking (one giant file with everything).
Note: this approach cannot eliminate linking against compiler-provided or 3rd-party libraries. I often use -lrt and -pthread, and sometimes -lncurses, -lgmp, -lgmpxx, etc.
For now, as an experiment, you can manually achieve the giant file approach (which I often do for my smaller trial and development of private tools).
Consider:
if main.cc has:
#include "./Foo.hh" // << note .hh file

int main(int argc, char* argv[])
{
    Foo foo(argc, argv);
    foo.show();
    ...
and Foo.cc has
#include "./Foo.hh" // << note .hh file
// Foo implementation
This is the common pattern (no, not the pattern book pattern), and will require you link together Foo.o and main, which is trivial enough for small builds, but still something more to do.
The 'small'-ness allows you to use #include to create your 'one giant file with everything' easily:
change main to
#include "./Foo.cc" // << note .cc also pulls in .hh
// (prepare for blow back on this idea)

int main(int argc, char* argv[])
{
    Foo foo(argc, argv);
    foo.show();
    ...
The compiler sees all the code in one compilation unit. No linking of local .o's needed (but still library linking).
Note, I do not recommend this. Why?
Probably the primary reason is that many of my tools have 100's of objects (ie. 100's of .cc files). That single 'giant' file can be quite giant.
For most development churn (i.e. early bug fixes), ONLY one or TWO of the .cc files gets changes. Recompiling all of the source code can be a big waste of your time, and your compiler's time.
The alternative is what experienced developers have already learned:
A) Compile the much smaller number of .cc files that have changed (perhaps one or two?),
B) then link them with the 100's of other .o's that have not changed; this is a much quicker build.
A major key to your productivity is to minimize your edit-compile-debug duration. A and B and a good editor are important to this development iteration.
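As a rough sketch of the difference (GNU-style commands; file names beyond main.cc and Foo.cc are hypothetical):
$ g++ -c main.cc Foo.cc Bar.cc        # first full build: every translation unit compiled once
$ g++ main.o Foo.o Bar.o -o tool

$ g++ -c Foo.cc                       # after editing only Foo.cc: recompile just that one file
$ g++ main.o Foo.o Bar.o -o tool      # ...and relink; everything else is reused as-is
In the single-giant-file approach, the equivalent of the last two steps is recompiling the entire program for every edit.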

Why linking to a LIB significantly increases binary's size

Let's say I have a module (DLL/EXE) which defines a certain flow with N objects; after compiling/linking, the module's size is X.
If I ever decide to break that module down into a main executable and a helper LIB file, containing exactly the N objects I described earlier, will the overall size of the executable remain the same?
I know that during linking the linker decides which parts of the LIB to copy into the executable, so I'd expect the overall size of the new executable to be smaller than or equal to the original one.
I've configured the LIB project to favor size over speed and minimum size (O1).
Just to clear things up, I've implemented a small HelloWorld function in the LIB (a global function), removed any references to the LIB's objects from the main executable, and built the following:
#include "../LibObject/Function.h"

void main()
{
    HelloWorld();
}
The executable's overall size remained the same as when I was calling the original objects. How come?
Static libraries are in almost all regards just a collection of object modules (think of them as a .zip of .obj); there's no real difference for the linker whether you pass all your object files separately or all together in a static library (the dead functions elimination, if possible, is performed in the same way), so the fact that you see the same effect on the executable size with or without the intermediate library step is completely expected.
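A minimal sketch of the ".zip of .obj" idea with GNU tools (other toolchains have equivalents; file names are hypothetical):
$ gcc -c foo.c bar.c                 # produces foo.o and bar.o
$ ar rcs libhelper.a foo.o bar.o     # the static library is just those objects bundled with an index
$ gcc main.o -L. -lhelper -o app     # the linker pulls the needed members out of libhelper.a
Whether you pass foo.o and bar.o directly or wrap them in libhelper.a first, the linker sees the same object code, which is why the resulting executable size is the same.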
You are forward declaring the class but not defining it which doesn't really make sense. If it is defined in the header files then you don't need to forward declare it. If it is a class the you are creating then just forward declaring it is not enough. You need to define the class. You seem to have straddled the fence.
namespace Ramy {
namespace TEST {
namespace standard {
    class StandardAnalyzer;
}
}
}
is the forward declaration. It just tells the compiler that the class exists, it doesn't tell the compiler anything about it. The compiler needs a class definition.
So, is it a class that is defined in the Ramy libraries or is it a class you are creating yourself? Depends on your answer.
This is the reason why linking a program with a library increases its size:
the library contains functions and dependencies needed by the main program.
The lib file will always increase the size of the executable file, because the preprocessor is run over your application whenever you include a .h file.

Is C++ linkage smart enough to avoid linkage of unused libraries?

I'm far from fully understanding how the C++ linker works and I have a specific question about it.
Say I have the following:
Utils.h
namespace Utils
{
    void func1();
    void func2();
}
Utils.cpp
#include "some_huge_lib" // Needed only by func2()
namespace Utils
{
    void func1() { /* Do something */ }
    void func2() { /* Make use of some functions defined in some_huge_lib */ }
}
main.cpp
int main()
{
    Utils::func1();
}
My goal is to generate as small binary files as possible.
Will some_huge_lib be included in the output object file?
Including or linking against large libraries usually won't make a difference unless you use that stuff. Linkers should perform dead code elimination and thus ensure that at build time you won't be getting large binaries with a lot of unused code (read your compiler/linker manual to find out more, this isn't enforced by the C++ standard).
Including lots of headers won't increase your binary size either (but it might substantially increase your compilation time, cf. precompiled headers). Some exceptions are global objects and dynamic libraries (those can't be stripped). I also recommend reading this passage (gcc only) regarding separating code into multiple sections.
One last notice about performances: if you use a lot of position dependent code (i.e. code that can't just map to any address with relative offsets but needs some 'hotpatching' via a relocation or similar table) then there will be a startup cost.
This depends a lot on what tools and switches you use in order to link and compile.
Firstly, if you link some_huge_lib as a shared library, all the code and its dependencies will need to be resolved when linking the shared library. So yes, it'll get pulled in somewhere.
If you link some_huge_lib as an archive, then - it depends. It is good practice for the sanity of the reader to put func1 and func2 in separate source code files, in which case in general the linker will be able to disregard the unused object files and their dependencies.
If however you have both functions in the same file, you will, on some compilers, need to tell them to produce individual sections for each function. Some compilers do this automatically, some don't do it at all. If you don't have this option, pulling in func1 will pull in all the code for func2, and all the dependencies will need to be resolved.
Think of each function as a node in a graph.
Each node is associated with a piece of binary code - the compiled binary of the node's function.
There is a link (directed edge) between 2 nodes if one node (function) depends on (calls) another.
A static library is primarily a list of such nodes (+ an index).
The program starting-node is the main() function.
The linker traverses the graph from main() and links into the executable all the nodes that are reachable from main(). That's why it is called a linker (the linking maps the function call addresses within the executable).
Unused functions do not have links from the nodes reachable from main().
Thus, such disconnected nodes are not reachable and are not included in the final executable.
The executable (as opposed to the static library) is primarily a list of all nodes reachable from main() (+ an index and startup code among other things).
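The traversal described above is plain graph reachability; as a sketch in C++ (the names and data structures are illustrative only, not any real linker's internals):
#include <map>
#include <set>
#include <string>
#include <vector>

// Which symbols does each symbol's code refer to?
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Collect everything reachable from main(); only these get linked in.
std::set<std::string> reachable(const CallGraph& g)
{
    std::set<std::string> keep;
    std::vector<std::string> todo{"main"};
    while (!todo.empty()) {
        std::string cur = todo.back();
        todo.pop_back();
        if (!keep.insert(cur).second) continue;   // already visited
        auto it = g.find(cur);
        if (it == g.end()) continue;
        for (const auto& callee : it->second) todo.push_back(callee);
    }
    return keep;   // symbols never reached from main() never appear here
}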
In addition to other replies, it must be said that normally linkers work in terms of sections, not functions.
Compilers typically have it configurable whether they put all of your object code into one monolithic section or split it into a number of smaller ones. For example, the GCC options to switch on splitting are -ffunction-sections (for code) and -fdata-sections (for data); the MSVC option is /Gy (for both). Use -fno-function-sections, -fno-data-sections, or /Gy- respectively to put all code or data into one section.
You might 'play' with compiling your modules in both modes and then dumping them (objdump for GCC, dumpbin for MSVC) to see the generated object file structure.
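For example, on a GCC toolchain the comparison might look roughly like this (the utils.cpp file refers to the Utils example above; the exact section names are mangled and toolchain-dependent):
$ g++ -c utils.cpp -o utils_mono.o
$ objdump -h utils_mono.o            # one .text section holding both functions

$ g++ -c -ffunction-sections utils.cpp -o utils_split.o
$ objdump -h utils_split.o           # one .text.<mangled name> section per function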
Once a section is formed by the compiler, for the linker it is a unit. Sections define symbols and refer to symbols defined in other sections. The linker builds a dependency graph between the sections (starting at a number of roots) and then either discards or keeps each of them entirely. So, if you have a used and an unused function in the same section, the unused function will be kept.
There are both benefits and drawbacks in either mode. Turning splitting on means smaller executable files, but larger object files and longer linking times.
It has to also be noted that in C++, unlike C, there are certain situations where the One Definition Rule is relaxed, and multiple definitions of a function or data object are allowed (for example, in case of inline functions). The rules are formulated in such way that the linker is allowed to pick any definition.
From the point of view of sections, putting inline functions together with non-inline ones would mean that in a typical use scenario the linker would typically be forced to keep virtually every definition of every inline function; that would mean excessive code bloat. Therefore, such functions and data are normally put into their own sections regardless of compiler command line options.
UPDATE: As #janm correctly reminded in his comment, the linker must also be instructed to get rid of unreferenced sections by specifying --gc-sections (GNU) or /opt:ref (MS).