I am writing a hello world C++ application. The #include <iostream> directive helps the compiler and linker pull in the C++ library, and my line " cout << "hello world"; " uses cout from that library. After compiling, the generated exe is about 96k in size. My question is: what instructions are actually contained in this exe file? Does it also contain the iostream library?
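For reference, the complete program is essentially this (a minimal sketch reconstructed from the description above):

#include <iostream>
using namespace std;

int main()
{
    cout << "hello world";   // the only library facility the program uses
    return 0;
}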
Thanks
In the general case, the linker will only bring in what it needs. Once the compiler phase has turned your source code into an object file, it's treated much the same as all other object files. You have:
the C start-up code which prepares the execution environment (sets up argc, argv and so on) and then calls your main or equivalent.
your code itself.
whatever object files need to be dragged in from libraries (dynamic linking is a special case of linking that happens at runtime and I won't cover that here since you asked specifically about static linking).
The linker will include all the object files you explicitly specify (unless it's a particularly smart linker and can tell you're not using the object file).
With libraries, it's a little different. Basically, you start with a list of unresolved symbols (like cout). The linker will search all the object files in all the libraries you specify and, when it finds an object file that satisfies that symbol, it will drag it in and fix up the symbol references.
This may, of course, add even more unresolved symbols if, for example, there was something in the object file that relies on the C printf function (unlikely but possible).
The linker continues like this until all symbols are satisfied (when it gives you an executable) or one cannot be satisfied (when it complains to you bitterly about your coding practices).
So as to what is in your executable, it may be the entire iostream library or it may just be the minimum required to do what you asked. It will usually depend on how many object files the iostream library was built into.
I've seen code where an entire subsystem went into one object file, so that if you wanted to use just one tiny bit of it, you still got the lot. Alternatively, you can put every single function into its own object file, and the linker will then probably create an executable as small as possible.
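As a rough sketch of what that granularity means (the file and function names here are made up purely for illustration):

// widget.cpp -> widget.o (defines widget_draw), sound.cpp -> sound.o (defines sound_play);
// both object files are archived into a hypothetical libgui.a

void widget_draw();   // declaration, e.g. from the library's header

int main()
{
    widget_draw();    // only widget.o gets pulled out of libgui.a; sound.o stays out
    return 0;
}

If widget_draw and sound_play had been compiled into a single object file instead, using either one would drag in both.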
There are linker options which can produce a link map, which will show you how things are organised. You generally won't see it if you're using the IDE, but it'll be buried deep within the compile-time options dialogs under MSVC.
And, in terms of your added comment, the code:
cout << "hello";
will quite possibly bring in sizeable chunks of both the iostream and string processing code.
Use cl /EHsc hello.cpp -link /MAP. The .map file generated will give you a rough idea which pieces of the static library are present in the .exe.
Some of the space is used by C++ startup code, and the portions of the static library that you use.
On Windows, the library (or the parts of the libraries which are used) is also usually included in the .exe; the situation is different on Linux. However, there are optimization options.
I guess this Wiki link will be useful: Static Libraries
Related
I want to know how, in a C++ program, the iostream file is found during execution. We write #include <iostream> in a C++ program, and I know that #include is a preprocessor directive used to load files and that <iostream> is a file name, but I don't know how that file is located.
I have some questions in my mind...
Is the standard library present in the compiler which we are using?
Is that file present in the standard library or on our computer?
Can we give a directory path to locate the file through C++ code? If yes, then how?
You seem to be confused about the compilation and execution model of C++. C++ is generally not interpreted (though it could be); instead, a native binary is produced during the compilation phase, which is then executed... So let's take a detour.
In order to go from a handful of text files to a program being executed, there are several steps:
compilation
link
load
execution proper
I will only describe what traditional compilers do (such as gcc or clang); potential variations will be indicated later on.
Compiling
During compilation, each source file (generally .cpp or .cxx, though the compiler could care less) is processed to produce an object file (generally .o on Linux):
The source file is preprocessed: this means resolving the #include (copy/pasting the included file in the current file), navigating the #if and #else to remove unneeded sources and expanding macros.
The preprocessed file is fed to the compiler which will produce native code for each function, static or global variable, etc... the format depends on the target system in general
The compiler outputs the object file (it's a binary format, in general)
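To make that concrete, here is a rough sketch (the file names and the add function are invented for illustration):

// add.hpp (hypothetical header)
int add(int a, int b);        // declaration only

// main.cpp
#include "add.hpp"            // the preprocessor copy/pastes the declaration here

int main()
{
    // this compiles with only the declaration in sight; the definition
    // (in add.cpp, compiled separately into add.o) is resolved by the linker
    return add(1, 2);
}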
Linking
In this phase, multiple object files are assembled together into a library or an executable. In the case of a static library or a statically-linked executable, the libraries it depends on are also assembled into the produced file.
Traditionally, the linker's job is relatively simple: it just concatenates all the object files, which already contain the binary format the target machine can execute. However, it often actually does more: in C and C++, inline functions are duplicated across object files, so the linker needs to keep only one of the definitions, for example.
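For example (a hypothetical header included from two source files):

// utils.hpp (hypothetical)
inline int square(int x) { return x * x; }

// a.cpp and b.cpp both #include "utils.hpp", so a.o and b.o each carry a copy
// of square; when producing the executable, the linker keeps only one of them.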
At this point, the program is "compiled", and we leave the realm of the compiler.
Loading
When you ask to execute a program, the OS will load its code into memory (thanks to a loader) and execute it.
In the case of a statically linked executable, this is easy: it's just a single big blob of code that needs to be loaded. In the case of a dynamically linked executable, it implies finding the dependencies and "resolving the symbols"; I'll describe this below:
First of all, your dynamically linked executable and libraries have a section describing which other libraries they depend on. They only have the name of the library, not its exact location, so the loader will search among a list of paths (LD_LIBRARY_PATH for example on Linux) for the libraries and actually load them.
When loading a library, the loader will perform replacements. Your executable had placeholders saying "Here should be the address of the printf function", and the loader will replace that placeholder with the actual address.
Once everything is loaded properly, all symbols should be resolved. If some symbols are missing, i.e. the code for them is not found in either the present library or any of its dependencies, then you generally get an error (either immediately, or only when the symbol is actually needed if you use lazy loading).
Executing
The code (assembler instructions in binary format) is now executed. In C++, this starts with constructing the global and static objects (at file scope, not function-static), and then goes on to call your main function.
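A tiny illustration of that ordering (the Logger type is invented for the example):

#include <iostream>

struct Logger
{
    Logger() { std::cout << "constructed before main\n"; }
};

Logger global_logger;   // global object: its constructor runs during start-up

int main()
{
    std::cout << "now in main\n";   // printed after the line above
    return 0;
}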
Caveats: this is a simplified view. Nowadays Link-Time Optimizations mean that the linker will do more and more, the loader could actually perform optimizations too and, as mentioned, with lazy loading, the loader might be invoked after execution has started... still, you've got to start somewhere, don't you?
So, what does it mean about your question ?
The #include <iostream> in your source code is a pre-processor directive. It is thus fully resolved early in the compilation phase, and only depends on finding the appropriate header (no library code is actually needed yet). Note: a compiler would be allowed not to have a header file sitting around, and just magically inject the necessary code as if the header file existed, because this is a Standard Library header (thus special). For regular headers (yours) the pre-processor is invoked.
Then, at link-time:
if you use static linking, the linker will search for the Standard Library and include it in the executable it produces
if you use dynamic linking, the linker will note that it depends on the Standard Library file (libc++.so for example) and that the produced code is missing an implementation of printf (for example)
Then, at load time:
if you used dynamic linking, the loader will search for the Standard Library and load its code
Then, at execution time, the code (yours and its dependencies) is finally executed.
Several Standard Library implementations exist, off the top of my head:
MSVC ships with a modified version of Dinkumware's library
gcc ships with libstdc++ (which depends on libc)
clang ships with libc++ (which depends on libc), but may use libstdc++ instead (with compiler flags)
And of course, you could provide others... probably... though setting it up might not be easy.
Which one is ultimately used depends on the compiler options you use. By default, the most common compilers ship with their own implementation and use it without any intervention on your part.
And finally yes, you can indeed specify paths in a #include directive. For example, using boost:
#include <boost/optional.hpp>
#include <boost/algorithm/string/trim.hpp>
you can even use relative paths:
#include <../myotherproject/x.hpp>
though this is considered poor form by some (since it breaks as soon as you reorganize your files).
What matters is that the pre-processor will look through a list of directories, and for each directory append / and the path you specified. If this creates the path of an existing file, it picks it; otherwise it continues to the next directory... until it runs out (and complains).
The <iostream> file is just not needed during execution. But that's just the header. You do need the standard library, but that's generally named differently if not included outright into your executable.
The C++ Standard Library doesn't ship with your OS, although on many Linux systems the line between OS and common libraries such as the C++ Standard Library is a bit thin.
How you locate that library is very much OS dependent.
There are two ways to load a header file (like iostream.h) in C++
if you write the code as:
# include <iostream>
It will look up the header file in the include directory of your C++ compiler and load it
The other way is to give the full path of the header file, as in:
# include "path_of_file.h"
And loading the file is OS dependent, as answered by MSalters
You definitely require the standard library header files so that the pre-processor directive can locate them.
Yes, those files are present in the library, and on #include they are copied into our code.
If we have defined our own header file, then we have to give the path of that file. In that way we can also include *.c or *.cpp files along with the header files in which we have defined various methods; those have to be included at pre-processing time.
I found one question about compiling and linking in C++ and I don't know which answer is correct. It was discussed with my friends and opinions are divided. Here is a question:
In order to run program written in C++ language its source code is:
(A) compiled to machine code,
(B) compiled and linked to machine code
In my opinion the correct answer is A but I don't have any source to prove it.
Google, first hit.
Linkage is needed as well to create a standalone executable.
You need to link the code you have produced to make it into an executable file. For simple programs, the compiler does this for you, by calling the linker at the end of the compilation process.
The compiler proper simply translates C code either to assembler (classic C compiler), which is then assembled with an assembler, or directly to machine code (many modern compilers). The machine code is usually produced as "object files", which are not "executable", because they refer to external units - such as when you call printf(). It is possible to write C code that is completely standalone, but you still typically need to combine more than one object file, and it certainly needs to be "formatted" in the right way to make an executable file - which is a different file format than an object file [although typically fairly similar].
Compilation does nothing except create object files, which means converting C/C++ source code into machine code.
The linking process is the creation of an executable file from multiple .obj files. So to run an application/executable, you also have to link it.
During compilation, the compiler doesn't complain about non-existent or broken functions, because it assumes they might be defined in another object (source code file). The linker verifies all functions and their existence, so if you have a broken function, you'll get an error in the linking process.
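A minimal illustration (the function name is made up):

void broken();        // declaration only -- no definition in any object file or library

int main()
{
    broken();         // compiles cleanly; the linker reports an undefined reference
    return 0;
}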
Compiling: takes input C/C++ code and produces machine code (an object file)
gcc -c MyProgram.c
Note that the object file does not have its external references resolved!
Linking: Combines object file with external references into an executable file
gcc MyProgram.o -o MyProgram
Note that there are now no unresolved references!
Illustration:
Where libc.a is the standard C library, and it's automatically linked into your programs by gcc.
I've just noticed that your question was about C++; the same concept applies there, so if you understand this, you'll understand how it works in C++ too.
Strictly speaking: answer A.
But for you to see the whole picture, let's say you have defined some function. The compiler then writes the machine code of that function at some address, and puts that address and the name of the function in the object (".o") file, where the linker can find it. The linker then takes this machine code and resolves the symbols, as you might have heard in some previous error.
I have a common scenario:
Some source files for an executable which depend on some standard libraries and some user libraries. All the user libraries are statically linked into the executable whereas the standard libraries are linked dynamically to it.
Problem:
I believe that I have multiply defined symbols in my complete package (the executable, which already includes the user library code, plus the shared standard libraries). The linker obviously has insight into it, but as I understand it, the linker won't complain unless it encounters multiple strong symbols with the same name. My fear is that, while I am moving my code from the Solaris 8/SPARC platform to the Solaris 10/SPARC platform, some standard Unix functions have been implemented in the user libraries, and these are causing the app to crash at runtime. Note that the app runs fine on the Solaris 8/SPARC platform.
I have been facing weird issues which have led me to believe this might be the source
Modifying one variable from one library is changing the value of another variable in another library
Solaris 8-10: host2ip conversion problems
What I need:
Is there a way to easily list all multiply defined symbols?
Is there a way to easily list all multiply defined symbols stemming from user libraries?
Do you guys think issue #1 might be caused by linking issues, or do you feel it might be a sign of some other problem?
Edit1:
Since then I have learned that when generating a map file using ld, it has a section of multiply defined symbols, which I am going through to find names that look like standard library calls. For people who do not know: the linker will only fail to link if it finds multiple symbols with the same name AND the names are strong names.
You could turn on MAP file generation in the compiler (actually linker) settings and look through the map file for symbols that match the UNIX system functions you are concerned about. You'd probably have to write a script to automate it, but this would be a good starting point. The command line switch is probably -map or something similar; it will depend on which compiler/linker you are using.
The actual problem that was happening is:
The library (let's call it lib1) had an array like below
#define ARRAY_SIZE 1024
SomeStruct* global_array[ARRAY_SIZE];
This array is used by another of my libraries (let's call it lib2), which in turn is used by my application, via an extern declaration for it.
While compiling lib2 (or the app, I am not sure which), we did not define ARRAY_SIZE at all. This somehow caused the compiler of lib2 (or the app) to miscalculate the size of global_array, in turn causing it to allocate the memory for some other variable at a location which was already allocated to global_array.
By defining ARRAY_SIZE again while compiling my libs and apps, everything starts behaving normally. I do not fully understand what caused the issue and why it gets resolved, since extern declarations of arrays do not contain the size. Also, if the library really used the macro ARRAY_SIZE, then why wouldn't the compilation fail? Also, there is a possibility that the name used for the define is a standard name (the actual string was FD_SETSIZE).
My initial gut feeling about the linker was wrong.
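One way this kind of mismatch can arise in principle (a hypothetical sketch, not necessarily the exact cause here) is when the extern declaration and the definition are compiled with different values of the macro; neither the compiler (separate translation units) nor the linker diagnoses it, and the result is undefined behaviour that can surface as seemingly unrelated variables being corrupted:

// lib1.cpp -- the defining library
struct SomeStruct;                             // incomplete type is fine for an array of pointers
#define ARRAY_SIZE 1024
SomeStruct* global_array[ARRAY_SIZE];          // definition: 1024 pointers

// lib2.cpp -- compiled where ARRAY_SIZE ended up with a different value
struct SomeStruct;
#define ARRAY_SIZE 64                          // e.g. picked up from a system header such as FD_SETSIZE
extern SomeStruct* global_array[ARRAY_SIZE];   // declaration disagrees with the definition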
I'm going to ask how it's done in c++, but this idea can apply to multiple languages. If you know how to do it in objective-c as well, please provide any similarities between the two
Lets say I want to create an instance of an ofstream like
ofstream myfile;
I'm assuming all I have on my computer is the *.o file (in a library archive) and the *.h file for the iostream class. If this part isn't true, let me know. I am assuming this because all I have installed is the runtime and the devel packages, not the source files.
How does it connect the header file to the object file? Is there a naming scheme? And where does it look, and in what order?
Why this is confusing me: normally when I want to create a class, I link my implementation of the class with the program, so how does it know where the files are and how to link them?
One more: does it matter if it is loaded statically or dynamically?
Thank you in advance, and sorry if this is a silly question.
Computer Science 101:
Broadly speaking (VERY broadly!), there are two kinds of "programs":
a) Interpreted: you read the program source line-by-line every time you execute it
<= *nix shell scripts and DOS .bat files are "interpreted"
b) Compiled: you read the source once (to convert it into a "binary machine code"). You link the machine code "object files" to build an "executable program".
You're talking about "compiled programs"
The "ofstream" part is irrelevant once the program is "compiled"
The binary implementation for "ofstream" can be compiled directly into the executable, or it can be dynamically loaded from a shared library (.dll) at runtime.
A "compiler" users ".h" headers to process the source file.
A "linker" uses ".lib" libraries to match symbols and link static code at link type.
The "Operating System" recognizes dynamic links and loads the needed shared libraries (.dll's) at runtime.
Three different things, all independent of each other: Compiler/source code, Linker/machine object code, OS/executable programs
'Hope that helps .. a bit...
This is not standardized and it's up to the implementation. I don't know about *unix, but I assume it's fairly similar to Windows.
You can assume that .o files are similar to library files .lib.
The header contains the class definition, so that the linker knows what to look for in the library.
Say you have a header:
class A
{
public:
    A();
    void foo();
};
and a lib file A.lib.
You include that header and call:
A a;
a.foo();
The compiler finds the declarations for both A() and A::foo(). Now it knows it has to search the library for these functions. Names in the library are decorated and contain modifiers, but this is specific to the compiler, so the linker finds the functions if they are exported in the library. It then binds the functions to the specific entry points from the dll.
If by dynamic loading you mean using LoadLibrary() and GetProcAddress() instead of linking, then the concept is pretty similar.
If you do static linking, all symbols with linkage are available in the .obj file. The linker binds the calls of the functions to the entry points of the functions. There is name mangling involved in this process so that the symbols can be resolved correctly.
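For instance, overloaded functions get distinct mangled names (the exact scheme is compiler-specific; the names below follow the Itanium ABI used by GCC and Clang):

void foo(int);      // mangled to something like _Z3fooi
void foo(double);   // mangled to something like _Z3food
// the decorated names are what the linker actually matches on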
Dynamic linking is a platform dependent issue and not part of the C or C++ standard as far as I know.
I have some doubts about how programs use shared libraries.
When I build a shared library (with the -shared -fPIC switches) I make some functions available to an external program.
Usually I do a dlopen() to load the library and then dlsym() to link the said functions to some function pointers.
This approach does not involve including any .h file.
Is there a way to avoid doing dlopen() & dlsym() and just including the .h of the shared library?
I guess this may be how C++ programs use code stored in system shared libraries, i.e. just by including stdlib.h etc.
Nick, I think all the other answers are actually answering your question, which is how you link libraries, but the way you phrase your question suggests you have a misunderstanding of the difference between header files and libraries. They are not the same. You need both, and they are not doing the same thing.
Building an executable has two main phases, compilation (which turns your source into an intermediate form, containing executable binary instructions, but is not a runnable program), and linking (which combines these intermediate files into a single running executable or library).
When you do gcc -c program.c, you are compiling, and you generate program.o. This step is where headers matter. You need to #include <stdlib.h> in program.c to (for example) use malloc and free. (Similarly you need #include <dlfcn.h> for dlopen and dlsym.) If you don't do that, the compiler will complain that it doesn't know what those names are, and halt with an error. But if you do #include the header, the compiler does not insert the code for the functions you call into program.o. It merely inserts references to them. The reason is to avoid duplication of code: the code only needs to exist once, however many parts of your program use it, so if you had further files (module1.c, module2.c and so on), even if they all used malloc you would merely end up with many references to a single copy of malloc. That single copy is present in the standard library in either its shared or static form (libc.so or libc.a), but these are not referenced in your source, and the compiler is not aware of them.
The linker is. In the linking phase you do gcc -o program program.o. The linker will then search all libraries you pass it on the command line and find the single definition of all functions you've called which are not defined in your own code. That is what the -l does (as the others have explained): tell the linker the list of libraries you need to use. Their names often have little to do with the headers you used in the previous step. For example to get use of dlsym you need libdl.so or libdl.a, so your command-line would be gcc -o program program.o -ldl. To use malloc or most of the functions in the std*.h headers you need libc, but because that library is used by every C program it is automatically linked (as if you had done -lc).
Sorry if I'm going into a lot of detail but if you don't know the difference you will want to. It's very hard to make sense of how C compilation works if you don't.
One last thing: dlopen and dlsym are not the normal method of linking. They are used for special cases where you want to dynamically determine what behavior you want based on information that is, for whatever reason, only available at runtime. If you know what functions you want to call at compile time (true in 99% of the cases) you do not need to use the dl* functions.
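For completeness, this is roughly what the dlopen()/dlsym() route looks like (the library and function names here are hypothetical):

#include <dlfcn.h>
#include <iostream>

int main()
{
    // open the shared library at runtime instead of linking against it
    void* handle = dlopen("libmysharedlib.so", RTLD_LAZY);
    if (!handle) { std::cerr << dlerror() << '\n'; return 1; }

    // look up the symbol by name and cast it to the right function type
    typedef int (*do_work_fn)(int);
    do_work_fn do_work = reinterpret_cast<do_work_fn>(dlsym(handle, "do_work"));
    if (!do_work) { std::cerr << dlerror() << '\n'; dlclose(handle); return 1; }

    std::cout << do_work(42) << '\n';
    dlclose(handle);
    return 0;
}

This would be built with -ldl, and the library has to export do_work with C linkage (extern "C") for the name lookup to succeed; with ordinary linking none of this bookkeeping is needed.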
You can link shared libraries like static one. They are then searched for when launching the program. As a matter of fact, by default -lXXX will prefer libXXX.so to libXXX.a.
You need to give the linker the proper instructions to link your shared library.
The shared library names are like libNAME.so, so for linking you should use -lNAME
Call it libmysharedlib.so and then link your main program as:
gcc -o myprogram myprogram.c -lmysharedlib
If you use CMake to build your project, you can use
TARGET_LINK_LIBRARIES(targetname libraryname)
As in:
TARGET_LINK_LIBRARIES(myprogram mylibrary)
To create the library "mylibrary", you can use
ADD_LIBRARY(targetname sourceslist)
As in:
ADD_LIBRARY(mylibrary ${mylibrary_SRCS})
Additionally, this method is cross-platform (whereas simply passing flags to gcc is not).
Shared libraries (.so) are object files where the actual compiled code of your functions/classes/... is stored (in binary form).
Header files (.h) are files that declare, for the compiler, the functions/classes/... (found in the .so) that are required by the main code.
Therefore, you need both of them.
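Putting the two together (hypothetical names, matching the -lmysharedlib example above):

// mylib.h -- all the compiler sees: a declaration
void do_work();

// main.cpp
#include "mylib.h"

int main()
{
    do_work();   // the compiled body lives in libmysharedlib.so,
    return 0;    // found by the linker (and later the loader) via -lmysharedlib
}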