I am trying to build a Fortran program, but I get errors about an undefined reference or an unresolved external symbol. I've seen another question about these errors, but the answers there are mostly specific to C++.
What are common causes of these errors when writing in Fortran, and how do I fix/prevent them?
This is a canonical question for a whole class of errors when building Fortran programs. If you've been referred here or had your question closed as a duplicate of this one, you may need to read one or more of several answers. Start with this answer which acts as a table of contents for solutions provided.
A link-time error like these can arise for many of the same reasons as in more general uses of the linker, not only because a Fortran program is being compiled. Some of these are covered in the linked question about C++ linking and in another answer here: failing to specify a library, or providing libraries in the wrong order.
However, there are common mistakes in writing a Fortran program that can lead to link errors.
Unsupported intrinsics
If a subroutine reference is intended to refer to an intrinsic subroutine then this can lead to a link-time error if that subroutine intrinsic isn't offered by the compiler: it is taken to be an external subroutine.
implicit none
call unsupported_intrinsic
end
With unsupported_intrinsic not provided by the compiler we may see a linking error message like
undefined reference to `unsupported_intrinsic_'
If we are using a non-standard, or not commonly implemented, intrinsic we can help our compiler report this in a couple of ways:
implicit none
intrinsic :: my_intrinsic
call my_intrinsic
end program
If my_intrinsic isn't a supported intrinsic, then the compiler will complain with a helpful message:
Error: ‘my_intrinsic’ declared INTRINSIC at (1) does not exist
We don't have this problem with intrinsic functions because we are using implicit none:
implicit none
print *, my_intrinsic()
end
Error: Function ‘my_intrinsic’ at (1) has no IMPLICIT type
With some compilers we can use the Fortran 2018 implicit statement to do the same for subroutines
implicit none (external)
call my_intrinsic
end
Error: Procedure ‘my_intrinsic’ called at (1) is not explicitly declared
Note that it may be necessary to specify a compiler option when compiling to request the compiler support non-standard intrinsics (such as gfortran's -fdec-math). Equally, if you are requesting conformance to a particular language revision but using an intrinsic introduced in a later revision it may be necessary to change the conformance request. For example, compiling
intrinsic move_alloc
end
with gfortran and -std=f95:
intrinsic move_alloc
1
Error: The intrinsic ‘move_alloc’ declared INTRINSIC at (1) is not available in the current standard settings but new in Fortran 2003. Use an appropriate ‘-std=*’ option or enable ‘-fall-intrinsics’ in order to use it.
External procedure instead of module procedure
Just as we can try to use a module procedure in a program, but forget to give the object defining it to the linker, we can accidentally tell the compiler to use an external procedure (with a different link symbol name) instead of the module procedure:
module mod
implicit none
contains
integer function sub()
sub = 1
end function
end module
use mod, only :
implicit none
integer :: sub
print *, sub()
end
Or we could forget to use the module at all. Equally, we often see this when mistakenly referring to external procedures instead of sibling module procedures.
Using implicit none (external) can help us when we forget to use a module but this won't capture the case here where we explicitly declare the function to be an external one. We have to be careful, but if we see a link error like
undefined reference to `sub_'
then we should suspect that we've referred to an external procedure sub instead of a module procedure: the symbol shows no name mangling for "module namespaces". That's a strong hint where we should be looking.
Mis-specified binding label
If we are interoperating with C then we can specify the link names of symbols incorrectly quite easily. It's so easy when not using the standard interoperability facility that I won't bother pointing this out. If you see link errors relating to what should be C functions, check carefully.
If using the standard facility there are still ways to trip up. Case sensitivity is one way: link symbol names are case sensitive, but your Fortran compiler has to be told the case if it's not all lower:
interface
function F() bind(c)
use, intrinsic :: iso_c_binding, only : c_int
integer(c_int) :: f
end function f
end interface
print *, F()
end
tells the Fortran compiler to ask the linker about a symbol f, even though we've called it F here. If the symbol really is called F, we need to say that explicitly:
interface
function F() bind(c, name='F')
use, intrinsic :: iso_c_binding, only : c_int
integer(c_int) :: f
end function f
end interface
print *, F()
end
If you see link errors which differ by case, check your binding labels.
The same holds for data objects with binding labels. Also make sure that any data object with linkage association has a matching name in any C definition and link object.
Equally, forgetting to specify C interoperability with bind(c) means the linker may look for a mangled name with a trailing underscore or two (depending on compiler and its options). If you're trying to link against a C function cfunc but the linker complains about cfunc_, check you've said bind(c).
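When the "C" side is actually compiled as C++, the same kind of mismatch can come from C++ name mangling. As a hedged sketch (the function bodies and return values are made up for illustration; only the names F and cfunc come from the text above), declaring the functions extern "C" asks the C++ compiler to emit the same link names a C compiler would, which is what bind(c, name='F') and bind(c) on a procedure named cfunc would look for:

```cpp
// Hypothetical C++ definitions of the functions the Fortran examples bind to.
// extern "C" suppresses C++ name mangling, so the link symbols follow the
// platform's C convention: case preserved, no Fortran-style trailing underscore.
extern "C" int F() {
    return 42;  // placeholder value, not from the original answer
}

extern "C" int cfunc() {
    return 7;   // placeholder value, not from the original answer
}
```

Without extern "C", the symbols would carry C++ mangled names and the Fortran side would again fail to link.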
Not providing a main program
A compiler will often assume, unless told otherwise, that it's compiling a main program in order to generate (with the linker) an executable. If we aren't compiling a main program that's not what we want. That is, if we're compiling a module or external subprogram, for later use:
module mod
implicit none
contains
integer function f()
f = 1
end function f
end module
subroutine s()
end subroutine s
we may get a message like
undefined reference to `main'
This means that we need to tell the compiler that we aren't providing a Fortran main program. This will often be with the -c flag, but there will be a different option if trying to build a library object. The compiler documentation will give the appropriate options in this case.
There are many possible ways you can see an error like this. You may see it when trying to build your program (link error) or when running it (load error). Unfortunately, there's rarely a simple way to see which cause of your error you have.
This answer provides a summary of and links to the other answers to help you navigate. You may need to read all answers to solve your problem.
The most common cause of a link error like this is that you haven't correctly specified external dependencies or haven't put all parts of your code together correctly.
When trying to run your program you may have a missing or incompatible runtime library.
If building fails and you have specified external dependencies, you may have a programming error which means that the compiler is looking for the wrong thing.
Not linking the library (properly)
The most common reason for the undefined reference/unresolved external symbol error is the failure to link the library that provides the symbol (most often a function or subroutine).
For example, when a subroutine from the BLAS library, like DGEMM, is used, the library that provides this subroutine must be used in the linking step.
In the most simple use cases, the linking is combined with compilation:
gfortran my_source.f90 -lblas
The -lblas tells the linker (here invoked by the compiler) to link the libblas library. It can be a dynamic library (.so, .dll) or a static library (.a, .lib).
In many cases, it will be necessary to provide the library object defining the subroutine after the object requesting it. So, the linking above may succeed where switching the command line options (gfortran -lblas my_source.f90) may fail.
Note that the name of the library can be different as there are multiple implementations of BLAS (MKL, OpenBLAS, GotoBLAS,...).
But it will always be shortened from lib... to -l..., as in libopenblas.so and -lopenblas.
If the library is in a location where the linker does not see it, you can use the -L flag to explicitly add the directory for the linker to consider, e.g.:
gfortran my_source.f90 -L/usr/local/lib -lopenblas
You can also try to add the path into some environment variable the linker searches, such as LIBRARY_PATH, e.g.:
export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/lib
When linking and compilation are separated, the library is linked in the linking step:
gfortran -c my_source.f90 -o my_source.o
gfortran my_source.o -lblas
Not providing the module object file when linking
We have a module in a separate file module.f90 and the main program program.f90.
If we do
gfortran -c module.f90
gfortran program.f90 -o program
we receive an undefined reference error for the procedures contained in the module.
If we want to keep separate compilation steps, we need to link the compiled module object file
gfortran -c module.f90
gfortran module.o program.f90 -o program
or, when separating the linking step completely
gfortran -c module.f90
gfortran -c program.f90
gfortran module.o program.o -o program
Problems with the compiler's own libraries
Most Fortran compilers need to link your code against their own libraries. This should happen automatically without you needing to intervene, but this can fail for a number of reasons.
If you are compiling with gfortran, this problem will manifest as undefined references to symbols in libgfortran, which are all named _gfortran_.... These error messages will look like
undefined reference to '_gfortran_...'
The solution to this problem depends on its cause:
The compiler library is not installed
The compiler library should have been installed automatically when you installed the compiler. If the compiler did not install correctly, this may not have happened.
This can be solved by correctly installing the compiler, which also installs the library. It may be worth uninstalling the incorrectly installed compiler first to avoid conflicts.
N.B. proceed with caution when uninstalling a compiler: if you uninstall the system compiler it may uninstall other necessary programs, and may render other programs unusable.
The compiler cannot find the compiler library
If the compiler library is installed in a non-standard location, it may not be found. You can tell the runtime loader where the library is using LD_LIBRARY_PATH (at link time, the -L flag or LIBRARY_PATH serves the same purpose), e.g. as
export LD_LIBRARY_PATH="/path/to/library:$LD_LIBRARY_PATH"
If you can't find the compiler library yourself, you may need to install a new copy.
The compiler and the compiler library are incompatible
If you have multiple versions of the compiler installed, you probably also have multiple versions of the compiler library installed. These may not be compatible, and the compiler might find the wrong library version.
This can be solved by pointing the compiler to the correct library version, e.g. by using LD_LIBRARY_PATH as above.
The Fortran compiler is not used for linking
If you are linking by invoking the linker directly, or indirectly through a C (or other) compiler, then you may need to tell this compiler/linker to include the Fortran compiler's runtime library. For example, if using GCC's C frontend:
gcc -o program fortran_object.o c_object.o -lgfortran
This code:
void undefined_fcn();
void defined_fcn() {}
struct api_t {
void (*first)();
void (*second)();
};
api_t api = {undefined_fcn, defined_fcn};
defines a global variable api with a pointer to a non-existent function. However, it compiles, and to my surprise, links with absolutely no complaints from GCC, even with all those -Wall -Wextra -Werror -pedantic flags.
This code is part of a shared library. Only when I load the library, at run time, does it finally fail. How do I check, at library link time, that I didn't forget to define any function?
Update: this question mentions the same problem, and the answer is the same: -Wl,--no-undefined. (by the way, I guess this could even be marked as duplicate). However, according to the accepted answer below, you should be careful when using -Wl,--no-undefined.
This code is part of a shared library.
That's the key. The whole purpose of having a shared library is to have an "incomplete" shared object, with undefined symbols that must be resolved when the main executable loads it and all other shared libraries it gets linked with. At that time, the runtime loader attempts to resolve all undefined symbols; and all undefined symbols must be resolved, otherwise the executable will not start.
You stated you're using gcc, so you are likely using GNU ld. For the reason stated above, ld will link a shared library with undefined symbols, but will fail to link an executable unless all undefined symbols are resolved against the shared libraries the executable gets linked with. At run time, the loader is therefore expected to resolve all symbols successfully too; a failure to start the executable then indicates a fatal problem with the runtime environment (such as a shared library having been replaced with an incompatible version).
There are some options that can be used to override this behavior. The --no-undefined option instructs ld to report a link failure for undefined symbols when linking a shared library, just as it does for executables. When invoking ld indirectly via gcc this becomes -Wl,--no-undefined.
However, you are likely to discover that this is going to be a losing proposition. You better hope that none of the code in your shared library uses any class in the standard C++ or C library. Because, guess what? -- those references will be undefined symbols, and you will fail to link your shared library!
In other words, this is a necessary evil that you need to deal with.
You can't have the compiler tell you whether you forgot to define the function in that implementation file. The reason is that a function declaration implicitly has external linkage (extern) in C++, and you cannot tell what is in a shared library until after it is linked (the linker does not know whether the reference will be defined).
In case you are not familiar with what extern means: things marked extern have external linkage, so if you have a variable that is extern, the compiler doesn't require a definition for that variable to be in the translation unit that uses it. The definition can be in another implementation file, and the reference is resolved at link time (when you link with a translation unit that defines the variable). The same applies to functions.
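A minimal sketch of that distinction, with made-up names (counter, next_id, use_ids): the extern declaration and the function prototype are all the compiler needs, and the definitions could live in any translation unit handed to the linker. They are placed in the same file here only so that the example links on its own.

```cpp
// Declarations only: enough for the compiler, nothing for the linker yet.
extern int counter;   // declares a variable that is defined elsewhere
int next_id();        // function declarations have external linkage by default

// Code may use the declared names before any definition has been seen.
int use_ids() {
    next_id();
    return next_id();
}

// Definitions: these could just as well sit in another .cpp file. If no
// object given to the linker defined them, linking an executable would fail
// with an "undefined reference" error, while a shared library would
// typically still link.
int counter = 0;
int next_id() { return ++counter; }
```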
To get the behavior you want, make the function static, which tells the compiler that the function is not extern and belongs to the current translation unit, in which case it must be defined there. Clang's -Wundefined-internal picks up on this; combine it with -Werror to turn the warning into an error.
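For illustration, a sketch of the static variant, using hypothetical names (helper, api_entry). Because helper has internal linkage, its definition cannot come from another object file or library, so the compiler itself can diagnose a missing definition instead of the problem surfacing at load time.

```cpp
// Internal linkage: the definition must appear in this translation unit.
static int helper(int x);

int api_entry(int x) {
    return helper(x) + 1;
}

// If this definition were removed, the compiler would complain that helper
// is used but never defined (the exact diagnostic depends on the compiler).
static int helper(int x) {
    return 2 * x;
}
```

The trade-off is that a static function is no longer visible to other translation units, so this only suits functions that are private to one source file.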
When someone statically links a .lib, will the linker copy the whole contents of lib into the final executable or just the functions used in the object files?
The whole library? -- No.
Just the functions you called? -- No.
Something else? -- Yes.
It certainly doesn't throw in the whole library.
But it doesn't necessarily include just "the functions used in the object files" either.
The linker will make a recursively built list of which object modules in the library satisfy your undefined symbols.
Then, it will include each of those object modules.
Typically, a given object module will include more than one function, and if some of these are not called by the ones that you do call, you will get some number of functions (and data objects) that you didn't need.
The linker typically does not remove dead code before building the final executable. That is, it will (usually) link in ALL symbols whether they are used in the final executable or not. However, linkers often explicitly provide Optimization settings you can use to force the linker to try extra hard to do this.
For GCC, this is accomplished in two stages:
First, compile the code, telling the compiler to place each function and data item in its own section within the translation unit. This is done with the following two compiler flags:
-fdata-sections -ffunction-sections
Link the translation units together using the linker optimization flag (this causes the linker to discard unreferenced sections):
-Wl,--gc-sections
So if you had one file called test.cpp that had two functions declared in it, but one of them was unused, you could omit the unused one with the following command to gcc(g++):
g++ -Os -fdata-sections -ffunction-sections test.cpp -o test -Wl,--gc-sections
(Note that -Os is an additional compiler flag that tells GCC to optimize for size)
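As a sketch of what such a test.cpp might contain (the function names are hypothetical, and the main function that calls used_function is left out of the sketch): with -ffunction-sections each function lands in its own section, so --gc-sections is free to drop the section holding unused_function because nothing references it.

```cpp
// test.cpp (hypothetical contents; main is assumed to call used_function)

// Referenced from the rest of the program, so its section is kept.
int used_function(int x) {
    return x * x;
}

// Never referenced: with -ffunction-sections and -Wl,--gc-sections the
// linker may discard the section that holds this function.
int unused_function(int x) {
    return x + 1;
}
```

Comparing the symbol tables of builds with and without these flags (for example with nm) is one way to check whether unused_function survived.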
As for MSVC, function level linking accomplishes the same thing.
I believe the compiler flag for this is (to sort things into sections):
/Gy
And then the linker flag (to discard unused sections):
/OPT:REF
Linkers were invented in ancient times, when memory was especially precious. One of their primary functions was to prune out the modules you weren't using. That ability has been carried forward to the present day.
It's quite common for some library functions to rely on others though, and all the dependencies will be linked.
Sort of. It will however also need to fix up all the function call pointers. Especially if those function calls exist outside of the static library (ie in another static library or executable).
Depends on the linker. Some linkers are lazy and just throw the whole library in. The other extreme is linkers that throw in only the necessary code into an executable.
A sample test is to write a program that uses puts and compare with a program that uses printf. If the executables are the same size, you have more of a lazy linker.
Example:
puts_test.cpp
#include <cstdio>
using namespace std;
int main(void)
{
    puts("Hello World");
    return 0;
}
printf_test.cpp
#include <cstdio>
using namespace std;
int main(void)
{
    printf("%s\n", "Hello World");
    return 0;
}
With the above example, the puts function does not require extra code for parsing format strings or converting numerics into text. This is the baseline because it requires a minimal library function.
The example using printf requires more functionality. The printf function requires parsing the format string and outputting text.
The expected result is that the printf executable should be larger than the puts executable. Most toolchains will haul in all the code for the printf function to resolve symbols (such as the code for displaying floats) even though that portion of the code is not used. More intelligent (and costly) toolchains will break up the printf function and only include the parts that are used or required. In the example above, such a toolchain should only include the parts for processing text and not the code to format integers and floating point values.
A lazy linker, or a build in debug mode, will copy the entire library even for the puts example, making the two executables the same size.
Symbol comparison
The *nix platforms and Cygwin provide tools for obtaining the symbols from executables. One such utility is nm. Run nm on each executable, directing output to a text file, and compare the two text files. With a lazy linker both should contain the same symbols, although their locations may differ (which is not important here).
It will use only the used functions & symbols (unless told otherwise, but that can be tricky).
Side issue:
This can actually be a problem if, for example, you have some classes that just register themselves with a factory. No one calls these classes directly, so they won't be included and thus won't be registered in the factory. There are ways around this (usually by declaring some anonymous variable in the header file that references the source file).
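A hedged sketch of the self-registration pattern in question; all names here (Shape, Circle, Registrar, registry, create) are invented for illustration. Everything sits in one file so that it runs as is. The problem described above appears once Circle and its registrar object are moved into their own object file inside a static library.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const = 0;
};

using Creator = std::function<std::unique_ptr<Shape>()>;

// A function-local static avoids initialization-order problems between
// the registry and the registrar objects below.
std::map<std::string, Creator>& registry() {
    static std::map<std::string, Creator> instance;
    return instance;
}

// Constructing a Registrar adds one entry to the registry.
struct Registrar {
    Registrar(const std::string& key, Creator make) {
        registry()[key] = std::move(make);
    }
};

struct Circle : Shape {
    std::string name() const override { return "circle"; }
};

// Nothing else refers to this object. In a static library, the linker would
// have no undefined symbol pointing at Circle's object file, so it would not
// pull that file in and this registration would silently never run.
static Registrar circle_registrar(
    "circle", [] { return std::unique_ptr<Shape>(new Circle); });

// Look up a creator by key; returns an empty pointer for unknown keys.
std::unique_ptr<Shape> create(const std::string& key) {
    auto it = registry().find(key);
    if (it == registry().end()) {
        return nullptr;
    }
    return it->second();
}
```

Here create("circle") yields a Circle only because the registrar's constructor ran during static initialization, which is exactly the step that is lost when the object file is never linked in.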
I asked myself the following question when I was discussing this topic.
Are there cases when some unused code from translation units will link to final executable code (in release mode of course) for popular compilers like GCC and VC++?
For example suppose we have 2 compilation units:
//A.hpp
//Here are declarations of some classes, functions, extern variables etc.
And source file
//A.cpp
//definitions of the A.hpp declarations
And finally main
//main.cpp
//including A.hpp library
#include "A.hpp"
//here we will use some stuff from A.hpp library, but not everything
My question is: what if main.cpp does not use all the stuff from A.hpp? Will the linker remove all unused code, or are there cases when some unused code can be linked into the executable file?
Edit: I'm interested in G++ and VC++ linkers.
Edit: Of course I mean in release mode.
Edit: I'm starting a bounty on this question to get a good, complete answer. I'm expecting an answer that explains in which cases the g++ and VC++ linkers link junk, what kinds of code they are able to remove from the executable file (unneeded functions, unneeded global variables, unneeded class definitions, etc.), and why they aren't able to remove some kinds of unneeded code.
As other posters have indicated, the linker typically does not remove dead code before building the final executable. However, there are often optimization settings you can use to force the linker to try extra hard to do this.
For GCC, this is accomplished in two stages:
First, compile the source while telling the compiler to place the code into separate sections within the translation unit. This will be done for functions, classes, and external variables by using the following two compiler flags:
-fdata-sections -ffunction-sections
Link the translation units together using the linker optimization flag (this causes the linker to discard unreferenced sections):
-Wl,--gc-sections
So if you had one file called test.cpp that had two functions defined in it, but one of them was unused, you could omit the unused one with the following g++ command:
g++ -Os -fdata-sections -ffunction-sections test.cpp -o test -Wl,--gc-sections
(Note that -Os is an additional compiler flag that tells GCC to optimize for size.)
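As a rough illustration of what such a test.cpp might contain, here is a sketch with one function that is referenced and one that is not. The names used_square, unused_cube and run are made up for this example, and run() stands in for whatever main() would do.

```cpp
// test.cpp (sketch): one function that is called and one that is not.
#include <cassert>

// Referenced from run(), so its section stays reachable from the entry point.
int used_square(int x) { return x * x; }

// Never referenced. With -ffunction-sections it is placed in a section of
// its own, which -Wl,--gc-sections is then allowed to discard.
int unused_cube(int x) { return x * x * x; }

// Stand-in for the body of main(): only used_square is reachable from here.
int run() {
    int result = used_square(7);
    assert(result == 49);
    return result;
}
```

Assuming main() simply calls run(), running nm on the resulting executable with and without the section flags is one way to check whether unused_cube was actually dropped by your toolchain.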
I have also read somewhere that linking static libraries is different, though, and that GCC automatically omits unused symbols in this case. Perhaps another poster can confirm/disprove this.
As for MSVC, as others have mentioned, function level linking accomplishes the same thing.
I believe the compiler flag for this is (to sort things into sections):
/Gy
And then the linker flag (to discard unused sections):
/OPT:REF
EDIT: After further research, I think that bit about GCC automatically doing this for static libraries is false.
The linker will not remove code.
You can still access it via dlsym dynamically in your code.
In general, linkers tend to include everything from the object files explicitly passed on the command line, but only pull in those object files from a static library that contain symbols needed to resolve external references from object files already linked.
However, a linker may decide to discard functions that are never called, or data which is never referenced. The precise details will depend on the compiler and linker switches.
In C++ code, if a source file is explicitly compiled and linked into your application, then I would expect that the objects with static storage duration that have constructors and/or destructors will be included, and their constructors/destructors run at the appropriate times. Consequently, any code called from those constructors or destructors must be in the final executable. However, if the code is not called from anywhere, then you cannot write a program to tell whether or not the code is included without using things like dlsym, so the linker may well omit it from the final executable.
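A small sketch of the case described above, using hypothetical names (Announcer, record_startup, startup_log): the only call to record_startup comes from the constructor of an object with static storage duration, so that function has to survive in the executable even though ordinary program code never calls it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical log of what ran during static initialization.
std::vector<std::string>& startup_log() {
    static std::vector<std::string> log;
    return log;
}

// Called only from the constructor below, never from ordinary program code.
void record_startup(const std::string& who) {
    assert(!who.empty());
    startup_log().push_back(who);
}

struct Announcer {
    explicit Announcer(const std::string& who) { record_startup(who); }
};

// Object with static storage duration. Its constructor runs during program
// startup, so record_startup() must be present in the final executable.
static Announcer announcer("module A");
```

By the time main() starts, startup_log() already holds one entry, which is observable behaviour the linker is not free to optimize away.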
I would also expect that any symbols defined with global visibility such that they could be found via dlsym (as opposed to "hidden" symbols which are only visible within the executable) would be present in the final executable. However, this is an expectation rather than something I have confirmed by testing or reading the docs.
If you wanted to ensure code was in your executable even if nothing inside the executable calls it, you could load it as a statically aware dynamic link library (a statically aware library is one which is loaded automatically into memory as the program is loaded, as opposed to one where you pass a string to a function that loads the library and then manually search for hooks).
Under Solaris 10, I'm creating a library A.so that calls a function f() which is defined in library B.so. To compile the library A.so, I declare in my code f() as extern.
Unfortunately, I "forgot" to declare in A's makefile that it has to link with B.
However, "make A" causes no warning, no error, and the library A.so is created.
Of course, when executing A's code, the call of f() crashes because it is undefined.
Is there a way (linker option, code trick...) to make the compilation of library A fail?
How can I be sure that all symbols referred to in library A are defined at compile time?
Thanks for any suggestions.
Simplest way: add a "test_lib" target in the makefile that will produce a binary using all the symbols exported from library A. (It doesn't have to be anything meaningful... just take the address; there is no need to call the function or anything, it just needs to be referenced.)
Thanks
I think I found something interesting and even simpler in the linker's manual (d'oh)
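A sketch of what the source behind such a test target could look like. The names a_open and a_close are invented stand-ins for library A's exported functions; they are defined here only so the sketch is self-contained. In a real check they would be declarations taken from A's header, with the definitions coming from A.so at link time.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for symbols exported by library A (hypothetical names).
// In a real test target these would be declarations only.
int a_open(int id) { return id; }
int a_close(int id) { return -id; }

using ExportedFn = int (*)(int);

// Taking each address creates a reference the linker must resolve;
// the functions are never actually called.
const ExportedFn exported[] = { &a_open, &a_close };
const std::size_t exported_count = sizeof(exported) / sizeof(exported[0]);

// Counts table entries that resolved to a real address.
std::size_t count_resolved() {
    std::size_t resolved = 0;
    for (std::size_t i = 0; i < exported_count; ++i) {
        if (exported[i] != nullptr) {
            ++resolved;
        }
    }
    assert(resolved <= exported_count);
    return resolved;
}
```

The value of the target lies in the link step rather than in running the binary: if a symbol reachable from the table is missing, building the test executable is expected to fail instead of the failure surfacing at run time.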
The -z defs option and the --no-undefined option force a fatal error if any undefined symbols remain at the end of the link. This mode is the default when an executable is built. For historic reasons, this mode is not the default when building a shared object. Use of the -z defs option is recommended, as this mode assures the object being built is self-contained. A self-contained object has all symbolic references resolved internally, or to the object's immediate dependencies.