SYSV vs. Linux/GNU ELF formats - c++

The question is in the context of the following question: Why are libraries not found, even though they are clearly on the -L path? It seems that my libraries and my object file have different ELF formats, which might cause the linker to not "find" the libraries.
Now, this leads to a couple of questions:
It seems that my compiler normally generates SYSV ELF files (checked with file). However, for that particular C++ source, it generates a Linux/GNU ELF object file. I wonder why, so I reduced that source to an empty main function, and suddenly I get a SYSV object file. What inside a C++ source file can cause the compiler to switch the ELF format?
Is it true or false that I cannot link Linux/GNU ELF files together with SYSV ELF files?
Is there an option to force the compiler to create a certain ELF format?
I'm working with a Cray g++ (GCC) 5.3.0 20151204.

Regarding question 1: one of the answers seems to be that functions of type STT_GNU_IFUNC will cause the compiler/linker to switch the ELF file format of the corresponding object file from SYSV to GNU/Linux.
However, I still couldn't find out how to identify functions of type STT_GNU_IFUNC in an object file. Looking at GNU/Linux object files with objdump, readelf and nm still doesn't show a single function of type STT_GNU_IFUNC.
Some more information on STT_GNU_IFUNC can be found at https://www.airs.com/blog/archives/403 and in the following related question: How do I compile on linux to share with all distributions?
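For reference, the SYSV vs. GNU/Linux label that file prints comes from a single byte in the ELF header (e_ident[EI_OSABI]), and a symbol's type and binding are packed into its st_info byte. Below is a minimal C++ sketch of that decoding. The helper names (osabi_name, needs_gnu_osabi) are made up for illustration; the numeric constants are copied from the usual <elf.h> definitions. The STB_GNU_UNIQUE check is an assumption on my part: as far as I know, GNU binutils switches the OS/ABI byte for unique-binding symbols as well as for IFUNC symbols, which may be worth checking for C++ objects that contain no IFUNC symbol at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Values as defined in <elf.h>; repeated here so the sketch is self-contained.
constexpr std::size_t  kEiOsabi      = 7;   // EI_OSABI: index of the OS/ABI byte in e_ident
constexpr std::uint8_t kOsabiSysv    = 0;   // ELFOSABI_NONE / ELFOSABI_SYSV
constexpr std::uint8_t kOsabiGnu     = 3;   // ELFOSABI_GNU (older name: ELFOSABI_LINUX)
constexpr std::uint8_t kSttGnuIfunc  = 10;  // STT_GNU_IFUNC (symbol type)
constexpr std::uint8_t kStbGnuUnique = 10;  // STB_GNU_UNIQUE (symbol binding)

// Hypothetical helper: name the OS/ABI the way `file` labels it.
// e_ident must point at the first 16 bytes of an ELF file.
std::string osabi_name(const unsigned char* e_ident) {
    assert(e_ident != nullptr);
    switch (e_ident[kEiOsabi]) {
        case kOsabiSysv: return "SYSV";
        case kOsabiGnu:  return "GNU/Linux";
        default:         return "other";
    }
}

// st_info packs the binding in the high nibble and the type in the low nibble
// (this mirrors the ELF32_ST_BIND / ELF32_ST_TYPE macros).
std::uint8_t symbol_binding(std::uint8_t st_info) { return st_info >> 4; }
std::uint8_t symbol_type(std::uint8_t st_info)    { return st_info & 0x0f; }

// Hypothetical helper: true if this symbol uses one of the two GNU extensions
// that (by assumption, see above) make the toolchain mark the file as GNU/Linux.
bool needs_gnu_osabi(std::uint8_t st_info) {
    return symbol_type(st_info) == kSttGnuIfunc ||
           symbol_binding(st_info) == kStbGnuUnique;
}
```

If that assumption holds, such symbols should show up in readelf -s output as IFUNC in the Type column or UNIQUE in the Bind column.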

Related

can gdb allow to see ALL source code?

I was debugging an application written in C++ for Linux when I realized that the executables in the release version were compiled with the -g flag.
My concern is whether it is possible to read the source code of the executable through gdb using list or backtrace (exploiting some known core dump or another method)?
No, the source code is not included in the executable, even when compiled with -g. What is included are references to the source code, so there's a mapping between program addresses and file and line numbers.
The debug information also describes the functions in your program: for each function, the types it takes and returns and the local variables it contains, plus which addresses correspond to which functions. All your types and global variables are described in the debug information as well.
It is possible to split the debug information out of your program using objcopy. The following is taken from the gdb online manual (https://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html):
objcopy --only-keep-debug foo foo.debug
strip -g foo
objcopy --add-gnu-debuglink=foo.debug foo
This copies the debug information out of foo into foo.debug, then strips the debug information from foo and adds a link from foo to foo.debug.
Now you, as the developer, can debug release builds by loading the additional foo.debug file containing the debug data, while the customer is only given foo and so does not see the debug information.
A more detailed look at creating split debug information can be found here: How to generate gcc debug symbol outside the build target?
No, source code is not included in a binary built with -g and therefore it will not be possible to read it using only the binary.
Things that they may be able to read include:
Names of variables and functions
For each instruction: full path of the source file, line in the file and name of the function it is in

GCC: how to find why an object file is not discarded

I have an executable which links to a big .a archive that contains lots of functions. The executable only uses a small fraction of the functions in this archive, but for some reason it pulls everything from it and ends up being very big.
My suspicion is that some of the functionality that the executable is using somehow references something it shouldn't and that causes everything else to be pulled.
Is it possible to make gcc tell me what reference causes a specific symbol to be added in the executable? Why else can this happen?
I've tried using --gc-sections with no effect.
I've tried using --version-script to make all the symbols in the executable local with no effect
I'm not interested in -ffunction-sections and -fdata-sections since it is whole object files I want to discard, not functions.
Other answers mention -why_live, but that seems to be implemented only for Darwin and I am on Linux x86_64.
Use -Wl,-M to pass -M to the linker, causing it to print a link map. Among other things, the map shows the reason (or at least the first-found reason) for every object file that gets linked from an archive.
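To see why one reference can drag in a chain of archive members, here is a toy C++ model of how a linker pulls members out of a .a file. It is only a sketch under simplifying assumptions (a single archive, no weak symbols, no section garbage collection), and every name in it (ObjectFile, pull_from_archive) is invented for illustration; it is not how ld is actually implemented. It records the same kind of "member, referencing file, symbol" triple that the map reports.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Simplified view of an object file: just its name and symbol tables.
struct ObjectFile {
    std::string name;
    std::set<std::string> defined;    // symbols this object provides
    std::set<std::string> undefined;  // symbols this object needs from elsewhere
};

// Returns, for every archive member that gets linked, the first-found reason
// in the form "referencing_file (symbol)".
std::map<std::string, std::string> pull_from_archive(
        const ObjectFile& root, const std::vector<ObjectFile>& archive) {
    assert(!root.name.empty());  // the report needs a name for the root object

    std::map<std::string, std::string> reasons;   // member -> reason
    std::set<std::string> defined = root.defined;
    std::map<std::string, std::string> pending;   // undefined symbol -> first referrer
    for (const std::string& sym : root.undefined) {
        if (defined.count(sym) == 0) pending.emplace(sym, root.name);
    }

    // Rescan the archive until a full pass pulls in nothing new.
    bool progress = true;
    while (progress) {
        progress = false;
        for (const ObjectFile& member : archive) {
            if (reasons.count(member.name) != 0) continue;  // already linked
            for (const std::string& sym : member.defined) {
                auto hit = pending.find(sym);
                if (hit == pending.end()) continue;
                reasons[member.name] = hit->second + " (" + sym + ")";
                // The whole member is linked, so ALL of its definitions arrive
                // and ALL of its undefined symbols become new requirements.
                for (const std::string& d : member.defined) {
                    defined.insert(d);
                    pending.erase(d);
                }
                for (const std::string& u : member.undefined) {
                    if (defined.count(u) == 0) pending.emplace(u, member.name);
                }
                progress = true;
                break;
            }
        }
    }
    return reasons;
}
```

In this model a member that is never referenced stays out of the result, so if everything ends up in your executable, the map should let you follow the chain back to the first member that references "too much".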

C++ compiling and linking

I found one question about compiling and linking in C++ and I don't know which answer is correct. It was discussed with my friends and opinions are divided. Here is a question:
In order to run program written in C++ language its source code is:
(A) compiled to machine code,
(B) compiled and linked to machine code
In my opinion the correct answer is A but I don't have any source to prove it.
Google, first hit.
Linkage is needed as well to create a standalone executable.
You need to link the code you have produced to make it into an executable file. For simple programs, the compiler does this for you, by calling the linker at the end of the compilation process.
The compiler proper simply translates C code either to assembler (classic C compiler), which is then assembled with an assembler, or directly to machine code (many modern compilers). The machine code is usually produced as "object files", which are not "executable" because they refer to external units, such as when you call printf(). It is possible to write C code that is completely standalone, but you still typically need to combine more than one object file, and the result certainly needs to be "formatted" the right way to make an executable file, which is a different file format than an object file [although typically fairly SIMILAR].
Compilation does nothing except create object files, which means converting C/C++ source code to machine code.
Linking is the creation of an executable file from multiple object files. So to run an application/executable you also have to link it.
During compilation, the compiler doesn't complain about functions that are declared but not defined, because it assumes they might be defined in another object (source code file). The linker verifies that every function that is used actually exists, so if a definition is missing, you'll get an error in the linking process.
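A small C++ sketch of that split between compiler and linker. Note that in C++ the compiler does reject calls to functions that were never declared; what it lets through is a function that is declared but not (yet) defined. The function names here are made up, and both halves sit in one file only so the sketch builds as-is; in a real project the definition would typically live in a separate .cpp file.

```cpp
#include <cassert>

// Declaration only (what a header would normally provide). This is all the
// compiler needs in order to type-check calls and emit an object file that
// contains an undefined reference to checked_divide.
int checked_divide(int numerator, int denominator);

// Compiles fine even if checked_divide is defined nowhere in this file.
int average_of_three(int a, int b, int c) {
    return checked_divide(a + b + c, 3);
}

// Definition. Move this into another source file and the code above still
// compiles on its own; forget to link that other object file and the LINKER
// (not the compiler) reports an undefined reference to checked_divide.
int checked_divide(int numerator, int denominator) {
    assert(denominator != 0);
    return numerator / denominator;
}
```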
Compiling: Takes input C/C++-code and produces machinecode (object file)
gcc -c MyProgram.c
Note that the object file does not contain all external references!
Linking: Combines object file with external references into an executable file
gcc MyProgram.o -o MyProgram
Note that there are no unresolved references left!
Here libc.a is the standard C library, which gcc links into your programs automatically.
I've just noticed that your question was about C++. The same concept applies to C++ too, so if you understand this, you'll understand how it works in C++.
Strictly speaking, answer A.
But for you to see the whole picture, let's say you have defined some function. The compiler writes the machine code of that function at some address, and puts that address and the name of the function in the object (".o") file, where the linker can find it. The linker then takes this machine code and resolves the symbols, as you might have seen in some earlier error message.

Templates - huge object file causes linker crash

I have a source file which extensively makes use of templates.
I also have in that file explicit instantiations of different templates ... a lot of them.
This file is compiled as part of a static library. I compile this library on multiple platforms and for multiple architectures: Win x86, Linux x86 and Linux ARM. For the Linux builds I use different compilers, so the resulting files (I'm talking here in the context of the ELF file itself) are different: for GCC the resulting object file is 8.4MB in size and has a bit more than 40000 ELF sections; for the ARM compiler (armcc) the resulting file is 12.7MB in size and has more than 90000 ELF sections(!); in both cases I have debug information.
What happens is that at link time the ARM linker chokes and dies trying to link that huge object file in the static library. After some investigation it seems that it cannot handle object files with more than 65536 ELF sections (I still have to get a confirmation from the compiler vendor, though ... or I'm doing something completely and utterly wrong). The solution I found is splitting the file into multiple smaller files (its structure and contents allowed for that).
The question(s): is there any alternative solution? Would it be possible for the compiler to generate extra code in the object file (in the context of the templates) before the linking phase?
With the ARM RVCT compiler (armcc), try adding --remove_unneeded_entities to the command line. This might or might not have much effect depending on which version of the compiler you are using, but it's worth a try.
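For what it's worth, the split described in the question can be sketched in standard C++ with explicit instantiation declarations (extern template) in the header and one explicit instantiation definition per source file. This is only an illustration: FixedStack and the file names in the comments are hypothetical, everything is shown in one file so it builds as-is, and I have not checked how armcc handles it.

```cpp
#include <cassert>
#include <cstddef>

// --- would live in a shared header, e.g. fixed_stack.h (hypothetical) ---
template <typename T, std::size_t N>
class FixedStack {
public:
    void push(const T& value);
    std::size_t size() const;

private:
    T items_[N] = {};
    std::size_t count_ = 0;
};

template <typename T, std::size_t N>
void FixedStack<T, N>::push(const T& value) {
    assert(count_ < N);  // no room left otherwise
    items_[count_++] = value;
}

template <typename T, std::size_t N>
std::size_t FixedStack<T, N>::size() const {
    return count_;
}

// Explicit instantiation declarations: users of the header rely on the code
// being emitted in some other translation unit instead of in their own.
extern template class FixedStack<int, 4>;
extern template class FixedStack<double, 4>;

// --- would live in fixed_stack_int.cpp (hypothetical) ---
template class FixedStack<int, 4>;     // definition: code is emitted here

// --- would live in fixed_stack_double.cpp (hypothetical) ---
template class FixedStack<double, 4>;  // definition: code is emitted here
```

Each source file then carries only its own share of the instantiations, so the number of sections per object file can be kept under whatever limit the linker has.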

Includes with the Linux GCC Linker

I don't understand how GCC works under Linux. In a source file, when I do a:
#include <math.h>
Does the compiler extract the appropriate binary code and insert it into the compiled executable OR does the compiler insert a reference to an external binary file (a-la Windows DLL?)
I guess a generic version of this question is: Is there an equivalent concept to Windows DLLs under *nix?
Well. When you include math.h the compiler will read the file that contains declarations of the functions and macros that can be used. If you call a function declared in that file (header), then the compiler inserts a call instruction into that place in your object file that will be made from the file you compile (let's call it test.c and the object file created test.o). It also adds an entry into the relocation table of that object-file:
Relocation section '.rel.text' at offset 0x308 contains 1 entries:
Offset Info Type Sym.Value Sym. Name
0000001c 00000902 R_386_PC32 00000000 bar
This would be a relocation entry for a function bar. An entry in the symbol table will be made noting the function is yet undefined:
9: 00000000 0 NOTYPE GLOBAL DEFAULT UND bar
When you link the test.o object file into a program, you need to link against the math library, libm.so. The .so extension is similar to the .dll extension on Windows: it means the file is a shared object. The linker fixes up all the places listed in the relocation table of test.o, replacing them with the proper address of the bar function. When that happens depends on which version of the library you use: with the static one (called libm.a), the fix-up is done at link time; with the shared one, it is done later, at runtime, when you actually start your program. In the shared case, the linker also adds an entry to the table of shared libraries needed by the program (which can be shown with readelf -d ./test):
Dynamic section at offset 0x498 contains 22 entries:
Tag Type Name/Value
0x00000001 (NEEDED) Shared library: [libm.so.6]
0x00000001 (NEEDED) Shared library: [libc.so.6]
... ... ...
Now, if you start your program, the dynamic linker will look up that library and link it to your executable image. On Linux, the program doing this is called ld.so. Static libraries don't have an entry in the dynamic section, as they are just linked with the other object files and then forgotten about; they are part of the executable from then on.
In reality it is much more complex, and I don't understand all of it in detail either. That's the rough plan, though.
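To make the header vs. library distinction concrete, here is a tiny C++ sketch (the function name is made up). The header only supplies the declaration of cos; the machine code for it comes from the math library when the program is linked or started.

```cpp
#include <cassert>
#include <cmath>  // declarations only: no code for cos() is copied into this file

// Compiling this to an object file normally leaves an undefined reference
// to cos in the symbol table plus a relocation entry at the call site;
// the address is filled in by the static linker (libm.a) or by the dynamic
// linker at startup (libm.so).
double horizontal_component(double length, double angle_radians) {
    assert(length >= 0.0);  // a length is expected to be non-negative
    return length * std::cos(angle_radians);
}
```

With gcc and a C source file you would pass -lm yourself; g++ normally adds the math library for you. Running readelf -r and readelf -s on the object file should show entries similar to the ones quoted above, with cos in place of bar.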
There are several aspects involved here.
First, header files. The compiler simply includes the content of the file at the location where it was included, nothing more. As far as I know, GCC doesn't even treat standard header files differently (but I might be wrong there).
However, header files might actually not contain the implementation, only its declaration. If the implementation is located somewhere else, you've got to tell the compiler/linker that. By default, you do this by simply passing the appropriate library files to the compiler, or by passing a library name. For example, the following two are equivalent (provided that libcurl.a resides in a directory where it can be found by the linker):
gcc codefile.c -lcurl
gcc codefile.c /path/to/libcurl.a
This tells the link editor (“linker”) to link your code file against the implementation of the static library libcurl.a (the compiler gcc actually ignores these arguments because it doesn't know what to do with them, and simply passes them on to the linker). However, this is called static linking. There's also dynamic linking, which takes place at startup of your program, and which happens with .dlls under Windows (whereas static libraries correspond to .lib files on Windows). Dynamic library files under Linux usually have the file extension .so.
The best way to learn more about these files is to familiarize yourself with the GCC linker, ld, as well as the excellent toolset binutils, with which you can edit/view library files effortlessly (any binary code files, really).
Is there an equivalent concept to Windows DLLs under *nix?
Yes, they are called "Shared Objects" or .so files. They are dynamically linked into your binary at runtime. On Linux you can use the "ldd" command on your executable to see which shared objects your binary is linked to. You can use ListDLLs from Sysinternals to accomplish the same thing on Windows.
The compiler is allowed to do whatever it pleases, as long as, in effect, it acts as if you'd included the file. (All the compilers I know of, including GCC, simply include a file called math.h.)
And no, it doesn't usually contain the function definitions itself. Those are in libm.so, a "shared object", similar to Windows DLLs. It should be on every system, as it is a companion of libc.so, the C runtime.
Edit: And that's why you have to pass -lm to the linker if you use math functions - it instructs it to link against libm.so.
There is. The include does a textual include of the header file (which is standard C/C++ behavior). What you're looking for is the linker. The -l argument to gcc/g++ tells the linker which libraries to add in. For math (libm.so), you'd use -lm. The common pattern is:
source file: #include <foo.h>
gcc/g++ command line: -lfoo
shared library: libfoo.so
math.h is a slight variation on this theme.