Can different .dwo files be combined into a single one?

Background:
I need the debugging information for the code in our project.
The following two approaches are available:
Compile using -g, then use the GNU binutils strip and objcopy to split the debugging information out into a separate file (sketched below).
Compile using -gsplit-dwarf
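For reference, the first approach typically looks like this (the binary name app is illustrative):
objcopy --only-keep-debug app app.debug
strip --strip-debug app
objcopy --add-gnu-debuglink=app.debug app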
Question
The second approach creates a .dwo for each translation unit in the application.
Although this improves link time, the huge number of translation units means a huge number of .dwo files, which would be a management headache for us.
Is there a way to combine all the .dwo files into a single file per binary?
System Info
Compiler: GCC toolchain.
OS: CentOS/RH 7/8

The tool you're looking for is called dwp. It collects your .dwo files into a .dwp file ("DWARF package"). .dwp files can themselves be combined into larger .dwp files if needed.
It should come with non-ancient binutils packages.
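For example, assuming the binary is named app and the .dwo files are still at the paths recorded in it (all names here are illustrative):
dwp -e app                    # read the .dwo paths from app's debug info, write app.dwp
dwp -o all.dwp a.dwo b.dwo    # or name the inputs and the output explicitly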

Related

Making sense of .so file: trying to restore poorly versioned source files

If this question is too generic, please tell me so I can delete it.
I have software used in operation that is compiled with linking to a .so file. The file is generated by compiling a set of versioned .c and .cpp sources. The previous developer generated the .so file by compiling a local version of the source files that was modified in unknown ways, and the modified sources are god knows where, if anywhere on the system at all. Fortunately it was compiled with debugging symbols, so reading it with gdb is easier.
The software is in operation and I need to modify it. Recompiling any known version of it will obviously produce results that differ from the currently compiled version in unknown ways. I want to dig as deep as possible into the current .so file to learn what it is doing, so that I can recompile the sources producing as similar a result as I can. What I have done so far:
readelf --debug-dump=info path/to/file | grep "DW_AT_producer" to see compilation flags and reproduce them in new compilations.
(gdb) info functions to see what functions are defined and compare it with previous versions of code.
Going function by function through the functions listed by the previous command and running: list <function>
Does anyone have more tips on how to get as much info from the .so file as I can? Since I'm not an expert with gdb yet: am I missing something important?
Edit: by using strip on both files (one compiled from the original source and one compiled from the mysterious lost source) I managed to see that most of the differences between them were just debug symbols (which is weird, because it seems both were compiled with the -g option).
There is only one line of difference between them now.
I just found out that "list" just reads the source file from the binary, so list doesn't help me
You are confused: the source is never stored in the binary. GDB's list command is showing the source as it exists in some file on disk.
The info sources command will show where on disk GDB is reading the sources from.
If you are lucky, that's the sources that were used to build the .so binary, and your task is trivial -- compare them to VCS sources to find modifications.
If you are unlucky, the sources GDB reads have been overwritten, and your task is much harder -- effectively you'll need to reverse-engineer the .so binary.
The way I would approach the harder task: build the library from VCS sources, and then for each function compare disas fn between the two versions of .so, and study differences (if any).
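One way to script that comparison outside gdb, using objdump instead of gdb's disas (file names are illustrative):
objdump -d --no-show-raw-insn vcs_build.so > a.asm
objdump -d --no-show-raw-insn production.so > b.asm
diff -u a.asm b.asm
Addresses will still differ between the two dumps, so expect some noise in the diff.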
P.S. I hope you are also using the exact same version of the compiler that was used to compile the in-production .so, otherwise your task becomes much harder still.

Finding all libraries and header files forming a C++ executable

If I have a C++ source file, gcc can give all its dependencies, in a tree structure, using the -H option. But given only the C++ executable, is it possible to find all libraries and header files that went into its compilation and linking?
If you've compiled the executable with debugging symbols, then yes, you can use the symbols to get the files.
If you have .pdb files (Visual Studio creates them to store debugging information separately) you can use all kinds of programs to open them and see the source files and methods.
You can even open it with a text editor and you'll see, among the gibberish, a list of the functions and source files.
If you're using Linux (or GNU compilers in general), you can use gdb (again, only if you enabled debug symbols at compile time).
Run gdb on your executable, then run the command: info sources
That's an important reason why you should always remove the -g flag when going into production. You don't want clients to mess around with your sources, functions, and code.
You cannot do that reliably, because that executable might have been built on a machine on which the header files (or the C++ code, or the libraries) are private or even generated. Also, if a static library was linked in, you have no reliable way to find out.
In practice, however, on Linux, using nm or objdump or ldd on the executable will often (but not always) give you a good clue about the needed libraries.
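For instance (the binary name is illustrative):
ldd ./myprog                        # shared libraries the dynamic linker will load
objdump -p ./myprog | grep NEEDED   # DT_NEEDED entries recorded at link time
nm -D ./myprog                      # dynamic symbols; undefined ones hint at the libraries used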
Also, some executables dynamically load plugins, e.g. using dlopen, so your question might not even be meaningful (since such a plugin is known only at runtime).
Notice also that you might not even know whether an executable was obtained by compiling C++ code (you may be unable to tell whether it came from C, C++, D, or OCaml, ... source code, or a mixture of them).
On Linux, if you build an executable with static linking and stripping, people won't be able to easily guess the source programming language that you have used.
BTW, on Linux distributions, it is the role of the package management system to deal with such dependencies.
As answered by Yochai Timmer if the executable contains debug information (e.g. in DWARF format) you should be able to get a lot more information.

make SCons compile everything in one gcc line?

I have a rather complex SCons script that compiles a big C++ project.
This gcc manual page says:
The compiler performs optimization based on the knowledge it has of the program. Compiling multiple files at once to a single output file mode allows the compiler to use information gained from all of the files when compiling each of them.
So it seems better to give all my files to a single g++ invocation and let it drive the compilation however it pleases.
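That is, replacing the one-command-per-file scheme with a single command along these lines (file names are illustrative):
g++ -O2 file1.cpp file2.cpp file3.cpp -o app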
But SCons does not do this. It calls g++ separately for every single C++ file in the project and then links them using ld.
Is there a way to make SCons do this?
The main reason to have a build system that can express dependencies is to support some kind of conditional/incremental build. Otherwise you might as well just use a script with the one command you need.
That being said, the gain from letting gcc/g++ optimize across files as the manual describes can be substantial, in particular if you use C++ templates heavily. Good for run-time performance, bad for recompile performance.
I suggest you try and make your own builder doing what you need. Here is another question with an inspirational answer: SCons custom builder - build with multiple files and output one file
Currently the answer is no.
Logic similar to this was developed for MSVC only.
You can see this in the man page (http://scons.org/doc/production/HTML/scons-man.html) as follows:
MSVC_BATCH: When set to any true value, specifies that SCons should batch compilation of object files when calling the Microsoft Visual C/C++ compiler. All compilations of source files from the same source directory that generate target files in the same output directory and were configured in SCons using the same construction environment will be built in a single call to the compiler. Only source files that have changed since their object files were built will be passed to each compiler invocation (via the $CHANGED_SOURCES construction variable). Any compilations where the object (target) file base name (minus the .obj) does not match the source file base name will be compiled separately.
As always, patches are welcome to add this in a more general fashion.
In general this should be left up to the program developer. Trying to compile everything together as an amalgamation may introduce unintended behaviour into the program, if it even compiles in the first place. Your best bet, if you want this kind of optimisation without editing the source yourself, is to use a compiler with inter-procedural optimisation, like icc -ipo.
An example where an amalgamation of two .c files would not compile: both define a static symbol with the same name but different functionality.
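A minimal illustration (file contents are made up). In a.c:
static int helper(void) { return 1; }
int a_value(void) { return helper(); }
and in b.c:
static int helper(void) { return 2; }
int b_value(void) { return helper(); }
Each file compiles fine on its own, but concatenating them into a single translation unit redefines helper and fails to compile.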

Smallest possible package for distributing MinGW with another program

I'm working on an open source programming language, and I want my users to be able to distribute standalone .exe files from their programs. My strategy is to have 3 components:
A DLL that contains the interpreter
A small .o object file (generated once from C) that invokes the DLL to start the execution
A generated .o file that contains a binary representation of the user's program, to be embedded as a binary blob with #2.
When the user requests an .exe, #2 and #3 are linked together, and the resulting executable can be distributed with #1. So far so good.
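For reference, that linking step might look something like this (file and library names are made up):
g++ -o userprog.exe startstub.o userprog.o -L. -linterp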
The problem I have now is that this means MinGW has to be bundled with the language in order to do the linking step. I don't want my users to have to manually download MinGW (my primary audience is children), and the standard MinGW distribution is more than 100 megabytes, so bundling all of that would spoil the minimalism of my language's download (currently ~5 MB).
My question is: Is there a definitive list of files to be yanked from \MinGW and bundled with the language by themselves, that would make g++.exe work to link two .o files and the needed libraries together?
Alternative solutions are also welcome (for example a freely redistributable C++ compiler that's more easily bundled with other apps).
You could try using Dependency Walker to rip g++ out of MinGW. It will generate your list of dependencies. Alternatively, you could use Cygwin, which reduces the footprint to around 15 MB.

Why does `include <iostream>` end up including so *many* files?

Follow-up to this question:
When I do #include <iostream>, it ends up including many files from /usr/include. A grep for "/usr/include" over the output of g++ -E prog.cpp counted about 1260 entries ;).
Is there a way to control which files get included?
Platform: Linux
G++ version: 4.2.4
No, <iostream> includes them because it depends on them directly, or its dependencies depend on them.
Ain't nothing you can do about it.
You can (depending on your compiler) limit the effect this has on compilation times by using Precompiled Headers
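With GCC, for example, a precompiled header can be generated like this (the header name is illustrative):
g++ -x c++-header common.hpp -o common.hpp.gch
g++ -c prog.cpp
GCC picks up common.hpp.gch automatically when prog.cpp includes common.hpp and the .gch file is found alongside the header.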
My suggestion is not to worry about how many files the compiler is including. Focus more on correctness, robustness, and schedule. If build time is a concern, get a faster machine, build overnight, go for a walk, or divide the code into smaller translation units. Translation units should be small enough to contain code that doesn't change often. Changes are EVIL.
The foundation of the build system is to compile only the few files that have changed. If your development process is functioning correctly, the build time will reside more and more in the linking phase as the project grows.
If the compile time is still lengthy, see whether your compiler supports precompiled headers. Generally, this is a method for the compiler to store declarations and definitions in a more efficient form.
You #include <iostream> when you need to use streams. That should define the things you need. How much work the implementation does to achieve this is a quality-of-implementation issue. You could remove files from /usr/include, but that would break things.
I really doubt it's including 1260 files. Almost certainly most of those are duplicate entries that don't load anything but aren't pruned from the -E output.