I have an ELF file. While analysing the map file and the ELF using elfparser, I saw a section called .debug_info, which takes up the most space.
I am compiling for an Xtensa DSP using xt-xc++. I haven't used the -g option, and I build with the -O2 optimization level.
Is it possible to remove this section for release builds?
section called .debug_info, which is taking largest memory.
Note that this section does not have SHF_ALLOC flag, and therefore does not take any RAM at runtime (it only takes space in the filesystem). Of course if you use ramdisk, then that section still does cost you RAM in the end.
is it possible to remove this for release builds?
Yes: none of the .debug* sections are necessary at runtime, and all of them can be safely stripped.
-g0 and -s option didn't work.
It's likely that you are getting .debug_* sections from libraries that you are linking in, and not from your own code. The -g was present when the libraries were compiled, so building with -g0 doesn't have any effect.
It's surprising that -s didn't work, but maybe your compiler interprets this flag differently.
In any case, you should use strip --strip-debug to get rid of .debug_* sections (note: this does not remove the symbol table).
The best practice is actually to compile all your code with full debug info (-g), save the full debug binary for post-mortem analysis, use strip --strip-debug to make a release binary, and use that binary for actual distribution.
If the release binary crashes and leaves a core dump, having the saved, exactly matching full-debug binary greatly improves the post-mortem analysis you can do.
I was quite surprised to see that even a simple program like:
print_string "Hello world !\n";
when statically compiled to native code through ocamlopt with some quite aggressive options (using musl), would still be around ~190KB on my system.
$ ocamlopt.opt -compact -verbose -o helloworld \
-ccopt -static \
-ccopt -s \
-ccopt -ffunction-sections \
-ccopt -fdata-sections \
-ccopt -Wl \
-ccopt -gc-sections \
-ccopt -fno-stack-protector \
helloworld.ml && { ./helloworld ; du -h helloworld; }
+ as -o 'helloworld.o' '/tmp/camlasm759655.s'
+ as -o '/tmp/camlstartupfc4271.o' '/tmp/camlstartup5a7610.s'
+ musl-gcc -Os -o 'helloworld' '-L/home/vaab/.opam/4.02.3+musl+static/lib/ocaml' -static -s -ffunction-sections -fdata-sections -Wl -gc-sections -fno-stack-protector '/tmp/camlstartupfc4271.o' '/home/vaab/.opam/4.02.3+musl+static/lib/ocaml/std_exit.o' 'helloworld.o' '/home/vaab/.opam/4.02.3+musl+static/lib/ocaml/stdlib.a' '/home/vaab/.opam/4.02.3+musl+static/lib/ocaml/libasmrun.a' -static -lm
Hello world !
196K helloworld
How to get the smallest binary from ocamlopt ?
A size of 190KB is way too much for a simple program like that in today's constraints (iot, android, alpine VM...), and compares badly with simple C program (around ~6KB, or directly coding ASM and tweaking things to get a working binary that could be around 150B). I naïvely thought that I could simply ditch C to write simple static program that would do trivial things and after compilation I would get some simple assembly code that wouldn't be so far in size with the equivalent C program. Is that possible ?
What I think I understand:
When removing gcc's -s to have some hints about what is left in the binary, I can notice a lot of ocaml symbols, and I also kinda read that some environment variable of ocamlrun are meant to be interpreted even in this form. It is as if what ocamlopt calls "native compilation" is about packing ocamlrun and the not-native bytecode of your program in one file and make it executable. Not exactly what I would have expected. I obviously missed some important point. But if that is the case, I'll be interested why it isn't as I expected.
Other languages that compile to native code have the same issue, leaving naïve users (like me) with roughly the same questions:
Go: Reason for huge size of compiled executable of Go
Rust: Why are Rust executables so huge?
I've also tested Haskell. Without tweaks, the compilers for all of these languages produce binaries above 700KB for the "hello world" program (it was the same for OCaml before the tweaks).
Your question is very broad and I'm not sure that it fits the format of Stackoverflow. It deserves a thorough discussion.
A size of 190KB is way too much for a simple program like that in today's constraints (iot, android, alpine VM...), and compares badly with simple C program (around ~6KB, or directly coding ASM and tweaking things to get a working binary that could be around 150B)
First of all, it is not a fair comparison. Nowadays, a compiled C binary is an artifact that is far from being a standalone binary. It should be seen more like a plugin in a framework. Therefore, if you would like to count how many bytes a given binary actually uses, we shall count the size of the loader, shell, the libc library, and the whole linux or windows kernel - which in total form the runtime of an application.
OCaml, unlike Java or Common Lisp, is very friendly to the common C runtime and tries to reuse most of its facilities. But OCaml still comes with its own runtime, in which the biggest (and most important part) is the garbage collector. The runtime is not extremely big (about 30 KLOC) but still contributes to the weight. And since OCaml uses static linking every OCaml program will have a copy of it.
Therefore, C binaries have a significant advantage as they are usually run in systems where the C runtime is already available (therefore it is usually excluded from the equation). There are, however, systems where there is no C runtime at all, and only OCaml runtime is present, see Mirage for example. In such systems, OCaml binaries are much more favorable. Another example is the OCaPic project, in which (after tweaking the compiler and runtime) they managed to fit OCaml runtime and programs into 64Kb Flash (read the paper it is very insightful about the binary sizes).
How to get the smallest binary from ocamlopt?
When it is really necessary to minimize the size, use Mirage Unikernels or implement your own runtime. For general cases, use strip and upx. (For example, with upx --best I was able to reduce the binary size of your example to 50K, without any more tricks.) If performance doesn't matter that much, then you can use bytecode, which is usually smaller than the machine code. You then pay once (about 200k for the runtime), plus a few bytes for each program (e.g., 200 bytes for your helloworld).
Also, do not create many small binaries, but create one binary. In your particular example, the size of the helloworld compilation unit is 200 bytes in bytecode and 700 bytes in machine code. The remaining 50k is the startup harness, which needs to be included only once. Moreover, since OCaml supports dynamic linking at runtime, you can easily create a loader that will load modules when needed. And in this scenario, the binaries will become very small (hundreds of bytes).
It is as if what ocamlopt calls "native compilation" is about packing ocamlrun and the not-native bytecode of your program in one file and make it executable. Not exactly what I would have expected. I obviously missed some important point. But if that is the case, I'll be interested why it isn't as I expected.
No-no, it is completely wrong. Native compilation is when a program is compiled to the machine code, whether it is x86, ARM, or whatever. The runtime is written in C, compiled to machine code, and is also linked. The OCaml Standard Library is written mostly in OCaml, also compiled to machine code, and is also linked into the binary (only those modules that are used, OCaml static linking is very efficient, provided that the program is split into modules (compilation units) fairly well).
Concerning the OCAMLRUNPARAM environment variable, it is just an environment variable that parameterizes the behavior of the runtime, mostly the parameters of the garbage collector.
In what ascending order do the dub build types below rank for producing an optimized binary? (e.g. ... debug < plain < release ...)
$ dub build -h
...
-b --build=VALUE Specifies the type of build to perform. Note that
setting the DFLAGS environment variable will override
the build type with custom flags.
Possible names:
debug (default), plain, release, release-debug,
release-nobounds, unittest, profile, profile-gc,
docs, ddox, cov, unittest-cov and custom types
...
The dub build -b release-nobounds seems derived from dmd -O -release -boundscheck=off, so what's the equivalent for dub to build fastest executables?
Those options aren't really about optimizations (and I think it is weird that dub combines them, on dmd itself, those are eight independent switches....), and a lot of people are confused about what they mean, so let me list, using the dmd switch names:
-debug simply compiles in debug statements in the code, e.g. debug writeln("foo"); will only write foo if compiled with -debug. It doesn't do anything else! Importantly, it doesn't include info for debuggers, that is done with -g (though dub might combine these two options).
-g adds symbolic debugging info, for programs like gdb to know function names. This same info is also used on exception stack trace printing, so enabling it will cause the stack traces to show function names too.
-release disables assert statements, in, out, and invariant contracts, and automatic array bounds checks in @system functions (which are the default btw). That's it - it does NOT enable optimizations nor imply the opposite of -debug, it just skips those assert-related items. (note that assert(0); is a special case and is never disabled, but it should never happen anyway - it kills the program.)
-unittest will compile the unittest blocks, and run them right before running main (then main will still run afterward, like normal).
-profile adds timing info before and after functions, and writes that info to a log file when the program is complete. Note that it only works with single-thread programs and its logging can significantly slow down the program itself. You'd use it to figure out which functions are called the most and the slowest to know where to focus your optimization efforts.
-cov adds info to the test log that tells you which lines of your program were actually run, and which weren't.
-profile=gc does GC-specific profiling, and writes out a log with the timing info.
-D generates HTML files from the ddoc info in your code while compiling. dub calls this docs. ddox is similar, but uses a dub-custom doc generator instead of the default dmd html generator. This is ddoc's output: http://dlang.org/phobos/std_algorithm.html and this is ddox's: http://dlang.org/library/std/algorithm.html
-boundscheck=xxxx determines where array bounds checking is compiled - safe functions, all functions, or nowhere. (In old versions, this was tied to the -release switch, but can now be done separately). The default for -release is @safe functions, everywhere else, the default is all functions.
Notice that NONE of those were -O or -inline! Those are the dmd optimization switches: -O means to optimize the code and -inline means to inline functions (it does them separately because sometimes inlining messes up debuggers. The other compilers, gdc and ldc, will inline automatically with their -O options and generally do a better job of it than dmd anyway.)
Personally, I strongly recommend against using -boundscheck and -release - those just hide bugs in most cases without making that big of a difference on final speed. If you find bounds checks in some tight loop are slowing you down, then rather than disabling them in your entire program with -boundscheck, use .ptr on the specific accesses that are slow (you can use -profile to figure out which function to optimize!) Learn more on the tip of the week here: http://arsdnet.net/this-week-in-d/dec-06.html
-release only makes a significant difference if you are doing tons of expensive asserts... and again, I'd prefer to version out the expensive ones individually instead of disabling everything, including the really quick checks that catch legitimately common bugs.
So, I'd recommend going for just -O and maybe -inline for an optimized dmd build. For many (but not all) programs btw, gdc -O and ldc -O do a better job than any dmd switch combination - if you are CPU limited, you might want to try them too.
Back to dub. Check out the package format documentation: http://code.dlang.org/package-format?lang=json
According to that page, the build type release (dub build -b release) passes -O -release -inline to dmd. Type release-nobounds additionally disables bounds checking. That's what the dmd docs call the fastest executables, and what I call a buggy mistake.
The best dub option from what I can see (I don't actually use it myself) would be to add buildOptions to optimize in the dub config file (dub.json or dub.sdl)
That gives you -O, then you use stuff like the .ptr technique or version on expensive assert to selectively speed up your hot spots without compromising the anti-bug features in the rest of the program.
Read more dub documentation here:
http://code.dlang.org/package-format?lang=json#build-options
Debugging files of C++ applications/linux have always been a mystery to me and some basic understanding is lacking.
(1) Do we need to necessarily compile applications with -g flag without which core files are unable to give any useful information whatsoever? But I see that even when we don't compile with -g flag, core files are generated -- so they must be serving some purpose apart from occupying space on disk.
Wikipedia says : "In computing, a core dump, memory dump, or storage dump consists of the recorded state of the working memory of a computer program at a specific time, generally when the program has terminated abnormally (crashed)".
This should mean that irrespective of whether we compiled with the -g flag, we still have the state, and if we have a stack trace, we should still be able to know which function caused the error.
The -g option has nothing to do with the core files, but with putting debug information in the program. That is, the generated executable file will contain all symbols (e.g. function and variable names) as well as line number information (so you can find out which line a crash occurs in).
The actual core dump only contains a memory dump. Yes, together with the program you can get a stack trace, but without debug information you cannot see line numbers, and if the symbol table has been stripped as well you see only addresses instead of function names.
so they must be serving some purpose apart from occupying space on disk
You can limit the size of core files with the ulimit -c $limit command; with a limit of 0, no core file is written, so core files won't occupy your disk space.
And, as Joachim already said, the -g option only adds debug information to your program; it has no effect on whether a core file is written.
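A small sketch of the ulimit part, assuming a POSIX-style shell such as bash. It sets only the soft limit, and does so in a subshell, so the calling shell keeps its own setting:

```shell
# Set the soft core-size limit to 0 in a subshell, then read it back.
# With a limit of 0 the kernel does not write a core file for processes
# started from that shell.
core_limit=$(ulimit -S -c 0; ulimit -S -c)
echo "core file size limit: $core_limit"
```

To allow core files again you would raise the limit instead (for example ulimit -c unlimited), which only works up to the hard limit configured on your system.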
I'm working on a really big project which I would like to debug with gdb. Unfortunately, compiling with the -g flag takes two and a half days and outputs libraries that are larger than 60 GB (the project takes ~1 GB without -g).
Is there a simpler way to obtain a symbol table (i.e. to be able to get a backtrace), and if so, how?
I've seen that gdb offers three levels of debugging (-g level as described here), would it help ? Would string ?
Thanks in advance.
For a backtrace with just function names, you don't need -g at all.
For a backtrace with file and line info, using recent GCC versions, try -gmlt option (minimal line table). Note that no local variable info will be available in GDB.
If you want local variables, you'll probably want to use -gdwarf-4.
The documentation you pointed at is for gcc-2.95. That is an ancient version. If you are still using it, your first task should be to switch to (current) gcc-4.6.2
If you know which source files you want to debug, compile only those with the -g option. Make sure you link with -g too. You then have a partially debuggable image.
How do I analyze a core dump (using gdb) from a program that was not compiled with the -g GCC option?
Generate a map file. The map file will tell you the address that each function starts at (as an offset from the start of the exe, so you will need to know the base address it's loaded at). You then take the instruction pointer and look up where it falls in the map file. This gives you a good idea of the location in a given function.
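The lookup itself is just "find the last function whose start address is not above the instruction pointer". A minimal bash sketch, using a made-up symbol list (the addresses and names below are hypothetical; in practice you would extract them from your map file or from nm -n):

```shell
# Hypothetical, address-sorted symbol list: "<hex start address> <name>".
symbols='0000000000401000 _start
0000000000401136 parse_args
00000000004011f0 process
0000000000401300 main'

# Print "<function>+<decimal offset>" for the given hex address.
# Assumes any load base has already been subtracted from the address.
lookup_symbol() {
    local target=$((16#${1#0x})) best="" addr name
    while read -r addr name; do
        # Keep the most recent symbol that starts at or below the target.
        if (( 16#$addr <= target )); then
            best="$name+$(( target - 16#$addr ))"
        fi
    done <<< "$symbols"
    printf '%s\n' "$best"
}

lookup_symbol 0x401215   # prints process+37
```

The offset tells you how far into the function the crash happened, which you can then match against a disassembly of that function.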
Manually unwinding a stack is a bit of a black art, however, as you have no idea what optimisations the compiler has performed. When you know, roughly, where you are in the code, you can generally work out what ought to be on the stack and scan through memory to find the return pointer. It's quite involved, however. You effectively spend a lot of time reading memory data, looking for numbers that look like memory addresses, and then checking whether each one is plausible. It's perfectly doable and I, and I'm sure many others, have done it lots of times :)
With ELF binaries it is possible to separate the debug symbols into a separate file. Quoting from objcopy man pages:
Link the executable as normal (using the -g flag). Assuming that it is called foo then...
Run objcopy --only-keep-debug foo foo.dbg to create a file containing the debugging info.
Run objcopy --strip-debug foo to create a stripped executable.
Run objcopy --add-gnu-debuglink=foo.dbg foo to add a link to the debugging info into the stripped executable.
That should not be a problem: you can compile the source again with the -g option and pass gdb the core and the newly compiled debug binary. It should work without any problem.
BTW, you can generate a map file with gcc using the command below:
gcc -Wl,-Map=system.map file.c
The above line should generate the map file system.map. Once the map file is generated, you can map the address as mentioned above, but I am not sure how you would map addresses that fall inside a shared library; that is much harder.