I was quite surprised to see that even a simple program like:
print_string "Hello world !\n";
when statically compiled to native code through ocamlopt with some fairly aggressive options (using musl), still comes out at around 190 KB on my system.
$ ocamlopt.opt -compact -verbose -o helloworld \
-ccopt -static \
-ccopt -s \
-ccopt -ffunction-sections \
-ccopt -fdata-sections \
-ccopt -Wl \
-ccopt -gc-sections \
-ccopt -fno-stack-protector \
helloworld.ml && { ./helloworld ; du -h helloworld; }
+ as -o 'helloworld.o' '/tmp/camlasm759655.s'
+ as -o '/tmp/camlstartupfc4271.o' '/tmp/camlstartup5a7610.s'
+ musl-gcc -Os -o 'helloworld' '-L/home/vaab/.opam/4.02.3+musl+static/lib/ocaml' -static -s -ffunction-sections -fdata-sections -Wl -gc-sections -fno-stack-protector '/tmp/camlstartupfc4271.o' '/home/vaab/.opam/4.02.3+musl+static/lib/ocaml/std_exit.o' 'helloworld.o' '/home/vaab/.opam/4.02.3+musl+static/lib/ocaml/stdlib.a' '/home/vaab/.opam/4.02.3+musl+static/lib/ocaml/libasmrun.a' -static -lm
Hello world !
196K helloworld
How do I get the smallest binary from ocamlopt?
A size of 190 KB is way too much for such a simple program under today's constraints (IoT, Android, Alpine VMs...), and compares badly with a simple C program (around ~6 KB, or, by coding directly in assembly and tweaking things, a working binary of around 150 bytes). I naïvely thought that I could simply ditch C, write simple static programs that do trivial things, and get, after compilation, assembly code that wouldn't be far in size from the equivalent C program. Is that possible?
What I think I understand:
When removing gcc's -s to get some hints about what is left in the binary, I notice a lot of OCaml symbols, and I have also read that some environment variables of ocamlrun are still interpreted in this form. It is as if what ocamlopt calls "native compilation" consisted of packing ocamlrun and the non-native bytecode of your program into one file and making it executable. Not exactly what I would have expected. I have obviously missed some important point. But if that is the case, I would be interested to know why it isn't as I expected.
Other languages that compile to native code have the same issue, leaving naïve users (like myself) with roughly the same questions:
Go: Reason for huge size of compiled executable of Go
Rust: Why are Rust executables so huge?
I have also tested Haskell, and without tweaks, all of these compilers produce binaries above 700 KB for the "hello world" program (it was the same for OCaml before the tweaks).
Your question is very broad and I'm not sure that it fits the format of Stack Overflow. It deserves a thorough discussion.
A size of 190 KB is way too much for such a simple program under today's constraints (IoT, Android, Alpine VMs...), and compares badly with a simple C program (around ~6 KB, or, by coding directly in assembly and tweaking things, a working binary of around 150 bytes)
First of all, it is not a fair comparison. Nowadays, a compiled C binary is an artifact that is far from standalone. It should be seen more like a plugin in a framework. Therefore, if you want to count how many bytes a given binary actually uses, you should count the size of the loader, the shell, the libc, and the whole Linux or Windows kernel, which together form the runtime of an application.
OCaml, unlike Java or Common Lisp, is very friendly to the common C runtime and tries to reuse most of its facilities. But OCaml still comes with its own runtime, whose biggest (and most important) part is the garbage collector. The runtime is not extremely big (about 30 KLOC) but still contributes to the weight. And since OCaml uses static linking, every OCaml program carries a copy of it.
Therefore, C binaries have a significant advantage, as they usually run on systems where the C runtime is already available (so it is usually excluded from the equation). There are, however, systems with no C runtime at all, where only the OCaml runtime is present; see Mirage, for example. On such systems, OCaml binaries are much more favorable. Another example is the OCaPIC project, in which (after tweaking the compiler and runtime) they managed to fit the OCaml runtime and programs into 64 KB of flash (read the paper; it is very insightful about binary sizes).
How to get the smallest binary from ocamlopt?
When it is really necessary to minimize the size, use Mirage unikernels or implement your own runtime. For general cases, use strip and upx. (For example, with upx --best I was able to reduce the binary size of your example to 50 KB, without any further tricks.) If performance doesn't matter that much, you can use bytecode, which is usually smaller than machine code. You then pay once (about 200 KB for the runtime) and only a few bytes for each program (e.g., about 200 bytes for your helloworld).
Also, do not create many small binaries; create one binary. In your particular example, the size of the helloworld compilation unit is 200 bytes in bytecode and 700 bytes in machine code. The remaining 50 KB is the startup harness, which needs to be included only once. Moreover, since OCaml supports dynamic linking at runtime, you can easily create a loader that loads modules when needed. In that scenario, the binaries become very small (hundreds of bytes).
It is as if what ocamlopt calls "native compilation" is about packing ocamlrun and the not-native bytecode of your program in one file and make it executable. Not exactly what I would have expected. I obviously missed some important point. But if that is the case, I'll be interested why it isn't as I expected.
No, no, that is completely wrong. Native compilation means the program is compiled to machine code, whether x86, ARM, or whatever. The runtime is written in C, compiled to machine code, and linked in. The OCaml standard library is written mostly in OCaml, also compiled to machine code, and also linked into the binary (only those modules that are used; OCaml's static linking is very efficient, provided that the program is split into modules (compilation units) reasonably well).
Concerning the OCAMLRUNPARAM environment variable, it is just an environment variable that parameterizes the behavior of the runtime, mostly the parameters of the garbage collector.
Related
I am new to C++. I ran some C++ code in VS Code and it created an exe file. Why is that?
Why is the exe file so big when the source file is small? Is there any way to reduce the size of this file?
Because you asked for one:
g++ Sample.cpp -o Sample
And then you run it
./Sample
This is the ordinary behaviour for creating programs in C++. Doing it this way means that you can send Sample to another computer (with the same operating system and processor architecture), and it will run with no other files needed.
There's maybe tens of bytes of your program, and the rest of the size is the C++ runtime environment.
Your CPU cannot understand what's written inside Sample.cpp, therefore you need to compile it with:
g++ Sample.cpp -o Sample.exe
Now Sample.exe contains a lot of information that you, as a human, cannot read or understand, but your computer can! You can execute it with:
./Sample.exe
To reduce Sample.exe size you need to optimize it. There are many flags you can add to the g++ compiler. For example:
g++ Sample.cpp -o Sample.exe -Os -s
Please note that reducing the final .exe size means that compilation will take more time (nothing is free, unfortunately).
When experimenting with some code ("debug" or "development" mode), it's unusual to reduce the .exe size, as you are just focusing on the results.
Reducing the .exe size is only done in "production" mode (when the file is going to be published and shared).
Googling g++ reduce exe size leads to many interesting Stack Overflow Q&As and many other sites as well! But remember, this should only be done when publishing the executable.
A tool that I use for final size optimization is UPX: the Ultimate Packer for eXecutables.
I have an ELF file. While analysing the map file and the ELF with elfparser, I saw a section called .debug_info, which is taking the most space.
I am compiling for an Xtensa DSP using xt-xc++. I haven't used the -g option, and I have given the -O2 optimization level.
Is it possible to remove this for release builds?
a section called .debug_info, which is taking the most space.
Note that this section does not have the SHF_ALLOC flag, and therefore does not take any RAM at runtime (it only takes space in the filesystem). Of course, if you use a ramdisk, that section still costs you RAM in the end.
is it possible to remove this for release builds?
Yes: none of the .debug* sections are necessary at runtime, and all of them can be safely stripped.
-g0 and -s option didn't work.
It's likely that you are getting .debug_* sections from libraries that you are linking in, and not from your own code. The -g was present when the libraries were compiled, so building with -g0 doesn't have any effect.
It's surprising that -s didn't work, but maybe your compiler interprets this flag differently.
In any case, you should use strip --strip-debug to get rid of .debug_* sections (note: this does not remove the symbol table).
The best practice is actually to compile all your code with full debug info (-g), save the full debug binary for post-mortem analysis, use strip --strip-debug to make a release binary, and use that binary for actual distribution.
If/when the release binary crashes and leaves a core dump, having (saved) full-debug exactly matching binary greatly improves post-mortem analysis you can do.
We are looking for a procedure through which we can easily list all the files that are compiled together to make an executable.
Use case: suppose we have a large repository and we want to know which files in the repository were compiled to make a given executable (i.e., a.out).
For example :
dwarfdump a.out | grep "NS uri"
0x0000064a [ 9, 0] NS uri: "/home/main.c"
0x000006dd [ 2, 0] NS uri: "/home/zzzz.c"
0x000006f1 [ 2, 0] NS uri: "/home/yyyy.c"
0x00000705 [ 2, 0] NS uri: "/home/xxxx.c"
0x00000719 [ 2, 0] NS uri: "/home/wwww.c"
but it doesn't list all the header files.
Please advise.
How can I extract source code from an executable when debug symbols are available?
You cannot do that. I guess you are on Linux/x86-64 (your question is operating-system-, ABI-, and debugging-format-specific). Of course, you should pass -g (or even -g3) to all the gcc compilation commands for your executable. Without that -g or -g3 option used to compile every translation unit (including, perhaps, those of shared libraries!), you might not have enough information.
Even with debug information in DWARF format, the ELF executable doesn't contain source code, only references to source code (e.g., source file paths, and positions as line and column numbers). So the debug information contains stuff like file src/foo.c, line 34, column 5 (but says nothing about the content of src/foo.c near that position). Of course, once gdb knows the file path src/foo.c, it is able to read that source file (if it is available and up to date w.r.t. the executable), so it can list it.
Extracting that debugging meta-data is a different question. Once you have understood DWARF you could use tools like objdump or readelf or addr2line or dwarfdump or libdwarf; and you could also script gdb (recent versions of GDB may be extendable in Python or in Guile) and use it on your ELF executable.
Perhaps you should consider Ian Taylor's libbacktrace. It uses the DWARF information to provide nice looking backtraces at runtime.
BTW, cgdb is (like ddd) only a front-end to gdb which does all the real work of processing that DWARF information. It is free software, you can study its source code.
I have only a.out, and I want to list the file names
You might try dwarfdump -i a.out | grep DW_AT_decl_file, and you could use some GNU awk command instead of grep. You need to dive into the details of the DWARF specifications, and you need to understand more about the elf(5) format.
It doesn't list all the header files
This is expected. Most header files don't contain any code, only declarations (e.g., printf is not implemented in <stdio.h> but in some C source file of your C standard library, e.g. in tree/src/stdio/printf.c if you use musl-libc; it is just declared in /usr/include/stdio.h). DWARF (and other debug information formats) describe the binary code. And some header files are included only to give access to a few preprocessor macros (which get expanded or skipped at preprocessing time).
Maybe you dream of homoiconic programming languages, then try Common Lisp (e.g. with SBCL).
If your question is how to use gdb, then please read the Debugging with GDB manual.
If your question is about decompilers, be aware that it is an impossible task in general (e.g. because of Rice's theorem). BTW, programs inside most Linux distributions are generally free software, so it is quite easy to get the source code (and you could even avoid using proprietary software on Linux).
BTW, you could also do more at compilation time by passing more flags to gcc. You might pass -H or -M (etc.) to gcc (in addition to -g). You could even consider writing your own GCC plugin to collect the information you want in some database (but that is probably not worth the effort). You could also consider improving your build automation (e.g., adding more to your Makefile) to collect such information. BTW, many large C programs use metaprogramming techniques, with some .c files containing #line directives generated by tools (e.g., bison) or scripts; in that case, what kind of file path do you want to keep?
We are looking for a procedure through which we can easily list all the files which are compiled together to make an executable.
If you are writing that executable and compiling it from its source code, I would suggest collecting that information at build time. It could be as trivial as passing some -M and/or -H flag to gcc, perhaps into some generated timestamp.c file (see this for inspiration; your timestamp.c might contain information provided by gcc -M, etc.). Your timestamp file might also contain git version-control metadata (like that generated in this Makefile). Read also about reproducible builds and about package managers.
What's the ascending order of dub build types for optimized binaries (e.g., ... debug < plain < release ...)?
$ dub build -h
...
-b --build=VALUE Specifies the type of build to perform. Note that
setting the DFLAGS environment variable will override
the build type with custom flags.
Possible names:
debug (default), plain, release, release-debug,
release-nobounds, unittest, profile, profile-gc,
docs, ddox, cov, unittest-cov and custom types
...
The dub build -b release-nobounds seems derived from dmd -O -release -boundscheck=off, so what's the equivalent for dub to build fastest executables?
Those options aren't really about optimization (and I think it is weird that dub combines them; in dmd itself, those are eight independent switches), and a lot of people are confused about what they mean, so let me list them, using the dmd switch names:
-debug simply compiles in debug statements in the code, e.g. debug writeln("foo"); will only write foo if compiled with -debug. It doesn't do anything else! Importantly, it doesn't include info for debuggers, that is done with -g (though dub might combine these two options).
-g adds symbolic debugging info, for programs like gdb to know function names. This same info is also used on exception stack trace printing, so enabling it will cause the stack traces to show function names too.
-release disables assert statements, in, out, and invariant contracts, and automatic array bounds checks in @system functions (which are the default, btw). That's it: it does NOT enable optimizations, nor does it imply the opposite of -debug; it just skips those assert-related items. (Note that assert(0); is a special case and is never disabled, but it should never happen anyway; it kills the program.)
-unittest will compile the unittest blocks, and run them right before running main (then main will still run afterward, like normal).
-profile adds timing info before and after functions, and writes that info to a log file when the program is complete. Note that it only works with single-thread programs and its logging can significantly slow down the program itself. You'd use it to figure out which functions are called the most and the slowest to know where to focus your optimization efforts.
-cov adds info to the test log that tells you which lines of your program were actually run, and which weren't.
-profile=gc does GC-specific profiling, and writes out a log with the timing info.
-D generates HTML files from the ddoc info in your code while compiling. dub calls this docs. ddox is similar, but uses a dub-custom doc generator instead of the default dmd html generator. This is ddoc's output: http://dlang.org/phobos/std_algorithm.html and this is ddox's: http://dlang.org/library/std/algorithm.html
-boundscheck=xxxx determines where array bounds checking is compiled in: @safe functions, all functions, or nowhere. (In old versions, this was tied to the -release switch, but it can now be set separately.) With -release, the default is @safe functions only; everywhere else, the default is all functions.
Notice that NONE of those were -O or -inline! Those are the dmd optimization switches: -O means optimize the code and -inline means inline functions (they are separate switches because inlining sometimes messes up debuggers; the other compilers, gdc and ldc, inline automatically with their -O options and generally do a better job of it than dmd anyway).
Personally, I strongly recommend against using -boundscheck and -release: in most cases they just hide bugs without making that big a difference in final speed. If you find that bounds checks in some tight loop are slowing you down, then instead of killing them across your entire program with -boundscheck, use .ptr on the specific accesses that are slow (you can use -profile to figure out which function to optimize!). Learn more in the tip of the week here: http://arsdnet.net/this-week-in-d/dec-06.html
-release only makes a significant difference if you are doing tons of expensive asserts... and again, I'd prefer to version out the expensive ones individually instead of disabling everything, including the really quick checks that catch genuinely common bugs.
So, I'd recommend going for just -O and maybe -inline for an optimized dmd build. For many (but not all) programs btw, gdc -O and ldc -O do a better job than any dmd switch combination - if you are CPU limited, you might want to try them too.
Back to dub. Check out the package format documentation: http://code.dlang.org/package-format?lang=json
Build type release, i.e. dub build -b release, passes -O -release -inline to dmd. Type release-nobounds adds the nobounds switch too. That's what the dmd docs call the fastest executables, and what I call a buggy mistake.
The best dub option, from what I can see (I don't actually use it myself), would be to add buildOptions with optimize in the dub config file (dub.json or dub.sdl).
That gives you -O; then you use things like the .ptr technique, or version out expensive asserts, to selectively speed up your hot spots without compromising the anti-bug features in the rest of the program.
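For reference, a minimal dub.json fragment with that setting might look like this (a sketch; the package name is a placeholder, and "inline" is shown only as the optional companion to "optimize"):

```json
{
    "name": "myapp",
    "buildOptions": ["optimize", "inline"]
}
```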
Read more dub documentation here:
http://code.dlang.org/package-format?lang=json#build-options
How do I analyze a core dump (using gdb) from a binary that was not compiled with GCC's -g option?
Generate a map file. The map file will tell you the address at which each function starts (as an offset from the start of the executable, so you will also need to know the base address it is loaded at). You then take the instruction pointer and look up where it falls in the map file. This gives you a good idea of the location within a given function.
Manually unwinding a stack is a bit of a black art, however, as you have no idea what optimisations the compiler has performed. When you know roughly where you are in the code, you can generally work out what ought to be on the stack and scan through memory to find the return pointer. It's quite involved, however: you effectively spend a lot of time reading memory and looking for numbers that look like memory addresses, then checking them to see whether they make sense. It's perfectly doable, and I (and, I'm sure, many others) have done it plenty of times :)
With ELF binaries it is possible to separate the debug symbols into a separate file. Quoting from objcopy man pages:
Link the executable as normal (using the -g flag). Assuming that it is called foo, then:
Run objcopy --only-keep-debug foo foo.dbg to create a file containing the debugging info.
Run objcopy --strip-debug foo to create a stripped executable.
Run objcopy --add-gnu-debuglink=foo.dbg foo to add a link to the debugging info into the stripped executable.
That should not be a problem: you can compile the source again with the -g option and pass gdb the core plus the newly compiled debug binary; it should work without any problem.
BTW, you can generate a map file with the command below in gcc:
gcc -Wl,-Map=system.map file.c
The above line generates the map file system.map. Once the map file is generated, you can map addresses as described above, but I am not sure how you are going to map shared libraries; that is very difficult.