What's the ascending order of optimization for the dub build types below? (e.g. ... debug < plain < release ...)
$ dub build -h
...
-b --build=VALUE Specifies the type of build to perform. Note that
setting the DFLAGS environment variable will override
the build type with custom flags.
Possible names:
debug (default), plain, release, release-debug,
release-nobounds, unittest, profile, profile-gc,
docs, ddox, cov, unittest-cov and custom types
...
The dub build -b release-nobounds seems derived from dmd -O -release -boundscheck=off, so what's the equivalent for dub to build the fastest executables?
Those options aren't really about optimization (and I think it's weird that dub combines them; on dmd itself, those are eight independent switches...), and a lot of people are confused about what they mean, so let me list them, using the dmd switch names:
-debug simply compiles in debug statements in the code, e.g. debug writeln("foo"); will only write foo if compiled with -debug. It doesn't do anything else! Importantly, it doesn't include info for debuggers, that is done with -g (though dub might combine these two options).
-g adds symbolic debugging info, for programs like gdb to know function names. This same info is also used on exception stack trace printing, so enabling it will cause the stack traces to show function names too.
-release disables assert statements; in, out, and invariant contracts; and automatic array bounds checks in @system functions (which are the default, btw). That's it - it does NOT enable optimizations nor imply the opposite of -debug; it just skips those assert-related items. (Note that assert(0); is a special case and is never disabled, but it should never be hit anyway - it kills the program.)
-unittest will compile the unittest blocks, and run them right before running main (then main will still run afterward, like normal).
-profile adds timing info before and after functions, and writes that info to a log file when the program is complete. Note that it only works with single-thread programs and its logging can significantly slow down the program itself. You'd use it to figure out which functions are called the most and the slowest to know where to focus your optimization efforts.
-cov adds info to the test log that tells you which lines of your program were actually run, and which weren't.
-profile=gc does GC-specific profiling, and writes out a log with the timing info.
-D generates HTML files from the ddoc info in your code while compiling. dub calls this docs. ddox is similar, but uses a dub-custom doc generator instead of the default dmd html generator. This is ddoc's output: http://dlang.org/phobos/std_algorithm.html and this is ddox's: http://dlang.org/library/std/algorithm.html
-boundscheck=xxxx determines where array bounds checking is compiled in - @safe functions, all functions, or nowhere. (In old versions, this was tied to the -release switch, but it can now be set separately.) With -release, the default is @safe functions only; everywhere else, the default is all functions.
Notice that NONE of those were -O or -inline! Those are the dmd optimization switches: -O means to optimize the code and -inline means to inline functions (it does them separately because sometimes inlining messes up debuggers. The other compilers, gdc and ldc, will inline automatically with their -O options and generally do a better job of it than dmd anyway.)
Personally, I strongly recommend against using -boundscheck and -release - in most cases those just hide bugs without making that big of a difference in final speed. If you find bounds checks in some tight loop are slowing you down, instead of killing them in your entire program with -boundscheck, use .ptr on the specific accesses that are slow (you can use -profile to figure out which function to optimize!). Learn more in the tip of the week here: http://arsdnet.net/this-week-in-d/dec-06.html
-release only makes a significant difference if you are doing tons of expensive asserts... and again, I'd prefer to version out the expensive ones individually instead of disabling everything, including the really quick checks that catch legitimately common bugs.
So, I'd recommend going for just -O and maybe -inline for an optimized dmd build. For many (but not all) programs btw, gdc -O and ldc -O do a better job than any dmd switch combination - if you are CPU limited, you might want to try them too.
Back to dub. Check out the package format documentation: http://code.dlang.org/package-format?lang=json
Build type release, i.e. dub build -b release, will pass -O -release -inline to dmd. Type release-nobounds adds -boundscheck=off too. That's what the dmd docs call the fastest executables, and what I call a buggy mistake.
The best dub option, from what I can see (I don't actually use it myself), would be to add buildOptions with optimize in the dub config file (dub.json or dub.sdl).
That gives you -O; then you use things like the .ptr technique or version out expensive asserts to selectively speed up your hot spots without compromising the anti-bug features in the rest of the program.
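In dub.sdl that could look like this (a sketch; "optimize" is the buildOptions value that maps to -O on dmd, and you could add "inline" for -inline):

```sdl
// dub.sdl - enable the optimizer without -release / -boundscheck=off
buildOptions "optimize"
```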
Read more dub documentation here:
http://code.dlang.org/package-format?lang=json#build-options
I am wanting to measure the time it takes for my C++ video processing program to process a video. I am using CLion to write the program and have Cmake set up to compile and automatically run the program with a test video. However, in order to find execution time I have been using the following command in the MacOS terminal:
% time ./main ../Media/test_video.mp4
Is there a way for me to configure Cmake to automatically include time in the execution of ./main to streamline my process further?
So far I've tried using set(CMAKE_ARGS time "test_video.mp4") and some command line argument functions but they don't seem to be acting in the way that I'm looking for.
It is possible to use add_custom_target to do what you want. I won't consider this option in depth, as it seems like abusing the build system for something it wasn't designed to do. Yet it may have an advantage over using a CLion configuration: it would be available outside of CLion too. That advantage seems minor: why not run the desired command directly in those contexts?
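For completeness, the add_custom_target variant could look roughly like this (a sketch only; the target name timed_run is made up, and it assumes your executable target is named main):

```cmake
# Build `main` first, then run it under time in the terminal.
add_custom_target(timed_run
    COMMAND time $<TARGET_FILE:main> ../Media/test_video.mp4
    DEPENDS main
    USES_TERMINAL)
```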
The first CLion method would be to define an external tool which runs time on the current build target. In File | Settings... | Tools | External Tools, define a new tool with /bin/time as the program and $CMakeCurrentProductFile$ $Prompt$ as the arguments. When choosing that tool (in Tools | External Tools), it will prompt you for the arguments and then run /bin/time on the current target with the provided arguments. Advantage: you only have to define the tool once; it will be available in every project. Drawbacks: the external tools are available in every project, thus it doesn't make sense to be more specific than $Prompt$ for the arguments and the working directory; it isn't possible to specify environment variables; and it isn't possible to enforce a build before running the command.
The second CLion method would be to define a Run/Debug Configuration. Again use /bin/time as the program (choose "Custom Executable"), and specify $CMakeCurrentProductFile$ as the first argument (here it makes sense to provide the other arguments as desired, but note that $Prompt$ is still a valid choice if needed). Advantages: it makes sense to be as specific as needed; you have all the features of configurations (environment variables, input redirection, specifying actions to be executed before launch). Drawback: it doesn't work with other CLion features which assume that the program is the target, such as the debugger, the profiler... so you may have to duplicate configurations to get those.
Note that the methods aren't exclusive: you can define an external tool and then add configurations for the cases where that is more convenient.
I was quite surprised to see that even a simple program like:
print_string "Hello world !\n";
when statically compiled to native code through ocamlopt with some quite aggressive options (using musl), would still be ~190KB on my system.
$ ocamlopt.opt -compact -verbose -o helloworld \
-ccopt -static \
-ccopt -s \
-ccopt -ffunction-sections \
-ccopt -fdata-sections \
-ccopt -Wl \
-ccopt -gc-sections \
-ccopt -fno-stack-protector \
helloworld.ml && { ./helloworld ; du -h helloworld; }
+ as -o 'helloworld.o' '/tmp/camlasm759655.s'
+ as -o '/tmp/camlstartupfc4271.o' '/tmp/camlstartup5a7610.s'
+ musl-gcc -Os -o 'helloworld' '-L/home/vaab/.opam/4.02.3+musl+static/lib/ocaml' -static -s -ffunction-sections -fdata-sections -Wl -gc-sections -fno-stack-protector '/tmp/camlstartupfc4271.o' '/home/vaab/.opam/4.02.3+musl+static/lib/ocaml/std_exit.o' 'helloworld.o' '/home/vaab/.opam/4.02.3+musl+static/lib/ocaml/stdlib.a' '/home/vaab/.opam/4.02.3+musl+static/lib/ocaml/libasmrun.a' -static -lm
Hello world !
196K helloworld
How to get the smallest binary from ocamlopt?
A size of 190KB is way too much for a simple program like that given today's constraints (IoT, Android, Alpine VMs...), and compares badly with a simple C program (around ~6KB, or, directly coding ASM and tweaking things to get a working binary, around 150B). I naïvely thought that I could simply ditch C, write simple static programs that do trivial things, and after compilation get some simple assembly code that wouldn't be so far in size from the equivalent C program. Is that possible?
What I think I understand:
When removing gcc's -s to get some hints about what is left in the binary, I notice a lot of OCaml symbols, and I also read that some environment variables of ocamlrun are meant to be interpreted even in this form. It is as if what ocamlopt calls "native compilation" is about packing ocamlrun and the non-native bytecode of your program into one file and making it executable. Not exactly what I would have expected. I obviously missed some important point. But if that is the case, I'd be interested to know why it isn't as I expected.
Other languages compiling to native code have the same issue, leaving naïve users (like myself) with roughly the same questions:
Go: Reason for huge size of compiled executable of Go
Rust: Why are Rust executables so huge?
I've also tested with Haskell, and without tweaks, all these language compilers produce binaries above 700KB for the "hello world" program (it was the same for OCaml before the tweaks).
Your question is very broad and I'm not sure that it fits the Stack Overflow format. It deserves a thorough discussion.
A size of 190KB is way too much for a simple program like that in today's constraints (iot, android, alpine VM...), and compares badly with simple C program (around ~6KB, or directly coding ASM and tweaking things to get a working binary that could be around 150B)
First of all, it is not a fair comparison. Nowadays, a compiled C binary is an artifact that is far from being a standalone binary. It should be seen more like a plugin in a framework. Therefore, if we would like to count how many bytes a given binary actually uses, we should count the size of the loader, the shell, the libc library, and the whole Linux or Windows kernel - which together form the runtime of an application.
OCaml, unlike Java or Common Lisp, is very friendly to the common C runtime and tries to reuse most of its facilities. But OCaml still comes with its own runtime, of which the biggest (and most important) part is the garbage collector. The runtime is not extremely big (about 30 KLOC) but it still contributes to the weight. And since OCaml uses static linking, every OCaml program will have a copy of it.
Therefore, C binaries have a significant advantage, as they are usually run on systems where the C runtime is already available (and it is therefore usually excluded from the equation). There are, however, systems with no C runtime at all, where only the OCaml runtime is present; see Mirage for example. On such systems, OCaml binaries are much more favorable. Another example is the OCaPic project, in which (after tweaking the compiler and runtime) they managed to fit the OCaml runtime and programs into 64KB of Flash (read the paper; it is very insightful about binary sizes).
How to get the smallest binary from ocamlopt?
When it is really necessary to minimize the size, use Mirage unikernels or implement your own runtime. For general cases, use strip and upx. (For example, with upx --best I was able to reduce the binary size of your example to 50K, without any more tricks.) If performance doesn't matter that much, then you can use bytecode, which is usually smaller than machine code. Thus you pay once (about 200k for the runtime) and a few bytes for each program (e.g., 200 bytes for your helloworld).
Also, do not create many small binaries; create one binary. In your particular example, the size of the helloworld compilation unit is 200 bytes in bytecode and 700 bytes in machine code. The remaining 50k is the startup harness, which needs to be included only once. Moreover, since OCaml supports dynamic linking at runtime, you can easily create a loader that will load modules when needed. In that scenario, the binaries become very small (hundreds of bytes).
It is as if what ocamlopt calls "native compilation" is about packing ocamlrun and the non-native bytecode of your program into one file and making it executable. Not exactly what I would have expected. I obviously missed some important point. But if that is the case, I'd be interested to know why it isn't as I expected.
No-no, that is completely wrong. Native compilation means the program is compiled to machine code, whether x86, ARM, or whatever. The runtime is written in C, compiled to machine code, and linked in. The OCaml standard library is written mostly in OCaml, also compiled to machine code, and also linked into the binary (only those modules that are used - OCaml static linking is very efficient, provided that the program is split into modules (compilation units) fairly well).
Concerning the OCAMLRUNPARAM environment variable, it is just an environment variable that parameterizes the behavior of the runtime, mostly the parameters of the garbage collector.
I have an ELF file, and while analysing the map file and the ELF with elfparser, I saw a section called .debug_info which is taking the largest amount of memory.
I am compiling for an Xtensa DSP using xt-xc++. I haven't used the -g option, and I have set the -O2 optimization level.
Is it possible to remove this for release builds?
section called .debug_info, which is taking largest memory.
Note that this section does not have SHF_ALLOC flag, and therefore does not take any RAM at runtime (it only takes space in the filesystem). Of course if you use ramdisk, then that section still does cost you RAM in the end.
is it possible to remove this for release builds?
Yes: none of the .debug* sections are necessary at runtime, and all of them can be safely stripped.
-g0 and -s option didn't work.
It's likely that you are getting .debug_* sections from libraries that you are linking in, and not from your own code. The -g was present when the libraries were compiled, so building with -g0 doesn't have any effect.
It's surprising that -s didn't work, but maybe your compiler interprets this flag differently.
In any case, you should use strip --strip-debug to get rid of .debug_* sections (note: this does not remove the symbol table).
The best practice is actually to compile all your code with full debug info (-g), save the full debug binary for post-mortem analysis, use strip --strip-debug to make a release binary, and use that binary for actual distribution.
If/when the release binary crashes and leaves a core dump, having (saved) full-debug exactly matching binary greatly improves post-mortem analysis you can do.
Having read the Meson site pages (which are generally high quality), I'm still unsure about the intended best practice to handle different options for different buildtypes.
So to specify a debug build:
meson [srcdir] --buildtype=debug
Or to specify a release build:
meson [srcdir] --buildtype=release
However, if I want to add b_sanitize=address (or other arbitrary complex set of arguments) only for debug builds and b_ndebug=true only for release builds, I would do:
meson [srcdir] --buildtype=debug -Db_sanitize=address ...
meson [srcdir] --buildtype=release -Db_ndebug=true ...
However, it's more of a pain to add a bunch of custom arguments on the command line, and to me it seems neater to put that in the meson.build file.
So I know I can set some built in options thusly:
project('myproject', ['cpp'],
default_options : ['cpp_std=c++14',
'b_ndebug=true'])
But they are unconditionally set.
So a condition would look something like this:
if get_option('buildtype').startswith('release')
add_project_arguments('-DNDEBUG', language : ['cpp'])
endif
Which is one way to do it; however, it would seem the b_ndebug=true way would be preferable to add_project_arguments('-DNDEBUG'), because it is portable.
How would the portable-style build options be conditionally set within the Meson script?
Additionally, b_sanitize=address is set without any test whether the compiler supports it. I would prefer for it to check first if it is supported (because the library might be missing, for example):
if meson.get_compiler('cpp').has_link_argument('-fsanitize=address')
add_project_arguments('-fsanitize=address', language : ['cpp'])
add_project_link_arguments('-fsanitize=address', language : ['cpp'])
endif
Is it possible to have the built-in portable-style build options (such as b_sanitize) have a check if they are supported?
I'm still unsure about the intended best practice to handle different options for different buildtypes
The intended best practice is to use meson configure to set/change the "buildtype" options as you need. You don't have to do it "all at once and forever". But, of course, you can still have several distinct build trees (say, "debug" and "release") to speed up the process.
How would the portable-style build options be conditionally set within the Meson script?
Talking of b_ndebug, you can use the special value ['b_ndebug=if-release'], which does exactly what you want. Also, you should take into account that several GNU-style command-line arguments in meson are always portable, due to internal compiler-specific substitutions. If I remember correctly, these include -D, -I, -L and -l.
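For example, adapting the project() declaration from the question (a sketch):

```meson
project('myproject', ['cpp'],
        default_options : ['cpp_std=c++14',
                           'b_ndebug=if-release'])
```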
However, in general, changing "buildtype" options inside a script (except default_options, which are meant to be overwritten by meson setup/configure), is discouraged, and meson intentionally lacks set_option() function.
Is it possible to have the built-in portable-style build options (such as b_sanitize) have a check if they are supported?
AFAIK, no, except the has_argument() check you've used above. However, if some build option, like b_sanitize, is not supported by the underlying compiler, then it will be automatically set to void, so using it shouldn't break anything.
I was trying to verify whether my macro really takes effect during compilation by using the md5sum command in Ubuntu.
For example, with "nvcc -DTEST_MACRO ..." I got an executable A.
Then with "nvcc ..." I got an executable B.
Of course the md5 values are different.
But when I recompiled and generated A again, its md5 was different from the previous one.
I took some pure C++ code and checked with g++, and its md5 value turns out to be the same no matter how many times I compile. So I think there is something like a timestamp in the executable generated by nvcc.
Just out of curiosity, how do I verify if my thought is right?
Anyway, how do I verify if "TEST_MACRO" really works in this case?
I think this variability is not necessarily due to an embedded timestamp, but rather to the way nvcc builds executables.
nvcc is a compiler-driver, meaning it launches a sequence of commands "under the hood" to compile code. During the execution of this sequence, a variety of temporary files are created with randomly-generated filenames. You can get a sense of this by looking at the output of your nvcc compile command with the -v switch added.
Some of these filenames do get embedded in the executable, and since these randomly-generated file names vary from one invocation of the nvcc compile command to the next, the resultant binary will vary.
If you want to verify this yourself, run your nvcc command with -v added. Then inspect the output at the end for a tmpxft... filename. Then grep the generated executable for that filename, eg.:
grep tmpxft_0000a76e myexe
(replace the tmpxft_0000a76e with whatever appears in your nvcc verbose output, and replace myexe with the actual name of your executable.)
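As a baseline, you can also convince yourself that a plain host-compiler build is bit-for-bit reproducible, which is why your g++ md5 stayed stable (a sketch using gcc; the file name repro.c is made up):

```shell
# Compile the same file twice with identical flags and compare checksums;
# a compiler that embeds no timestamps or random temp-file names
# produces identical output both times.
cat > repro.c <<'EOF'
int main(void) { return 0; }
EOF
gcc -c -o repro1.o repro.c
gcc -c -o repro2.o repro.c
md5sum repro1.o repro2.o   # the two hashes should match
cmp repro1.o repro2.o && echo identical
```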
If you want to verify whether TEST_MACRO really works, there are a few options. The least intrusive might be to put a #warning line in the region guarded by TEST_MACRO:
#ifdef TEST_MACRO
...
#warning TEST_MACRO_COMPILED
...
#endif
and you should see this echoed in the output during compilation when you specify -DTEST_MACRO.
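A minimal host-compiler sketch of that technique (using gcc here, since the #warning directive behaves the same way; the file name macro_test.c is made up):

```shell
cat > macro_test.c <<'EOF'
#ifdef TEST_MACRO
#warning TEST_MACRO_COMPILED
#endif
int main(void) { return 0; }
EOF

# With the macro defined, the warning shows up in the compiler output...
gcc -DTEST_MACRO -c -o macro_test.o macro_test.c 2>&1 | grep TEST_MACRO_COMPILED

# ...and without it, the build is silent.
gcc -c -o macro_test.o macro_test.c
```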
(The above is a useful technique to avoid mistakenly including debug macros and other things you don't want in a production/release build of code.)
Of course, there are probably many other possibilities. If the test macro guards executable code, you could put a printf statement in it to see evidence at run time.