Is there any reason why not to strip symbols from executable? - c++

A couple of years ago I asked a question how to reduce size of executables. Using MinGW compiler, stripping symbols (-s option) helped to reduce 50%+ of the size.
Why the stripping is not default - is there any good reason why NOT to strip the symbols in some scenarios then? I'd like to understand it more deeply: today, I just vaguely know that symbols are involved in linking library. Are they needed in executable and do they affect executing speed?

I can't imagine them affecting the execution speed in any noticeable way though, in theory, you could have a tiny amount of cache misses in the process image.
You want to keep symbols in the file when you're debugging it, so that you can see which function you're in, check the values of variables and so forth.
But symbols make the file bigger: potentially, a lot bigger. So, you do not want symbols in a binary that you put on a small embedded device. (I'm talking about a real embedded device, not some 21st century Raspberry Pi with 9,000GB of storage space!)
I generally strip all release builds and don't strip any debug builds. This makes core dumps from customers slightly less useful, but you can always keep a copy of the unstripped release build and debug your core against that.
I did hear about one firm that had a policy of never stripping symbols even in release builds, so they could load a core directly into the debugger. This seems like a bit of an abstraction leak to me but, whatever. Their call.
Of course, if you really want, you can analyse the assembly yourself without them. But that's crazy...

MinGW is an acronym for "Minimalist GNU for Windows"; as such, the compiler suite is GCC ... the GNU Compiler Collection. Naturally, this compiler suite will tend to comply with the GNU Coding Standards, which demand that every build shall be a "debug build" ... i.e. it shall include both debugging symbols, and those required for linking. (This is logical, since a "debug build" is orders of magnitude more useful to the application developer, than is a so-called "release build"). Furthermore, a build system which discriminates between "debug build" and "release build" modes is more complex -- possibly significantly more so -- than one which concerns itself with only a "debug build".
In the GNU build model, there is no need for a "release build" anyway. A "release" is not created at build time; it is created at installation time -- usually as a "staged" installation, from which a release package is created. The GNU tool-chain includes both a strip command, and an install command, which can strip a "debug build" on the fly, when creating a staged installation for packaging as a release, (or in place, if you wish), so there really is no need to clutter the build system with "release build" specifics; just create a "debug build" up -front, then strip it after the event, so converting this to an effective "release build", when you require it.
Since your MinGW tool-chain is fundamentally a GNU tool-chain, it fully supports this GNU build model.

There is no good reason to strip symbols. By stripping symbols, for instance, you will eliminate a possibility to gather process stack - an extremely usuful feature.
EDIT. Several people here seem to be confused about the difference between debug symbols and 'general' symbols. The first are only produced when -g (or similar) option is given to the compiler, and are usually used only in debug builds (though can be used with optimized builds as well). The regular symbols are things such as function names, and are produced by compilers so that object files could be linked, or .so files loaded. There are several extremely useful non-invasive tools which can be used on non-debug running executables which depend on symbols being non-stripped.

Related

Why not always build a release with debug info?

If debug information is stored in a program database (not as part of an executable), is there any reason not to always build with it (e.g., MSVC's /Zi)?
In CMake, the default configurations are, "Release", "Debug", "RelWithDebInfo", and "MinSizeRel". Is there a reason not to only use "Debug" and "RelWithDebInfo" (perhaps renamed to "Release")?
Does it have any impacts on the size or performance of the code? Is the answer different for gcc or clang than it is for Visual C++?
Update
I did come across these posts that are similar:
Anything wrong with releasing software in debug mode?
Debug vs. RelWithDebInfo
However, neither of these get to the question of Release vs. RelWithDebInfo.
Yes. I could do a test on an executable with Release vs. RelWithDebInfo. That would definitely give me the answer about the size of the code, but would be very difficult to conclude that it has NO impact on performance if my test case showed similar performance. How would I know if I exercised aspects of the language that might be impacted by the change? That is, empirical testing could produce a false negative.
Releasing with debug info is mandatory for real-life development. When shit happens your primary tool would be a crash dump analysis that would be rather pointless without debug information. Note that this does not imply shipping debug info with product.
As for "little differences" between vc++ and gcc I would like to mention that by default vc++ emits debug information in a separate file while gcc will squeeze it into executable. It is possible to separate debug information on gcc as well, however doing so is not as convenient and requires some extra steps.
If you build as Release, surely you won't have Debug Info in the binary. I could see a lot of cases on Windows that people build "Release", but add flags to have debug info. So, i think the default real-life case for most Windows Users, even without knowing that, is Release With Debug Info.
With CMake, you have all three options and one more.
Release means no debugging symbols at all, and MAY also means build with optimizations.
RelWithDebugInfo means build with debug symbols, with or without optimizations. Anyway, the debug symbols can be stripped later, both using strip and/or obj-copy. If you use strip, the output can be saved. I guess the format is DWARF, version 4 or 5.
So, RelWithDebugInfo gives you Release mode binaries, and symbols can be stripped and kept apart. In some projects i have worked, that was the ONLY configuration used.
We must not forget about ASSERTS, at least for C/C++. Building with DEBUG, all Asserts are enabled, so they are not removed from the code during preprocessing. Release and RelwithDebugInfo remove the Asserts, so you won't have them. Some projects nowadays prefer to keep Asserts while developing (so that tests can catch "software" errors during development), and then finally remove in Production Code. Some other projects, for whom Fault-Tolerance is a must, may want to keep Asserts even in production code, so that software cannot execute with wrong Assertions about the software itself.
So,
RelWithDebugInfo: Optimized Build, With Symbols to be stripped, no Asserts
Debug: Not optimized, With Symbols that CAN be stripped, With Asserts.
Release: Optimized, No Symbols, No Asserts.
Recently we have cases on Release vs RelWithDebInfo, we compile a shared lib, and found either the performance and the lib file size has a significant difference, Release is much more runtime faster and less file size.
So it looks like Release should be used when shipping on production.

what does mean by debug build and release build, difference and uses [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Debug/Release difference
I want to know what do these two mean: Debug build and Release build and what is the difference between both.
Which one should I use (I mean which are the suitable conditions for each one)
and which build actually I am using now if I make a simple C++ project in Visual studio. [If I do not change any projects settings]
I am asking this because I am trying to make a GUI using wxWidgets 2.9.4 and they give different case of adding required .lib. these are
release ANSI static
debug ANSI static
release Unicode static
debug Unicode static
Please put a detailed answer.
Debug build and release build are just names. They don't mean anything.
Depending on your application, you may build it in one, two or more
different ways, using different combinations of compiler and linker
options. Most applications should only be build in a single version:
you test and debug exactly the same program that the clients use. In
some cases, it may be more practical to use two different builds:
overall, client code needs optimization, for performance reasons, but
you don't want optimization when debugging. And then there are cases
where full debugging (i.e. iterator validation, etc.) may result in code
that is too slow even for algorithm debugging, so you'll have a build
with full debugging checks, one with no optimization, but no iterator
debugging, and one with optimization.
Anytime you start on an application, you have to decide what options you
need, and create the corresponding builds. You can call them whatever
you want.
With regards to external libraries (like wxwidgets): all compilers have
some incompatibilities when different options are used. So people who
deliver libraries (other than in source form) have to provide several
different versions, depending on a number of issues:
release vs. debug: the release version will have been compiled with a
set of more or less standard optimization options (and no iterator
debugging); the debug version without optimization, and with iterator
debugging. Whether iterator debugging is present or not is one thing
which typically breaks binary compatibility. The library vendor should
document which options are compatible with each version.
ANSI vs. Unicode: this probably means narrow char vs wide wchar_t
for character data. Use which ever one corresponds to what you use in
your application. (Note that the difference between these two is much
more than just some compiler switches. You often need radically
different code, and handling Unicode correctly in all cases is far from
trivial; an application which truly supports Unicode must be aware of
things like composing characters or bidirectional writing.)
static vs. dynamic: this determines how the library is linked and
loaded. Usually, you'll want static, at least if you count on deploying
your application on other machines than the one you develop it on. But
this also depends on licensing issues: if you need a license for each
machine where the library is deployed, it might make more sense to use
dynamic.
When doing a DEBUG build the project is set up to not optimize (or only very lightly optimize) the generated code, and to tell the compiler to add debug information (which includes information about functions, variables, and other information needed for debugging). The pre-processor is set up to define the _DEBUG macro.
A RELEASE build on the other hand have higher level of optimization, and no debug information is saved. The pre-processor is set up to define the NDEBUG macro.
Another difference is that certain "system" macros, for example ASSERT-like macros, do different things depending on if _DEBUG or NDEBUG is defined. ASSERT does nothing in a release build, but does checks and abort in debug builds.
The difference between Unicode and non-Unicode is mostly the UNICODE pre-processor macro, which tells header files if certain Unicode functionality should be enabled or not. One thing is that TCHAR will be defined to wchar_t in Unicode builds but as char in non-Unicode builds.
In the debug build you get a lot more error checjking, so if something goes wrong you may get a more informative message ( and it will run more slowly )
In the debug build you will get more information when you run it under the debugger.
You can tell if the build is debug build by looking at the preprocessor definitions of the project properties: _DEBUG will be defined.
You will send the release build to your clients. ( The debug build uses the debug libraries which are not present on most non development machines )
if you want to link a static library to a project, it needs to be compiled with the same settings that you use to compile your code. That's why there is a Debug & a Release version of the library. Additionally, you need to specify whether you want to use unicode or ansi. Here the answer is quite simple (in my opinion) - just use unicode.
What is different in Release compared to Debug so that they can't mix? Mainly it's the memory management. The memory management in Debug does a lot of additional things to allow you to find errors early. As an example, there are canaries that can be checked for overwriting of code. Uninitialized memory is initialized with a specific pattern, ... Additionally, there are a lot of optimizations in release that are not used in debug. This allows release to run faster but makes it difficult to debug the code. Methods might get optimized away and instead are inlined, the parameter passing may be optimized to use registers, ...
So in C++ you manage (at least) 2 configurations. One Debug configuration that you link with the debug library. This one is for developing & testing. And a Release configuration linked with the release library. This one is for delivery. But don't forget that you need to test Release as well as it might behave differently than the Debug configuration.

STL and release/debug library mess

I'm using some 3rd party. I'm using it's shared library version, since the library is big (~60MB) and is used by several applications.
Is there a way at application startup to find out that release/debug version of library is used respectively for release/debug version of my application?
Longer description
The library which exposes C++ interface. One of API methods return std::vector<std::string>.
The problem when I compile my application in debug mode, debug version of the library should be used. Same for release. If incorrect version of the library is used application is crashed.
According to gcc (see http://gcc.gnu.org/onlinedocs/libstdc++/manual/bk01pt03ch17s04.html)
but with a mixed mode standard library
that could be using either debug-mode
or release-mode basic_string objects,
things get more complicated
P.S. 1
It looks like proposal of Timbo is a possible solution - use different soname for debug and release libraries. So, what should be passed to ./configure script to change library soname?
P.S. 2
My problem is not at link time, but rather at run time.
P.S. 3
Here is question demonstrating problem I is facing with.
The debug mode referenced here has nothing to do with debug or release build of your application. The STL debug mode is activated with -D_GLIBCXX_DEBUG and is a special checking mode.
It is very unlikely that the 3rd party library was in fact compiled with STL checking mode, but if it was, it would likely very promptly mention that your code should also be compiled with -D_GLIBCXX_DEBUG.
If the 3rd party library was not built with checking STL, then it is compatible with your code regardless of whether you are doing optimized or debug build.
Since you state that debug build of your code linked with optimized build of 3rd party library causes a crash, that crash is most likely caused by a bug in your code (or possibly by a bug in 3rd party library).
Valgrind and GDB are your friends.
I believe that you have misread the documentation at the link you provide. In particular, you've misunderstood its purpose -- that section is entitled "Goals", and describes a number of hypothetical designs for a C++ debug library and the consequences of those designs in order to explain the actual design choices that were made. The bits of text that follow the lines you quoted are describing the chaos that would result from a hypothetical implementation that had separate designs for release-mode and debug-mode strings. It goes on to say:
For this reason we cannot easily provide safe iterators for the std::basic_string class template, as it is present throughout the C++ standard library.
(Or, rephrasing that, providing a special "debug" version of string iterators is impossible.)
...
With the design of libstdc++ debug mode, we cannot effectively hide the differences between debug and release-mode strings from the user. Failure to hide the differences may result in unpredictable behavior, and for this reason we have opted to only perform basic_string changes that do not require ABI changes. The effect on users is expected to be minimal, as there are simple alternatives (e.g., __gnu_debug::basic_string), and the usability benefit we gain from the ability to mix debug- and release-compiled translation units is enormous.
In other words, the design of the debug and release modes in GCC's libstdc++ has rejected this hypothetical implementation with separate designs for the strings, specifically in order to allow cross-mode linking of the sort that you are worrying about how to avoid.
Thus, you should not have problems with compiling your library once, without -D_GLIBCXX_DEBUG (or with it, if for some reason you prefer), and linking it with either mode of your application. If you do have problems, it is due to a bug somewhere. [But see edit below! This is specific to std::string, not other containers!]
Edit: After this answer was accepted, I followed up in answering the follow-up question at std::vector<std::string> crash, and realized that the conclusion of this answer is incorrect. GCC's libstdc++ does clever things with strings to support "Per-use recompilation" (in which all uses of a given container object must be compiled with the same flags, but uses of the same container class within a program need not be compiled with the same flags), but that is not the same thing as complete "Per-unit compilation" that would provide the cross-linking ability you need. In particular, the documentation says of that cross-linking ability,
We believe that this level of recompilation is in fact not possible if we intend to supply safe iterators, leave the program semantics unchanged, and not regress in performance under release mode....
Thus, if you're passing containers across your library interface, you will need two separate libraries. Honestly, for this situation I've found that the easiest solution is just to install the two libraries into different directories (one for each variant -- and you'll want both to be separate from your main library directory). Alternately, you can rename the debug library file and then install it manually.
As a further suggestion -- you're presumably not running this in debug mode very often. It may be worth only compiling and linking the debug version statically into your application, so you don't have to worry about installing multiple dynamic libraries and keeping them straight at runtime.
Give the debug and release versions of the DLL different names and link the correct one through library dependency. Your application then wont start unless it finds the correct DLL.
This is the sort of check you should be doing in your build system. In your build script,
if you're building for release then link against the release library.
if you're building for debug then link against the debug library.
For instance, if you're using make:
release: $(OBJ)
$(CC) $(CXXFLAGS_RELEASE) $(foreach LIB,$(LIBS_RELEASE),-l$(LIB))
debug: $(OBJ)
$(CC) $(CXXFLAGS_DEBUG) $(foreach LIB,$(LIBS_DEBUG),-l$(LIB))

Using debug/release versions DLL in C++

I am writing an C++ application that could be compiled under Linux (gcc 4.3) and Windows (MS VS08 Express).
My application uses third-party libraries,
On Linux , they are compiled as shared libraries, while
on Windows, there are two versions "Debug" and "Release".
I know that debug version provides extra support for debugging ( just like using -ggdb option in linux gcc, right? )
But I found that if my application is in debug version , the libraries must also be in debug version, otherwise the application will crash.
Why is there such a limit? It seems that there are no such limits in the linux world
Thank you very much!
The Debug configuration of your
program is compiled with full symbolic
debug information and no optimization.
Optimization complicates debugging,
because the relationship between
source code and generated instructions
is more complex.
The Release configuration of your
program contains no symbolic debug
information and is fully optimized.
Debug information can be generated in
PDB Files (C++) depending on the
compiler options used. Creating PDB
files can be very useful if you later
need to debug your release version.
Debug vs Release
It is also possible to debug your release build with the compiler flags.
Debugging Release Builds
Heap Layout - Guard Bytes to prevent overwriting
Compilation - Removing Assert/Debug Info
Pointer Support - Buffers around pointers to prevent seg faults
Optimization - Inline Functions
Common Problems When Creating Release Builds
To elaborate on Martin Tornwall. The various libraries linked when in Debug or Release
LIBCPMT.LIB, Multithreaded, static link, /MT, _MT
LIBCPMTD.LIB, Multithreaded, static link, /MTd, _DEBUG, _MT
CRT Libraries
Most likely, the Release and Debug versions are linked against different versions of the C++ Runtime library. Usually, a Debug build links against the "Multithreaded Debug DLL" runtime, whereas a Release build will generally link against "Multithreaded DLL". Loading DLLs whose runtime libraries are mismatched with that of the application will often lead to mysterious crashes.
You could try verifying that all your DLLs build against the same runtime library as your application, regardless of which configuration (Debug or Release) is active. Whether or not this is desirable is another question entirely.
The ability to choose which runtime library to link with enables the application developer to select the best feature set given her application's requirements. For example, a single-threaded application might incur performance penalties as a result of unnecessary thread synchronization if it is linked with a runtime library that is designed with thread safety in mind. Likewise, the consequences of linking a multi-threaded application against a single-threaded runtime could potentially be disastrous.
While I never intentionally link libraries that were built with different compiler settings, there isn't much point in doing so, I only know of the STL classes in the Dinkumware implementation (the one MSFT uses) to cause this problem.
They support a feature called 'iterator debugging' which is turned on by default in the Debug configuration. This adds members to the classes to aid the diagnostic code. Making them larger. This goes bad when you create the object in a chunk of code that was compiled with one setting and pass it to code that was compiled with the opposite setting. You can turn this off by setting the _HAS_ITERATOR_DEBUGGING macro to 0. Which is rather a major loss, the feature is excellent to diagnose mistakes in the way you use STL classes.
Passing objects or pointers between different libraries is always a problem if you don't carefully control the compile settings. Mixing and matching the CRT version and flavor gets you in trouble when you do so. This normally generates a warning from the linker, not sure what you did to not see it. There won't be one if the code lives in a DLL.

Debugging using gdb - Best practices

I am a beginner in GDB and I got it working correctly. However, I am wondering how this is used in big projects. I have a project where build is done using makefile and g++. For GDB to work, we need to compile with debug symbols on, right (g++ -g files)?
Question
Do I need to create a new target in makefile something like "debug", so that I can make a debug build like make debug. Is this the best practice?
Suppose, I need to debug only foo.cpp and is it possible to generate debug symbols only for that other than building whole program including main?
Any thoughts?
Not needed, although you may want to consider always building with -g (sometimes, you may even need to try and debug optimized (-O1, -O2, etc) code; why not leave -g on? For releases, you can always just run strip on the binary.
Yes. Build just that file with -g .
I don't think there is a big difference between the usage of gdb in big, medium or small projects. However, for big projects you must consider the amount of space required for the build, because the debugging info increases the size of the object and executable files.
If you initially underestimate the need for debugging of the whole solution you will likely suffer from your decision in the future. It is always good when the build could be done with or without debugging information, so write your build scripts carefully.
Yes, but consider my previous answer. Sometimes the problem could be coming from a module for which you don't have debugging info.
In big projects here where I work we always build with most verbose debugging info possible (like, '-ggdb3' for native gdb format or '-gdwarf-2 -g3' for access to macros in gdb).
When we're done with debugging, we just use 'strip' command to strip all debugging info from the binaries.
gcc -ggdb3 blah.c -o blah
strip blah
gdb will work without the symbols; it's just that the output is much less useful then.
It's a matter of preference. I build everything by default in debug mode, and do make release when necessary.
Yes.
You can always have the debug version's saved somewhere, and if you ever need to re-bind the symbol info, after your debugging the stripped/release version, you can just go "file /path" and gdb will re-read the symbols for that target. Also you can use "symbol-file /path" to configure symbol information to be bound to a stripped file.