How to debug GCC/LD linking process for STL/C++ - c++

I'm working on a bare-metal cortex-M3 in C++ for fun and profit. I use the STL library as I needed some containers. I thought that by simply providing my allocator it wouldn't add much code to the final binary, since you get only what you use.
I actually didn't even expect any linking process at all with the STL
(giving my allocator), as I thought it was all template code.
I am compiling with -fno-exception by the way.
Unfortunately, about 600KB or more are added to my binary. I looked up what symbols are included in the final binary with nm and it seemed a joke to me. The list is so long I won't try and past it. Although there are some weak symbols.
I also looked in the .map file generated by the linker and I even found the scanf symbols
.text
0x000158bc 0x30 /CodeSourcery/Sourcery_CodeBench_Lite_for_ARM_GNU_Linux/bin/../arm-none-linux-gnueabi/libc/usr/lib/libc.a(sscanf.o)
0x000158bc __sscanf
0x000158bc sscanf
0x000158bc _IO_sscanf
And:
$ arm-none-linux-gnueabi-nm binary | grep scanf
000158bc T _IO_sscanf
0003e5f4 T _IO_vfscanf
0003e5f4 T _IO_vfscanf_internal
000164a8 T _IO_vsscanf
00046814 T ___vfscanf
000158bc T __sscanf
00046814 T __vfscanf
000164a8 W __vsscanf
000158bc T sscanf
00046814 W vfscanf
000164a8 W vsscanf
How can I debug this? For first I wanted to understand what exactly GCC is using for linking (I'm linking through GCC). I know that if symbol is found in a text segment, the
whole segment is used, but still that's too much.
Any suggestion on how to tackle this would really be appreciated.
Thanks

Using GCC's -v and -Wl,-v options will show you the linker commands (and version info of the linker) being used.
Which version of GCC are you using? I made some changes for GCC 4.6 (see PR 44647 and PR 43863) to reduce code size to help embedded systems. There's still an outstanding enhancement request (PR 43852) to allow disabling the inclusion of the IO symbols you're seeing - some of them come from the verbose terminate handler, which prints a message when the process is terminated with an active exception. If you're not using execptions then some of that code is useless to you.

The problem is not about the STL, it is about the Standard library.
The STL itself is pure (in a way), but the Standard Library also includes all those streams packages and it seems that you also managed to pull in the libc as well...
The problem is that the Standard Library has never been meant to be picked apart, so there might not have been much concern into re-using stuff from the C Standard Library...
You should first try to identify which files are pulled in when you compile (using strace for example), this way you can verify that you only ever use header-only files.
Then you can try and remove the linking that occurs. There are options to pass to gcc to precise that you would like a standard library-free build, something like --nostdlib for example, however I am not well versed enough in those to instruct you exactly here.

Related

Is it allowed to name a global variable `read` or `malloc` in C++?

Consider the following C++17 code:
#include <iostream>
int read;
int main(){
std::ios_base::sync_with_stdio(false);
std::cin >> read;
}
It compiles and runs fine on Godbolt with GCC 11.2 and Clang 12.0.1, but results in runtime error if compiled with a -static key.
As far as I understand, there is a POSIX(?) function called read (see man read(2)), so the example above actually invokes ODR violation and the program is essentially ill-formed even when compiled without -static. GCC even emits warning if I try to name a variable malloc: built-in function 'malloc' declared as non-function
Is the program above valid C++17? If no, why? If yes, is it a compiler bug which prevents it from running?
The code shown is valid (all C++ Standard versions, I believe). The similar restrictions are all listed in [reserved.names]. Since read is not declared in the C++ standard library, nor in the C standard library, nor in older versions of the standard libraries, and is not otherwise listed there, it's fair game as a name in the global namespace.
So is it an implementation defect that it won't link with -static? (Not a "compiler bug" - the compiler piece of the toolchain is fine, and there's nothing forbidding a warning on valid code.) It does at least work with default settings (though because of how the GNU linker doesn't mind duplicated symbols in an unused object of a dynamic library), and one could argue that's all that's needed for Standard compliance.
We also have at [intro.compliance]/8
A conforming implementation may have extensions (including additional library functions), provided they do not alter the behavior of any well-formed program. Implementations are required to diagnose programs that use such extensions that are ill-formed according to this International Standard. Having done so, however, they can compile and execute such programs.
We can consider POSIX functions such an extension. This is intentionally vague on when or how such extensions are enabled. The g++ driver of the GCC toolset links a number of libraries by default, and we can consider that as adding not only the availability of non-standard #include headers but also adding additional translation units to the program. In theory, different arguments to the g++ driver might make it work without the underlying link step using libc.so. But good luck - one could argue it's a problem that there's no simple way to link only names from the C++ and C standard libraries without including other unreserved names.
(Does not altering a well-formed program even mean that an implementation extension can't use non-reserved names for the additional libraries? I hope not, but I could see a strict reading implying that.)
So I haven't claimed a definitive answer to the question, but the practical situation is unlikely to change, and a Standard Defect Report would in my opinion be more nit-picking than a useful clarification.
Here is some explanation on why it produces a runtime error with -static only.
The https://godbolt.org/z/asKsv95G5 link in the question indicates that the runtime error with -static is Program returned: 139. The output of kill -l in Bash on Linux contains 11) SIGSEGV (and 128 + 11 = 139), so the process exits with fatal signal SIGSEGV (Segmentation fault) indicating invalid memory reference. The reason for that is that the process tries to run the contents (4 bytes) of the read variable as machine code. (Eventually std::cin >> ... calls read.) Either somethings fails in those 4 bytes accidentally interpreted as machine code, or it fails because the memory page containing those 4 bytes is not executable.
The reason why it succeeds without -static is that with dynamic linking it's possible to have multiple symbols with the same name (read): one in the program executable, and another one in the shared library (libc.so.6). std::cin >> ... (in libstdc++.so.6) links against libc.so.6, so when the dynamic linker tries to find the symbol read at program load time (to be used by libstdc++.so.6), it will look at libc.so.6 first, finding read there, and ignoring the read symbol in the program executable.

What is gcc compiler option "-unsigned" meant for?

I am compiling legacy code using gcc compiler 4.8.5 on RHEL7. All C and C++ files have "-unsigned" as one of the default flags for both gcc and g++ compilation. The compiler accepts this option and compiles successfully. However, I cannot find any documentation of this option anywhere either in gcc manual or online. Does anyone know what this option is? I have to port the code and am confused whether this compiler option needs to be ported or not.
I suspect it was just a mistake in the Makefile, or whatever is used to compile the code.
gcc does not support a -unsigned option. However, it does pass options to the linker, and GNU ld has a -u option:
'-u SYMBOL'
'--undefined=SYMBOL'
Force SYMBOL to be entered in the output file as an undefined
symbol. Doing this may, for example, trigger linking of additional
modules from standard libraries. '-u' may be repeated with
different option arguments to enter additional undefined symbols.
This option is equivalent to the 'EXTERN' linker script command.
The space between -u and the symbol name is optional.
So the intent might have been to do something with unsigned types, but the effect, at least with modern versions of gcc, is to enter nsigned in the output file as an undefined symbol.
This seems to have no effect in the quick test I did (compiling and running a small "hello, world" program). The program runs correctly, and the output of nm hello includes:
U nsigned
With an older version of gcc (2.95.2) on a different system, I get a fatal link-time error about the undefined symbol.
I suspect the intention is for the program to be compiled with -funsigned-char. I don't know whether this instead used to be -unsigned, back in dinosaur times, and perhaps GCC actually still heeds that spelling; there's no way to know from the manual because, if it does, it's an undocumented feature.
I'd ask the original author for the intention here, since they didn't see fit to document.

Barebones C++ without standard library?

Compilers such as GCC and Clang allow to compile C++ programs without the C++ standard library, e.g. using the -nostdlib command line flag. It seems that such often fail to link thou, for example:
void f() noexcept { throw 42; }
int main() { f(); }
Usually fails to link due to undefined symbols like __cxa_allocate_exception, typeinfo for int, __cxa_throw, __gxx_personality_v0, __clang_call_terminate, __cxa_begin_catch, std::terminate() etc.
Even a simple
int main() {}
Fails to link with
ld: warning: cannot find entry symbol _start; defaulting to 0000000000400120
and is killed by the OS upon execution. Using -c the compiler still runs the linker which blatantly fails with:
ld: error in mytest(.eh_frame); no .eh_frame_hdr table will be created.
Is it a realistic goal to program and compile C++ applications or libraries without using and linking to the standard library? How can I compile my code using GCC or Clang on Linux? What core language features would one be unable to use without the standard library?
You will basically find all of your questions answered at osdev.org, but I'll give a brief summary anyway.
When you give GCC -nostdlib, you are saying "no startup or library files". This includes:
crti.o, crtbegin.o, crtend.o and crtn.o. Generally kernel developers only care about implementing crti.o and crtend.o and let GCC supply crtbegin.o and crtend.o by passing -print-file-name= to the linker. Generally these are just stubs that consist of .init and .fini respectively, leaving room for GCC to shove the contents of crtbegin.o and crtend.o respectively. These files are necessary for calling global constructors/destructors.
You can't avoid linking libgcc (the "low-level runtime library" (-lgcc) because even if you pass -nostdlib GCC will emit calls to its functions whenever you use it, leading to inexplicable linking errors for seemingly no reason. This is the case even when you're implementing/porting a C library.
You don't "need" libstdc++ no, but typically kernel developers want it. Porting a C library then implementing the C++ standard library from scratch is an extremely difficult task.
Since you only want to get rid of the "standard library", but keeping libc (on a Linux system) you're essentially programming C++ with just a C library. Of course, there's nothing wrong with this and you do you, but ultimately I don't see the point unless you plan on developing a kernel.
Required reading:
OSDev's C++ page - If you really care about RTTI/exception support, it's more annoying to implement than it sounds. Typically people just pass -fno-rtti or -fno-exceptions and then worry about it down the line or not at all.
"Standard" is a misnomer. In this context it doesn't mean "the library (set of functions, classes etc) as defined by the C++ standard" but "the usual set of libraries and objects (compiled files in a certain format) gcc links with by default". Some of those are necessary for most or even all programs to function.
If you use this flag, it's your responsibility to provide any missing functionality. There are several ways to do so:
Cherry-pick libraries and objects that your program really needs out of the default set. (Makes little sense as the result will most probably be exactly the same as with the default link flags).
Provide your own implementation of missing functionality.
Explicitly disable, through compiler flags, language features your program isn't using. I know of two such features: exceptions and RTTI. This is needed because the compiler needs to generate exceptions-related code and RTTI info even if these features are not explicitly used in this module.

maps in shared memory: Boost.Interprocess demo fails due to unmet date_time dependency

I want to create shared map objects that multiple processes can access. The most promising approach I've found is this demo code from Boost.Interprocess, which allocates map objects in a managed shared memory segment. This question will mostly be about the boost problems I'm having, but I'd also be grateful if anyone has non-boost alternative approaches.
I'm completely new to boost: it looks amazing, if huge, and I was encouraged by its claim that "often, there's nothing to build". But in this case that promise is broken in what seems to be a senseless way, and I'm failing to compile the demo because of dependency problems internal to boost.
I'm on Windows, with Visual C++ Express 2010 installed. After saving the demo code as shmap.cpp I do the following:
"%VS100COMNTOOLS%\..\..\VC\vcvarsall.bat"
cl /EHsc /I boost_1_57_0 shmap.cpp
It compiles OK, but then I get this:
LINK : fatal error LNK1104: cannot open file 'libboost_date_time-vc100-mt-s-1_57.lib'
This surprises me on a number of levels. (Q1): I didn't ask for libraries---where and how is boost leading the linker to expect them? (Q2): Why would it be asking for date_time in particular? At no point in the code is anything as functionally specific as a date or time computed, referenced or included. Is this a case of overzealous blanket dependency, and if so is there a way I can weed it out?
Regardless, the obvious first thing to try was to play the game: in the boost_1_57_0 directory I ran bootstrap.bat followed by b2. The Earth turned a good few degrees, boost was built successfully, and I retried with:
cl /EHsc /I boost_1_57_0 shmap.cpp /link /LIBPATH:boost_1_57_0\stage\lib
I still get the same linker error. This is because b2 seems to have built libs with -mt- and with -mt-gd- in their names, but not with the -mt-s- that the linker is looking for. Boost's "Getting Started" webpage tells me what these stand for but doesn't tell me (Q3): how can I change either the type of library that gets built, or the type that the linker expects?
"At no point in the code is anything as functionally specific as a date or time computed, referenced or included."
(Q2): Why would it be asking for date_time in particular?
Apparently the things you used depend on it.
E.g the mutex operations have timed_lock function
(Q1): I didn't add libraries to the project---where and how is boost leading the linker to expect them?
Boost does autolinking by default. This uses MSVC++ specific pragmas to indicate the right flavour of the right link libraries. This is an awesome feature.
You just have to make sure the import libraries are on the library path for your project.
There are ways to disable auto-linking in boost (I think it involves defining BOOST_ALL_NO_LIB)
There might be ways to
disable dependency on boost date_time (dropping features); see the autl-link description in the Getting Started guide
linking to date_Time statically (or make it header-only)
I'd refer to the documentation for this.
Here's what I've learned, in large part thanks to sehe:
Q1: It's magic---specifically, MSVC-specific magic---and it happens because it's necessary.
Q2: It becomes unnecessary---i.e. the demo can be compiled without needing to look for a binary date_time lib---if I add /DBOOST_ALL_NO_LIB to the compile flags. But it's unclear whether that will still be true once I start to use additional IPC functionality like time-dependent mutexing.
Q3: Strings from the "Boost.Build option" column of this table can be passed to b2, so the way to create *-mt-s-*.lib is to say b2 runtime-link=static. This finally lets me compile without the /DBOOST_ALL_NO_LIB flag, and discover that date_time is the only library the demo seems to need.
I also discovered that the dependencies can be tracked with the bcp tool, and (eventually) also how to build bcp in the first place, as follows:
build:
cd boost_1_57_0
bjam tools\bcp
cd ..
report:
boost_1_57_0\dist\bin\bcp.exe --boost=boost_1_57_0 --report --scan shmap.cpp report.html
The result is that the maps-in-shared-memory demo needs 1421 files from boost 1.57.0.

Is it possible to symbolicate C++ code?

I have been running into trouble recently trying to symbolicate a crash log of an iOS app. For some reason the UUID of the dSYM was not indexed in Spotlight. After some manual search and a healthy dose of command line incantations, I managed to symbolicate partially the crash log.
At first I thought the dSYM might be incomplete or something like that, but then I realized that the method calls missing were the ones occurring in C++ code: this project is an Objective-C app that calls into C++ libraries (via Objective-C++) which call back to Objective-C code (again, via Objective-C++ code). The calls that I'm missing are, specifically, the ones that happen in C++ land.
So, my question is: is there some way that the symbolication process can resolve the function calls of C++ code? Which special options do I need to set, if any?
One useful program that comes with the apple sdk is atos (address to symbol). Basically, here's what you want to do:
atos -o myExecutable -arch armv7 0x(address here)
It should print out the name of the symbol at that address.
I'm not well versed in Objective-C, but I'd make sure that the C++ code is being compiled with symbols. Particularly, did you make sure to include -rdynamic and/or -g when compiling the C++ code?
try
dwarfdump --lookup=0xYOUR_ADRESS YOUR_DSYM_FILE
you will have to look up each adress manually ( or write a script to do this ) but if the symbols are ok ( your dSym file is bigger than say 20MB) this will do the job .