GDB jumps to wrong lines in out of order fashion - c++

Application Setup :
I've C++11 application consuming the following 3rd party libraries :
boost 1.51.0
cppnetlib 0.9.4
jsoncpp 0.5.0
The application code relies on several in-house shared objects, all of them developed by my team (classical link time against those shared objects is carried out, no usage of dlopen etc.)
I'm using GCC 4.6.2 and the issue appears when using GDB 7.4 and 7.6.
OS - Red Hat Linux release 7.0 (Guinness) x86-64
The issue
While hitting breakpoints within the shared objects code, and issuing gdb next command, sometimes GDB jumps backward to certain lines w/o any plausible reason (especially after exceptions are thrown, for those exceptions there suitable catch blocks)
Similar issues in the web are answered in something along the lines 'turn off any GCC optimization) but my GCC CL clearly doesn't use any optimization and asked to have debug information, pls note the -O0 & -g switches :
COLLECT_GCC_OPTIONS= '-D' '_DEBUG' '-O0' '-g' '-Wall' '-fmessage-length=0' '-v' '-fPIC' '-D' 'BOOST_ALL_DYN_LINK' '-D' 'BOOST_PARAMETER_MAX_ARITY=15' '-D' '_GLIBCXX_USE_NANOSLEEP' '-Wno-deprecated' '-std=c++0x' '-fvisibility=hidden' '-c' '-MMD' '-MP' '-MF' 'Debug_x64/AgentRegisterer.d' '-MT' 'Debug_x64/AgentRegisterer.d' '-MT' 'Debug_x64/AgentRegisterer.o' '-o' 'Debug_x64/AgentRegisterer.o' '-shared-libgcc' '-mtune=generic' '-march=x86-64'
Please also note as per Linux DSO best known methods, we have hidden visibility of symbols, only classes we would like to expose are being exposed (maybe this is related ???)
What should be the next steps in root causing this issue ?

This sort of problem is usually GIGO -- gdb is just acting in the way that the compiler has instructed it to act. So, it's typically a compiler bug rather than a gdb bug. I've seen this happen even with -O0 compilations. The example that comes to mind is that some versions of g++ emitted the location of a variable's declaration when emitting a call to the variable's destructor. This lead to this odd jumping behavior in otherwise straight-line code.

I had a code producing wrong output, and when I tried to debug it with gdb, the lines were jumping arbitrarily. Finally I figured that it was not a gdb problem but a bug in g++: when -O3 was used the last line of a constructor was getting skipped. If I put a printf line after that line, the code would work fine! After changing CFLAGS from -O3 to -O0, the code gave the correct output. I was using c++11 with gcc-5.4.0

When I had a similar problem on the STM32L4R9I Evaluation board, I changed from compiling with -Os to -O0 and now it's working like a charm.
Be sure that you don't have any other file defining compilation flags.

Related

Debug symbols not included in gcc-compiled C++

I am building a module in C++ to be used in Python. My flow is three steps: I compile the individual C++ sources into objects, create a library, and then run a setup.py script to compile the .pyx->.cpp->.so, while referring to the library I just created.
I know I can just do everything in one step with the Cython setup.py, and that is what I used to do. The reason for splitting it into multiple steps is I'd like the C++ code to evolve on its own, in which case I would just use the compiled library in Cython/python.
So this flow works fine, when there are no bugs. The issue is I am trying to find the source of a segfault, so I'd like to get the debugging symbols so that I can run with gdb (which I installed on OSX 10.14, it was a pain but it worked).
I have a makefile, which does the following.
Step 1: Compile individual C++ source files
All the files are compiled with the bare minimum flags, but -g is there:
gcc -mmacosx-version-min=10.7 -stdlib=libc++ -std=c++14 -c -g -O0 -I ./csrc -o /Users/colinww/system-model/build/data_buffer.o csrc/data_buffer.cpp
I think even here there is a problem: when I do nm -pa data_buffer.o, I see no debug symbols. Furthermore, I get:
(base) cmac-2:system-model colinww$ dsymutil build/data_buffer.o
warning: no debug symbols in executable (-arch x86_64)
Step 2: Compile cython sources
The makefile has the line
cd $(CSRC_DIR) && CC=$(CC) CXX=$(CXX) python3 setup_csrc.py build_ext --build-lib $(BUILD)
The relevant parts of setup.py are
....
....
....
compile_args = ['-stdlib=libc++', '-std=c++14', '-O0', '-g']
link_args = ['-stdlib=libc++', '-g']
....
....
....
Extension("circbuf",
["circbuf.pyx"],
language="c++",
libraries=["cpysim"],
include_dirs = ['../build'],
library_dirs=['../build'],
extra_compile_args=compile_args,
extra_link_args=link_args),
....
....
....
ext = cythonize(extensions,
gdb_debug=True,
compiler_directives={'language_level': '3'})
setup(ext_modules=ext,
cmdclass={'build_ext': build_ext},
include_dirs=[np.get_include()])
When this is run, it generates a bunch of compilation/linking commands like
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/colinww/anaconda3/include -arch x86_64 -I/Users/colinww/anaconda3/include -arch x86_64 -I. -I../build -I/Users/colinww/anaconda3/lib/python3.7/site-packages/numpy/core/include -I/Users/colinww/anaconda3/include/python3.7m -c circbuf.cpp -o build/temp.macosx-10.7-x86_64-3.7/circbuf.o -stdlib=libc++ -std=c++14 -O0 -g
and
g++ -bundle -undefined dynamic_lookup -L/Users/colinww/anaconda3/lib -arch x86_64 -L/Users/colinww/anaconda3/lib -arch x86_64 -arch x86_64 build/temp.macosx-10.7-x86_64-3.7/circbuf.o -L../build -lcpysim -o /Users/colinww/system-model/build/circbuf.cpython-37m-darwin.so -stdlib=libc++ -g
In both commands, the -g flag is present.
Step 3: Run debugger
Finally, I run my program with gdb
(base) cmac-2:sim colinww$ gdb python3
(gdb) run system_sim.py
It dumps out a ton of stuff related to system files (seems unrelated) and finally runs my program, and when it segfaults:
Thread 2 received signal SIGSEGV, Segmentation fault.
0x0000000a4585469e in cpysim::DataBuffer<double>::Write(long, long, double) () from /Users/colinww/system-model/build/circbuf.cpython-37m-darwin.so
(gdb) info local
No symbol table info available.
(gdb) where
#0 0x0000000a4585469e in cpysim::DataBuffer<double>::Write(long, long, double) () from /Users/colinww/system-model/build/circbuf.cpython-37m-darwin.so
#1 0x0000000a458d6276 in cpysim::ChannelFilter::Filter(long, long, long) () from /Users/colinww/system-model/build/chfilt.cpython-37m-darwin.so
#2 0x0000000a458b0d29 in __pyx_pf_6chfilt_6ChFilt_4filter(__pyx_obj_6chfilt_ChFilt*, long, long, long) () from /Users/colinww/system-model/build/chfilt.cpython-37m-darwin.so
#3 0x0000000a458b0144 in __pyx_pw_6chfilt_6ChFilt_5filter(_object*, _object*, _object*) () from /Users/colinww/system-model/build/chfilt.cpython-37m-darwin.so
#4 0x000000010002f1b8 in _PyMethodDef_RawFastCallKeywords ()
#5 0x000000010003be64 in _PyMethodDescr_FastCallKeywords ()
As I mentioned above, I think the problem starts in the initial compilation step. This has nothing to do with cython, I'm just calling gcc from the command line, passing the -g flag.
(base) cmac-2:system-model colinww$ gcc --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/c++/4.2.1
Apple LLVM version 10.0.1 (clang-1001.0.46.4)
Target: x86_64-apple-darwin18.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
Any help is appreciated, thank you!
UPDATE
I removed the gcc tag and changed it to clang. So I guess now I'm confused, if Apple will alias gcc to clang, doesn't that imply that in "that mode" it should behave like gcc (and, implied, someone made sure it was so).
UPDATE 2
So, I never could get the debug symbols to appear in the debugger, and had to resort to lots of interesting if-printf statements, but the problem was due to an index variable becoming undefined. So thanks for all the suggestions, but the problem is more or less resolved (until next time). Thanks!
The macOS linker doesn't link debug information into the final binary the way linkers usually do on other Unixen. Rather it leaves the debug information in the .o files and writes a "debug map" into the binary that tells the debugger how to find and link up the debug info read from the .o files. The debug map is stripped when you strip your binary.
So you have to make sure that you don't move or delete your .o files after the final link, and that you don't strip the binary you are debugging before debugging it.
You can check for the presence of the debug map by doing:
$ nm -ap <PATH_TO_BINARY> | grep OSO
You should see output like:
000000005d152f51 - 03 0001 OSO /Path/To/Build/Folder/SomeFile.o
If you don't see that the executable probably got stripped. If the .o file is no around then somebody cleaned your build folder earlier than they should have.
I also don't know if gdb knows how to read the debug map on macOS. If the debug map entries and the .o files are present, you might try lldb and see if that can find the debug info. If it can, then this is likely a gdb-on-macOS problem. If the OSO's and the .o files are all present, then something I can't guess went wrong, and it might be worth filing a bug with http://bugreporter.apple.com.

g++ arm-none-eabi upgrade from 4.9 to gcc 8.2. Generated binary do not fit any more in flash

I recently updated my Linux laptop from Ubuntu 16.04 to 18.04.
I had a STM32 (Cortex-M4) Makefile based project that compiled correctly with the arm-none-eabi g++ version provided by Ubuntu. The generated file required 47620 bytes in the .text section.
With the Ubuntu upgrade, I have also installed an up-to-date version of gcc (from ARM website). Version is 8.2.1.
When I compile the same project (make clean && make), the generated binary do not fit in flash (97424 bytes required, more than twice!). The project is exactly the same (sources, link script, startup files, Makefile).
The compiler options are: -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -DSTM32F303x8 -DARMCM4 -O0 -g -Wall -fexceptions -Wno-deprecated.
The linker options are -mthumb -mcpu=cortex-m4 -Tstm32f303K8.ld -mfloat-abi=hard -mfpu=fpv4-sp-d16 --specs=nosys.specs -lm -Wl,--start-group -lm -Wl,--end-group -Wl,--gc-sections -Lsys -Xlinker -Map=test.elf.map
When I look at the .Map generated file, all the user functions take approximatively the same size (new version saves 8 bytes!). But after, it includes C++ sepcific parts, and one is more than 26Kb (from map file):
.text 0x00000000080079e8 0x683c /usr/local/gcc-arm-none-eabi-8-2018-q4-major/bin/../lib/gcc/arm-none-eabi/8.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/libstdc++.a(cp-demangle.o)
0x000000000800e13c __cxa_demangle
Note: there is no problem with C only projects, only with C++. The library included are the same (gcc 4.9.3 -> armv7e-m/fpu, and gcc 8.2.1 -> thumb/v7e-m+fp/hard):
libm.a libstdc++.a libc.a libnosys.a libgcc.a
Is there a way to get rid of that so that I can compile and flash my (no so old) project?
regards,
I found a solution using the libstdc++_nano (instead of implicit libstc++). With that, the code size is reduced from 84kb to 26kb!
LDFLAGS += -lstdc++_nano
It just works. Thanks #Henrik, #Matthieu and #EOF for your support!
It might be related to exception handling, as std::terminate(), which is used with exceptions, might call the demangling routine. If you don't need exceptions then try disabling them with -fno-exceptions as described here.
Another solution might be to look at the GCC headers:
Demangling routine.
ABI-mandated entry point in the C++ runtime library for demangling.
[...]
returns a pointer to the start of the NUL-terminated demangled
name, or NULL if the demangling fails. The caller is
responsible for deallocating this memory using free.
The prototype is:
char*
__cxa_demangle(const char* __mangled_name, char* __output_buffer,
size_t* __length, int* __status);
So you could probably just supply your own dummy function returning NULL (Given that all library functions are weak, and can be overridden). I'll advise you to look at the disassembled code first though, and find out how and why it is being called in the first place, since it might change behaviour to just discard functionality).
They also give other advise in This forum post, which might be useful for you as well:
Optimize for size with -Os instead of -O0 (possibly add the -Og option instead, if you prefer easily debuggable code, it is often both smaller and faster than -O0).
Optimize at link-time with -flto while compiling and linking.
Maybe disable RTTI if not used.

Cannot find -lubsan on using -fsanitize=undefined (mingw-w64)

I'm using mingw-w64 (gcc 7.3.0) and when I compile any C++ program using the following command:
g++ file.cpp -fsanitize=undefined
I get the following error:
...mingw32/bin/ld.exe: cannot find -lubsan
I'm able to successfully compile and run my programs if I remove the -fsanitize=undefined flag, though. After some research I found out that this means the library ubsan (Undefined Behavior Sanitizer) is missing, but I couldn't find anything about the library. How do I fix this?
This is well known issue with mingw see for instance this msys github issue. No proper solution known but there are several WAs.
Install WSL, ubuntu over WSL and you will have ubsan inside it
Build GCC under Windows from source enabling sanitizers build. They are present in GCC sources they are just not here in mingw.
Use -fsanitize=undefined -fsanitize-undefined-trap-on-error to just not use libubsan rich logging capabilities but get it trap on undefined instruction.
Hope one of this helps.

Code size is doubled when compiling with GCC ARM Embedded?

I've just ported a STM32 microcontroller project from Keil uVision (using Keil ARM Compiler) to CooCox CoIDE (using GCC ARM Embedded compiler).
Problem is, the code size is the double size when compiled in CoIDE with GCC compared to Keil uVision.
How can this be? What can I do?
Code size in Keil: 54632b (.text)
Code size in CoIDE: 100844b (.text)
GCC compiler flags:
arm-none-eabi-gcc -mcpu=cortex-m3 -mthumb -g2 -Wl,-Map=project.map -Os
-Wl,--gc-sections -Wl,-TC:\arm-gcc-link.ld -g -o project.elf -L -lm
I am suspecting CoIDE and GCC to compile a lot of functions and files, that are present in the project, though aren't used (yet). Is it possible that it compiles whole files even if I only use 1 function out of 20 in there? (even though I have -Os)..
Hard to say which files are really compiled/linked in your final binary from the information you give. I suppose it takes all the C files it finds on your project if you did not explicitly specified which one to compile or if you don't use your own Makefile.
But from the compiler options you give, the linker flag --gc-sections won't do much garbage if you don't have the following compiler flags: -ffunction-sections -fdata-sections. Try to add those options to strip all unused functions and data at link time.
Since the question was tagged with C++, I wonder if you would like to disable exceptions and RTTI. Those take quite a bit of code. Add -fno-exceptions -fno-rtti to linker flags.

LLVM & Clang can't compile for a supported arch

Under Ubuntu 64 bit I got
llc --version
LLVM (http://llvm.org/):
LLVM version 3.1
Optimized build with assertions.
Built Oct 15 2012 (18:15:59).
Default target: x86_64-pc-linux-gnu
Host CPU: btver1
Registered Targets:
arm - ARM
mips - Mips
mips64 - Mips64 [experimental]
mips64el - Mips64el [experimental]
mipsel - Mipsel
thumb - Thumb
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64
I can't do this
clang -march=arm -x c++ /tmp/cpp.cpp
error: unknown target CPU 'arm'
I'm missing something here ? Why I can't compile for ARM ?
-march is LLVM's internal tools command line option and is not connected with clang at all. If you need to compile for other target you need to specify the target triplet. This can be done in several ways (I do not remember offhand, whether they work with 3.1, but they definitely work with 3.2):
Make a link from clang to your-target-triple-clang, e.g. to
arm-none-linux-gnueabi-clang and compile everything via it
Provide -target option, e.g. clang -target arm-none-linux-gnueabi
To get a list of options of the clang compiler, use:
clang -cc1 -help
To specify the target, use -triple:
clang -cc1 -triple "arm-vendor-os" filename
where "vendor" and "os" should be replaced with the actual vendor and OS name. It can also be replaced with unknown.
-triple is a string of the form ARCHITECTURE-VENDOR-OS or ARCHITECTURE-VENDOR-OS-ENVIRONMENT. For example: x86_64-apple-darwin10
the llvm linker links for the host, which is only one of the targets, it wont link to every target in the list. it will definitely compile for any target. Basically clang goes from C/C++ to bytecode, then llc takes bytecode and makes assembly for the specific target (new experrimental option to take the bytecode straight to object file) then you need to get a cross assembler and a cross linker to take it the final mile (I use gnu binutils). Unfortunately I found that clang to bytecode is not completely generic (I had hoped and expected that it would be), it does in fact change the target independent output based on the target. The example below using the host triple instead of using -march allowed for my examples to build properly on more hosts.
ARMGNU?=arm-none-eabi
LOPS = -Wall -m32 -emit-llvm -ccc-host-triple $(ARMGNU)
OOPS = -std-compile-opts
LLCOPS = -march=thumb -mtriple=$(ARMGNU)
clang $(LOPS) -c blinker03.c -o blinker03.clang.bc
opt $(OOPS) blinker03.clang.bc -o blinker03.clang.thumb.opt.bc
llc $(LLCOPS) blinker03.clang.thumb.opt.bc -o blinker03.clang.thumb.opt.s
$(ARMGNU)-as blinker03.clang.thumb.opt.s -o blinker03.clang.thumb.opt.o
$(ARMGNU)-ld -o blinker03.clang.thumb.opt.elf -T memmap vectors.o blinker03.clang.thumb.opt.o
I have not, but before long will experiment with using the llc straight to object (actually I tried it on a simple test but have not used it on anything larger or posted it anywhere).
You're confusing your flags. clang's -march= wants a processor family. You probably meant to use clang -arch arm instead.
As this comment says this option it's not supported yet under linux, for now.
"-arch arm" is equivalent to "-arch armv4t" in clang. I suppose that a generic "arm" target is not allowed with "-march=", which should require something more precise, such as "armv6", "thumbv7", "armv4t", ...
Try selecting a specific subarch.
Starting Clang 11 (trunk), the list of supported target architectures could be handily printed using the newly added -print-targets flag.