Pure virtual method called - cross compiling - c++

I'm writing an event-based programming library for use on the BeagleBone Black and have encountered a strange error.
When I compile the exact same code with the exact same flags I receive the following errors on the ARM-based processor, but not when I run the code compiled for my x86 computer.
$ ./missionControl
pure virtual method called
pure virtual method called
pure virtual method called
terminate called recursively
terminate called recursively
Aborted
When I compile and run on my laptop, the program runs correctly.
This is roughly the command I'm using to compile (I actually use a Makefile, but both methods of compilation exhibit exactly the same behavior):
g++ -std=gnu++11 -pthread -O3 -D_GLIBCXX_USE_NANOSLEEP -o missionControl `find . -name '*.cpp'`
It doesn't matter whether I cross-compile with Ubuntu's arm-linux-gnueabi-g++ or the ARM-compatible g++ on the actual BeagleBone, I still get errors on ARM.
My question is this: What could be causing this error, and what can I do to try to find the source? Why would this happen on one processor architecture, but not another, for the same version of G++?
Thanks!
Here's a backtrace from the ARM processor's GDB:
#0 0xb6d4adf8 in raise () from /lib/libc.so.6
#1 0xb6d4e870 in abort () from /lib/libc.so.6
#2 0xb6f50ab4 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#3 0xb6f4ea4c in ?? () from /usr/lib/libstdc++.so.6
#4 0xb6f4ea4c in ?? () from /usr/lib/libstdc++.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

The problem turned out to be caused by a bug in the ARM version of libstdc++ that runs on the BeagleBone. A small toy program with no virtual functions at all produces the same error ("pure virtual method called") when a std::thread is created.
I'm going to try to compile a custom version of gcc/libstdc++ 4.8 on the BeagleBone itself -- even if it takes a long time.
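For anyone trying to reproduce this, a toy program along these lines is a quick way to test a toolchain. It is a hypothetical reconstruction, not the program described above, and the function name thread_smoke_test is made up. Because it declares no virtual functions, a "pure virtual method called" abort while running it suggests a toolchain or runtime problem rather than a bug in user code. Build it with the same flags as the real project (e.g. -std=gnu++11 -pthread).

```cpp
#include <atomic>
#include <thread>

// Starts one std::thread, waits for it to finish and reports whether it ran.
// No user-declared virtual functions are involved.
bool thread_smoke_test() {
    std::atomic<bool> ran{false};
    std::thread worker([&ran] { ran = true; });
    worker.join();
    return ran.load();
}
```

Call it from main on both the x86 and ARM builds. If only the ARM build aborts before the function returns, the library is the likely suspect.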

The "pure virtual method called" error occurs when you use dynamic dispatch to call a function that is pure virtual in a base class, either before the derived type that implements it has been constructed or after it has been destroyed.
The most common cause is a base class constructor or destructor that calls, directly or indirectly, a virtual function that is pure at that level. Apart from that, as has been pointed out in some comments, accessing an object that has already been destroyed can produce the same error.
Just attach a debugger to the program and see what virtual function is called and from where.
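The rule behind this can be shown safely with a non-pure virtual function. While Base's constructor runs, the object's dynamic type is Base, so a virtual call made from it resolves to Base's version and never to Derived's. If Base::name were pure virtual, the same indirect call would reach the runtime's pure-virtual handler and abort. With GCC/libstdc++ that handler is __cxa_pure_virtual, so setting `break __cxa_pure_virtual` in gdb should stop at the offending call and show its caller. The class names below are illustrative only.

```cpp
#include <string>

struct Base {
    std::string seen_during_ctor;

    // Calls a non-virtual helper that dispatches virtually. During this
    // constructor the dynamic type is still Base.
    Base() { seen_during_ctor = describe(); }
    virtual ~Base() = default;

    virtual std::string name() const { return "Base"; }
    std::string describe() const { return name(); }
};

struct Derived : Base {
    std::string name() const override { return "Derived"; }
};
```

After construction finishes, `Derived d; d.describe()` returns "Derived", but `d.seen_during_ctor` holds "Base".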

See: https://groups.google.com/forum/#!topic/automatak-dnp3/Jisp_zGhd5I
And: Why does this simple c++11 threading-example fail, when compiled with clang 3.2?
Now, I have no idea why this works, but it does for me at least. Add the following four preprocessor definitions to the compiler command line:
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_1
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_2
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
I haven't experimented to see if they are all required, or whether or not you can get away with only some. But this solved the problem for me. Thanks to those who wrote the above answers, and thanks to my colleague for out-googling me :)
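The answer does not explain why these macros help. One plausible first check is whether your ARM toolchain predefines them for the target at all. Running `g++ -dM -E -x c++ /dev/null | grep SYNC_COMPARE_AND_SWAP` lists them, and the sketch below does the same from inside a program. The function name is hypothetical. Compile it with each toolchain and the exact flags used for the real build, then compare the results.

```cpp
#include <vector>

// Returns the operand sizes (in bytes) for which the compiler predefines
// __GCC_HAVE_SYNC_COMPARE_AND_SWAP_<N> for the current target and flags.
std::vector<int> predefined_sync_cas_sizes() {
    std::vector<int> sizes;
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1
    sizes.push_back(1);
#endif
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2
    sizes.push_back(2);
#endif
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
    sizes.push_back(4);
#endif
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
    sizes.push_back(8);
#endif
    return sizes;
}
```

If the ARM build returns fewer sizes than the x86 build, that difference is at least consistent with the workaround above having an effect.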

Related

Meaning of this=<optimized out> in GDB

I understand the general idea that optimization flags like -O2 can cause things to be optimized out, and that makes sense. But what does it mean for the 'this' function parameter in a gdb frame to be optimized out? Does it mean the use of an object was determined to be entirely pointless, and that the object and the following function call were elided? Is it a sign that the function was inlined? Is it a sign that the function call was elided?
How would I go about investigating further? This occurs with both -O0 and -Og.
If it makes any difference, this is on an ARM processor. I'm doing remote debugging using GNU gdbserver (GDB) 7.12.1.20170417-git and 'gdb-multiarch' GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1.
But what does it mean for the 'this' function parameter in a gdb frame to be optimized out?
It means that GDB doesn't have sufficient debug info to understand the current value of this.
It could happen for two reasons:
1. the compiler failed to emit relevant debug info
2. the info is there, but GDB failed to understand it
GCC used to do (1) a lot at -O2 and higher optimization levels, but that improved significantly around 2015-2016. I have never seen <optimized out> with GCC at -O0.
Clang still does (1) with -O2 and above on x86_64 in 2022, but again I've never seen it do that at -O0.
How would I go about investigating further?
You can run readelf --debug-dump ./a.out to see what debug info is present in the binary. Beware: there is a lot of it, and making sense of it requires understanding what's supposed to be there.
Or you could file a bugzilla issue with exact compiler and debugger versions and compilation command, attach a small binary, and hope that someone will look.
But first make sure you still get this behavior from the latest released version of GCC and GDB (or the current tip-of-trunk versions if you can build them).
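When filing such a report, a reproducer as small as the one below is easier for others to act on. It is a hypothetical sketch: the class and function names are made up, and noinline is a GCC/Clang attribute used here only so that bump() keeps its own frame with a real `this` parameter. Build it with the same flags as in the question (e.g. -O0 -g or -Og -g), then in gdb run `break Counter::bump`, `run` and `bt`. The frame line should show this=0x... rather than this=<optimized out>.

```cpp
class Counter {
public:
    // Kept out of line so gdb has a distinct frame and `this` to show.
    __attribute__((noinline)) int bump(int by) {
        value_ += by;
        return value_;
    }

private:
    int value_ = 0;
};

// Drives the member function twice; returns the final counter value.
int run_repro() {
    Counter c;
    c.bump(1);
    return c.bump(2);
}
```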

std::thread weak when using -static-libstdc++, thus causing crash at runtime

I need to build a portable shared object, which is a plugin for another piece of software on Linux. I did some reading on the subject and came to the conclusion that I should build a sysrooted gcc (gcc 5.4.0, if it matters) with a decently old glibc (to provide compatibility with older systems), and link with -static-libstdc++ and -static-libgcc. That should give me something that depends only on the host's glibc and some other minor libraries that are always present.
I did all that, and now I am experiencing a weird crash: a segmentation fault happens where the code constructs a std::thread, and gdb shows that the stack frame is inside libstdc++.so.6. It shouldn't be there, since ldd on my shared object does not list libstdc++.so. The top of the stack at the crash is:
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff79075e3 in std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6 # THIS SHOULD NOT BE HERE RIGHT?
#2 0x00007ffff5a25a5c in std::thread::thread<void (ReferenceAnalytics::*)(std::timed_mutex&), ReferenceAnalytics*&, std::reference_wrapper<std::timed_mutex> >
(this=0x7fffffffcf40, __f=
#0x7fffffffcf60: (void (ReferenceAnalytics::*)(ReferenceAnalytics * const, std::timed_mutex &)) 0x7ffff5a1750c <ReferenceAnalytics::WorkerThreadMethod(std::timed_mutex&)>)
at /home/developer/Toolchains/x86_64-unknown-linux-gnu/x86_64-unknown-linux-gnu/include/c++/5.4.0/thread:137 # Looks like my toolchain
So I did some reading, and then discovered with nm that my shared object has all the std::thread functions (constructor, destructor, swap, ...) defined as weak symbols. I assume this causes a collision when the host that loads the plugin uses the dynamic libstdc++: my calls are routed there and all hell breaks loose. Is this right?
Further googling and reading did not tell me how to control this, i.e. how to force the std::thread symbols to resolve to the static libstdc++ from my sysrooted gcc. How can I do that?
Moreover, I made a small executable that just calls dlopen on my shared object and then calls a method that internally constructs the thread. If the executable is also built with -static-libstdc++, all is well; if not, the crash happens. So I assume my theory about the weak std::thread symbols being resolved to the host's libstdc++ is correct, but how do I solve this?
If you statically link a DSO against libstdc++ without hiding the libstdc++ symbols, and the main program is linked against libstdc++ as well, then the symbol definitions in the main program will interpose/preempt the definitions in the DSO when it is opened with dlopen.
However, because the main program is not linked against libpthread, the system libstdc++ DSO in the process image saw that the libpthread symbols were unavailable (null) and therefore disabled thread support. Your DSO, on the other hand, needs thread support, but it cannot get that from the system libstdc++.
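The "unavailable (null)" detection relies on weak references. An undefined weak symbol resolves to a null address instead of causing a link error, and code can test for that at run time. As I understand it, libstdc++'s threading layer uses a pattern of this kind for libpthread functions. The sketch below is only a generic illustration, not libstdc++ code, and the symbol name is hypothetical and deliberately never defined.

```cpp
// An undefined weak reference: linking succeeds, and the address is null
// unless some loaded object provides a definition.
extern "C" void hypothetical_optional_feature() __attribute__((weak));

// Returns true only if a definition was found when symbols were bound.
bool optional_feature_available() {
    return hypothetical_optional_feature != nullptr;
}
```

The decision is made once, when the symbols are bound. That is why loading libpthread later does not "switch on" threading in a library that has already concluded it is absent.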
As an immediate workaround, you can hide all the statically linked libstdc++ symbols in the DSO. Then no interposition will take place, and your DSO will actually use its own statically linked copy of libstdc++ instead of the system copy, which has already concluded that there is no thread support in the process.
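One way to do the hiding with the GNU toolchain is the linker option --exclude-libs,ALL. It stops symbols pulled in from static archives such as libstdc++.a from being exported. Combined with -fvisibility=hidden and an explicitly exported entry point, the host's libstdc++ then has nothing in the plugin to interpose on. The plugin below is a hypothetical sketch, and plugin_run_worker is an illustrative name. You can check the result with `nm -D` on the built .so.

```cpp
// Hypothetical plugin source. Built with something like (one line):
//   g++ -shared -fPIC -fvisibility=hidden -static-libstdc++ -static-libgcc
//       -Wl,--exclude-libs,ALL -pthread -o libplugin.so plugin.cpp
// Only plugin_run_worker should then appear in the dynamic symbol table.
#include <atomic>
#include <thread>

extern "C" __attribute__((visibility("default")))
int plugin_run_worker(int input) {
    std::atomic<int> result{0};
    // This std::thread comes from the plugin's own static libstdc++ copy.
    std::thread worker([&result, input] { result = input * 2; });
    worker.join();
    return result.load();
}
```

As the next paragraph explains, this addresses interposition only and does not fix the late loading of libpthread.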
But this will likely not solve all of your problems because late loading of libpthread via dlopen has its problems. We fixed one bug here:
Segfault after a binary without pthread dlopen()s a library linked with pthread
But your distribution may not have that fix, and I expect there will be other issues. Note that the second, statically linked copy of libstdc++ is actually needed here. The system libstdc++ was loaded without thread support, because libpthread was not loaded when its symbols were bound (which caused the crash you observed), so you cannot use it to create threads. It has also activated optimizations that make it not thread-safe (avoiding atomic instructions and the like).

illegal instruction in boost::gregorian::date::date

I have a C++ program that uses boost (mainly for logging). This program compiles and runs well on Windows and Ubuntu. However, when I port it to Yocto Linux on an embedded system (Intel Atom processor), I get an illegal instruction error at runtime.
The program itself is built on Ubuntu PC with Intel-i5.
I debugged the issue and found it was caused by some AVX instructions in another library (OpenCV). I disabled all AVX and that problem was solved, but another one appeared.
After reading the core dump with gdb, it now tells me:
Program terminated with signal SIGILL, Illegal instruction.
0x00007fe1aed03ade in boost::gregorian::date::date(boost::gregorian::greg_year,
boost::gregorian::greg_month, boost::gregorian::greg_day) ()
I did not use boost::gregorian::date explicitly.
Is it possible that boost::gregorian::date uses some optimized instructions, like SSE or AVX? (That seems illogical.)
Any clue about the issue?
P.S. The error occurs at run time before anything else. Even a cout on the first line of main is not executed before I get the error. So I suspect that some static constructor inside boost causes the problem, since there is no static constructor in my code.
Edit:
All libraries and the program itself are compiled with -march=bonnell -mno-avx -O2
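No answer is included here, but two checks follow directly from the symptoms. First, in gdb on the core dump, `x/i $pc` shows the exact instruction that raised SIGILL, and its mnemonic usually reveals which extension it belongs to. Second, the sketch below uses GCC's x86 built-in __builtin_cpu_supports to report what the Atom actually supports, so you can compare that with what the binaries were built for. The struct and function names are hypothetical. The feature names passed to the built-in must be string literals.

```cpp
// Instruction-set extensions reported by the CPU the program runs on.
struct CpuFeatures {
    bool sse41 = false;
    bool sse42 = false;
    bool avx = false;
    bool avx2 = false;
};

CpuFeatures detect_cpu_features() {
    CpuFeatures f;
#if defined(__i386__) || defined(__x86_64__)
    __builtin_cpu_init();  // only strictly needed before static init completes
    f.sse41 = __builtin_cpu_supports("sse4.1");
    f.sse42 = __builtin_cpu_supports("sse4.2");
    f.avx = __builtin_cpu_supports("avx");
    f.avx2 = __builtin_cpu_supports("avx2");
#endif
    return f;
}
```

Since the crash happens before main, keep this check in a separate tiny program built with the same toolchain and flags, rather than adding it to the failing binary.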

This pointer is 0xfffffffc, potential causes?

I'm compiling the Crypto++ library at -O3. According to Undefined Behavior Sanitizer (UBsan) and Address Sanitizer (Asan), it's OK. The program runs fine at -O2 (and at -O3 on many platforms).
It's also OK according to Valgrind at -O2. At -O3, Valgrind dies with "Your program just tried to execute an instruction that Valgrind does not understand". I'm fairly certain that's because of SSE4 instructions and vectorization at -O3.
However, I'm catching a crash on some platforms at -O3. This particular machine is Fedora 22 i686, and it has GCC 5.2.1. The frame in question shows this=0xfffffffc:
Program received signal SIGSEGV, Segmentation fault.
0x0807be29 in CryptoPP::DL_GroupParameters_IntegerBased::GetEncodedElementSize
(this=0xfffffffc, reversible=0x1) at gfpcrypt.h:55
55 unsigned int GetEncodedElementSize(bool reversible) const {return GetModulus().ByteCount();}
The best I can tell, there's nothing located around that address:
(gdb) info shared
From To Syms Read Shared Object Library
0xb7fdd860 0xb7ff6b30 Yes (*) /lib/ld-linux.so.2
0xb7eb63d0 0xb7f7a344 Yes (*) /lib/libstdc++.so.6
0xb7e005f0 0xb7e32bd8 Yes (*) /lib/libm.so.6
0xb7951060 0xb7980cc4 Yes (*) /lib/libubsan.so.0
0xb7932090 0xb7948001 Yes (*) /lib/libgcc_s.so.1
0xb7916840 0xb79238d1 Yes (*) /lib/libpthread.so.0
0xb775d3f0 0xb78a0b6b Yes (*) /lib/libc.so.6
0xb7741a90 0xb7742a31 Yes (*) /lib/libdl.so.2
I've seen this=0x00000000 if a static class object declared in one translation unit is used in another translation unit before initialization is complete. But I don't recall seeing 0xfffffffc in the past.
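The this=0x00000000 case mentioned above is the static-initialization-order problem, and a common mitigation is "construct on first use". The object becomes a function-local static, so it is initialized the first time the accessor is called, even if that call comes from another translation unit's static initializer. Since C++11 that initialization is also thread-safe. The registry name and map type below are hypothetical.

```cpp
#include <map>
#include <string>

// Accessor instead of a namespace-scope global: the map is constructed on
// the first call, whichever translation unit makes it.
std::map<std::string, int>& registry() {
    static std::map<std::string, int> instance;
    return instance;
}
```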
What are some potential reasons for this=0xfffffffc? Or how can I troubleshoot it further?
On a 32-bit machine, 0xfffffffc is ((int*)nullptr)-1. So perhaps you are taking the element before a null pointer (e.g. by wrongly using some reverse iterator).
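The arithmetic can be checked without invoking undefined behavior by simulating 32-bit addresses with unsigned integers. Actually evaluating ((int*)nullptr) - 1 is undefined, so the sketch avoids it. Stepping back one 4-byte element from address 0 wraps around modulo 2^32. The function name is illustrative.

```cpp
#include <cstdint>

// Address of the element just before address 0 on a 32-bit target, for an
// element of the given size. Unsigned arithmetic wraps modulo 2^32.
std::uint32_t one_element_before_null_32bit(std::uint32_t element_size) {
    const std::uint32_t null_address = 0;
    return static_cast<std::uint32_t>(null_address - element_size);
}
```

With element_size equal to 4, the result is 0xFFFFFFFC. So any computation of the form "null minus 4 bytes" produces exactly the this value seen in the crash.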
Use the bt or backtrace command of gdb to understand what has happened. I guess that the trouble is in the caller (or its caller, etc...)
Also try some other compilers (e.g. an older version of GCC and several versions of Clang/LLVM). You could have some undefined behavior that your other tools did not detect. You need to find out whether the bug is inside Crypto++, or perhaps (though very unlikely) inside GCC itself, in which case you should report a bug on the GCC bugzilla. If you suspect the compiler, pass -S -fverbose-asm -fdump-tree-all -O3 to g++ to understand what GCC is doing (this will dump hundreds of files, including the generated .s assembler code).
Also ask on the Crypto++ mailing lists, and perhaps report the bug on the Crypto++ bug tracker. Test with other versions or snapshots of that library.
BTW, I'm not sure that -fsanitize=undefined or -fsanitize=address should be used with -O3; I guess that they are more suitable with -O0 -g or -Og -g

C++ Suspected stack overflow changing function parameters

I am working on implementing a user level thread library in C++ using setcontext(), makecontext(), getcontext(), and swapcontext() on a Linux system.
I am using a wrapper function to wrap the function the user wants to run as a thread. For example, the user calls newthread(funcPtr), and within the thread library funcPtr is passed to a wrapper function that runs it.
The error behaves differently depending on whether or not I declare an unused string within the function. If I include the line string s = "a"; the program runs to completion, but gdb reveals that the context is switching to somewhere within the string library. Without this line, the program segfaults after leaving the function wrapper.
The gdb output shows the corruption of the parameters to function().
I ran valgrind but did not see anything particularly out of the ordinary in the output, just many "Invalid read of size 4" and "Invalid write of size 4" warnings, usually within the C++ standard map.
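For reference, the wrapper design described above can be sketched with getcontext/makecontext/swapcontext as follows. The names run_as_user_thread and wrapper are hypothetical, and the sketch makes several assumptions that are not from the question. It runs one thread at a time on a 64 KiB stack and returns to the caller when that thread finishes. It hands the user function over through a static variable, because makecontext() only portably passes int arguments, and squeezing a pointer through them breaks on 64-bit targets. A stack that is too small, or freed while still in use, is a classic source of the kind of parameter corruption described.

```cpp
#include <ucontext.h>
#include <cstddef>
#include <vector>

namespace {
ucontext_t g_main_ctx;              // where control returns after the thread ends
ucontext_t g_thread_ctx;            // the user-level thread's context
void (*g_user_fn)() = nullptr;      // function to run; set before switching

// Runs on the thread's own stack. Returning resumes g_main_ctx via uc_link.
void wrapper() {
    g_user_fn();
}
}  // namespace

// Runs fn to completion on a separate stack; returns false on setup failure.
bool run_as_user_thread(void (*fn)()) {
    constexpr std::size_t kStackSize = 64 * 1024;
    static std::vector<char> stack(kStackSize);  // must outlive the context

    g_user_fn = fn;
    if (getcontext(&g_thread_ctx) == -1) return false;
    g_thread_ctx.uc_stack.ss_sp = stack.data();
    g_thread_ctx.uc_stack.ss_size = stack.size();
    g_thread_ctx.uc_link = &g_main_ctx;
    makecontext(&g_thread_ctx, wrapper, 0);
    return swapcontext(&g_main_ctx, &g_thread_ctx) != -1;
}
```

A real library would keep one stack and one ucontext_t per thread and switch between them from a scheduler, but the ownership rules for the stack stay the same.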
You could also try AddressSanitizer for debugging. It can detect stack buffer overflows. Here's how to use it on Linux:
You need at least gcc 4.8 for AddressSanitizer, and libasan must be installed (e.g. on Fedora, run yum install libasan as root). Compile and link with -g -fsanitize=address and run the generated executable. AddressSanitizer stops and prints a report at the first error it detects, so there are no long log files to analyze. Fix the reported problem, then compile and run again until AddressSanitizer no longer stops the program. Unfortunately, there might be false positives because your program uses swapcontext, but it's worth a try. Instrumentation can be turned off for a specific function by adding the attribute no_sanitize_address: extern int func(void) __attribute__((no_sanitize_address));