I have been Googling this one for a good long while now and I'm not seeing anything quite like this, so here goes. I am trying to create a small statically-linked binary which can be easily distributed across machines on my home network. This is a pretty small project so I'm trying to keep things simple.
I am running into substantial difficulties when I statically link the pthread library on the ARM 32-bit architecture. Frustratingly, the very same code works just fine on all versions of x86. Here is my test.cpp program:
#include <iostream>
#include <thread>

void threader( int num ) {
    std::cout << "Child Thread Starting" << std::endl;
    try {
        throw 20;
    } catch (int e) {
        std::cout << "Child Thread Success" << std::endl;
    }
    // Spin forever so the thread never exits.
    int x = 0;
    do {
        x++;
    } while (true);
}

int main(int argc, char *argv[]) {
    std::cout << "Main Thread Starting" << std::endl;
    // Deliberately never joined or deleted: both threads spin forever.
    new std::thread(&threader, 0);
    try {
        throw 20;
    } catch (int e) {
        std::cout << "Main Thread Success" << std::endl;
    }
    int y = 0;
    do {
        y++;
    } while (true);
}
The idea is that the main thread starts a child thread, then tests to see if the main thread can throw an exception, then spins. Meanwhile, the child thread also tests to see if it can throw an exception, then spins. The original code throws a boost-brand exception, leading to a crash. This minimal example has identical behavior.
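For anyone who wants to run this check unattended, here is a minimal sketch of a variant that terminates instead of spinning. It is not the original test.cpp: the helper names throw_and_catch and child_thread_catches are made up for illustration, and the child thread is joined rather than leaked. It exercises the same thing, a throw and catch on the main thread and on a child thread.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper (not in the original test.cpp): throws an int and
// reports whether the matching catch block was reached.
bool throw_and_catch() {
    try {
        throw 20;
    } catch (int e) {
        return e == 20;
    }
    return false;
}

// Runs the same check on a child thread and joins it, so the caller
// gets a result back instead of two threads spinning forever.
bool child_thread_catches() {
    bool ok = false;
    std::thread child([&ok] { ok = throw_and_catch(); });
    child.join();  // join() synchronizes, so reading ok afterwards is safe
    return ok;
}
```

Calling both functions from main() and returning a non-zero status if either fails should give a pass/fail exit code, which makes it easier to compare link-flag combinations in a script.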
A successful result on x86 is as follows:
# g++ -m32 -std=c++11 -c -g test.cpp
# g++ -static -m32 *.o -o test -lrt -pthread
# ./test
Main Thread Starting
Child Thread Starting
Main Thread Success
Child Thread Success
However, on ARM, I get a variety of errors depending on how exactly I link things. For reference, I started with the simple:
$ g++ -std=c++11 -c -g test.cpp
$ g++ -static *.o -o test -lrt -pthread
$ ./test
Main Thread Starting
Child Thread Starting
Main Thread Success
Segmentation fault
The segmentation fault is not very helpful:
(gdb) bt
#0 0x00012be8 in __cxa_throw ()
#1 0x000108d0 in threader (num=0) at test.cpp:11
#2 0x00011a26 in std::_Bind_simple<void (*(int))(int)>::_M_invoke<0u>(std::_Index_tuple<0u>) (this=0xda50c) at /usr/include/c++/4.8/functional:1732
#3 0x00011950 in std::_Bind_simple<void (*(int))(int)>::operator()() (this=0xda50c) at /usr/include/c++/4.8/functional:1720
#4 0x0001190a in std::thread::_Impl<std::_Bind_simple<void (*(int))(int)> >::_M_run() (this=0xda500) at /usr/include/c++/4.8/thread:115
#5 0x0001befc in execute_native_thread_routine ()
#6 0x0004bcc2 in start_thread (arg=0x0) at pthread_create.c:335
#7 0x00071b4c in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
When a throw is causing segmentation faults, you've got problems. When I remove the "-static" flag, everything executes perfectly, as in the x86 case.
After extensive googling I found that this problem is apparently quite common. Other answers include:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52590
when g++ static link pthread, cause Segmentation fault, why?
And many other similar ones. The key recommendation appears to be to link with the "-Wl,--whole-archive -lpthread -Wl,--no-whole-archive" phrase. Ok, let's give it a go:
$ g++ -std=c++11 -c -g test.cpp
$ g++ -static *.o -o test -lrt -pthread -Wl,--whole-archive -lpthread -Wl,--no-whole-archive
$ ./test
Main Thread Starting
Child Thread Starting
terminate called after throwing an instance of 'Segmentation fault
Well, it gets points for being different. GDB:
(gdb) r
Starting program: ./test
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
Main Thread Starting
[New Thread 0x76ffc2d0 (LWP 30844)]
Child Thread Starting
terminate called after throwing an instance of 'int'
terminate called recursively
Thread 1 "test" received signal SIGABRT, Aborted.
0x00055266 in __libc_do_syscall ()
(gdb) bt
#0 0x00055266 in __libc_do_syscall ()
#1 0x0001ab66 in raise (sig=6) at ../sysdeps/unix/sysv/linux/pt-raise.c:35
#2 0x0005a85a in abort ()
#3 0x0004bd3c in __gnu_cxx::__verbose_terminate_handler() ()
#4 0x00026d34 in __cxxabiv1::__terminate(void (*)()) ()
#5 0x00026d50 in std::terminate() ()
#6 0x0001ca3c in __cxa_rethrow ()
#7 0x0004bd1c in __gnu_cxx::__verbose_terminate_handler() ()
#8 0x00026d34 in __cxxabiv1::__terminate(void (*)()) ()
#9 0x00026d50 in std::terminate() ()
#10 0x0001c9fc in __cxa_throw ()
#11 0x0001098e in main (argc=1, argv=0x7efff704) at test.cpp:28
(gdb) frame 11
#11 0x0001098e in main (argc=1, argv=0x7efff704) at test.cpp:28
28 throw 20;
(gdb)
Two things to note: one, it is now the parent thread which is crashing, and two, it is apparently an 'int' which is segfaulting. Ok... that's special.
There is another answer on SO which seems to note something similar to this:
GCC: --whole-archive recipe for static linking to pthread stopped working in recent gcc versions
But this solution recommends jiggering around the order and frequency of the -lrt and -lpthread flags. I have tried... a great many combinations of these. It always results in the above behavior.
I will also note that the example program in the above issue runs perfectly fine on the problem ARM32 system. However, it immediately breaks if I add the "throw 20" test block to the thread function.
For the record, I have also tried many combinations with boost::thread and have also tried compiling with clang, all to the same results.
At this point I am at a complete loss and throw myself on the mercies of the internet. Does anyone have any idea what is going on here, or how I can investigate more?
Related
This is my code:
#include <iostream>
#include <filesystem>
int main(int argc, char *argv[]) {
auto iter = std::filesystem::directory_iterator("foo");
for (auto &entry : iter) {
std::cout << entry.path();
}
}
When I run it and the directory foo exists, I get a SIGSEGV. So I started gdb:
(gdb) run
Starting program: /home/krausefx/a.out
Program received signal SIGSEGV, Segmentation fault.
0x0000555555556a87 in std::vector<std::filesystem::__cxx11::path::_Cmpt, std::allocator<std::filesystem::__cxx11::path::_Cmpt> >::~vector (
this=0x23) at /usr/include/c++/8/bits/stl_vector.h:567
567 std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish,
(gdb) backtrace
#0 0x0000555555556a87 in std::vector<std::filesystem::__cxx11::path::_Cmpt, std::allocator<std::filesystem::__cxx11::path::_Cmpt> >::~vector (
this=0x23) at /usr/include/c++/8/bits/stl_vector.h:567
#1 0x00005555555566aa in std::filesystem::__cxx11::path::~path (this=0x3) at /usr/include/c++/8/bits/fs_path.h:208
#2 0x0000555555557ebe in std::filesystem::__cxx11::path::_Cmpt::~_Cmpt (this=<incomplete type>) at /usr/include/c++/8/bits/fs_path.h:643
#3 0x0000555555557ed9 in std::_Destroy<std::filesystem::__cxx11::path::_Cmpt> (__pointer=0x3) at /usr/include/c++/8/bits/stl_construct.h:98
#4 0x0000555555557ced in std::_Destroy_aux<false>::__destroy<std::filesystem::__cxx11::path::_Cmpt*> (__first=0x3, __last=0x0)
at /usr/include/c++/8/bits/stl_construct.h:108
#5 0x00005555555576de in std::_Destroy<std::filesystem::__cxx11::path::_Cmpt*> (__first=0x3, __last=0x0)
at /usr/include/c++/8/bits/stl_construct.h:137
#6 0x0000555555556fb9 in std::_Destroy<std::filesystem::__cxx11::path::_Cmpt*, std::filesystem::__cxx11::path::_Cmpt> (__first=0x3, __last=0x0)
at /usr/include/c++/8/bits/stl_construct.h:206
#7 0x0000555555556a9d in std::vector<std::filesystem::__cxx11::path::_Cmpt, std::allocator<std::filesystem::__cxx11::path::_Cmpt> >::~vector (
this=0x7fffffffdcf0) at /usr/include/c++/8/bits/stl_vector.h:567
#8 0x00005555555566aa in std::filesystem::__cxx11::path::~path (this=0x7fffffffdcd0) at /usr/include/c++/8/bits/fs_path.h:208
#9 0x000055555555630d in main (argc=32767, argv=0x7ffff7fadf40 <std::wcout>) at test.cpp:5
(gdb) p this
$1 = (vector * const) 0x23
So apparently, when initializing the directory_iterator, the destructor of std::filesystem::path gets called for some reason, and somewhere in there, the destructor of std::vector is called with a this value of 0x23, which obviously is a bad thing and leads to a SIGSEGV.
What's happening here? Am I doing something wrong? Is this a compiler bug (compiler is g++ 8.3.0)?
I checked directory_iterator works fine using GCC 8 under Ubuntu.
Be sure to add the linker flag -lstdc++fs when compiling.
If you don't, compilation succeeds but, at least on my system, I get a segfault as you do when it starts iterating.
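Separately from the linking issue, here is a small sketch of iterating a directory with the non-throwing std::error_code overloads, so that a missing directory is reported as an error value rather than an exception. The helper name list_directory is hypothetical, and this assumes a C++17 compiler (with GCC 8, still linked with -lstdc++fs as described above). It does not by itself fix the crash in the question.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: returns the entries of dir as strings. On failure
// (for example, dir does not exist) it sets ec and returns what it has so far.
std::vector<std::string> list_directory(const fs::path& dir, std::error_code& ec) {
    std::vector<std::string> names;
    fs::directory_iterator it(dir, ec);  // non-throwing constructor overload
    if (ec) return names;
    const fs::directory_iterator end;    // default-constructed == end iterator
    while (it != end) {
        names.push_back(it->path().string());
        it.increment(ec);                // non-throwing advance
        if (ec) break;
    }
    return names;
}
```

A caller would check ec after the call and print ec.message() if it is set.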
I don't think std::filesystem is stable. It caused segfaults and other problems in my project (especially std::filesystem::path in mingw-w64 that ships with msys2). Try updating your gcc package and check if the problem persists. If it does then you can file a bug report or just wait and hope that someone already reported it (in my case updating fixed the problem).
My program crashes with a segfault trying to unwind the stack. Is this a gcc bug or is the combination of options -fexceptions and -static-libgcc not allowed?
The crash doesn't happen if:
-static-libgcc is omitted
-fexceptions is omitted
Compile and link are done in a single step
pthread_cleanup_push() and pthread_cleanup_pop() are omitted
Compilation is done using g++ or gcc -x g++ (*)
I have tried this on gcc 4.8.4 and 4.8.5.
(*) This doesn't work for one of our custom build environments based on gcc 4.2.3. Yet for a different version of the build environment also based on gcc 4.2.3 the crash doesn't happen at all!
Test case
/*
* thread_crash.c: Test case for thread unwinder crash bug.
*
* Compile (with native or V6p3, 32 or 64 bit) using:
* gcc -o thread_crash.o -c thread_crash.c -ggdb -Wall -pthread -fexceptions
* g++ -o thread_crash thread_crash.o -ggdb -Wall -lpthread -static-libgcc
*
* Expected behaviour: No output.
* Observed behaviour: Outputs "Aborted (core dumped)".
*/
#include <unistd.h>
#include <pthread.h>
#include <sys/types.h>
#include <signal.h>
static void cleanup(void *ptr)
{
}
void *child(void *ptr)
{
    pthread_cleanup_push(cleanup, NULL);
    pthread_exit(NULL);
    pthread_cleanup_pop(1);
    return NULL;
}
int main()
{
    pthread_t foo;
    pthread_create(&foo, NULL, child, NULL);
    pthread_join(foo, NULL);
    return 0;
}
Backtrace from gdb
#0 0x00007ffff72271f7 in raise () from /lib64/libc.so.6
#1 0x00007ffff72288e8 in abort () from /lib64/libc.so.6
#2 0x00000000004031be in _Unwind_SetGR ()
#3 0x000000000040587a in __gcc_personality_v0 ()
#4 0x00007ffff6feba14 in ?? () from /lib64/libgcc_s.so.1
#5 0x00007ffff6febd64 in _Unwind_ForcedUnwind () from /lib64/libgcc_s.so.1
#6 0x00007ffff7bcd240 in __pthread_unwind () from /lib64/libpthread.so.0
#7 0x00007ffff7bc7e35 in pthread_exit () from /lib64/libpthread.so.0
#8 0x0000000000400a97 in child (ptr=0x0) at thread_crash.c:46
#9 0x00007ffff7bc6e25 in start_thread () from /lib64/libpthread.so.0
#10 0x00007ffff72ea34d in clone () from /lib64/libc.so.6
When compiling with -fexceptions, pthread_exit() throws a __forced_unwind exception to force all functions to be unwound, which guarantees cleanup of automatic storage (i.e. the stack). This is because pthread_exit() is designed not to return. From man pthread_exit:
This function does not return to the caller.
On the other hand, according to man pthread_cleanup_push:
POSIX.1 says that the effect of using return, break, continue, or
goto to prematurely leave a block bracketed pthread_cleanup_push()
and pthread_cleanup_pop() is undefined. Portable applications should
avoid doing this.
POSIX does not mention C++ exceptions since POSIX only cares about C, but it is an educated guess that throwing an exception between pthread_cleanup_push() and pthread_cleanup_pop() results in undefined behaviour.
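If the code can be compiled as C++, one way to sidestep the question is to let a destructor do the cleanup and to return normally from the thread function instead of calling pthread_exit(). The following is only a sketch under that assumption: CleanupGuard and cleanup_ran_after_join are hypothetical names, and I have not verified that this avoids the crash in the -static-libgcc configuration from the question.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical RAII guard: its destructor plays the role of the cleanup
// handler that pthread_cleanup_push() would otherwise register.
struct CleanupGuard {
    bool* ran;
    explicit CleanupGuard(bool* flag) : ran(flag) {}
    ~CleanupGuard() { *ran = true; }
};

// Thread entry point: returns normally instead of calling pthread_exit(),
// so leaving the function does not rely on forced unwinding.
extern "C" void* child(void* arg) {
    CleanupGuard guard(static_cast<bool*>(arg));
    return nullptr;  // guard's destructor runs as the function returns
}

// Starts the child, joins it, and reports whether the cleanup ran.
bool cleanup_ran_after_join() {
    bool ran = false;
    pthread_t tid;
    if (pthread_create(&tid, nullptr, child, &ran) != 0) return false;
    if (pthread_join(tid, nullptr) != 0) return false;
    return ran;
}
```

The trade-off is that this only works when every exit path from the thread is an ordinary return; code that must call pthread_exit() from deep inside a call chain still depends on the unwinder.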
I have built an executable for the program below with a PPC toolchain.
Tool chain details:
powerpc-wrs-linux-gnu-g++ (Wind River Linux Sourcery G++ 4.4a-341) 4.4.1
We have included -pthread during compilation and -lpthread for linking. We are using -lrt and -ldl flags too.
#include <string>
#include <iostream>
#include <thread>
using namespace std;
// The function we want to execute on the new thread.
void task1(string msg)
{
    cout << "task1 says: " << msg;
}
int main()
{
    // Constructs the new thread and runs it. Does not block execution.
    thread t1(task1, "Hello");
    // Makes the main thread wait for the new thread to finish execution
    // therefore blocks its own execution.
    t1.join();
}
While executing the program I am getting the crash below:
Program received signal SIGILL, Illegal instruction.
0x10000e30 in __gnu_cxx::__exchange_and_add(int volatile*, int) ()
(gdb) bt
#0 0x10000e30 in __gnu_cxx::__exchange_and_add(int volatile*, int) ()
#1 0x10000f14 in __gnu_cxx::__exchange_and_add_dispatch(int*, int) ()
#2 0x10001960 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() ()
#3 0x100016ac in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() ()
#4 0x100013ac in std::__shared_ptr<std::thread::_Impl_base, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() ()
#5 0x100013e8 in std::shared_ptr<std::thread::_Impl_base>::~shared_ptr() ()
#6 0x100014c0 in std::thread::thread<void (&)(std::basic_string<char, std::char_traits<char>, std::allocator<char> >), char const (&) [6]>(void (&&&)(std::basic_string<char, std::char_traits<char>, std::allocator<char> >), char const (&&&) [6]) ()
#7 0x10000fd4 in main ()
Can you please suggest whether we are missing something in the build flags?
There is one obvious bug in your code:
cout << "task1 says: " << msg;
Here cout (a stream) is a shared resource; you should synchronize access to it.
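As a rough illustration of what synchronizing access could look like, here is a sketch in which the stream and a mutex are passed to the task explicitly. The changed task1 signature and the run_two_writers helper are assumptions made for this example, not part of the question's code. Note that in the question's program only one thread writes while main waits in join(), so this is about keeping messages from interleaving once several threads write, and is unlikely to be related to the SIGILL.

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>

// Hypothetical variant of task1: takes the stream and a mutex explicitly
// so that each message is written as one uninterrupted unit.
void task1(std::ostream& out, std::mutex& out_mutex, const std::string& msg) {
    std::lock_guard<std::mutex> lock(out_mutex);
    out << "task1 says: " << msg << '\n';
}

// Runs two writers concurrently and returns everything they wrote.
std::string run_two_writers() {
    std::ostringstream out;
    std::mutex out_mutex;
    std::thread t1([&] { task1(out, out_mutex, "Hello"); });
    std::thread t2([&] { task1(out, out_mutex, "World"); });
    t1.join();
    t2.join();
    return out.str();
}
```

The two lines can appear in either order, but each one is written whole. With std::cout the same pattern applies: pass std::cout as the stream and share one mutex between all writers.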
The main hint is here:
Program received signal SIGILL, Illegal instruction.
It looks like the default code generation settings of your compiler are outputting instructions that are not supported by your CPU. If you print the actual faulting instruction from gdb, you can get more detail about the problem. Try:
(gdb) x /i $pc
To see the exact instruction causing the SIGILL.
Since the illegal instruction is in __exchange_and_add, it's likely that this will be one of the atomic storage instructions.
To fix this, you will probably want to tell your compiler which CPU to generate instructions for. You can do this with the -mcpu= argument. If you give an invalid cpu specifier, gcc will print the available CPU types:
$ powerpc64le-linux-gnu-gcc -mcpu=?
powerpc64le-linux-gnu-gcc: error: unrecognized argument in option ‘-mcpu=?’
powerpc64le-linux-gnu-gcc: note: valid arguments to ‘-mcpu=’ are: 401 403 405 405fp 440 440fp 464 464fp 476 476fp 505 601 602 603 603e 604 604e 620 630 740 7400 7450 750 801 821 823 8540 8548 860 970 G3 G4 G5 a2 cell e300c2 e300c3 e500mc e500mc64 e5500 e6500 ec603e native power3 power4 power5 power5+ power6 power6x power7 power8 power9 powerpc powerpc64 powerpc64le rs64 titan
powerpc64le-linux-gnu-gcc: fatal error: no input files
I am new to C++ multithreading.
I wrote a simple program to print hello world using threads.
<<mythread.cpp>>
#include<iostream>
#include<thread>
using namespace std;
void hello()
{
    std::cout<<"Hi this is a thread";
}
int main()
{
    std::thread mythread(hello);
    cout<<'1';
    if (mythread.joinable())
    {
        cout<<'2';
        mythread.join();
        cout<<'3';
    }
    return 0;
}
Compilation command: g++ -std=c++0x mythread.cpp
It compiled successfully but gave a segmentation fault at run time.
I checked the core file:
(gdb) bt
#0 0x0000003ac340df7c in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
#1 0x0000003ac3414625 in _dl_runtime_resolve () from /lib64/ld-linux-x86-64.so.2
#2 0x0000003ac84b65c7 in std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>) () from /usr/lib64/libstdc++.so.6
#3 0x00000000004010d0 in std::thread::thread<void (*)()>(void (*)()) ()
#4 0x0000000000400e15 in main ()
Kindly help me resolve this error; it seems that some library is missing or not supported.
The program looks correct. Compile it with -pthread flag:
g++ -pthread -std=c++11 mythread.cpp
When debugging a program that fails an assert I can't get the call stack in gdb. I'm using g++4.8 and gdb from Homebrew on Mavericks.
/usr/local/bin/g++-4.8 --version
g++-4.8 (GCC) 4.8.2
/usr/local/bin/gdb --version
GNU gdb (GDB) 7.6.2
Here is the smallest test to reconstruct the problem
//test.cpp
#include <iostream>
#include <cassert>
int main()
{
int i = 42;
std::cout << "Hello World!" << i << std::endl;
assert(0); // this also happens with abort() which assert(0) winds up calling
}
Compiling and running with
/usr/local/bin/g++-4.8 -g -c test.cpp -o test.o
/usr/local/bin/g++-4.8 -g test.o -o test
/usr/local/bin/gdb test
(gdb) r
Starting program: /Users/pmelsted/tmp/test/test
Hello World!42
Assertion failed: (0), function main, file test.cpp, line 7.
Program received signal SIGABRT, Aborted.
0x00007fff9447d866 in ?? ()
(gdb) where
#0 0x00007fff9447d866 in ?? ()
#1 0x00007fff9229835c in ?? ()
#2 0x0000000000000000 in ?? ()
It seems gdb on macOS doesn't display the call stack correctly (or the call stack is corrupted after the assert() call) for 64-bit programs. Here is the program, slightly modified:
//test.cpp
#include <iostream>
#include <cassert>
int foo() {
assert(0);
}
int bar() {
return foo();
}
int main()
{
int i = 42;
std::cout << "Hello World!" << i << std::endl;
return bar();
}
I compiled it with the g++ -g 15.cpp -m32 command and ran it under ggdb. The bt full command shows the call stack as follows:
(gdb) bt full
#0 0x9843f952 in ?? ()
No symbol table info available.
#1 0x96193340 in ?? ()
No symbol table info available.
#2 0x9615e43e in ?? ()
No symbol table info available.
#3 0x0000216f in foo () at 15.cpp:6
No locals.
#4 0x0000217b in bar () at 15.cpp:9
No locals.
#5 0x000021e4 in main () at 15.cpp:15
i = 42
(gdb) quit
So, all debug symbols are displayed correctly; the first 3 function addresses are correct and have no name because my libgcc is in release mode.
If I don't use -m32 key during compilation, the call stack is as follows:
(gdb) bt full
#0 0x00007fff8b442866 in ?? ()
No symbol table info available.
#1 0x00007fff8c64735c in ?? ()
No symbol table info available.
#2 0x0000000000000000 in ?? ()
No symbol table info available.
That is definitely a wrong call stack: the function address of frame #2 is 0x0. So, the root cause is that gdb can't display the call stack correctly for 64-bit applications.
I found a workaround for the problem. Just set a breakpoint on the abort() function in gdb:
b abort
then when assert is called it will halt at the breakpoint and at this moment one can see the call stack with bt.