I've written an application using threads from boost::thread. It compiles and works fine on my local machine. The problem occurs on one of the servers. I sent the main.cpp file there and compiled it the same way I did on my local machine:
g++ -g main.cpp -o rdzen -lboost_thread
ulimit -c unlimited
I'm executing it with:
./rdzen input.txt dictionary.txt output.txt
Then I get:
Segmentation fault (core dumped)
I used gdb to find out the reason:
gdb rdzen core
The backtrace is:
#0 0x0804c039 in boost::detail::atomic_exchange_and_add (pw=0x53006d76, dv=-1)
at /usr/local/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:50
#1 0x0804c11a in boost::detail::sp_counted_base::release (this=0x53006d72)
at /usr/local/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:143
#2 0x0804c17c in ~shared_count (this=0xbd928a8c, __in_chrg=<value optimized out>)
at /usr/local/include/boost/smart_ptr/detail/shared_count.hpp:305
#3 0xb2b388e1 in ~shared_ptr (this=0xbd928b3c) at ./boost/smart_ptr/shared_ptr.hpp:169
#4 boost::shared_ptr<boost::detail::thread_data_base>::operator= (this=0xbd928b3c)
at ./boost/smart_ptr/shared_ptr.hpp:305
#5 boost::thread::start_thread (this=0xbd928b3c) at libs/thread/src/pthread/thread.cpp:184
#6 0x0805022c in thread<boost::_bi::bind_t<void, void (*)(int, char*, char*, char*, int), boost::_bi::list5<boost::_bi::value<int>, boost::_bi::value<char*>, boost::_bi::value<char*>, boost::_bi::value<char*>, boost::_bi::value<int> > > > (this=0xbd928b3c, f=...)
at /usr/local/include/boost/thread/detail/thread.hpp:205
#7 0x0804a88a in main (argc=4, argv=0xbd928cb4) at main.cpp:542
main.cpp:542 is:
boost::thread watek1(boost::bind(&watek, 0, argv[1], argv[2], argv[3], 0));
What is the reason, and why does the same code work on my local machine but not on the server? Thanks in advance for help.
Related
Contents of hello.cpp
#include <gtkmm.h>
#include <cstdio>      // printf
#include <functional>  // std::bind
void RunInMain()
{
printf("RunInMain\n");
}
void ThreadFunc()
{
printf("ThreadFunc\n");
Glib::signal_idle().connect_once(std::bind(&RunInMain));
}
int main()
{
Gtk::Main kit(0, NULL);
Gtk::Window window;
window.set_title("hello world");
Glib::Thread* pThread = Glib::Thread::create(&ThreadFunc);
kit.run(window);
pThread->join();
return(0);
}
Compile with:
g++ `pkg-config gtkmm-2.4 --cflags --libs` hello.cpp -Wno-deprecated-declarations -fsanitize=thread
This is the error from TSAN when executing the resulting a.out file:
WARNING: ThreadSanitizer: data race (pid=153699)
Write of size 8 at 0x7b5000006f90 by thread T1:
#0 memset <null> (libtsan.so.0+0x37abf)
#1 g_slice_alloc0 <null> (libglib-2.0.so.0+0x71412)
#2 sigc::pointer_functor0<void>::operator()() const <null> (a.out+0x402835)
#3 sigc::adaptor_functor<sigc::pointer_functor0<void> >::operator()() const <null> (a.out+0x402606)
#4 sigc::internal::slot_call0<void (*)(), void>::call_it(sigc::internal::slot_rep*) <null> (a.out+0x4021d0)
#5 call_thread_entry_slot /usr/include/sigc++-2.0/sigc++/functors/slot.h:535 (libglibmm-2.4.so.1+0x5d889)
Previous write of size 8 at 0x7b5000006f90 by main thread:
#0 posix_memalign <null> (libtsan.so.0+0x3061d)
#1 allocator_memalign ../glib/gslice.c:1411 (libglib-2.0.so.0+0x706b8)
#2 allocator_add_slab ../glib/gslice.c:1283 (libglib-2.0.so.0+0x706b8)
#3 slab_allocator_alloc_chunk ../glib/gslice.c:1329 (libglib-2.0.so.0+0x706b8)
#4 __libc_start_main ../csu/libc-start.c:308 (libc.so.6+0x27041)
Location is heap block of size 496 at 0x7b5000006e00 allocated by main thread:
#0 posix_memalign <null> (libtsan.so.0+0x3061d)
#1 allocator_memalign ../glib/gslice.c:1411 (libglib-2.0.so.0+0x706b8)
#2 allocator_add_slab ../glib/gslice.c:1283 (libglib-2.0.so.0+0x706b8)
#3 slab_allocator_alloc_chunk ../glib/gslice.c:1329 (libglib-2.0.so.0+0x706b8)
#4 __libc_start_main ../csu/libc-start.c:308 (libc.so.6+0x27041)
Thread T1 (tid=153701, running) created by main thread at:
#0 pthread_create <null> (libtsan.so.0+0x5ec29)
#1 g_system_thread_new ../glib/gthread-posix.c:1308 (libglib-2.0.so.0+0xa0ea0)
#2 __libc_start_main ../csu/libc-start.c:308 (libc.so.6+0x27041)
SUMMARY: ThreadSanitizer: data race (/lib64/libtsan.so.0+0x37abf) in memset
The code runs as expected (I get all of the prints) but I don't understand why I'm getting the TSAN data race warning. If I comment out the Glib::signal_idle().connect_once line, there is no TSAN error. From what I've read, that function is supposed to be safe to call from any thread. Is TSAN reporting a false positive here or is there a real data race?
Fedora 31 linux
g++ 10.0.1
glibmm24-2.64.2-1
gtkmm24-2.24.5-9
libtsan-10.2.1-9
From the TSAN wiki:
TSAN generally requires all code to be compiled with -fsanitize=thread. If some code (e.g. dynamic libraries) is not compiled with the flag, it can lead to false positive race reports, false negative race reports and/or missed stack frames in reports depending on the nature of non-instrumented code.
If you are using glib from your distribution's repository (e.g. sudo apt-get install libglib2.0-dev), the number of false positive reports will depend on how the library was built, so the number of warnings will vary from distro to distro. To get a proper TSAN report, you should compile all the shared libraries you use by hand with -fsanitize=thread. In particular, glib should be compiled by hand, because it contains various thread-related APIs.
Compile glib with TSAN (for Debian 11.5 "bullseye"):
# clone TAG 2.66.8 (TAG should match glib version on the host)
git clone --depth=1 --branch=2.66.8 https://github.com/GNOME/glib.git
cd glib
CFLAGS="-O2 -g -fsanitize=thread" meson build
ninja -C build
# add TSAN-enabled glib libraries to lib search path
export LD_LIBRARY_PATH=$PWD/build/gio:$PWD/build/glib:$PWD/build/gmodule:$PWD/build/gobject:$PWD/build/gthread
Before running your project, use ldd a.out to make sure that it links against the freshly compiled glib libraries (all of the glib libraries you use, i.e. libglib, libgio, libgmodule, libgobject, libgthread).
When I run opt with the irtranslator pass, I keep getting segfaults. Here's a variation of what I've been running:
opt -debug -mcpu=x86-64 -S sample.bc --irtranslator
Some other notes:
sample.bc is a simple hello world function that I compiled into LLVM bitcode with clang
some passes such as --instcombine do work
I'm using a version of llvm built from source
LLVM (http://llvm.org/):
LLVM version 11.0.0git
DEBUG build with assertions.
Default target: x86_64-unknown-linux-gnu
Host CPU: haswell
Stack Trace:
Stack dump:
0. Program arguments: ../llvm10/build/bin/opt -debug -mcpu=x86-64 -S sample.bc --x86-codegen
#0 0x00007fec32c271c7 llvm::sys::PrintStackTrace(llvm::raw_ostream&) llvm/lib/Support/Unix/Signals.inc:564:0
#1 0x00007fec32c2725a PrintStackTraceSignalHandler(void*) llvm/lib/Support/Unix/Signals.inc:625:0
#2 0x00007fec32c24ff5 llvm::sys::RunSignalHandlers() llvm/lib/Support/Signals.cpp:68:0
#3 0x00007fec32c26b44 SignalHandler(int) llvm/lib/Support/Unix/Signals.inc:406:0
#4 0x00007fec2dff2890 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12890)
#5 0x00007fec31c6e750 llvm::MachineModuleInfo::MachineModuleInfo(llvm::LLVMTargetMachine const*) llvm/lib/CodeGen/MachineModuleInfo.cpp:194:0
#6 0x00007fec31c6eddb llvm::MachineModuleInfoWrapperPass::MachineModuleInfoWrapperPass(llvm::LLVMTargetMachine const*) llvm/lib/CodeGen/MachineModuleInfo.cpp:295:0
#7 0x00007fec31c7099b llvm::Pass* llvm::callDefaultCtor<llvm::MachineModuleInfoWrapperPass>() llvm/include/llvm/PassSupport.h:80:0
#8 0x00007fec32192938 llvm::PassInfo::createPass() const llvm/include/llvm/PassInfo.h:102:0
#9 0x00007fec3218a9a2 llvm::PMTopLevelManager::schedulePass(llvm::Pass*) llvm10/llvm/lib/IR/LegacyPassManager.cpp:702:0
#10 0x00007fec3218aa07 llvm::PMTopLevelManager::schedulePass(llvm::Pass*) llvm/lib/IR/LegacyPassManager.cpp:706:0
#11 0x00007fec321933de llvm::legacy::PassManagerImpl::add(llvm::Pass*) llvm/lib/IR/LegacyPassManager.cpp:500:0
#12 0x00007fec3218f709 llvm::legacy::PassManager::add(llvm::Pass*) llvm/lib/IR/LegacyPassManager.cpp:1721:0
#13 0x00007fec2ff8650d OptCustomPassManager::add(llvm::Pass*) llvm/tools/opt/opt.cpp:340:0
#14 0x00007fec2ff7e9de addPass(llvm::legacy::PassManagerBase&, llvm::Pass*) llvm/tools/opt/opt.cpp:375:0
#15 0x00007fec2ff81013 main llvm/tools/opt/opt.cpp:862:0
#16 0x00007fec2cc51b97 __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:344:0
#17 0x00007fec2ff5150a _start (../llvm10/build/bin/opt+0x195150a)
Segmentation fault (core dumped)
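IRTranslator is a pass used during code generation (it is the first stage of GlobalISel). You're not supposed to run it directly via opt; run it through llc instead, e.g. llc -global-isel -stop-after=irtranslator sample.bc -o -.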
Code (m1.cpp):
#include <iostream>
using namespace std;
int main (int argc, char *argv[])
{
cout << "running m1" << endl;
return 0;
}
GDB Version: GNU gdb (GDB) 7.6.2
Built using: g++ -g m1.cpp
Command line history:
(gdb) b main
Breakpoint 1 at 0x40087b: file m1.cpp, line 6.
(gdb) r
Starting program: .../a.out
Program received signal SIGSEGV, Segmentation fault.
0x00002aaaaaac16a0 in strcmp () from /lib64/ld-linux-x86-64.so.2
(gdb) c
Continuing.
Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb)
When I run without setting any breakpoints, it runs without errors.
As requested:
(gdb) bt
#0 strcmp () from /lib64/ld-linux-x86-64.so.2
#1 in check_match.12104 () from /lib64/ld-linux-x86-64.so.2
#2 in do_lookup_x () from /lib64/ld-linux-x86-64.so.2
#3 in _dl_lookup_symbol_x () from /lib64/ld-linux-x86-64.so.2
#4 in _dl_relocate_object () from /lib64/ld-linux-x86-64.so.2
#5 in dl_main () from /lib64/ld-linux-x86-64.so.2
#6 in _dl_sysdep_start () from /lib64/ld-linux-x86-64.so.2
#7 in _dl_start () from /lib64/ld-linux-x86-64.so.2
#8 in _start () from /lib64/ld-linux-x86-64.so.2
#9 in ?? ()
I was able to replicate the OP's observed behavior (using the same compile and getting the same backtrace). The behavior was persistent across a range of GDBs and GCCs. I noticed that the symptom goes away when I unset SHELL. In my normal environment I use tcsh (version 1.15.00). If SHELL is set, then (I believe) gdb launches the program using tcsh. If I unset SHELL, gdb launches it using sh. This is enough for me to make forward progress. I don't have a crisp explanation for what in tcsh triggers the issue, but if others see the same behavior, it may shed more light on it.
I checked this with GNU gdb 7.11.1, and it worked fine.
I compiled the same program with:
g++ -g m1.cpp
Then I ran the executable in gdb:
gdb -q ./a.out
and repeated the steps you described. It worked fine.
Update your gdb, check again, and let me know.
I am working on a rather large C++ app, with most of the code stored in a static library, and some programs that use that code.
I have what looks like a memory corruption run-time crash:
*** Error in `build/bin/myapp': malloc(): memory corruption (fast): 0x00000000021f62a0 ***
I want to check where that happens. GDB seems the right tool (OS: Ubuntu 14.04).
My makefile handles both debug and release builds with a makefile command-line switch.
With the switch on, the -g flag is added and the .a library is 23.8 MB, while the app is 519 kB.
Without it, they are 1.6 MB and 486 kB (so I'm pretty sure the debugging symbols are there).
My (partial) CFLAGS, as suggested by the gcc manual:
CFLAGS = -std=c++11 -g -Wall -O0 -fno-inline
I run gdb with:
gdb --args build/bin/myapp datafile.dat -a -b (...and more arguments)
My problem is that even in the debug build, gdb keeps telling me that it can't find any symbols:
Reading symbols from build/bin/myapp...(no debugging symbols found)...done.
If I run it from within gdb, it crashes with:
Program received signal SIGABRT, Aborted.
0x00007ffff5298cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
The backtrace confirms that symbols are missing: frames #10 to #18 have no information and probably belong to my code:
(gdb) bt
#0 0x00007ffff5298cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ffff529c0d8 in __GI_abort () at abort.c:89
#2 0x00007ffff52d5394 in __libc_message (do_abort=do_abort@entry=1, fmt=fmt@entry=0x7ffff53e3b28 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:175
#3 0x00007ffff52e00f7 in malloc_printerr (action=<optimized out>, str=0x7ffff53e3ec8 "malloc(): memory corruption (fast)", ptr=<optimized out>) at malloc.c:4996
#4 0x00007ffff52e2e04 in _int_malloc (av=0x7ffff5620760 <main_arena>, bytes=36) at malloc.c:3359
#5 0x00007ffff52e47b0 in __GI___libc_malloc (bytes=36) at malloc.c:2891
#6 0x00007ffff5babe68 in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7 0x00007ffff5c03e69 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8 0x000000000045a7a5 in char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) ()
#9 0x00007ffff5c05bd6 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) ()
from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x000000000042df7f in ?? ()
#11 0x000000000042eef6 in ?? ()
#12 0x0000000000421dab in ?? ()
#13 0x0000000000422223 in ?? ()
#14 0x0000000000422cfe in ?? ()
#15 0x0000000000423393 in ?? ()
#16 0x0000000000424600 in ?? ()
#17 0x000000000040fd50 in ?? ()
#18 0x000000000040566d in ?? ()
#19 0x00007ffff5283ec5 in __libc_start_main (main=0x4053c0, argc=6, argv=0x7fffffffddf8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
stack_end=0x7fffffffdde8) at libc-start.c:287
#20 0x000000000040604f in ?? ()
I did check some of the many questions about this topic, but none of them helped (most of them relate to a forgotten -g flag, or an added -s that strips the symbols).
Question: what can the next step be to find out why/where my crash happens?
Additional info:
gcc --version: 5.3.0
gdb --version: 7.7.1
code dependencies: boost and opencv
but none of any help (most of these relate to a forgotten -g flag, or an added -s, stripping down the symbols).
It is almost certain that you either have a stray -s somewhere on your link line, or you run strip on the binary during installation.
Look carefully at your link command line and install command; there is a strip somewhere in there.
P.S. As Tom Tromey already said, GDB is rarely effective in helping with a problem like this. Using Valgrind or Address Sanitizer will likely get you to the root cause much faster.
Following the post "Standalone functions/data in C++", I put my "common data" in an anonymous namespace as shown below, and everything worked great on Windows (Vista 64-bit) with VS 2005/2008/2010.
namespace {
...
static std::string mystrings[] = {
str1,
str2,
...,
strN
};
...
}
namespace mynamesp {
...
use mystrings[] here..
...
}
But on Linux (so far tested on RHEL5, built with GCC 4.1.2) I promptly got a segmentation fault.
$>myprog
Segmentation fault
$>gdb myprog
GNU gdb Fedora (6.8-27.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
(gdb) r
Starting program: <path/to>/myprog
[Thread debugging using libthread_db enabled]
[New Thread 0x2b8901a9da60 (LWP 32710)]
Program received signal SIGSEGV, Segmentation fault.
0x0000003e4ce9c928 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string ()
from /usr/lib64/libstdc++.so.6
(gdb) bt
#0 0x0000003e4ce9c928 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string ()
from /usr/lib64/libstdc++.so.6
#1 0x00002b88ffde482b in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535)
at <path/to>/mysource.cpp:140
#2 0x00002b88ffde4d65 in global constructors keyed to _ZN91_GLOBAL__N__underscore_separated_path_to_mysource.cpp_00000000_6994A7DA2_1E () at <path/to>/mysource.cpp:12139
#3 0x00002b890011a296 in __do_global_ctors_aux ()
from <path/to/libs>/debug/libmylibd.so
#4 0x00002b88ffcd7f33 in _init () from <path/to/libs>/debug/libmylibd.so
#5 0x00002b8901672e40 in ?? ()
#6 0x000000326940d22b in call_init () from /lib64/ld-linux-x86-64.so.2
#7 0x000000326940d335 in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#8 0x0000003269400aaa in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#9 0x0000000000000001 in ?? ()
#10 0x0000000000000000 in ?? ()
(gdb)
Line 140, in frame #1 of the backtrace, points to the end of my array-of-strings definition. I've seen others hit this error, but found no obvious fixes. I'd appreciate any thoughts, ideas, or corrections, as always. Thanks!
Your problem could be related to the static initialization order fiasco.
It happens when you initialize a static variable using another static variable defined in a different translation unit. If the latter has not been initialized yet, the former is initialized from an object that has not been constructed.
The root cause is that the order in which static variables in different translation units are initialized is not specified.
Further reading:
https://isocpp.org/wiki/faq/ctors#static-init-order
A typical workaround would be to wrap the static variables inside a function. Example:
T& GetStaticA() {
    static T static_var_A; // <-- constructed on the first call to GetStaticA()
    return static_var_A;
}
T static_var_B = GetStaticA(); // <-- static_var_A is guaranteed to be initialized
I had this problem, and it turned out that my link command was missing one of the object files.
g++ main.o logger.o timer.o keyboard.o -o main -lSDL -lSDL_image -lSDL_ttf -Wall
should have been
g++ main.o logger.o timer.o keyboard.o drawer.o -o main -lSDL -lSDL_image -lSDL_ttf -Wall
(Notice the inclusion of drawer.o?)
It was easy to miss because my actual bash compilation script had many more lines to it.