I am running into a weird issue at work where after updating from RHEL 7 (linux kernel 3.10.0, GCC 4.8.5) to RHEL 8 (linux kernel 4.18.0, GCC 8.3.1), our enums have started to cause problems while destructing. From my best diagnosis in gdb, it is trying to call the destructor on the same static object more than once (once for each lib that instantiates the enums and is used to build the executable in question) and segfaulting on the second attempt, as the object has already been destroyed.
Here is the backtrace:
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff3c91b6f in __tcf_2 () at /sourcepath/ExampleEnum.H:106
#2 0x00007ffff68ae3c7 in __cxa_finalize () from /lib64/libc.so.6
#3 0x00007ffff3c33c87 in __do_global_dtors_aux () from /libpath/lib64/libsecond_lib.so
#4 0x00007fffffff9c10 in ?? ()
#5 0x00007ffff7de42a6 in _dl_fini () from /lib64/ld-linux-x86-64.so.2
This is the second time it reaches that line of ExampleEnum.H in __tcf_2, a function related to static destruction. The first time is no problem.
Here is the structure of the enums:
#ifndef _EXAMPLEENUM_H
#define _EXAMPLEENUM_H
#include "OurString.H"
#define EXAMPLEENUM_SOURCE_LIST(enum) \
enum(THIS_EXAMPLE_ENUM, "THIS_EXAMPLE", "", false),\
enum(ExampleEnumMax, "ExampleEnumMax", "error", false)
#define NAME_GENERATOR(name, guiname, description, p4) name
#define GUI_NAME_STR_GENERATOR(name, guiname, description, p4) guiname
class Example {
public:
enum Enum {
EXAMPLEENUM_SOURCE_LIST(NAME_GENERATOR)
};
static const int NUM_FIELDS = ExampleEnumMax + 1;
static const char* names[NUM_FIELDS];
};
typedef Example::Enum ExampleEnum
extern const OurString ExampleEnum_GuiName[Example::ExampleEnumMax + 1];
#ifdef CONSTRUCT_ENUM_STRINGS
const OurString ExampleEnum_GuiName[Example::ExampleEnumMax + 1] = {
EXAMPLEENUM_SOURCE_LIST(GUI_NAME_STR_GENERATOR)
};
#endif
#endif
And then in the libs where it is used, this names.C is compiled into the lib:
#define CONSTRUCT_ENUM_STRINGS 1
#include <enumpath/ExampleEnum.H>
#undef CONSTRUCT_ENUM_STRINGS
const char* Example::names[Example::NUM_FIELDS] = {
EXAMPLEENUM_SOURCE_LIST(GUI_NAME_STR_GENERATOR)
};
We have a band-aid solution that basically just covers up the problem, ie calling _exit(0) at the end of main() skips all destructors, including static destructors which pose the problem so it doesn't segfault. However, obviously we want to fix the way our enums are handled such that we can run all necessary destructors (and no more than necessary) without segfaulting.
Is there anything obviously wrong with our enums? They have been working through several kernel/gcc versions and have only recently posed a problem.
Is there likely to be anything wrong with how they are used in the libs? This problem only occurs when an executable is compiled with multiple libs that use the same enum, which is unfortunately quite often. Is there some strict tree of import dependency structure we could keep to to fix this?
Why did it work up until we updated the OS?
EDIT:
Concerns about OurString's destructor have been raised, I didn't include it because it was trivial:
~OurString() throw () {}
ALSO: a little more debugging and going through a version compiled by GCC 4.8.5 that doesn't segfault shows me that __tcf_2 is entered twice there too, so my theory about improperly calling the destructor multiple times is wrong, and it looks like #PaulMcKenzie's theory of static initialization order is likely.
Thanks in advance!
Related
I'm working with some legacy C++ code that is behaving in a way I don't understand. I'm using the Microsoft compiler but I've tried it with g++ (on Linux) as well—same behavior.
I have 4 files listed below. In essence, it's a registry that's keeping track of a list of members. If I compile all files and link the object files into one program, it shows the correct behavior: registry.memberRegistered is true:
>cl shell.cpp registry.cpp member.cpp
>shell.exe
1
So somehow the code in member.cpp gets executed (which I don't really understand, but OK).
However, what I want is to build a static library from registry.cpp and member.cpp, and link that against the executable built from shell.cpp. But when I do this, the code in member.cpp does not get executed and registry.memberRegistered is false:
>cl registry.cpp member.cpp /c
>lib registry.obj member.obj -OUT:registry.lib
>cl shell.cpp registry.lib
>shell.exe
0
My questions: how come it works the first way and not the second and is there a way (e.g. compiler/linker options) to make it work with the second way?
registry.h:
class Registry {
public:
static Registry& get_registry();
bool memberRegistered;
private:
Registry() {
memberRegistered = false;
}
};
registry.cpp:
#include "registry.h"
Registry& Registry::get_registry() {
static Registry registry;
return registry;
}
member.cpp:
#include "registry.h"
int dummy() {
Registry::get_registry().memberRegistered = true;
return 0;
}
int x = dummy();
shell.cpp:
#include <iostream>
#include "registry.h"
class shell {
public:
shell() {};
void init() {
std::cout << Registry::get_registry().memberRegistered;
};
};
void main() {
shell *cf = new shell;
cf->init();
}
You have been hit by what is popularly known as static initialization order fiasco.
The basics is that the order of initialization of static objects across translation units is unspecified. See this
The call here Registry::get_registry().memberRegistered; in "shell.cpp" may happen before the call here int x = dummy(); in "member.cpp"
EDIT:
Well, x isn't ODR-used. Therefore, the compiler is permitted not to evaluate int x = dummy(); before or after entering main(), or even at all.
Just a quote about it from CppReference (emphasis mine)
It is implementation-defined whether dynamic initialization
happens-before the first statement of the main function (for statics)
or the initial function of the thread (for thread-locals), or deferred
to happen after.
If the initialization is deferred to happen after the first statement
of main/thread function, it happens before the first odr-use of any
variable with static/thread storage duration defined in the same
translation unit as the variable to be initialized. If no variable or function is odr-used from a given translation unit, the non-local variables defined in that translation unit may never be initialized (this models the behavior of an on-demand dynamic library)...
The only way to get your program working as you want is to make sure x is ODR-used
shell.cpp
#include <iostream>
#include "registry.h"
class shell {
public:
shell() {};
void init() {
std::cout << Registry::get_registry().memberRegistered;
};
};
extern int x; //or extern int dummy();
int main() {
shell *cf = new shell;
cf->init();
int k = x; //or dummy();
}
^ Now, your program should work as expected. :-)
This is a result of the way linkers treat libraries: they pick and choose the objects that define symbols left undefined by other objects processed so far. This helps keep executable sizes smaller, but when a static initialization has side effects, it leads to the fishy behavior you've discovered: member.obj / member.o doesn't get linked in to the program at all, although its very existence would do something.
Using g++, you can use:
g++ shell.cpp -Wl,-whole-archive registry.a -Wl,-no-whole-archive -o shell
to force the linker to put all of your library in the program. There may be a similar option for MSVC.
Thanks a lot for all the replies. Very helpful.
So both the solution proposed WhiZTiM (making x ODR-used) and aschepler (forcing linker to include the whole library) work for me. The latter has my preference since it doesn't require any changes to the code. However, there seems to be no MSVC equivalent for --whole-archive.
In Visual Studio I managed to solve the problem as follows (I have a project for the registry static library, and one for the shell executable):
In the shell project add a reference to the registry project;
In the linker properties of the shell project under General set
"Link Library Dependencies" and "Use Library Dependent Inputs" to
"Yes".
If these options are set registry.memberRegistered is properly initialized. However, after studying the compiler/linker commands I concluded that setting these options results in VS simply passing the registry.obj and member.obj files to the linker, i.e.:
>cl /c member.cpp registry.cpp shell.cpp
>lib /OUT:registry.lib member.obj registry.obj
>link /OUT:shell.exe "registry.lib" shell.obj member.obj registry.obj
>shell.exe
1
To my mind, this is essentially the first approach to my original question. If you leave out registry.lib in the linker command it works fine as well.
Anyway, for now, it's good enough for me.
I'm working with CMake so now I need to figure out how to adjust CMake settings to make sure that the object files get passed to the linker? Any thoughts?
The following piece of code works with g++ 4.6 compiler but crashes with segmentation fault when compiled with g++ 5.1 compiler. The variable access gString is causing the segmentation fault.
#define _GLIBCXX_DEBUG 1
#define _GLIBCXX_USE_CXX11_ABI 0
#include<string>
#include<iostream>
#include<vector>
static std::string gString("hello");
static void
__attribute__((constructor))
initialize()
{
gString.assign("hello world");
return;
}
static void
__attribute__((destructor))
finalize()
{
return;
}
int main(int ac, char **av)
{
//std::cerr<<gString;
return 0;
}
GDB output:
Reading symbols from /home/rk/str...done.
(gdb) b initialize
Breakpoint 1 at 0x401419: file str.cc, line 15.
(gdb) r
Starting program: /home/rk/str
Breakpoint 1, initialize() () at str.cc:15
15 gString.assign("hello world");
(gdb) n
Program received signal SIGSEGV, Segmentation fault.
0x00000000004018d6 in std::string::size() const () at /usr/include/c++/5/bits/basic_string.h:3118
3118 { return _M_rep()->_M_length; }
(gdb) bt
#0 0x00000000004018d6 in std::string::size() const () at /usr/include/c++/5/bits/basic_string.h:3118
#1 0x00000000004016ff in std::string::assign(char const*, unsigned long) () at /usr/include/c++/5/bits/basic_string.tcc:706
#2 0x000000000040166e in std::string::assign(char const*) () at /usr/include/c++/5/bits/basic_string.h:3542
#3 0x0000000000401428 in initialize() () at str.cc:15
#4 0x00000000004023dd in __libc_csu_init ()
#5 0x00007ffff71ad700 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x0000000000401289 in _start ()
Why are you using __attribute__((constructor)) in C++ instead of simply a global object with a constructor? Those attributes are useful in C code, but redundant in C++.
The problem is that your constructor runs before the standard iostreams have been initialized, which would not be a problem if you used a global object with a constructor.
You could try adding a priority to your constructor, but I don't think it will help in this case:
__attribute__((constructor(999)))
The runtime error also happens with gcc 4.9.2 (see ideone example).
The problem is related to the iostreams which are not yet initialized. Commenting out the cerr line, and everything works fine
Apparently, it's a known issue.
Edit: Additional remarks
This small workaround seems to work, at least with 4.9: use c stdio instead of iostreams:
fprintf(stderr, "_initialize"); // this works
But I fully agree with Jonathan's suggestion of using a global (singleton ?) object relying solely on well defined standard C++ behaviour, unless you really need the constructor being run exactly at the moment of a dynamic library load.
I tried to build my application with ltalloc. I tried it with MinGW32 4.9.1 and MinGW64-32 4.9.2.
It compiles and links fine but when I run it a Segmentation Fault occurs. Debugging pinpointed the problem to the following code:
#include <pthread.h>
#pragma weak pthread_once
#pragma weak pthread_key_create
#pragma weak pthread_setspecific
static pthread_key_t pthread_key;
static pthread_once_t init_once = PTHREAD_ONCE_INIT;
static void init_pthread_key() { pthread_key_create(&pthread_key, release_thread_cache); }
static thread_local int thread_initialized = 0;
static void init_pthread_destructor()//must be called only when some block placed into a thread cache's free list
{
if (unlikely(!thread_initialized))
{
thread_initialized = 1;
if (pthread_once)
{
pthread_once(&init_once, init_pthread_key); // <--- THIS CAUSES THE SEGSEGV
pthread_setspecific(pthread_key, (void*)1);//set nonzero value to force calling of release_thread_cache() on thread terminate
}
}
}
As far as I know both versions support thread-local storage natively. The Wiki of of ltalloc also wrote the following:
Warning: in some builds of MinGW there is a problem with emutls and order of execution of thread destructor (all thread local variables destructed before it), and termination of any thread will lead to application crash.
Unfortunately this warning doesn't tell me anything. Googling it also didn't make me smarter.
Out of the blue, try this:
static void init_pthread_key(void)
{
if (pthread_key_create)
{
pthread_key_create(&pthread_key, release_thread_cache);
}
}
Also adding full error checking to the pthread_* might not only help during debugging.
Greetings,
I'm having a weird seg fault problem. My application dumps a core file at runtime. After digging into it I found it died in this block:
#include <lib1/c.h>
...
x::c obj;
obj.func1();
I defined class c in a library lib1:
namespace x
{
struct c
{
c();
~c();
void fun1();
vector<char *> _data;
};
}
x::c::c()
{
}
x::c::~c()
{
for ( int i = 0; i < _data.size(); ++i )
delete _data[i];
}
I could not figure it out for some time till I ran nm on the lib1.so file: there are more function definitions than I defined:
x::c::c()
x::c::c()
x::c::~c()
x::c::~c()
x::c::func1()
x::c::func2()
After searching in code base I found someone else defined a class with same name in same namespace, but in another library lib2 as follows:
namespace x
{
struct c
{
c();
~c();
void func2();
vector<string> strs_;
};
}
x::c::c()
{
}
x::c::~c()
{
}
My application links to lib2, which has dependency on lib1. This interesting behavior brings several questions:
Why would it even work? I would expect a "multiple definitions" error while linking against lib2 (which depends upon lib1) but never had such. The application seems to be doing what's defined in func1 except it dumps a core at runtime.
After attaching debugger, I found my application calls the ctor of class c in lib2, then calls func1 (defined in lib1). When going out of scope it calls dtor of class c in lib2, where the seg fault occurs. Can anybody teach me how this could even occur?
How can I prevent such problems from happening again? Is there any C++ syntax I can use?
Forgot to mention I'm using g++ 4.1 on RHEL4, thank you very much!
1.
Violations of the "one definition rule" don't have to be diagnosed by your compiler. In fact, they are often only going to be known at link time when you link multiple object files together.
At link time, the information about the original class definitions may not exist any more (they are not needed after the compiler step) so having multiple definitions of a class is typically not easy to flag to the user.
2.
Once you have two distinct definitions pretty much anything can happen, you are in the territory of undefined behaviour. Whatever happens, it's a possible outcome.
3.
The most sensible thing to do is to communicate with the other members of your team. Agree who's going to use which namespaces and you won't get these problems. Otherwise, you point a documentation tool or static analysis tool over your entire project. Many such tools will be able to diagnose multiple inconsistent definitions of classes.
Just a guess but I don't see any using namespace x; so perhaps it used one namespace instead of the other?
With the advent of templates it became necessary to allow multiple definitions of a body of code with the same name; there was no way for the compiler to know if the same template code had already been generated in another compilation unit i.e. source file. When the linker finds these duplicates, it assumes they are identical. The burden is on you to make sure that they are - this is called the One Definition Rule.
On the linker level this is library interpositioning. The effective symbol bound unfortunately depends on the order of object files on linker command line (this is, sigh, historical).
From what you describe it looks that lib1 comes first in linker argument list and lib2 comes second and interposes on symbols from lib1. This explains the calls to constructors and destructors from the lib2 but calls to func1 from lib1 (since there's no func1-derived symbol in lib2, so there's no "hiding", the call is bound to lib1.)
The solution to this particular problem is to reverse the order of libraries on the linker invocation command.
There's lots of answers about the one definition rule. However, to me, this looks a lot more like a missing copy constructor.
To elaborate:
If the copy constructor is called on your object, then you will get a memory leak. This is because delete will be called on the same set of pointers twice.
namespace x
{
struct c
{
c() {
}
~c() {
for ( int i = 0; i < _data.size(); ++i )
delete _data[i];
}
c(const c & rhs) {
for (int i=0; i< rhs.size(); ++i) {
int len = strlen(rhs[i]);
char *mem = malloc(len + 1);
strncpy(mem, rhs[i], len + 1);
_data.push_back(mem);
}
void fun1();
vector<char *> _data;
};
}
I am getting such a crash:
#0 0x90b05955 in __gnu_debug::_Safe_iterator_base::_M_detach
#1 0x90b059ce in __gnu_debug::_Safe_iterator_base::_M_attach
#2 0x90b05afa in __gnu_debug::_Safe_sequence_base::_M_detach_all
#3 0x000bc54f in __gnu_debug::_Safe_sequence_base::~_Safe_sequence_base at safe_base.h:170
#4 0x000aac05 in __gnu_debug::_Safe_sequence<__gnu_debug_def::vector<boost::signals::trackable const*, std::allocator<boost::signals::trackable const*> > >::~_Safe_sequence at safe_sequence.h:97
#5 0x000ac9c1 in __gnu_debug_def::vector<boost::signals::trackable const*, std::allocator<boost::signals::trackable const*> >::~vector at vector:95
#6 0x000acf65 in boost::signals::detail::slot_base::data_t::~data_t at slot.hpp:32
#7 0x000acf8f in boost::checked_delete<boost::signals::detail::slot_base::data_t> at checked_delete.hpp:34
#8 0x000b081e in boost::detail::sp_counted_impl_p<boost::signals::detail::slot_base::data_t>::dispose at sp_counted_impl.hpp:78
#9 0x0000a016 in boost::detail::sp_counted_base::release at sp_counted_base_gcc_x86.hpp:145
#10 0x0000a046 in boost::detail::shared_count::~shared_count at shared_count.hpp:217
#11 0x000a9fb0 in boost::shared_ptr<boost::signals::detail::slot_base::data_t>::~shared_ptr at shared_ptr.hpp:169
#12 0x000aa459 in boost::signals::detail::slot_base::~slot_base at slot.hpp:27
#13 0x000aad07 in boost::slot<boost::function<bool ()(char, int)> >::~slot at slot.hpp:105
#14 0x001b943b in main at vermes.cpp:102
This is the code:
#include <boost/signal.hpp>
#include <boost/lexical_cast.hpp>
#include <boost/function.hpp>
#include <boost/bind.hpp>
bool dummyfunc(char,int) { return false; }
int main(int argc, char **argv)
{
boost::signal<bool (char, int)> myslot;
myslot.connect(0, &dummyfunc);
return 0;
}
It's the first time I am working with Boost and I am also completly new to the code of the project I am trying to port here.
That is why I would like to ask if such a crash could be in any way explained by Boost or if it must be unrelated to Boost.
I already tried to understand the crash itself but I got stuck somehow. It seems that probably the std::vector, which is going to be deleted here, is messed up (messed up = memory corrupt). That vector is a member of slot_base::data_t. The deletion is done in the destructor of slot_base::shared_ptr. So perhaps the shared_ptr also was messed up - so perhaps even the whole slot_base was messed up. But in the code I have, I don't really see a reason why that memory could be messed up. It is even the first access at all after the construction of myslot.
Addition: What I also don't really understand is why the ~slot_base() is called here at all when I do the connect. But I also didn't found the connect-memberfunction. Is that a magic makro somewhere?
I found the problem. When I enable these preprocessor definitions (my Xcode does that by default in Debug configuration), it crashes:
-D _GLIBCXX_DEBUG=1
-D _GLIBCXX_DEBUG_PEDANTIC=1
I guess Boost (bjam) compiled without those and that causes such problems because the STL structures (like vector) look different in binary form when compiled with or without this.
It sounds like your GConsole class is not derived from boost::trackable.
When a signal is bound to a member function, it expects the member's object to exist, always.
You can either explicitly disconnect signals when member function's owner is destroyed, or you can derive the object from boost::trackable, which will do the maintenance automatically when the object is destroyed.